LLM Application RAG Architecture (RAG — Retrieval Augmented Generation) — LLMOps

Balamurugan Balakreshnan
Jan 20, 2024

A simple way to build a generative AI application

1. Introduction

  • Build a chat-style generative AI application using an LLM (Large Language Model) and the RAG (Retrieval Augmented Generation) architecture.
  • An LLM is a language model that generates text in response to a prompt.
  • Use your own company or enterprise documents to build the LLM application.
  • RAG retrieves relevant passages from your documents and passes them to the LLM, so the generated text is grounded in those documents.
  • RAG is one of the most effective and economical ways to build a generative AI application.
  • Choose use cases that a generative AI application can actually solve.
  • Because LLMs are language models, use cases that involve generating text are the best targets.
  • Knowledge mining, or chatting with large volumes of documents or other information in text form, is the best target use case for RAG.
  • To simplify, I have split the application into two processes.
  • The first process generates embeddings for the documents, creates the LLM application, and evaluates and tests it.
  • The second process deploys the application to a web application framework so users can consume it through a chat UI.
Note: This shows one way to build an LLM application using the RAG architecture.
Different services or products can be used to achieve the same results.
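
The first process described above (split the documents, embed them, index them, then retrieve context for a question) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation behind this article: `toy_embed` is a bag-of-words stand-in for a real embedding model, the in-memory list stands in for a vector index or database, and every function name and sample document here is hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into windows of `chunk_size` words that share `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def toy_embed(text):
    """ASSUMPTION: stand-in for a real embedding model (word-count vector)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks):
    """ASSUMPTION: an in-memory list stands in for a vector index or database."""
    return [(chunk, toy_embed(chunk)) for chunk in chunks]


def retrieve(index, question, top_k=2):
    """Return the `top_k` chunks most similar to the question."""
    q = toy_embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


def build_prompt(question, context_chunks):
    """Assemble the prompt that would be sent to the LLM."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


# Hypothetical sample documents, one chunk each for brevity.
docs = [
    "The vacation policy grants 20 days of leave per year.",
    "Expense reports are due by the fifth of each month.",
    "The office is closed on public holidays.",
]
index = build_index(docs)
question = "How many vacation days do I get per year?"
print(build_prompt(question, retrieve(index, question, top_k=1)))
```

In a real application the embedding call and the index would be replaced by the services you choose; the chunk, embed, retrieve, and prompt steps stay the same.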

Architecture

2. LLM Application

  • First and foremost, collect the documents for the use case.
  • The most important point is to use the LLM with your own datasets.
  • Research the best way to chunk the documents, for example by page, or by a fixed chunk size with leading and trailing sentences kept for context.
  • Use the Azure OpenAI text-embedding-ada-002 model (ada version 2) to create the embeddings for the documents.
  • Once all the embeddings are created, store them in an AI vector index or database.
  • This is an iterative process: try different chunk sizes and different amounts of leading and trailing sentences to get the best results.
  • Now create the LLM application using prompt flow.
  • Prompt flow lets us define a step-by-step sequence to build the LLM application.
  • Make sure to try different prompts and parameters to get the best results.
  • Prepare a set of questions to ask based on the documents.
  • Also create the expected answers for those questions.
  • It is better to create a separate dataset of questions and answers to evaluate the LLM application.
  • Depending on the use case, you might want to provide more context or other parameters for evaluation.
  • Evaluation is part of the development process and is very important for getting the best results.
  • You can also evaluate with other LLMs available in the Hugging Face model hub or with other OpenAI models.
  • Create an evaluation scoreboard to track the progress of the different LLMs.
  • If you want better results, iterate with different chunking strategies and prompt engineering.
  • Once we know the LLM application is working as expected, we can move to the next step.
  • You can deploy the application, usually as a web service, web application, or API.
  • The front end can be a web chat UI or a mobile UI.
  • We can also export and download the LLM application, then build LLMOps using Azure DevOps or GitHub Actions.
  • LLMOps deployments can be run from the CLI, so we can use any deployment tool.
  • This is one way to deploy to different environments like Dev, Test, UAT, and Prod.
  • There is also a single-click deployment from the UI to deploy to Azure App Service as a website.
  • For an enterprise deployment, either create the flow using code or export the code and create deployments.
  • We can either deploy the entire end-to-end flow, including development, to testing and production.
  • Or we can deploy only the final application with its documents to testing and production.
  • Of course, apply best practices for security, scale, monitoring, logging, alerting, etc.
  • Scale is very important for an LLM application because it is very compute intensive.
  • Scale your application based on workload requirements for the use case.
  • Plan to monitor the application for future enhancements and improvements.
  • Include more documents and improve prompts for better results.
  • Get user feedback on the responses and save it in a database for further analysis.
  • Save user metadata and other preferences to provide a better experience.
  • You can measure metrics like relevance, coherence, fluency, etc.
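
The evaluation steps in the list above (questions, expected answers, and a scoreboard that compares variants) can be sketched as follows. This is a minimal sketch under assumptions: token-overlap F1 is used only as a simple stand-in for metrics such as relevance, the two "variants" are hard-coded functions standing in for real LLM applications built with different chunking or prompts, and all names and sample data are hypothetical.

```python
import re
from collections import Counter


def token_f1(predicted, expected):
    """Token-overlap F1 between an answer and the expected answer (stand-in metric)."""
    pred = re.findall(r"[a-z0-9]+", predicted.lower())
    gold = re.findall(r"[a-z0-9]+", expected.lower())
    if not pred or not gold:
        return 0.0
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def evaluate(app, eval_set):
    """Average score of one application variant over the evaluation dataset."""
    scores = [token_f1(app(item["question"]), item["expected"]) for item in eval_set]
    return sum(scores) / len(scores)


def scoreboard(candidates, eval_set):
    """Rank variants by average score, best first."""
    rows = [(name, evaluate(app, eval_set)) for name, app in candidates.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical evaluation dataset: questions with expected answers.
eval_set = [
    {"question": "How many vacation days do employees get?",
     "expected": "20 days per year"},
    {"question": "When are expense reports due?",
     "expected": "By the fifth of each month"},
]


def page_chunk_variant(question):
    """ASSUMPTION: canned answers standing in for a real LLM application."""
    canned = {
        "How many vacation days do employees get?": "Employees get 20 days per year",
        "When are expense reports due?": "Reports are due by the fifth of each month",
    }
    return canned[question]


def small_chunk_variant(question):
    """ASSUMPTION: a weaker variant that fails to find the answer."""
    return "I do not know"


board = scoreboard(
    {"page chunks": page_chunk_variant, "small chunks": small_chunk_variant},
    eval_set,
)
for name, score in board:
    print(f"{name}: {score:.2f}")
```

Keeping the evaluation dataset separate from the application code makes it easy to rerun the same scoreboard after each change to chunking, prompts, or the underlying model.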
Note: The default model is trained on general knowledge, so industry-specific use cases will be challenging.
We can build domain-specific models or fine-tune the model for better results.
  • Fine-tuning is the next step to improve the LLM application.
  • We can use Azure Machine Learning to fine-tune models like Llama 2, GPT-3, etc.
  • It is an iterative and time-consuming process.
  • Data collection and data preparation are very important for fine-tuning.
  • Creating the dataset is where most of the time will be spent.
  • The remaining process can be automated using Azure Machine Learning.
  • What to fine-tune, and how, will change based on the use case.
  • Decide whether you want to change all the weights or only some of them.
  • That choice determines how much time and money fine-tuning will cost.
  • These models require GPU compute; the larger models mostly need A100s, with a minimum of 8 GPUs.
  • Yes, we can also use small language models to consume or evaluate the LLM application.
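
To make the "all the weights or only some weights" choice concrete, the arithmetic below compares the number of trainable parameters in two cases. It is only an illustration under assumptions: low-rank adapters (a LoRA-style technique, not named in this article) are used as one example of training only some weights, and the layer sizes, matrix count, and rank are hypothetical rather than taken from any specific model.

```python
def full_finetune_params(d_in, d_out, n_matrices):
    """Trainable parameters when every weight in each matrix is updated."""
    return d_in * d_out * n_matrices


def low_rank_adapter_params(d_in, d_out, n_matrices, rank):
    """Trainable parameters when each matrix is frozen and only a pair of
    small adapter matrices (d_in x rank and rank x d_out) is trained."""
    return rank * (d_in + d_out) * n_matrices


# Hypothetical sizes, chosen only for illustration.
D_IN, D_OUT, N_MATRICES, RANK = 4096, 4096, 64, 8

full = full_finetune_params(D_IN, D_OUT, N_MATRICES)
adapters = low_rank_adapter_params(D_IN, D_OUT, N_MATRICES, RANK)

print(f"all weights:   {full:,} trainable parameters")
print(f"adapters only: {adapters:,} trainable parameters")
print(f"ratio:         {adapters / full:.4%}")
```

Fewer trainable parameters generally means less optimizer state and fewer gradients to hold in GPU memory, which is one reason the choice affects time and cost.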

Original article — Samples2024/LLMArch/llmarch.md at main · balakreshnan/Samples2024 (github.com)
