Retrieval Augmented Generation (RAG) is a technique that combines the power of retrieval and generation to improve the performance of LLMs.
RAG tackles the following problems:
LLM knowledge could be outdated.
LLM answers may come with no source, since the model cannot point back to the data it was trained on.
RAG lets the LLM say “I don’t know” instead of hallucinating.
How RAG Works
Basically, RAG gives the LLM the power to access a pile of documents (a database) and retrieve the most relevant information to answer the question.
The pipeline does two steps (sketched in code below):
Retrieval: the question is embedded and used to query a vector database for the most relevant documents.
Generation: the LLM generates the answer based on the retrieved information.
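As a minimal sketch of these two steps (the vectorstore, model id, and prompt are placeholders of mine, not from a specific implementation):

```python
import litellm

def answer(question: str, vectorstore, k: int = 4) -> str:
    # Retrieval: fetch the k chunks most similar to the question.
    docs = vectorstore.similarity_search(question, k=k)
    context = "\n\n".join(doc.page_content for doc in docs)

    # Generation: answer grounded in the retrieved context, and admit
    # ignorance when the context does not contain the answer.
    response = litellm.completion(
        model="gpt-4o-mini",  # placeholder model id
        messages=[
            {"role": "system", "content": (
                "Answer using only the context below. If the answer is not "
                "in the context, say \"I don't know\".\n\n" + context
            )},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content
```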
The Vector Database
The vector database stores the embeddings of the documents. Each embedding is a vector representing structured or unstructured data for the LLM to query.
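For example, a piece of text becomes a fixed-length vector of floats (the embedding model name here is a placeholder):

```python
import litellm

response = litellm.embedding(
    model="text-embedding-3-small",  # placeholder embedding model
    input=["Chunking splits large documents into retrievable pieces."],
)
vector = response.data[0]["embedding"]  # e.g. a 1536-dimensional list of floats
```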
We first need to convert the documents into embeddings:
Download the documentation from the code library and convert it into Markdown format.
Chunk the documents into smaller pieces.
Extract features from each chunk into embeddings.
Text Preprocessing and Chunking
Preprocessing the text: documentation is usually in reStructuredText (.rst) format, so we first parse it into Markdown.
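One way to do this conversion (pandoc via the pypandoc wrapper is my choice here, not necessarily what the original pipeline used):

```python
from pathlib import Path

import pypandoc  # requires the pandoc binary to be installed

for rst_file in Path("docs").rglob("*.rst"):  # "docs" is a placeholder directory
    markdown = pypandoc.convert_file(str(rst_file), "markdown", format="rst")
    rst_file.with_suffix(".md").write_text(markdown)
```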
Chunking is the process of splitting large documents into smaller, manageable segments (chunks) before storing them in a retrieval system. This ensures that relevant information can be efficiently retrieved and used by the language model.
Chunking is used in my experiment because each document in the code library may be too large.
Handling Large Documents: If entire documents were retrieved, they might exceed the token limit.
Improving Retrieval Efficiency: smaller chunks make it easier to retrieve highly relevant sections.
Boosting Retrieval Accuracy: the retriever finds the most precise and useful information.
In my experiment, Markdown and Python files are split with format-aware splitters (the splitter construction below is reconstructed; the chunk sizes are illustrative, not tuned values):

```python
from langchain_text_splitters import Language, MarkdownTextSplitter, RecursiveCharacterTextSplitter

markdown_splitter = MarkdownTextSplitter(chunk_size=1000, chunk_overlap=100)
python_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON, chunk_size=1000, chunk_overlap=100
)

split_docs = []
for doc in all_docs:
    source = doc.metadata['source']
    if source.endswith('.md'):
        temp_docs = markdown_splitter.split_documents([doc])
    elif source.endswith('.py'):
        temp_docs = python_splitter.split_documents([doc])
    else:
        continue
    # Prepend the source path so every chunk stays attributable to its file.
    for temp_doc in temp_docs:
        temp_doc.page_content = f"Source: {source}\n\n{temp_doc.page_content}"
    split_docs.extend(temp_docs)
```
For the vector store I use Chroma, with the embedding function routed through litellm:

```python
from langchain_community.vectorstores import Chroma


def _get_embedding_function(self) -> Embeddings:
    """Returns a function that uses litellm for embeddings."""

    class LiteLLMEmbeddings(Embeddings):
        def __init__(self, embedding_model):
            self.embedding_model = embedding_model
```
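The wrapper above needs the two methods LangChain's `Embeddings` interface defines. A minimal sketch of how it could be completed and wired into Chroma (the method bodies, model id, and paths are my assumptions, not the original implementation):

```python
import litellm
from langchain_community.vectorstores import Chroma
from langchain_core.embeddings import Embeddings


class LiteLLMEmbeddings(Embeddings):
    """Routes embedding calls through litellm's OpenAI-compatible API."""

    def __init__(self, embedding_model: str):
        self.embedding_model = embedding_model

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        # litellm.embedding mirrors the OpenAI embeddings endpoint.
        response = litellm.embedding(model=self.embedding_model, input=texts)
        return [item["embedding"] for item in response.data]

    def embed_query(self, text: str) -> list[float]:
        return self.embed_documents([text])[0]


# Hypothetical wiring: index the chunks produced by the splitting step.
vectorstore = Chroma.from_documents(
    documents=split_docs,
    embedding=LiteLLMEmbeddings("text-embedding-3-small"),  # placeholder model id
    persist_directory="./chroma_db",
)
```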