MDT Library - RAG Chatbot - Experimental Project
This project was developed and tested as part of our experimental lane and is made as a simple proof of concept to get familiar with and experiment with the various parts of a RAG setup.
The example use-case is a chatbot, allowing us to interact with our Confluence IT documentation in natural language. As a fun twist, it is named after our recently retired colleague Peter, who has been a great source of knowledge.
Key Findings & Experimentation Details
To learn more about our experiments and findings, please refer to our presentation given during the WUR Model & Data Day.
Collaboration Invitation
We believe that innovation thrives on collaboration. We invite LLM enthusiasts, researchers, and developers to join the LLM Focus Group. Your insights and shared experiments are highly welcome. Feel free to contact Cristina Huidiu for more information: cristina.huidiu@wur.nl.
We're currently planning multiple concrete challenges that still need to be tackled, and will be reaching out for volunteers in due time.
Disclaimer
Please be aware that the code in this repository is provided "as-is" without any support or guarantee. Use it, modify it, or improve it as you see fit, but understand that this is an experimental initiative.
Warnings ⚠️
- This project has been created mostly in my free time and is for educational purposes only. It is not intended for production use. The code is not optimized for performance or security, nor does it have proper error handling or any test coverage.
- This project allows you to use your own data. Note that the quality of the input data is crucial for the quality of the final output.
- An important todo that is still open: implement a way to structurally assess the quality and reliability of the whole RAG chain and its individual components.
- Please note that `allow_dangerous_deserialization` is set to true for loading a previously created FAISS index from disk (see the sketch below this list).
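For context, the snippet below is a minimal sketch of how a previously saved FAISS index is typically reloaded with LangChain, which is where this flag comes in. The paths and the embedding model are assumptions for illustration, not the project's exact code.

```python
# Minimal sketch, assuming LangChain's FAISS wrapper; paths and embedding model are placeholders.
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
db = FAISS.load_local(
    "storage/faiss-index",                 # hypothetical cache location
    embeddings,
    allow_dangerous_deserialization=True,  # the index is stored as a pickle, so only load files you created yourself
)
```

In short: only enable this flag for index files you built and stored yourself.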
Used packages
- Huggingface Transformers to download & load the LLM model
- Default model: Gemma 2 9B (using 4-bit quantization; see the sketch below this list)
- Streamlit for the chatbot interface
- Langchain for:
- PDF, HTML, Markdown Document loaders
- Embeddings Encoder
- FAISS DB for similarity search
- SBert Sentence Transformers for cross encoder & reranking
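As an illustration of how these packages fit together, the sketch below loads the default Gemma 2 9B model with 4-bit quantization via Transformers. It is an assumption about the setup, not a copy of the project's loading code, and downloading Gemma requires the Hugging Face access token described under installation.

```python
# Illustrative sketch only: an assumed way to load the default model with 4-bit quantization,
# not the project's actual loading code. Requires the bitsandbytes package alongside transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "google/gemma-2-9b-it"  # the default model mentioned above

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # place the quantized layers on the available GPU
)
```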
Main components
The project consists of two important main components:
- A pipeline/script to build the RAG index from a list of documents.
- An interface to interact with the RAG index using natural language.
Note: this project does not link to a Confluence instance, but uses a simple index and a few example documents to demonstrate the RAG concept. This allows you to easily test and keep control over the data.
Setting up a synchronization pipeline to Confluence can be done using the Confluence API but is not part of this project.
Architectural overview
Note: the diagrams below are a high-level overview and might skip some details for the sake of simplicity.
Indexing pipeline
RAG
Folder structure
Primary folders:
- `documents`: Should contain the documents to be indexed by the vector DB, including a required `index.yaml` file with metadata.
- `src`: Contains the source code.
- `config`: Config files for LLM models, including the system prompts to use.
- `storage`: Cache for the FAISS index and downloaded LLM models.
Other folders:
- `.streamlit`: Streamlit UI config.
- `docs`: Anything related to the README.md documentation.
Components used to build the RAG index
- `src/rag/simple_document_index.py`: Loads the document index as defined in the `documents/index.yaml` file, including metadata.
- `src/rag/pdf_processor.py`: Extracts text from PDF files, splits it into chunks of a predefined size, and appends metadata.
- `src/rag/encoder.py`: Used to encode the text chunks into embeddings.
- `src/rag/faiss_db.py`: Used to store the embeddings in a FAISS database for fast similarity search (see the sketch after this list).
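To make the flow concrete, here is a rough sketch of how such an indexing pipeline can be wired up with LangChain: load PDFs, chunk, embed, and store in FAISS. The loader, splitter settings, embedding model, and paths are assumptions for illustration; the actual logic lives in the scripts listed above.

```python
# Rough sketch of the indexing flow; chunk sizes, model names and paths are placeholders.
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

# Load and chunk a document.
docs = PyPDFLoader("documents/example.pdf").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)

# Encode the chunks and persist the FAISS index to the storage cache folder.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
db = FAISS.from_documents(chunks, embeddings)
db.save_local("storage/faiss-index")  # hypothetical cache location
```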
Components used for the Chat interface
- `src/app.py`: Main entrypoint: the Streamlit app that allows you to interact with the chatbot.
- `src/model/context_retriever.py`: Retrieves the most relevant context from the RAG index given a question (see the sketch after this list).
  - `src/rag/faiss_db.py`: Used to retrieve the most similar context from the FAISS database.
  - `src/rag/reranker.py`: Reranks the retrieved contexts using an SBert cross encoder.
  - `src/rag/context_extender.py`: Extends the context to the whole source document if enabled.
- `src/utils/response_generator.py`: Generates the response given a question and context.
  - `src/model/prompt_builder.py`: Generates the system prompt for the LLM model.
  - `src/model/llm_model.py`: Loads the LLM model and generates answers given a question and context. Returns output as a stream.
  - `src/utils/audio_generator.py`: (Optional, not used without configuration.) Automatically plays an audio fragment for given words. Was used as a fun extra.
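The retrieval half of this flow can be sketched roughly as below, using LangChain's FAISS wrapper and an SBert cross encoder from sentence-transformers. The model names, example question, and the k/threshold values are placeholders, not the project's defaults.

```python
# Simplified sketch of retrieval + reranking; the project's own modules may differ.
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings
from sentence_transformers import CrossEncoder

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
db = FAISS.load_local("storage/faiss-index", embeddings, allow_dangerous_deserialization=True)
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # example reranker model

question = "How do I request VPN access?"  # example question
candidates = db.similarity_search(question, k=10)  # k1: initial similarity search
scores = cross_encoder.predict([(question, d.page_content) for d in candidates])

# k2 / threshold: keep only the best-scoring contexts above a cut-off.
ranked = sorted(zip(scores, candidates), key=lambda pair: pair[0], reverse=True)
context = [doc for score, doc in ranked[:3] if score > 0.0]
```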
How to install
Prerequisites
- Python 3.8 or higher
- An Nvidia GPU with at least 16GB of VRAM
- CUDA drivers
- Pytorch with CUDA support (see Pytorch installation guide)
Installation
- Clone the repository
- Install the required packages with `pip install -r requirements.txt`. (Note: pip is used here. You need to create a venv yourself if desired.)
- Manually install the correct Pytorch version with GPU support (see the Pytorch installation guide). You'll only need the `torch` package and can skip `torchvision` and `torchaudio`.
- Create an `.env` file in the root of the project with the following content: `HF_ACCESS_TOKEN=<put your Hugging Face access token here>` (required for downloading models).
How to run
Step 1. Creating a RAG index
- Add the documents you want to index to the `documents` folder. Currently supported file extensions: .pdf, .html, .md
- Copy `documents/example_index.yaml` to `documents/index.yaml` and add your documents, including metadata (see the illustrative example below this list). This metadata will be used to show reference links below answers.
- Run the `build-rag-index.py` script to create the RAG index from your documents.
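For reference, an index entry could look roughly like the snippet below. The field names here are purely illustrative; check `documents/example_index.yaml` for the actual schema expected by the indexing script.

```yaml
# Purely illustrative; the real field names are defined by documents/example_index.yaml.
documents:
  - file: vpn-manual.pdf                               # hypothetical document in the documents folder
    title: "VPN access manual"                         # shown as a reference link below answers
    url: "https://example.org/confluence/vpn-manual"   # hypothetical source link
```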
Step 2. Running the chatbot
- You can start the chatbot using `streamlit run src/app.py`.
- This will start a Streamlit server at `http://localhost:8501`.
Explanation of settings in Chatbot sidebar
LLM
- max_new_tokens: The maximum number of tokens the LLM model can generate in one go.
RAG & Reranking
- k1 (rag): The number of similar contexts to retrieve from the FAISS database.
- k2 (rerank): The number of contexts to keep after reranking has been applied.
- threshold (rerank): The threshold for the SBert cross encoder to consider a context relevant. Results below this threshold will be discarded.
- Expand context: If enabled, the found context chunks will be expanded to the whole source documents. Generally produces better results, but can cause context-length issues.
Other
- Funny response chance: The chance (0.0-1.0) that a funny prompt will be used instead of the default prompt. (Disabled by default.)
Debug
- Show RAG context: Displays an overview of retrieved documents at various steps in the RAG pipeline.
- Disable LLM response: Disables the LLM response generation. Useful for debugging the RAG pipeline without running inference on the LLM model.
Using a different LLM model
LLM models can easily be swapped using a config. As different models react differently to our system prompts, you can also define custom system prompt templates for each model. See the `/config` folder for examples.
Supplied models
The following model configurations are supplied with the project:
| Model | Multi-language support | Parameters | Max. context length | General notes |
|---|---|---|---|---|
| gemma-2b-it.yaml | Very poor | 2B | 8192 | Very small but fast model, with fast and decent responses. |
| gemma-2-2b-it.yaml | Poor | 2B | 8192 | Smallest Gemma 2 version, a bit more powerful than the above model. |
| gemma-2-9b-it.yaml (default) | Decent/Good | 9B | 8192 | Gemma 2 9B version. More powerful, with decent multi-language support. Tested with EN/NL/DE. |
Add a `MODEL_CONFIG_FILE=name_of_the_model_config.yaml` entry to the `.env` file in the root of your project to switch to a different model.
How to add a new model
- Create a new model yaml config file in the `/config` folder. You can use `config/gemma-2-9b-it.yaml` as an example (a purely illustrative sketch follows below this list). Note that different models might require different system prompts. The system prompts can also be defined in the config file.
- Add `MODEL_CONFIG_FILE=name_of_your_config.yaml` to the `.env` file in the root of your project.
- (Re)start the chatbot to use the new model.
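As a purely illustrative sketch, a custom model config could look something like the snippet below. The keys shown here are assumptions; the real ones are defined by the project's config loader, so copy `config/gemma-2-9b-it.yaml` and adjust it rather than relying on this sketch.

```yaml
# Illustrative only: key names are hypothetical, check config/gemma-2-9b-it.yaml for the real schema.
model_id: "mistralai/Mistral-7B-Instruct-v0.3"   # hypothetical Hugging Face model name
system_prompt: |
  You are Peter, a helpful assistant answering questions about our IT documentation.
  Use only the provided context to answer.
```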
See Hugging Face for available models: https://huggingface.co/models?pipeline_tag=text-generation
License
The MIT License (MIT). Please see License File for more information.