Extractive Question Answering with ChromaDB and Gemini

This project implements an Extractive Question Answering (EQA) system that extracts answers from a set of downloaded text files based on user queries. It utilizes ChromaDB as a vector database to store document embeddings for efficient retrieval of relevant information. Gemini is then used to refine the extracted information, providing more relevant, concise, and human-like responses.

Overview

The project follows these steps:

Data Download: The script downloads the compressed archive "new_articles.zip" from Dropbox using the provided link.
Text Extraction: The downloaded archive is unzipped, and all text files within are extracted.
Text Preprocessing (optional): Each text file can be cleaned and normalized before embedding.
Embedding Generation: Embeddings are created for each preprocessed text document using a chosen embedding model (e.g., Sentence Transformers).
Data Storage in ChromaDB: Text documents and their corresponding embeddings are stored in ChromaDB.
Retrieval: When a user asks a question, the system generates an embedding for the query and retrieves the most similar documents from ChromaDB using cosine similarity.
Response Refinement with Gemini: The retrieved text snippets are passed to the Gemini LLM, which refines the information to provide a more relevant, concise, and human-like response to the user's query.
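
The retrieval and refinement steps above can be illustrated with a minimal, dependency-free sketch. It is not the notebook's implementation: a toy word-count vector stands in for a real embedding model, an in-memory list stands in for ChromaDB, and the names `embed`, `retrieve`, and `build_prompt` are hypothetical helpers introduced only for illustration. The sketch ranks documents by cosine similarity to the query and assembles the text that would be sent to the LLM.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding (assumption): lowercase word counts, not a real model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query, best first."""
    q = embed(query)
    scored = [(cosine_similarity(q, embed(d)), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:k]]


def build_prompt(query, snippets):
    """Combine retrieved snippets and the question into one LLM prompt."""
    context = "\n".join(f"- {s}" for s in snippets)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


if __name__ == "__main__":
    docs = [
        "ChromaDB stores embeddings for retrieval",
        "Bananas are yellow fruit",
        "Gemini refines retrieved text",
    ]
    question = "What stores embeddings?"
    print(build_prompt(question, retrieve(question, docs, k=1)))
```

In the actual project, the vector store would perform the similarity search and the prompt would be passed to Gemini; those calls are omitted here because the README does not specify them.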

Technologies Used

ChromaDB: Vector database for storing and retrieving embeddings.
Hugging Face Transformers (optional): For generating document embeddings.
LangChain (optional): For streamlining document preprocessing and loading.
Python: Programming language for implementation.
Requests: To download the data from Dropbox.
File manipulation libraries (e.g., os, zipfile): For handling file downloads and extraction.
Gemini: Large Language Model for refining extracted information and generating human-like responses.
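
As a rough sketch of how the file manipulation libraries listed above might handle the extraction step, the following uses only `zipfile` and `pathlib` from the standard library. The function name `extract_text_files` and the choice to return a dictionary are assumptions made for illustration. The download itself is left out because the README does not give the Dropbox link.

```python
import zipfile
from pathlib import Path


def extract_text_files(archive_path, dest_dir):
    """Unzip the archive and return {file name: text} for every .txt file.

    Hypothetical helper: the notebook may organize this step differently.
    """
    dest = Path(dest_dir)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(dest)

    texts = {}
    # rglob also finds .txt files inside nested folders in the archive.
    for path in sorted(dest.rglob("*.txt")):
        texts[path.name] = path.read_text(encoding="utf-8", errors="replace")
    return texts
```

A call such as `extract_text_files("new_articles.zip", "articles")` would then yield the documents to embed and store, assuming the archive has already been downloaded to the working directory.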

Farhaj499/RAG_with_ChromaDB
