Extractive Question Answering with ChromaDB and Gemini

This project implements an Extractive Question Answering (EQA) system that answers user queries from a set of downloaded text files. It uses ChromaDB as a vector database to store document embeddings, so that the passages most relevant to a query can be retrieved efficiently. Gemini then refines the retrieved passages into a more relevant, concise, and human-like response.

Overview

The project follows these steps:

Data Download: The script downloads the compressed archive "new_articles.zip" from Dropbox using the provided link.
Text Extraction: The downloaded archive is unzipped, and all text files within are extracted.
Text Preprocessing (optional): The content of each text file is cleaned and normalized.
Embedding Generation: Embeddings are created for each preprocessed text document using a chosen embedding model (e.g., Sentence Transformers).
Data Storage in ChromaDB: Text documents and their corresponding embeddings are stored in ChromaDB.
Retrieval: When a user asks a question, the system generates an embedding for the query and retrieves the most similar documents from ChromaDB using cosine similarity.
Response Refinement with Gemini: The retrieved text snippets are passed to the Gemini LLM, which refines the information to provide a more relevant, concise, and human-like response to the user's query.
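The download, extraction, and optional preprocessing steps can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions, not the project's own code: it uses `urllib.request` in place of Requests, `ARCHIVE_URL` is a placeholder because the actual Dropbox link is not reproduced here, and the helper names (`download_archive`, `extract_text_files`, `clean_text`, `load_documents`) and the cleaning rules (Unicode NFKC normalization plus whitespace collapsing) are hypothetical.

```python
import re
import shutil
import unicodedata
import urllib.request
import zipfile
from pathlib import Path

# Hypothetical placeholder: the project's actual Dropbox link is not reproduced here.
ARCHIVE_URL = "https://www.dropbox.com/..."


def download_archive(url: str, dest: Path) -> Path:
    """Stream the archive to `dest` (a stdlib stand-in for the project's Requests call)."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


def extract_text_files(zip_path: Path, out_dir: Path) -> list[Path]:
    """Unzip the archive into `out_dir` and return every .txt file it contained, sorted."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out_dir)
    return sorted(p for p in out_dir.rglob("*.txt") if p.is_file())


def clean_text(text: str) -> str:
    """Optional preprocessing (assumed rules): NFKC-normalize and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()


def load_documents(paths: list[Path]) -> dict[str, str]:
    """Map each file name to its cleaned contents."""
    return {
        p.name: clean_text(p.read_text(encoding="utf-8", errors="ignore"))
        for p in paths
    }


# Usage (commented out because it needs network access; paths are hypothetical):
# zip_path = download_archive(ARCHIVE_URL, Path("data/new_articles.zip"))
# docs = load_documents(extract_text_files(zip_path, Path("data/new_articles")))
```

Keeping the cleaning step light avoids stripping punctuation or casing that an embedding model may use; heavier preprocessing is a judgment call.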
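The embedding, storage, and retrieval steps map onto a ChromaDB collection roughly as follows. This sketch assumes the `chromadb` and `sentence-transformers` packages are installed; the collection name, storage path, and the `all-MiniLM-L6-v2` model are illustrative choices, not the project's documented settings. Because Chroma's default distance function is squared L2, the collection is created with `metadata={"hnsw:space": "cosine"}` so that retrieval matches the cosine similarity described above; newer Chroma releases may offer a different configuration argument, so check the installed version.

```python
from typing import Any

# Assumptions (not taken from the project): collection name, storage path, model name.
COLLECTION_NAME = "new_articles"
DB_PATH = "chroma_db"
EMBEDDING_MODEL = "all-MiniLM-L6-v2"


def build_collection(docs: dict[str, str]):
    """Embed every document with Sentence Transformers and store it in a persistent collection."""
    # Imported here so the pure helper below works without these packages installed.
    import chromadb
    from chromadb.utils import embedding_functions

    client = chromadb.PersistentClient(path=DB_PATH)
    collection = client.get_or_create_collection(
        name=COLLECTION_NAME,
        embedding_function=embedding_functions.SentenceTransformerEmbeddingFunction(
            model_name=EMBEDDING_MODEL
        ),
        # Chroma defaults to squared L2; request cosine distance explicitly.
        metadata={"hnsw:space": "cosine"},
    )
    # upsert() updates existing IDs, so re-running the script does not duplicate entries.
    collection.upsert(ids=list(docs), documents=list(docs.values()))
    return collection


def retrieve(collection, question: str, k: int = 3) -> list[tuple[str, str, float]]:
    """Embed the question with the collection's embedding function and return the k nearest documents."""
    result = collection.query(query_texts=[question], n_results=k)
    return unpack_query_result(result)


def unpack_query_result(result: dict[str, Any]) -> list[tuple[str, str, float]]:
    """Flatten Chroma's per-query nested lists into (id, document, distance) tuples for the first query."""
    return list(zip(result["ids"][0], result["documents"][0], result["distances"][0]))
```

With `docs` mapping file names to cleaned text, a query would be `retrieve(build_collection(docs), question)`; the lowest distances are the closest matches.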
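To make the ranking in the retrieval step concrete, the brute-force version below computes cosine similarity, cos(a, b) = (a · b) / (‖a‖ ‖b‖), and keeps the top k documents. It is an illustration only: ChromaDB searches an approximate HNSW index instead of scoring every vector, and in a cosine-space collection it reports distance as 1 minus similarity, so smaller distances mean closer matches. The function names are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product divided by the product of the vector norms (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], docs: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Score every document against the query and return the k most similar, highest first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]
```

With a query of `[1, 0]` and documents `a = [1, 0]`, `b = [0, 1]`, `c = [1, 1]`, `top_k(..., k=2)` returns `a` (similarity 1.0) followed by `c` (about 0.707).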
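The refinement step can be sketched as a grounded prompt sent to Gemini. This assumes the `google-genai` SDK, an API key in a `GEMINI_API_KEY` environment variable, and the model name in `GEMINI_MODEL`; all three are assumptions, as is the prompt wording, since the project does not document how it calls Gemini.

```python
import os

# Assumption: illustrative model name; the project's actual Gemini model is not stated.
GEMINI_MODEL = "gemini-2.0-flash"


def build_prompt(question: str, snippets: list[str]) -> str:
    """Number the retrieved snippets and instruct the model to answer only from them."""
    context = "\n\n".join(f"[{i}] {s}" for i, s in enumerate(snippets, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def refine_with_gemini(question: str, snippets: list[str]) -> str:
    """Send the grounded prompt to Gemini and return the text of its reply."""
    from google import genai  # imported lazily; requires `pip install google-genai`

    # Assumption: the API key is stored in the GEMINI_API_KEY environment variable.
    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
    response = client.models.generate_content(
        model=GEMINI_MODEL, contents=build_prompt(question, snippets)
    )
    return response.text
```

Restricting the model to the numbered snippets keeps the answer tied to the retrieved text, which keeps the system close to extractive QA even though Gemini supplies the final wording.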

Technologies Used

ChromaDB: Vector database for storing and retrieving embeddings.
Hugging Face Transformers (optional): For generating document embeddings.
LangChain (optional): For streamlining document preprocessing and loading.
Python: Programming language for implementation.
Requests: To download the data from Dropbox.
File manipulation libraries (e.g., os, zipfile): For handling files and extracting the downloaded archive.
Gemini: Large language model (LLM) for refining extracted information and generating human-like responses.
