A semantic search and question-answering system for research papers using Retrieval-Augmented Generation (RAG) and Large Language Models (LLMs).
This system enables users to:
- Upload research papers (PDF)
- Ask natural language questions
- Receive concise, context-grounded answers with traceable sources
- 📄 PDF parsing with text cleaning
- 🔍 Section segmentation & semantic chunking
- 🧬 Embedding generation with `all-MiniLM-L6-v2`
- 📦 Vector indexing with Pinecone
- 🤖 Contextual answer generation using Gemini 1.5 Pro
- 🌐 Web UI for upload and Q&A
- ✅ Transparent answers with source document snippets
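The segmentation-and-chunking step above can be sketched in plain Python. This is an illustrative sketch, not the repository's actual code: it splits cleaned paper text into overlapping word windows so each chunk stays within the embedding model's input size. The window and overlap sizes are assumptions chosen for illustration.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks of roughly `chunk_size` words.

    Overlap keeps context that straddles a chunk boundary retrievable
    from either neighboring chunk.
    """
    words = text.split()
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        chunk = " ".join(words[start:start + chunk_size])
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(words):
            break
    return chunks
```

Each chunk is then embedded and upserted into the Pinecone index, with the source snippet stored as metadata so answers remain traceable.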
```bash
chmod +x run_app.sh
./run_app.sh
```

- Clone the repository

```bash
git clone https://github.com/ajay-del-bot/research_paper_RAG_chain.git
cd research_paper_RAG_chain
```

- Create and activate a virtual environment
```bash
# For Linux/macOS
python3 -m venv venv
source venv/bin/activate

# For Windows
python -m venv venv
venv\Scripts\activate
```

- Install dependencies

```bash
pip install -r requirements.txt
```

- Set up environment variables in `.env`
```
PINECONE_API_KEY=YOUR_PINECONE_API_KEY
GOOGLE_API_KEY=YOUR_GOOGLE_API_KEY
INDEX_NAME='test-db'
```

- Run the server

```bash
python3 src/server.py
```
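At query time, the server embeds the question, retrieves the most similar chunks, and assembles a grounded prompt for the LLM. The dependency-free sketch below shows that flow with a pure-Python cosine ranking; in the actual app, Pinecone performs the similarity search and Gemini 1.5 Pro generates the answer, and the vectors come from the `all-MiniLM-L6-v2` encoder. Function names here are illustrative, not the repo's API.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec: list[float], chunks: list[tuple[str, list[float]]], k: int = 3) -> list[str]:
    """chunks: (chunk_text, vector) pairs. Return the k best-matching texts."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Assemble a context-grounded prompt; snippets are kept visible for citation."""
    context = "\n---\n".join(context_chunks)
    return (
        "Answer the question using only the context below, "
        "and cite the snippet you used.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

Returning the retrieved snippets alongside the model's answer is what makes the responses transparent and traceable to their source documents.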
- Support for tables, figures, equations
- Better layout handling for multi-column PDFs
- User authentication & session history
- Integration with multiple LLMs