📚 Retrieval-Augmented Generation (RAG) Pipeline using LangChain + FAISS + OpenAI
This repository contains a Jupyter Notebook implementation of an end-to-end RAG pipeline built using:
🔹 LangChain – document loading, chunking, retrieval, prompting
🔹 FAISS – vector indexing + similarity search
🔹 OpenAI – embeddings + LLM generation
🔹 Python – easy-to-follow step-based workflow
This project demonstrates how Large Language Models can be grounded in retrieved knowledge to generate accurate, context-aware answers while minimizing hallucinations.
🔥 What this project does
| Step | Description | Tools Used |
|---|---|---|
| 1. Data Ingestion | Reads PDF/Text/Markdown documents | LangChain Loaders |
| 2. Chunking | Splits text into manageable overlapping chunks | RecursiveCharacterTextSplitter |
| 3. Embeddings | Converts chunks into numerical vectors | OpenAIEmbeddings |
| 4. Vector Storage | Indexes embeddings for similarity search | FAISS |
| 5. Retrieval | Retrieves context relevant to a query | MMR / k-NN search |
| 6. Augmentation | Injects retrieved chunks into prompt as context | PromptTemplate |
| 7. Generation | Produces final grounded answer | ChatOpenAI |
| 8. Source Transparency | Returns text chunks used for the answer | return_source_documents=True |
The entire flow is implemented inside a single .ipynb notebook for easy demonstration.
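To make step 2 concrete, overlapping chunking can be sketched in plain Python. This is a simplified stand-in for LangChain's `RecursiveCharacterTextSplitter` (the chunk size and overlap values below are illustrative assumptions, not the notebook's actual settings):

```python
def split_text(text: str, chunk_size: int = 100, overlap: int = 20) -> list[str]:
    """Split text into fixed-size chunks that overlap, so context is not
    lost at chunk boundaries (simplified stand-in for LangChain's
    RecursiveCharacterTextSplitter)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap  # each new chunk starts `overlap` chars early
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

doc = "word " * 100  # a toy 500-character document
chunks = split_text(doc, chunk_size=100, overlap=20)
```

Because consecutive chunks share their last/first 20 characters, a sentence cut at a boundary still appears whole in at least one chunk, which is what makes overlap useful for retrieval.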
📁 Project Structure
```
📦 RAG-LangChain-FAISS
│
├── data/                # Place your documents here
├── store/               # FAISS index is saved/loaded here automatically
├── RAG_Pipeline.ipynb   # Main notebook containing the full implementation
└── README.md            # (this file)
```
The first run requires a document inside data/.
If the folder is empty, the notebook automatically generates a sample file.
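That bootstrap logic might look roughly like this (the filename `sample.txt`, the helper name, and the sample text are all assumptions for illustration; the notebook's actual code may differ):

```python
import tempfile
from pathlib import Path

def ensure_sample_document(data_dir: str) -> Path:
    """If the data directory is empty (or missing), create a small sample
    text file so the first run always has something to ingest."""
    data = Path(data_dir)
    data.mkdir(parents=True, exist_ok=True)
    existing = [p for p in data.iterdir() if p.is_file()]
    if existing:
        return existing[0]
    sample = data / "sample.txt"  # hypothetical filename
    sample.write_text(
        "Retrieval-Augmented Generation grounds LLM answers in "
        "documents retrieved from a vector index.\n"
    )
    return sample

demo_dir = tempfile.mkdtemp()  # stand-in for the real data/ folder
created = ensure_sample_document(demo_dir)
```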
🚀 Setup Instructions

1. Clone the project

```
git clone <repository-url>
cd RAG-LangChain-FAISS
```

2. Create a virtual environment (recommended)

```
python -m venv .venv
.venv\Scripts\activate       # Windows
source .venv/bin/activate    # macOS/Linux
```

3. Install dependencies

```
pip install -r requirements.txt
```

4. Add your OpenAI API key

```
setx OPENAI_API_KEY "sk-xxxxx"       # Windows (restart the terminal afterward)
export OPENAI_API_KEY="sk-xxxxx"     # macOS/Linux
```
▶ Run the Notebook

```
jupyter notebook
```

Open RAG_Pipeline.ipynb and run each cell in order to execute the full pipeline.
🧠 How the pipeline works – Conceptual Flow
```
📄 Documents (.pdf/.txt/.md)
        │
   [1] Load & Ingest
        │
   [2] Split into Chunks
        │
   [3] Create Embeddings (OpenAI)
        │
   [4] Store in FAISS Vector Index
        │
   User asks question
        │
   [5] Retrieve Similar Chunks (MMR)
        │
   [6] Augment Prompt w/ Context
        │
   [7] Generate Answer (LLM)
        │
🧩 Final context-grounded answer
```
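Steps 3 through 6 of the flow above can be mimicked in plain Python. This toy sketch swaps OpenAI embeddings for a bag-of-words vectorizer and FAISS for brute-force cosine search, but the Maximal Marginal Relevance (MMR) reranking and prompt augmentation follow the same logic as the real pipeline (all names and the `lam` trade-off value here are illustrative assumptions):

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (stands in for OpenAIEmbeddings)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def mmr(query_vec, doc_vecs, k=2, lam=0.5):
    """Maximal Marginal Relevance: balance relevance to the query against
    redundancy with already-selected chunks (stands in for FAISS + MMR)."""
    selected, candidates = [], list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

chunks = [
    "FAISS builds a vector index for fast similarity search.",
    "FAISS indexes vectors for similarity search at scale.",
    "LangChain provides loaders, splitters, and prompt templates.",
]
vecs = [embed(c) for c in chunks]
query = "how does vector similarity search work"
picked = mmr(embed(query), vecs, k=2)

# Step 6: inject the retrieved chunks into the prompt as context
prompt = "Answer using only this context:\n{context}\n\nQuestion: {question}".format(
    context="\n".join(chunks[i] for i in picked),
    question=query,
)
```

Note that MMR skips the second FAISS chunk even though it is relevant, because it is nearly redundant with the first pick; plain k-NN search would have returned both near-duplicates.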
🌟 Key Features
✔ Grounded answers: responses come from your data, minimizing hallucinations
✔ Local vector search (FAISS) for speed
✔ Works fully inside Jupyter (no backend server needed)
✔ Supports TXT, PDF, Markdown ingestion
✔ Transparent — prints retrieved context chunks
✔ Great starter template for production RAG systems
📌 Ideal Use Cases
🔹 QA over internal knowledge documents
🔹 University, policy, or manual-based answering systems
🔹 Private company knowledge bots
🔹 PDF summarization and question answering
🔹 Personal research assistant