GiriGummadi/RAG_using_LangChain_FAISS_OPENAI

📚 Retrieval-Augmented Generation (RAG) Pipeline using LangChain + FAISS + OpenAI

This repository contains a Jupyter Notebook implementation of an end-to-end RAG pipeline built using:

🔹 LangChain – document loading, chunking, retrieval, prompting
🔹 FAISS – vector indexing + similarity search
🔹 OpenAI – embeddings + LLM generation
🔹 Python – easy-to-follow step-based workflow

This project demonstrates how Large Language Models can be grounded in retrieved knowledge to generate accurate, context-aware answers while sharply reducing hallucination.

🔥 What this project does

| Step | Description | Tools Used |
|------|-------------|------------|
| 1. Data Ingestion | Reads PDF/Text/Markdown documents | LangChain Loaders |
| 2. Chunking | Splits text into manageable overlapping chunks | RecursiveCharacterTextSplitter |
| 3. Embeddings | Converts chunks into numerical vectors | OpenAIEmbeddings |
| 4. Vector Storage | Indexes embeddings for similarity search | FAISS |
| 5. Retrieval | Retrieves context relevant to a query | MMR / k-NN search |
| 6. Augmentation | Injects retrieved chunks into the prompt as context | PromptTemplate |
| 7. Generation | Produces the final grounded answer | ChatOpenAI |
| 8. Source Transparency | Returns the text chunks used for the answer | `return_source_documents=True` |

The entire flow is implemented inside a single .ipynb notebook for easy demonstration.

📁 Project Structure

```
📦 RAG-LangChain-FAISS
├── data/               # Place your documents here
├── store/              # FAISS index is saved/loaded here automatically
├── RAG_Pipeline.ipynb  # Main notebook containing the full implementation
└── README.md           # (this file)
```
The first run requires a document inside data/.
If the folder is empty, the notebook automatically generates a sample file.
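The "generate a sample file if the folder is empty" behaviour can be reproduced with a few lines of standard-library Python; `ensure_sample` and the file name `sample.txt` are hypothetical names for illustration, not taken from the notebook:

```python
from pathlib import Path


def ensure_sample(data_dir="data"):
    """Create data_dir and seed it with a sample document if it is empty."""
    d = Path(data_dir)
    d.mkdir(parents=True, exist_ok=True)
    if not any(d.iterdir()):  # folder is empty: drop in a sample file
        (d / "sample.txt").write_text(
            "Sample university FAQ: tuition, courses, graduation, financial aid."
        )
    return sorted(p.name for p in d.iterdir())
```

Calling it a second time is a no-op, so existing documents are never overwritten.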

🚀 Setup Instructions

1. Clone the project

   ```
   git clone
   cd RAG-LangChain-FAISS
   ```

2. Create a virtual environment (recommended)

   ```
   python -m venv .venv
   .venv\Scripts\activate      # Windows
   source .venv/bin/activate   # macOS/Linux
   ```

3. Install dependencies

   ```
   pip install -r requirements.txt
   ```

4. Add your OpenAI API key

   ```
   setx OPENAI_API_KEY "sk-xxxxx"       # Windows (restart the terminal afterward)
   export OPENAI_API_KEY="sk-xxxxx"     # macOS/Linux (current session only)
   ```
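Inside the notebook you can fail fast when the key is missing instead of getting an opaque authentication error later; `require_api_key` is a hypothetical helper for illustration, not part of the repository:

```python
import os


def require_api_key():
    """Return OPENAI_API_KEY, or raise a clear error if it is not set."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; see step 4 of the setup instructions."
        )
    return key
```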

▶ Run the Notebook

```
jupyter notebook
```

Open RAG_Pipeline.ipynb and run each cell in order to execute the full pipeline.

🧠 How the pipeline works – Conceptual Flow

```
📄 Documents (.pdf/.txt/.md)
        ↓
[1] Load & Ingest
        ↓
[2] Split into Chunks
        ↓
[3] Create Embeddings (OpenAI)
        ↓
[4] Store in FAISS Vector Index
        ↓
User asks a question
        ↓
[5] Retrieve Similar Chunks (MMR)
        ↓
[6] Augment Prompt w/ Context
        ↓
[7] Generate Answer (LLM)
        ↓
🧩 Final context-grounded answer
```

🌟 Key Features

✔ Grounded answers: responses come from your own data, minimizing hallucination
✔ Local vector search (FAISS) for speed
✔ Works fully inside Jupyter (no backend server needed)
✔ Supports TXT, PDF, and Markdown ingestion
✔ Transparent: prints the retrieved context chunks
✔ A solid starter template for production RAG systems

📌 Ideal Use Cases
🔹QA on internal knowledge documents
🔹University, policy, or manual-based answering systems
🔹Company private knowledge bots
🔹PDF summarization + question answering
🔹Personal research assistant

About

A helpful conversational assistant that provides factual answers to students' queries about university courses, fees, graduation, financial aid, and more.
