This repository provides a ready-to-use Adaptive Retrieval-Augmented Generation (RAG) template using Pathway. It enables you to build, configure, and run a document-based question-answering system with support for both Gemini and OpenAI models.
- Features
- Installation
- Environment Variables (.env)
- YAML Configuration
- Running the Application
- Making Requests (cURL Example)
- References
- Document Ingestion: Reads and indexes documents from the data/ folder.
- Flexible LLM Support: Works with Gemini and OpenAI models.
- Configurable via YAML: Easily adjust data sources, models, embedders, and more.
- REST API: Exposes endpoints for question answering.
- Caching: Supports disk and memory caching for efficiency.
For detailed step-by-step instructions, refer to the "Day 1 GenAI Hackathon IIT Jodhpur.pdf.pdf" in this repository.
# 1. Clone the repository
git clone https://github.com/AISocietyIITJ/pathway_hackathon.git
cd pathway_hackathon
# 2. Create a virtual environment using uv (recommended for speed and reproducibility)
uv venv venv --python 3.10
source venv/bin/activate
# 3. Install all dependencies (including Pathway and extras)
uv pip install "pathway[all]"
Alternatively, to run with Docker:
# 1. Clone the repository
git clone https://github.com/AISocietyIITJ/pathway_hackathon.git
cd pathway_hackathon
# 2. Build the Docker image
docker build -t my-pathway-app .
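# 3. Run the container (maps your current directory to /app in the container).
#    %cd% is Windows cmd syntax; on Linux/macOS use "$(pwd)" instead.
#    Host port 8008 is forwarded to container port 8000.
docker run -it --rm -v %cd%:/app -p 8008:8000 my-pathway-app
Create a .env file in the root directory with the following structure: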
# For Gemini (Google) API
GEMINI_API_KEY="your-gemini-api-key"
# For OpenAI API (if using OpenAI models)
OPENAI_API_KEY="your-openai-api-key"
- Get your Gemini API key from Google Cloud Console.
- Get your OpenAI API key from OpenAI.
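Before starting the server, it can help to confirm that at least one of these keys is visible to the process. The snippet below is a minimal sketch and not part of the repository: the helper name `configured_providers` is hypothetical, and it checks only the two variable names shown above. It inspects the process environment, so it assumes the .env values have already been exported or loaded; how main.py loads the file is not covered here.

```python
import os

# Variable names taken from the .env example above.
KEY_NAMES = {"gemini": "GEMINI_API_KEY", "openai": "OPENAI_API_KEY"}


def configured_providers(env=None):
    """Return the providers whose API key is set to a non-empty value.

    `env` defaults to the process environment; pass a dict to test other values.
    """
    env = os.environ if env is None else env
    return [name for name, var in KEY_NAMES.items() if env.get(var, "").strip()]


if __name__ == "__main__":
    found = configured_providers()
    if found:
        print("API keys found for:", ", ".join(found))
    else:
        print("No API key set; check your .env file.")
```

Only the key for the model family you select in the YAML file needs to be present.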
The application is configured using YAML files (app.yaml or app_hydrid_retriever.yaml). These files define:
- Data Sources: Where your documents are loaded from (e.g., local data/ folder).
- LLM Model: Which language model to use (Gemini, OpenAI, etc.).
- Embedder: For text embeddings (GeminiEmbedder, OpenAIEmbedder).
- Splitter & Parser: How documents are chunked and parsed.
- Retriever: How documents are indexed and retrieved (BruteForce, Hybrid, etc.).
- Server Settings: Host, port, caching, and error handling.
To modify the pipeline, edit the relevant YAML file.
For more details, see the comments in app.yaml and the Pathway YAML documentation.
Make sure your .env and YAML files are set up.
# Run the main application
python main.py
- By default, the server will start at http://0.0.0.0:8000.
- You can change the host and port in the YAML file.
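Indexing the documents can take a moment, so the API may not accept connections immediately after launch. As a rough sketch (the helper `is_port_open` is hypothetical, and the default host and port are assumptions based on the default address above), a plain TCP check can tell you whether the server is listening yet:

```python
import socket


def is_port_open(host="localhost", port=8000, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved hosts.
        return False


if __name__ == "__main__":
    print("server reachable:", is_port_open())
```

This only confirms that something is listening on the port; it does not verify that the pipeline has finished indexing. If you changed the host or port in the YAML file, or run through Docker with a different host port, pass those values instead.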
To query the API, use the following cURL command:
curl --location 'http://localhost:8000/v2/answer' \
--header 'accept: */*' \
--header 'Content-Type: application/json' \
--data '{
"prompt": "Give me highlights of Q3 Financial Summary"
}'
- Add your query in the 'prompt' field.
- The server will return a JSON response with the answer.
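The same request can be sent from Python using only the standard library. This is a sketch mirroring the cURL command above, not code from the repository: the function names `build_answer_request` and `ask` are hypothetical, and the shape of the returned JSON is not assumed beyond it being valid JSON.

```python
import json
import urllib.request


def build_answer_request(prompt, base_url="http://localhost:8000"):
    """Build the same POST request as the cURL command above."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v2/answer",
        data=body,
        headers={"accept": "*/*", "Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, base_url="http://localhost:8000", timeout=60):
    """Send the request and return the decoded JSON response."""
    request = build_answer_request(prompt, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (requires the server to be running):
# print(ask("Give me highlights of Q3 Financial Summary"))
```

Building the request separately from sending it makes it easy to inspect the URL and payload without a running server.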
- Pathway Documentation
- YAML Configuration Guide
- Day 1 GenAI Hackathon IIT Jodhpur.pdf.pdf (for detailed setup)
Happy Hacking!