AISocietyIITJ/pathway_hackathon

Pathway Hackathon: Adaptive RAG Template

This repository provides a ready-to-use Adaptive Retrieval-Augmented Generation (RAG) template using Pathway. It enables you to build, configure, and run a document-based question-answering system with support for both Gemini and OpenAI models.


Features

  • Document Ingestion: Reads and indexes documents from the data/ folder.
  • Flexible LLM Support: Works with Gemini and OpenAI models.
  • Configurable via YAML: Easily adjust data sources, models, embedders, and more.
  • REST API: Exposes endpoints for question answering.
  • Caching: Supports disk and memory caching for efficiency.
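Conceptually, ingestion means reading files and cutting them into chunks that can be embedded and indexed. The toy sketch below illustrates that idea with the standard library only. It reads plain-text files from a folder (assumed to be data/) and splits them into overlapping word chunks. It is not Pathway's parser or splitter. The function names, the .txt-only filter, and the default chunk sizes are hypothetical choices made for illustration.

```python
from pathlib import Path

# Toy illustration only, not Pathway's implementation: helper names and sizes are hypothetical.


def split_into_chunks(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into chunks of `chunk_size` words; consecutive chunks share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


def load_chunks(folder: str = "data") -> list[tuple[str, str]]:
    """Read every .txt file under `folder` and return (file name, chunk) pairs."""
    pairs = []
    for path in sorted(Path(folder).rglob("*.txt")):
        text = path.read_text(encoding="utf-8")
        pairs.extend((path.name, chunk) for chunk in split_into_chunks(text))
    return pairs
```

The overlap keeps a sentence that straddles a chunk boundary visible in both neighbouring chunks, which is a common reason splitters overlap at all.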

Installation

For detailed step-by-step instructions, refer to the "Day 1 GenAI Hackathon IIT Jodhpur.pdf.pdf" in this repository.

Linux

# 1. Clone the repository
git clone https://github.com/AISocietyIITJ/pathway_hackathon.git
cd pathway_hackathon

# 2. Create a virtual environment using uv (recommended for speed and reproducibility)
uv venv venv --python 3.10
source venv/bin/activate

# 3. Install the dependencies: Pathway plus all of its optional extras
uv pip install "pathway[all]"

Windows

# 1. Clone the repository
git clone https://github.com/AISocietyIITJ/pathway_hackathon.git
cd pathway_hackathon

# 2. Build the Docker image
docker build -t my-pathway-app .

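# 3. Run the container (mounts the current directory at /app and maps host port 8008 to port 8000 in the container)
#    %cd% is Command Prompt syntax; in PowerShell use ${PWD} instead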
docker run -it --rm -v %cd%:/app -p 8008:8000 my-pathway-app

Environment Variables (.env)

Create a .env file in the repository root with the following structure (you only need the key for the provider your configuration uses):

# For Gemini (Google) API
GEMINI_API_KEY="your-gemini-api-key"

# For OpenAI API (if using OpenAI models)
OPENAI_API_KEY="your-openai-api-key"
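If you want to sanity-check your .env file before starting the server, the sketch below parses the simple KEY="value" format shown above using only the standard library and reports which provider keys are filled in. The helper names are hypothetical, and the parser handles only this simplified subset of .env syntax (a library such as python-dotenv covers the full format). How main.py itself loads the file is not assumed here.

```python
from pathlib import Path

# Hypothetical helpers for checking the .env layout shown above.
PROVIDER_KEYS = ("GEMINI_API_KEY", "OPENAI_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY="value" lines, skipping blank lines and # comments (simplified .env subset)."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def configured_providers(env: dict[str, str]) -> list[str]:
    """Return provider keys that are set and no longer hold the 'your-...' placeholder."""
    return [k for k in PROVIDER_KEYS if env.get(k) and not env[k].startswith("your-")]


if __name__ == "__main__":
    env_file = Path(".env")
    if env_file.exists():
        found = configured_providers(parse_env(env_file.read_text(encoding="utf-8")))
        print("Configured keys:", ", ".join(found) or "none")
    else:
        print("No .env file found in the current directory")
```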

YAML Configuration

The application is configured using YAML files (app.yaml or app_hydrid_retriever.yaml). These files define:

  • Data Sources: Where your documents are loaded from (e.g., local data/ folder).
  • LLM Model: Which language model to use (Gemini, OpenAI, etc.).
  • Embedder: Which model produces the text embeddings (e.g., GeminiEmbedder, OpenAIEmbedder).
  • Splitter & Parser: How documents are chunked and parsed.
  • Retriever: How documents are indexed and retrieved (BruteForce, Hybrid, etc.).
  • Server Settings: Host, port, caching, and error handling.

To modify the pipeline, edit the relevant YAML file.
For more details, see the comments in app.yaml and the Pathway YAML configuration documentation.
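To give a feel for the Retriever options, here is a small, library-free sketch of the two ideas. A brute-force retriever scores every stored chunk embedding against the query embedding (cosine similarity here). A hybrid retriever merges the rankings of several retrievers, for example a vector ranking and a keyword ranking. The sketch uses reciprocal rank fusion (RRF) with k=60 as one common fusion rule; whether the template's hybrid retriever uses exactly this rule and constant is an assumption, so check the Pathway documentation. The embeddings and document ids are made up.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def brute_force_top_k(query: list[float], index: dict[str, list[float]], k: int = 3) -> list[str]:
    """Score every stored embedding against the query and return the ids of the k best matches."""
    ranked = sorted(index, key=lambda doc_id: cosine(query, index[doc_id]), reverse=True)
    return ranked[:k]


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked id lists: each list adds 1 / (k + rank) to a document's score (rank starts at 1)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical 3-dimensional embeddings for three chunks.
index = {"q3_summary": [0.9, 0.1, 0.0], "q2_summary": [0.7, 0.3, 0.1], "hr_policy": [0.0, 0.2, 0.9]}
vector_ranking = brute_force_top_k([1.0, 0.0, 0.0], index)  # ['q3_summary', 'q2_summary', 'hr_policy']
keyword_ranking = ["q2_summary", "hr_policy"]  # e.g. from a keyword (BM25-style) retriever
print(reciprocal_rank_fusion([vector_ranking, keyword_ranking]))
# → ['q2_summary', 'hr_policy', 'q3_summary']
```

Documents that rank reasonably well in both lists rise to the top, which is why fusion can reorder results that either retriever alone would rank differently.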


Running the Application

Make sure your .env file and YAML configuration are set up before starting the server.

# Run the main application
python main.py
  • By default, the server listens on 0.0.0.0:8000 (all network interfaces), so you can reach it locally at http://localhost:8000.
  • You can change the host and port in the server settings of the YAML file.
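The server may take a moment to come up, so a script that queries it right after launch may want to wait until the port accepts connections. Below is a minimal standard-library sketch, assuming the default port 8000 (from the host, use 8008 with the Docker command above). The helper name is hypothetical, and an open port only means the server is listening, not that document indexing has finished.

```python
import socket
import time


def wait_for_port(host: str = "localhost", port: int = 8000, timeout: float = 60.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True  # something is accepting connections on that port
        except OSError:
            time.sleep(0.5)  # not reachable yet; retry shortly
    return False
```

For example, `wait_for_port("localhost", 8000)` returns True once the server is listening, or False after a minute without a successful connection.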

Making Requests (cURL Example)

To query the API, use the following cURL command:

curl --location 'http://localhost:8000/v2/answer' \
  --header 'accept: */*' \
  --header 'Content-Type: application/json' \
  --data '{
    "prompt": "Give me highlights of Q3 Financial Summary"
}'
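  • Put your question in the "prompt" field.
  • The server returns a JSON response containing the answer.
  • If you started the server with the Docker command above, use port 8008 instead of 8000.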
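The same request can be sent from Python with only the standard library. This sketch mirrors the cURL call above (same URL, headers, and "prompt" field). The helper names and the 120-second timeout are hypothetical choices, and because the exact shape of the JSON response is not documented here, the code simply returns the decoded JSON as-is.

```python
import json
import urllib.request

ANSWER_URL = "http://localhost:8000/v2/answer"  # use port 8008 for the Docker setup


def build_answer_request(prompt: str, url: str = ANSWER_URL) -> urllib.request.Request:
    """Build a POST request with the same headers and JSON body as the cURL example."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"accept": "*/*", "Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, url: str = ANSWER_URL, timeout: float = 120.0):
    """Send the question to the running server and return the decoded JSON response."""
    with urllib.request.urlopen(build_answer_request(prompt, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `print(ask("Give me highlights of Q3 Financial Summary"))` sends the same query as the cURL example.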

Happy Hacking!

