# RAG Multilingual QA System

Enterprise-grade Retrieval-Augmented Generation (RAG) platform enabling bilingual question answering with citation-verified document retrieval using vector embeddings and FAISS indexing.
## Executive Summary

The RAG Multilingual QA System is a production-oriented AI knowledge engine designed to deliver fact-verified answers from structured English and Arabic document repositories. It implements a full Retrieval-Augmented Generation pipeline including ingestion, chunking, semantic embedding, vector indexing, query understanding, retrieval ranking, and citation-driven answer synthesis.

This system is designed for enterprise knowledge bases, regulatory compliance environments, multilingual customer support, internal documentation search, and AI-assisted information systems requiring explainability and traceability.
## Table of Contents

- Project Title
- Executive Summary
- Table of Contents
- Project Overview
- Objectives & Goals
- Acceptance Criteria
- Prerequisites
- Installation & Setup
- API Documentation
- UI / Frontend
- Status Codes
- Features
- Tech Stack & Architecture
- Workflow & Implementation
- Testing & Validation
- Validation Summary
- Verification Testing Tools
- Troubleshooting & Debugging
- Security & Secrets
- Deployment
- Quick-Start Cheat Sheet
- Usage Notes
- Performance & Optimization
- Enhancements & Features
- Maintenance & Future Work
- Key Achievements
- High-Level Architecture
- Project Structure
- How to Demonstrate Live
- Summary, Closure & Compliance
## Project Overview

This project implements a bilingual Retrieval-Augmented Generation system that answers user queries by dynamically retrieving the most relevant knowledge from an indexed corpus of English and Arabic documents.

Unlike a conventional LLM chatbot, this system is designed not to fabricate answers: every response is grounded in retrieved document chunks and delivered with full citations, making claims traceable to their sources.
Core Data Flow:

User → Language Detection → Query Embedding → FAISS Vector Search → Top-K Chunks → Prompt Construction → LLM / Mock Generator → Answer + Citations
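The flow above can be sketched end to end in a few lines. This is a minimal, self-contained illustration rather than the project's actual code: the retriever and generator are stubbed, and language detection is reduced to an Arabic-script check.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    source: str

def detect_language(question: str) -> str:
    # Toy detector: Arabic if any character falls in the Arabic Unicode block.
    return "ar" if any("\u0600" <= ch <= "\u06ff" for ch in question) else "en"

def answer_query(question, retrieve, generate, top_k=5):
    """Orchestrates the pipeline: detect -> retrieve -> prompt -> generate."""
    lang = detect_language(question)
    chunks = retrieve(question, top_k)  # FAISS similarity search in the real system
    context = "\n".join(f"[{i + 1}] {c.text}" for i, c in enumerate(chunks))
    prompt = (f"Answer in '{lang}' using ONLY the numbered sources below.\n"
              f"{context}\nQuestion: {question}")
    return {"answer": generate(prompt),
            "citations": [c.source for c in chunks]}
```

In the real pipeline, `retrieve` would embed the question and query the FAISS index, and `generate` would call the configured LLM or the mock generator.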
## Objectives & Goals

- Build a bilingual (AR/EN) knowledge retrieval system
- Guarantee answer grounding with citations
- Support mock mode (no paid API required)
- Enable CLI and Web-based interaction
- Maintain low-latency, low-cost execution
- Provide production-ready modular architecture
## Acceptance Criteria

| Requirement | Compliance |
|---|---|
| AR + EN documents indexed | Yes (10 files) |
| Semantic vector search | FAISS implemented |
| Citations provided | Yes |
| Mock mode supported | Yes |
| CLI & Web UI | FastAPI + CLI |
| Latency metrics | Included |
## API Documentation

| Endpoint | Method | Description |
|---|---|---|
| /query | POST | Accepts a user question and language code; returns an answer with citations |
| /health | GET | Service health check |
API Flow:

Client → FastAPI → Language Detection → Retriever → Generator → Response JSON
## UI / Frontend

- CLI-based interactive prompt
- FastAPI JSON-based web interface
- Input fields: question, language
- Output: answer, citations, source file list
- Network calls handled via REST over HTTP
- UI logic located in src/web_app.py
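A hypothetical client helper for the JSON web interface, using only the standard library. The default host/port and request field names are assumptions, not documented values:

```python
import json
import urllib.request

def build_query_request(question, language="en",
                        base_url="http://localhost:8000"):
    """Build a POST request carrying the question/language JSON payload."""
    payload = json.dumps({"question": question, "language": language})
    return urllib.request.Request(
        f"{base_url}/query",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def ask(question, **kwargs):
    """POST the question and return the parsed answer/citations JSON."""
    with urllib.request.urlopen(build_query_request(question, **kwargs)) as r:
        return json.loads(r.read())
```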
## Status Codes

| Code | Meaning |
|---|---|
| 200 | Query processed successfully |
| 400 | Invalid query or missing parameters |
| 500 | Vector engine or model failure |
## Features

- Multilingual embeddings for Arabic and English
- FAISS vector similarity search with cosine similarity
- Chunk-based retrieval for high recall
- Document-level and chunk-level citation generation
- Mock LLM for offline testing
- FastAPI-powered REST interface
- CLI-driven batch Q&A execution
- Latency and cost observability
- Pluggable embedding and LLM providers
## Tech Stack & Architecture

| Layer | Technology |
|---|---|
| Language | Python 3.10+ |
| Vector Engine | FAISS |
| Embeddings | SentenceTransformers / OpenAI |
| Web API | FastAPI |
| Testing | pytest |
| Packaging | Docker |
Architecture Diagram:

```
┌─────────────┐
│  Documents  │
└──────┬──────┘
       │
┌──────▼──────┐
│   Chunker   │
└──────┬──────┘
       │
┌──────▼──────┐
│ Embeddings  │
└──────┬──────┘
       │
┌──────▼──────┐
│ FAISS Index │
└─────────────┘
```

User Query → Embedding → Vector Search → Top-K Chunks → Generator → Answer + Sources
## Workflow & Implementation

Indexing phase:

- Load English and Arabic documents from the data directory
- Split each file into semantic chunks
- Convert each chunk into a vector embedding
- Store the embeddings in a FAISS vector index

Query phase:

- User submits a query (CLI or API)
- The query is embedded
- FAISS retrieves the top-K closest chunks
- The chunks are injected into a generation prompt
- The mock or OpenAI LLM produces an answer
- Citations are attached from the source documents
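Both phases can be illustrated with a self-contained toy: a hashed bag-of-words embedder and a NumPy inner-product search stand in for SentenceTransformers and FAISS (searching L2-normalized vectors with `faiss.IndexFlatIP` produces the same cosine-similarity ranking). Function names and defaults are illustrative, not the project's API.

```python
import zlib

import numpy as np

def chunk_text(text, size=200, overlap=50):
    """Split text into overlapping character windows."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(texts, dim=64):
    """Toy embedding: hashed bag-of-words, L2-normalized for cosine search."""
    vecs = np.zeros((len(texts), dim), dtype="float32")
    for row, text in enumerate(texts):
        for tok in text.lower().split():
            vecs[row, zlib.crc32(tok.encode("utf-8")) % dim] += 1.0
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    return vecs / np.maximum(norms, 1e-9)

def build_index(docs):
    """docs: {filename: text}. Returns (vector matrix, parallel chunk list)."""
    entries = [(name, chunk) for name, text in docs.items()
               for chunk in chunk_text(text)]
    return embed([chunk for _, chunk in entries]), entries

def search(matrix, entries, question, top_k=3):
    """Inner-product search; with FAISS this is index.search(query_vec, top_k)."""
    scores = matrix @ embed([question])[0]
    order = np.argsort(-scores)[:top_k]
    return [entries[i] for i in order]
```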
## Testing & Validation

| ID | Area | Test | Expected Result |
|---|---|---|---|
| T1 | Indexing | FAISS build | Vector index created |
| T2 | Query | English Q&A | Correct answer returned |
| T3 | Arabic | Arabic Q&A | Correct retrieval |
| T4 | Mock Mode | No API call | Offline success |
## Validation Summary

All major system components were validated, including ingestion, vector search, multilingual embeddings, citation accuracy, and mock-mode execution. Both the Arabic and English pipelines achieved deterministic retrieval and reproducible responses.
## Verification Testing Tools

- pytest for automated regression testing
- FAISS vector consistency validation
- CLI-based functional testing
- FastAPI request validation
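An illustrative regression test in the style the pytest suite might use. The `StubRetriever` is a stand-in for the FAISS-backed retriever in `src/retriever.py`; the self-retrieval assertion mirrors the vector-consistency check (a vector stored in the index must retrieve its own chunk first).

```python
import numpy as np

class StubRetriever:
    """Ranks stored chunks by inner product, like faiss.IndexFlatIP."""

    def __init__(self, vectors, chunks):
        self.vectors = np.asarray(vectors, dtype="float32")
        self.chunks = chunks

    def search(self, query_vec, top_k=2):
        scores = self.vectors @ np.asarray(query_vec, dtype="float32")
        return [self.chunks[i] for i in np.argsort(-scores)[:top_k]]

def test_self_retrieval_consistency():
    vectors = np.eye(3, dtype="float32")  # three orthogonal unit vectors
    chunks = ["chunk-a", "chunk-b", "chunk-c"]
    retriever = StubRetriever(vectors, chunks)
    # A stored vector must rank its own chunk first.
    assert retriever.search(vectors[1], top_k=1) == ["chunk-b"]
    # top_k bounds the number of results returned.
    assert len(retriever.search(vectors[0], top_k=2)) == 2
```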
## Troubleshooting & Debugging

- Missing FAISS index → rebuild the vector store
- Zero search results → verify the embedding model
- Wrong language output → check langdetect
- Slow responses → reduce chunk size or top-K
- API errors → verify environment variables
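For the zero-search-results case, one quick diagnostic (a hypothetical helper, not part of the codebase) is to confirm the query embedding dimension still matches the index: swapping embedding models without rebuilding the index silently breaks retrieval.

```python
import numpy as np

def check_embedding_dims(index_matrix, query_vec):
    """Raise early if the query vector cannot be searched against the index."""
    index_dim = np.asarray(index_matrix).shape[1]
    query_dim = len(query_vec)
    if index_dim != query_dim:
        raise ValueError(
            f"Dimension mismatch: index={index_dim}, query={query_dim}. "
            "Rebuild the index with the same embedding model used at query time."
        )
```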
## Security & Secrets

- API keys stored in a .env file
- No secrets committed to GitHub
- Mock mode avoids external calls
- Network calls encrypted over HTTPS in production deployments
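A minimal pattern for the points above, assuming an `OPENAI_API_KEY` variable (the actual variable names are an assumption; the real configuration lives in `src/config.py`): read the key from the environment, which a .env loader populates at deployment time, and fall back to mock mode when it is absent.

```python
import os

def get_generator_config():
    """Mock mode is the safe default: no key present means no external calls."""
    api_key = os.environ.get("OPENAI_API_KEY")  # loaded from .env, never committed
    return {"mock_mode": api_key is None, "api_key": api_key}
```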
## Deployment

- Local: Python + FastAPI
- Dockerized deployment for production
- Cloud compatible with AWS, DigitalOcean, GCP
- Stateless API with persistent FAISS volume
## Quick-Start Cheat Sheet

- Build the index
- Run CLI for Q&A
- Start FastAPI for web usage
- Use mock mode for offline testing
## Usage Notes

- Always rebuild the index after document changes
- Arabic queries auto-detected
- Top-K chunks configurable
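The tunables referenced above might be grouped as follows. This is a sketch; the real names and defaults live in `src/config.py`:

```python
from dataclasses import dataclass

@dataclass
class RagSettings:
    top_k: int = 5               # chunks retrieved per query
    chunk_size: int = 500        # characters per chunk
    chunk_overlap: int = 100     # characters shared by adjacent chunks
    mock_mode: bool = True       # offline generator, no API calls
    language: str = "auto"       # "en", "ar", or auto-detect
```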
## Performance & Optimization

- FAISS IVF indexes for large corpora
- Batch embedding for faster ingestion
- GPU-accelerated FAISS supported
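Batch embedding, as noted above, amortizes per-call model overhead during ingestion. A sketch with the encoder injected as a parameter; with sentence-transformers you would instead pass `batch_size` directly to `model.encode`:

```python
import numpy as np

def embed_in_batches(chunks, encode, batch_size=64):
    """Encode chunks in fixed-size batches and stack the resulting vectors."""
    parts = [encode(chunks[i:i + batch_size])
             for i in range(0, len(chunks), batch_size)]
    return np.vstack(parts)
```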
## Enhancements & Features

- PDF and DOCX ingestion
- Multilingual expansion
- Hybrid BM25 + vector search
- Role-based access control
## Maintenance & Future Work

- Scheduled index rebuilds
- Document versioning
- Semantic caching
- LLM fine-tuning
## Key Achievements

- Full bilingual RAG pipeline
- Explainable AI via citations
- Mock + production modes
- Enterprise-grade modular design
## High-Level Architecture

User → API / CLI → Language Detection → Embedding Engine → FAISS Index → Top-K Chunks → Prompt Assembler → LLM / Mock Generator → Answer + Source Files
## Project Structure

```
rag-multilingual-qa-system/
├── data/
│   ├── product_catalog_en.txt
│   ├── product_catalog_ar.txt
│   ├── warranty_policy_en.txt
│   ├── warranty_policy_ar.txt
│   ├── safety_manual_en.txt
│   ├── safety_manual_ar.txt
│   ├── company_policy_en.txt
│   ├── company_policy_ar.txt
│   ├── technical_specs_en.txt
│   └── technical_specs_ar.txt
├── src/
│   ├── config.py
│   ├── ingest.py
│   ├── chunker.py
│   ├── embedder.py
│   ├── indexer.py
│   ├── retriever.py
│   ├── generator.py
│   ├── cli_app.py
│   └── web_app.py
├── tests/
├── build_index.py
├── qa_cli.py
├── Dockerfile
├── requirements.txt
└── README.md
```
## Summary, Closure & Compliance

The RAG Multilingual QA System meets the stated requirements for an enterprise-grade AI knowledge system, including explainability, multilingual support, deterministic retrieval, testability, and deployment readiness.

The architecture aligns with modern GenAI compliance standards for:
- Source traceability
- Model governance
- Data integrity
- Regulatory-safe AI usage
This solution is suitable for regulated industries, enterprise knowledge bases, legal research, support automation, and multilingual document intelligence platforms.