Live Demo: https://emotionragapp-2.streamlit.app/
This project detects facial emotions from images and uses that emotion to retrieve and summarize relevant customer reviews. Built for the Junior AI Engineer assignment.
Example: Upload a "happy" face → System finds reviews explaining why customers felt happy.
- Emotion Detection: Classifies 7 emotions (Happy, Sad, Angry, Disgust, Fear, Surprise, Neutral)
- RAG System: Links emotions to relevant reviews using FAISS vector search
- AI Summaries: Summarizes retrieved reviews using Flan-T5
- Sentiment Analysis: Analyzes sentiment of each review
- Web Interface: Streamlit app for easy interaction
- Vision Model: ResNet-50 (fine-tuned on FER-2013)
- Embeddings: sentence-transformers/all-MiniLM-L6-v2
- Vector Store: FAISS
- LLM: google/flan-t5-base
- Sentiment: cardiffnlp/twitter-roberta-base-sentiment-latest
- Framework: LangChain
- UI: Streamlit
User uploads image
↓
Emotion Detection (ResNet-50)
↓
Query Generation
↓
RAG Pipeline (FAISS + LangChain)
↓
LLM Summarization (Flan-T5)
↓
Sentiment Analysis (RoBERTa)
↓
Display Results
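The Query Generation step above turns the predicted emotion label into a text query for the retriever. A minimal sketch of that step follows; the function name `build_query` and the query template are assumptions made for illustration, not the project's actual wording.

```python
# Hypothetical helper: maps a predicted emotion label to a retrieval query.
# The label set comes from the seven classes listed in this README;
# the query template itself is an assumption.
EMOTIONS = ("happy", "sad", "angry", "disgust", "fear", "surprise", "neutral")


def build_query(emotion: str) -> str:
    """Normalize the label and wrap it in a fixed query template."""
    label = emotion.strip().lower()
    if label not in EMOTIONS:
        raise ValueError(f"unknown emotion: {emotion!r}")
    return f"customer reviews expressing the emotion: {label}"
```

Rejecting unknown labels early keeps a misclassified or malformed label from silently producing an unrelated query.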
Model: Fine-tuned ResNet-50 on FER-2013 dataset (28,709 training images, 7 emotion classes)
Results:
- Test Accuracy: 64.8%
- Weighted F1-Score: 0.64
- Best performing: Happy (82%)
- Most challenging: Disgust (48%)
Generated 700 synthetic reviews (100 per emotion) to simulate customer feedback for each emotion.
- Embeddings: Converted reviews to vectors using all-MiniLM-L6-v2
- Vector Store: Stored in FAISS index for fast similarity search
- Retrieval: LangChain retrieves top-4 relevant reviews
- Summarization: Flan-T5 generates concise summaries
- Sentiment: RoBERTa classifies each review sentiment
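The retrieval step in the list above amounts to embedding the query, scoring it against every stored review vector, and keeping the top 4. The sketch below illustrates that idea with the standard library only: a toy bag-of-words vector stands in for all-MiniLM-L6-v2 embeddings, and a brute-force cosine scan stands in for the FAISS index. The names `embed`, `cosine`, and `retrieve` are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for sentence-transformer embeddings."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, reviews: List[str], k: int = 4) -> List[Tuple[float, str]]:
    """Brute-force top-k search; a stand-in for a FAISS similarity search."""
    q = embed(query)
    scored = [(cosine(q, embed(r)), r) for r in reviews]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

Calling `retrieve(query, reviews)` returns up to four `(score, review)` pairs, highest score first. With only 700 reviews, even this exhaustive scan is fast; the index matters more as the corpus grows.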
- ResNet-50: Proven CNN architecture, faster training than ViT for small datasets
- FAISS: Efficient local vector search, well suited to a small corpus of 700 reviews
- LangChain: Simplified RAG pipeline development
- Flan-T5: Lightweight, good summarization without API costs
Image (bytes) → Emotion (string) → Query (string) → Embeddings (vector) → FAISS Search → Reviews (text) → LLM Summary → UI Display
Deployment: Would use FastAPI with 3 endpoints:
- POST /predict_emotion - Image → Emotion
- POST /query_rag - Query → Summary + Reviews
- POST /analyze_sentiment - Text → Sentiment
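The three proposed endpoints can be sketched as a request/response contract. The example below uses only the standard library so it runs without a web framework; in a FastAPI service, each handler would instead be registered as a POST route. All field names (`image_b64`, `query`, `text`) and the fixed return values are placeholders, not real model output or the project's actual schema.

```python
from typing import Any, Callable, Dict, Tuple

Body = Dict[str, Any]


def _require(body: Body, field: str) -> Any:
    """Raise ValueError when a required request field is missing."""
    if field not in body:
        raise ValueError(f"missing field: {field}")
    return body[field]


def predict_emotion(body: Body) -> Body:
    _require(body, "image_b64")  # hypothetical field: base64-encoded image
    return {"emotion": "happy"}  # placeholder; real handler runs the vision model


def query_rag(body: Body) -> Body:
    _require(body, "query")
    return {"summary": "", "reviews": []}  # placeholder for summary + reviews


def analyze_sentiment(body: Body) -> Body:
    _require(body, "text")
    return {"sentiment": "neutral"}  # placeholder for the sentiment model


ROUTES: Dict[Tuple[str, str], Callable[[Body], Body]] = {
    ("POST", "/predict_emotion"): predict_emotion,
    ("POST", "/query_rag"): query_rag,
    ("POST", "/analyze_sentiment"): analyze_sentiment,
}


def dispatch(method: str, path: str, body: Body) -> Tuple[int, Body]:
    """Route a request to its handler and map errors to HTTP-style codes."""
    handler = ROUTES.get((method, path))
    if handler is None:
        return 404, {"error": "not found"}
    try:
        return 200, handler(body)
    except ValueError as exc:
        return 422, {"error": str(exc)}
```

Keeping the three stages behind separate endpoints lets the vision model, the RAG pipeline, and the sentiment model be deployed and scaled independently.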
For millions of reviews:
- Vector DB: Switch to Pinecone or Weaviate (managed, distributed)
- Embeddings: Upgrade to all-mpnet-base-v2 or API-based models
- Indexing: Use HNSW index for faster search
- Vision Bias: FER-2013 has demographic imbalances. Solution: Retrain on diverse datasets (FairFace, RAF-DB)
- NLP Bias: Synthetic reviews may reinforce stereotypes. Solution: Use real, diverse review data + user feedback system
AI Engineer assignment/
├── Deliverables (Stage 1)/
│ ├── model/ # Fine-tuned ResNet-50
│ ├── confusion_matrix.png
│ ├── pre_Class acuuracy.png
│ ├── evaluation_metrics.txt
│ └── stage1_predictions.csv
├── Deliverables (Stage 2)/
│ ├── rag/ # FAISS index files
│ ├── generated_reviews.csv
│ └── query_results.csv
├── Deliverables (Stage 3)/ # App screenshots
├── AI_part1.ipynb # Stage 1: Model training
├── AI_part2.ipynb # Stage 2: RAG pipeline
├── model_utils.py # Core functions
├── streamlit_app.py # Web interface
├── requirements.txt
└── README.md
- Python 3.8+
- Git
Clone the repository first, then run the following from the project root:
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Run app
streamlit run streamlit_app.py
App opens at http://localhost:8501
First Run: Downloads ~2GB of models (cached for future runs)
Stage 1 (40 pts): ResNet-50 trained on FER-2013, 64.8% accuracy
Stage 2 (40 pts): RAG pipeline with LangChain, FAISS, sentiment analysis
Stage 3 (20 pts): Architecture design, scalability analysis, ethics
Bonus (+10 pts): Full Streamlit app with image upload & query tabs
Total: 110/100 points
[Images: confusion matrix, per-class accuracy, and app screenshots for the happy, angry, sad, and surprise examples]