TheSkyBiz/README.md

Hey, I'm Aakash 👋

Software Engineer (AI/ML) • LLM Systems • Reliability • RAG

I build AI systems that actually work in the real world ⚙️ (not just on benchmarks)


🚀 What I'm Doing Right Now

  • 🧠 Building a Confidence-Guided LLM Routing System (Thesis)
  • ⚙️ Learning System Design, APIs, FastAPI, Docker
  • 📊 Exploring LLM Evaluation, Calibration & Reliability
  • 💻 Practicing DSA + Software Engineering fundamentals
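
As a rough illustration of the confidence-guided routing idea above, here is a minimal sketch. It is not the thesis implementation: the `route` function, the `(answer, confidence)` model interface, the 0.8 threshold, and the stub models are all assumptions made for the example.

```python
from typing import Callable, Tuple

# Hypothetical model interface: takes a prompt, returns (answer, confidence in [0, 1]).
Model = Callable[[str], Tuple[str, float]]


def route(prompt: str, small: Model, large: Model, threshold: float = 0.8) -> Tuple[str, str]:
    """Answer with the small model when it is confident enough, otherwise escalate."""
    answer, confidence = small(prompt)
    if confidence >= threshold:
        return answer, "small"
    # Low confidence: pay for the larger model instead.
    answer, _ = large(prompt)
    return answer, "large"


# Stub models standing in for real LLM calls (illustrative only).
def small_stub(prompt: str) -> Tuple[str, float]:
    # Pretend the small model is only confident on short prompts.
    confidence = 0.9 if len(prompt) < 20 else 0.4
    return "small answer", confidence


def large_stub(prompt: str) -> Tuple[str, float]:
    return "large answer", 0.95


if __name__ == "__main__":
    print(route("2 + 2?", small_stub, large_stub))
    print(route("Summarise this long quarterly report in detail", small_stub, large_stub))
```

The threshold is the knob that trades cost against quality: raising it sends more traffic to the large model, and it only makes sense if the small model's confidence is reasonably well calibrated.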

🧩 What I've Built

  • 🔹 LLM Routing System
    → Dynamic routing between small & large models using uncertainty

  • 🔹 Text-to-SQL (RAG System)
    → Natural language → SQL using FAISS + FastAPI

  • 🔹 LLM Calibration & Reliability Analysis
    → Measuring and reducing confidence–accuracy mismatch (ECE, temperature scaling)

  • 🔹 Adversarial LLM Evaluation
    → Testing robustness & persona drift in language models

  • 🔹 Financial Risk Prediction System
    → ML pipeline with ROC-AUC up to 0.97
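
For readers unfamiliar with ECE (Expected Calibration Error), mentioned in the calibration project above, the sketch below shows the standard equal-width binned definition. It is a generic illustration in plain Python, not code from the project; the function name, the 10-bin default, and the sample inputs are assumptions.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Bin-size-weighted average of |accuracy - mean confidence| over equal-width bins."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Group predictions into n_bins equal-width confidence bins over [0, 1].
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # a confidence of 1.0 goes in the last bin
        bins[idx].append((conf, bool(ok)))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


if __name__ == "__main__":
    # Four predictions at 0.9 confidence, three of them correct.
    print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False]))
```

A model that reports 0.9 confidence but is right only 75% of the time in that bin contributes a gap of 0.15; temperature scaling aims to shrink such gaps by rescaling logits before the softmax.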


⚙️ Tech Stack

Languages: Python
ML/DL: PyTorch · Transformers · scikit-learn
LLM Systems: RAG · Prompt Engineering · LoRA
Backend: FastAPI · REST APIs
Infra: FAISS · SQLite · (learning Docker 🐳)
Data: Pandas · NumPy


🧠 What Interests Me

  • LLM Systems & Multi-model Architectures
  • Model Reliability & Failure Analysis
  • Cost vs Performance Trade-offs in AI
  • Building real-world ML systems end-to-end

🎮 Beyond Code

  • 🏎️ Formula 1 enthusiast (race weekends > everything)
  • ⚽ Football buff
  • 🎮 Competitive & story-driven games
  • 🎧 Music to reset the brain

🌐 Connect With Me

💼 LinkedIn
📧 [email protected]


Precision matters — whether in model calibration or a last-lap overtake.

Pinned

  1. llm-persona-drift-evaluation (Public)

    945-generation adversarial evaluation of 3 open LLMs across 3 personas and 20 attack types, measuring semantic drift, override rates, and distributional instability.

    Python

  2. confidence-correctness-mismatch-analyser (Public)

    Behavioral reliability study of extractive QA models on 500 SQuAD samples. Compares DistilBERT (70% accuracy) and RoBERTa (92.6% accuracy), analyzing weighted ECE, overconfidence (0.112 vs 0.016), …

    Jupyter Notebook

  3. instruction-fine-tuning-sarvam-2B (Public)

    This project explores instruction fine-tuning of the Sarvam AI 2B model using LoRA to improve response efficiency without full retraining. Only ~0.13% of parameters are trained on a small synthetic…

    Python

  4. trust-agent (Public)

    TrustAgent is a lightweight LLM reliability pipeline that pairs a fast generator with a stronger critic to evaluate responses before they are trusted. It uses asymmetric model architecture, structu…

    Python

  5. genai-multi-agent-experiments (Public)

    Hands-on experiments with multi-agent systems powered by Gemini API and Generative AI.

    Jupyter Notebook

  6. yt-video-summariser (Public)

    YouTube Video Summarizer is a Streamlit web app that extracts transcripts from YouTube videos, generates concise and detailed AI summaries, and answers user questions using advanced semantic search…

    Python