I don't know much about GenAI. But I'm relentlessly curious about what actually works in production - and what's just expensive theatre.
**Graph-Enhanced RAG** When does graph structure actually beat pure vector similarity? I've seen it matter in fraud detection where entity relationships span multiple hops. Still not convinced it's worth the complexity for most use cases.
**Hybrid Search Architecture** Combining BM25 with semantic embeddings. The mathematics suggests certain tradeoffs; production data suggests others. Currently reconciling why keyword matching still outperforms transformers for specific query patterns.
**Multi-Agent Orchestration** Trying to make this reliable instead of just impressive in demos. State management, failure recovery, and cost control in production are harder than the tutorials suggest. LangGraph makes it look easy until you hit edge cases.
**Evaluation Frameworks** How do you measure RAG quality when ground truth is ambiguous? I have metrics I trust (RAGAS, custom semantic similarity scores, human eval pipelines). They're probably wrong. Still looking for the minimum viable framework that catches regressions without killing velocity.
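For what it's worth, the minimum viable regression gate can be very small. A toy sketch, with a crude lexical overlap standing in for a real semantic-similarity scorer (`overlap_score` and `passes_regression` are hypothetical names, not any framework's API):

```python
def overlap_score(answer: str, reference: str) -> float:
    """Crude lexical proxy: fraction of reference tokens present in the answer.
    Swap in an embedding-based or RAGAS metric in practice."""
    ref = set(reference.lower().split())
    ans = set(answer.lower().split())
    return len(ref & ans) / len(ref) if ref else 0.0

def passes_regression(scores: list[float], baseline: float, tolerance: float = 0.05) -> bool:
    """Fail the build if the mean score drops more than `tolerance` below baseline."""
    mean = sum(scores) / len(scores)
    return mean >= baseline - tolerance
```

The point is the gate, not the metric: any scorer that is stable across runs can catch regressions without slowing the team down.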
**Multi-Hop Reasoning for Behavioural Pattern Detection** Graph structures capture how entities behave across time and relationships. Pure vector similarity finds documents that look similar - graphs find patterns in how entities interact. Useful when you need to spot anomalies in networks (fraud, collusion, influence) rather than just matching text. Multi-hop reasoning reveals second- and third-order connections that matter. Implemented this for financial crime detection using Memgraph, where relationship patterns mattered more than individual transactions.
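The multi-hop idea in miniature: in production this is a Cypher query against Memgraph, but the core is just bounded traversal over an entity graph. A toy in-memory sketch (the `entities_within_hops` helper is illustrative only):

```python
from collections import deque

def entities_within_hops(edges: dict[str, list[str]], start: str, max_hops: int) -> dict[str, int]:
    """BFS over an entity graph, returning each reachable entity's hop distance.
    Second- and third-order connections are exactly the max_hops=2 and 3 cases."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # don't expand beyond the hop budget
        for neigh in edges.get(node, []):
            if neigh not in seen:
                seen[neigh] = seen[node] + 1
                queue.append(neigh)
    return seen
```

Vector similarity never surfaces the account two hops away that shares a counterparty; a bounded traversal does.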
**MCP Integration for Structured Data** If your data has schema, use it. Vector RAG on structured databases wastes computational resources on problems that don't need semantic search. MCP provides standardised tool communication - let the LLM call database queries, APIs, and business logic directly. Deterministic retrieval where you need it, probabilistic reasoning where it adds value. Built this for KYC automation where regulatory compliance required auditable, repeatable queries - not approximate embeddings.
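A minimal sketch of the pattern, with a hand-rolled tool registry standing in for a real MCP server - every name and the tiny customer table are hypothetical, and the audit log is the piece compliance actually cares about:

```python
# Hypothetical KYC lookup tool: deterministic, repeatable, and logged.
CUSTOMERS = {"c-001": {"name": "Acme Ltd", "risk": "low"}}

def lookup_customer(customer_id: str) -> dict:
    """Same input, same output - no embeddings involved."""
    return CUSTOMERS.get(customer_id, {})

TOOLS = {"lookup_customer": lookup_customer}

def call_tool(name: str, args: dict, audit_log: list) -> dict:
    """Dispatch a structured tool call requested by the LLM, recording it for audit."""
    audit_log.append({"tool": name, "args": args})
    return TOOLS[name](**args)
```

The LLM decides *which* tool to call; the retrieval itself stays deterministic and replayable from the audit log.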
**Computer Vision + NLP Pipelines** Document extraction beyond OCR. Combining visual understanding (layout, tables, signatures, handwriting) with language models for context. Dealing with multiple languages, poor scan quality, regulatory requirements for audit trails. The interesting bit is making probabilistic extraction meet compliance standards. Deployed across healthcare workflows and financial services document processing.
**Synchronous + Asynchronous Multi-Agent Orchestration** Why pay for token burn when you can parallelise? Synchronous agents for sequential reasoning, asynchronous for independent tasks. The challenge isn't making agents communicate - it's managing state, handling failures, and controlling costs when fan-out gets expensive. Designing for graceful degradation instead of perfect coordination. Applied this to conversational AI with Soul Machines where sub-200ms latency requirements forced architectural rethinking.
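The fan-out-with-graceful-degradation idea in miniature, assuming plain `asyncio` and a hypothetical `run_agents` wrapper: failures and timeouts come back as values, so one bad agent degrades the answer instead of sinking the whole request:

```python
import asyncio

async def run_agents(coros, timeout: float = 1.0):
    """Fan out independent agent calls in parallel, isolating each one's failures."""
    async def guarded(coro):
        try:
            return await asyncio.wait_for(coro, timeout)
        except Exception as exc:
            return exc  # degrade gracefully: the caller decides how to fill the gap
    return await asyncio.gather(*(guarded(c) for c in coros))
```

`gather` preserves input order, so the caller can map each result (or exception) back to the agent that produced it.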
**Hybrid RAG Architectures** Combining keyword search (BM25) with semantic embeddings. The mathematics suggests certain tradeoffs; production data reveals others. Keyword matching still outperforms transformers for specific query patterns - understanding when matters more than assuming embeddings solve everything. Built evaluation frameworks (RAGAS + custom metrics) to measure this properly rather than relying on vibes.
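One common way to merge the two result lists is reciprocal rank fusion, which needs no score calibration between BM25 and cosine similarity. A minimal sketch (the `rrf` helper is illustrative, not any library's API; `k=60` is the conventional damping constant):

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: score(doc) = sum over rankings of 1 / (k + rank),
    with rank counted from 1. Documents high in either list surface near the top."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```

Because only ranks matter, a keyword hit that BM25 ranks first can beat an embedding match even when their raw scores live on incomparable scales.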
**Cloud-Native AI Infrastructure** Multi-cloud deployments across Azure, AWS, and GCP. Cost management as a first-class engineering concern - inference costs spiral without proper monitoring, rate limiting, and caching strategies. Built production systems handling £100k-£1M monthly compute budgets where architectural choices directly impact P&L.
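The two cheapest levers are caching and a hard spend cap. A sketch with costs tracked in integer pence so the cap check is exact (the `CostGuard` class and its flat per-call cost are hypothetical simplifications):

```python
import hashlib

class CostGuard:
    """Cache identical prompts and enforce a hard spend cap before each call."""

    def __init__(self, budget_pence: int, cost_per_call_pence: int):
        self.budget_pence = budget_pence
        self.cost_per_call_pence = cost_per_call_pence
        self.spent_pence = 0
        self.cache: dict[str, str] = {}

    def call(self, prompt: str, model_fn):
        key = hashlib.sha256(prompt.encode()).hexdigest()
        if key in self.cache:
            return self.cache[key]  # cache hit costs nothing
        if self.spent_pence + self.cost_per_call_pence > self.budget_pence:
            raise RuntimeError("inference budget exhausted")
        self.spent_pence += self.cost_per_call_pence
        self.cache[key] = model_fn(prompt)
        return self.cache[key]
```

Real systems meter actual token usage per call rather than a flat rate, but the shape - check the cap before the call, not after the invoice - is the point.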
**Early Agentic AI Implementation** Deployed multi-agent orchestration systems in August 2024, before mainstream adoption. Built declarative and episodic memory architectures for persistent agent learning, plus strategic indexing for agentic RAG to enable efficient knowledge retrieval in multi-agent environments. LangGraph-based async/sync systems with GraphRAG and tool handling at enterprise scale.
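Episodic memory in miniature: a bounded log of (observation, outcome) pairs with naive keyword recall standing in for embedding-based retrieval (the `EpisodicMemory` class is illustrative, not the production architecture):

```python
from collections import deque

class EpisodicMemory:
    """Bounded episode log with keyword recall; swap in embeddings in practice."""

    def __init__(self, capacity: int = 100):
        self.episodes = deque(maxlen=capacity)  # oldest episodes fall off the end

    def record(self, observation: str, outcome: str) -> None:
        self.episodes.append((observation, outcome))

    def recall(self, query: str, top_k: int = 3) -> list[tuple[str, str]]:
        """Return up to top_k past episodes whose observation shares query words."""
        words = set(query.lower().split())
        scored = [(len(words & set(obs.lower().split())), obs, out)
                  for obs, out in self.episodes]
        scored.sort(key=lambda t: t[0], reverse=True)
        return [(obs, out) for score, obs, out in scored[:top_k] if score > 0]
```

The interesting design choice is the bound: an agent that remembers everything pays for it in retrieval latency and prompt tokens on every turn.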
**Expositional Engineering** Writing up learnings from production GenAI deployments since 2022. Targeting Q1 2026 publication. Thesis: most organisations fail at AI because they optimise for demos instead of operations. We'll see if it holds water.
**Model Context Protocol** Early exploration of MCP for standardised AI tool communication. Anthropic's approach to solving the n×m integration problem. Testing real-world viability beyond the examples - particularly interested in how MCP skills and hooks change the fine-tuning calculus.
Not claiming expertise - this is what I'm actively working with:
- LLMs: Claude, OpenAI, Gemini, DeepSeek, Qwen, Llama, Mistral, Grok, Kimi, Groq, NVIDIA, Cohere
- Vector DBs: Faiss, Qdrant, Weaviate, ChromaDB, FalkorDB
- Graph DBs: Memgraph, Neo4j
- Frameworks: LangChain, LangGraph, LlamaIndex, Haystack
- Infrastructure: Azure (primary), AWS
- Evaluation: RAGAS, custom metrics, human-in-loop pipelines
- Monitoring: LangSmith, custom observability tooling
London → Sydney (Thinking about relocating)
Open to conversations with anyone deploying real production GenAI systems. Especially interested in talking to people who've watched expensive implementations fail - that's where the learning happens.