
MostafaAI10/TeXt-Embedding-Model-


📄 Semantic Search Engine (FastAPI + Embeddings)

A scalable Semantic Search Engine built with FastAPI that allows users to upload PDF documents, automatically extract and embed their contents, and perform semantic + metadata-aware search across stored documents.

The system follows a clean CSR (Controller–Service–Repository) architecture, supports tag-based filtering, and is designed to be extensible for multilingual embeddings.
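As a rough illustration of what "semantic + metadata-aware search" can mean, the sketch below ranks stored chunks by cosine similarity to a query vector and optionally keeps only chunks that carry a given tag. It uses the standard library only. The `Chunk` class, the `search` function, and the three-dimensional toy vectors are assumptions made for this example, not code from the repository; in the real system the vectors would come from an embedding model and the lookup would be handled by the vector store.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical record: a piece of text, its embedding, and its tags."""
    text: str
    vector: list[float]  # toy values here; a real model would produce these
    tags: set[str] = field(default_factory=set)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec, chunks, tag=None, top_k=3):
    """Filter by tag first (metadata), then rank the survivors by similarity."""
    candidates = [c for c in chunks if tag is None or tag in c.tags]
    ranked = sorted(candidates, key=lambda c: cosine(query_vec, c.vector), reverse=True)
    return ranked[:top_k]


chunks = [
    Chunk("Attention is all you need", [0.9, 0.1, 0.0], {"AI", "transformers"}),
    Chunk("Gradient boosting basics", [0.2, 0.9, 0.1], {"ML"}),
    Chunk("Sourdough starter guide", [0.0, 0.1, 0.9], {"cooking"}),
]

# Only chunks tagged "ML" are considered, regardless of similarity.
print([c.text for c in search([1.0, 0.0, 0.0], chunks, tag="ML")])
# → ['Gradient boosting basics']
```

Filtering before ranking keeps the tag constraint strict: an untagged chunk never appears in a tag-filtered result, however similar it is to the query.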


Key Features

  • PDF Upload

    • Upload PDF files via API
    • Automatic text extraction per page
    • Intelligent chunking for semantic indexing
  • Semantic Search

    • Vector-based similarity search using embeddings
    • Natural language queries (not keyword-only)
  • Tag Support

    • Assign multiple tags to PDFs (e.g. AI, ML, transformers)
    • Filter search results by tag
  • Multi-Language Ready

    • Supports multilingual embedding models
    • Language stored as metadata per document
  • Clean Architecture (CSR)

    • Controller layer (FastAPI routes)
    • Service layer (business logic)
    • Repository layer (data + vector DB)
    • Client layer (embedding models)
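The README does not say how the "intelligent chunking" step works. One common approach, shown here purely as an assumption, is a fixed-size word window with some overlap between consecutive chunks, applied page by page so each chunk keeps its page number as metadata. The function name `chunk_page` and the `size` and `overlap` parameters are hypothetical.

```python
def chunk_page(text: str, page: int, size: int = 50, overlap: int = 10) -> list[dict]:
    """Split one page of text into overlapping word windows.

    Assumed strategy (not confirmed by the repository): windows of `size`
    words, each sharing `overlap` words with the previous window.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")

    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + size]
        chunks.append({"page": page, "start_word": start, "text": " ".join(window)})
        if start + size >= len(words):
            break  # this window already reached the end of the page
    return chunks


sample = " ".join(f"w{i}" for i in range(12))
for chunk in chunk_page(sample, page=1, size=5, overlap=2):
    print(chunk["start_word"], chunk["text"])
```

The overlap means a sentence that straddles a window boundary still appears whole in at least one chunk, at the cost of embedding some words twice.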

📁 Project Structure


text_embedding_system/
├── app
│   ├── main.py
│   ├── config.py
│   ├── models.py
│   ├── controllers
│   │   ├── entries.py
│   │   └── search.py
│   ├── services
│   │   ├── entry_service.py
│   │   └── search_service.py
│   ├── repository
│   │   └── dataset_repo.py
│   └── clients
│       ├── embedder_client.py
│       └── faiss_client.py
└── requirements.txt
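The layout above maps onto the CSR layers roughly as follows: controllers call services, services call the repository and the clients. The sketch below shows that dependency direction with plain Python classes. Every class and function name in it is hypothetical, the embedder is a toy letter-count vector instead of a real model, and the repository is an in-memory list standing in for the vector store, so treat it as a picture of the wiring, not of the repository's actual code.

```python
import math


class ToyEmbedderClient:
    """Client layer stand-in: counts letters a-z. NOT a real embedding model."""

    def embed(self, text: str) -> list[float]:
        counts = [0.0] * 26
        for ch in text.lower():
            if "a" <= ch <= "z":
                counts[ord(ch) - ord("a")] += 1.0
        return counts


class InMemoryRepository:
    """Repository layer stand-in: keeps rows in a list instead of a vector DB."""

    def __init__(self):
        self._rows = []

    def add(self, text: str, vector: list[float], tags: set[str]) -> None:
        self._rows.append({"text": text, "vector": vector, "tags": set(tags)})

    def all(self) -> list[dict]:
        return list(self._rows)


class SearchService:
    """Service layer: business logic, unaware of HTTP or storage details."""

    def __init__(self, embedder, repo):
        self.embedder = embedder
        self.repo = repo

    def add_entry(self, text: str, tags: set[str]) -> None:
        self.repo.add(text, self.embedder.embed(text), tags)

    def search(self, query: str, tag=None, top_k: int = 3) -> list[str]:
        q = self.embedder.embed(query)

        def score(row):
            dot = sum(a * b for a, b in zip(q, row["vector"]))
            norm = math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in row["vector"]))
            return dot / norm if norm else 0.0

        rows = [r for r in self.repo.all() if tag is None or tag in r["tags"]]
        return [r["text"] for r in sorted(rows, key=score, reverse=True)[:top_k]]


def search_controller(service: SearchService, query: str, tag=None) -> dict:
    """Controller layer stand-in: in FastAPI this would be a route handler."""
    return {"query": query, "results": service.search(query, tag=tag)}


service = SearchService(ToyEmbedderClient(), InMemoryRepository())
service.add_entry("fastapi routes", {"backend"})
service.add_entry("vector embeddings", {"ml"})
print(search_controller(service, "embeddings", tag="ml"))
```

Because the service receives its embedder and repository as constructor arguments, either one can be swapped (for example a multilingual model, or a different vector store) without touching the controller.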


🧩 Tech Stack

  • Backend: FastAPI
  • Language: Python 3.10+
  • PDF Parsing: pypdf
  • Vector Database: ChromaDB
  • Embeddings: Sentence Transformers
  • Validation: Pydantic
  • Architecture: CSR Pattern

Author

Mostafa Abdelhamed
