iscc-search


Warning

BETA - This project is under active development. The API is not yet stable and may change without notice. Use at your own risk.

High-performance similarity search engine for ISCC (International Standard Content Code). Ships as a Python package, a CLI, and a FastAPI REST server, with pluggable backends for in-memory, LMDB, and HNSW-accelerated indexes.

Note: iscc-usearch is a separate project - a patched fork of the usearch vector search library that provides the NPHD metric and low-level vector indexes. iscc-search uses it internally as one of its backends. Most users only need to install iscc-search.

Features

  • REST API server (FastAPI) for indexing and searching ISCC assets
  • CLI (iscc-search) for managing multiple local or remote indexes and ingesting assets
  • Protocol-based backend abstraction with three implementations:
    • memory:// — in-memory, no persistence (tests and demos)
    • lmdb:///path — LMDB-backed persistent storage with bidirectional prefix search
    • usearch:///path — HNSW + LMDB for high-performance approximate nearest neighbor search
  • Variable-length ISCC-UNIT indexing using the NPHD metric (via iscc-usearch)
  • Granular ISCC-SIMPRINT search for fine-grained content matching
  • Cross-platform (Linux, macOS, Windows)
  • Python 3.10–3.13
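The three backend URIs listed above differ only in their scheme and path, so choosing a backend comes down to parsing the URI. The following is a minimal sketch of that idea using only the standard library; `parse_index_uri` and `KNOWN_SCHEMES` are hypothetical names for illustration and are not part of the iscc-search API.

```python
from urllib.parse import urlparse

# Schemes taken from the feature list; everything else here is illustrative.
KNOWN_SCHEMES = {"memory", "lmdb", "usearch"}


def parse_index_uri(uri):
    """Split a backend URI into (scheme, path). Hypothetical helper."""
    parsed = urlparse(uri)
    if parsed.scheme not in KNOWN_SCHEMES:
        raise ValueError(f"unsupported backend scheme: {parsed.scheme!r}")
    # memory:// carries no path; lmdb:/// and usearch:/// carry a filesystem path.
    return parsed.scheme, parsed.path or None
```

With this helper, `parse_index_uri("lmdb:///path/to/data")` yields the scheme `lmdb` and the path `/path/to/data`, while `memory://` yields no path at all.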

What is ISCC?

The International Standard Content Code (ISCC) is a similarity-preserving content identifier for digital media. ISCC codes are variable-length binary vectors that enable efficient similarity search across different media types. This project provides the indexing and search engine for those codes.

Installation

pip install iscc-search

For development:

git clone https://github.com/iscc/iscc-search.git
cd iscc-search
uv sync

Quick Start

Run the server

# Start the REST API server (development mode with auto-reload)
iscc-search serve --dev

# Or production mode
iscc-search serve --host 0.0.0.0 --port 8000

Interactive API docs are available at http://localhost:8000/docs.

Use the CLI

# Register an index configuration (local or remote)
iscc-search index add my-index --uri usearch:///path/to/data
iscc-search index use my-index

# Add assets, search, retrieve
iscc-search add asset.json
iscc-search search asset.json
iscc-search get ISCC:KACYPXW557...

Configure the server

The server reads its configuration from environment variables prefixed with ISCC_SEARCH_ (or a .env file):

Variable                   Default          Description
ISCC_SEARCH_INDEX_URI      usearch:///...   Backend URI (memory://, lmdb:///path, usearch:///path)
ISCC_SEARCH_HOST           0.0.0.0          Server bind host
ISCC_SEARCH_PORT           8000             Server bind port
ISCC_SEARCH_API_SECRET     (unset)          Optional API key; when unset the API is public
ISCC_SEARCH_CORS_ORIGINS   *                Comma-separated CORS origins
ISCC_SEARCH_LOG_LEVEL      info             Loguru log level

Additional settings control HNSW parameters, shard sizes, match thresholds, and scoring. See iscc_search/options.py or the deployment guide for the full list.
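To illustrate how prefixed environment variables map to settings, here is a minimal sketch that collects `ISCC_SEARCH_*` variables into a dictionary. It is not the project's settings loader: `load_settings`, `DEFAULTS`, and the dictionary layout are assumptions made for this example, and the index URI default is left out because it is elided in the table.

```python
import os

PREFIX = "ISCC_SEARCH_"

# Defaults copied from the table above. INDEX_URI is omitted because its
# default is elided there.
DEFAULTS = {
    "HOST": "0.0.0.0",
    "PORT": "8000",
    "API_SECRET": None,
    "CORS_ORIGINS": "*",
    "LOG_LEVEL": "info",
}


def load_settings(environ=None):
    """Collect ISCC_SEARCH_* variables into a dict (illustrative only)."""
    environ = os.environ if environ is None else environ
    settings = dict(DEFAULTS)
    for key, value in environ.items():
        if key.startswith(PREFIX):
            settings[key[len(PREFIX):]] = value
    settings["PORT"] = int(settings["PORT"])
    # CORS origins are documented as a comma-separated list.
    origins = settings["CORS_ORIGINS"].split(",")
    settings["CORS_ORIGINS"] = [o.strip() for o in origins if o.strip()]
    return settings
```

Calling `load_settings({"ISCC_SEARCH_INDEX_URI": "memory://"})` returns the documented defaults plus an `INDEX_URI` entry; variables without the prefix are ignored.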

Architecture

iscc-search uses a protocol-based design so the CLI, REST API, and library users all talk to the same IsccIndexProtocol interface regardless of backend:

  CLI / REST API / Remote client
              │
              ▼
     IsccIndexProtocol
              │
    ┌─────────┼─────────┐
    ▼         ▼         ▼
  memory    lmdb      usearch
            (LMDB)    (HNSW + LMDB)

See docs/architecture.md for the full picture.
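The protocol-based design can be illustrated with Python's structural typing. The sketch below is a toy: `ToyIndexProtocol`, its method names (`add`, `get`, `search`), and `ToyMemoryIndex` are hypothetical stand-ins, since the actual methods of IsccIndexProtocol are not described here. It only shows how a caller can depend on an interface instead of a concrete backend.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class ToyIndexProtocol(Protocol):
    """Hypothetical stand-in for IsccIndexProtocol; method names are assumed."""

    def add(self, key: str, code: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...
    def search(self, code: bytes, limit: int = 10) -> list[str]: ...


class ToyMemoryIndex:
    """In-memory backend that satisfies the protocol structurally."""

    def __init__(self) -> None:
        self._codes: dict[str, bytes] = {}

    def add(self, key, code):
        self._codes[key] = code

    def get(self, key):
        return self._codes[key]

    def search(self, code, limit=10):
        # Toy scoring: count differing bits over the bytes both codes share.
        def distance(other):
            return sum(bin(a ^ b).count("1") for a, b in zip(code, other))

        ranked = sorted(self._codes, key=lambda k: distance(self._codes[k]))
        return ranked[:limit]


def find_similar(index: ToyIndexProtocol, code: bytes) -> list[str]:
    """Caller code depends only on the protocol, not on a backend class."""
    return index.search(code, limit=1)
```

Because the protocol is structural, `ToyMemoryIndex` never inherits from it; any object with matching methods can be passed to `find_similar`.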

Development

This project uses uv for package management and poethepoet for task automation.

Prerequisites

  • Python 3.10–3.13
  • uv package manager

Common tasks

uv run poe build            # Rebuild schema.py + openapi.json and validate
uv run poe format           # Format code and markdown
uv run poe test             # Run tests with coverage (must stay at 100%)
uv run poe check-complexity # Radon complexity report
uv run poe precommit        # Run pre-commit hooks
uv run poe all              # Build, format, test, and complexity

Running tests

# Run full test suite in parallel with coverage
uv run poe test

# Run a single test
uv run pytest tests/test_indexes_usearch_index.py::test_foo

Technical Notes

NPHD Metric

The Normalized Prefix Hamming Distance (NPHD) is a metric designed for variable-length, prefix-compatible codes such as ISCC. Unlike standard Hamming distance, NPHD:

  • Correctly handles variable-length comparisons
  • Normalizes over the common prefix length
  • Satisfies all metric axioms (non-negativity, identity, symmetry, triangle inequality)

The implementation lives in the external iscc-usearch package, which iscc-search depends on for its HNSW backend.
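The normalization idea from the list above can be sketched in a few lines: count the differing bits over the common prefix of two codes and divide by the length of that prefix in bits. This is an illustration only. The function name `nphd` and the byte-level granularity are assumptions, the implementation in iscc-usearch may differ in its details, and this simplified version is not meant to demonstrate the metric axioms.

```python
def nphd(a: bytes, b: bytes) -> float:
    """Hamming distance over the common prefix, divided by its bit length."""
    common = min(len(a), len(b))
    if common == 0:
        raise ValueError("both codes need at least one byte")
    # Compare only the bytes that both codes share.
    differing_bits = sum(
        bin(x ^ y).count("1") for x, y in zip(a[:common], b[:common])
    )
    return differing_bits / (common * 8)
```

For example, comparing the 2-byte code `ff 00` with the 4-byte code `00 00 ff ff` looks at the first two bytes only: 8 of 16 bits differ, so the result is 0.5.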

Storage

  • LMDB is used for durable key-value storage: ISCC entries, metadata, and the inverted prefix-search index.
  • usearch (HNSW) is used for approximate nearest-neighbor search over ISCC-UNITs and ISCC-SIMPRINTS.
  • Multi-worker deployments are not supported with the usearch backend — see docs/deployment.md for details.
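As a rough illustration of prefix search over a sorted key space (LMDB keeps its keys in sorted order), the sketch below uses a sorted Python list in place of a database. It assumes that "bidirectional" means matching stored keys that extend the query as well as stored keys that are prefixes of the query; that reading, the function name `prefix_matches`, and the list-based storage are assumptions and not a description of the actual index layout.

```python
from bisect import bisect_left


def prefix_matches(sorted_keys, query):
    """Return keys that extend `query` or are themselves prefixes of it."""
    # Forward: keys starting with the query form one contiguous run.
    start = bisect_left(sorted_keys, query)
    forward = []
    for key in sorted_keys[start:]:
        if not key.startswith(query):
            break
        forward.append(key)
    # Reverse: probe every shorter prefix of the query with an exact lookup.
    reverse = []
    for end in range(1, len(query)):
        candidate = query[:end]
        pos = bisect_left(sorted_keys, candidate)
        if pos < len(sorted_keys) and sorted_keys[pos] == candidate:
            reverse.append(candidate)
    return reverse + forward
```

The forward direction is a single range scan from the first key not smaller than the query, and the reverse direction needs one exact lookup per shorter prefix of the query.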

License

Apache License 2.0 - see LICENSE file for details.

Contributing

Contributions are welcome! Please ensure:

  • All tests pass (uv run poe test)
  • Code is formatted (uv run poe format)
  • Coverage remains at 100%
  • Changes are documented

See CONTRIBUTING.md for details.


Repository initiated with fpgmaas/cookiecutter-uv.
