intellistream
diff --git a/README.md b/README.md
Lines changed: 15 additions & 244 deletions
diff --git a/bench/algorithms/.gitignore b/bench/algorithms/.gitignore
Lines changed: 0 additions & 20 deletions
diff --git a/bench/algorithms/__init__.py b/bench/algorithms/__init__.py
Lines changed: 52 additions & 48 deletions
@@ -1,250 +1,21 @@
 # SAGE-DB-Bench
 
-A benchmarking framework for streaming vector indexes
+Directory guide after the reorganization:
 
-## 1. One-Step Deployment
+- Quick start and full documentation: see [docs/README.md](docs/README.md)
+- Deployment and installation guide: see [docs/INSTALL.md](docs/INSTALL.md)
+- Directory structure and testing conventions: see [docs/STRUCTURE.md](docs/STRUCTURE.md)
+- Algorithm release and CI notes: see [docs/ALGORITHM_DEPLOYMENT.md](docs/ALGORITHM_DEPLOYMENT.md) and [docs/CICD_FIXES.md](docs/CICD_FIXES.md)
+- Scripts: in [scripts/](scripts/) (e.g., deploy, install, activate, test_local)
+- Container configuration: in [docker/](docker/) (Dockerfile, docker-compose.yml)
+- Test and tooling configuration: in [config/](config/) (pytest.ini, pre-commit, setup.cfg); `tests/conftest.py` is picked up automatically
 
-```bash
-# Clone the repository (including submodules)
-git clone --recursive https://github.com/intellistream/SAGE-DB-Bench.git
-cd SAGE-DB-Bench
+Core code and data:
 
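-# Run the deployment script
-./deploy.sh
+- Benchmark framework and algorithm interfaces: bench/
+- Dataset management: datasets/
+- Third-party/submodule algorithm implementations: algorithms_impl/
+- Experiment configurations: runbooks/
+- Result export and tools: compute_gt.py, run_benchmark.py, export_results.py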
 
-# Activate the virtual environment
-source sage-db-bench/bin/activate
-```
-
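-**Deployment options:**
-```bash
-./deploy.sh --skip-system-deps  # Skip system dependency installation (use when dependencies are already installed)
-./deploy.sh --skip-build        # Skip the build (set up the environment only)
-./deploy.sh --help              # Show help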
-```
-
-## 2. Datasets
-
-### Supported Datasets
-
-| Dataset | Dimensions | Vectors | Description |
-|--------|------|--------|------|
-| sift | 128 | 1M | SIFT feature vectors |
-| glove | 100 | 1.2M | GloVe word vectors |
-| random-xs | 32 | 10K | Random data (for testing) |
-| random-s | 64 | 100K | Random data (small scale) |
-| random-m | 128 | 1M | Random data (medium scale) |
-
-### Downloading Datasets
-
-```bash
-python prepare_dataset.py --dataset sift
-python prepare_dataset.py --dataset glove
-```
-
-### Adding a New Dataset
-
-Add the following to `datasets/registry.py`:
-
-```python
-class MyDataset(Dataset):
-    def __init__(self):
-        self.nb = 100000      # number of base vectors
-        self.nq = 10000       # number of queries
-        self.d = 128          # vector dimensionality
-        self.dtype = 'float32'
-        self.basedir = 'raw_data/mydataset'
-        
-    def prepare(self):
-        # Download or generate the data
-        pass
-        
-    def get_data_in_range(self, start, end):
-        # Return the data in the range [start, end)
-        pass
-        
-    def get_queries(self):
-        # Return the query vectors
-        pass
-        
-    def distance(self):
-        return 'euclidean'  # or 'ip'
-
-# Register the dataset
-DATASETS['mydataset'] = lambda: MyDataset()
-```
-
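-## 3. Algorithms
-
-### Supported Algorithms
-
-| Algorithm | Type | Description |
-|------|------|------|
-| faiss_HNSW | Graph index | Faiss HNSW implementation |
-| faiss_HNSW_Optimized | Graph index | HNSW with Gorder optimization support |
-| faiss_IVFPQ | Quantization | Inverted file + product quantization |
-| diskann | Graph index | DiskANN |
-| vsag_hnsw | Graph index | VSAG HNSW |
-
-### Adding a New Algorithm
-
-1. Create a directory under `bench/algorithms/`: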
-
-```
-bench/algorithms/my_algo/
-├── __init__.py
-├── my_algo.py
-└── config.yaml
-```
-
-2. Implement the algorithm interface (`my_algo.py`):
-
-```python
-from ..base import BaseStreamingANN
-
-class MyAlgorithm(BaseStreamingANN):
-    def __init__(self, metric, index_params):
-        self.metric = metric
-        self.name = "my_algo"
-        # Parse index_params
-        
-    def setup(self, dtype, max_pts, ndim):
-        # Initialize the index
-        pass
-        
-    def insert(self, X, ids):
-        # Insert vectors
-        pass
-        
-    def delete(self, ids):
-        # Delete vectors
-        pass
-        
-    def query(self, X, k):
-        # Query; return (ids, distances)
-        pass
-        
-    def set_query_arguments(self, query_args):
-        # Set query parameters (e.g., ef)
-        pass
-```
-
-3. Create the configuration file (`config.yaml`):
-
-```yaml
-sift:
-  my_algo:
-    module: benchmark_anns.bench.algorithms.my_algo.my_algo
-    constructor: MyAlgorithm
-    base-args: ["@metric"]
-    run-groups:
-      base:
-        args: |
-          [{"param1": 32, "param2": 100}]
-        query-args: |
-          [{"ef": 40}]
-```
-
-4. Export it in `__init__.py`:
-
-```python
-from .my_algo import MyAlgorithm
-__all__ = ['MyAlgorithm']
-```
-
-## 4. Benchmark Workflow
-
-### 4.1 Computing Ground Truth
-
-```bash
-python compute_gt.py \
-    --dataset sift \
-    --runbook_file runbooks/simple.yaml \
-    --gt_cmdline_tool ./DiskANN/build/apps/utils/compute_groundtruth
-```
-
-The generated ground-truth files are saved under `raw_data/{dataset}/{size}/{runbook}.yaml/`.
-
-### 4.2 Running the Benchmark
-
-```bash
-# Basic usage
-python run_benchmark.py \
-    --algorithm faiss_HNSW_Optimized \
-    --dataset sift \
-    --runbook runbooks/simple.yaml
-
-# Enable cache-miss measurement
-python run_benchmark.py \
-    --algorithm faiss_HNSW_Optimized \
-    --dataset sift \
-    --runbook runbooks/simple.yaml \
-    --enable-cache-profiling
-```
-
-### 4.3 Exporting Results
-
-```bash
-python export_results.py \
-    --dataset sift \
-    --algorithm faiss_HNSW_Optimized \
-    --runbook simple
-```
-
-The exported results include:
-- **recall**: recall for each batch
-- **query_qps**: query throughput
-- **query_latency_ms**: query latency
-- **cache_misses**: number of cache misses (if enabled)
-
-Result files are saved under `results/{dataset}/{algorithm}/`.
-
-## 5. Runbook Format
-
-```yaml
-sift:
-  max_pts: 1000000
-  1:
-    operation: "startHPC"
-  2:
-    operation: "initial"
-    start: 0
-    end: 50000
-  3:
-    operation: "batch_insert"
-    start: 50000
-    end: 100000
-    batchSize: 2500
-    eventRate: 10000
-  4:
-    operation: "waitPending"
-  5:
-    operation: "search"
-  6:
-    operation: "endHPC"
-```
-
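-**Supported operations:**
-- `startHPC` / `endHPC`: start/stop the worker threads
-- `initial`: initial data load
-- `batch_insert`: batch insertion (queries run at the same time)
-- `batch_insert_delete`: batch insertion with deletions
-- `search`: standalone search operation
-- `waitPending`: wait for pending operations to complete
-
-## 6. Directory Structure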
-
-```
-SAGE-DB-Bench/
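-├── bench/                 # Benchmark framework core
-│   └── algorithms/        # Algorithm implementations
-├── datasets/              # Dataset management
-├── algorithms_impl/       # C++ algorithm libraries (Faiss, DiskANN, etc.)
-├── runbooks/              # Experiment configurations
-├── raw_data/              # Dataset files
-├── results/               # Benchmark results
-├── deploy.sh              # One-step deployment script
-├── compute_gt.py          # Compute ground truth
-├── run_benchmark.py       # Run the benchmark
-└── export_results.py      # Export results
-```
+To run pytest or pre-commit, see docs/STRUCTURE.md for the current paths and commands.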
@@ -1,51 +1,55 @@
-"""
-Algorithms Module for Benchmark ANNS
+"""Compatibility shims: ANN implementations moved to sage.libs.annimplementations.
 
-This module provides algorithm interfaces and implementations.
-All algorithms are automatically discovered from subdirectories.
+This package now proxies imports to `sage.libs.annimplementations`.
 """
+from __future__ import annotations
+
+import importlib
+import sys
+from typing import Iterable
+
+_MIGRATED = [
+    "base",
+    "registry",
+    "candy_lshapg",
+    "candy_mnru",
+    "candy_sptag",
+    "cufe",
+    "diskann",
+    "faiss_HNSW",
+    "faiss_HNSW_Optimized",
+    "faiss_IVFPQ",
+    "faiss_NSW",
+    "faiss_fast_scan",
+    "faiss_lsh",
+    "faiss_onlinepq",
+    "faiss_pq",
+    "gti",
+    "ipdiskann",
+    "plsh",
+    "puck",
+    "pyanns",
+    "vsag_hnsw",
+]
+
+_BASE = "sage.libs.annimplementations"
+
+
+def _load(name: str):
+    module = importlib.import_module(f"{_BASE}.{name}")
+    sys.modules[f"{__name__}.{name}"] = module
+    return module
+
+
+for _name in _MIGRATED:
+    _load(_name)
+
+
+def __getattr__(name: str):  # pragma: no cover - compatibility path
+    if name in _MIGRATED:
+        return sys.modules[f"{__name__}.{name}"]
+    raise AttributeError(name)
+
 
-from .base import BaseANN, BaseStreamingANN, DummyStreamingANN
-from .registry import (
-    ALGORITHMS, 
-    register_algorithm, 
-    get_algorithm,
-    discover_algorithms,
-    auto_register_algorithms
-)
-
-# Try importing the various algorithm wrappers (backward compatibility; deprecated)
-try:
-    from .candy_wrapper import CANDYWrapper, get_candy_algorithm
-    __all_wrappers = ['CANDYWrapper', 'get_candy_algorithm']
-except ImportError:
-    __all_wrappers = []
-
-try:
-    from .faiss_wrapper import FaissWrapper
-    __all_wrappers.extend(['FaissWrapper'])
-except ImportError:
-    pass
-
-try:
-    from .diskann_wrapper import DiskANNWrapper
-    __all_wrappers.extend(['DiskANNWrapper'])
-except ImportError:
-    pass
-
-try:
-    from .puck_wrapper import PuckWrapper
-    __all_wrappers.extend(['PuckWrapper'])
-except ImportError:
-    pass
-
-__all__ = [
-    'BaseANN',
-    'BaseStreamingANN',
-    'DummyStreamingANN',
-    'ALGORITHMS',
-    'register_algorithm',
-    'get_algorithm',
-    'discover_algorithms',
-    'auto_register_algorithms',
-] + __all_wrappers
+def __dir__() -> Iterable[str]:  # pragma: no cover - introspection
+    return sorted(list(globals().keys()) + _MIGRATED)
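
Note on the shim in `bench/algorithms/__init__.py`: the new code keeps legacy import paths working by registering each relocated module in `sys.modules` under its old dotted name. The sketch below illustrates that aliasing pattern in isolation. It is not the project's code: the package names `newhome.impls` and `oldhome.algorithms`, the `build` function, and the `alias_modules` helper are all hypothetical, and the packages are built in memory so the example runs without `sage.libs.annimplementations` installed. Unlike the loop in the diff, which lets an `ImportError` from any single backend propagate, this variant skips modules that fail to import, roughly mirroring the `try/except ImportError` blocks that the old file used for optional wrappers.

```python
import importlib
import sys
import types


def make_package(name):
    """Register an empty in-memory package so the sketch needs no files on disk."""
    pkg = types.ModuleType(name)
    pkg.__path__ = []  # an empty __path__ marks the module as a package
    sys.modules[name] = pkg
    return pkg


# Hypothetical new location of an implementation: newhome.impls.hnsw
make_package("newhome")
make_package("newhome.impls")
hnsw = types.ModuleType("newhome.impls.hnsw")
hnsw.build = lambda: "index built"
sys.modules["newhome.impls.hnsw"] = hnsw

# Hypothetical legacy package whose import paths should keep working
make_package("oldhome")
make_package("oldhome.algorithms")


def alias_modules(legacy_pkg, new_base, names):
    """Alias new_base.<name> as legacy_pkg.<name>; return the names that failed to import."""
    skipped = []
    for name in names:
        try:
            module = importlib.import_module(f"{new_base}.{name}")
        except ImportError:
            skipped.append(name)  # optional backend not available; leave it unaliased
            continue
        sys.modules[f"{legacy_pkg}.{name}"] = module
    return skipped


skipped = alias_modules("oldhome.algorithms", "newhome.impls", ["hnsw", "missing"])

# The legacy import path now resolves to the relocated module object.
from oldhome.algorithms.hnsw import build

old = importlib.import_module("oldhome.algorithms.hnsw")
new = importlib.import_module("newhome.impls.hnsw")
print(old is new, build(), skipped)
```

Because both dotted names map to the same module object, state such as a registry dictionary is shared rather than duplicated. Whether the shim should tolerate missing backends, as in this sketch, or fail fast, as in the diff, is a design choice for the maintainers.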