Commit 3fb9afd

feat: complete migration to SCIP-based code indexing architecture
**Major Architecture Overhaul:**
- Migrate from language-specific analyzers to unified SCIP protobuf system
- Implement industry-standard SCIP (SCIP Code Intelligence Protocol) format
- Replace 8 separate language analyzers with single SCIP-based approach

**Core SCIP Infrastructure:**
- Add comprehensive SCIP core components (symbol_manager, position_calculator, reference_resolver, moniker_manager)
- Implement two-phase SCIP analysis: symbol collection → reference resolution
- Create unified index management with index_provider and unified_index_manager
- Add SCIP-compliant symbol ID generation with cross-file reference support

**Enhanced Strategy System:**
- Refactor all language strategies (Python, Java, JavaScript, Objective-C) to SCIP compliance
- Add external symbol extraction and dependency tracking capabilities
- Implement proper SCIP symbol roles and syntax kind classification
- Add robust encoding detection and error handling

**Dependency Updates:**
- Replace tree-sitter-languages with pathspec for improved file filtering
- Add comprehensive dependency list (watchdog, protobuf, tree-sitter components)
- Ensure consistent cross-platform dependency resolution

**Technical Improvements:**
- Add SCIPSymbolAnalyzer to replace legacy query tools
- Implement detailed logging throughout SCIP indexing pipeline
- Add symbol analysis tools and inspection scripts
- Update sample projects for multilingual testing

This establishes a robust, industry-standard foundation for accurate code intelligence with proper SCIP compliance and improved performance.
1 parent 49306fc commit 3fb9afd

37 files changed, +6211 -2125 lines changed

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ dependencies = [
     "tree-sitter-typescript>=0.20.0",
     "tree-sitter-java>=0.20.0",
     "tree-sitter-c>=0.20.0",
-    "tree-sitter-languages>=1.10.2",
+    "pathspec>=0.12.1",
 ]
 
 [project.urls]

requirements.txt

Lines changed: 8 additions & 0 deletions
@@ -1 +1,9 @@
 mcp>=0.3.0
+watchdog>=3.0.0
+protobuf>=4.21.0
+tree-sitter>=0.20.0
+tree-sitter-javascript>=0.20.0
+tree-sitter-typescript>=0.20.0
+tree-sitter-java>=0.20.0
+tree-sitter-c>=0.20.0
+pathspec>=0.12.1
Lines changed: 150 additions & 0 deletions
@@ -0,0 +1,150 @@
+"""
+Index provider interface definitions.
+
+Defines the standard interface for all index access, ensuring consistency across implementations.
+"""
+
+from typing import List, Optional, Dict, Any, Protocol
+from dataclasses import dataclass
+
+
+@dataclass
+class SymbolInfo:
+    """Standard data structure for symbol information."""
+    name: str
+    kind: str  # 'class', 'function', 'method', 'variable', etc.
+    location: Dict[str, int]  # {'line': int, 'column': int}
+    scope: str
+    documentation: List[str]
+
+
+# Define FileInfo here to avoid circular imports
+@dataclass
+class FileInfo:
+    """Standard data structure for file information."""
+    relative_path: str
+    language: str
+    absolute_path: str
+
+    def __hash__(self):
+        return hash(self.relative_path)
+
+    def __eq__(self, other):
+        if isinstance(other, FileInfo):
+            return self.relative_path == other.relative_path
+        return False
+
+
+@dataclass
+class IndexMetadata:
+    """Standard structure for index metadata."""
+    version: str
+    format_type: str
+    created_at: float
+    last_updated: float
+    file_count: int
+    project_root: str
+    tool_version: str
+
+
+class IIndexProvider(Protocol):
+    """
+    Standard interface for index providers.
+
+    All index implementations must follow this interface to ensure a consistent access pattern.
+    """
+
+    def get_file_list(self) -> List[FileInfo]:
+        """
+        Get the list of all indexed files.
+
+        Returns:
+            List of file information
+        """
+        ...
+
+    def get_file_info(self, file_path: str) -> Optional[FileInfo]:
+        """
+        Get information for a specific file.
+
+        Args:
+            file_path: Relative path of the file
+
+        Returns:
+            File information, or None if the file is not in the index
+        """
+        ...
+
+    def query_symbols(self, file_path: str) -> List[SymbolInfo]:
+        """
+        Query the symbol information in a file.
+
+        Args:
+            file_path: Relative path of the file
+
+        Returns:
+            List of symbol information
+        """
+        ...
+
+    def search_files(self, pattern: str) -> List[FileInfo]:
+        """
+        Search for files by pattern.
+
+        Args:
+            pattern: Glob pattern or regular expression
+
+        Returns:
+            List of matching files
+        """
+        ...
+
+    def get_metadata(self) -> IndexMetadata:
+        """
+        Get the index metadata.
+
+        Returns:
+            Index metadata information
+        """
+        ...
+
+    def is_available(self) -> bool:
+        """
+        Check whether the index is available.
+
+        Returns:
+            True if index is available and functional
+        """
+        ...
+
+
+class IIndexManager(Protocol):
+    """
+    Index manager interface.
+
+    Defines the standard interface for index lifecycle management.
+    """
+
+    def initialize(self) -> bool:
+        """Initialize the index manager."""
+        ...
+
+    def get_provider(self) -> Optional[IIndexProvider]:
+        """Get the currently active index provider."""
+        ...
+
+    def refresh_index(self, force: bool = False) -> bool:
+        """Refresh the index."""
+        ...
+
+    def save_index(self) -> bool:
+        """Save the index state."""
+        ...
+
+    def clear_index(self) -> None:
+        """Clear the index state."""
+        ...
+
+    def get_index_status(self) -> Dict[str, Any]:
+        """Get index status information."""
+        ...
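
Note on the interfaces above: because they are declared with `typing.Protocol`, a provider can satisfy them structurally, without inheriting from them. The following is a minimal sketch of that idea, not code from this commit. `FileLookup` (a three-method subset of `IIndexProvider`), `InMemoryProvider`, and `count_matches` are hypothetical names introduced for illustration, and `search_files` here assumes glob-only matching via `fnmatch`, although the docstring also allows regular expressions. `FileInfo` is copied from the diff so the block runs on its own.

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from typing import Dict, List, Optional, Protocol


@dataclass
class FileInfo:
    """Copy of the FileInfo dataclass from the diff (identity = relative_path)."""
    relative_path: str
    language: str
    absolute_path: str

    def __hash__(self):
        return hash(self.relative_path)

    def __eq__(self, other):
        if isinstance(other, FileInfo):
            return self.relative_path == other.relative_path
        return False


class FileLookup(Protocol):
    """Hypothetical subset of IIndexProvider, reduced to three methods."""

    def get_file_list(self) -> List[FileInfo]: ...
    def get_file_info(self, file_path: str) -> Optional[FileInfo]: ...
    def search_files(self, pattern: str) -> List[FileInfo]: ...


class InMemoryProvider:
    """Hypothetical provider; matches FileLookup by shape, not by inheritance."""

    def __init__(self, files: List[FileInfo]) -> None:
        # Keyed by relative path, so a later duplicate replaces the earlier entry.
        self._files: Dict[str, FileInfo] = {f.relative_path: f for f in files}

    def get_file_list(self) -> List[FileInfo]:
        return list(self._files.values())

    def get_file_info(self, file_path: str) -> Optional[FileInfo]:
        return self._files.get(file_path)

    def search_files(self, pattern: str) -> List[FileInfo]:
        # Assumption: glob-only matching; regex support is left out of this sketch.
        return [f for f in self._files.values() if fnmatch(f.relative_path, pattern)]


def count_matches(provider: FileLookup, pattern: str) -> int:
    """Depends only on the protocol, so any conforming provider can be passed in."""
    return len(provider.search_files(pattern))


provider = InMemoryProvider([
    FileInfo("src/app.py", "python", "/repo/src/app.py"),
    FileInfo("src/Main.java", "java", "/repo/src/Main.java"),
    FileInfo("src/app.py", "python", "/other/checkout/src/app.py"),
])

print(count_matches(provider, "*.py"))        # 1
print(len(provider.get_file_list()))          # 2
print(provider.get_file_info("missing.py"))   # None
```

One consequence of the `__hash__`/`__eq__` pair in `FileInfo` is visible here: two entries with the same `relative_path` count as the same file even when their `absolute_path` differs, so they collapse to one item in sets and dictionary keys.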
