01: Technology Choices and Architecture

| Field | Value |
|---|---|
| Status | [x] Updated (v4.0 aligned) / [ ] In Review / [ ] Approved |
| Version | 1.0 |
| Related PRD | Section 5 (System Architecture), Section 8 (Tech Stack) |

1. Technology Choices

1.1 Language and Runtime

| Item | Choice | Version | Notes |
|---|---|---|---|
| Language | Python | 3.10+ | |
| Package Manager | pip | | requirements.txt |
| Runtime | CPython / Docker | | See 05-deployment-runbook.md |

1.2 Web and API

| Item | Choice | Version | Notes |
|---|---|---|---|
| Framework | FastAPI | >=0.109 | Async; OpenAPI schema generated automatically |
| Server | Uvicorn | >=0.27 | ASGI server |
| Docs | OpenAPI | 3.x | Generated by FastAPI; see 02-api-specification.yaml |

1.3 Agent Orchestration

| Item | Choice | Version | Notes |
|---|---|---|---|
| Workflow Engine | LangGraph | >=0.2 | Stateful, graph-based agent orchestration: StateGraph with conditional routing, parallel execution, and checkpointing |
| LLM Framework | LangChain | >=0.2 | Unified LLM abstraction, prompt templates, tool integration, RAG chains |
| State Management | LangGraph checkpointing | | Cross-phase state persistence; MemorySaver (in-memory, MVP) or a DB-backed checkpointer (production) |

1.4 LLM Providers

| Item | Choice | Version | Notes |
|---|---|---|---|
| Cloud LLM | OpenAI (ChatGPT) | | Via LangChain ChatOpenAI; Azure OpenAI, Claude, and Qwen can be swapped in through their LangChain integrations |
| Local LLM | Ollama | | Via LangChain ChatOllama; data stays on-premises |
| LLM Client | Cached | | @lru_cache: one client instance per process lifetime |
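
The cached LLM client row can be illustrated with `functools.lru_cache`. This is a minimal sketch, not the app/llm/ implementation: `get_llm_client` and the frozen `LLMClient` dataclass are hypothetical stand-ins, and in practice the function body would construct a LangChain chat model such as ChatOpenAI or ChatOllama.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class LLMClient:
    """Hypothetical stand-in for a LangChain chat model (e.g. ChatOpenAI, ChatOllama)."""

    provider: str
    model: str


@lru_cache(maxsize=None)
def get_llm_client(provider: str, model: str) -> LLMClient:
    """Build one client per (provider, model) pair and reuse it for the process lifetime."""
    if provider not in ("openai", "ollama"):
        raise ValueError(f"Unsupported provider: {provider!r}")
    # The real factory would construct ChatOpenAI(...) or ChatOllama(...) here.
    return LLMClient(provider=provider, model=model)


if __name__ == "__main__":
    first = get_llm_client("ollama", "llama3")
    second = get_llm_client("ollama", "llama3")
    print(first is second)  # True: the second call returns the cached instance
```

Because lru_cache keys on the exact call arguments, `get_llm_client("ollama", "llama3")` and `get_llm_client(provider="ollama", model="llama3")` occupy separate cache entries; calling the factory consistently keeps it to one client per configuration.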

1.5 Vector Store and RAG

| Item | Choice | Version | Notes |
|---|---|---|---|
| Vector DB | Chroma | >=0.4 | Embedded; persisted to CHROMA_PERSIST_DIR; one collection per SSDLC phase |
| Embeddings | HuggingFace | | sentence-transformers/all-MiniLM-L6-v2 |
| Chunking | Recursive character splitter | | 1024 characters, 128 overlap (configurable) |
| Graph RAG | LightRAG | | Entity-relationship-aware retrieval; enabled via ENABLE_GRAPH_RAG |

1.6 Document Parsing

| Format | Library | Version | Notes |
|---|---|---|---|
| All (primary) | Docling | >=2.0 | Preserves tables and headings; OCR-capable; PARSER_ENGINE=auto |
| PDF | PyMuPDF (fitz) | >=1.23 | Fallback when Docling is unavailable |
| Word | python-docx | >=1.1 | Fallback |
| Excel | openpyxl | >=3.1 | Fallback |
| PowerPoint | python-pptx | >=0.6 | Fallback |
| SAST/DAST | Custom parsers | | SARIF, SonarQube JSON, Checkmarx XML, Burp XML, ZAP |
| Text/Markdown | Built-in | | .txt, .md |
| Router | Custom | | Dispatches by file extension in parse_file() |
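
The router row can be illustrated with a small extension-based dispatcher that prefers Docling and falls back to the legacy libraries. This is a sketch under stated assumptions: the handler table, the `docling_available` flag, and the stub parsers are hypothetical placeholders for the real modules in app/parser/. Only PARSER_ENGINE=auto comes from this document; treating any other engine value as "legacy only" is an assumption.

```python
from pathlib import Path
from typing import Callable, Dict

ParseResult = Dict[str, str]
Parser = Callable[[Path], ParseResult]


def _stub(name: str) -> Parser:
    """Hypothetical placeholder for a real parser; it only reports which parser ran."""

    def parse(path: Path) -> ParseResult:
        return {"parser": name, "source": path.name}

    return parse


# Legacy parsers keyed by lower-case file extension (assumed mapping).
LEGACY_PARSERS: Dict[str, Parser] = {
    ".pdf": _stub("pymupdf"),
    ".docx": _stub("python-docx"),
    ".xlsx": _stub("openpyxl"),
    ".pptx": _stub("python-pptx"),
    ".sarif": _stub("sarif"),
    ".txt": _stub("builtin-text"),
    ".md": _stub("builtin-text"),
}

# Office/PDF formats assumed to be sent to Docling first.
DOCLING_SUFFIXES = {".pdf", ".docx", ".xlsx", ".pptx"}


def parse_file(path: str, engine: str = "auto", docling_available: bool = True) -> ParseResult:
    """Dispatch by extension: Docling when allowed and installed, else the legacy parser."""
    p = Path(path)
    suffix = p.suffix.lower()
    if engine == "auto" and docling_available and suffix in DOCLING_SUFFIXES:
        return _stub("docling")(p)
    handler = LEGACY_PARSERS.get(suffix)
    if handler is None:
        raise ValueError(f"Unsupported file type: {suffix or '<none>'}")
    return handler(p)
```

A dict keyed by extension keeps the router open for extension: adding a format means registering one more handler rather than growing an if/elif chain.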

1.7 Identity and Integrations

| Item | Choice | Notes |
|---|---|---|
| Auth | OAuth2/OIDC (AAD) | Placeholder in app/integrations; see docs/04-integration-guide.md |
| Metadata | ServiceNow | Placeholder in app/integrations; see docs/04-integration-guide.md |
| SAST/DAST | Tool connectors | SonarQube, Checkmarx, Burp Suite, OWASP ZAP; see docs/04-integration-guide.md |
| Config | pydantic-settings | app/core/config.py reads .env |

1.8 Storage and Cache

| Item | Choice | Notes |
|---|---|---|
| Task State | LangGraph checkpoints | State persisted across phases; MemorySaver (in-memory) for MVP, DB-backed for production |
| Vector Store | Local disk | Persisted to CHROMA_PERSIST_DIR; separate collection per SSDLC phase |
| Files | Transient | Processed as streams; parsed content is passed to the KB or agents |
| Checkpoints | Local disk / DB | LANGGRAPH_CHECKPOINT_DIR for MVP; PostgreSQL for production |

2. Architecture and Data Flow

2.1 Logical Architecture

Aligned with PRD Section 5.1.

[ Access Layer ]    API (FastAPI) / MCP Server (stdio) / CLI
       |
[ SSDLC Orchestration ]  LangGraph StateGraph
       |                  ├── Phase Router (conditional edges)
       |                  ├── SSDLC Pipeline (6-stage router)
       |                  ├── Requirements Agent
       |                  ├── Design Agent
       |                  ├── Development Agent
       |                  ├── Testing Agent
       |                  ├── Deployment Agent
       |                  ├── Operations Agent
       |                  └── Reviewer Agent
       |
[ Core Services ]    Knowledge Base (Vector + Graph RAG) | Parser (Docling / legacy) | Memory | Skills (persona + SSDLC stage skills)
       |
[ LLM Layer ]        LangChain Abstraction
       |              ├── OpenAI / Claude / Qwen (Cloud)
       |              └── Ollama / vLLM (Local)
       |
[ Integrations ]     AAD (Auth) | ServiceNow (Metadata) | SAST/DAST Tools
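
The Phase Router's conditional edges reduce to a function that inspects the shared state and names the next node; in LangGraph such a function is typically passed to `StateGraph.add_conditional_edges`. The sketch below keeps it library-free. The `SSDLCState` fields, node names such as `design_agent`, and the rule that a review flag takes priority are all assumptions, not the contents of app/agent/state.py or router.py.

```python
from typing import List, TypedDict

PHASE_ORDER: List[str] = [
    "requirements", "design", "development", "testing", "deployment", "operations",
]


class SSDLCState(TypedDict, total=False):
    """Hypothetical shape of the shared state passed between LangGraph nodes."""

    phase: str            # SSDLC phase requested by the caller
    findings: List[dict]  # findings accumulated across phases
    needs_review: bool    # set by a phase agent when cross-phase review is required


def route_by_phase(state: SSDLCState) -> str:
    """Return the next node name; a conditional edge would call this after each step."""
    if state.get("needs_review"):
        return "reviewer"
    phase = state.get("phase")
    if phase in PHASE_ORDER:
        return f"{phase}_agent"
    return "end"
```

Keeping routing a pure function of the state means it can be unit-tested without invoking an LLM or compiling the graph.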

2.2 Components and Interfaces

| Component | Responsibility | Interface |
|---|---|---|
| API Layer | Authentication, routing, rate limiting, validation | REST; see 02-api-specification.yaml |
| LangGraph Orchestrator | SSDLC workflow, phase routing, state management, checkpointing | Internal Python API; StateGraph definition |
| Phase Agents | Phase-specific assessment logic via LangChain tools | LangGraph node functions; shared SSDLCState |
| Memory | Cross-phase context, LangGraph checkpoints | LangGraph state + checkpointer |
| Skills | Phase-specific assessment capabilities | I/O contract; see 03-assessment-report...md |
| Knowledge Base | Ingestion (parse -> chunk -> embed) and retrieval per phase | upload(), query(text, collection) |
| Parser | Converts files (including SAST/DAST reports) to unified JSON/Markdown | parse(file_stream) -> Schema 03 |
| LLM Layer | Unified chat/completion API via LangChain | invoke(prompt, context) / LangChain tools |
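
For the Parser row, a SARIF 2.1.0 report (one of the SAST/DAST formats listed in section 1.6) can be flattened into per-finding records. This is a minimal sketch: the output keys `tool`, `rule_id`, `severity`, `message`, `file`, and `line` are hypothetical and do not reproduce Schema 03, while the input field names (`runs`, `tool.driver.name`, `results`, `ruleId`, `level`, `message.text`, `locations[].physicalLocation`) follow the SARIF specification.

```python
import json
from typing import Any, Dict, List


def parse_sarif(raw: str) -> List[Dict[str, Any]]:
    """Flatten SARIF 2.1.0 results into simple finding dicts (output shape is hypothetical)."""
    doc = json.loads(raw)
    findings: List[Dict[str, Any]] = []
    for run in doc.get("runs", []):
        tool = run.get("tool", {}).get("driver", {}).get("name", "unknown")
        for result in run.get("results", []):
            # Use the first location if present; a result may have zero or several.
            loc = (result.get("locations") or [{}])[0].get("physicalLocation", {})
            findings.append({
                "tool": tool,
                "rule_id": result.get("ruleId"),
                # An absent level defaults to "warning" (rule-level defaults ignored here).
                "severity": result.get("level", "warning"),
                "message": result.get("message", {}).get("text", ""),
                "file": loc.get("artifactLocation", {}).get("uri"),
                "line": loc.get("region", {}).get("startLine"),
            })
    return findings
```

Normalizing every scanner into the same flat record lets the phase agents reason over SonarQube, Checkmarx, Burp, and ZAP output without format-specific prompts.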

2.3 KB Chunking Strategy

| Parameter | Default | Description |
|---|---|---|
| Chunk Size | 1024 | Maximum characters per chunk |
| Overlap | 128 | Characters shared by adjacent chunks to preserve context at boundaries |
| Splitter | Recursive | Splits by paragraphs first, then by sentences |
| Metadata | Yes | Filename, page number, section headers, SSDLC phase tag |
| Collections | One per phase | kb_requirements, kb_design, kb_development, kb_testing, kb_deployment, kb_operations |
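
The recursive splitter with overlap can be sketched as a simplified stand-in for LangChain's RecursiveCharacterTextSplitter: it breaks text on the coarsest separator that appears (paragraph break, line break, sentence end, space), hard-cuts anything still too long, then greedily packs pieces into chunks while carrying up to `overlap` trailing characters into the next chunk. It illustrates the technique and is not the library's exact algorithm (for example, it does not strip whitespace); `recursive_split` and the separator list are assumptions.

```python
from typing import List, Sequence

# Coarsest to finest: paragraph, line, sentence, word.
DEFAULT_SEPARATORS = ("\n\n", "\n", ". ", " ")


def _split_to_pieces(text: str, chunk_size: int, separators: Sequence[str]) -> List[str]:
    """Break text into pieces of at most chunk_size chars, preferring coarse separators."""
    if len(text) <= chunk_size:
        return [text] if text else []
    for i, sep in enumerate(separators):
        if sep in text:
            parts = text.split(sep)
            # Re-attach the separator so no characters are lost.
            parts = [part + sep for part in parts[:-1]] + [parts[-1]]
            pieces: List[str] = []
            for part in parts:
                # Parts that are still too long are retried with finer separators only.
                pieces.extend(_split_to_pieces(part, chunk_size, separators[i + 1:]))
            return pieces
    # No separator left: hard-cut into fixed-size slices.
    return [text[j:j + chunk_size] for j in range(0, len(text), chunk_size)]


def recursive_split(text: str, chunk_size: int = 1024, overlap: int = 128,
                    separators: Sequence[str] = DEFAULT_SEPARATORS) -> List[str]:
    """Pack pieces into chunks <= chunk_size, carrying <= overlap trailing chars forward."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    chunks: List[str] = []
    window: List[str] = []
    window_len = 0
    for piece in _split_to_pieces(text, chunk_size, separators):
        if window and window_len + len(piece) > chunk_size:
            chunks.append("".join(window))
            # Drop leading pieces until the remainder fits the overlap budget
            # and leaves room for the incoming piece.
            while window and (window_len > overlap or window_len + len(piece) > chunk_size):
                window_len -= len(window.pop(0))
        window.append(piece)
        window_len += len(piece)
    if window:
        chunks.append("".join(window))
    return chunks
```

With tiny parameters the overlap is easy to see: `recursive_split("a b c d e f g h", chunk_size=5, overlap=2)` returns `['a b ', 'b c ', 'c d ', 'd e ', 'e f ', 'f g h']`. Overlap is carried in whole pieces, so when a paragraph is longer than 128 characters no text is repeated at that boundary.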

3. Module Layout

Target implementation structure:

DocSentinel/
├── app/
│   ├── api/                # FastAPI routes: health, assessments, kb, skills
│   ├── core/               # Configuration (pydantic-settings), guardrails
│   ├── agent/              # LangGraph orchestrator and phase agents
│   │   ├── orchestrator.py      # LangGraph StateGraph definition
│   │   ├── state.py             # SSDLCState TypedDict
│   │   ├── router.py            # Phase routing logic
│   │   ├── ssdlc/               # SSDLC pipeline: router, stage skills, checklists
│   │   ├── agents/              # Phase agent implementations
│   │   │   ├── requirements.py
│   │   │   ├── design.py
│   │   │   ├── development.py
│   │   │   ├── testing.py
│   │   │   ├── deployment.py
│   │   │   └── operations.py
│   │   ├── reviewer.py          # Cross-phase review agent
│   │   ├── skills_registry.py   # Built-in skills per SSDLC phase
│   │   └── skills_service.py    # Skill CRUD and management
│   ├── kb/                 # KnowledgeBaseService (Chroma + chunking + phase collections)
│   │   └── graph_rag.py    # LightRAG integration
│   ├── llm/                # LangChain LLM factory and invocation
│   ├── parser/             # Parsers: Docling + legacy + SAST/DAST report parsers
│   ├── integrations/       # AAD, ServiceNow, SAST/DAST tool connectors
│   ├── models/             # Pydantic models for API and internal data
│   ├── main.py             # App entry point
│   └── mcp_server.py       # MCP Server
├── docs/                   # Design docs and schemas
├── tests/                  # Pytest suite
├── requirements.txt        # Production dependencies
└── .env.example            # Environment template

4. Key Dependencies

Dependencies are maintained in requirements.txt. Key architectural dependencies:

# Web & API
fastapi>=0.109.0
uvicorn[standard]>=0.27.0

# Agent Orchestration
langgraph>=0.2.0      # Graph-based agent orchestration
langchain>=0.2.0
langchain-community
langchain-openai

# Vector Store & Graph RAG
chromadb>=0.4.22
lightrag-hku          # Graph RAG (entity-relationship retrieval)

# Parsing
docling>=2.0.0        # Primary parser (table/heading/OCR)
pymupdf>=1.23         # PDF fallback
python-docx>=1.1      # Word fallback
openpyxl>=3.1         # Excel fallback
python-pptx>=0.6      # PPT fallback

# Embeddings
sentence-transformers

# MCP
mcp[cli]              # Model Context Protocol server

# Utils
httpx
pydantic-settings>=2.1
python-multipart

5. Changelog

| Version | Date | Changes |
|---|---|---|
| 1.0 | 2026-03 | Major rewrite: LangGraph orchestration, SSDLC phase agents, phase-specific KB collections, SAST/DAST parsers, SSDLC stage skills |
| 0.4 | 2026-03 | Added Graph RAG, Docling parser, MCP Server, singleton KB, async assessment |
| 0.2 | 2025-03 | Updated tech stack versions and module layout |
| 0.1 | | Initial draft of technology selection |