Commit 1fd84ae

Merge pull request #8 from longbridge/canary
feat: add knowledge base documentation
2 parents 4784eed + 5f8e872 commit 1fd84ae

18 files changed

Lines changed: 575 additions & 63 deletions

docs/.vitepress/locales/en/index.ts

Lines changed: 10 additions & 11 deletions
@@ -96,17 +96,16 @@ export const enConfig: LocaleSpecificConfig<DefaultTheme.Config> = {
       collapsed: true,
       items: [{ text: 'Agent Run', link: '/en/ai/docs/api/agent-run' }],
     },
-    // {
-    //   text: 'Knowledge Base',
-    //   collapsed: true,
-    //   items: [
-    //     { text: 'Introduction', link: '/en/ai/docs/knowledge-base/introduction' },
-    //     { text: 'Create Knowledge Base', link: '/en/ai/docs/knowledge-base/create' },
-    //     { text: 'Manage Knowledge Base', link: '/en/ai/docs/knowledge-base/manage' },
-    //     { text: 'Metadata', link: '/en/ai/docs/knowledge-base/metadata' },
-    //     { text: 'Recall Test', link: '/en/ai/docs/knowledge-base/recall-test' },
-    //   ],
-    // },
+    {
+      text: 'Knowledge Base',
+      collapsed: true,
+      items: [
+        { text: 'Introduction', link: '/en/ai/docs/knowledge-base/introduction' },
+        { text: 'Create Knowledge Base', link: '/en/ai/docs/knowledge-base/create' },
+        { text: 'Manage Knowledge Base', link: '/en/ai/docs/knowledge-base/manage' },
+        { text: 'Recall Test', link: '/en/ai/docs/knowledge-base/recall-test' },
+      ],
+    },
     {
       text: 'Workspace',
       collapsed: true,

docs/.vitepress/locales/zh-CN/index.ts

Lines changed: 10 additions & 11 deletions
@@ -96,17 +96,16 @@ export const zhCNConfig: LocaleSpecificConfig<DefaultTheme.Config> = {
       collapsed: true,
       items: [{ text: 'Agent 运行', link: '/zh-CN/ai/docs/api/agent-run' }],
     },
-    // {
-    //   text: '知识库',
-    //   collapsed: true,
-    //   items: [
-    //     { text: '功能简介', link: '/zh-CN/ai/docs/knowledge-base/introduction' },
-    //     { text: '创建知识库', link: '/zh-CN/ai/docs/knowledge-base/create' },
-    //     { text: '管理知识库', link: '/zh-CN/ai/docs/knowledge-base/manage' },
-    //     { text: '元数据', link: '/zh-CN/ai/docs/knowledge-base/metadata' },
-    //     { text: '召回测试', link: '/zh-CN/ai/docs/knowledge-base/recall-test' },
-    //   ],
-    // },
+    {
+      text: '知识库',
+      collapsed: true,
+      items: [
+        { text: '功能简介', link: '/zh-CN/ai/docs/knowledge-base/introduction' },
+        { text: '创建知识库', link: '/zh-CN/ai/docs/knowledge-base/create' },
+        { text: '管理知识库', link: '/zh-CN/ai/docs/knowledge-base/manage' },
+        { text: '召回测试', link: '/zh-CN/ai/docs/knowledge-base/recall-test' },
+      ],
+    },
     {
       text: '工作区',
       collapsed: true,

docs/.vitepress/locales/zh-HK/index.ts

Lines changed: 10 additions & 11 deletions
@@ -98,17 +98,16 @@ export const zhHKConfig: LocaleSpecificConfig<DefaultTheme.Config> = {
       collapsed: true,
       items: [{ text: 'Agent 運行', link: '/zh-HK/ai/docs/api/agent-run' }],
     },
-    // {
-    //   text: '知識庫',
-    //   collapsed: true,
-    //   items: [
-    //     { text: '功能簡介', link: '/zh-HK/ai/docs/knowledge-base/introduction' },
-    //     { text: '創建知識庫', link: '/zh-HK/ai/docs/knowledge-base/create' },
-    //     { text: '管理知識庫', link: '/zh-HK/ai/docs/knowledge-base/manage' },
-    //     { text: '元數據', link: '/zh-HK/ai/docs/knowledge-base/metadata' },
-    //     { text: '召回測試', link: '/zh-HK/ai/docs/knowledge-base/recall-test' },
-    //   ],
-    // },
+    {
+      text: '知識庫',
+      collapsed: true,
+      items: [
+        { text: '功能簡介', link: '/zh-HK/ai/docs/knowledge-base/introduction' },
+        { text: '創建知識庫', link: '/zh-HK/ai/docs/knowledge-base/create' },
+        { text: '管理知識庫', link: '/zh-HK/ai/docs/knowledge-base/manage' },
+        { text: '召回測試', link: '/zh-HK/ai/docs/knowledge-base/recall-test' },
+      ],
+    },
     {
       text: '工作區',
       collapsed: true,
Lines changed: 77 additions & 2 deletions
@@ -1,3 +1,78 @@
-# Create Knowledge Base
+# Create a Knowledge Base
 
-This section describes how to create and configure a knowledge base.
+This page describes how to create and configure a knowledge base and the main options available.
+
+## Define the Knowledge Base and Retrieval Method
+
+Go to **My Knowledge Base** > **Create New Knowledge Base**.
+
+In this step you define:
+
+- **Name** and **Description**: Identify and describe the knowledge base.
+- **Retrieval settings**: How the system finds and extracts relevant content from your documents when it receives a user query, so the LLM can use it.
+
+Three retrieval strategies are supported. See the table below and the sections that follow.
+
+| Strategy | Summary |
+|----------|---------|
+| **Vector retrieval** | Turns the question into a vector and compares it to document vectors to return the most similar chunks. |
+| **Full-text retrieval** | Builds a full-text index over documents and returns chunks that match the user’s keywords. |
+| **Hybrid retrieval** | Runs both full-text and vector retrieval, then merges and reranks the results. |
+
+### Vector retrieval
+
+**What it does**: Converts the user’s question into a query vector, compares it to document vectors by similarity, and returns the closest chunks.
+
+**Options**:
+
+| Option | Description |
+|--------|-------------|
+| **Rerank model** | Off by default. When on, reranks vector-retrieval results to improve accuracy of the chunks sent to the LLM. |
+| **TopK** | Number of most similar chunks to return. The system may adjust this based on the model’s context window. Default is 3; higher values return more chunks. |
+| **Score threshold** | Minimum similarity score; only chunks above this value are returned. Off by default; higher values return fewer chunks. |
+
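A minimal TypeScript sketch of how the TopK and score threshold options described for vector retrieval could interact. It assumes cosine similarity as the similarity measure and uses hypothetical names (`Chunk`, `Hit`, `vectorRetrieve`); the commit does not say which measure or embedding model the platform actually uses.

```typescript
// Hypothetical chunk shape: text plus a precomputed embedding vector.
interface Chunk {
  id: string;
  text: string;
  vector: number[];
}

interface Hit {
  chunk: Chunk;
  score: number;
}

// Cosine similarity (an assumed measure); expects vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// topK defaults to 3 and the threshold is off (undefined) by default,
// mirroring the defaults listed in the options table.
function vectorRetrieve(
  queryVector: number[],
  chunks: Chunk[],
  topK = 3,
  scoreThreshold?: number
): Hit[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(queryVector, chunk.vector) }))
    .filter((hit) => scoreThreshold === undefined || hit.score > scoreThreshold)
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

// Toy two-dimensional vectors, made up for illustration.
const sampleChunks: Chunk[] = [
  { id: "a", text: "fees", vector: [1, 0] },
  { id: "b", text: "margin", vector: [0, 1] },
  { id: "c", text: "fees and margin", vector: [1, 1] },
];
const vectorHits = vectorRetrieve([1, 0], sampleChunks, 2, 0.5);
console.log(vectorHits.map((hit) => hit.chunk.id));
```

Raising `scoreThreshold` shrinks the result set and raising `topK` grows it, which matches the direction of the two options in the table.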
+### Full-text retrieval
+
+**What it does**: Builds a full-text index so users can query by any word and get back chunks that contain those words.
+
+**Options**:
+
+| Option | Description |
+|--------|-------------|
+| **Rerank model** | Off by default. When on, reranks full-text results to improve chunk quality. |
+| **TopK** | Number of chunks to return. The system may adjust this based on context. Default is 3. |
+| **Score threshold** | Only chunks with scores above this value are returned. Off by default; higher values return fewer chunks. |
+
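The idea of a full-text index can be sketched with a small inverted index in TypeScript. Everything here is an assumption for illustration: the names (`tokenize`, `buildIndex`, `fullTextRetrieve`), the ASCII-only tokenizer, and scoring by the number of matched query terms. The commit does not describe how the platform tokenizes or scores.

```typescript
// term -> ids of the chunks that contain it
type InvertedIndex = Map<string, string[]>;

// Simplistic tokenizer: lowercases and splits on anything that is not a-z or 0-9.
// It would drop non-ASCII text, so it only suits this toy example.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((term) => term.length > 0);
}

function buildIndex(chunkTexts: { id: string; text: string }[]): InvertedIndex {
  const index: InvertedIndex = new Map();
  chunkTexts.forEach(({ id, text }) => {
    tokenize(text).forEach((term) => {
      const ids = index.get(term) ?? [];
      if (ids.indexOf(id) === -1) ids.push(id);
      index.set(term, ids);
    });
  });
  return index;
}

// Scores each chunk by how many distinct query terms it contains.
function fullTextRetrieve(
  index: InvertedIndex,
  queryText: string,
  topK = 3
): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  const seenTerms: string[] = [];
  tokenize(queryText).forEach((term) => {
    if (seenTerms.indexOf(term) !== -1) return; // count each query term once
    seenTerms.push(term);
    (index.get(term) ?? []).forEach((id) => {
      scores.set(id, (scores.get(id) ?? 0) + 1);
    });
  });
  const ranked: { id: string; score: number }[] = [];
  scores.forEach((score, id) => ranked.push({ id, score }));
  return ranked.sort((x, y) => y.score - x.score).slice(0, topK);
}

const textIndex = buildIndex([
  { id: "c1", text: "Margin trading fees explained" },
  { id: "c2", text: "Account opening guide" },
  { id: "c3", text: "Trading hours" },
]);
const textHits = fullTextRetrieve(textIndex, "trading fees");
console.log(textHits);
```

Chunks that share no term with the query never enter the result, which is the main behavioral difference from vector retrieval.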
+### Hybrid retrieval
+
+**What it does**: Runs both full-text and vector retrieval, then merges and reranks the results into a single set of chunks.
+
+**Options**:
+
+| Option | Description |
+|--------|-------------|
+| **Weight** | Balance between semantic (vector) and keyword (full-text) retrieval. Weight 1 for semantic = vector-only, which helps with paraphrasing and cross-language matching. Weight 1 for keyword = full-text only, which suits exact terms and lower compute. You can also set a custom mix for your use case. |
+| **Rerank model** | Off by default. When on, reranks hybrid results. |
+| **TopK** | Number of chunks to return. Default is 3. |
+| **Score threshold** | Only chunks above this score are returned. Off by default. |
+
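One plausible reading of the Weight option is a linear blend of the two score lists. The TypeScript sketch below assumes that reading, assumes both retrievers return scores already normalized to the range 0 to 1, and uses hypothetical names (`ScoredId`, `hybridMerge`, `semanticWeight`). It covers only the merge step, not the optional rerank model, and the platform's actual fusion rule is not stated in the commit.

```typescript
interface ScoredId {
  id: string;
  score: number; // assumed to be normalized to [0, 1]
}

// semanticWeight applies to vector hits; keyword hits get (1 - semanticWeight).
function hybridMerge(
  vectorHits: ScoredId[],
  keywordHits: ScoredId[],
  semanticWeight: number,
  topK = 3
): ScoredId[] {
  if (semanticWeight < 0 || semanticWeight > 1) {
    throw new Error("semanticWeight must be between 0 and 1");
  }
  const combined = new Map<string, number>();
  const add = (hits: ScoredId[], weight: number): void => {
    if (weight === 0) return; // a zero weight switches that retriever off entirely
    hits.forEach((hit) => {
      combined.set(hit.id, (combined.get(hit.id) ?? 0) + weight * hit.score);
    });
  };
  add(vectorHits, semanticWeight);
  add(keywordHits, 1 - semanticWeight);

  const merged: ScoredId[] = [];
  combined.forEach((score, id) => merged.push({ id, score }));
  return merged.sort((x, y) => y.score - x.score).slice(0, topK);
}

// Made-up scores for illustration.
const semanticHits: ScoredId[] = [
  { id: "a", score: 0.9 },
  { id: "b", score: 0.4 },
];
const keywordOnlyHits: ScoredId[] = [
  { id: "b", score: 1.0 },
  { id: "c", score: 0.5 },
];
console.log(hybridMerge(semanticHits, keywordOnlyHits, 0.7));
```

With `semanticWeight` set to 1 the result contains only vector hits, and with 0 only keyword hits, matching the two extremes described in the Weight row.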
+After you set the name, description, and retrieval settings, you can either click **Create empty knowledge base** to finish, or click **Create** to go to the file upload step.
+
+## Upload Files
+
+- **Count and size**: Up to 5 files per upload; each file up to 10 MB.
+- **Formats**: DOCX, PPTX, HTML, PDF, MD, CSV, XLSX, VTT, JPG, PNG, TXT.
+- **Source**: Only local upload is supported. Web or other external sources are not supported.
+
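The upload limits listed above could be checked on the client before sending files. This is a hedged TypeScript sketch with hypothetical names (`UploadCandidate`, `validateUpload`); it assumes 1 MB means 1024 × 1024 bytes and that the format is judged by file extension, neither of which the commit specifies.

```typescript
const MAX_FILES_PER_UPLOAD = 5;
const MAX_FILE_SIZE_BYTES = 10 * 1024 * 1024; // assumption: 1 MB = 1024 * 1024 bytes
const ALLOWED_EXTENSIONS = [
  "docx", "pptx", "html", "pdf", "md", "csv", "xlsx", "vtt", "jpg", "png", "txt",
];

// Hypothetical description of a file selected for upload.
interface UploadCandidate {
  name: string;
  sizeBytes: number;
}

// Returns a list of human-readable problems; an empty list means the batch passes.
function validateUpload(files: UploadCandidate[]): string[] {
  const errors: string[] = [];
  if (files.length > MAX_FILES_PER_UPLOAD) {
    errors.push(`too many files: ${files.length} (limit ${MAX_FILES_PER_UPLOAD})`);
  }
  files.forEach((file) => {
    const dot = file.name.lastIndexOf(".");
    const extension = dot === -1 ? "" : file.name.slice(dot + 1).toLowerCase();
    if (ALLOWED_EXTENSIONS.indexOf(extension) === -1) {
      errors.push(`${file.name}: unsupported format`);
    }
    if (file.sizeBytes > MAX_FILE_SIZE_BYTES) {
      errors.push(`${file.name}: larger than 10 MB`);
    }
  });
  return errors;
}

const uploadErrors = validateUpload([
  { name: "annual-report.pdf", sizeBytes: 2 * 1024 * 1024 },
  { name: "clip.mp4", sizeBytes: 11 * 1024 * 1024 },
]);
console.log(uploadErrors);
```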
+## Text and Chunking Settings
+
+This step preprocesses your content and splits it into **chunks**, which are the units used for retrieval. Chunk quality directly affects recall and answer quality.
+
+- **Chunk length**: For all formats except TXT, you can set the approximate length (in characters) per chunk; the system will split as close to that as possible.
+- **TXT-only options**:
+  - **Chunk delimiter**: Characters (or string) that trigger a new chunk. Default is `\n\n` (paragraph breaks).
+  - **Chunk overlap**: Number of overlapping characters between adjacent chunks. Some overlap helps keep context and can improve recall; a typical value is 10–25% of the chunk length in tokens.
+
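The delimiter and overlap options can be pictured with a short TypeScript sketch. It is only one possible interpretation: the function name `chunkText` and its parameters are hypothetical, lengths are counted in characters rather than tokens, and overlap is applied only inside pieces that are longer than the chunk length, not across delimiter boundaries. The platform's real splitter may behave differently.

```typescript
// Splits text on a delimiter, then windows long pieces with a character overlap.
function chunkText(
  text: string,
  delimiter = "\n\n", // default delimiter from the settings above
  chunkLength = 500,  // illustrative value, not a documented default
  overlap = 50        // illustrative value, not a documented default
): string[] {
  if (chunkLength <= 0) throw new Error("chunkLength must be positive");
  if (overlap < 0 || overlap >= chunkLength) {
    throw new Error("overlap must be >= 0 and smaller than chunkLength");
  }
  const chunks: string[] = [];
  text.split(delimiter).forEach((piece) => {
    const trimmed = piece.trim();
    if (trimmed.length === 0) return;
    // Each window starts (chunkLength - overlap) characters after the previous one,
    // so adjacent windows share `overlap` characters.
    for (let start = 0; start < trimmed.length; start += chunkLength - overlap) {
      chunks.push(trimmed.slice(start, start + chunkLength));
      if (start + chunkLength >= trimmed.length) break;
    }
  });
  return chunks;
}

const demoChunks = chunkText("abcdefghij\n\nxyz", "\n\n", 4, 1);
console.log(demoChunks); // [ 'abcd', 'defg', 'ghij', 'xyz' ]
```

In the demo, adjacent chunks from the first paragraph share one character (`d`, then `g`), while the second paragraph fits in a single chunk.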
+## Finish Creation
+
+Confirm your upload and chunking settings, then wait for processing to complete. Once done, the knowledge base can be used in your apps for retrieval.
Lines changed: 37 additions & 2 deletions
@@ -1,3 +1,38 @@
-# Function Introduction
+# Knowledge Base Overview
 
-The Knowledge Base is one of the core features of the PortAI platform, used for storing and managing structured knowledge to provide contextual information for AI applications.
+On the Infra platform, you can use your own data as "knowledge" inside AI applications. A knowledge base provides domain context to large language models (LLMs), making responses more accurate, relevant, and less prone to hallucination. This is powered by **Retrieval-Augmented Generation (RAG)**.
+
+## How RAG Works
+
+The LLM is not limited to its pretrained data; it also uses your custom knowledge as an additional source of facts:
+
+| Stage | Description |
+|-------|-------------|
+| **Retrieve** | The system retrieves the most relevant information from your knowledge base for the user's question. |
+| **Augment** | Retrieved content and the user's question are sent together to the LLM as augmented context. |
+| **Generate** | The LLM produces a more accurate answer using that context. |
+
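The three stages in the table can be sketched as a tiny TypeScript pipeline. The retriever and generator here are stubs, and all names (`Retriever`, `Generator`, `buildAugmentedPrompt`, `answerWithRag`) and the prompt layout are assumptions for illustration; the commit does not describe the platform's prompt format or model API.

```typescript
// Hypothetical function types standing in for the knowledge base and the LLM.
type Retriever = (question: string) => string[];
type Generator = (prompt: string) => string;

// Augment: combine retrieved chunks and the question into one prompt.
function buildAugmentedPrompt(question: string, contextChunks: string[]): string {
  const context = contextChunks
    .map((chunk, i) => `[${i + 1}] ${chunk}`)
    .join("\n");
  return `Context:\n${context}\n\nQuestion: ${question}`;
}

function answerWithRag(
  question: string,
  retrieve: Retriever,
  generate: Generator
): string {
  const contextChunks = retrieve(question);                      // Retrieve
  const prompt = buildAugmentedPrompt(question, contextChunks);  // Augment
  return generate(prompt);                                       // Generate
}

// Stubs with made-up content; a real app would call the knowledge base and an LLM.
const stubRetriever: Retriever = () => ["Sample fact from the knowledge base."];
const stubGenerator: Generator = (prompt) => "Stub answer for prompt:\n" + prompt;

const ragAnswer = answerWithRag(
  "What does the sample fact say?",
  stubRetriever,
  stubGenerator
);
console.log(ragAnswer);
```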
+Knowledge is stored as files you upload into a knowledge base. You can create multiple knowledge bases by domain, use case, or data source, and connect them to your apps as needed.
+
+## Use Cases
+
+Knowledge bases work well for AI applications that rely on your own data and domain expertise. Common examples include:
+
+- **Customer support**: Answer customer questions using up-to-date product docs, FAQs, and troubleshooting guides.
+- **Investment style analysis**: Use holdings and historical data of specific investors to generate style reports and support style-based investment decisions.
+- **Financial report analysis**: Upload annual reports and peer financial data to build a financial knowledge base for report analysis and comparison.
+
+## Creating a Knowledge Base
+
+- **Quick setup**: Upload files, set processing rules, then follow the wizard and click **Continue** to finish.
+- **Empty knowledge base**: You can create a knowledge base with only a name and add documents later.
+
+## Managing and Tuning Your Knowledge Base
+
+- **Documents and chunks**: View, add, edit, and delete documents and chunks to keep content up to date and consistent.
+- **Recall testing**: Run test queries to check how well the knowledge base recalls relevant content.
+- **Settings**: Adjust indexing, embedding model, retrieval strategy, and other parameters as needed.
+
+## Using a Knowledge Base in Your App
+
+Connect a knowledge base to your AI app via the **Knowledge Retrieval** node in the application flow. That node supplies context from the knowledge base to your conversations or tasks.
Lines changed: 52 additions & 2 deletions
@@ -1,3 +1,53 @@
-# Manage Knowledge Base
+# Manage a Knowledge Base
 
-This section describes how to manage and maintain the contents of the knowledge base.
+This page describes how to manage documents, chunks, and knowledge base settings.
+
+## Manage Documents
+
+Each uploaded file appears in the knowledge base as a **document**. In the document list you can view and manage all documents, keep or remove them as needed, and use upload time to judge freshness.
+
+The document list supports these actions:
+
+| Action | Description |
+|--------|-------------|
+| **Add files** | Click **Add files** in the document list to start upload. You choose files, set chunking, and see processing progress. Chunk settings can only be changed at upload or re-upload time; editing chunk settings for a single document in the list is not supported yet. |
+| **Delete document** | Permanently deletes the document. This cannot be undone. |
+| **Enable / Disable** | Temporarily includes or excludes the document from retrieval. |
+| **Edit** | Opens the document and lets you edit chunk content. See **Manage Chunks** below. |
+| **Rename** | Changes the document’s display name. |
+
+## Manage Chunks
+
+A document is split into one or more **chunks** based on its chunking settings. Chunks are the units used for retrieval. In a document’s chunk list you can view and manage chunks to improve retrieval quality and accuracy.
+
+The chunk list supports these actions:
+
+| Action | Description |
+|--------|-------------|
+| **Add chunk** | Add a new chunk on the document page. You can set keywords for the chunk to help retrieval match it later. |
+| **Delete chunk** | Permanently deletes the chunk. This cannot be undone. |
+| **Enable / Disable** | Temporarily includes or excludes the chunk from retrieval. Use the circles next to chunks to select multiple and act on them. Disabled chunks can still be edited. |
+| **Add / Edit / Delete keywords** | Maintain keywords for a chunk to make it easier to retrieve. |
+
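The Enable / Disable actions on documents and chunks can be pictured as flags that filter what retrieval sees. The TypeScript sketch below is a hypothetical data model (`KbDocument`, `KbChunk`, `retrievableChunks`), and it assumes that disabling a document hides all of its chunks regardless of their own flags; the commit does not spell out that interaction.

```typescript
// Hypothetical data model for documents and their chunks.
interface KbChunk {
  id: string;
  enabled: boolean;
  keywords: string[];
}

interface KbDocument {
  id: string;
  enabled: boolean;
  chunks: KbChunk[];
}

// Returns only the chunks that retrieval is allowed to see.
// Assumption: a disabled document excludes every chunk it contains.
function retrievableChunks(documents: KbDocument[]): KbChunk[] {
  const visible: KbChunk[] = [];
  documents
    .filter((doc) => doc.enabled)
    .forEach((doc) => {
      doc.chunks
        .filter((chunk) => chunk.enabled)
        .forEach((chunk) => visible.push(chunk));
    });
  return visible;
}

const sampleDocuments: KbDocument[] = [
  {
    id: "d1",
    enabled: true,
    chunks: [
      { id: "c1", enabled: true, keywords: ["fees"] },
      { id: "c2", enabled: false, keywords: [] },
    ],
  },
  {
    id: "d2",
    enabled: false,
    chunks: [{ id: "c3", enabled: true, keywords: ["margin"] }],
  },
];
const visibleChunkIds = retrievableChunks(sampleDocuments).map((chunk) => chunk.id);
console.log(visibleChunkIds);
```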
+## Chunking Tips
+
+After chunking, it’s a good idea to scan each chunk: it should be semantically complete, not cut off mid-sentence, and a reasonable length. That helps retrieval accuracy and relevance.
+
+Common issues and effects:
+
+| Issue | Effect |
+|-------|--------|
+| **Chunks too short** | Not enough context; easier to lose meaning and get wrong answers. |
+| **Chunks too long** | May include irrelevant content, add noise, and reduce retrieval precision. |
+| **Broken semantics** | Sentences or paragraphs cut by chunk boundaries can lead to missing or misleading retrieval results. |
+
+## Knowledge Base Settings
+
+Open **Knowledge base settings** from the top-right of the knowledge base page.
+
+Supported actions:
+
+| Action | Description |
+|--------|-------------|
+| **Change name and description** | Set a custom name and description to identify the knowledge base. The name must be unique; if it duplicates another knowledge base, the change will fail. |
+| **Change retrieval settings and model** | Adjust how the knowledge base retrieves content and which model and parameters it uses. For details, see the retrieval strategy section in **Create a Knowledge Base**. |

docs/en/ai/docs/knowledge-base/metadata.md

Lines changed: 0 additions & 3 deletions
This file was deleted.
Lines changed: 20 additions & 2 deletions
@@ -1,3 +1,21 @@
-# Recall Test
+# Recall Testing
 
-This section describes how to perform recall tests on the Knowledge Base to verify retrieval effectiveness.
+Recall testing lets you run sample queries against your knowledge base, see what gets retrieved, and try different retrieval settings to find a good configuration.
+
+## Open Recall Testing
+
+On the **Knowledge base document list** page or the **Knowledge base settings** page, click the **Recall test** icon in the top-right to open the test view.
+
+## How It Works
+
+| Item | Description |
+|------|-------------|
+| **Testing** | Enter a question and see which chunks are recalled from the knowledge base and in what order. |
+| **Scope of settings** | Any retrieval changes you make in the recall test view apply only to the current test session. They are for comparing strategies and parameters, not for saving. For parameter details, see the retrieval strategy section in **Create a Knowledge Base**. |
+| **History** | The page keeps a history of recall tests for this knowledge base (method, query text, time). You can click a past query to copy it for reruns or comparison. |
+
+## Suggested Workflow
+
+- Use questions that are typical in your business and check whether the recalled chunks cover the needed information.
+- Switch between retrieval strategies (vector, full-text, hybrid) and try different TopK and score thresholds to compare recall quality and volume.
+- Use history to compare results for the same question under different settings and choose a configuration that fits the knowledge base.
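The comparison described in the workflow can be mimicked offline with a small TypeScript sketch. It assumes you have scored candidates for one question and a hand-picked list of chunk ids that a good answer needs; the names (`RecallSettings`, `applySettings`, `coverageOf`) and the coverage metric are illustrative choices, not something the recall test view is documented to compute.

```typescript
interface ScoredChunk {
  id: string;
  score: number;
}

// Hypothetical settings for one recall test run.
interface RecallSettings {
  label: string;
  topK: number;
  scoreThreshold?: number; // off when undefined
}

// Applies threshold, ordering, and TopK to a list of scored candidates.
function applySettings(candidates: ScoredChunk[], settings: RecallSettings): string[] {
  return candidates
    .filter((c) => settings.scoreThreshold === undefined || c.score > settings.scoreThreshold)
    .sort((x, y) => y.score - x.score)
    .slice(0, settings.topK)
    .map((c) => c.id);
}

// Fraction of the expected chunk ids that were actually recalled.
function coverageOf(recalled: string[], expected: string[]): number {
  if (expected.length === 0) return 1;
  const found = expected.filter((id) => recalled.indexOf(id) !== -1).length;
  return found / expected.length;
}

// Made-up scores for one test question.
const candidates: ScoredChunk[] = [
  { id: "c3", score: 0.41 },
  { id: "c1", score: 0.82 },
  { id: "c4", score: 0.2 },
  { id: "c2", score: 0.64 },
];
const expectedIds = ["c1", "c3"];
const settingsToCompare: RecallSettings[] = [
  { label: "topK=2", topK: 2 },
  { label: "topK=3,threshold=0.3", topK: 3, scoreThreshold: 0.3 },
  { label: "topK=5,threshold=0.7", topK: 5, scoreThreshold: 0.7 },
];

const comparison = settingsToCompare.map((settings) => {
  const recalled = applySettings(candidates, settings);
  return { label: settings.label, recalled, coverage: coverageOf(recalled, expectedIds) };
});
console.log(comparison);
```

In this toy data only the middle configuration recalls both expected chunks, which is the kind of trade-off between volume and coverage the workflow asks you to look for.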
Lines changed: 75 additions & 1 deletion
@@ -1,4 +1,78 @@
 # Create Knowledge Base
 
-This chapter describes how to create and configure a knowledge base
+This page describes the steps and main options for creating and configuring a knowledge base
 
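+## Define the Knowledge Base and Retrieval Method
+
+Path: **My Knowledge Base** > **Create New Knowledge Base**
+
+In this step, fill in:
+
+- **Name** and **Description**: Used to distinguish and describe the knowledge base.
+- **Retrieval settings**: Determine how, after receiving a user query, the system finds relevant content in the documents and extracts chunks for the LLM to use.
+
+Three retrieval strategies are currently supported; see the table and the descriptions below.
+
+| Strategy | Summary |
+|----------|---------|
+| Vector retrieval | Vectorizes the question, compares it with the text vectors in the knowledge base, and returns the most similar chunks. |
+| Full-text retrieval | Builds a full-text index over the documents, matches the words the user entered, and returns the chunks that contain them. |
+| Hybrid retrieval | Runs full-text retrieval and vector retrieval together, then merges and reranks the results according to rules. |
+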
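+### Vector retrieval
+
+**What it means**: Converts the user's question into a query vector, compares it by similarity with the text vectors in the knowledge base, and returns the closest chunks.
+
+**Configurable options**
+
+| Option | Description |
+|-------------|------|
+| Rerank model | Off by default. When on, reranks the vector retrieval results, which can improve the accuracy of the chunks used by the LLM. |
+| TopK | Number of text chunks most similar to the question to return. The system adjusts this dynamically based on the context window of the selected model. Default is 3; larger values return more chunks. |
+| Score threshold | Minimum similarity score; only chunks scoring above it are recalled. Off by default; the higher the threshold, the fewer chunks are recalled. |
+
+### Full-text retrieval
+
+**What it means**: Builds a full-text index for the documents, supports queries by any word, and returns the text chunks that contain those words.
+
+**Configurable options**
+
+| Option | Description |
+|-------------|------|
+| Rerank model | Off by default. When on, reranks the full-text retrieval results to improve chunk quality. |
+| TopK | Number of text chunks to return. The system adjusts this dynamically based on the model's context. Default is 3. |
+| Score threshold | Only chunks scoring above this value are recalled. Off by default; the higher the threshold, the fewer chunks are recalled. |
+
+### Hybrid retrieval
+
+**What it means**: Runs full-text retrieval and vector retrieval together, then merges and reranks both result sets to produce the final recalled chunks.
+
+**Configurable options**
+
+| Option | Description |
+|-------------|------|
+| Weight settings | Adjusts the weights of "semantic (vector)" and "keyword (full-text)" retrieval. A semantic weight of 1 means vector retrieval only, which helps with matching across wording and languages; a keyword weight of 1 means full-text retrieval only, which suits cases where exact terms are known and compute needs to be kept down. You can also set a custom ratio to fit your use case. |
+| Rerank model | Off by default. When on, reranks the hybrid retrieval results. |
+| TopK | Number of text chunks finally returned. Default is 3. |
+| Score threshold | Only chunks above this score are recalled. Off by default. |
+
+After completing the name, description, and retrieval settings, you can choose "Create empty knowledge base" to finish creation directly, or click "Create" to go to the next step and upload files.
+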
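+## Upload Files
+
+- **Count and size**: Up to 5 files per upload; each file no larger than 10 MB.
+- **Supported formats**: DOCX, PPTX, HTML, PDF, MD, CSV, XLSX, VTT, JPG, PNG, TXT.
+- **Source**: Currently only local upload is supported; importing from web pages or other channels is not supported.
+
+## Text and Chunking Settings
+
+This step preprocesses and splits the content to produce the "chunks" used in later retrieval. Chunk quality directly affects recall and answer accuracy.
+
+- **Estimated chunk length**: For all formats except TXT, you can set the approximate number of characters per chunk; the system splits as close to that length as possible.
+- **Additional TXT settings**
+  - **Chunk delimiter**: Starts a new chunk when this character (or string) is encountered. Default is `\n\n`, that is, chunking by paragraph.
+  - **Chunk overlap length**: Number of overlapping characters between adjacent chunks. Moderate overlap helps preserve semantic continuity and improves recall; a recommended value is 10%–25% of the chunk length in tokens.
+
+## Finish Creation
+
+After confirming the upload and chunking settings, wait for the system to finish processing the data. Once processing is complete, the knowledge base can be referenced and searched in applications.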
