Commit 24836c4

feat: Add FE Audit TopSQL tool for analyzing audit logs and generating reports (#25)
1 parent b5821ea commit 24836c4

13 files changed, +1131 -22 lines changed

README.md

Lines changed: 58 additions & 12 deletions
@@ -1,31 +1,77 @@
 # cloud-cli
 
-`cloud-cli` is a command-line tool designed to simplify the management and diagnosis of server-side applications. It provides an interactive menu to access various diagnostic tools, making it easier for developers and system administrators to troubleshoot processes.
+`cloud-cli` is an interactive command-line tool for on-call troubleshooting of Apache Doris / SelectDB clusters. It provides a TUI menu that groups common FE/BE diagnostic workflows and writes collected artifacts into an output directory for easy archiving and sharing.
 
 ## Features
 
-The tool is organized into two main categories:
+The tool is organized into two categories: `FE` and `BE`.
 
-### FE (Frontend/Java Applications)
+### FE
 
-- **`jstack`**: Prints Java thread stack traces for a given Java process, helping to diagnose hangs and deadlocks.
-- **`jmap`**: Generates heap dumps and provides memory statistics for a Java process, useful for analyzing memory leaks.
+- `fe-list`: List and select the FE target host (IP) for the current session based on `clusters.toml`.
+- `jmap` (`jmap-dump`, `jmap-histo`): Java heap dump / histogram.
+- `jstack`: Java thread dump.
+- `fe-profiler`: Generate a flame graph using Doris `bin/profile_fe.sh` (requires async-profiler).
+- `table-info`: Interactive database/table browser that collects and summarizes schema, indexes, partitions, and bucket details (supports exporting `.txt` reports).
+- `routine-load`: Routine Load helper tools:
+  - `Get Job ID`: List and select a Routine Load job (cached locally for later analysis).
+  - `Performance Analysis`: Analyze per-commit rows/bytes/time from FE logs.
+  - `Traffic Monitor`: Aggregate per-minute `loadedRows` from FE logs.
+- `fe-audit-topsql`: Parse `fe.audit.log`, normalize SQL into templates, and aggregate by CPU/time to generate a TopSQL report. Default filter is `count > 10`; if nothing matches, it falls back to showing results without the count filter.
 
-### BE (Backend/General Processes)
+### BE
 
-- **`pstack`**: Displays the stack trace for any running process, offering insights into its execution state.
-- **`get_be_vars`**: Retrieves and displays the environment variables of a running process.
+- `be-list`: List and select the BE target host (IP) for the current session based on `clusters.toml` (defaults to `127.0.0.1`).
+- `pstack`: Use `gdb` to dump process thread stacks (writes a `.txt` file).
+- `jmap` (`jmap-dump`, `jmap-histo`): Java heap dump / histogram (only meaningful for JVM processes).
+- `be-config`:
+  - `get-vars` (`get-be-vars`): Query BE configuration variables via HTTP `/varz`.
+  - `update-config` (`set-be-config`): Update configuration via HTTP `/api/update_config` (supports persist).
+- `pipeline-tasks`: Fetch running pipeline tasks via HTTP `/api/running_pipeline_tasks` (auto-saves when the response is large).
+- `memz`:
+  - `current` (`memz`): Fetch the current Jemalloc memory view via HTTP `/memz` (saves HTML and prints a summary).
+  - `global` (`memz-global`): Fetch the global memory view via HTTP `/memz?type=global`.
 
 ## Usage
 
-To run the application, execute the binary. An interactive menu will appear, allowing you to select the desired diagnostic tool.
+### Build and run
 
 ```sh
-./cloud-cli
+cargo build --release
+./target/release/cloud-cli
 ```
 
+### Configuration
+
+- Persistent config: `~/.config/cloud-cli/config.toml`
+- MySQL key file: `~/.config/cloud-cli/key`
+- Cluster topology cache: `~/.config/cloud-cli/clusters.toml`
+- First run: if an FE process is detected and MySQL credentials are missing, it will prompt you to configure and test the connection. On success, it writes `config.toml` and fetches cluster topology into `clusters.toml` (used by `fe-list`/`be-list`).
+- Common environment variables (override persistent config):
+  - `JDK_PATH`: Set the JDK path (used by `jmap`/`jstack`).
+  - `OUTPUT_DIR`: Set the output directory (default: `/tmp/doris/collection`).
+  - `CLOUD_CLI_TIMEOUT`: External command timeout in seconds (default: `60`).
+  - `CLOUD_CLI_NO_PROGRESS`: Disable progress animation (`1`/`true`).
+  - `MYSQL_HOST` / `MYSQL_PORT`: Set Doris MySQL endpoint (defaults are derived from config; otherwise `127.0.0.1:9030`).
+  - `PROFILE_SECONDS`: Override `fe-profiler` collection duration in seconds.
+  - `CLOUD_CLI_FE_AUDIT_TOPSQL_INPUT`: Provide an explicit `fe.audit.log` path for `fe-audit-topsql` (skips interactive selection).
+
+### Output
+
+- Most tools write artifacts into the output directory (default: `/tmp/doris/collection`) and print `Output saved to: ...`.
+- A few tools primarily print to stdout (e.g., `get-be-vars`), but still follow consistent prompts and error messages.
+
+### Runtime dependencies
+
+The CLI invokes some system commands at runtime. Ensure these are installed and available in `PATH`:
+
+- `mysql` (for cluster info, Routine Load, Table Info, etc.)
+- `curl` (for BE HTTP tools)
+- `gdb` (for `pstack`)
+- `bash`
+
 ## Releases
 
-This project uses GitHub Actions to automatically build and release binaries for Linux (`x86_64` and `aarch64`). When a new version is tagged (e.g., `v1.0.0`), a new release is created.
+This project uses GitHub Actions to build and publish Linux (`x86_64` / `aarch64`) binaries. When you create a tag (e.g., `v1.0.0`), a corresponding GitHub Release is produced.
 
-You can download the latest pre-compiled binaries from the [GitHub Releases](https://github.com/QuakeWang/cloud-cli/releases) page.
+You can download the latest prebuilt binaries from GitHub Releases: https://github.com/QuakeWang/cloud-cli/releases
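
The Configuration section of the new README states that environment variables override the persistent config, that `CLOUD_CLI_TIMEOUT` defaults to `60`, and that `CLOUD_CLI_NO_PROGRESS` accepts `1`/`true`. The precedence might look roughly like the sketch below. The function names are hypothetical, and two behaviors are assumptions not confirmed by this diff: an unparsable timeout falls through to the next source, and the flag is matched case-insensitively.

```rust
/// Default external-command timeout in seconds, as documented in the README.
const DEFAULT_TIMEOUT_SECS: u64 = 60;

/// Resolve the timeout: environment value first, then the persisted
/// config value, then the documented default.
/// Assumption: a value that fails to parse is ignored rather than rejected.
pub fn resolve_timeout(env_value: Option<&str>, persisted: Option<u64>) -> u64 {
    env_value
        .and_then(|raw| raw.trim().parse::<u64>().ok())
        .or(persisted)
        .unwrap_or(DEFAULT_TIMEOUT_SECS)
}

/// Interpret a `CLOUD_CLI_NO_PROGRESS`-style flag: `1` or `true` disables
/// the progress animation.
/// Assumption: matching is case-insensitive and ignores surrounding spaces.
pub fn progress_disabled(env_value: Option<&str>) -> bool {
    matches!(
        env_value.map(|v| v.trim().to_ascii_lowercase()).as_deref(),
        Some("1") | Some("true")
    )
}
```

A caller would pass `std::env::var("CLOUD_CLI_TIMEOUT").ok().as_deref()` as `env_value`. Taking the raw value as a parameter keeps the precedence logic testable without touching the process environment.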

src/lib.rs

Lines changed: 0 additions & 6 deletions
@@ -7,11 +7,9 @@ pub mod process;
 pub mod tools;
 pub mod ui;
 
-use config::Config;
 use config_loader::persist_configuration;
 use dialoguer::Confirm;
 use error::Result;
-use tools::Tool;
 use tools::mysql::CredentialManager;
 use ui::*;
 
@@ -98,7 +96,3 @@ pub fn run_cli() -> Result<()> {
     ui::print_goodbye();
     Ok(())
 }
-
-fn execute_tool_enhanced(config: &Config, tool: &dyn Tool, service_name: &str) -> Result<()> {
-    ui::tool_executor::execute_tool_enhanced(config, tool, service_name)
-}

src/tools/common/fs_utils.rs

Lines changed: 4 additions & 0 deletions
@@ -92,6 +92,10 @@ pub fn collect_fe_logs(dir: &Path) -> Result<Vec<PathBuf>> {
     collect_log_files(dir, "fe.log")
 }
 
+pub fn collect_fe_audit_logs(dir: &Path) -> Result<Vec<PathBuf>> {
+    collect_log_files(dir, "fe.audit.log")
+}
+
 pub fn collect_be_logs(dir: &Path) -> Result<Vec<PathBuf>> {
     collect_log_files(dir, "be.INFO")
 }
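
The body of `collect_log_files` is not part of this diff, so how it matches file names is unknown. As an illustration only, a prefix-based collector could look like the sketch below. `collect_by_prefix` is a hypothetical name, and matching on a file-name prefix (so that rotated files such as `fe.audit.log.20240101-1` are included) is an assumption.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for a prefix-based log collector.
/// Returns regular files in `dir` whose file name starts with `prefix`,
/// sorted by path so the result order is deterministic.
pub fn collect_by_prefix(dir: &Path, prefix: &str) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        // Non-UTF-8 file names are skipped in this sketch.
        let name_matches = path
            .file_name()
            .and_then(|n| n.to_str())
            .map_or(false, |n| n.starts_with(prefix));
        if name_matches && path.is_file() {
            found.push(path);
        }
    }
    found.sort();
    Ok(found)
}
```

Under this assumption the prefix `fe.log` does not pick up `fe.audit.log`, since the latter does not start with `fe.log`, which is consistent with the commit adding a separate `collect_fe_audit_logs` wrapper.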
Lines changed: 144 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,144 @@
1+
use std::collections::HashMap;
2+
3+
#[derive(Debug, Clone)]
4+
pub struct TemplateStats {
5+
pub sql_template: String,
6+
pub table: Option<String>,
7+
pub slowest_stmt: String,
8+
pub slowest_query_id: Option<String>,
9+
pub slowest_time_ms: u64,
10+
pub count: u64,
11+
pub total_time_ms: u64,
12+
pub total_cpu_ms: u64,
13+
pub max_time_ms: u64,
14+
pub min_time_ms: u64,
15+
}
16+
17+
impl TemplateStats {
18+
pub fn avg_time_ms(&self) -> f64 {
19+
if self.count == 0 {
20+
return 0.0;
21+
}
22+
self.total_time_ms as f64 / self.count as f64
23+
}
24+
25+
pub fn avg_cpu_ms(&self) -> f64 {
26+
if self.count == 0 {
27+
return 0.0;
28+
}
29+
self.total_cpu_ms as f64 / self.count as f64
30+
}
31+
}
32+
33+
#[derive(Debug, Clone)]
34+
pub struct AnalysisResult {
35+
pub items: Vec<TemplateStats>,
36+
pub total_templates: u64,
37+
pub total_executions: u64,
38+
pub total_time_ms: u64,
39+
pub total_cpu_ms: u64,
40+
pub used_fallback: bool,
41+
}
42+
43+
#[derive(Debug, Default)]
44+
struct TemplateAgg {
45+
table: Option<String>,
46+
slowest_stmt: String,
47+
slowest_query_id: Option<String>,
48+
slowest_time_ms: u64,
49+
count: u64,
50+
total_time_ms: u64,
51+
total_cpu_ms: u64,
52+
max_time_ms: u64,
53+
min_time_ms: u64,
54+
}
55+
56+
pub struct TemplateAggregator {
57+
map: HashMap<String, TemplateAgg>,
58+
}
59+
60+
impl TemplateAggregator {
61+
pub fn new() -> Self {
62+
Self {
63+
map: HashMap::new(),
64+
}
65+
}
66+
67+
pub fn push(
68+
&mut self,
69+
template: String,
70+
table: Option<String>,
71+
time_ms: u64,
72+
cpu_ms: u64,
73+
stmt: String,
74+
query_id: Option<String>,
75+
) {
76+
let entry = self.map.entry(template).or_insert_with(|| TemplateAgg {
77+
min_time_ms: u64::MAX,
78+
..Default::default()
79+
});
80+
81+
if entry.table.is_none() {
82+
entry.table = table;
83+
}
84+
85+
entry.count += 1;
86+
entry.total_time_ms += time_ms;
87+
entry.total_cpu_ms += cpu_ms;
88+
entry.max_time_ms = entry.max_time_ms.max(time_ms);
89+
entry.min_time_ms = entry.min_time_ms.min(time_ms);
90+
91+
if time_ms > entry.slowest_time_ms {
92+
entry.slowest_time_ms = time_ms;
93+
entry.slowest_stmt = stmt;
94+
entry.slowest_query_id = query_id;
95+
}
96+
}
97+
98+
pub fn finish(self, min_count_exclusive: u64) -> AnalysisResult {
99+
let has_threshold_matches = self.map.values().any(|x| x.count > min_count_exclusive);
100+
101+
let mut items: Vec<TemplateStats> = self
102+
.map
103+
.into_iter()
104+
.filter(|(_, x)| !has_threshold_matches || x.count > min_count_exclusive)
105+
.map(|(sql_template, x)| TemplateStats {
106+
sql_template,
107+
table: x.table,
108+
slowest_stmt: x.slowest_stmt,
109+
slowest_query_id: x.slowest_query_id,
110+
slowest_time_ms: x.slowest_time_ms,
111+
count: x.count,
112+
total_time_ms: x.total_time_ms,
113+
total_cpu_ms: x.total_cpu_ms,
114+
max_time_ms: x.max_time_ms,
115+
min_time_ms: if x.min_time_ms == u64::MAX {
116+
0
117+
} else {
118+
x.min_time_ms
119+
},
120+
})
121+
.collect();
122+
123+
items.sort_by_key(|x| std::cmp::Reverse(x.total_cpu_ms));
124+
125+
let total_templates = items.len() as u64;
126+
let (total_executions, total_time_ms, total_cpu_ms) =
127+
items.iter().fold((0u64, 0u64, 0u64), |acc, it| {
128+
(
129+
acc.0 + it.count,
130+
acc.1 + it.total_time_ms,
131+
acc.2 + it.total_cpu_ms,
132+
)
133+
});
134+
135+
AnalysisResult {
136+
used_fallback: !has_threshold_matches && !items.is_empty(),
137+
items,
138+
total_templates,
139+
total_executions,
140+
total_time_ms,
141+
total_cpu_ms,
142+
}
143+
}
144+
}
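
The count filter in `finish` is exclusive (`count > min_count_exclusive`) and is dropped entirely when no template passes it, which is what the README means by falling back to results without the count filter. A reduced sketch of just that rule, using a hypothetical `filter_with_fallback` over `(template, count)` pairs instead of the full `TemplateAgg` state:

```rust
/// Keep entries whose count is strictly greater than `min_count_exclusive`.
/// If no entry passes, return every entry and report that the fallback
/// path was taken. An empty input never counts as a fallback.
pub fn filter_with_fallback(
    counts: Vec<(String, u64)>,
    min_count_exclusive: u64,
) -> (Vec<(String, u64)>, bool) {
    // Decide once, before filtering, whether the threshold matches anything.
    let has_matches = counts.iter().any(|(_, c)| *c > min_count_exclusive);
    let kept: Vec<(String, u64)> = counts
        .into_iter()
        .filter(|(_, c)| !has_matches || *c > min_count_exclusive)
        .collect();
    let used_fallback = !has_matches && !kept.is_empty();
    (kept, used_fallback)
}
```

With a threshold of `10`, the input `[("a", 12), ("b", 3)]` keeps only `a` and reports no fallback, while `[("a", 2), ("b", 3)]` keeps both entries and sets the fallback flag.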

src/tools/fe/audit_topsql/mod.rs

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,7 @@
1+
+mod aggregate;
+mod normalize;
+mod parser;
+mod report;
+mod tool;
+
+pub use tool::FeAuditTopSqlTool;
Lines changed: 74 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,74 @@
1+
+use once_cell::sync::Lazy;
+use regex::Regex;
+use std::collections::HashSet;
+
+static RE_LITERALS: Lazy<Regex> = Lazy::new(|| Regex::new(r"'[^']*'|\b\d+(?:\.\d+)?\b").unwrap());
+static RE_IN_LIST: Lazy<Regex> =
+    Lazy::new(|| Regex::new(r"(?i)\bin\s*\(\s*\?(?:\s*,\s*\?)*\s*\)").unwrap());
+static RE_NOT_IN_LIST: Lazy<Regex> =
+    Lazy::new(|| Regex::new(r"(?i)\bnot\s+in\s*\(\s*\?(?:\s*,\s*\?)*\s*\)").unwrap());
+static RE_MULTI_SPACE: Lazy<Regex> = Lazy::new(|| Regex::new(r"\s+").unwrap());
+static RE_OPERATOR_SPACE: Lazy<Regex> = Lazy::new(|| Regex::new(r"\s*([=(),])\s*").unwrap());
+static RE_LINE_COMMENT: Lazy<Regex> = Lazy::new(|| Regex::new(r"(?m)--[^\n]*").unwrap());
+static RE_BLOCK_COMMENT: Lazy<Regex> = Lazy::new(|| Regex::new(r"(?s)/\*.*?\*/").unwrap());
+static RE_CTE: Lazy<Regex> =
+    Lazy::new(|| Regex::new(r"(?i)(?:^|\bwith\b|,)\s*([a-z0-9_]+)\s+as\s*\(").unwrap());
+static RE_FROM: Lazy<Regex> = Lazy::new(|| Regex::new(r"(?i)\bfrom\s+([a-z0-9_.`]+)").unwrap());
+
+pub fn normalize_sql(sql: &str) -> String {
+    if sql.is_empty() {
+        return String::new();
+    }
+
+    let mut out = RE_BLOCK_COMMENT.replace_all(sql, " ").to_string();
+    if out.contains('\n') {
+        out = RE_LINE_COMMENT.replace_all(&out, " ").to_string();
+    }
+    out = out.replace(['\r', '\n', '\t'], " ");
+
+    out = RE_LITERALS.replace_all(&out, "?").to_string();
+    out = RE_NOT_IN_LIST.replace_all(&out, "not in (?)").to_string();
+    out = RE_IN_LIST.replace_all(&out, "in (?)").to_string();
+    out = RE_OPERATOR_SPACE.replace_all(&out, "$1").to_string();
+    out.make_ascii_lowercase();
+    out = RE_MULTI_SPACE.replace_all(&out, " ").to_string();
+    out.trim().to_string()
+}
+
+pub fn guess_table(normalized_sql: &str) -> Option<String> {
+    if normalized_sql.is_empty() {
+        return None;
+    }
+
+    let mut ctes: HashSet<String> = HashSet::new();
+    for cap in RE_CTE.captures_iter(normalized_sql) {
+        if let Some(name) = cap.get(1) {
+            ctes.insert(name.as_str().to_ascii_lowercase());
+        }
+    }
+
+    for cap in RE_FROM.captures_iter(normalized_sql) {
+        let Some(m) = cap.get(1) else { continue };
+        let table = m.as_str().replace('`', "");
+        let table_lc = table.to_ascii_lowercase();
+        if ctes.contains(&table_lc) {
+            continue;
+        }
+        if matches!(
+            table_lc.as_str(),
+            "a" | "b"
+                | "c"
+                | "d"
+                | "t"
+                | "t_index"
+                | "params"
+                | "current_data"
+                | "last_period_data"
+        ) {
+            continue;
+        }
+        return Some(table);
+    }
+
+    None
+}
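
The purpose of `normalize_sql` is to make statements that differ only in literal values collapse into the same template, so they aggregate under one key. The sketch below illustrates that idea with the standard library only. `mask_literals` is a hypothetical, simplified stand-in and not the commit's regex pipeline: it skips comment stripping, IN-list collapsing, and operator spacing, and its handling of edge cases such as `1.5x` may differ from `RE_LITERALS`.

```rust
/// Simplified, std-only literal masking: quoted strings and standalone
/// numbers become `?`, text is lowercased, and whitespace is collapsed.
pub fn mask_literals(sql: &str) -> String {
    let chars: Vec<char> = sql.chars().collect();
    let is_word = |c: char| c.is_ascii_alphanumeric() || c == '_';
    let mut out = String::with_capacity(sql.len());
    let mut i = 0;

    while i < chars.len() {
        let c = chars[i];
        if c == '\'' {
            // Single-quoted string: mask it only when a closing quote exists.
            match chars[i + 1..].iter().position(|&x| x == '\'') {
                Some(offset) => {
                    out.push('?');
                    i += offset + 2;
                }
                None => {
                    out.push(c);
                    i += 1;
                }
            }
        } else if c.is_ascii_digit() && (i == 0 || !is_word(chars[i - 1])) {
            // Numeric token that does not continue an identifier such as `t1`.
            let mut j = i;
            while j < chars.len() && chars[j].is_ascii_digit() {
                j += 1;
            }
            if j + 1 < chars.len() && chars[j] == '.' && chars[j + 1].is_ascii_digit() {
                j += 1;
                while j < chars.len() && chars[j].is_ascii_digit() {
                    j += 1;
                }
            }
            if j == chars.len() || !is_word(chars[j]) {
                out.push('?');
            } else {
                // Digits glued to letters (e.g. `12abc`) are kept verbatim.
                out.extend(chars[i..j].iter());
            }
            i = j;
        } else {
            out.push(c.to_ascii_lowercase());
            i += 1;
        }
    }

    // Collapse runs of whitespace into single spaces and trim the ends.
    out.split_whitespace().collect::<Vec<_>>().join(" ")
}
```

For example, `mask_literals("SELECT *  FROM t1 WHERE id = 42 AND name = 'Bob'")` yields `select * from t1 where id = ? and name = ?`. The identifier `t1` keeps its digit because the digit is not preceded by a word boundary.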
