ev-flow/quark-rule-generate: Generating Detection Rules for Quark-Engine

Android Malware Family Analysis · Rule Generation · Weight Optimization

Web Dashboard • Quick Start • How It Works • CLI Tools • Configuration

🖥 Web Dashboard

Start the server with ./start.sh and open http://localhost:9527.

Auto Pipeline

Enter a malware family name and click start; the system handles everything from sample collection to rule deployment, with real-time progress, log streaming, and multi-family queue management.

Rule Review

Click Review Rules on any completed family to inspect generated rules with their machine-optimized scores and 5-stage detection analysis.

Threat Intelligence Report

After downloading samples, the system queries OpenAI to generate a threat intelligence report for the malware family — including a summary of known behaviors, severity ratings, and associated Android API functions. The report feeds directly into rule generation to improve detection coverage.

API endpoint: GET /api/auto-pipeline/families/{family}/report
Refresh: POST /api/auto-pipeline/families/{family}/report/refresh
Output: data/reports/{family}_threat_intel.json
Requires OPENAI_API_KEY (optional — pipeline continues without it)
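The two endpoints above can be called from a script as well as from the dashboard. The following is a minimal sketch, assuming the server runs at the default http://localhost:9527 and that both endpoints return a JSON body (the response shape is not documented here, so treat it as an assumption). The helper names `report_url` and `fetch_report` are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:9527"  # default address used by ./start.sh


def report_url(family: str, refresh: bool = False, base_url: str = BASE_URL) -> str:
    """Build the report URL (or the refresh URL) for one malware family."""
    path = f"/api/auto-pipeline/families/{quote(family, safe='')}/report"
    if refresh:
        path += "/refresh"
    return base_url.rstrip("/") + path


def fetch_report(family: str, refresh: bool = False) -> dict:
    """GET the stored report, or POST to regenerate it when refresh=True.

    Assumption: the server answers with a JSON object.
    """
    method = "POST" if refresh else "GET"
    request = Request(report_url(family, refresh), method=method)
    with urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `fetch_report("hydra")` would read the existing report and `fetch_report("hydra", refresh=True)` would request a new one.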

What is Quark Rule Generate?

Quark Rule Generate is an end-to-end pipeline for analyzing Android malware families and automatically generating optimized detection rules for Quark-Engine.

Given a malware family name (e.g. hydra, cerberus, anubis), it:

1. Searches MalwareBazaar for known samples of the family
2. Downloads APK files from AndroZoo
3. Searches threat intelligence reports to identify known malicious behaviors and associated Android APIs
4. Generates Quark-Engine detection rules via static analysis, guided by threat intel when available
5. Trains a PyTorch neural network to optimize rule confidence weights per family
6. Deploys optimized, weighted rules to your quark-rules repository

The core value: instead of manually crafting rules and guessing weights, the system learns per-family detection patterns and produces rules whose score fields are machine-optimized.

✨ Features

Family Analysis & Rule Optimization

Per-Family Pipeline — Enter a malware family name, the system automatically collects samples, generates rules, and trains optimal weights
Iterative Weight Training — PyTorch neural network auto-prunes low-signal rules, rounds weights, and retries until 100% detection accuracy
Score Optimization — Each rule's score field is machine-learned to reflect its actual detection power against the target family
Threat Intelligence Reports — AI-powered analysis of known family behaviors, severity ratings, and Android API associations — fed into rule generation for better coverage
Pre-filtering — Automatically removes zero-signal rules before training to save time and memory
OOM Protection — Configurable rule caps (MAX_RULES_FOR_TRAINING) and sample limits for memory-constrained environments

Web Dashboard

Real-Time Monitoring — Live progress bars, log streaming, pipeline status per family
Rule Review UI — Inspect generated rules with optimized scores, API call sequences, and 5-stage analysis breakdown
Multi-Family Queue — Queue multiple families (e.g. hydra, cerberus, anubis) for sequential processing
Restart from Any Stage — Re-run threat intel search, rule generation, or weight training independently
Settings UI — Configure all parameters without editing .env manually
Bilingual UI — Chinese / English toggle

Extras

AI Descriptions — Optional GPT-powered rule description generation
quark-rules Integration — Automatically copies optimized rules to the quark-rules repo with proper indexing

🚀 Quick Start

Prerequisites

| Tool | Purpose |
| --- | --- |
| Git | Clone repositories |
| Python 3.13+ | Runtime (auto-installed via `uv`) |
| uv | Python package manager |

One-Line Install (Ubuntu)

git clone https://github.com/ev-flow/quark-rule-generate.git
cd quark-rule-generate
chmod +x install.sh && ./install.sh

The install script handles everything: system dependencies, uv, Python 3.13, quark-engine, and all Python packages.

Manual Install (macOS / other)

# 1. Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# 2. Clone this repo
git clone https://github.com/ev-flow/quark-rule-generate.git
cd quark-rule-generate

# 3. Clone quark-engine (sibling directory)
git clone https://github.com/quark-engine/quark-engine.git ../quark-engine -b for_rule_adjust

# 4. Install Python dependencies
uv sync

# 5. Install quark-engine into the virtual environment
uv pip install -e ../quark-engine

# 6. Setup environment
cp .env.template .env
# Edit .env and fill in your API keys

Get API Keys

| Key | Source | Required |
| --- | --- | --- |
| `ANDROZOO_API_KEY` | AndroZoo | ✅ For APK downloads |
| `MALWAREBAZAAR_API_KEY` | MalwareBazaar | ✅ For sample search |
| `OPENAI_API_KEY` | OpenAI | Optional (AI descriptions, threat intelligence reports) |
| `VIRUS_TOTAL_API_KEY` | VirusTotal | Optional (label lookup) |
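Before starting a run it can help to confirm that the required keys are set. This is a small sketch, not part of the project's tooling; the function name `missing_keys` is hypothetical, and it only checks that a value is present, not that the key is valid.

```python
import os

REQUIRED_KEYS = ("ANDROZOO_API_KEY", "MALWAREBAZAAR_API_KEY")
OPTIONAL_KEYS = ("OPENAI_API_KEY", "VIRUS_TOTAL_API_KEY")


def missing_keys(env=None, keys=REQUIRED_KEYS):
    """Return the names in `keys` that are unset or blank in `env`."""
    env = os.environ if env is None else env
    return [name for name in keys if not env.get(name, "").strip()]


if __name__ == "__main__":
    for name in missing_keys():
        print(f"required key not set: {name}")
    for name in missing_keys(keys=OPTIONAL_KEYS):
        print(f"optional key not set: {name}")
```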

Start the Web UI

./start.sh
# → Open http://localhost:9527

🔧 How It Works

Pipeline Architecture

Key insight: Rules are generated from actual malware family behaviors, then each rule's score is optimized by training a neural network on malicious vs. benign samples — resulting in detection rules that are both family-specific and weight-optimized.

Stage Details

Stage 1: Search & Download

Queries MalwareBazaar for SHA256 hashes matching the family signature
Downloads APK samples from AndroZoo (requires API key)
Filters out corrupted or unavailable samples
Output: data/apks/{family}/ + data/lists/family/{family}.csv

Stage 2: Rule Generation

Uses Quark-Engine static analysis + Ray distributed computing
Analyzes APK bytecode to extract API call sequences
Generates detection rules as JSON files (one per suspicious behavior)
Deduplicates against existing quark-rules repository
Output: data/rules/{family}/*.json
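To illustrate the deduplication step, here is a rough sketch that treats two rules as duplicates when they list the same API entries in the same order. That definition of "duplicate" is an assumption made for this example; the actual comparison in tools/generate_rules.py may differ. The function names are hypothetical.

```python
import json
from pathlib import Path


def api_signature(rule: dict) -> tuple:
    """Identify a rule by its ordered (class, method, descriptor) entries."""
    return tuple((a["class"], a["method"], a["descriptor"]) for a in rule["api"])


def load_signatures(folder) -> set:
    """Collect the signatures of every rule JSON directly inside `folder`."""
    signatures = set()
    for path in Path(folder).glob("*.json"):
        signatures.add(api_signature(json.loads(path.read_text(encoding="utf-8"))))
    return signatures


def new_rules(candidates: list, existing: set) -> list:
    """Keep candidates whose signature is neither in `existing` nor seen earlier."""
    seen = set(existing)
    kept = []
    for rule in candidates:
        signature = api_signature(rule)
        if signature not in seen:
            seen.add(signature)
            kept.append(rule)
    return kept
```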

Stage 3: Family Analysis & Weight Optimization (Training)

Runs Quark-Engine analysis on each rule × each APK sample (both malicious family samples and benign samples)
Evaluates the 5-stage detection confidence: permission check → API class → API method → descriptor → data flow
Builds a PyTorch neural network (RuleAdjustmentModel) that learns an optimal score for each rule
Iterative pruning: removes bottom-percentile rules each iteration — keeping only the most discriminative rules for the family
Weight rounding: ensures scores are human-readable integers
Stops once detection accuracy reaches 100% or the iteration limit (--max-iterations) is reached
Output: data/predictions/{family}_prediction.csv
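The pruning, rounding, and accuracy steps can be sketched in plain Python. This is not the RuleAdjustmentModel itself: the learned weights are taken as given, and the accuracy check assumes an APK is flagged when the summed scores of its matched rules reach a threshold, which is an assumption made for this example. All function names are hypothetical.

```python
def prune_bottom(weights: dict, percentile: float) -> dict:
    """Drop roughly the lowest `percentile` percent of rules by weight."""
    if not weights:
        return {}
    ranked = sorted(weights.items(), key=lambda item: item[1])
    drop = min(int(len(ranked) * percentile / 100), len(ranked) - 1)  # keep at least one
    return dict(ranked[drop:])


def round_scores(weights: dict) -> dict:
    """Round learned weights to non-negative integers so scores stay readable."""
    return {rule: max(0, round(weight)) for rule, weight in weights.items()}


def detection_accuracy(hits: dict, scores: dict, labels: dict, threshold: float) -> float:
    """Fraction of APKs classified correctly.

    hits[apk] is the set of rules that matched the APK and labels[apk] is True
    for malicious samples. Assumed decision rule: flag when the summed score
    of matched rules is at least `threshold`.
    """
    correct = 0
    for apk, matched in hits.items():
        total = sum(scores.get(rule, 0) for rule in matched)
        correct += (total >= threshold) == labels[apk]
    return correct / len(hits)
```

An iteration would train weights, call `prune_bottom` and `round_scores`, then use `detection_accuracy` to decide whether to stop or retry.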

Stage 4: Score Application & Export

Writes the optimized score field back to each rule JSON
Optional: generates AI-powered rule descriptions via OpenAI
Optional: copies rules to your quark-rules repo with auto-numbered filenames
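Writing scores back amounts to a small JSON update per rule file. The sketch below assumes the scores are already available as a mapping from rule file name to score; how tools/apply_rule_info.py derives that mapping from the prediction CSV is not shown here, and `apply_scores` is a hypothetical name.

```python
import json
from pathlib import Path


def apply_scores(rule_dir, scores: dict) -> int:
    """Set the "score" field of each listed rule file; return how many were updated."""
    updated = 0
    for path in sorted(Path(rule_dir).glob("*.json")):
        if path.name not in scores:
            continue  # leave rules without a new score untouched
        rule = json.loads(path.read_text(encoding="utf-8"))
        rule["score"] = scores[path.name]
        path.write_text(
            json.dumps(rule, indent=2, ensure_ascii=False) + "\n", encoding="utf-8"
        )
        updated += 1
    return updated
```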

Quark Rule JSON Format

Each generated rule is a JSON file describing a suspicious Android behavior:

{
  "crime": "Capture and transmit SMS messages to remote server",
  "permission": [
    "android.permission.RECEIVE_SMS",
    "android.permission.INTERNET"
  ],
  "api": [
    {
      "class": "Landroid/telephony/SmsMessage",
      "method": "getMessageBody",
      "descriptor": "()Ljava/lang/String;"
    },
    {
      "class": "Ljava/net/HttpURLConnection",
      "method": "getOutputStream",
      "descriptor": "()Ljava/io/OutputStream;"
    }
  ],
  "score": 4,
  "label": ["Spyware", "SMS Stealer"]
}
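A quick structural check can catch malformed rule files before they reach training. This sketch validates only the fields visible in the example above, and it assumes each rule lists exactly two API entries, as the example does. It is not Quark-Engine's own validation, and `validate_rule` is a hypothetical name.

```python
REQUIRED_FIELDS = {
    "crime": str,
    "permission": list,
    "api": list,
    "score": (int, float),
    "label": list,
}
API_KEYS = ("class", "method", "descriptor")


def validate_rule(rule: dict) -> list:
    """Return a list of problems found in a rule dict (empty when it looks fine)."""
    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in rule:
            errors.append(f"missing field: {field}")
        elif not isinstance(rule[field], expected):
            errors.append(f"wrong type for field: {field}")
    apis = rule.get("api")
    if isinstance(apis, list):
        if len(apis) != 2:  # assumption taken from the example rule
            errors.append("expected exactly two API entries")
        for index, api in enumerate(apis):
            for key in API_KEYS:
                if not isinstance(api, dict) or key not in api:
                    errors.append(f"api[{index}] missing {key}")
    return errors
```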

🛠 CLI Tools

All tools can be run independently via uv run:

Generate Rules

uv run tools/generate_rules.py \
  -a data/lists/family/hydra.csv \
  -w data/generated_rules/hydra \
  -o data/rules/hydra

Train Weights (Iterative)

uv run tools/iterative_train.py \
  --target-family hydra \
  -a data/lists/family/hydra.csv \
  -r data/rules/hydra \
  -o data/predictions/hydra_prediction.csv \
  --epochs 200 --lrs 0.1,0.05,0.01 \
  --max-iterations 10 --prune-percentile 50
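The two commands above can also be chained from a script. The following sketch only assembles the same argument lists shown in this section and runs them in order; the paths follow the hydra examples, and the function names are hypothetical.

```python
import subprocess


def build_commands(family, epochs=200, lrs="0.1,0.05,0.01",
                   max_iterations=10, prune_percentile=50):
    """Return argv lists for rule generation followed by iterative training."""
    apk_list = f"data/lists/family/{family}.csv"
    rule_dir = f"data/rules/{family}"
    generate = [
        "uv", "run", "tools/generate_rules.py",
        "-a", apk_list,
        "-w", f"data/generated_rules/{family}",
        "-o", rule_dir,
    ]
    train = [
        "uv", "run", "tools/iterative_train.py",
        "--target-family", family,
        "-a", apk_list,
        "-r", rule_dir,
        "-o", f"data/predictions/{family}_prediction.csv",
        "--epochs", str(epochs),
        "--lrs", lrs,
        "--max-iterations", str(max_iterations),
        "--prune-percentile", str(prune_percentile),
    ]
    return [generate, train]


def run_pipeline(family):
    """Run each command in order; stops with CalledProcessError on failure."""
    for argv in build_commands(family):
        subprocess.run(argv, check=True)
```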

Train Weights (Interactive with MLflow)

uv run tools/adjust_rule_score.py \
  --target-family hydra \
  --rule-folder data/rules/hydra \
  --apk-list data/lists/family/hydra.csv \
  --lrs 0.1,0.05,0.01 --epochs 100

Apply Scores to Rule JSONs

uv run tools/apply_rule_info.py \
  --apk_prediction data/predictions/hydra_prediction.csv \
  --rule_info data/lists/family/hydra_rule_review.csv \
  --rule_base_folder data/rules/hydra

Analyze APKs Against Rules

uv run tools/analyze_apk.py \
  -a data/lists/family/hydra.csv \
  -r data/rules/hydra \
  -o data/test_results/hydra_analysis

Copy Rules to quark-rules Repo

uv run tools/copy_rule_to_quark_rules.py \
  --rule_list data/lists/family/hydra_rule_list.csv \
  --rule_base_folder data/rules/hydra \
  --quark_rule_folder ../quark-rules \
  --start_index 1

Download APKs

# From MalwareBazaar + AndroZoo
uv run tools/collect_apk_by_family.py \
  -a data/lists/family/hydra.csv \
  -f hydra \
  -o data/apks/hydra

Pre-filter Rules

uv run tools/prefilter_rules.py \
  --apk-csv data/lists/family/hydra.csv \
  --rules-dir data/rules/hydra \
  --sample-count 3
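The idea behind pre-filtering can be sketched as follows, assuming a rule counts as "zero-signal" when it scores no confidence on any of the sampled APKs. That definition and the input shape are assumptions made for this example; tools/prefilter_rules.py may use different criteria.

```python
def prefilter_rules(confidences: dict) -> tuple:
    """Split rules into (kept, dropped).

    confidences[rule_name] holds one confidence value per sampled APK.
    A rule is dropped when every value is zero.
    """
    kept, dropped = [], []
    for rule, values in confidences.items():
        if any(value > 0 for value in values):
            kept.append(rule)
        else:
            dropped.append(rule)
    return kept, dropped
```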

Generate AI Descriptions

uv run tools/generate_rule_description.py \
  --rule-folder data/rules/hydra

⚙ Configuration

Environment Variables (`.env`)

Copy .env.template and fill in your values:

cp .env.template .env

| Variable | Default | Description |
| --- | --- | --- |
| `ANDROZOO_API_KEY` | (none) | AndroZoo API key (required) |
| `MALWAREBAZAAR_API_KEY` | (none) | MalwareBazaar API key (required) |
| `OPENAI_API_KEY` | (none) | OpenAI API key (optional, for AI descriptions and threat intelligence reports) |
| `TRAIN_SAMPLE_COUNT` | `10` | Max APK samples for training (0 = all) |
| `MAX_RULES_FOR_TRAINING` | `0` | Hard cap on rules entering training (recommended: 200 for <4GB RAM) |
| `GENERATE_RULES_MAX_APIS` | `200` | Max API combinations for rule generation |
| `GENERATE_RULES_CPUS` | auto | Ray worker count for rule generation |
| `GENERATE_RULES_OBJECT_STORE_MB` | `4096` | Ray object store memory limit |
| `MIN_SAMPLES` | `10` | Minimum APK samples required to proceed |
| `MAX_APK_DOWNLOAD` | `100` | Max APKs to download per family |
| `BENIGN_APK_LIST` | (none) | CSV of benign APK SHA256s (required for training) |
| `QUARK_RULES_FOLDER` | (none) | Path to quark-rules repo (leave empty to skip copy) |
| `ANALYSIS_PYTHON` | `uv run` | Python interpreter for analysis subprocesses |
| `TRAIN_TIMEOUT` | `7200` | Training subprocess timeout in seconds |
| `PREFILTER_SAMPLE_COUNT` | `3` | Samples used for pre-filtering rules |
| `AUTO_COVERAGE_CHECK` | `false` | Run coverage check after training |
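For scripts that need these values, the numeric and boolean settings can be read with their documented defaults. This is a minimal sketch, not the project's own loader: the `Settings` class and `load_settings` function are hypothetical, and it reads from a mapping such as `os.environ`, so the `.env` file must already be loaded into the environment.

```python
import os
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Settings:
    """Numeric and boolean settings with the defaults from the table above."""
    train_sample_count: int = 10
    max_rules_for_training: int = 0
    generate_rules_max_apis: int = 200
    generate_rules_object_store_mb: int = 4096
    min_samples: int = 10
    max_apk_download: int = 100
    train_timeout: int = 7200
    prefilter_sample_count: int = 3
    auto_coverage_check: bool = False


def load_settings(env=None) -> Settings:
    """Read settings from `env`, keeping the default for unset or blank values."""
    env = os.environ if env is None else env
    values = {}
    for field in fields(Settings):
        raw = env.get(field.name.upper(), "").strip()
        if not raw:
            continue
        if isinstance(field.default, bool):
            values[field.name] = raw.lower() in ("1", "true", "yes")
        else:
            values[field.name] = int(raw)
    return Settings(**values)
```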

Memory Optimization

For machines with limited RAM (< 4GB):

MAX_RULES_FOR_TRAINING=200    # Cap rules before training
TRAIN_SAMPLE_COUNT=5          # Limit APK samples
GENERATE_RULES_OBJECT_STORE_MB=2048  # Reduce Ray memory

📁 Project Structure

quark-rule-generate/
├── web/
│   ├── app.py                 # FastAPI backend (API + pipeline orchestration)
│   └── static/
│       └── index.html         # Single-page web dashboard
├── tools/                     # CLI tools for each pipeline stage
│   ├── generate_rules.py      # Rule generation (Ray + Quark-Engine)
│   ├── iterative_train.py     # Iterative weight training (PyTorch)
│   ├── adjust_rule_score.py   # Interactive training (PyTorch + MLflow)
│   ├── apply_rule_info.py     # Apply scores to rule JSONs
│   ├── analyze_apk.py         # APK analysis (Ray)
│   ├── prefilter_rules.py     # Pre-filter zero-signal rules
│   ├── copy_rule_to_quark_rules.py  # Export to quark-rules repo
│   ├── collect_apk_by_family.py     # APK download
│   ├── generate_rule_description.py # AI-powered descriptions
│   ├── search_family_report.py      # Threat intelligence reports
│   └── ...
├── data_preprocess/           # Core analysis libraries
│   ├── analysis_result.py     # Quark analysis with caching
│   ├── dataset.py             # PyTorch dataset (ApkDataset)
│   ├── apk.py                 # APK download & management
│   └── rule.py                # Rule file utilities
├── model/
│   └── __init__.py            # RuleAdjustmentModel (PyTorch)
├── data/                      # Runtime data (gitignored)
│   ├── apks/                  # Downloaded APK files
│   ├── rules/                 # Generated rule JSONs
│   ├── predictions/           # Training prediction CSVs
│   └── lists/family/          # Sample lists per family
├── docs/                      # Documentation assets
├── .env.template              # Environment variable template
├── install.sh                 # One-click Ubuntu installer
├── start.sh                   # Web server start script
├── pyproject.toml             # Python project config (uv)
└── uv.lock                    # Dependency lock file

🔗 Related Projects

| Project | Description |
| --- | --- |
| Quark-Engine | Android malware analysis engine |
| quark-rules | Community detection rule repository |

📄 License

This project is part of the Quark-Engine ecosystem.


