Android Malware Family Analysis · Rule Generation · Weight Optimization
Start the server with ./start.sh and open http://localhost:9527.
Enter a malware family name and click start; the system handles everything from sample collection to rule deployment. The dashboard provides real-time progress, log streaming, and multi-family queue management.
Click Review Rules on any completed family to inspect generated rules with their machine-optimized scores and 5-stage detection analysis.
After downloading samples, the system queries OpenAI to generate a threat intelligence report for the malware family — including a summary of known behaviors, severity ratings, and associated Android API functions. The report feeds directly into rule generation to improve detection coverage.
- API endpoint: `GET /api/auto-pipeline/families/{family}/report`
- Refresh: `POST /api/auto-pipeline/families/{family}/report/refresh`
- Output: `data/reports/{family}_threat_intel.json`
- Requires `OPENAI_API_KEY` (optional; the pipeline continues without it)
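As a rough illustration, the two report endpoints above could be called from Python with only the standard library. This is a sketch, not the project's client: the base URL assumes the default `./start.sh` address, `report_url` and `fetch_report` are hypothetical helper names, and the response is assumed to be a JSON object whose schema is not documented here.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:9527"  # assumed: default address after ./start.sh


def report_url(family: str, refresh: bool = False) -> str:
    """Build the report URL, or the refresh URL when refresh=True."""
    path = f"/api/auto-pipeline/families/{quote(family, safe='')}/report"
    return BASE_URL + (path + "/refresh" if refresh else path)


def fetch_report(family: str, refresh: bool = False, timeout: float = 30.0) -> dict:
    """GET the stored report, or POST a refresh request when refresh=True.

    Assumes the server replies with a JSON object (hypothetical schema).
    """
    method = "POST" if refresh else "GET"
    request = Request(report_url(family, refresh), method=method)
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the server running, `fetch_report("hydra")` would return the parsed report, and `fetch_report("hydra", refresh=True)` would ask the server to regenerate it.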
Quark Rule Generate is an end-to-end pipeline for analyzing Android malware families and automatically generating optimized detection rules for Quark-Engine.
Given a malware family name (e.g. hydra, cerberus, anubis), it:
- Searches MalwareBazaar for known samples of the family
- Downloads APK files from AndroZoo
- Searches threat intelligence reports to identify known malicious behaviors and associated Android APIs
- Generates Quark-Engine detection rules via static analysis — guided by threat intel when available
- Trains a PyTorch neural network to optimize rule confidence weights per family
- Deploys optimized, weighted rules to your quark-rules repository
The core value: instead of manually crafting rules and guessing weights, the system learns per-family detection patterns and produces rules whose score fields are machine-optimized.
- Per-Family Pipeline: Enter a malware family name and the system automatically collects samples, generates rules, and trains optimal weights
- Iterative Weight Training: A PyTorch neural network auto-prunes low-signal rules, rounds weights, and retries until detection accuracy reaches 100%
- Score Optimization: Each rule's `score` field is machine-learned to reflect its actual detection power against the target family
- Threat Intelligence Reports: AI-powered analysis of known family behaviors, severity ratings, and Android API associations, fed into rule generation for better coverage
- Pre-filtering: Automatically removes zero-signal rules before training to save time and memory
- OOM Protection: Configurable rule caps (`MAX_RULES_FOR_TRAINING`) and sample limits for memory-constrained environments
- Real-Time Monitoring: Live progress bars, log streaming, and pipeline status per family
- Rule Review UI: Inspect generated rules with optimized scores, API call sequences, and 5-stage analysis breakdown
- Multi-Family Queue: Queue multiple families (e.g. `hydra,cerberus,anubis`) for sequential processing
- Restart from Any Stage: Re-run threat intel search, rule generation, or weight training independently
- Settings UI: Configure all parameters without editing `.env` manually
- Bilingual UI: Chinese / English toggle
- AI Descriptions: Optional GPT-powered rule description generation
- quark-rules Integration: Automatically copies optimized rules to the quark-rules repo with proper indexing
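To make the pre-filtering and OOM-protection ideas concrete, here is a minimal sketch. It is not the logic of `tools/prefilter_rules.py`: the input table, the `prefilter_rules` helper, the reading of "zero-signal" as "matched nothing on every sampled APK", and the treatment of a cap of 0 as "no cap" are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical input: for each rule file, the highest detection stage (0-5)
# it reached on each sampled APK. 0 means the rule matched nothing.
StageTable = Dict[str, List[int]]


def prefilter_rules(stages: StageTable, max_rules: int = 0) -> List[str]:
    """Drop zero-signal rules, then apply an optional hard cap.

    max_rules plays the role of MAX_RULES_FOR_TRAINING; treating 0 as
    "no cap" is an assumption about that setting.
    """
    kept = [rule for rule, reached in stages.items() if any(s > 0 for s in reached)]
    # When a cap applies, prefer rules with the strongest total signal.
    kept.sort(key=lambda rule: sum(stages[rule]), reverse=True)
    return kept[:max_rules] if max_rules > 0 else kept


stages = {
    "rule_a.json": [5, 4, 5],
    "rule_b.json": [0, 0, 0],  # zero signal on every sample: removed
    "rule_c.json": [1, 0, 2],
}
uncapped = prefilter_rules(stages)
capped = prefilter_rules(stages, max_rules=1)
```

Here `uncapped` keeps `rule_a.json` and `rule_c.json`, while `capped` keeps only `rule_a.json`, the rule with the largest summed stage count.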
| Tool | Purpose |
|---|---|
| Git | Clone repositories |
| Python 3.13+ | Runtime (auto-installed via uv) |
| uv | Python package manager |
git clone https://github.com/ev-flow/quark-rule-generate.git
cd quark-rule-generate
chmod +x install.sh && ./install.sh

The install script handles everything: system dependencies, uv, Python 3.13, quark-engine, and all Python packages.
# 1. Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
# 2. Clone this repo
git clone https://github.com/ev-flow/quark-rule-generate.git
cd quark-rule-generate
# 3. Clone quark-engine (sibling directory)
git clone https://github.com/quark-engine/quark-engine.git ../quark-engine -b for_rule_adjust
# 4. Install Python dependencies
uv sync
# 5. Install quark-engine into the virtual environment
uv pip install -e ../quark-engine
# 6. Setup environment
cp .env.template .env
# Edit .env and fill in your API keys

| Key | Source | Required |
|---|---|---|
| `ANDROZOO_API_KEY` | AndroZoo | ✅ For APK downloads |
| `MALWAREBAZAAR_API_KEY` | MalwareBazaar | ✅ For sample search |
| `OPENAI_API_KEY` | OpenAI | Optional (AI descriptions) |
| `VIRUS_TOTAL_API_KEY` | VirusTotal | Optional (label lookup) |
./start.sh
# → Open http://localhost:9527
Key insight: Rules are generated from actual malware family behaviors, then each rule's `score` is optimized by training a neural network on malicious vs. benign samples, resulting in detection rules that are both family-specific and weight-optimized.
- Queries MalwareBazaar for SHA256 hashes matching the family signature
- Downloads APK samples from AndroZoo (requires API key)
- Filters out corrupted or unavailable samples
- Output: `data/apks/{family}/` + `data/lists/family/{family}.csv`
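The two lookups in this stage can be sketched as plain request builders. The endpoint URLs and parameter names are taken from the public MalwareBazaar and AndroZoo API documentation and should be checked against it. Whether this project uses the `get_siginfo` query, how it passes its API keys, and the helper names `bazaar_query` and `androzoo_url` are assumptions, so this is not the code in `tools/collect_apk_by_family.py`.

```python
import re
from urllib.parse import urlencode

MALWAREBAZAAR_API = "https://mb-api.abuse.ch/api/v1/"
ANDROZOO_DOWNLOAD = "https://androzoo.uni.lu/api/download"
SHA256_RE = re.compile(r"[0-9a-fA-F]{64}")


def bazaar_query(family: str, limit: int = 100) -> dict:
    """Form fields for a signature lookup, to be POSTed to MALWAREBAZAAR_API.

    Authentication is omitted here; how the key is sent is an assumption
    left to the caller.
    """
    return {"query": "get_siginfo", "signature": family, "limit": str(limit)}


def androzoo_url(sha256: str, api_key: str) -> str:
    """Download URL for one APK, rejecting values that are not SHA256 hashes."""
    if not SHA256_RE.fullmatch(sha256):
        raise ValueError(f"not a SHA256 hash: {sha256!r}")
    query = urlencode({"apikey": api_key, "sha256": sha256})
    return f"{ANDROZOO_DOWNLOAD}?{query}"
```

Validating the hash before building the URL is one simple way to skip malformed entries early, which relates to the "filters out corrupted or unavailable samples" step above.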
- Uses Quark-Engine static analysis + Ray distributed computing
- Analyzes APK bytecode to extract API call sequences
- Generates detection rules as JSON files (one per suspicious behavior)
- Deduplicates against the existing `quark-rules` repository
- Output: `data/rules/{family}/*.json`
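The deduplication step can be pictured as comparing each candidate rule's API sequence with those already present in a rule folder. The sketch below assumes that two rules count as duplicates when their ordered `api` entries are identical; that criterion and the helper names (`rule_key`, `load_keys`, `deduplicate`) are hypothetical, not taken from `tools/generate_rules.py`.

```python
import json
from pathlib import Path
from typing import Iterable, List, Set, Tuple

ApiKey = Tuple[Tuple[str, str, str], ...]


def rule_key(rule: dict) -> ApiKey:
    """Identify a rule by its ordered (class, method, descriptor) entries."""
    return tuple((a["class"], a["method"], a["descriptor"]) for a in rule["api"])


def load_keys(folder: Path) -> Set[ApiKey]:
    """Collect keys from every rule JSON in an existing rule folder."""
    return {rule_key(json.loads(p.read_text(encoding="utf-8"))) for p in folder.glob("*.json")}


def deduplicate(candidates: Iterable[dict], known: Set[ApiKey]) -> List[dict]:
    """Keep candidates whose API sequence is neither known nor repeated."""
    seen = set(known)
    fresh = []
    for rule in candidates:
        key = rule_key(rule)
        if key not in seen:
            seen.add(key)
            fresh.append(rule)
    return fresh
```

A typical call would be `deduplicate(new_rules, load_keys(Path("../quark-rules")))`, where the folder path is whatever `QUARK_RULES_FOLDER` points to.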
- Runs Quark-Engine analysis on each rule × each APK sample (both malicious family samples and benign samples)
- Evaluates the 5-stage detection confidence: permission check → API class → API method → descriptor → data flow
- Builds a PyTorch neural network (`RuleAdjustmentModel`) that learns the optimal `score` for each rule
- Iterative pruning: removes bottom-percentile rules each iteration, keeping only the most discriminative rules for the family
- Weight rounding: ensures scores are human-readable integers
- Targets 100% detection accuracy before stopping
- Output: `data/predictions/{family}_prediction.csv`
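The pruning, rounding, and accuracy check in this stage can be illustrated without PyTorch. In the sketch below the learned weights are hard-coded stand-ins for what `RuleAdjustmentModel` would produce, and the scoring rule (a weighted sum of per-rule confidences compared with a threshold) is an assumption chosen for clarity, not the model's actual architecture. All names and numbers are hypothetical.

```python
from typing import Dict, List

# Hypothetical data: confidence[rule][i] is the fraction of the five stages
# (0.0 to 1.0) that the rule reached on sample i.
Confidence = Dict[str, List[float]]


def apk_scores(weights: Dict[str, float], confidence: Confidence, n: int) -> List[float]:
    """Weighted sum of rule confidences for each of the n samples."""
    return [sum(w * confidence[r][i] for r, w in weights.items()) for i in range(n)]


def accuracy(scores: List[float], labels: List[int], threshold: float) -> float:
    """Share of samples whose thresholded score matches the label (1 = family)."""
    hits = sum((s >= threshold) == bool(y) for s, y in zip(scores, labels))
    return hits / len(labels)


def prune_bottom(weights: Dict[str, float], percentile: float) -> Dict[str, float]:
    """Drop the lowest-weighted `percentile` percent of rules; one always survives."""
    ranked = sorted(weights, key=weights.get, reverse=True)
    keep = max(1, len(ranked) - int(len(ranked) * percentile / 100))
    return {r: weights[r] for r in ranked[:keep]}


def round_scores(weights: Dict[str, float]) -> Dict[str, int]:
    """Round weights to integers, clamping negatives to zero."""
    return {r: max(0, round(w)) for r, w in weights.items()}


confidence = {
    "sms_to_http.json": [1.0, 0.8, 0.0, 0.2],
    "read_contacts.json": [0.6, 1.0, 0.4, 0.2],
    "get_device_id.json": [0.2, 0.2, 0.2, 0.2],
}
labels = [1, 1, 0, 0]  # two family samples, two benign samples
learned = {"sms_to_http.json": 3.7, "read_contacts.json": 1.2, "get_device_id.json": 0.1}

pruned = round_scores(prune_bottom(learned, percentile=50))
result = accuracy(apk_scores(pruned, confidence, 4), labels, threshold=2.0)
```

In this toy data the uninformative `get_device_id.json` rule is pruned, the remaining weights round to 4 and 1, and both family samples still score above the threshold while both benign samples stay below it. A real run would retrain between pruning rounds and stop once accuracy reaches the target or the iteration limit is hit.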
- Writes the optimized `score` field back to each rule JSON
- Optional: generates AI-powered rule descriptions via OpenAI
- Optional: copies rules to your `quark-rules` repo with auto-numbered filenames
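Writing a score back into a rule file amounts to a small JSON round trip. The sketch below shows one way to do it; `apply_score` is a hypothetical helper and the 4-space indentation is an assumption, so it should not be read as the behavior of `tools/apply_rule_info.py`.

```python
import json
import tempfile
from pathlib import Path


def apply_score(rule_path: Path, score: int) -> dict:
    """Overwrite the `score` field of one rule JSON and return the updated rule."""
    rule = json.loads(rule_path.read_text(encoding="utf-8"))
    rule["score"] = score
    rule_path.write_text(json.dumps(rule, indent=4) + "\n", encoding="utf-8")
    return rule


# Demo on a throwaway file so that no real rule is touched.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "00001.json"
    path.write_text(json.dumps({"crime": "example behavior", "score": 1}), encoding="utf-8")
    updated = apply_score(path, 4)
    on_disk = json.loads(path.read_text(encoding="utf-8"))
```

All other fields of the rule are left unchanged; only `score` is replaced.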
Each generated rule is a JSON file describing a suspicious Android behavior:
{
"crime": "Capture and transmit SMS messages to remote server",
"permission": [
"android.permission.RECEIVE_SMS",
"android.permission.INTERNET"
],
"api": [
{
"class": "Landroid/telephony/SmsMessage",
"method": "getMessageBody",
"descriptor": "()Ljava/lang/String;"
},
{
"class": "Ljava/net/HttpURLConnection",
"method": "getOutputStream",
"descriptor": "()Ljava/io/OutputStream;"
}
],
"score": 4,
"label": ["Spyware", "SMS Stealer"]
}

All tools can be run independently via uv run:

# Rule generation (Ray + Quark-Engine)
uv run tools/generate_rules.py \
-a data/lists/family/hydra.csv \
-w data/generated_rules/hydra \
-o data/rules/hydra

# Iterative weight training (PyTorch)
uv run tools/iterative_train.py \
--target-family hydra \
-a data/lists/family/hydra.csv \
-r data/rules/hydra \
-o data/predictions/hydra_prediction.csv \
--epochs 200 --lrs 0.1,0.05,0.01 \
--max-iterations 10 --prune-percentile 50

# Interactive training (PyTorch + MLflow)
uv run tools/adjust_rule_score.py \
--target-family hydra \
--rule-folder data/rules/hydra \
--apk-list data/lists/family/hydra.csv \
--lrs 0.1,0.05,0.01 --epochs 100

# Apply scores to rule JSONs
uv run tools/apply_rule_info.py \
--apk_prediction data/predictions/hydra_prediction.csv \
--rule_info data/lists/family/hydra_rule_review.csv \
--rule_base_folder data/rules/hydra

# APK analysis (Ray)
uv run tools/analyze_apk.py \
-a data/lists/family/hydra.csv \
-r data/rules/hydra \
-o data/test_results/hydra_analysis

# Export to quark-rules repo
uv run tools/copy_rule_to_quark_rules.py \
--rule_list data/lists/family/hydra_rule_list.csv \
--rule_base_folder data/rules/hydra \
--quark_rule_folder ../quark-rules \
--start_index 1

# From MalwareBazaar + AndroZoo
uv run tools/collect_apk_by_family.py \
-a data/lists/family/hydra.csv \
-f hydra \
-o data/apks/hydra

# Pre-filter zero-signal rules
uv run tools/prefilter_rules.py \
--apk-csv data/lists/family/hydra.csv \
--rules-dir data/rules/hydra \
--sample-count 3

# AI-powered descriptions
uv run tools/generate_rule_description.py \
--rule-folder data/rules/hydra

Copy .env.template and fill in your values:
cp .env.template .env

| Variable | Default | Description |
|---|---|---|
| `ANDROZOO_API_KEY` | (none) | AndroZoo API key (required) |
| `MALWAREBAZAAR_API_KEY` | (none) | MalwareBazaar API key (required) |
| `OPENAI_API_KEY` | (none) | OpenAI API key (optional, for AI descriptions) |
| `TRAIN_SAMPLE_COUNT` | `10` | Max APK samples for training (0 = all) |
| `MAX_RULES_FOR_TRAINING` | `0` | Hard cap on rules entering training (recommended: 200 for <4GB RAM) |
| `GENERATE_RULES_MAX_APIS` | `200` | Max API combinations for rule generation |
| `GENERATE_RULES_CPUS` | auto | Ray worker count for rule generation |
| `GENERATE_RULES_OBJECT_STORE_MB` | `4096` | Ray object store memory limit |
| `MIN_SAMPLES` | `10` | Minimum APK samples required to proceed |
| `MAX_APK_DOWNLOAD` | `100` | Max APKs to download per family |
| `BENIGN_APK_LIST` | (none) | CSV of benign APK SHA256s (required for training) |
| `QUARK_RULES_FOLDER` | (none) | Path to quark-rules repo (leave empty to skip copy) |
| `ANALYSIS_PYTHON` | `uv run` | Python interpreter for analysis subprocesses |
| `TRAIN_TIMEOUT` | `7200` | Training subprocess timeout in seconds |
| `PREFILTER_SAMPLE_COUNT` | `3` | Samples used for pre-filtering rules |
| `AUTO_COVERAGE_CHECK` | `false` | Run coverage check after training |
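For readers scripting around the pipeline, the table can be mirrored by a small typed loader. The sketch below covers only a subset of the variables and uses the defaults listed above; `PipelineConfig` and `load_config` are hypothetical names, and it assumes the values are already present in the process environment (it does not parse `.env` itself).

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class PipelineConfig:
    train_sample_count: int      # TRAIN_SAMPLE_COUNT (0 = all samples)
    max_rules_for_training: int  # MAX_RULES_FOR_TRAINING
    min_samples: int             # MIN_SAMPLES
    max_apk_download: int        # MAX_APK_DOWNLOAD
    train_timeout: int           # TRAIN_TIMEOUT, in seconds
    auto_coverage_check: bool    # AUTO_COVERAGE_CHECK


def load_config(env: Mapping[str, str] = os.environ) -> PipelineConfig:
    """Read settings from a mapping, falling back to the documented defaults."""

    def integer(name: str, default: int) -> int:
        # An unset or empty variable falls back to the default.
        return int(env.get(name) or default)

    return PipelineConfig(
        train_sample_count=integer("TRAIN_SAMPLE_COUNT", 10),
        max_rules_for_training=integer("MAX_RULES_FOR_TRAINING", 0),
        min_samples=integer("MIN_SAMPLES", 10),
        max_apk_download=integer("MAX_APK_DOWNLOAD", 100),
        train_timeout=integer("TRAIN_TIMEOUT", 7200),
        auto_coverage_check=(env.get("AUTO_COVERAGE_CHECK") or "false").strip().lower() == "true",
    )


# Example: override two values and keep the remaining defaults.
low_memory = load_config({"MAX_RULES_FOR_TRAINING": "200", "TRAIN_SAMPLE_COUNT": "5"})
```

Passing a plain dict instead of `os.environ` keeps the loader easy to test and makes the chosen overrides explicit.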
For machines with limited RAM (< 4GB):
MAX_RULES_FOR_TRAINING=200 # Cap rules before training
TRAIN_SAMPLE_COUNT=5 # Limit APK samples
GENERATE_RULES_OBJECT_STORE_MB=2048 # Reduce Ray memory

quark-rule-generate/
├── web/
│   ├── app.py                           # FastAPI backend (API + pipeline orchestration)
│   └── static/
│       └── index.html                   # Single-page web dashboard
├── tools/                               # CLI tools for each pipeline stage
│   ├── generate_rules.py                # Rule generation (Ray + Quark-Engine)
│   ├── iterative_train.py               # Iterative weight training (PyTorch)
│   ├── adjust_rule_score.py             # Interactive training (PyTorch + MLflow)
│   ├── apply_rule_info.py               # Apply scores to rule JSONs
│   ├── analyze_apk.py                   # APK analysis (Ray)
│   ├── prefilter_rules.py               # Pre-filter zero-signal rules
│   ├── copy_rule_to_quark_rules.py      # Export to quark-rules repo
│   ├── collect_apk_by_family.py         # APK download
│   ├── generate_rule_description.py     # AI-powered descriptions
│   ├── search_family_report.py          # Threat intelligence reports
│   └── ...
├── data_preprocess/                     # Core analysis libraries
│   ├── analysis_result.py               # Quark analysis with caching
│   ├── dataset.py                       # PyTorch dataset (ApkDataset)
│   ├── apk.py                           # APK download & management
│   └── rule.py                          # Rule file utilities
├── model/
│   └── __init__.py                      # RuleAdjustmentModel (PyTorch)
├── data/                                # Runtime data (gitignored)
│   ├── apks/                            # Downloaded APK files
│   ├── rules/                           # Generated rule JSONs
│   ├── predictions/                     # Training prediction CSVs
│   └── lists/family/                    # Sample lists per family
├── docs/                                # Documentation assets
├── .env.template                        # Environment variable template
├── install.sh                           # One-click Ubuntu installer
├── start.sh                             # Web server start script
├── pyproject.toml                       # Python project config (uv)
└── uv.lock                              # Dependency lock file

| Project | Description |
|---|---|
| Quark-Engine | Android malware analysis engine |
| quark-rules | Community detection rule repository |
This project is part of the Quark-Engine ecosystem.