ev-flow/quark-rule-generate

Android Malware Family Analysis · Rule Generation · Weight Optimization

Web Dashboard · Quick Start · How It Works · CLI Tools · Configuration


🖥 Web Dashboard

Start the server with ./start.sh and open http://localhost:9527.

Auto Pipeline

Enter a malware family name and click start. The system handles everything from sample collection to rule deployment, with real-time progress, log streaming, and multi-family queue management.

Rule Review

Click Review Rules on any completed family to inspect generated rules with their machine-optimized scores and 5-stage detection analysis.

Threat Intelligence Report

After downloading samples, the system queries OpenAI to generate a threat intelligence report for the malware family — including a summary of known behaviors, severity ratings, and associated Android API functions. The report feeds directly into rule generation to improve detection coverage.

  • API endpoint: GET /api/auto-pipeline/families/{family}/report
  • Refresh: POST /api/auto-pipeline/families/{family}/report/refresh
  • Output: data/reports/{family}_threat_intel.json
  • Needs OPENAI_API_KEY to generate the report (the key is optional; the pipeline continues without it)
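
The two report endpoints above can be called from a script as well as from the dashboard. The following is a minimal sketch, assuming the server runs at the default http://localhost:9527 and that both endpoints answer with a JSON body (the README only states that the report is stored as JSON on disk). The helper names report_url and fetch_report are hypothetical and not part of the project.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:9527"  # default address used by ./start.sh


def report_url(family: str, refresh: bool = False) -> str:
    """Build the report endpoint URL for a family (refresh=True gives the refresh endpoint)."""
    url = f"{BASE_URL}/api/auto-pipeline/families/{quote(family, safe='')}/report"
    return f"{url}/refresh" if refresh else url


def fetch_report(family: str, refresh: bool = False, timeout: float = 30.0) -> dict:
    """GET the stored report, or POST to the refresh endpoint.

    Assumption: the server replies with a JSON document in both cases.
    """
    method = "POST" if refresh else "GET"
    request = Request(report_url(family, refresh), method=method)
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, fetch_report("hydra") would return the parsed report for the hydra family, and fetch_report("hydra", refresh=True) would ask the server to regenerate it first.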

What is Quark Rule Generate?

Quark Rule Generate is an end-to-end pipeline for analyzing Android malware families and automatically generating optimized detection rules for Quark-Engine.

Given a malware family name (e.g. hydra, cerberus, anubis), it:

  1. Searches MalwareBazaar for known samples of the family
  2. Downloads APK files from AndroZoo
  3. Searches threat intelligence reports to identify known malicious behaviors and associated Android APIs
  4. Generates Quark-Engine detection rules via static analysis — guided by threat intel when available
  5. Trains a PyTorch neural network to optimize rule confidence weights per family
  6. Deploys optimized, weighted rules to your quark-rules repository

The core value: instead of manually crafting rules and guessing weights, the system learns per-family detection patterns and produces rules whose score fields are machine-optimized.


✨ Features

Family Analysis & Rule Optimization

  • Per-Family Pipeline — Enter a malware family name, the system automatically collects samples, generates rules, and trains optimal weights
  • Iterative Weight Training — PyTorch neural network auto-prunes low-signal rules, rounds weights, and retries until 100% detection accuracy
  • Score Optimization — Each rule's score field is machine-learned to reflect its actual detection power against the target family
  • Threat Intelligence Reports — AI-powered analysis of known family behaviors, severity ratings, and Android API associations — fed into rule generation for better coverage
  • Pre-filtering — Automatically removes zero-signal rules before training to save time and memory
  • OOM Protection — Configurable rule caps (MAX_RULES_FOR_TRAINING) and sample limits for memory-constrained environments

Web Dashboard

  • Real-Time Monitoring — Live progress bars, log streaming, pipeline status per family
  • Rule Review UI — Inspect generated rules with optimized scores, API call sequences, and 5-stage analysis breakdown
  • Multi-Family Queue — Queue multiple families (e.g. hydra, cerberus, anubis) for sequential processing
  • Restart from Any Stage — Re-run threat intel search, rule generation, or weight training independently
  • Settings UI — Configure all parameters without editing .env manually
  • Bilingual UI — Chinese / English toggle

Extras

  • AI Descriptions — Optional GPT-powered rule description generation
  • quark-rules Integration — Automatically copies optimized rules to the quark-rules repo with proper indexing

🚀 Quick Start

Prerequisites

Tool | Purpose
--- | ---
Git | Clone repositories
Python 3.13+ | Runtime (auto-installed via uv)
uv | Python package manager

One-Line Install (Ubuntu)

git clone https://github.com/ev-flow/quark-rule-generate.git
cd quark-rule-generate
chmod +x install.sh && ./install.sh

The install script handles everything: system dependencies, uv, Python 3.13, quark-engine, and all Python packages.

Manual Install (macOS / other)

# 1. Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# 2. Clone this repo
git clone https://github.com/ev-flow/quark-rule-generate.git
cd quark-rule-generate

# 3. Clone quark-engine (sibling directory)
git clone https://github.com/quark-engine/quark-engine.git ../quark-engine -b for_rule_adjust

# 4. Install Python dependencies
uv sync

# 5. Install quark-engine into the virtual environment
uv pip install -e ../quark-engine

# 6. Setup environment
cp .env.template .env
# Edit .env and fill in your API keys

Get API Keys

Key | Source | Required
--- | --- | ---
ANDROZOO_API_KEY | AndroZoo | ✅ For APK downloads
MALWAREBAZAAR_API_KEY | MalwareBazaar | ✅ For sample search
OPENAI_API_KEY | OpenAI | Optional (threat intelligence reports, AI descriptions)
VIRUS_TOTAL_API_KEY | VirusTotal | Optional (label lookup)

Start the Web UI

./start.sh
# → Open http://localhost:9527

🔧 How It Works

Pipeline Architecture

[Figure: pipeline architecture diagram]

Key insight: Rules are generated from actual malware family behaviors, then each rule's score is optimized by training a neural network on malicious vs. benign samples — resulting in detection rules that are both family-specific and weight-optimized.

Stage Details

Stage 1: Search & Download

  • Queries MalwareBazaar for SHA256 hashes matching the family signature
  • Downloads APK samples from AndroZoo (requires API key)
  • Filters out corrupted or unavailable samples
  • Output: data/apks/{family}/ + data/lists/family/{family}.csv
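
To make the two lookups in Stage 1 concrete, here is a minimal sketch of how the requests could be assembled. It assumes the public MalwareBazaar signature query (get_siginfo with an Auth-Key header) and the public AndroZoo download endpoint as I understand them from their documentation; the project's own collect_apk_by_family.py may build its requests differently. The function names are hypothetical, and nothing is sent over the network here.

```python
import re
from urllib.parse import urlencode

MALWAREBAZAAR_API = "https://mb-api.abuse.ch/api/v1/"
ANDROZOO_DOWNLOAD = "https://androzoo.uni.lu/api/download"
SHA256_PATTERN = re.compile(r"[0-9a-fA-F]{64}")


def bazaar_request(family: str, api_key: str, limit: int = 100) -> tuple[dict, dict]:
    """Return (headers, form fields) for a MalwareBazaar signature query.

    The fields are meant to be POSTed to MALWAREBAZAAR_API.
    """
    headers = {"Auth-Key": api_key}
    fields = {"query": "get_siginfo", "signature": family, "limit": str(limit)}
    return headers, fields


def androzoo_url(sha256: str, api_key: str) -> str:
    """Build the AndroZoo download URL for one APK, rejecting malformed hashes."""
    if not SHA256_PATTERN.fullmatch(sha256):
        raise ValueError(f"not a SHA256 hex digest: {sha256!r}")
    query = urlencode({"apikey": api_key, "sha256": sha256})
    return f"{ANDROZOO_DOWNLOAD}?{query}"
```

Validating the hash before building the URL is one cheap way to filter out malformed entries early, in the spirit of the "filters out corrupted or unavailable samples" step.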

Stage 2: Rule Generation

  • Uses Quark-Engine static analysis + Ray distributed computing
  • Analyzes APK bytecode to extract API call sequences
  • Generates detection rules as JSON files (one per suspicious behavior)
  • Deduplicates against existing quark-rules repository
  • Output: data/rules/{family}/*.json

Stage 3: Family Analysis & Weight Optimization (Training)

  • Runs Quark-Engine analysis on each rule × each APK sample (both malicious family samples and benign samples)
  • Evaluates the 5-stage detection confidence: permission check → API class → API method → descriptor → data flow
  • Builds a PyTorch neural network (RuleAdjustmentModel) that learns optimal score for each rule
  • Iterative pruning: removes bottom-percentile rules each iteration — keeping only the most discriminative rules for the family
  • Weight rounding: ensures scores are human-readable integers
  • Targets 100% detection accuracy and stops once it is reached or the iteration limit (--max-iterations) is exhausted
  • Output: data/predictions/{family}_prediction.csv
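
The pruning and rounding steps can be sketched without PyTorch. The example below is not RuleAdjustmentModel: it takes already-trained weights as given and only illustrates one round of bottom-percentile pruning, integer rounding, and an accuracy check. The scoring function (a weighted sum of per-rule confidence between 0.0 and 1.0) and the fixed decision threshold are assumptions made for this illustration.

```python
def apk_score(weights: dict[str, float], confidences: dict[str, float]) -> float:
    """Weighted sum of per-rule confidence (0.0 to 1.0) for one APK."""
    return sum(w * confidences.get(rule, 0.0) for rule, w in weights.items())


def prune_bottom(weights: dict[str, float], percentile: float) -> dict[str, float]:
    """Drop the lowest-weight `percentile` percent of rules, keeping at least one."""
    ranked = sorted(weights, key=lambda rule: weights[rule])
    n_drop = min(int(len(ranked) * percentile / 100), len(ranked) - 1)
    return {rule: weights[rule] for rule in ranked[n_drop:]}


def round_weights(weights: dict[str, float]) -> dict[str, int]:
    """Round each weight to the nearest integer so scores stay human-readable."""
    return {rule: int(round(w)) for rule, w in weights.items()}


def accuracy(weights, samples, threshold: float) -> float:
    """Fraction of (confidences, is_malicious) samples classified correctly."""
    correct = sum(
        (apk_score(weights, conf) >= threshold) == is_malicious
        for conf, is_malicious in samples
    )
    return correct / len(samples)


# One pruning round on made-up weights and samples.
trained = {"r1": 3.6, "r2": 0.2, "r3": 1.4, "r4": 0.1}
kept = round_weights(prune_bottom(trained, percentile=50))
samples = [({"r1": 1.0, "r3": 0.8}, True), ({"r1": 0.2}, False)]
print(kept, accuracy(kept, samples, threshold=2.0))
```

In an iterative setup, a round like this would alternate with retraining on the surviving rules until the accuracy target is met or the iteration limit is reached.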

Stage 4: Score Application & Export

  • Writes optimized score field back to each rule JSON
  • Optional: generates AI-powered rule descriptions via OpenAI
  • Optional: copies rules to your quark-rules repo with auto-numbered filenames
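
Writing a score back into a rule file is a small JSON round trip. The sketch below assumes the optimized scores are available as a mapping from rule filename to integer score; the real tools/apply_rule_info.py reads them from the prediction CSV, whose column layout is not described here. The function name apply_scores is hypothetical.

```python
import json
from pathlib import Path


def apply_scores(rule_dir: Path, scores: dict[str, int]) -> int:
    """Set the "score" field in each named rule JSON; return how many files changed."""
    updated = 0
    for filename, score in scores.items():
        path = rule_dir / filename
        if not path.is_file():
            continue  # skip rules that were pruned or never generated
        rule = json.loads(path.read_text(encoding="utf-8"))
        rule["score"] = score
        path.write_text(
            json.dumps(rule, indent=2, ensure_ascii=False) + "\n", encoding="utf-8"
        )
        updated += 1
    return updated
```

For example, apply_scores(Path("data/rules/hydra"), {"rule_a.json": 4}) would rewrite only rule_a.json, where rule_a.json is a made-up filename.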

Quark Rule JSON Format

Each generated rule is a JSON file describing a suspicious Android behavior:

{
  "crime": "Capture and transmit SMS messages to remote server",
  "permission": [
    "android.permission.RECEIVE_SMS",
    "android.permission.INTERNET"
  ],
  "api": [
    {
      "class": "Landroid/telephony/SmsMessage;",
      "method": "getMessageBody",
      "descriptor": "()Ljava/lang/String;"
    },
    {
      "class": "Ljava/net/HttpURLConnection;",
      "method": "getOutputStream",
      "descriptor": "()Ljava/io/OutputStream;"
    }
  ],
  "score": 4,
  "label": ["Spyware", "SMS Stealer"]
}
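
A quick structural check can catch malformed rule files before they enter training. The sketch below derives its expectations only from the example above: the five top-level fields, and two API entries that each carry class, method, and descriptor strings. Treating "exactly two APIs" as a requirement is an assumption based on that example. The function name validate_rule is hypothetical.

```python
API_KEYS = ("class", "method", "descriptor")
FIELD_TYPES = {
    "crime": str,
    "permission": list,
    "api": list,
    "score": (int, float),
    "label": list,
}


def validate_rule(rule: dict) -> list[str]:
    """Return a list of problems; an empty list means the rule looks well-formed."""
    problems = []
    for field, expected in FIELD_TYPES.items():
        if field not in rule:
            problems.append(f"missing field: {field}")
        elif not isinstance(rule[field], expected):
            problems.append(f"wrong type for field: {field}")
    apis = rule.get("api")
    if isinstance(apis, list):
        if len(apis) != 2:
            problems.append("expected exactly two API entries")
        for index, api in enumerate(apis):
            for key in API_KEYS:
                if not isinstance(api, dict) or not isinstance(api.get(key), str):
                    problems.append(f"api[{index}] needs a string '{key}'")
    return problems
```

Loading a file with json.load and passing the result to validate_rule gives a list that can be logged or used to skip the rule.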


🛠 CLI Tools

All tools can be run independently via uv run:

Generate Rules

uv run tools/generate_rules.py \
  -a data/lists/family/hydra.csv \
  -w data/generated_rules/hydra \
  -o data/rules/hydra

Train Weights (Iterative)

uv run tools/iterative_train.py \
  --target-family hydra \
  -a data/lists/family/hydra.csv \
  -r data/rules/hydra \
  -o data/predictions/hydra_prediction.csv \
  --epochs 200 --lrs 0.1,0.05,0.01 \
  --max-iterations 10 --prune-percentile 50

Train Weights (Interactive with MLflow)

uv run tools/adjust_rule_score.py \
  --target-family hydra \
  --rule-folder data/rules/hydra \
  --apk-list data/lists/family/hydra.csv \
  --lrs 0.1,0.05,0.01 --epochs 100

Apply Scores to Rule JSONs

uv run tools/apply_rule_info.py \
  --apk_prediction data/predictions/hydra_prediction.csv \
  --rule_info data/lists/family/hydra_rule_review.csv \
  --rule_base_folder data/rules/hydra

Analyze APKs Against Rules

uv run tools/analyze_apk.py \
  -a data/lists/family/hydra.csv \
  -r data/rules/hydra \
  -o data/test_results/hydra_analysis

Copy Rules to quark-rules Repo

uv run tools/copy_rule_to_quark_rules.py \
  --rule_list data/lists/family/hydra_rule_list.csv \
  --rule_base_folder data/rules/hydra \
  --quark_rule_folder ../quark-rules \
  --start_index 1

Download APKs

# From MalwareBazaar + AndroZoo
uv run tools/collect_apk_by_family.py \
  -a data/lists/family/hydra.csv \
  -f hydra \
  -o data/apks/hydra

Pre-filter Rules

uv run tools/prefilter_rules.py \
  --apk-csv data/lists/family/hydra.csv \
  --rules-dir data/rules/hydra \
  --sample-count 3
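
The idea behind pre-filtering is simple enough to show in a few lines. This sketch assumes each rule has already been evaluated on a handful of sampled APKs (three in the command above) and that a rule with zero confidence on every sample carries no signal. The data layout and the function name are hypothetical and do not reflect the internals of tools/prefilter_rules.py.

```python
def prefilter_rules(confidence: dict[str, list[float]]) -> list[str]:
    """Keep rules that fire (confidence > 0) on at least one sampled APK."""
    return [
        rule
        for rule, per_sample in confidence.items()
        if any(value > 0 for value in per_sample)
    ]


# Made-up confidences for three sampled APKs.
sampled = {
    "rule_a.json": [0.0, 0.6, 1.0],
    "rule_b.json": [0.0, 0.0, 0.0],  # never fires: dropped
    "rule_c.json": [0.2, 0.0, 0.0],
}
print(prefilter_rules(sampled))
```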

Generate AI Descriptions

uv run tools/generate_rule_description.py \
  --rule-folder data/rules/hydra
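
Because each tool is an independent command, the stages can also be chained from a script. The sketch below only assembles the command lines shown in this section for one family and runs them in sequence with subprocess. The ordering (generate, pre-filter, train) and the helper names are assumptions; the web dashboard's own orchestration in web/app.py may differ.

```python
import subprocess


def stage_commands(family: str) -> list[list[str]]:
    """Build the generate, pre-filter, and train commands for one family."""
    apk_csv = f"data/lists/family/{family}.csv"
    rules = f"data/rules/{family}"
    prediction = f"data/predictions/{family}_prediction.csv"
    return [
        ["uv", "run", "tools/generate_rules.py",
         "-a", apk_csv, "-w", f"data/generated_rules/{family}", "-o", rules],
        ["uv", "run", "tools/prefilter_rules.py",
         "--apk-csv", apk_csv, "--rules-dir", rules, "--sample-count", "3"],
        ["uv", "run", "tools/iterative_train.py",
         "--target-family", family, "-a", apk_csv, "-r", rules, "-o", prediction,
         "--epochs", "200", "--lrs", "0.1,0.05,0.01",
         "--max-iterations", "10", "--prune-percentile", "50"],
    ]


def run_stages(family: str) -> None:
    """Run each stage in order; check=True stops at the first failing command."""
    for command in stage_commands(family):
        subprocess.run(command, check=True)
```

Calling run_stages("hydra") from the repository root would execute the three commands one after another.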

⚙ Configuration

Environment Variables (.env)

Copy .env.template and fill in your values:

cp .env.template .env

Variable | Default | Description
--- | --- | ---
ANDROZOO_API_KEY | (none) | AndroZoo API key (required)
MALWAREBAZAAR_API_KEY | (none) | MalwareBazaar API key (required)
OPENAI_API_KEY | (none) | OpenAI API key (optional, for threat intelligence reports and AI descriptions)
TRAIN_SAMPLE_COUNT | 10 | Max APK samples for training (0 = all)
MAX_RULES_FOR_TRAINING | 0 | Hard cap on rules entering training (recommended: 200 for <4GB RAM)
GENERATE_RULES_MAX_APIS | 200 | Max API combinations for rule generation
GENERATE_RULES_CPUS | auto | Ray worker count for rule generation
GENERATE_RULES_OBJECT_STORE_MB | 4096 | Ray object store memory limit
MIN_SAMPLES | 10 | Minimum APK samples required to proceed
MAX_APK_DOWNLOAD | 100 | Max APKs to download per family
BENIGN_APK_LIST | (none) | CSV of benign APK SHA256s (required for training)
QUARK_RULES_FOLDER | (none) | Path to quark-rules repo (leave empty to skip copy)
ANALYSIS_PYTHON | uv run | Python interpreter for analysis subprocesses
TRAIN_TIMEOUT | 7200 | Training subprocess timeout in seconds
PREFILTER_SAMPLE_COUNT | 3 | Samples used for pre-filtering rules
AUTO_COVERAGE_CHECK | false | Run coverage check after training

Memory Optimization

For machines with limited RAM (< 4GB):

MAX_RULES_FOR_TRAINING=200    # Cap rules before training
TRAIN_SAMPLE_COUNT=5          # Limit APK samples
GENERATE_RULES_OBJECT_STORE_MB=2048  # Reduce Ray memory

📁 Project Structure

quark-rule-generate/
├── web/
│   ├── app.py                 # FastAPI backend (API + pipeline orchestration)
│   └── static/
│       └── index.html         # Single-page web dashboard
├── tools/                     # CLI tools for each pipeline stage
│   ├── generate_rules.py      # Rule generation (Ray + Quark-Engine)
│   ├── iterative_train.py     # Iterative weight training (PyTorch)
│   ├── adjust_rule_score.py   # Interactive training (PyTorch + MLflow)
│   ├── apply_rule_info.py     # Apply scores to rule JSONs
│   ├── analyze_apk.py         # APK analysis (Ray)
│   ├── prefilter_rules.py     # Pre-filter zero-signal rules
│   ├── copy_rule_to_quark_rules.py  # Export to quark-rules repo
│   ├── collect_apk_by_family.py     # APK download
│   ├── generate_rule_description.py # AI-powered descriptions
│   ├── search_family_report.py      # Threat intelligence reports
│   └── ...
├── data_preprocess/           # Core analysis libraries
│   ├── analysis_result.py     # Quark analysis with caching
│   ├── dataset.py             # PyTorch dataset (ApkDataset)
│   ├── apk.py                 # APK download & management
│   └── rule.py                # Rule file utilities
├── model/
│   └── __init__.py            # RuleAdjustmentModel (PyTorch)
├── data/                      # Runtime data (gitignored)
│   ├── apks/                  # Downloaded APK files
│   ├── rules/                 # Generated rule JSONs
│   ├── predictions/           # Training prediction CSVs
│   └── lists/family/          # Sample lists per family
├── docs/                      # Documentation assets
├── .env.template              # Environment variable template
├── install.sh                 # One-click Ubuntu installer
├── start.sh                   # Web server start script
├── pyproject.toml             # Python project config (uv)
└── uv.lock                    # Dependency lock file

🔗 Related Projects

Project | Description
--- | ---
Quark-Engine | Android malware analysis engine
quark-rules | Community detection rule repository

📄 License

This project is part of the Quark-Engine ecosystem.
