Preset naming standardization / tag cleanup #2986

Open
liquidsec wants to merge 36 commits into 3.0 from preset-naming-standardization

Contributor

@liquidsec liquidsec commented Mar 23, 2026

Closes #2959

Summary

URL Events as Structured Data

  • URL events (URL, URL_UNVERIFIED, URL_HINT) now inherit from DictHostEvent
  • .data is now {"url": "https://..."} instead of a bare string
  • Added .url property on BaseEvent — returns the URL string for any event type, empty string for non-URL events
  • Eliminates fragile patterns such as event.data["url"] and type checks such as event.data if event.type == "URL" else event.data["url"]
  • Fixed WEB_PARAMETER.sanitize_data to call super() so parsed_url gets set
  • Migrated all modules to use event.url instead of direct data access
  • STORAGE_BUCKET and HTTP_RESPONSE MRO simplified (URL_UNVERIFIED already brings DictHostEvent)
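
A minimal sketch of the accessor pattern described above, assuming a heavily simplified event hierarchy. `SketchEvent` and `SketchURLEvent` are hypothetical stand-ins, not BBOT's actual classes; only the `{"url": ...}` data shape and the empty-string fallback are taken from the summary.

```python
from urllib.parse import urlparse


class SketchEvent:
    """Hypothetical stand-in for a base event; not BBOT's real BaseEvent."""

    def __init__(self, event_type, data):
        self.type = event_type
        self.data = data
        self.parsed_url = None  # only populated for events that carry a URL

    @property
    def url(self):
        # Uniform accessor: the URL string if one was parsed, else "".
        if self.parsed_url is None:
            return ""
        return self.parsed_url.geturl()


class SketchURLEvent(SketchEvent):
    """Hypothetical URL-bearing event whose data is a dict."""

    def __init__(self, event_type, url):
        super().__init__(event_type, {"url": url})
        self.parsed_url = urlparse(url)


url_event = SketchURLEvent("URL", "https://example.com/login?next=/")
dns_event = SketchEvent("DNS_NAME", "example.com")
```

With this shape, callers can read `event.url` without first checking the event type or the layout of `event.data`.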

host_metadata

  • New host_metadata field on BaseEvent — a dict keyed by host string (IP or domain)
  • Stores structured per-host data: cloud providers, match type (ip/domain/cname), with room for future fields (ASN, whois, org)
  • Serialized in JSON output, added to pydantic and SQL models
  • Example: {"104.18.26.217": {"cloud_providers": {"cloudflare": {"types": ["waf"], "match": "ip"}}}}
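
As an illustration of how a consumer might read this structure, here is a small sketch built on the example dict above. The helper name `providers_for_host` is hypothetical and not part of BBOT.

```python
host_metadata = {
    "104.18.26.217": {
        "cloud_providers": {
            "cloudflare": {"types": ["waf"], "match": "ip"},
        }
    }
}


def providers_for_host(metadata, host):
    """Hypothetical helper: map each provider seen for a host to its match type."""
    providers = metadata.get(host, {}).get("cloud_providers", {})
    return {name: info.get("match") for name, info in providers.items()}


known = providers_for_host(host_metadata, "104.18.26.217")
unknown = providers_for_host(host_metadata, "192.0.2.1")
```

Hosts with no recorded metadata simply yield an empty dict, so callers do not need a separate existence check.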

Cloud Tag Cleanup

  • Cloudcheck now populates host_metadata with structured cloud provider info
  • Tags simplified: just provider name (cloudflare, amazon) + type (cloud, cdn, waf)
  • Removed compound tags: cloud-microsoft, microsoft-ip, microsoft-domain, microsoft-cname
  • Updated consumers: portfilter, baddns_direct, asset_inventory, bucket_file_enum, subdomain_enum template
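
One way the simplified tags could be derived from host_metadata is sketched below. This is an assumption about the relationship between the two, not the actual cloudcheck code, and `simplified_cloud_tags` is a hypothetical name.

```python
def simplified_cloud_tags(host_metadata):
    """Hypothetical helper: collect provider-name and provider-type tags."""
    tags = set()
    for host_info in host_metadata.values():
        for provider, info in host_info.get("cloud_providers", {}).items():
            tags.add(provider)                   # e.g. "cloudflare"
            tags.update(info.get("types", []))   # e.g. "waf", "cdn", "cloud"
    return tags


example_metadata = {
    "104.18.26.217": {
        "cloud_providers": {"cloudflare": {"types": ["waf"], "match": "ip"}}
    }
}
tags = simplified_cloud_tags(example_metadata)
```

The match detail (ip, domain, cname) stays in host_metadata, so no compound tag is needed to carry it.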

Slot Scoping

  • Moved web-specific slots from BaseEvent to URL_UNVERIFIED: web_spider_distance, parsed_url, url_extension, num_redirects
  • Moved _data_path to DictPathEvent, envelopes to WEB_PARAMETER
  • Reduces memory per non-URL event
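
The memory effect can be sketched with two hypothetical slotted classes. The slot names mirror the list above, but `SlimEvent` and `SlimURLEvent` are illustrative only, and the size comparison relies on CPython's `sys.getsizeof` behavior.

```python
import sys


class SlimEvent:
    """Hypothetical base event carrying only universally needed slots."""

    __slots__ = ("type", "data")

    def __init__(self, event_type, data):
        self.type = event_type
        self.data = data


class SlimURLEvent(SlimEvent):
    """Hypothetical URL event that adds the web-only slots."""

    __slots__ = ("web_spider_distance", "parsed_url", "url_extension", "num_redirects")

    def __init__(self, event_type, data):
        super().__init__(event_type, data)
        self.web_spider_distance = 0
        self.parsed_url = None
        self.url_extension = ""
        self.num_redirects = 0


dns = SlimEvent("DNS_NAME", "example.com")
url = SlimURLEvent("URL", {"url": "https://example.com/"})

# Each declared slot reserves one pointer per instance, so the base
# instance is smaller than the URL instance.
base_size = sys.getsizeof(dns)
url_size = sys.getsizeof(url)
```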

Flag Renames

  • aggressive -> loud — "generates a large amount of network traffic"
  • deadly -> invasive — "intrusive or potentially destructive"
  • Restored safe flag — explicitly on every module that is not loud or invasive
  • Test enforces: every scan module must have safe, loud, and/or invasive (safe is mutually exclusive with loud/invasive)
  • Reclassified portscan and iis_shortnames as loud
  • web-basic -> web, web-thorough -> web-heavy
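
The rule enforced by the test can be sketched as a small validator. This is a hypothetical helper, not the test from the PR, and the extra flags in the usage lines ("active", "passive") are used only for illustration.

```python
def validate_flags(module_name, flags):
    """Hypothetical check: require safe, loud, and/or invasive; safe stands alone."""
    noise_flags = set(flags) & {"safe", "loud", "invasive"}
    if not noise_flags:
        raise ValueError(f"{module_name}: needs safe, loud, and/or invasive")
    if "safe" in noise_flags and len(noise_flags) > 1:
        raise ValueError(f"{module_name}: safe cannot be combined with loud/invasive")
    return True


ok_loud = validate_flags("portscan", ["active", "loud"])
ok_both = validate_flags("example_module", ["loud", "invasive"])
```

A module flagged both loud and invasive passes, while one flagged safe plus either of the others is rejected.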

Preset Renames

  • web-basic.yml -> web.yml, web-thorough.yml -> web-heavy.yml
  • spider-intense.yml -> spider-heavy.yml, baddns-intense.yml -> baddns-heavy.yml
  • nuclei-intense.yml -> nuclei-heavy.yml
  • lightfuzz-medium.yml -> lightfuzz.yml, lightfuzz-superheavy.yml -> lightfuzz-max.yml
  • Detailed feature tables in lightfuzz preset descriptions
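
For anyone updating scripts that reference the old names, the renames listed above can be captured in a lookup table. The mapping comes from this summary; the helper `migrate_preset_name` is hypothetical and not something BBOT ships.

```python
# Old preset name -> new preset name, as listed in the summary above.
PRESET_RENAMES = {
    "web-basic": "web",
    "web-thorough": "web-heavy",
    "spider-intense": "spider-heavy",
    "baddns-intense": "baddns-heavy",
    "nuclei-intense": "nuclei-heavy",
    "lightfuzz-medium": "lightfuzz",
    "lightfuzz-superheavy": "lightfuzz-max",
}


def migrate_preset_name(name):
    """Hypothetical helper: return the new name, or the input if it was not renamed."""
    return PRESET_RENAMES.get(name, name)


migrated = [migrate_preset_name(n) for n in ("web-basic", "lightfuzz-superheavy", "subdomain-enum")]
```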

Deadly Gate -> Console Warnings

  • Removed --allow-deadly CLI argument
  • Added pre-scan console warnings for loud/invasive modules

Tag Cleanup (high-cardinality dynamic tags)

  • Removed ip-{ip} tags — IP data in resolved_hosts attribute
  • Removed http-title-{title} tags — http_title as a proper event attribute
  • Kept bounded tags: status-{code}, distance-{n}, extension-{ext}, {rdtype}-record
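
The distinction between removed and kept tags can be sketched as a prefix filter. The two prefixes come from the list above; the function name and the regex are illustrative assumptions rather than the code in this branch.

```python
import re

# Prefixes of the removed tag families; their suffixes are unbounded
# (any IP address, any page title).
HIGH_CARDINALITY = re.compile(r"^(ip-|http-title-)")


def drop_high_cardinality(tags):
    """Hypothetical helper: keep only tags outside the removed families."""
    return {tag for tag in tags if not HIGH_CARDINALITY.match(tag)}


kept = drop_high_cardinality(
    {"ip-104.18.26.217", "http-title-login", "status-200", "distance-1", "a-record"}
)
```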

Fix: Docker-based test hangs (pre-existing bug)

  • The mongo output module was the only output module missing a cleanup() method — AsyncMongoClient was never closed, leaving 5+ pymongo background tasks (kill_cursors, server_monitor, server_rtt, poll_cancellation) orphaned on the session-scoped event loop
  • The mongo test called client.close() without await — the returned coroutine was silently discarded, so the client was never actually closed
  • Fixed blocking time.sleep() in async context in mongo and elastic tests
  • Fixed all Docker tests (docker stop) to await process completion before returning
  • The bug predates this branch, but the flag/preset renames here changed the alphabetical test collection order, so the heavier Docker-based tests (mysql, rabbitmq) now run right after the leaking mongo test, where the orphaned tasks blocked them
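
The two async pitfalls named above (an unawaited close and a blocking sleep) can be reproduced with a stand-in client. `FakeAsyncClient` is hypothetical; it only mimics a client whose `close()` is a coroutine and does not use pymongo.

```python
import asyncio


class FakeAsyncClient:
    """Hypothetical stand-in for an async client with a coroutine close()."""

    def __init__(self):
        self.closed = False

    async def close(self):
        await asyncio.sleep(0)  # yield to the loop, as real cleanup would
        self.closed = True


async def demo():
    leaky = FakeAsyncClient()
    coro = leaky.close()  # bug pattern: coroutine object created, never awaited
    coro.close()          # discard it explicitly so this demo emits no RuntimeWarning

    fixed = FakeAsyncClient()
    await fixed.close()   # correct: the cleanup body actually runs

    await asyncio.sleep(0.01)  # non-blocking wait, in place of time.sleep()
    return leaky.closed, fixed.closed


leaky_closed, fixed_closed = asyncio.run(demo())
```

The same reasoning applies to subprocesses: a process started with `asyncio.create_subprocess_exec` is only known to have finished after `await process.wait()` returns.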

Tests and Docs

  • Updated all test assertions for URL dict data, simplified cloud tags, new flag names
  • Regenerated docs tables, updated README and scanning docs

@liquidsec liquidsec changed the title Preset naming standardization Preset naming standardization / tag cleanup Mar 23, 2026
Contributor

github-actions bot commented Mar 23, 2026

Performance Benchmark Report

Failed to generate detailed benchmark comparison

The benchmark comparison failed to run. This might be because:

  • Benchmark tests don't exist on the base branch yet
  • Dependencies are missing
  • Test execution failed

Please check the workflow logs for details.

Benchmark artifacts may be available for download from the workflow run.

source_domain was silently failing on both azure_tenant and oauth
due to __slots__. Adding it properly enables the cross-module
domain context handoff that oauth relies on.

codecov bot commented Mar 24, 2026

Codecov Report

❌ Patch coverage is 91.94079% with 49 lines in your changes missing coverage. Please review.
✅ Project coverage is 91%. Comparing base (3a6f2ec) to head (79aac59).

Files with missing lines                                 Patch %   Lines
bbot/cli.py                                              0%        12 Missing ⚠️
...test_step_2/module_tests/test_module_cloudcheck.py    73%       6 Missing ⚠️
bbot/core/event/base.py                                  92%       5 Missing ⚠️
bbot/modules/output/asset_inventory.py                   20%       4 Missing ⚠️
bbot/modules/nuclei.py                                   50%       3 Missing ⚠️
bbot/modules/trufflehog.py                               34%       2 Missing ⚠️
bbot/modules/url_manipulation.py                         50%       2 Missing ⚠️
bbot/core/event/helpers.py                               80%       1 Missing ⚠️
bbot/modules/baddns_direct.py                            67%       1 Missing ⚠️
bbot/modules/bypass403.py                                67%       1 Missing ⚠️
... and 12 more
Additional details and impacted files
@@          Coverage Diff          @@
##             3.0   #2986   +/-   ##
=====================================
- Coverage     91%     91%   -0%     
=====================================
  Files        436     436           
  Lines      36960   37038   +78     
=====================================
+ Hits       33587   33640   +53     
- Misses      3373    3398   +25     

Add a .url property on BaseEvent that returns the URL string for any
event type via parsed_url.geturl(). Works uniformly across URL,
URL_UNVERIFIED, HTTP_RESPONSE, FINDING, WEB_PARAMETER, TECHNOLOGY,
STORAGE_BUCKET, etc. Returns empty string for non-URL events.

Also fix WEB_PARAMETER.sanitize_data to call super() so parsed_url
gets set (was silently broken).

Replace all event.data["url"], event.data.get("url"), and
type-checking patterns across 34 files to use event.url instead.

class URL_UNVERIFIED(BaseEvent):
    _status_code_regex = re.compile(r"^status-(\d{1,3})$")

    __slots__ = ["_http_title"]
Collaborator

we need to be careful not to override the base class

Contributor Author

slots with inheritance are additive, not overriding. Shouldn't cause any problems. This may go away anyway though.
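
The additive behavior can be checked with a short sketch. `Base` and `Child` are hypothetical classes; the slot names are borrowed from this thread for readability only.

```python
class Base:
    __slots__ = ("data", "source_domain")


class Child(Base):
    __slots__ = ("_http_title",)  # adds to, rather than replaces, Base's slots


def all_slots(cls):
    """Collect the slot names declared along the MRO, base classes first."""
    names = []
    for klass in reversed(cls.__mro__):
        names.extend(getattr(klass, "__slots__", ()))
    return names


child = Child()
child.data = {"url": "https://example.com/"}  # slot inherited from Base
child._http_title = "Example"                 # slot added by Child
child_slots = all_slots(Child)
```

The one caveat is a subclass that declares no `__slots__` at all: it regains a per-instance `__dict__`, which gives up the memory savings.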

Comment on lines +174 to +175
# Cross-module communication
"source_domain",
Collaborator

what is this for?

Contributor Author

It's for cross-module communication between azure_tenant and oauth. When azure_tenant discovers a federated login URL, it stamps source_domain on the event so that oauth knows which original domain the URL was discovered from (since the URL itself is on a different domain like login.microsoftonline.com). The oauth module then uses it for scope checks and finding descriptions.
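
A sketch of why the slot declaration matters for this handoff. The classes and `stamp_source_domain` are hypothetical, and the assumption that the original failure was a swallowed AttributeError is mine, not something stated in the PR.

```python
class EventWithoutSlot:
    __slots__ = ("data",)


class EventWithSlot:
    __slots__ = ("data", "source_domain")


def stamp_source_domain(event, domain):
    """Hypothetical producer step: return True if the stamp was stored."""
    try:
        event.source_domain = domain
    except AttributeError:
        # A slotted class cannot store an attribute it did not declare.
        return False
    return True


before = EventWithoutSlot()
after = EventWithSlot()
stored_before = stamp_source_domain(before, "example.com")
stored_after = stamp_source_domain(after, "example.com")

# Hypothetical consumer step: read the stamp with a safe default.
seen_by_consumer = getattr(after, "source_domain", None)
```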

- Rename "noisy" flag to "loud" across all modules, cli, tests, docs
- Restore "safe" flag — explicitly added to every module that isn't
  loud or invasive (130 modules)
- Add test: every scan module must have safe, loud, and/or invasive;
  safe is mutually exclusive with loud/invasive
- Regenerate docs tables
Each preset YAML and doc section now explicitly lists what submodules,
companion modules, POST behavior, WAF handling, and other settings
are enabled, making the progression from light→default→heavy→max clear.
@TheTechromancer
Collaborator

One last thing, regarding cloud providers:

  • cloud provider tags currently serve two purposes: 1) trying to capture detail/nuance about cloud provider data 2) easy module filtering and user grepping
  • they've become a bit cluttered, so to fix this, we can add a JSON field host_metadata which contains the details for each resolved host and its cloud providers (plus room for any future fields like ASN, org, whois, etc.).
  • this will free us up to clean the tags, so instead of cloud, cloud-microsoft, microsoft-ip, and microsoft-domain, we'd just have cloud and microsoft. the exact ip/domain details can be stored in host_metadata:

The gist of host_metadata is that it's a dict of {host: metadata} where metadata is a dictionary where we can put any current and future information about that specific host.

"host_metadata": {
    "spacex.com": {
        "whois": {
            "org": "SPACEX INDUSTRIES"
        }
    },
    "104.18.26.217": {
        "cloud_providers": ["microsoft", "github"],
        "asns": ["AS1234"],
        "orgs": ["Microsoft"]
    }
}

…vent

The monkeypatch was setting parsed_url to a file:// URL, which has no
hostname, causing a make_ip_type(None) crash in trufflehog. Also harden
DictHostEvent._host() to tolerate host-less parsed_url schemes.
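
The host-less case is easy to reproduce with the standard library. The helper below is a hypothetical sketch of a tolerant host lookup, not the actual DictHostEvent._host() implementation.

```python
from urllib.parse import urlparse


def host_from_parsed_url(parsed_url):
    """Hypothetical helper: return the hostname, or None when the URL has none."""
    hostname = getattr(parsed_url, "hostname", None)
    return hostname or None


# file:// URLs have an empty netloc, so .hostname is None.
file_host = host_from_parsed_url(urlparse("file:///tmp/secrets.txt"))
# For http(s) URLs, .hostname is lowercased and excludes the port.
web_host = host_from_parsed_url(urlparse("https://Example.com:8443/path"))
```

Callers can then skip IP/host conversion entirely when the result is None instead of passing None onward.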
@liquidsec
Contributor Author

will close #2771

@liquidsec liquidsec mentioned this pull request Mar 25, 2026
- Add cleanup() to mongo output module (only output module missing one)
- Await client.aclose() instead of unawaited client.close() in mongo test
- Replace blocking time.sleep() with async sleep in mongo and elastic tests
- Await docker stop process completion in mongo, rabbitmq, kafka, elastic, nats tests