Fix gowitness URL correlation failures #2985

Merged
TheTechromancer merged 25 commits into 3.0 from mtr/gowitness_fix
Mar 24, 2026

@aconite33
Contributor

Summary

  • Fixes #2984 (Gowitness fails to correlate screenshots/network logs/technologies due to URL mismatch).
  • PR #2974 (Fix gowitness bug) partially addressed the gowitness KeyError crashes, but it missed the screenshot section and did not fix the root cause: a URL mismatch between the input and the gowitness DB.
  • Gowitness may record URLs with a different scheme and/or port than the input (e.g. http://host:443/ instead of https://host/, or http://host:443/ instead of http://host/ after a redirect). This causes event_dict lookups to fail.
  • Adds _url_key(), which produces a scheme-and-port-agnostic key (hostname + path) for correlation. It is used both when building event_dict and when looking up URLs from the DB.
  • Uses .get() with a graceful fallback for the screenshot section (missed by PR #2974).
  • Keeps a separate stdin_urls list to preserve the original URLs sent to gowitness, while event_dict uses the normalized keys.
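
A minimal sketch of the hostname + path key described above. This is not the code from the PR: `url_key` is a hypothetical stand-in for `_url_key()`, and for brevity `event_dict` here maps keys to URL strings rather than to events.

```python
from urllib.parse import urlsplit


def url_key(url: str) -> str:
    """Hypothetical stand-in for _url_key(): lowercase hostname + path.

    Scheme and port are deliberately ignored.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path or "/"
    return f"{host}{path}"


# Original URLs as they would be sent to gowitness on stdin.
stdin_urls = ["https://host/", "http://other.example/login"]

# Lookup table keyed by the loose key instead of the raw URL.
event_dict = {url_key(u): u for u in stdin_urls}

# A URL as gowitness might record it, with a different scheme and port.
recorded = "http://host:443/"

# .get() returns None instead of raising KeyError when nothing matches.
match = event_dict.get(url_key(recorded))
print(match)
```

Because both `https://host/` and `http://host:443/` reduce to the same key, the lookup succeeds even though the raw strings differ.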

Test plan

  • All 4 existing gowitness tests pass
  • Verified against live scan with targets that trigger scheme/port mismatches (CDN-fronted hosts redirecting HTTP→HTTPS)

liquidsec and others added 19 commits March 2, 2026 14:38
- Add submodule auto-filter: disable submodules whose max severity/confidence
  is below configured thresholds (avoids running expensive submodules for nothing)
- Create baddns.yml base preset (CNAME, MX, TXT) and baddns-heavy.yml (all submodules)
- Rename spider-intense→spider-heavy, baddns-intense→baddns-heavy
- Fix baddns_zone default min_severity to INFORMATIONAL (NSEC/zonetransfer need it)
- Update kitchen-sink.yml, remove stale enable_references v1.x config
- Fix baddns_zone NSEC test (bad.dns→bad.com for tldextract compatibility)
- Fix baddns_direct test (updated signature matcher for baddns 2.0)
- Update all preset warning messages and docs references
…r-version-compat

# Conflicts:
#	bbot/modules/baddns_direct.py
#	bbot/modules/badsecrets.py
#	docs/modules/lightfuzz.md
#	docs/scanning/presets_list.md
Reset the global asndb_client after cleanup so subsequent
ASNDB() calls create a fresh client instead of returning a closed one.
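
The reset described in this commit message can be sketched roughly as follows. All names here (`FakeClient`, `get_client`, `cleanup`, `_client`) are illustrative assumptions, not the actual ASNDB code in bbot.

```python
class FakeClient:
    """Illustrative stand-in for a network client that can be closed."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


# Module-level shared client, created lazily.
_client = None


def get_client():
    """Return the shared client, creating a fresh one if none exists."""
    global _client
    if _client is None:
        _client = FakeClient()
    return _client


def cleanup():
    """Close the shared client and reset the global reference.

    Without the reset, later get_client() calls would keep returning
    the closed client.
    """
    global _client
    if _client is not None:
        _client.close()
        _client = None
```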
…major-version-compat

baddns 2.0.0 / badsecrets 1.0.0 compatibility
…2974. This prevents the crash and logs a warning instead of aborting the entire batch.
The radixtarget 4.x migration introduced a Rust-backed PyRadixTarget
type that cannot be pickled. Since the web engine passes BBOTTarget
(which contains RadixTarget) to a subprocess via SpawnProcess, every
module that makes HTTP requests was failing with:
  "cannot pickle 'builtins.PyRadixTarget' object"

This affected telerik, reflected_parameters, azure_tenant, emailformat,
dnsbrute_mutations, and many others.

Fix: add __getstate__/__setstate__ to BaseTarget so the RadixTarget is
excluded from pickling and reconstructed from event_seeds on the other
side.
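
The general pattern behind that fix can be sketched like this, assuming a simplified `Target` class (hypothetical, not bbot's BaseTarget) whose `_index` attribute stands in for the unpicklable RadixTarget:

```python
import pickle
import threading


class Target:
    """Simplified illustration; not the real BaseTarget."""

    def __init__(self, seeds):
        self.seeds = list(seeds)
        self._index = self._build_index()

    def _build_index(self):
        # The lock makes this attribute unpicklable, standing in for
        # the Rust-backed object mentioned in the commit message.
        return {"hosts": set(self.seeds), "lock": threading.Lock()}

    def __getstate__(self):
        # Copy the instance dict and drop the unpicklable member.
        state = self.__dict__.copy()
        state.pop("_index", None)
        return state

    def __setstate__(self, state):
        # Restore plain attributes, then rebuild the index from seeds.
        self.__dict__.update(state)
        self._index = self._build_index()


original = Target(["example.com", "example.org"])
clone = pickle.loads(pickle.dumps(original))
```

The clone carries the same seeds and a freshly rebuilt index, so nothing unpicklable ever crosses the process boundary.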

Additionally, fix gowitness handle_batch URL correlation:
- Normalize event_dict keys with clean_url() so they match the
  normalized DB URLs during lookup (fixes the root cause of KeyError
  crashes like the one partially addressed by PR #2974)
- Use .get() instead of bare dict access for the screenshot section,
  which PR #2974 missed (it only fixed network_logs and technologies)

Gowitness may change both the scheme and port of a URL it records in its
database (e.g. recording http://host:443/ for an input of http://host/
when the server redirects from port 80 to HTTPS on port 443). This
caused KeyError crashes and later correlation warnings.

Use hostname + path as the event_dict key, ignoring scheme and port
entirely, so lookups succeed regardless of how gowitness transforms the
URL. Also use .get() with graceful fallback for any remaining edge cases.

Remove our __getstate__/__setstate__ from BaseTarget; the upstream
fix-target-pickle branch handles this more cleanly (explicit state,
direct acl_mode reading, ScanBlacklist override, and a test).
@github-actions
Contributor

github-actions bot commented Mar 23, 2026

📊 Performance Benchmark Report

Comparing 3.0 (baseline) vs mtr/gowitness_fix (current)

📈 Detailed Results (All Benchmarks)

📋 Complete results for all benchmarks - includes both significant and insignificant changes

| 🧪 Test Name | 📏 Base | 📏 Current | 📈 Change | 🎯 Status |
|---|---|---|---|---|
| Bloom Filter Dns Mutation Tracking Performance | 4.26ms | 4.24ms | -0.6% | |
| Bloom Filter Large Scale Dns Brute Force | 17.46ms | 17.38ms | -0.4% | |
| Large Closest Match Lookup | 352.11ms | 347.91ms | -1.2% | |
| Realistic Closest Match Workload | 190.17ms | 185.54ms | -2.4% | |
| Event Memory Medium Scan | 1769 B/event | 1768 B/event | -0.0% | |
| Event Memory Large Scan | 1757 B/event | 1757 B/event | +0.0% | |
| Event Validation Full Scan Startup Small Batch | 406.31ms | 406.73ms | +0.1% | |
| Event Validation Full Scan Startup Large Batch | 583.40ms | 577.97ms | -0.9% | |
| Make Event Autodetection Small | 30.72ms | 30.50ms | -0.7% | |
| Make Event Autodetection Large | 312.72ms | 312.11ms | -0.2% | |
| Make Event Explicit Types | 13.67ms | 13.73ms | +0.5% | |
| Excavate Single Thread Small | 3.953s | 3.988s | +0.9% | |
| Excavate Single Thread Large | 9.730s | 9.457s | -2.8% | |
| Excavate Parallel Tasks Small | 4.133s | 4.131s | -0.0% | |
| Excavate Parallel Tasks Large | 7.246s | 7.196s | -0.7% | |
| Is Ip Performance | 3.15ms | 3.15ms | -0.2% | |
| Make Ip Type Performance | 11.52ms | 11.42ms | -0.8% | |
| Mixed Ip Operations | 4.51ms | 4.49ms | -0.4% | |
| Typical Queue Shuffle | 62.45µs | 60.40µs | -3.3% | |
| Priority Queue Shuffle | 703.54µs | 685.87µs | -2.5% | |

🎯 Performance Summary

No significant performance changes detected (all changes <10%)


🐍 Python Version 3.11.15

@codecov

codecov bot commented Mar 23, 2026

Codecov Report

❌ Patch coverage is 89.28571% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 91%. Comparing base (de31418) to head (05e5393).
⚠️ Report is 26 commits behind head on 3.0.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| bbot/modules/gowitness.py | 82% | 6 Missing ⚠️ |
Additional details and impacted files
@@          Coverage Diff          @@
##             3.0   #2985   +/-   ##
=====================================
- Coverage     91%     91%   -0%     
=====================================
  Files        436     436           
  Lines      36918   36959   +41     
=====================================
+ Hits       33567   33587   +20     
- Misses      3351    3372   +21     


Use tiered lookup: exact raw URL match first, then fall back to the
loose hostname+path key. This correctly handles both multi-port URLs
(e.g. :80 and :443 on the same host) and gowitness scheme/port
transformations from redirects.
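
The tiered lookup described above might look roughly like the sketch below. The names `loose_key` and `resolve_parent` are illustrative assumptions rather than the PR's actual helpers, and the tables map to URL strings instead of events to keep the example short.

```python
from urllib.parse import urlsplit


def loose_key(url):
    """Hostname + path, ignoring scheme and port (illustrative helper)."""
    parts = urlsplit(url)
    return f"{(parts.hostname or '').lower()}{parts.path or '/'}"


def resolve_parent(recorded_url, exact, loose):
    """Try the exact raw URL first, then the loose hostname+path key.

    Returns None when neither tier matches, so the caller can log a
    warning instead of crashing with a KeyError.
    """
    parent = exact.get(recorded_url)
    if parent is None:
        parent = loose.get(loose_key(recorded_url))
    return parent


inputs = ["http://host:8888/", "https://host:9999/", "http://cdn.example/"]

# Tier 1: raw URLs, which keep multi-port targets distinct.
exact = {u: u for u in inputs}

# Tier 2: loose keys; on a collision the first input wins (an assumption).
loose = {}
for u in inputs:
    loose.setdefault(loose_key(u), u)
```

With this layout, `https://host:9999/` resolves through the exact tier and is not confused with `http://host:8888/`, while a recorded `http://cdn.example:443/` falls through to the loose tier and still finds `http://cdn.example/`.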
Upgrade pinned gowitness version from 3.0.5 to 3.1.1.

Replace the unit test for _resolve_parent with a real integration test
that runs gowitness against two ports (HTTP :8888 and HTTPS :9999) and
verifies both get correctly correlated WEBSCREENSHOT events with the
right parent attribution.
@TheTechromancer TheTechromancer merged commit b7c0604 into 3.0 Mar 24, 2026
16 checks passed

3 participants