[7.x] Enhanced compact output for taint issues: include chain summary #11795

@alies-dev

Problems

1. Compact format redundantly prefixes every line with ERROR

The severity prefix carries no information when the issue type already encodes it. TaintedSql is always an error; PossiblyInvalidArgument is self-evidently not info. Removing ERROR from CompactReport::create() makes the output shorter and no less informative. Non-error findings could retain their severity prefix (INFO, WARNING) since those are not the default and carry real signal.

2. Taint findings expose no chain — making triage impossible without re-running Psalm

Current compact output for taint issues:

app/Http/Controllers/Auth/LoginController.php:45:35 TaintedHeader: Detected tainted header
app/Http/Controllers/Admin/ReportController.php:94:23 TaintedSql: Detected tainted SQL

To determine whether a finding is a real vulnerability or a false positive, you must either re-run without --output-format to get the full default trace, or open each flagged file and trace the data flow manually.

Proposed changes

1. Drop ERROR prefix from compact format

// Before
ERROR app/Http/Controllers/Auth/LoginController.php:45:35 TaintedHeader: Detected tainted header

// After
app/Http/Controllers/Auth/LoginController.php:45:35 TaintedHeader: Detected tainted header

2. Add source→sink chain as a second line for taint findings

When taint_trace !== null, emit a second indented line showing the data flow in actual PHP expressions. Non-taint findings remain single-line.

Line 1 — location and type:

{file}:{line}:{col} {TaintType} [{N}|direct]

Line 2 — source→sink chain (app-code only, stubs stripped):

  {source_expr}[@{line}|@[{OtherFile.php}:{line}]] → [{$var} →]* {SinkClass}::{method}()@{line}

Examples

Direct — classifiable immediately, no file access needed:

app/Http/Controllers/Auth/LoginController.php:45:35 TaintedHeader [direct]
  $request->input('redirect_url')@45 → Redirector::to()@45

Via local variable — assignment obvious from name:

app/Http/Controllers/PaymentController.php:34:35 TaintedHeader [2]
  $request->input('redirect_url')@28 → $nextStepUrl → Redirector::to()@34

Cross-function, same file — line jump implies the call:

app/Http/Controllers/Admin/NotificationController.php:133:24 TaintedCallable [2]
  $request->input('notification')@104 → $notificationFqcn → new $notificationFqcn@133

SQL column injection — stub hops (explode, array-destructuring) stripped:

app/Http/Controllers/Admin/ArticleController.php:94:23 TaintedSql [3]
  $request->input('sort_by')@82 → $sortBy → $sortByColumn → Builder::orderBy()@94

Cross-file — source in a different controller, visible without grep:

app/Services/ReferralPerformanceReportBuilder.php:94:23 TaintedSql [2]
  $request->input('sort_by')@[AdminReferralController.php:50] → $this->sortBy → Builder::orderBy()@94

DB-stored source — false positive detectable without opening any file:

app/Notifications/WelcomeCoach.php:34:18 TaintedHeader [direct]
  $member->email → MailMessage::cc()@34

SSRF:

app/Http/Controllers/Admin/PreviewController.php:21:30 TaintedSSRF [direct]
  $request->input('url')@17 → Http::get()@21

Chain design decisions

Use actual PHP expressions — no invented notation

The taint_trace[1]->snippet already contains the real PHP call that introduces taint (e.g. $request->input('sort_by')). This is used directly — no abbreviation or taxonomy layer needed. The PHP expression itself tells you everything:

  • $request->input(...) → live request parameter, investigate immediately
  • $member->email → model attribute loaded from DB, likely false positive
  • session(...) → session-stored value, context-dependent

Source expression and @line

  • Extracted from taint_trace[1]->snippet (first app-code node), not from taint_trace[0]->label (the unlocated stub)
  • @line included when source is in the same file as the sink: $request->input('sort_by')@82
  • @[OtherFile.php:line] when source is in a different file: $request->input('sort_by')@[AdminReferralController.php:50]
  • @line omitted when source location is unknown (e.g. model attribute with no static call site)
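
The three suffix rules above could be sketched roughly as follows. This is only an illustration: formatSourceNode() and its parameters are hypothetical names, not existing Psalm API, and the source location is assumed to have already been pulled out of the trace node.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of Psalm): append the location suffix
 * to a source expression. $sourceFile and $sourceLine are null when the
 * source location is unknown.
 */
function formatSourceNode(string $expr, ?string $sourceFile, ?int $sourceLine, string $sinkFile): string
{
    if ($sourceFile === null || $sourceLine === null) {
        return $expr; // unknown location: no suffix at all
    }
    if ($sourceFile === $sinkFile) {
        return $expr . '@' . $sourceLine; // same file as the sink
    }
    // Different file: short file name plus line, in brackets.
    return $expr . '@[' . basename($sourceFile) . ':' . $sourceLine . ']';
}

echo formatSourceNode(
    "\$request->input('sort_by')",
    'AdminReferralController.php',
    50,
    'app/Services/ReferralPerformanceReportBuilder.php'
), "\n";
```

Run as-is, this prints the cross-file form used in the examples: $request->input('sort_by')@[AdminReferralController.php:50].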

Intermediate nodes — variable names only, no line numbers

Line numbers on intermediate hops add noise without triage value. The variable name is the signal: $sortBy → $sortByColumn shows simple manipulation; $validated or $allowedValue in the chain hints at runtime validation or escaping that Psalm couldn't model. Only source and sink carry @line.

Cross-file marker

When taint crosses a file boundary, prefix the first node in the new file with [ShortFileName.php]. Subsequent nodes in the same file need no prefix.

Sink shown as ShortClass::method()@line

Not just the tainted variable name. Builder::orderBy() vs Builder::whereRaw() have different severity; the method name makes this visible without reading code.
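
As a rough sketch of the sink rendering, assuming the fully qualified class name and method have already been extracted from the last trace step: formatSinkNode() is a hypothetical name, and the Illuminate\Database\Query\Builder namespace in the usage line is an assumption, since the examples above only show the short name Builder.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of Psalm): render a sink as
 * ShortClass::method()@line from a fully qualified class name.
 */
function formatSinkNode(string $fqcn, string $method, int $line): string
{
    $pos = strrpos($fqcn, '\\');
    // Keep only the part after the last namespace separator, if any.
    $short = $pos === false ? $fqcn : substr($fqcn, $pos + 1);

    return $short . '::' . $method . '()@' . $line;
}

echo formatSinkNode('Illuminate\\Database\\Query\\Builder', 'orderBy', 94), "\n";
```

This covers method-call sinks only; a sink such as new $notificationFqcn@133 from the examples would need its own branch.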

Stripping rules

Strip from the chain:

  • Nodes where file_path contains /vendor/ — eliminates enormous framework/queue serialization chains
  • Nodes where line_from === 0 — stubs (same logic SarifReport already uses)
  • Psalm-synthetic internal labels: variable-use, arrayvalue-fetch, coalesce, concat — graph plumbing with no PHP equivalent
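
A minimal sketch of these three stripping rules, assuming each trace node is available as a plain array with file_path, line_from and label keys (mirroring the field names used in this issue; the real trace nodes are objects, and isDisplayableNode() is a hypothetical name):

```php
<?php
declare(strict_types=1);

// Synthetic graph labels listed in the stripping rules above.
const SYNTHETIC_LABELS = ['variable-use', 'arrayvalue-fetch', 'coalesce', 'concat'];

/**
 * Hypothetical predicate: true when a trace node should appear in the chain.
 * Requires PHP 8.0+ for str_contains().
 */
function isDisplayableNode(array $node): bool
{
    if (str_contains($node['file_path'], '/vendor/')) {
        return false; // framework or library code
    }
    if ($node['line_from'] === 0) {
        return false; // stub without a real location
    }
    return !in_array($node['label'], SYNTHETIC_LABELS, true);
}

// Illustrative trace; the paths and labels are made up for the example.
$trace = [
    ['file_path' => '/app/vendor/laravel/framework/src/Http/Request.php', 'line_from' => 12, 'label' => 'input'],
    ['file_path' => '/app/stubs/Request.phpstub', 'line_from' => 0, 'label' => 'input'],
    ['file_path' => '/app/app/Http/Controllers/Admin/ArticleController.php', 'line_from' => 82, 'label' => 'coalesce'],
    ['file_path' => '/app/app/Http/Controllers/Admin/ArticleController.php', 'line_from' => 82, 'label' => '$sortBy'],
];

$displayed = array_values(array_filter($trace, 'isDisplayableNode'));
echo count($displayed), "\n";
```

Only the last node survives here: the first is under /vendor/, the second has line_from 0, and the third carries a synthetic label.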

Hop count

Count of displayed app-code segments minus 1. direct when the chain collapses to source → sink with nothing between.
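
The hop marker rule is small enough to sketch directly. hopMarker() is a hypothetical name, and the input is assumed to be the number of nodes left after stripping, with source and sink included.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: turn the number of displayed nodes (source and
 * sink included) into the bracketed marker shown on line 1.
 */
function hopMarker(int $displayedNodes): string
{
    if ($displayedNodes < 2) {
        throw new InvalidArgumentException('A chain needs at least a source and a sink.');
    }
    $hops = $displayedNodes - 1;

    // Exactly one hop means source → sink with nothing in between.
    return $hops === 1 ? '[direct]' : '[' . $hops . ']';
}

echo hopMarker(2), ' ', hopMarker(4), "\n";
```

With the SQL column injection example above (source, $sortBy, $sortByColumn, sink) the four displayed nodes give [3].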

Why this matters

Token efficiency for AI-assisted triage. Full JSON output for a 7-step taint chain is ~800 tokens. The two-line format above is ~35 tokens — over 20× smaller. For codebases with 30–200 taint findings, this is the difference between fitting a full security scan in a context window or not.

False positive resolution without file access.

| Chain signal | Classification |
| --- | --- |
| $request->input(...) + [direct] | Real: investigate immediately |
| $request->input(...) + [N] + no validation-named var | Real: verify chain |
| $request->input(...) + [N] + $validated/$allowed* in chain | Possible false positive |
| $member->prop or $model->attr as source | Likely false positive (DB-stored) |
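
The classification rules above could be approximated by simple pattern matching on the chain text. The sketch below is a heuristic only: classifyChain() and its regular expressions are assumptions made for illustration, and any property fetch (including $this->prop) would fall into the DB-stored bucket, which is exactly the fragility this kind of matching has.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical heuristic: classify a chain from its source expression
 * and the names of its intermediate variables. Requires PHP 8.0+.
 */
function classifyChain(string $source, array $intermediates): string
{
    if (!str_starts_with($source, '$request->input(')) {
        // Plain property fetch such as $member->email.
        if (preg_match('/^\$\w+->\w+$/', $source) === 1) {
            return 'Likely false positive (DB-stored)';
        }
        return 'Unclassified';
    }
    if ($intermediates === []) {
        return 'Real: investigate immediately';
    }
    foreach ($intermediates as $var) {
        // $validated or anything starting with $allowed.
        if (preg_match('/^\$(?:validated|allowed\w*)$/', $var) === 1) {
            return 'Possible false positive';
        }
    }
    return 'Real: verify chain';
}

echo classifyChain("\$request->input('sort_by')", ['$sortBy', '$sortByColumn']), "\n";
```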

Token comparison

| Format | Tokens/finding | Triage without opening files |
| --- | --- | --- |
| Current compact | ~20 | ~10% |
| This two-line format | ~35 | ~90% |
| Full JSON | ~800 | 100% |

Implementation notes

Change 1 (CompactReport::create()): remove the $severity . ' ' prefix for REPORT_ERROR findings. Retain severity prefix only for non-error findings.
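
A minimal sketch of what Change 1 amounts to, written as a standalone function rather than as a patch to CompactReport::create(). compactLine() is a hypothetical name, and lowercase severity strings ('error', 'info', 'warning') are an assumption.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: build the first compact line, dropping the
 * severity prefix for errors and keeping it for everything else.
 */
function compactLine(string $severity, string $location, string $type, string $message): string
{
    $prefix = $severity === 'error' ? '' : strtoupper($severity) . ' ';

    return $prefix . $location . ' ' . $type . ': ' . $message;
}

echo compactLine(
    'error',
    'app/Http/Controllers/Auth/LoginController.php:45:35',
    'TaintedHeader',
    'Detected tainted header'
), "\n";
```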

Change 2 (CompactReport::create()): when $issue_data->taint_trace !== null:

  1. Source: extract call expression from taint_trace[1]->snippet; append @line or @[File.php:line] depending on whether source file matches sink file
  2. Intermediates: iterate taint_trace, skip vendor/stub/synthetic nodes, collect $variableName; inject [ShortFile.php] prefix on file boundary transitions
  3. Sink: short class name + ::method() from last trace step label + @line
  4. Hop count: count of displayed nodes minus 1; emit direct when the result is 1 (source and sink only)
  5. Emit two lines: {file}:{line}:{col} {TaintType} [{N}|direct]\n {chain}\n
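
Step 5 could look roughly like this once the earlier steps have produced their pieces. renderTaintFinding() is a hypothetical name; it assumes the location, the hop marker and the already-formatted chain nodes are passed in as strings, and it uses the two-space indent shown in the examples.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: assemble the two-line compact output for one
 * taint finding from already-formatted pieces.
 *
 * @param string[] $chainNodes source, intermediates and sink, in order
 */
function renderTaintFinding(string $location, string $taintType, string $hopMarker, array $chainNodes): string
{
    return sprintf(
        "%s %s [%s]\n  %s\n",
        $location,
        $taintType,
        $hopMarker,
        implode(' → ', $chainNodes)
    );
}

echo renderTaintFinding(
    'app/Http/Controllers/PaymentController.php:34:35',
    'TaintedHeader',
    '2',
    ["\$request->input('redirect_url')@28", '$nextStepUrl', 'Redirector::to()@34']
);
```

The call reproduces the "via local variable" example from the Examples section.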

Future: entry_path_type for reliable source classification

Distinguishing $request->input() from $member->email by pattern-matching on the snippet is sufficient for most cases but fragile. Populating entry_path_type on the taint source node would enable a reliable taxonomy (request_input, db_attribute, session, http_response). However, the entry path is often unresolvable statically in framework-heavy stacks, where the full call chain goes index.php → Kernel → Router → Controller through vendor paths. Left as a future enhancement.
