[7.x] Enhanced compact output for taint issues: include chain summary

## Problems

**1. Compact format redundantly prefixes every line with `ERROR`**

The severity prefix carries no information when the issue type already encodes it: `TaintedSql` is always an error, and `PossiblyInvalidArgument` is self-evidently not info. Removing `ERROR` from the output of `CompactReport::create()` makes it shorter and no less informative. Non-error findings could retain their severity prefix (`INFO`, `WARNING`), since those are not the default and carry real signal.

**2. Taint findings expose no chain — making triage impossible without re-running Psalm**

Current compact output for taint issues:

```
app/Http/Controllers/Auth/LoginController.php:45:35 TaintedHeader: Detected tainted header
app/Http/Controllers/Admin/ReportController.php:94:23 TaintedSql: Detected tainted SQL
```

To determine if a finding is a real vulnerability or a false positive, you must either re-run without `--output-format` to get the full default trace, or open each flagged file and manually trace the data flow.

## Proposed changes

### 1. Drop `ERROR` prefix from compact format

```
// Before
ERROR app/Http/Controllers/Auth/LoginController.php:45:35 TaintedHeader: Detected tainted header

// After
app/Http/Controllers/Auth/LoginController.php:45:35 TaintedHeader: Detected tainted header
```

### 2. Add source→sink chain as a second line for taint findings

When `taint_trace !== null`, emit a second indented line showing the data flow in actual PHP expressions. Non-taint findings remain single-line.

**Line 1** — location and type:
```
{file}:{line}:{col} {TaintType} [{N}|direct]
```

**Line 2** — source→sink chain (app-code only, stubs stripped):
```
  {source_expr}[@{line}|@[{OtherFile.php}:{line}]] → [{$var} →]* {SinkClass}::{method}()@{line}
```

## Examples

**Direct — classifiable immediately, no file access needed:**
```
app/Http/Controllers/Auth/LoginController.php:45:35 TaintedHeader [direct]
  $request->input('redirect_url')@45 → Redirector::to()@45
```

**Via local variable — assignment obvious from name:**
```
app/Http/Controllers/PaymentController.php:34:35 TaintedHeader [2]
  $request->input('redirect_url')@28 → $nextStepUrl → Redirector::to()@34
```

**Cross-function, same file — line jump implies the call:**
```
app/Http/Controllers/Admin/NotificationController.php:133:24 TaintedCallable [2]
  $request->input('notification')@104 → $notificationFqcn → new $notificationFqcn@133
```

**SQL column injection — stub hops (explode, array-destructuring) stripped:**
```
app/Http/Controllers/Admin/ArticleController.php:94:23 TaintedSql [3]
  $request->input('sort_by')@82 → $sortBy → $sortByColumn → Builder::orderBy()@94
```

**Cross-file — source in a different controller, visible without grep:**
```
app/Services/ReferralPerformanceReportBuilder.php:94:23 TaintedSql [2]
  $request->input('sort_by')@[AdminReferralController.php:50] → $this->sortBy → Builder::orderBy()@94
```

**DB-stored source — false positive detectable without opening any file:**
```
app/Notifications/WelcomeCoach.php:34:18 TaintedHeader [direct]
  $member->email → MailMessage::cc()@34
```

**SSRF:**
```
app/Http/Controllers/Admin/PreviewController.php:21:30 TaintedSSRF [direct]
  $request->input('url')@17 → Http::get()@21
```

## Chain design decisions

### Use actual PHP expressions — no invented notation

The `taint_trace[1]->snippet` already contains the real PHP call that introduces taint (e.g. `$request->input('sort_by')`). This is used directly — no abbreviation or taxonomy layer needed. The PHP expression itself tells you everything:

- `$request->input(...)` → live request parameter, investigate immediately
- `$member->email` → model attribute loaded from DB, likely false positive
- `session(...)` → session-stored value, context-dependent

### Source expression and `@line`

- Extracted from `taint_trace[1]->snippet` (first app-code node), not from `taint_trace[0]->label` (the unlocated stub)
- `@line` included when source is in the same file as the sink: `$request->input('sort_by')@82`
- `@[OtherFile.php:line]` when source is in a different file: `$request->input('sort_by')@[AdminReferralController.php:50]`
- `@line` omitted when source location is unknown (e.g. model attribute with no static call site)
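
A minimal sketch of the suffix rules above, assuming the source node is available as a plain array whose keys mirror the `snippet`, `file_path`, and `line_from` fields, and that `snippet` already holds only the call expression. `formatSource()` is a hypothetical helper, not an existing Psalm function.

```php
<?php
// Sketch only: a plain array stands in for the trace node; formatSource() is hypothetical.

/**
 * @param array{snippet: string, file_path: ?string, line_from: int} $source
 */
function formatSource(array $source, string $sinkFilePath): string
{
    $expr = trim($source['snippet']);

    // Unknown location: emit the bare expression with no @line.
    if ($source['file_path'] === null || $source['line_from'] <= 0) {
        return $expr;
    }

    // Same file as the sink: the line number alone is enough.
    if ($source['file_path'] === $sinkFilePath) {
        return $expr . '@' . $source['line_from'];
    }

    // Different file: short file name plus line, in brackets.
    return $expr . '@[' . basename($source['file_path']) . ':' . $source['line_from'] . ']';
}

echo formatSource(
    [
        'snippet' => "\$request->input('sort_by')",
        'file_path' => '/app/Http/Controllers/AdminReferralController.php',
        'line_from' => 50,
    ],
    '/app/Services/ReferralPerformanceReportBuilder.php'
), "\n";
// $request->input('sort_by')@[AdminReferralController.php:50]
```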

### Intermediate nodes — variable names only, no line numbers

Line numbers on intermediate hops add noise without triage value. The variable name is the signal: `$sortBy → $sortByColumn` shows simple manipulation, while `$validated` or `$allowedValue` in the chain suggests a runtime validation step that Psalm couldn't model. Only source and sink carry `@line`.

### Cross-file marker

When taint crosses a file boundary, prefix the first node in the new file with `[ShortFileName.php]`. Subsequent nodes in the same file need no prefix.

### Sink shown as `ShortClass::method()@line`

Not just the tainted variable name. `Builder::orderBy()` vs `Builder::whereRaw()` have different severity; the method name makes this visible without reading code.
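
As an illustration, the short form could be derived as follows, assuming the sink is already available as a `Fully\Qualified\Class::method` string (how that string is obtained from the trace label is not covered here). `formatSink()` is a hypothetical helper name.

```php
<?php
// Sketch only: assumes the input is "Namespace\Class::method"; formatSink() is hypothetical.

function formatSink(string $qualifiedMethod, int $line): string
{
    // Keep everything after the last namespace separator, i.e. "Class::method".
    $pos = strrpos($qualifiedMethod, '\\');
    $short = $pos === false ? $qualifiedMethod : substr($qualifiedMethod, $pos + 1);

    return $short . '()@' . $line;
}

echo formatSink('Illuminate\Database\Query\Builder::orderBy', 94), "\n";
// Builder::orderBy()@94
```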

### Stripping rules

Strip from the chain:
- Nodes where `file_path` contains `/vendor/` — eliminates enormous framework/queue serialization chains
- Nodes where `line_from === 0` — stubs (same logic `SarifReport` already uses)
- Psalm-synthetic internal labels: `variable-use`, `arrayvalue-fetch`, `coalesce`, `concat` — graph plumbing with no PHP equivalent
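
The three rules can be expressed as a single predicate. This is a sketch under the assumption that each trace node is a plain array with `label`, `file_path`, and `line_from` keys; `isDisplayable()` and the sample trace (including the vendor path) are made up for illustration.

```php
<?php
// Sketch only: plain arrays stand in for trace nodes; isDisplayable() is hypothetical.

const SYNTHETIC_LABELS = ['variable-use', 'arrayvalue-fetch', 'coalesce', 'concat'];

/** @param array{label: string, file_path: string, line_from: int} $node */
function isDisplayable(array $node): bool
{
    // Vendor code: framework and queue serialization hops.
    if (str_contains($node['file_path'], '/vendor/')) {
        return false;
    }
    // Stubs carry no real location.
    if ($node['line_from'] === 0) {
        return false;
    }
    // Graph plumbing with no PHP equivalent.
    return !in_array($node['label'], SYNTHETIC_LABELS, true);
}

$controller = '/app/Http/Controllers/Admin/ArticleController.php';
$trace = [
    ['label' => '$sortBy', 'file_path' => $controller, 'line_from' => 82],
    ['label' => 'explode', 'file_path' => '', 'line_from' => 0],
    ['label' => 'arrayvalue-fetch', 'file_path' => $controller, 'line_from' => 83],
    ['label' => '$sortByColumn', 'file_path' => $controller, 'line_from' => 83],
    ['label' => '$job', 'file_path' => '/app/vendor/laravel/framework/src/Queue.php', 'line_from' => 12],
];

$kept = array_values(array_filter($trace, 'isDisplayable'));
echo implode(' → ', array_column($kept, 'label')), "\n";
// $sortBy → $sortByColumn
```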

### Hop count

The hop count is the number of displayed app-code nodes (source, intermediates, sink) minus 1. Emit `direct` instead of a number when the chain collapses to `source → sink` with nothing between.
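
In code, the rule is a one-liner. `hopTag()` is a hypothetical name; it takes the number of nodes left after stripping and returns the text that goes inside the brackets.

```php
<?php
// Sketch only: hopTag() is a hypothetical helper.

function hopTag(int $displayedNodeCount): string
{
    $hops = $displayedNodeCount - 1;

    // A single hop means source → sink with nothing between.
    return $hops <= 1 ? 'direct' : (string) $hops;
}

echo hopTag(2), "\n"; // direct
echo hopTag(4), "\n"; // 3
```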

## Why this matters

**Token efficiency for AI-assisted triage.** Full JSON output for a 7-step taint chain is ~800 tokens. The two-line format above is ~35 tokens, over 20× smaller. For codebases with 30–200 taint findings, this is the difference between fitting a full security scan in a context window or not: at 200 findings, roughly 160,000 tokens of JSON versus about 7,000.

**False positive resolution without file access.**

| Chain signal | Classification |
|---|---|
| `$request->input(...)` + `[direct]` | Real — investigate immediately |
| `$request->input(...)` + `[N]` + no validation-named var | Real — verify chain |
| `$request->input(...)` + `[N]` + `$validated`/`$allowed*` in chain | Possible false positive |
| `$member->prop` or `$model->attr` as source | Likely false positive — DB-stored |
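
The table can be read as a small decision procedure over the chain line. The sketch below is one possible encoding, not a proposed part of Psalm: `classifyChain()` is hypothetical, and the regular expressions are simple approximations of "request input", "model attribute", and "validation-named variable".

```php
<?php
// Sketch only: classifyChain() and its patterns are illustrative assumptions.

function classifyChain(string $chainLine, string $hopTag): string
{
    $parts = array_map('trim', explode('→', $chainLine));
    $source = $parts[0];
    $intermediates = array_slice($parts, 1, -1);

    if (!str_starts_with($source, '$request->input(')) {
        // A bare property read such as $member->email is treated as DB-stored.
        if (preg_match('/^\$\w+->\w+(@|$)/', $source) === 1) {
            return 'likely false positive (DB-stored)';
        }
        return 'unclassified';
    }

    if ($hopTag === 'direct') {
        return 'real: investigate immediately';
    }

    foreach ($intermediates as $var) {
        // Validation-named variables hint at a check Psalm could not model.
        if (preg_match('/^\$(validated|allowed)/i', $var) === 1) {
            return 'possible false positive';
        }
    }

    return 'real: verify chain';
}

echo classifyChain("\$request->input('sort_by')@82 → \$sortBy → \$sortByColumn → Builder::orderBy()@94", '3'), "\n";
// real: verify chain
```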

## Token comparison

| Format | Tokens/finding | Findings triageable without opening files |
|---|---|---|
| Current compact | ~20 | ~10% |
| **This two-line format** | **~35** | **~90%** |
| Full JSON | ~800 | 100% |

## Implementation notes

**Change 1** (`CompactReport::create()`): remove the `$severity . ' '` prefix for `REPORT_ERROR` findings. Retain severity prefix only for non-error findings.
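
A rough sketch of the prefix rule in isolation. It does not reproduce the body of `CompactReport::create()`; `compactPrefix()` and `compactLine()` are hypothetical helpers, and the string `'error'` is assumed to be the value that `REPORT_ERROR` findings carry as their severity.

```php
<?php
// Sketch only: hypothetical helpers; 'error' is an assumed severity value.

function compactPrefix(string $severity): string
{
    // Errors are the default and get no prefix; everything else keeps one.
    return $severity === 'error' ? '' : strtoupper($severity) . ' ';
}

function compactLine(string $severity, string $file, int $line, int $col, string $type, string $message): string
{
    return compactPrefix($severity) . "{$file}:{$line}:{$col} {$type}: {$message}";
}

echo compactLine(
    'error',
    'app/Http/Controllers/Auth/LoginController.php',
    45,
    35,
    'TaintedHeader',
    'Detected tainted header'
), "\n";
// app/Http/Controllers/Auth/LoginController.php:45:35 TaintedHeader: Detected tainted header
```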

**Change 2** (`CompactReport::create()`): when `$issue_data->taint_trace !== null`:

1. Source: extract call expression from `taint_trace[1]->snippet`; append `@line` or `@[File.php:line]` depending on whether source file matches sink file
2. Intermediates: iterate `taint_trace`, skip vendor/stub/synthetic nodes, collect `$variableName`; inject `[ShortFile.php]` prefix on file boundary transitions
3. Sink: short class name + `::method()` from last trace step label + `@line`
4. Hop count: count of displayed nodes minus 1; emit `direct` when the hop count is 1 (source → sink only)
5. Emit two lines: `{file}:{line}:{col} {TaintType} [{N}|direct]\n  {chain}\n`
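
The steps above can be sketched end to end as follows. This is an illustration under stated assumptions rather than the actual patch: trace nodes are plain arrays with `label`, `file_path`, `line_from`, and `snippet` keys; the first node that survives stripping is treated as the source; the sink label is assumed to be a `Namespace\Class::method` string; and `[ShortFile.php]` prefixes on intermediates and sources with unknown location are left out for brevity. `renderTaintFinding()` is a hypothetical name.

```php
<?php
// Sketch only: plain arrays stand in for trace nodes; all names are hypothetical.

const SKIP_LABELS = ['variable-use', 'arrayvalue-fetch', 'coalesce', 'concat'];

function renderTaintFinding(string $file, int $line, int $col, string $type, array $trace): string
{
    // Strip vendor, stub, and synthetic nodes.
    $nodes = array_values(array_filter($trace, static fn(array $n): bool =>
        !str_contains($n['file_path'], '/vendor/')
        && $n['line_from'] > 0
        && !in_array($n['label'], SKIP_LABELS, true)));

    if (count($nodes) < 2) {
        // Not enough located nodes for a chain: emit line 1 only.
        return "{$file}:{$line}:{$col} {$type}\n";
    }

    $source = array_shift($nodes);
    $sink = array_pop($nodes);

    // Step 1: source expression with @line or @[File.php:line].
    $suffix = $source['file_path'] === $sink['file_path']
        ? '@' . $source['line_from']
        : '@[' . basename($source['file_path']) . ':' . $source['line_from'] . ']';
    $parts = [trim($source['snippet']) . $suffix];

    // Step 2: intermediates contribute their variable name only.
    foreach ($nodes as $n) {
        $parts[] = $n['label'];
    }

    // Step 3: sink as ShortClass::method()@line.
    $pos = strrpos($sink['label'], '\\');
    $short = $pos === false ? $sink['label'] : substr($sink['label'], $pos + 1);
    $parts[] = $short . '()@' . $sink['line_from'];

    // Step 4: hop count, or "direct" for source → sink.
    $hops = count($parts) - 1;
    $tag = $hops === 1 ? 'direct' : (string) $hops;

    // Step 5: two lines, the second indented by two spaces.
    return "{$file}:{$line}:{$col} {$type} [{$tag}]\n  " . implode(' → ', $parts) . "\n";
}

$f = '/app/Http/Controllers/Admin/ArticleController.php';
$trace = [
    ['label' => 'Request::input', 'file_path' => '', 'line_from' => 0, 'snippet' => ''],
    ['label' => 'call', 'file_path' => $f, 'line_from' => 82, 'snippet' => "\$request->input('sort_by')"],
    ['label' => '$sortBy', 'file_path' => $f, 'line_from' => 82, 'snippet' => ''],
    ['label' => 'arrayvalue-fetch', 'file_path' => $f, 'line_from' => 83, 'snippet' => ''],
    ['label' => '$sortByColumn', 'file_path' => $f, 'line_from' => 83, 'snippet' => ''],
    ['label' => 'Illuminate\Database\Query\Builder::orderBy', 'file_path' => $f, 'line_from' => 94, 'snippet' => ''],
];

$out = renderTaintFinding('app/Http/Controllers/Admin/ArticleController.php', 94, 23, 'TaintedSql', $trace);
echo $out;
```

With the sample trace, the output matches the SQL column injection example: a `[3]` header line followed by `$request->input('sort_by')@82 → $sortBy → $sortByColumn → Builder::orderBy()@94`.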

## Future: `entry_path_type` for reliable source classification

Telling `$request->input()` apart from `$member->email` by pattern-matching on the snippet is sufficient for most cases but fragile. Populating `entry_path_type` on the taint source node would enable a reliable taxonomy (`request_input`, `db_attribute`, `session`, `http_response`). The entry path is often unresolvable statically for framework-heavy stacks, where the full call path runs `index.php → Kernel → Router → Controller` through vendor paths, so this is left as a future enhancement.

## Related

- `SarifReport` already filters stub trace steps via `$trace->line_from > 0` — same logic applies here
- `JsonReport` emits the full `taint_trace`; this adds a compact projection of the same data
- See #11796 for filtering compact output to taint-only findings
