Add probabilistic evaluation metrics (CRPS, rank histograms) via existing scores dependency #3

@GiGiKoneti

Description

Hi team,

While going through the downstream validation pipeline for the neural-lam
probabilistic forecasting track (issue mllam/neural-lam#62), I noticed
that mllam-verification currently only covers deterministic
statistics like rmse and mae.

pyproject.toml already pulls in scores>=1.2.0, which exposes exactly
what's needed for ensemble evaluation:

  • scores.probability.crps_for_ensemble — for member-indexed ensemble
    outputs, which is the format neural-lam's datastore already uses
    consistently (ensemble_member dimension in weather_dataset.py
    and datastore/base.py)
  • scores.plotdata.rank_histogram — for Talagrand diagram evaluation
    of ensemble calibration
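For context, the ensemble CRPS that crps_for_ensemble computes is just the empirical (ecdf-style) form of the continuous ranked probability score. A minimal standalone numpy sketch of that formula (the function name and array shapes here are illustrative only, not the scores API, which operates on xarray objects with an ensemble_member dimension):

```python
import numpy as np

def crps_ensemble(ensemble, obs):
    """Empirical (ecdf-style) ensemble CRPS, averaged over samples.

    ensemble: (n_samples, n_members) array of member forecasts
    obs:      (n_samples,) array of observations

    CRPS = mean_i |x_i - y| - (1 / (2 M^2)) * sum_ij |x_i - x_j|
    """
    ens = np.asarray(ensemble, dtype=float)
    y = np.asarray(obs, dtype=float)[:, None]
    m = ens.shape[1]
    # Mean absolute error of each member against the observation
    term1 = np.abs(ens - y).mean(axis=1)
    # Halved mean absolute pairwise spread between members
    term2 = np.abs(ens[:, :, None] - ens[:, None, :]).sum(axis=(1, 2)) / (2 * m * m)
    return float((term1 - term2).mean())

# A single-member "ensemble" degenerates to the absolute error
print(crps_ensemble([[3.0]], [1.0]))  # 2.0
```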

Two concrete additions that would follow the existing architecture exactly:

  1. A crps() function in statistics.py wrapping
    scores.probability.crps_for_ensemble via compute_pipeline_statistic,
    following the same pattern rmse uses to wrap scores.continuous.rmse
  2. A plot_rank_histogram() in plot.py wrapping
    scores.plotdata.rank_histogram — following the plot_single_metric_timeseries
    structure
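To illustrate what the rank-histogram side of this computes: each observation is ranked against the sorted ensemble members, giving M + 1 bins whose counts should be flat for a calibrated ensemble. A minimal numpy sketch of that counting step (hypothetical function name; this is not the scores implementation, and a real implementation should also randomize ties):

```python
import numpy as np

def rank_histogram_counts(ensemble, obs):
    """Bin counts for a Talagrand (rank) histogram.

    ensemble: (n_samples, n_members) array of member forecasts
    obs:      (n_samples,) array of observations
    Returns counts over M + 1 rank bins.
    """
    ens = np.asarray(ensemble, dtype=float)
    y = np.asarray(obs, dtype=float)
    # Rank = number of members the observation exceeds (ties ignored here)
    ranks = (ens < y[:, None]).sum(axis=1)
    return np.bincount(ranks, minlength=ens.shape[1] + 1)

# Obs falls between the 2nd and 3rd of three members -> rank 2
print(rank_histogram_counts([[1.0, 2.0, 3.0]], [2.5]).tolist())  # [0, 0, 1, 0]
```

Under- or over-dispersed ensembles show up as U-shaped or dome-shaped histograms, which is what makes this a useful calibration diagnostic alongside CRPS.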

No new dependencies needed. Both functions already exist in the
pinned scores version.

If this makes sense, I'll go ahead and implement it; otherwise just
let me know and I'll close the issue.
