Sparse ancestry-adjusted GRM builder (FastSparseGRM / Lin-Dey) #99

## Goal

Make the sparse ancestry-adjusted GRM build a first-class stage in the STAAR pipeline. Default path uses FastSparseGRM-style construction (Lin/Dey 2024). The stage is auto-invoked by `favor staar`, with caching and skip conditions, rather than being a mandatory pre-step users must remember.

Reference: Lin X, Dey R, Li X, Li Z. Scalable analysis of large multi-ancestry biobanks by leveraging sparse ancestry-adjusted sample-relatedness. Res Sq Preprint 2024. doi:10.21203/rs.3.rs-5343361/v1. PMC11601839.
Software: rounakdey/FastSparseGRM (R).

## What FastSparseGRM Produces

Three artifacts. K and the PCs feed the existing null-model fit (GMMAT glmmkin port); the variant subset is recorded for provenance. Not a replacement solver.

| artifact | consumed by | purpose |
|---|---|---|
| sparse K (block-diagonal) | `--kinship` | random-effect covariance |
| PCs (n x k) | phenotype covariates | fixed-effect ancestry adjustment |
| variant subset used | run manifest | provenance |

Algorithm:
1. LD-prune common variants
2. Compute genetic PCs (hdpca, bias-corrected)
3. Regress each SNP on top-k PCs, residualize
4. Pairwise kinship on residualized genotypes
5. Threshold at cutoff (default 0.022) -> block-sparse K
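
Steps 3 to 5 can be sketched on a tiny dense matrix as below. This is a minimal illustration, not FastSparseGRM's estimator: it assumes the PCs are already centered and orthonormal (so regression reduces to dot products), uses a plain moment estimator `phi_ij = sum_s r_is r_js / (4 sum_s p_s (1 - p_s))` as a stand-in for the ancestry-adjusted kinship, and holds an n x n accumulator in memory, which a real builder would avoid. The names `residualize` and `sparse_kinship` are hypothetical.

```rust
/// Step 3: remove the intercept and the PC projections from one SNP.
/// Assumption: each PC is centered and has unit norm, and PCs are mutually orthogonal.
pub fn residualize(g: &[f64], pcs: &[Vec<f64>]) -> Vec<f64> {
    let n = g.len() as f64;
    let mean = g.iter().sum::<f64>() / n;
    let mut r: Vec<f64> = g.iter().map(|x| x - mean).collect();
    for pc in pcs {
        let coef: f64 = pc.iter().zip(&r).map(|(a, b)| a * b).sum();
        for (ri, pi) in r.iter_mut().zip(pc) {
            *ri -= coef * pi;
        }
    }
    r
}

/// Steps 4 and 5: pairwise kinship on residualized genotypes, then threshold.
/// `geno[s][i]` is the dosage (0, 1, 2) of sample i at variant s.
/// Returns upper-triangle (i, j, phi) entries: the diagonal plus pairs above `cutoff`.
pub fn sparse_kinship(
    geno: &[Vec<f64>],
    pcs: &[Vec<f64>],
    cutoff: f64,
) -> Vec<(usize, usize, f64)> {
    let n = geno.first().map_or(0, |g| g.len());
    let mut cross = vec![vec![0.0_f64; n]; n]; // dense, for illustration only
    let mut denom = 0.0_f64;
    for g in geno {
        let p = g.iter().sum::<f64>() / (2.0 * n as f64); // allele frequency
        denom += 4.0 * p * (1.0 - p);
        let r = residualize(g, pcs);
        for i in 0..n {
            for j in i..n {
                cross[i][j] += r[i] * r[j];
            }
        }
    }
    let mut entries = Vec::new();
    if denom == 0.0 {
        return entries; // no polymorphic variants, nothing to estimate
    }
    for i in 0..n {
        for j in i..n {
            let phi = cross[i][j] / denom;
            if i == j || phi > cutoff {
                entries.push((i, j, phi));
            }
        }
    }
    entries
}
```

Calling `sparse_kinship(&geno, &pcs, 0.022)` on a cohort where only samples 0 and 1 share genotypes keeps the diagonal and the single (0, 1) entry; connected components of the retained pairs are what make K block-diagonal.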

## Pipeline Integration

New stage `EnsureGrm`, parallel to existing `EnsureStore` / `EnsureScoreCache`:

```
stage            auto-invoked?   skip if?                              cache key
EnsureStore      yes             cohort manifest exists                 VCF + annotation hash
EnsureGrm        yes             --kinship provided                     (store hash, grm config) hash
                                 OR relatedness probe returns none
                                 OR cache hit
FitNullModel     yes             null cache hit                         (pheno, covar, K) hash
EnsureScoreCache yes             score cache exists                     (mask, MAF, store) hash
```

- `favor staar` runs `EnsureGrm` automatically when `--kinship` is not supplied
- fast pre-probe on a subset of samples to detect any relatedness; if none, skip GRM build and fall through to Glm/Logistic null
- cached by input hash; a rerun with unchanged inputs hits the cache and skips the build
- `--dry-run` surfaces the stage
- `--format json` emits the decision (built / cache-hit / skipped-unrelated / provided-by-user)

Also expose standalone `favor grm` subcommand for pre-building, scripting, or inspection. Same builder, same cache.
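
One way the `(store hash, grm config)` cache key could be derived is sketched below. The `GrmConfig` fields and the `grm_cache_key` function are hypothetical, and FNV-1a is used only because it is short and deterministic across builds; the real pipeline may already have a hashing convention that should be reused instead.

```rust
/// Hypothetical subset of the GRM build configuration that affects the output.
pub struct GrmConfig {
    pub kinship_cutoff: f64, // default 0.022 per this issue
    pub num_pcs: u64,        // top-k PCs used for residualization
}

const FNV_OFFSET: u64 = 0xcbf29ce484222325;
const FNV_PRIME: u64 = 0x100000001b3;

/// FNV-1a over `bytes`, continuing from state `h`.
fn fnv1a(bytes: &[u8], mut h: u64) -> u64 {
    for b in bytes {
        h ^= *b as u64;
        h = h.wrapping_mul(FNV_PRIME);
    }
    h
}

/// Combine the store hash and the GRM config into one hex cache key.
/// Floats are hashed by bit pattern so formatting cannot change the key.
pub fn grm_cache_key(store_hash: &str, cfg: &GrmConfig) -> String {
    let mut h = fnv1a(store_hash.as_bytes(), FNV_OFFSET);
    h = fnv1a(&[0u8], h); // separator between store hash and config
    h = fnv1a(&cfg.kinship_cutoff.to_bits().to_le_bytes(), h);
    h = fnv1a(&cfg.num_pcs.to_le_bytes(), h);
    format!("{:016x}", h)
}
```

Because both `favor staar` and `favor grm` would call the same key function, a GRM pre-built with the standalone subcommand is found as a cache hit by the pipeline stage.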

## Correctness Default

Auto-invoke matters: a user with a related multi-ancestry cohort who forgets the GRM gets inflated type-I error silently. Pre-probe + cache means the default path is correct and cheap on reruns.

## Skip Conditions (explicit)

- `--kinship <path>` supplied: trust the user, skip build
- relatedness probe on random sample subset shows zero pairs above kinship cutoff: skip build, log decision
- cache hit on (store hash, grm config): reuse, skip build
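
The skip logic above, together with the four decision labels reported under `--format json`, can be written as a small pure function. This is a sketch: the type and field names are hypothetical, the JSON shape is an assumption, and the conditions are evaluated in the order this issue lists them (a real stage might check the cache before running the probe).

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GrmDecision {
    ProvidedByUser,
    SkippedUnrelated,
    CacheHit,
    Built,
}

impl GrmDecision {
    /// Labels as listed in this issue for `--format json`.
    pub fn as_str(self) -> &'static str {
        match self {
            GrmDecision::ProvidedByUser => "provided-by-user",
            GrmDecision::SkippedUnrelated => "skipped-unrelated",
            GrmDecision::CacheHit => "cache-hit",
            GrmDecision::Built => "built",
        }
    }
}

/// Hypothetical inputs gathered before the stage runs.
pub struct GrmStageInputs {
    pub kinship_provided: bool, // `--kinship <path>` was supplied
    pub related_pairs: usize,   // pairs above cutoff found by the probe
    pub cache_hit: bool,        // (store hash, grm config) key found
}

pub fn decide(inputs: &GrmStageInputs) -> GrmDecision {
    if inputs.kinship_provided {
        GrmDecision::ProvidedByUser
    } else if inputs.related_pairs == 0 {
        GrmDecision::SkippedUnrelated
    } else if inputs.cache_hit {
        GrmDecision::CacheHit
    } else {
        GrmDecision::Built
    }
}

/// Assumed JSON shape; the real `run.json` schema may differ.
pub fn decision_json(d: GrmDecision) -> String {
    format!("{{\"stage\":\"EnsureGrm\",\"decision\":\"{}\"}}", d.as_str())
}
```

Keeping the decision separate from the build makes `--dry-run` cheap: it can call `decide` and print the result without touching genotypes beyond the probe.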

## Needs

- `favor grm` subcommand that produces K + PCs + manifest
- `EnsureGrm` stage wired into `src/staar/pipeline.rs` with the same `run.json` + cache discipline as other stages
- relatedness probe (cheap pairwise kinship on a sample subset) to drive the skip decision
- machine-readable output honoring `--format json`
- docs: when FastSparseGRM helps (related + mixed ancestry), when it is skipped (unrelated, single ancestry, user-provided K)
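
The relatedness probe could look roughly like the following: draw a reproducible random subset of samples, then count pairs whose kinship exceeds the cutoff. Everything here is an assumption for illustration, including the function names, the seeded xorshift generator (chosen to avoid a dependency; its modulo bias is ignored), and the `kinship` closure, which stands in for whatever cheap pairwise estimator the builder exposes.

```rust
/// Pick `m` distinct sample indices out of `0..n` with a seeded partial
/// Fisher-Yates shuffle, so the probe is reproducible for a given seed.
pub fn sample_subset(n: usize, m: usize, seed: u64) -> Vec<usize> {
    let m = m.min(n);
    let mut idx: Vec<usize> = (0..n).collect();
    let mut state = seed | 1; // xorshift state must be non-zero
    for i in 0..m {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        let j = i + (state % (n - i) as u64) as usize;
        idx.swap(i, j);
    }
    idx.truncate(m);
    idx
}

/// Count pairs in `subset` whose kinship is above `cutoff`.
/// `kinship(i, j)` is a caller-supplied estimator (hypothetical interface).
pub fn count_related_pairs<F>(subset: &[usize], kinship: F, cutoff: f64) -> usize
where
    F: Fn(usize, usize) -> f64,
{
    let mut count = 0;
    for (a, &i) in subset.iter().enumerate() {
        for &j in &subset[a + 1..] {
            if kinship(i, j) > cutoff {
                count += 1;
            }
        }
    }
    count
}
```

A count of zero drives the skipped-unrelated decision. A subset can miss relatives that exist in the full cohort, so the subset size is a trade-off between probe cost and the chance of a false skip, and the decision should be logged along with the seed.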

## Out Of Scope

- replacing the null solver. `glmmkin` port stays.
- rebuilding PCA from scratch. Reuse hdpca ideas or existing Rust/faer PCA; no need to port the R code verbatim.
