Description
Hi,
I am running XP-EHH on unphased VCF data and see the same behavior with selscan v2.1.1 and v3.0 (and the corresponding norm version for each):
- Many xpehh values are -nan in the raw output.
- After normalization, the .xpehh.out.norm file looks corrupted: one SNP position is repeated many times and normxpehh is also -nan.
Command and log (example)
selscan --xpehh --unphased \
--vcf ../XP-EHH/pop1/Chr01A.vcf.gz \
--vcf-ref ../XP-EHH/pop2/Chr01A.vcf.gz \
--pmap \
--max-gap 250000 \
--threads 30 \
--out Chr01A

Log excerpt:
selscan v2.1.1
Loading 14 haplotypes and 1031943 loci. Skipped 0 loci
Loading 47 haplotypes and 1031943 loci. Skipped 0 loci
...
Starting XP-EHH calculations.
WARNING: Reached chromosome edge before EHH decayed below 0.05.
--trunc-ok set. Skipping calculation at position 8422 id: .
...
Finished XP-EHH.

(I get similar warnings and output patterns with v3.0.)
Raw XP-EHH output (excerpt)
id pos gpos p1 ihh1 p2 ihh2 xpehh
. 8173 0.008173 0.464286 0 0.882979 0 -nan
. 10802 0.010802 0.464286 0 0.861702 0 -nan
. 10809 0.010809 0.428571 0 0.861702 0 -nan
. 10833 0.010833 0.428571 0 0.882979 0 -nan
. 10834 0.010834 0.107143 0.00028772 0.297872 0.000508399 -0.247235
...

At many SNPs near the chromosome start, ihh1 = ihh2 = 0 and xpehh = -nan.
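For reference, the one finite row in this excerpt (pos 10834) is consistent with xpehh being computed as log10(ihh1 / ihh2), in which case ihh1 = ihh2 = 0 gives 0/0 and therefore NaN. The sketch below only checks that arithmetic. The formula is my inference from the numbers above, not something I have confirmed in the selscan source, and `xpehh_raw` is a hypothetical helper name.

```python
import math


def xpehh_raw(ihh1: float, ihh2: float) -> float:
    """Unstandardized score, ASSUMED here to be log10(ihh1 / ihh2)."""
    if ihh1 == 0 and ihh2 == 0:
        # 0/0 is undefined; IEEE floating point arithmetic yields NaN here.
        return math.nan
    if ihh2 == 0:
        return math.inf
    if ihh1 == 0:
        return -math.inf
    return math.log10(ihh1 / ihh2)


# Row at pos 10834 from the excerpt: reported xpehh is -0.247235.
finite_score = xpehh_raw(0.00028772, 0.000508399)

# Row at pos 8173 from the excerpt: ihh1 = ihh2 = 0.
edge_score = xpehh_raw(0.0, 0.0)
```

If that reading is right, the -nan values are a direct consequence of both integrated haplotype homozygosity values being zero, so the real question is why ihh1 and ihh2 are 0 at these SNPs.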
Normalized output (excerpt)
id pos gpos p1 ihh1 p2 ihh2 xpehh normxpehh crit
. 8173 0.008173 0.464286 0 0.882979 0 0 -nan 0
. 8173 0.008173 0.464286 0 0.882979 0 0 -nan 0
. 8173 0.008173 0.464286 0 0.882979 0 0 -nan 0
...
(repeated many times with the same position)

So after norm, one position (8173) is duplicated many times, xpehh becomes 0, and normxpehh is -nan for all these rows.
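To quantify the duplication rather than eyeball it, something like the following could be run over the lines of the normalized file. It is only a sketch: it assumes whitespace-separated columns with a header containing `pos`, as in the excerpt above, and `repeated_positions` is a hypothetical helper name.

```python
from collections import Counter


def repeated_positions(lines, min_count=2):
    """Return {pos: count} for positions that occur at least min_count times."""
    it = iter(lines)
    header = next(it).split()
    col = header.index("pos")  # locate the column by name, not by index
    counts = Counter()
    for line in it:
        fields = line.split()
        if len(fields) > col:  # skip blank or truncated lines
            counts[fields[col]] += 1
    return {pos: n for pos, n in counts.items() if n >= min_count}


# Three rows copied from the normalized excerpt above.
sample = [
    "id pos gpos p1 ihh1 p2 ihh2 xpehh normxpehh crit",
    ". 8173 0.008173 0.464286 0 0.882979 0 0 -nan 0",
    ". 8173 0.008173 0.464286 0 0.882979 0 0 -nan 0",
    ". 8173 0.008173 0.464286 0 0.882979 0 0 -nan 0",
]
duplicates = repeated_positions(sample)
```

On the real file, the lines would come from `open(...)` on the .xpehh.out.norm path instead of the hard-coded sample.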
Questions
- Are xpehh = -nan values near chromosome edges (with ihh1 = ihh2 = 0 and truncation warnings) expected, or do they suggest a problem with my data or parameters (--max-gap, EHH cutoff, MAF, etc.)?
- Is it expected that norm behaves like this when the input contains many NaN values, or does this indicate a bug or misuse?
  - Under what conditions would norm output one SNP position many times with normxpehh = -nan?
  - Should I pre-filter rows with xpehh = NaN before running norm?
- Are there recommended parameter settings or QC steps (e.g. excluding chromosome ends, adjusting --max-gap or EHH cutoff, extra filtering for XP-EHH on unphased data) to reduce these NaN values and obtain more stable normalized scores?
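On the pre-filtering question, this is roughly the workaround I have in mind: drop every row whose xpehh is not a finite number before passing the file to norm. It is a sketch under assumptions, not a recommendation. It assumes whitespace-separated columns with a header containing `xpehh` (as in the raw excerpt above), and `filter_finite_xpehh` is a hypothetical helper name. I do not know whether norm is meant to handle such rows itself.

```python
import math


def filter_finite_xpehh(lines):
    """Yield the header, then only rows whose xpehh value is finite."""
    it = iter(lines)
    header = next(it)
    col = header.split().index("xpehh")  # locate the column by name
    yield header
    for line in it:
        fields = line.split()
        if len(fields) <= col:
            continue  # blank or truncated line
        try:
            value = float(fields[col])  # float() parses "nan" and "-nan"
        except ValueError:
            continue  # unparseable score, treat as missing
        if math.isfinite(value):
            yield line


# Rows copied from the raw excerpt above.
raw = [
    "id pos gpos p1 ihh1 p2 ihh2 xpehh",
    ". 8173 0.008173 0.464286 0 0.882979 0 -nan",
    ". 10802 0.010802 0.464286 0 0.861702 0 -nan",
    ". 10834 0.010834 0.107143 0.00028772 0.297872 0.000508399 -0.247235",
]
kept = list(filter_finite_xpehh(raw))
```

On the real data, the input lines would come from the .xpehh.out file and the kept lines would be written to a new file that is then given to norm.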
I can share a small subset of the VCFs, the map file, and the corresponding .xpehh.out / .xpehh.out.norm files if helpful.
Thank you very much for any guidance.