feat: add MetOp AMSU-A and AVHRR Level 1B data sources #792
NickGeneva wants to merge 13 commits into NVIDIA:main
Add standardized async utilities to earth2studio/data/utils.py:
- ensure_utc: normalize timezone-aware datetimes to naive UTC
- async_retry: retry with exponential backoff and per-attempt timeout
- managed_session: context manager for fsspec session cleanup
- gather_with_concurrency: bounded parallel execution via semaphore
- cancellable_to_thread: wrapper for asyncio.to_thread with timeout
Update create-data-source skill to:
- Emphasize that pure async I/O is ALWAYS preferred over asyncio.to_thread
- Reference new utilities from utils.py instead of inline code
- Add pytest timeout considerations for thread-based operations
- Clarify that threads cannot be force-cancelled in Python
- Add standard common parameters (cache, verbose, async_timeout, async_workers, retries)
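For readers who have not opened utils.py, here is a rough sketch of what three of the listed helpers could look like. The names match the commit message, but the signatures, defaults, and the set of retried exceptions are assumptions for illustration and are not copied from the PR.

```python
import asyncio
from datetime import datetime, timedelta, timezone


def ensure_utc(t: datetime) -> datetime:
    """Return a naive datetime expressed in UTC (naive input is assumed to be UTC)."""
    if t.tzinfo is None:
        return t
    return t.astimezone(timezone.utc).replace(tzinfo=None)


async def async_retry(func, *args, retries=3, backoff=0.5, timeout=30.0, **kwargs):
    """Await func(*args, **kwargs) with a per-attempt timeout and exponential backoff.

    Only timeouts and OSError are retried here; that choice is an assumption.
    """
    for attempt in range(retries + 1):
        try:
            return await asyncio.wait_for(func(*args, **kwargs), timeout=timeout)
        except (asyncio.TimeoutError, OSError):
            if attempt == retries:
                raise
            # Wait backoff, 2*backoff, 4*backoff, ... between attempts.
            await asyncio.sleep(backoff * 2**attempt)


async def gather_with_concurrency(limit, *coros):
    """Run coroutines with at most `limit` in flight; results keep input order."""
    sem = asyncio.Semaphore(limit)

    async def bounded(coro):
        async with sem:
            return await coro

    return await asyncio.gather(*(bounded(c) for c in coros))


if __name__ == "__main__":
    local = datetime(2025, 1, 15, 12, 0, tzinfo=timezone(timedelta(hours=2)))
    print(ensure_utc(local))  # 2025-01-15 10:00:00
```

The semaphore bounds how many downloads run at once without changing the order of the returned results, which is why it pairs naturally with asyncio.gather.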
Add MetOpAMSUA and MetOpAVHRR DataFrameSource implementations for EUMETSAT MetOp satellite Level 1B products via the eumdac library.
MetOpAMSUA: Custom EPS native binary parser for 15-channel microwave radiometer data (~50 km resolution). Parses MDR-1B records to extract brightness temperatures, geolocation, and satellite geometry.
MetOpAVHRR: Uses satpy avhrr_l1b_eps reader for 6-channel visible/IR radiometer data (1 km resolution, configurable subsampling).
Includes MetOpAMSUALexicon/MetOpAVHRRLexicon, 21 new E2STUDIO_VOCAB entries, unit tests with mock coverage, and documentation.
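To make the "custom EPS native binary parser" concrete, the sketch below walks the records of an EPS file by reading each Generic Record Header (GRH). The 20-byte big-endian layout, the 2000-01-01 epoch for short CDS times, and the MDR class code are my reading of the EPS generic format and should be treated as assumptions; the function and type names are hypothetical and not taken from the PR.

```python
import struct
from datetime import datetime, timedelta
from typing import Iterator, NamedTuple

# Assumed GRH layout (big-endian, 20 bytes):
#   record_class u1, instrument_group u1, record_subclass u1, subclass_version u1,
#   record_size u4, start time (days u2, ms u4), stop time (days u2, ms u4)
GRH = struct.Struct(">BBBBIHIHI")
assert GRH.size == 20

EPS_EPOCH = datetime(2000, 1, 1)  # assumed epoch for short CDS times
MDR_CLASS = 8  # assumed record class code for measurement data records


class RecordHeader(NamedTuple):
    record_class: int
    subclass: int
    size: int
    start: datetime
    offset: int


def iter_records(buf: bytes) -> Iterator[RecordHeader]:
    """Yield one header per record, hopping by the record_size stored in each GRH."""
    pos = 0
    while pos + GRH.size <= len(buf):
        rc, _grp, sub, _ver, size, days, ms, _d1, _ms1 = GRH.unpack_from(buf, pos)
        if size < GRH.size or pos + size > len(buf):
            raise ValueError(f"corrupt record size {size} at offset {pos}")
        start = EPS_EPOCH + timedelta(days=days, milliseconds=ms)
        yield RecordHeader(rc, sub, size, start, pos)
        pos += size
```

A parser built this way would filter for `record_class == MDR_CLASS` and then decode the payload that follows each header at fixed byte offsets.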
The eumdac collection.search(sat=...) expects 'Metop-B' not 'M01'. Also update create-data-source skill with fork-to-upstream PR workflow and mandatory sanity-check plot confirmation.
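One way to avoid this class of bug is to normalize whatever the caller passes before it reaches the search filter. The helper below is a hypothetical sketch: only the 'M01' to 'Metop-B' pairing comes from this commit, and the other two entries are the commonly used short codes, included as an assumption.

```python
# Short platform codes mapped to the display names the search filter expects.
# Only M01 -> Metop-B is confirmed by the commit message; the rest are assumed.
_SAT_NAMES = {"M01": "Metop-B", "M02": "Metop-A", "M03": "Metop-C"}


def to_search_name(sat: str) -> str:
    """Return the display name for a short code or an already-correct name."""
    key = sat.strip()
    if key.upper() in _SAT_NAMES:
        return _SAT_NAMES[key.upper()]
    for name in _SAT_NAMES.values():
        if key.lower() == name.lower():
            return name
    raise ValueError(f"unknown MetOp satellite identifier: {sat!r}")
```

The result would then be passed as the `sat=` argument of `collection.search(...)`.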
Greptile Summary
| Filename | Overview |
|---|---|
| earth2studio/data/metop_amsua.py | New DataFrameSource with custom EPS binary parser; well-structured and tested. Minor: _parse_sensing_time can throw ValueError for malformed MPHR without being caught. |
| earth2studio/data/metop_avhrr.py | New DataFrameSource using satpy for AVHRR parsing; relies on private satpy internals for performance and sets global Dask config without context manager. |
| earth2studio/lexicon/metop.py | New lexicons for AMSU-A (14 channels) and AVHRR (6 channels); clean implementation following existing patterns. |
| test/data/test_metop_amsua.py | Good unit test coverage with synthetic binary builder; tests GRH parsing, MPHR parsing, BT conversion, and end-to-end mock fetch. |
| test/data/test_metop_avhrr.py | Mock-based tests cover end-to-end call, fields subset, empty response, and exception handling. |
Reviews (2): Last reviewed commit: "fix: try/finally for temp cache cleanup ..."
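On the review note about `_parse_sensing_time`: a guarded version might look like the sketch below. It assumes the MPHR is ASCII text with `NAME = value` lines and timestamps of the form `YYYYMMDDhhmmssZ`; the function name, signature, and the decision to return None on malformed input are illustrative and are not the PR's implementation.

```python
from datetime import datetime
from typing import Optional


def parse_sensing_time(mphr_text: str, key: str = "SENSING_START") -> Optional[datetime]:
    """Return the timestamp stored under `key`, or None if missing or malformed."""
    for line in mphr_text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            try:
                return datetime.strptime(value.strip(), "%Y%m%d%H%M%SZ")
            except ValueError:
                # Malformed header value: report "unknown" instead of raising.
                return None
    return None
```

Returning None lets the caller decide whether to skip the granule or fall back to the per-record times, rather than aborting the whole fetch.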
- Remove channel 15 (89 GHz) from AMSU-A lexicon/vocab (~97% missing)
- Parser now only outputs channels present in lexicon
- Rewrite AVHRR parser to bypass satpy geo-interpolation (12x speedup)
- Use OptionalDependencyFailure pattern for eumdac/satpy imports
- Update SKILL.md: variable validation step, summary requirements, OptionalDependencyFailure pattern, fork-to-upstream PR workflow
- Remove unused _SURFACE_PROPERTIES_OFFSET constant
- Use E2STUDIO_SCHEMA.field('elev') for schema consistency
- Use io.BytesIO for ZIP extraction instead of temp file
- Add code comments documenting inter-satellite wavenumber bias (<0.03K)
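The io.BytesIO change in the list above can be sketched as follows: the downloaded archive is opened directly from memory and the native-format member is read out without touching disk. The `.nat` suffix and the function name are assumptions for illustration.

```python
import io
import zipfile


def extract_native_member(zip_bytes: bytes, suffix: str = ".nat") -> bytes:
    """Return the bytes of the first archive member whose name ends with `suffix`."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            if name.lower().endswith(suffix):
                return zf.read(name)
    raise FileNotFoundError(f"no member ending in {suffix!r} in archive")
```

Keeping the archive in memory removes one temp file that would otherwise need cleanup on error paths.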
/blossom-ci
- Wrap __call__ fetch in try/finally so temp cache is cleaned on error, timeout, or cancellation
- Update SKILL.md: document _tmp_cache_hash pattern and try/finally requirement in Steps 7a, 7c, 7g, and implementation notes
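The try/finally requirement can be illustrated with a small stand-alone sketch. The helper name and the use of tempfile.mkdtemp are assumptions; the PR itself ties the directory to a `_tmp_cache_hash`, which is not reproduced here.

```python
import asyncio
import shutil
import tempfile
from pathlib import Path


async def fetch_with_temp_cache(fetch, timeout: float = 60.0):
    """Run `fetch(cache_dir)` and always remove the temporary cache afterwards.

    The finally block runs on success, on exceptions, on timeout, and when the
    surrounding task is cancelled, so the directory does not leak.
    """
    tmp = Path(tempfile.mkdtemp(prefix="tmp_cache_"))
    try:
        return await asyncio.wait_for(fetch(tmp), timeout=timeout)
    finally:
        shutil.rmtree(tmp, ignore_errors=True)
```

Here `fetch` is any coroutine function that takes the cache directory as its only argument.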
- Step 12e: target 90%+ line coverage guideline
- Step 13b: coverage report command with --slow, --cov, --cov-fail-under=90
- CONFIRM gate: coverage percentage now required
/blossom-ci
- Step 2 CONFIRM gate: must include license name + link per new dep, flag non-permissive licenses (GPL/AGPL/SSPL)
- PR template: Dependencies table with Package/Version/License/URL/Reason columns and guidance on license lookup
Remove satpy dependency and implement a clean-room AVHRR Level 1B EPS native format parser using struct.unpack_from with byte offsets from the public EUMETSAT Product Format Specification.
- Parse GIADR-radiance record for calibration coefficients
- Parse MDR records: SCENE_RADIANCES, EARTH_LOCATIONS, ANGULAR_RELATIONS
- Calibrate visible channels to reflectance (%), thermal to BT (K)
- Handle 3a/3b channel switching via FRAME_INDICATOR bit 16
- Use 103 EPS navigation tie points directly (no interpolation)
- ~120x faster than satpy approach (1.2s vs ~150s per 907MB file)
- Zero GPL dependencies: only stdlib struct, numpy, pandas, eumdac (MIT)
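The thermal-channel step in the list above (scaled integers read with struct.unpack_from, then radiance converted to brightness temperature) can be sketched with the inverse Planck function. This is a simplified illustration: the helper names, the scale factor, and the offsets are hypothetical, the radiation constants are standard physical values rather than numbers from the PR, and operational calibration typically adds a per-channel band correction that is omitted here.

```python
import math
import struct

C1 = 1.191042972e-5  # 2hc^2 in mW / (m^2 sr cm^-4)
C2 = 1.438776877     # hc/k in K / cm^-1


def read_scaled_i2(buf: bytes, offset: int, count: int, scale: float) -> list:
    """Read `count` big-endian int16 values at `offset` and divide by `scale`."""
    raw = struct.unpack_from(f">{count}h", buf, offset)
    return [v / scale for v in raw]


def planck_radiance(wavenumber: float, temp_k: float) -> float:
    """Spectral radiance (mW m^-2 sr^-1 cm) of a blackbody at `temp_k`."""
    return C1 * wavenumber**3 / math.expm1(C2 * wavenumber / temp_k)


def brightness_temperature(radiance: float, wavenumber: float) -> float:
    """Invert the Planck function; non-positive radiance maps to NaN."""
    if radiance <= 0.0:
        return math.nan
    return C2 * wavenumber / math.log1p(C1 * wavenumber**3 / radiance)


def bit_is_set(word: int, bit: int) -> bool:
    """Test a single flag bit, e.g. bit 16 of a frame indicator word."""
    return bool((word >> bit) & 1)
```

Which state of the flag corresponds to channel 3a versus 3b is not stated in the commit message, so the helper only reports whether the bit is set.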
/blossom-ci
/blossom-ci


Description
Add MetOpAMSUA and MetOpAVHRR DataFrameSource implementations for EUMETSAT MetOp satellite Level 1B products, accessed via the eumdac library.

Data source details
- MetOpAMSUA: EO:EUM:DAT:METOP:AMSUL1
- MetOpAVHRR: EO:EUM:DAT:METOP:AVHRRL1

Implementation highlights
- Uses eumdac for OAuth2-authenticated data access to the EUMETSAT Data Store.
- MetOpAMSUALexicon and MetOpAVHRRLexicon with 20 new E2STUDIO_VOCAB entries.

Data licensing

Dependencies added
- eumdac>=3.1.0

No GPL dependencies. Both AMSU-A and AVHRR parsers are clean-room implementations using only struct, numpy, and pandas.

Performance

Files changed
- earth2studio/data/metop_amsua.py
- earth2studio/data/metop_avhrr.py
- earth2studio/lexicon/metop.py
- earth2studio/lexicon/base.py
- test/data/test_metop_amsua.py
- test/data/test_metop_avhrr.py
- earth2studio/data/__init__.py
- earth2studio/lexicon/__init__.py
- docs/modules/datasources_dataframe.rst
- pyproject.toml + uv.lock
- CHANGELOG.md

Sanity-check plots
AMSU-A (channels 4, 9, 14 — 2025-01-15)
AVHRR (channels 4, 5 — 2025-01-15)
Sanity-check script (not committed)
validate_avhrr_satpy.py