Skip to content

feat: add MetOp AMSU-A and AVHRR Level 1B data sources#792

Open
NickGeneva wants to merge 13 commits intoNVIDIA:mainfrom
NickGeneva:feat/data-source-metop
Open

feat: add MetOp AMSU-A and AVHRR Level 1B data sources#792
NickGeneva wants to merge 13 commits intoNVIDIA:mainfrom
NickGeneva:feat/data-source-metop

Conversation

@NickGeneva
Copy link
Copy Markdown
Collaborator

@NickGeneva NickGeneva commented Apr 3, 2026

Description

Add MetOpAMSUA and MetOpAVHRR DataFrameSource implementations for EUMETSAT MetOp satellite Level 1B products, accessed via the eumdac library.

Data source details

Property MetOpAMSUA MetOpAVHRR
Source type DataFrameSource DataFrameSource
Collection ID EO:EUM:DAT:METOP:AMSUL1 EO:EUM:DAT:METOP:AVHRRL1
Format EPS native binary (custom parser) EPS native binary (custom parser)
Channels 14 (23.8–57.3 GHz microwave) 6 (0.63–12.0 µm visible/IR)
Spatial resolution ~50 km at nadir ~20 km (103 EPS tie points per scan line)
Temporal coverage 2006-present 2006-present
Region Global (LEO polar orbit, ~14 orbits/day) Global (LEO polar orbit)
Authentication EUMETSAT API key/secret (env vars) EUMETSAT API key/secret (env vars)

Implementation highlights

  • MetOpAMSUA: Custom binary parser for EPS native MDR-1B records (3464 bytes/scan). Extracts calibrated radiances, converts to brightness temperatures via Planck inverse function, with full geolocation and satellite geometry. Channel 15 (89 GHz) excluded due to ~97% missing data in L1B products.
  • MetOpAVHRR: Custom binary parser for EPS native format (~907 MB/orbit). Parses GIADR-radiance for calibration coefficients, reads MDR SCENE_RADIANCES at 103 navigation tie points per scan line. Visible channels calibrated to reflectance (%), thermal channels to brightness temperature (K). Handles 3a/3b channel switching via FRAME_INDICATOR bit 16. ~120x faster than satpy approach (1.2s vs ~150s per file). Validation script is below, about 10x faster to load file than satpy (which is GPL anyway).
  • Both sources use eumdac for OAuth2-authenticated data access to the EUMETSAT Data Store.
  • All binary format offsets and calibration formulas taken from public EUMETSAT Product Format Specification documents linked in docstrings
  • MetOpAMSUALexicon and MetOpAVHRRLexicon with 20 new E2STUDIO_VOCAB entries.
  • 24 non-network unit tests (mock + binary parser unit tests), 8 network integration tests.

Data licensing

License: EUMETSAT Data Policy
URL: https://www.eumetsat.int/data-policy

Full operational data freely available. Open access for both commercial and non-commercial use under EUMETSAT's data policy established by Member States. Complies with WMO data sharing principles.

Dependencies added

Package Version License License URL Reason
eumdac >=3.1.0 MIT LICENSE.txt EUMETSAT Data Access Client for OAuth2 auth and product search/download

No GPL dependencies. Both AMSU-A and AVHRR parsers are clean-room implementations using only struct, numpy, and pandas.

Performance

Source File size Parse time (subsample=16) Observations
MetOpAMSUA ~2.6 MB <0.1s ~23k/orbit
MetOpAVHRR ~907 MB 1.2s (6 channels) ~1M/orbit

Files changed

File Description
earth2studio/data/metop_amsua.py MetOpAMSUA DataFrameSource (custom EPS binary parser)
earth2studio/data/metop_avhrr.py MetOpAVHRR DataFrameSource (custom EPS binary parser)
earth2studio/lexicon/metop.py Lexicon classes for both instruments
earth2studio/lexicon/base.py 20 new E2STUDIO_VOCAB entries
test/data/test_metop_amsua.py 13 tests (9 mock/unit + 4 network)
test/data/test_metop_avhrr.py 19 tests (15 mock/unit + 4 network)
earth2studio/data/__init__.py Register new sources
earth2studio/lexicon/__init__.py Register new lexicons
docs/modules/datasources_dataframe.rst Documentation entries
pyproject.toml + uv.lock New dependency (eumdac)
CHANGELOG.md Release notes

Sanity-check plots

AMSU-A (channels 4, 9, 14 — 2025-01-15)

AMSU-A sanity check

AVHRR (channels 4, 5 — 2025-01-15)

AVHRR sanity check

Sanity-check script (not committed)
"""Sanity-check plots for MetOp AMSU-A and AVHRR data sources."""
import os
from datetime import datetime, timedelta
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import cartopy.feature as cfeature

os.environ["EUMETSAT_CONSUMER_KEY"] = "YOUR_KEY"
os.environ["EUMETSAT_CONSUMER_SECRET"] = "YOUR_SECRET"

from earth2studio.data import MetOpAMSUA, MetOpAVHRR

# --- AMSU-A ---
ds_a = MetOpAMSUA(time_tolerance=timedelta(hours=2), cache=True, verbose=True)
df_a = ds_a(datetime(2025, 1, 15, 11), ["amsua04", "amsua09", "amsua14"])

fig, axes = plt.subplots(1, 3, figsize=(18, 6),
                         subplot_kw={"projection": ccrs.Robinson()})
for ax, var in zip(axes, ["amsua04", "amsua09", "amsua14"]):
    sub = df_a[df_a["variable"] == var]
    sc = ax.scatter(sub["lon"], sub["lat"], c=sub["observation"],
                    s=1, transform=ccrs.PlateCarree(), cmap="RdYlBu_r")
    ax.add_feature(cfeature.COASTLINE, linewidth=0.5)
    ax.set_global()
    ax.set_title(f"{var} (N={len(sub):,})")
    plt.colorbar(sc, ax=ax, shrink=0.6, label="BT (K)")
fig.suptitle("MetOp AMSU-A — 2025-01-15 11:00 UTC (±2h)", fontsize=14)
plt.tight_layout()
plt.savefig("sanity_check_metop_amsua.png", dpi=150)

# --- AVHRR ---
ds_v = MetOpAVHRR(time_tolerance=timedelta(hours=2), subsample=16,
                   cache=True, verbose=True)
df_v = ds_v(datetime(2025, 1, 15, 11), ["avhrr04", "avhrr05"])

fig, axes = plt.subplots(1, 2, figsize=(16, 6),
                         subplot_kw={"projection": ccrs.Robinson()})
for ax, var in zip(axes, ["avhrr04", "avhrr05"]):
    sub = df_v[df_v["variable"] == var]
    sc = ax.scatter(sub["lon"], sub["lat"], c=sub["observation"],
                    s=0.1, transform=ccrs.PlateCarree(), cmap="RdYlBu_r")
    ax.add_feature(cfeature.COASTLINE, linewidth=0.5)
    ax.set_global()
    ax.set_title(f"{var} (N={len(sub):,})")
    plt.colorbar(sc, ax=ax, shrink=0.6, label="BT (K)")
fig.suptitle("MetOp AVHRR — 2025-01-15 11:00 UTC (±2h)", fontsize=14)
plt.tight_layout()
plt.savefig("sanity_check_metop_avhrr.png", dpi=150)

validate_avhrr_satpy.py

Add standardized async utilities to earth2studio/data/utils.py:
- ensure_utc: normalize timezone-aware datetimes to naive UTC
- async_retry: retry with exponential backoff and per-attempt timeout
- managed_session: context manager for fsspec session cleanup
- gather_with_concurrency: bounded parallel execution via semaphore
- cancellable_to_thread: wrapper for asyncio.to_thread with timeout

Update create-data-source skill to:
- Emphasize pure async I/O is ALWAYS preferred over asyncio.to_thread
- Reference new utilities from utils.py instead of inline code
- Add pytest timeout considerations for thread-based operations
- Clarify that threads cannot be force-cancelled in Python
- Add standard common parameters (cache, verbose, async_timeout,
  async_workers, retries)
Add MetOpAMSUA and MetOpAVHRR DataFrameSource implementations for
EUMETSAT MetOp satellite Level 1B products via the eumdac library.

MetOpAMSUA: Custom EPS native binary parser for 15-channel microwave
radiometer data (~50km resolution). Parses MDR-1B records to extract
brightness temperatures, geolocation, and satellite geometry.

MetOpAVHRR: Uses satpy avhrr_l1b_eps reader for 6-channel visible/IR
radiometer data (1km resolution, configurable subsampling).

Includes MetOpAMSUALexicon/MetOpAVHRRLexicon, 21 new E2STUDIO_VOCAB
entries, unit tests with mock coverage, and documentation.
The eumdac collection.search(sat=...) expects 'Metop-B' not 'M01'.
Also update create-data-source skill with fork-to-upstream PR
workflow and mandatory sanity-check plot confirmation.
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 3, 2026

Greptile Summary

Adds MetOpAMSUA and MetOpAVHRR as new DataFrameSource implementations for EUMETSAT MetOp satellite Level 1B products, with corresponding lexicons, unit tests, and documentation. The custom EPS binary parser for AMSU-A is well-structured and thoroughly tested; the AVHRR path leverages satpy with a performance-oriented shortcut through private internals. All previously raised concerns have been addressed.

Confidence Score: 5/5

Safe to merge; all remaining findings are minor style/robustness suggestions that don't affect the primary data path.

All P0/P1 concerns from previous review rounds have been resolved. The three remaining comments are P2: a global Dask config mutation without a context manager, reliance on private satpy internals (acknowledged performance trade-off), and an unguarded ValueError in the MPHR sensing-time parser for malformed files. None of these affect normal operation with real EPS products.

earth2studio/data/metop_avhrr.py (dask.config.set and private satpy API access)

Important Files Changed

Filename Overview
earth2studio/data/metop_amsua.py New DataFrameSource with custom EPS binary parser; well-structured and tested. Minor: _parse_sensing_time can throw ValueError for malformed MPHR without being caught.
earth2studio/data/metop_avhrr.py New DataFrameSource using satpy for AVHRR parsing; relies on private satpy internals for performance and sets global Dask config without context manager.
earth2studio/lexicon/metop.py New lexicons for AMSU-A (14 channels) and AVHRR (6 channels); clean implementation following existing patterns.
test/data/test_metop_amsua.py Good unit test coverage with synthetic binary builder; tests GRH parsing, MPHR parsing, BT conversion, and end-to-end mock fetch.
test/data/test_metop_avhrr.py Mock-based tests cover end-to-end call, fields subset, empty response, and exception handling.

Reviews (2): Last reviewed commit: "fix: try/finally for temp cache cleanup ..." | Re-trigger Greptile

- Remove channel 15 (89 GHz) from AMSU-A lexicon/vocab (~97% missing)
- Parser now only outputs channels present in lexicon
- Rewrite AVHRR parser to bypass satpy geo-interpolation (12x speedup)
- Use OptionalDependencyFailure pattern for eumdac/satpy imports
- Update SKILL.md: variable validation step, summary requirements,
  OptionalDependencyFailure pattern, fork-to-upstream PR workflow
NickGeneva added a commit to NickGeneva/earth2studio that referenced this pull request Apr 3, 2026
@NickGeneva
Copy link
Copy Markdown
Collaborator Author

NickGeneva commented Apr 3, 2026

Sanity-Check Plots (cartopy Robinson projection)

MetOpAMSUA — channels 4, 9, 14

AMSU-A sanity check

MetOpAVHRR — channels 4, 5

AVHRR sanity check

Test date: 2025-01-15 12:00 UTC, Metop-B, ±2h tolerance

  • AMSU-A: 162,000 obs (3 vars × 54,000 obs/var), 4 orbit passes
  • AVHRR: 556,406 obs (2 vars), 4 orbit passes, subsample=32

NickGeneva added a commit to NickGeneva/earth2studio that referenced this pull request Apr 3, 2026
- Remove unused _SURFACE_PROPERTIES_OFFSET constant
- Use E2STUDIO_SCHEMA.field('elev') for schema consistency
- Use io.BytesIO for ZIP extraction instead of temp file
- Add code comments documenting inter-satellite wavenumber bias (<0.03K)
@NickGeneva
Copy link
Copy Markdown
Collaborator Author

/blossom-ci

@NickGeneva
Copy link
Copy Markdown
Collaborator Author

@greptile-apps

- Wrap __call__ fetch in try/finally so temp cache is cleaned on
  error, timeout, or cancellation
- Update SKILL.md: document _tmp_cache_hash pattern and try/finally
  requirement in Steps 7a, 7c, 7g, and implementation notes
- Step 12e: target 90%+ line coverage guideline
- Step 13b: coverage report command with --slow, --cov, --cov-fail-under=90
- CONFIRM gate: coverage percentage now required
@NickGeneva
Copy link
Copy Markdown
Collaborator Author

/blossom-ci

- Step 2 CONFIRM gate: must include license name + link per new dep,
  flag non-permissive licenses (GPL/AGPL/SSPL)
- PR template: Dependencies table with Package/Version/License/URL/Reason
  columns and guidance on license lookup
Remove satpy dependency and implement a clean-room AVHRR Level 1B
EPS native format parser using struct.unpack_from with byte offsets
from the public EUMETSAT Product Format Specification.

- Parse GIADR-radiance record for calibration coefficients
- Parse MDR records: SCENE_RADIANCES, EARTH_LOCATIONS, ANGULAR_RELATIONS
- Calibrate visible channels to reflectance (%), thermal to BT (K)
- Handle 3a/3b channel switching via FRAME_INDICATOR bit 16
- Use 103 EPS navigation tie points directly (no interpolation)
- ~120x faster than satpy approach (1.2s vs ~150s per 907MB file)
- Zero GPL dependencies: only stdlib struct, numpy, pandas, eumdac (MIT)
@NickGeneva
Copy link
Copy Markdown
Collaborator Author

/blossom-ci

@NickGeneva
Copy link
Copy Markdown
Collaborator Author

/blossom-ci

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant