Deploy the trained segmentation model for pan-arctic inference (60-74°N) on 2025 PlanetScope basemap imagery to produce an RTS survey map. The pipeline prioritizes precision over recall to minimize false alarms in the final product.
| Resource | Specification |
|---|---|
| Cloud | Google Cloud Platform |
| VM Type | GPU-enabled VM (specific type TBD with PDG team) |
| Storage | Google Cloud Storage bucket: abruptthawmapping |
| Collaboration | PDG workflow optimization team (Luigi/Todd) |
gs://abruptthawmapping/
├── models/
│ └── rts-v2/
│ ├── best_model.pth
│ ├── normalization_stats.json
│ └── config.yaml
├── inference/
│ ├── 2025-Q3/
│ │ ├── tiles/ # Raw prediction tiles
│ │ │ ├── tile_0001.tif
│ │ │ └── ...
│ │ ├── merged/ # Merged prediction rasters
│ │ │ ├── region_yakutia.tif
│ │ │ └── ...
│ │ ├── vectors/ # Vectorized polygons
│ │ │ ├── rts_predictions.gpkg
│ │ │ └── ...
│ │ └── logs/
│ │ └── inference_log.json
│ └── ...
└── basemaps/
└── 2025-Q3/
└── ... (input imagery)
Base Image: Same as training—pytorch/pytorch:2.2.0-cuda12.1-cudnn8-devel
Additional Inference Requirements:
| Package | Purpose |
|---|---|
| google-cloud-storage | GCS bucket access |
| geopandas | Vector operations |
| shapely | Geometry handling |
| pyproj | Coordinate transformations |
Docker Configuration for Inference:
| Flag | Purpose |
|---|---|
--gpus all |
Enable GPU access |
-v /path/to/cache:/cache |
Local cache for tiles |
--env GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json |
GCS authentication |
GCS Authentication:
- Create service account with Storage Object Viewer and Storage Object Creator roles
- Download JSON key file
- Mount key file into container and set environment variable
| Attribute | Value |
|---|---|
| Product | Global Quarterly PlanetScope Basemap |
| Year | 2025 |
| Quarter | Q3 (July-September) |
| Bands | RGB |
| Resolution | ~3 m |
| Coverage | 60-74°N (pan-arctic) |
| CRS | EPSG:3413 |
| Parameter | Estimate |
|---|---|
| Total area | ~20 million km² |
| Tile size | 512×512 @ 3m = ~2.36 km² per tile |
| Estimated tiles | ~8-10 million tiles (without overlap) |
| With 50% overlap | ~32-40 million tile inferences |
| Parameter | Value | Rationale |
|---|---|---|
| Tile size | 512×512 pixels | Matches training tile size |
| Spatial coverage | ~1.5 km × 1.5 km | At 3m resolution |
| CRS | EPSG:3413 | Consistent with training |
| Format | GeoTIFF | Preserves georeferencing |
Overlapping tiles ensure RTS at tile boundaries are detected where both headwall and floor are visible.
| Parameter | Value | Rationale |
|---|---|---|
| Overlap (stride) | 256 pixels (50%) | Ensures most partial RTS captured in adjacent tile |
| Step size | 256 pixels | tile_size - overlap |
Overlap rationale: An RTS split at a tile boundary may show only floor in tile A and only headwall in tile B. With 50% overlap, an intermediate tile C will likely contain both features.
The inference tile grid is pre-filtered externally (land-only, permafrost zones) before the inference pipeline runs. The inference code receives a pre-filtered tile list and processes it as-is — no filtering logic inside the inference container.
- Define bounding box for inference region (or per-region bounding boxes)
- Generate tile grid with specified overlap
- Apply land/permafrost filtering externally (outside this pipeline)
- Save filtered tile grid as CSV with tile IDs and bounding boxes → this is the
--tile-listinput to the inference script
Critical: Use the exact normalization statistics from training.
- Load
normalization_stats.jsonfrom model directory - Verify dataset version matches expected training data version
- Apply mean subtraction and std division to each input tile
For each input tile:
- Load RGB values as float32
- Subtract channel means:
x = x - mean[channel] - Divide by channel stds:
x = x / std[channel] - Feed normalized tensor to model
Warning: If normalization statistics are not applied identically to training, model predictions will be unreliable.
RTS range from ~50m to 2+ km. A single resolution cannot optimally detect all sizes:
- Native 3m: Good for small-medium RTS, may miss context for large RTS
- Downscaled: Larger effective field of view captures large RTS
| Scale | Effective Resolution | Field of View | Target RTS |
|---|---|---|---|
| 1.0 | 3m (native) | 1.5 km | Small-medium (50m-500m) |
| 0.5 | 6m | 3 km | Medium-large (200m-1km) |
| 0.25 | 12m | 6 km | Very large (1km+) |
Recommended configuration: Start with scales [1.0, 0.5]. Add 0.25 only if large RTS recall is problematic.
For each tile location:
Scale 1.0 (native):
- Load 512×512 tile at native resolution
- Normalize using training statistics
- Run inference → probability map P_1.0
Scale 0.5:
- Load 1024×1024 region centered on tile location
- Downsample to 512×512 (bilinear interpolation)
- Normalize using training statistics
- Run inference → probability map at 512×512
- Upsample prediction back to 1024×1024
- Crop center 512×512 → P_0.5
Scale 0.25:
- Load 2048×2048 region centered on tile location
- Downsample to 512×512
- Normalize, run inference
- Upsample to 2048×2048, crop center 512×512 → P_0.25
Combine predictions across scales using pixel-wise maximum:
P_final = max(P_1.0, P_0.5, P_0.25)
Rationale: If any scale confidently detects RTS, include it. Maximum operation is conservative toward detection while individual scale thresholds control precision.
| Setting | Transforms | Speed Multiplier |
|---|---|---|
| Disabled | None | 1× |
| Minimal | Identity, hflip | 2× |
| Standard | Identity, hflip, vflip, rot180 | 4× |
Recommendation: For pan-arctic inference, use Minimal TTA (2×) as balance between accuracy and compute cost. Full TTA on 40M+ tiles is expensive.
For each input tile:
- Original → predict → P_orig
- Horizontal flip → predict → flip back → P_hflip
- Average: P_tta = (P_orig + P_hflip) / 2
Order of operations:
- For each scale: a. Apply TTA transforms b. Average TTA predictions at this scale
- Take maximum across scales
Total inference passes per tile location: n_scales × n_tta_transforms
| Configuration | Passes per Location |
|---|---|
| 2 scales, no TTA | 2 |
| 2 scales, minimal TTA | 4 |
| 3 scales, minimal TTA | 6 |
| 2 scales, standard TTA | 8 |
| Parameter | Value | Notes |
|---|---|---|
| Batch size | 64-128 | Tune based on GPU memory |
| Tile loading | Async prefetch | Overlap I/O with compute |
| GPU utilization target | >90% | Monitor with nvidia-smi |
- Initialize: Load model, load normalization stats, set model to eval mode
- Tile iteration:
- Load batch of tiles from GCS (with prefetching)
- Normalize batch
- For each scale: run inference (with TTA if enabled)
- Fuse scales
- Save predictions to GCS
- Progress tracking: Log completed tiles, estimated time remaining
- Checkpointing: Save progress every N tiles for resumability
The inference job must be resumable after interruption:
- Maintain manifest of completed tiles in
inference_log.json - On restart, load manifest and skip completed tiles
- Use atomic writes to GCS (write to temp, then rename)
Adjacent tiles overlap by 50%. The overlapping regions have multiple predictions that must be merged.
Merging strategy: Maximum
For pixels covered by multiple tiles, take the maximum probability:
P_merged(x, y) = max(P_tile1(x, y), P_tile2(x, y), ...)
Rationale: Consistent with the detection philosophy—if any tile view detects RTS, include it.
Option A: On-the-fly merging (memory-efficient)
- Create output raster for region with NoData fill
- For each tile prediction:
- Read overlapping region from output
- Compute pixel-wise maximum with new prediction
- Write merged result back
- Advantage: Low memory; Disadvantage: Many I/O operations
Option B: Batch merging (faster)
- Accumulate all tile predictions in memory (or memory-mapped file)
- Apply reduction (maximum) across overlapping tiles
- Write final merged raster
- Advantage: Faster; Disadvantage: High memory for large regions
Recommendation: Use Option B for manageable regions (e.g., per Arctic subregion), Option A for full pan-arctic if memory-constrained.
For pan-arctic scale, produce merged outputs per region rather than single global raster:
- Easier to manage and distribute
- Enables parallel processing
- Allows region-specific quality control
| Attribute | Value |
|---|---|
| Format | Cloud-Optimized GeoTIFF (COG) |
| Data type | Float32 |
| Range | [0.0, 1.0] |
| NoData value | -1.0 |
| CRS | EPSG:3413 |
| Resolution | 3m (native) |
| Compression | LZW |
| Attribute | Value |
|---|---|
| Format | Cloud-Optimized GeoTIFF (COG) |
| Data type | UInt8 |
| Values | 0 (background), 1 (RTS) |
| NoData value | 255 |
| CRS | EPSG:3413 |
| Resolution | 3m |
| Compression | LZW |
Threshold applied: Use calibrated threshold from training (documented in model config).
| Attribute | Value |
|---|---|
| Format | GeoPackage (.gpkg) |
| Geometry | Polygon (MultiPolygon for fragmented) |
| CRS | EPSG:3413 |
Attributes per polygon:
| Field | Type | Description |
|---|---|---|
| rts_id | Integer | Unique identifier |
| area_m2 | Float | Polygon area in square meters |
| perimeter_m | Float | Polygon perimeter in meters |
| centroid_lat | Float | Centroid latitude (WGS84) |
| centroid_lon | Float | Centroid longitude (WGS84) |
| mean_prob | Float | Mean probability within polygon |
| max_prob | Float | Maximum probability within polygon |
| detection_scale | String | Scale(s) that detected this RTS |
| tile_ids | String | Comma-separated tile IDs containing this RTS |
Save with each inference run:
inference_log.json:
| Field | Description |
|---|---|
| model_version | Model identifier |
| model_checkpoint | Path to model weights |
| normalization_stats_hash | MD5 hash of normalization file |
| inference_date | ISO timestamp |
| basemap_version | 2025-Q3 |
| scales_used | [1.0, 0.5] |
| tta_config | "minimal" |
| threshold | 0.XX (from calibration) |
| n_tiles_processed | Total tiles |
| n_tiles_with_detection | Tiles with any RTS prediction |
| total_rts_area_km2 | Sum of predicted RTS area |
| processing_time_hours | Wall clock time |
| gpu_type | e.g., "NVIDIA H100" |
| Check | Action if Failed |
|---|---|
| Tile has valid data (not all NoData) | Skip tile, log warning |
| Prediction values in [0, 1] | Clip and log error |
| Tile georeferencing valid | Stop and investigate |
| GPU memory stable | Reduce batch size |
Performed before releasing results (detailed in post-inference.md):
- Visual inspection of sample predictions
- Comparison with known RTS locations
- False positive analysis
- Regional performance assessment
| Technique | Description |
|---|---|
| Tile caching | Cache frequently accessed tiles locally |
| Prefetching | Load next batch while current batch processes |
| COG format | Cloud-Optimized GeoTIFF enables efficient partial reads |
| Batch GCS operations | Upload predictions in batches, not per-tile |
| Technique | Description |
|---|---|
| Mixed precision (FP16) | 2× throughput on tensor cores |
| Batch size tuning | Maximize GPU utilization |
| Multiple streams | Overlap data transfer and compute |
| Model compilation | torch.compile() for additional speedup |
| Configuration | Tiles/Second (est.) | Time for 40M tiles |
|---|---|---|
| 1 scale, no TTA, batch=64 | ~100-200 | 2-4 days |
| 2 scales, minimal TTA, batch=64 | ~50-100 | 4-8 days |
| 2 scales, standard TTA, batch=64 | ~25-50 | 8-16 days |
Note: Estimates are rough; actual performance depends on I/O bandwidth, tile complexity, and GCS latency.
The inference pipeline integrates with the existing PDG (Permafrost Discovery Gateway) workflow infrastructure developed for DARTS inference.
Integration points:
- Input: Basemap tiles from GCS
- Output: Prediction tiles and vectors to GCS
- Logging: Compatible format for PDG monitoring
- Parallelization: Workflow handles VM orchestration
The inference container exposes a CLI interface for PDG workflow integration:
python scripts/inference.py --config configs/inference.yaml --tile-list tiles.csv--config: YAML file specifying model path, GCS paths, scales, TTA config, threshold--tile-list: CSV file with tile IDs and bounding boxes to process (pre-filtered by PDG/RTS team)- Output: Prediction tiles written to GCS path defined in config;
inference_log.jsonupdated on completion
Tile-level parallelism (managed by PDG workflow):
- RTS team generates the full filtered tile grid (CSV)
- PDG team (Luigi/Todd) partitions the CSV into chunks and spawns VMs
- Each VM runs the inference container with its assigned tile list chunk
- RTS team merges outputs after all chunks complete
Within-VM parallelism:
- Single GPU processes tiles in batches
- Multiple CPU workers handle I/O prefetching
- No multi-GPU within single VM (simplifies code)
| Responsibility | Owner |
|---|---|
| Tile grid generation (filtered CSV) | RTS team |
| VM orchestration + tile partitioning | PDG team (Luigi/Todd) |
| Inference Docker container | RTS team |
| Output merging | RTS team |
| Quality control | RTS team |
Interface contract (to finalize with PDG team):
- Input:
configs/inference.yaml+tiles.csv(tile_id, bbox columns) - Output: Prediction tiles at
{config.output_path}/{tile_id}.tif; log at{config.output_path}/inference_log.json
- Model artifacts uploaded to GCS (model, normalization stats, config)
- Docker image built and pushed to container registry
- Tile grid generated and validated
- GCS permissions configured (service account)
- Test inference on small region successful
- Throughput estimate matches budget
- Progress monitoring active
- GPU utilization >90%
- No error accumulation in logs
- Checkpoint saves working
- All tiles processed (compare manifest to grid)
- Merged rasters generated
- Vectorization complete
- Metadata logged
- Sanity checks passed
- Ready for quality control (post-inference.md)
| Issue | Possible Cause | Solution |
|---|---|---|
| OOM errors | Batch size too large | Reduce batch size |
| Slow inference | I/O bottleneck | Enable prefetching, use local cache |
| Inconsistent predictions | Wrong normalization | Verify normalization_stats.json hash |
| Missing tiles in output | Job interrupted | Check manifest, restart from checkpoint |
| High false positive rate | Threshold too low | Re-calibrate threshold on validation set |
| Predictions all zero | Model loading error | Verify model checkpoint, test on known positive |