
Align jepa forecast finetuning #2149

Merged
shmh40 merged 55 commits into ecmwf:develop from csjfwang:align_jepa_forecast_finetuning
Mar 31, 2026

Conversation

@csjfwang (Contributor) commented Mar 31, 2026

Description

Align config_jepa_forecasting_finetuning.yml with config_forecasting.yml for fair comparison in the future.

Issue Number

Fixes #2150

Is this PR a draft? Mark it as draft.

Checklist before asking for review

  • I have performed a self-review of my code
  • My changes comply with basic sanity checks:
    • I have fixed formatting issues with ./scripts/actions.sh lint
    • I have run unit tests with ./scripts/actions.sh unit-test
    • I have documented my code and I have updated the docstrings.
    • I have added unit tests, if relevant
  • I have tried my changes with data and code:
    • I have run the integration tests with ./scripts/actions.sh integration-test
    • (bigger changes) I have run a full training and I have written in the comment the run_id(s): launch-slurm.py --time 60
    • (bigger changes and experiments) I have shared a HedgeDoc in the GitHub issue with all the configurations and runs for these experiments
  • I have informed and aligned with people impacted by my change:
    • for config changes: the MatterMost channels and/or a design doc
    • for changes of dependencies: the MatterMost software development channel

wang85 and others added 30 commits July 16, 2025 10:07
latent_noise_deterministic_latents: True

- freeze_modules: ".*encoder.*|.*latent_pre_norm.*|.*latent_heads.*"
+ freeze_modules: ""
@shmh40 (Contributor) Mar 31, 2026

We should keep these modules frozen as default also :)
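To illustrate the semantics being discussed: `freeze_modules` is a regex over module/parameter names, and the default pattern freezes anything matching one of the alternatives, while `""` freezes nothing. The helper and parameter names below are a hypothetical sketch, not WeatherGenerator code:

```python
import re

def select_frozen(param_names, freeze_modules):
    """Return the parameter names matched by the freeze_modules regex.

    An empty pattern ("") freezes nothing; a non-empty pattern is matched
    against each name. This is an illustrative sketch only.
    """
    if not freeze_modules:
        return set()
    pattern = re.compile(freeze_modules)
    return {name for name in param_names if pattern.match(name)}

# Hypothetical parameter names, for illustration only.
names = [
    "model.encoder.blocks.0.weight",
    "model.latent_pre_norm.bias",
    "model.decoder.blocks.0.weight",
]
frozen = select_frozen(names, ".*encoder.*|.*latent_pre_norm.*|.*latent_heads.*")
```

With the default pattern, the encoder and latent_pre_norm parameters above would be frozen and the decoder left trainable; with `""`, nothing is frozen.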

- fe_layer_norm_after_blocks: [] # Index starts at 0. Thus, [3] adds a LayerNorm after the fourth layer
- fe_impute_latent_noise_std: 0.0 # 1e-4
+ fe_layer_norm_after_blocks: [7] # Index starts at 0. Thus, [3] adds a LayerNorm after the fourth layer
+ fe_impute_latent_noise_std: 1e-4
Contributor

Sorry, could we actually leave the latent noise as 0 for now!
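The index semantics in the `fe_layer_norm_after_blocks` comment can be made concrete with a small sketch (illustrative labels only, not the real model code): indices start at 0, so `[3]` places a LayerNorm after the fourth block, and `[7]` after the eighth.

```python
def insert_layer_norms(num_blocks, layer_norm_after_blocks):
    """Sketch of the config's index semantics, with illustrative labels.

    Indices start at 0, so layer_norm_after_blocks=[3] adds a LayerNorm
    after the fourth block.
    """
    layout = []
    for i in range(num_blocks):
        layout.append(f"block{i}")
        if i in layer_norm_after_blocks:
            layout.append("layer_norm")
    return layout
```

For example, `insert_layer_norms(8, [7])` ends with a LayerNorm after block 7, matching the `[7]` value in the diff above.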

#####################################

- streams_directory: "./config/streams/era5_1deg/"
+ streams_directory: "./config/streams/era5_1deg_forecasting/"
Contributor

Great, thanks

lr_start: 1e-6
lr_max: 5e-5
- lr_final_decay: 1e-6
+ lr_final_decay: 2e-6
Contributor

great, thanks
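The three values `lr_start`, `lr_max`, and `lr_final_decay` suggest a warmup-then-decay schedule. As a rough, hypothetical sketch of how they could relate (the actual WeatherGenerator scheduler may well differ, e.g. in the decay shape):

```python
import math

def lr_at(step, total_steps, warmup_steps,
          lr_start=1e-6, lr_max=5e-5, lr_final_decay=2e-6):
    """Hypothetical warmup-then-cosine schedule relating the three
    config values: linear warmup from lr_start to lr_max, then cosine
    decay down to lr_final_decay. Illustrative only."""
    if step < warmup_steps:
        # Linear warmup: lr_start -> lr_max
        frac = step / warmup_steps
        return lr_start + frac * (lr_max - lr_start)
    # Cosine decay: lr_max -> lr_final_decay
    frac = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return lr_final_decay + 0.5 * (lr_max - lr_final_decay) * (1 + math.cos(math.pi * frac))
```

Under this sketch the run starts at `lr_start`, peaks at `lr_max` when warmup ends, and finishes at `lr_final_decay`, so raising `lr_final_decay` from 1e-6 to 2e-6 only changes the floor the decay ends on.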

training_mode: ["masking"]

- num_mini_epochs: 32
+ num_mini_epochs: 64
@shmh40 (Contributor) Mar 31, 2026

@sophie-xhonneux probably this one we can leave as 32 epochs?

@shmh40 (Contributor) left a comment

Great, thanks! Just a few changes needed and then let's wait for @sophie-xhonneux and @MatKbauer to double check too.

@shmh40 shmh40 self-requested a review March 31, 2026 09:42
@shmh40 (Contributor) left a comment

Also can you create an issue and link to it in the PR so that the checks pass :)

@github-project-automation github-project-automation bot moved this to In Progress in WeatherGen-dev Mar 31, 2026
@csjfwang (Contributor, Author)

> Also can you create an issue and link to it in the PR so that the checks pass :)

Thanks, will create one now!

@github-actions bot added labels: model (Related to model training or definition (not generic infra)), model:pretrain, science (Scientific questions) Mar 31, 2026
with_mixed_precision: True
with_flash_attention: True
compile_model: False
with_fsdp: False
@csjfwang (Contributor, Author) Mar 31, 2026

@shmh40 @sophie-xhonneux
Should we set with_fsdp: True, as in config_forecasting.yml?

Contributor

Let's leave it as False for now, thanks!

@shmh40 (Contributor) left a comment

Thank you @csjfwang !

@shmh40 shmh40 merged commit 5b14709 into ecmwf:develop Mar 31, 2026
5 checks passed
@github-project-automation github-project-automation bot moved this from In Progress to Done in WeatherGen-dev Mar 31, 2026

Labels

model:pretrain, model (Related to model training or definition (not generic infra)), science (Scientific questions)

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Align JEPA forecasting fine-tuning config with forecasting config

2 participants