@moritzhauschulz
Contributor

@moritzhauschulz moritzhauschulz commented Dec 4, 2025

Description

Introduces a repeat flag that fills the mini-epoch up to samples_per_mini_epoch when the dataset has fewer elements than that. This is done by tiling the dataset indices. When samples_per_mini_epoch is not divisible by the dataset size, the final 'remainder' tile is sampled without replacement from the dataset. Pretty simple, see code.
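Below is a minimal sketch of the tiling-plus-remainder idea. It assumes the permutation is a 1-D NumPy index array and the RNG is a numpy.random.Generator. The helper fill_to_mini_epoch is hypothetical, not the PR's code, and it leaves the check of the repeat flag to the caller. It makes two design choices:

- The remainder simply has length 0 when samples_per_mini_epoch is divisible by the dataset size, so no separate branch is needed.
- The remainder is drawn from the original, untiled permutation, so it never contains the same dataset index twice. Drawing without replacement from an already tiled array does not guarantee that once the data is tiled more than once.

```python
import numpy as np


def fill_to_mini_epoch(perms: np.ndarray, samples_per_mini_epoch: int,
                       rng: np.random.Generator) -> np.ndarray:
    """Hypothetical helper: repeat `perms` until it has `samples_per_mini_epoch` entries.

    Full copies of `perms` are tiled; the leftover entries are drawn without
    replacement from the original (untiled) `perms`.
    """
    n = len(perms)
    if n == 0 or n >= samples_per_mini_epoch:
        # Nothing to fill (or nothing to fill with): return the permutation unchanged.
        return perms
    num_tiles = samples_per_mini_epoch // n
    tiled = np.tile(perms, num_tiles)
    # The remainder has length 0 when samples_per_mini_epoch is divisible by n,
    # so the divisible case needs no separate branch.
    remainder = rng.choice(perms, size=samples_per_mini_epoch - len(tiled), replace=False)
    return np.concatenate([tiled, remainder])


rng = np.random.default_rng(0)
filled = fill_to_mini_epoch(np.array([2, 0, 1]), 8, rng)
print(len(filled))  # 8
```

Here a 3-element dataset is tiled twice (6 entries) and 2 distinct indices are drawn for the remainder.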

Issue Number

Closes #1379

Checklist before asking for review

  • I have performed a self-review of my code
  • My changes comply with basic sanity checks:
    • I have fixed formatting issues with ./scripts/actions.sh lint
    • I have run unit tests with ./scripts/actions.sh unit-test
    • I have documented my code and I have updated the docstrings.
    • I have added unit tests, if relevant
  • I have tried my changes with data and code:
    • I have run the integration tests with ./scripts/actions.sh integration-test
    • (bigger changes) I have run a full training and I have written the run_id(s) in the comment: launch-slurm.py --time 60
    • (bigger changes and experiments) I have shared a HedgeDoc in the GitHub issue with all the configurations and runs for these experiments
  • I have informed and aligned with people impacted by my change:
    • for config changes: the Mattermost channels and/or a design doc
    • for changes of dependencies: the Mattermost software development channel

@moritzhauschulz
Contributor Author

@clessig does this make sense?

Collaborator

@clessig clessig left a comment

Yes, looks good. Just two minor comments.

ae_adapter_with_residual: True
ae_adapter_dropout_rate: 0.1

repeat_data: False
Collaborator

We need a more descriptive name here. The future structure with nested dicts will also help here.

    self.perms = np.tile(self.perms, self.samples_per_mini_epoch // len(self.perms))
else:
    self.perms = np.tile(self.perms, self.samples_per_mini_epoch // len(self.perms))
    random_filler = self.rng.choice(
Collaborator

Could we get rid of the branch and have a random_filler of len=0 when it divides?

@moritzhauschulz
Contributor Author

Thanks for the feedback @clessig, should all be addressed now.

Comment on lines +291 to +328

# check repeat_data flag and fill up perms accordingly
if self.repeat_data and len(self.perms) < self.samples_per_mini_epoch:
    self.perms = np.tile(self.perms, self.samples_per_mini_epoch // len(self.perms))
    random_filler = self.rng.choice(
        self.perms, size=self.samples_per_mini_epoch - len(self.perms), replace=False
    )
    self.perms = np.concatenate([self.perms, random_filler])

Contributor

Since you are already touching this logic, can you make sure the behavior described in #1085 does not persist?

Contributor Author

I will have a look. I think I roughly know where the issue comes from, and I should already have fixed it on the diffusion branch, so it should be easy to do.

Contributor Author

Maybe I was a bit overconfident there: I am not exactly sure where the issue comes from. You mentioned that you have a test suite that checks this; could you share it? Otherwise, in my code self.len and len(self.perms) should always be the same, unless self.len is set to self.chunk in this line here.

Contributor

I will check. Don't worry, I will take care of this issue.

Contributor Author

Okay, thanks @grassesi.

fsm = self.forecast_steps[0]
if len(ds) > 0:
    self.len = min(self.len, len(ds) - (self.len_hrs * (fsm + 1)) // self.step_hrs)

Contributor Author

Please check whether this can be removed. I think it gets overwritten anyway, but I might be overlooking something. @clessig (or @grassesi)

Contributor

@grassesi grassesi Dec 9, 2025

Yes, it definitely gets overwritten, but the question is whether this is intentional. Maybe at some point the idea was to take the minimum of this quantity and int(index_range.end - index_range.start)? Intentional or not, I would still remove it. If it is unintentional behavior, we should handle it in a separate PR/issue, since any change to the sampling behavior is far-reaching and requires thorough testing.

Contributor Author

Good, thanks for double checking.

Collaborator

Can we remove it from the code then?

Collaborator

@clessig clessig left a comment

I just tried with

repeat_data_in_mini_epoch
start_date: 202012300000
end_date: 202012310000

and get:

forecast_steps at mini_epoch=0 : 2
Traceback (most recent call last):
  File "/users/lessig/santis/WeatherGenerator/src/weathergen/run_train.py", line 176, in train_with_args
    trainer.run(cf, devices)
  File "/users/lessig/santis/WeatherGenerator/src/weathergen/train/trainer.py", line 342, in run
    self.train(mini_epoch)
  File "/users/lessig/santis/WeatherGenerator/src/weathergen/train/trainer.py", line 517, in train
    for bidx, batch in enumerate(dataset_iter):
                       ^^^^^^^^^^^^^^^^^^^^^^^
  File "/users/lessig/santis/WeatherGenerator/.venv/lib/python3.12/site-packages/torch/utils/data/dataloader.py", line 708, in __next__
    data = self._next_data()
           ^^^^^^^^^^^^^^^^^
  File "/users/lessig/santis/WeatherGenerator/.venv/lib/python3.12/site-packages/torch/utils/data/dataloader.py", line 1480, in _next_data
    return self._process_data(data)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/users/lessig/santis/WeatherGenerator/.venv/lib/python3.12/site-packages/torch/utils/data/dataloader.py", line 1505, in _process_data
    data.reraise()
  File "/users/lessig/santis/WeatherGenerator/.venv/lib/python3.12/site-packages/torch/_utils.py", line 733, in reraise
    raise exception
AssertionError: Caught AssertionError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/users/lessig/santis/WeatherGenerator/.venv/lib/python3.12/site-packages/torch/utils/data/_utils/worker.py", line 349, in _worker_loop
    data = fetcher.fetch(index)  # type: ignore[possibly-undefined]
           ^^^^^^^^^^^^^^^^^^^^
  File "/users/lessig/santis/WeatherGenerator/.venv/lib/python3.12/site-packages/torch/utils/data/_utils/fetch.py", line 42, in fetch
    data = next(self.dataset_iter)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/users/lessig/santis/WeatherGenerator/src/weathergen/datasets/multi_stream_data_sampler.py", line 720, in __iter__
    self.reset()
  File "/users/lessig/santis/WeatherGenerator/src/weathergen/datasets/multi_stream_data_sampler.py", line 288, in reset
    assert idx_end > 0, "dataset size too small for forecast range"
           ^^^^^^^^^^^
AssertionError: dataset size too small for forecast range

[6] > /users/lessig/santis/WeatherGenerator/.venv/lib/python3.12/site-packages/torch/_utils.py(733)reraise()
-> raise exception

How did you test?

ae_adapter_with_residual: True
ae_adapter_dropout_rate: 0.1

repeat_data_in_mini_epoch: False
Collaborator

Can we move this down to shuffle?

Contributor Author

Yes (I thought I already did) and yes. I will make another push today.

Contributor Author

I haven't had this error. I always tested with an 18-hour time window, which is the minimum. What did you put for the validation start/end dates, @clessig? I have hit this error before when I did not adjust the validation time window accordingly.

Contributor Author

@moritzhauschulz moritzhauschulz Dec 12, 2025

@clessig I think what happened is that you have forecast_steps and forecast_offset set differently from me. If you try the attached config, it should work. Explanation below.

In the config for diffusion overfitting, we set forecast_steps=1 and forecast_offset=0. This is compatible with a date range of 3 = 2 + 1 + 0 time steps and yields a single source and a single target sample. If you want forecast_steps=2 and forecast_offset=1 (as in the default config), then I think you need a date range of at least 5 = 2 + 2 + 1 time steps. It works, for example, with start_date: 202012300000 and end_date: 202012310600. This minimum is not affected by my changes (it was already there), but evidently the general rule is:

minimum time steps = 2 + forecast_steps + forecast_offset

Hope this makes sense.

adjusted_default_config.yml
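The rule above could be checked up front, before any data is loaded. Below is a minimal sketch under two assumptions: time steps are 6-hourly, and a date range holds (end_date - start_date) // step_hrs time steps. Both match the numbers in this thread (18 h for 3 steps, 30 h for 5). The function check_date_range is hypothetical, and the check that eventually landed in the code may differ.

```python
from datetime import datetime


def _parse(date: str) -> datetime:
    # Config dates are written as YYYYMMDDHHMM, e.g. 202012300000.
    return datetime.strptime(date, "%Y%m%d%H%M")


def check_date_range(start_date: str, end_date: str, forecast_steps: int,
                     forecast_offset: int, step_hrs: int = 6) -> int:
    """Hypothetical check: raise if the date range holds too few time steps.

    Assumes num_steps = (end_date - start_date) // step_hrs and the rule
    minimum time steps = 2 + forecast_steps + forecast_offset.
    Returns the number of available time steps if the range is large enough.
    """
    window_hrs = (_parse(end_date) - _parse(start_date)).total_seconds() / 3600
    num_steps = int(window_hrs // step_hrs)
    required = 2 + forecast_steps + forecast_offset
    if num_steps < required:
        raise ValueError(
            f"Date range {start_date}-{end_date} holds {num_steps} time steps of "
            f"{step_hrs} h, but forecast_steps={forecast_steps} and "
            f"forecast_offset={forecast_offset} need at least {required}."
        )
    return num_steps


print(check_date_range("202012300000", "202012310600", 2, 1))  # 5
```

Consider the failing dates reported above (202012300000 to 202012310000, i.e. 4 steps) with forecast_steps=2 and forecast_offset=1. There the sketch raises a ValueError with the required minimum instead of failing later inside a DataLoader worker.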

Collaborator

@moritzhauschulz thanks for the explanation. Can we encode it as a check in the config with an informative message? Setting the correct combination of dates, steps and offsets is becoming more and more subtle.

Contributor Author

I encoded this in the new version, with a brief message. The corresponding code check has changed slightly again, so I suggest not printing a long explanation. Instead, let the user refer to the code checks (which also avoids duplication).

@clessig
Collaborator

clessig commented Jan 5, 2026

@moritzhauschulz: can you point it at develop, please? How big are the conflicts then?

commit 9336fe1
Author: moritzhauschulz <[email protected]>
Date:   Fri Dec 12 20:10:50 2025 +0100

    requested changes

commit dadde23
Author: moritzhauschulz <[email protected]>
Date:   Mon Dec 8 18:54:44 2025 +0100

    remove 1 line

commit c871f9c
Author: moritzhauschulz <[email protected]>
Date:   Mon Dec 8 18:16:50 2025 +0100

    remove unnecessary statement

commit e3e46eb
Author: moritzhauschulz <[email protected]>
Date:   Mon Dec 8 12:49:03 2025 +0100

    lint

commit 559add7
Author: moritzhauschulz <[email protected]>
Date:   Mon Dec 8 12:47:35 2025 +0100

    rename flag and simplify cases

commit f6e1c39
Author: moritzhauschulz <[email protected]>
Date:   Thu Dec 4 21:07:42 2025 +0100

    reset config and lint

commit 27cb0c8
Author: moritzhauschulz <[email protected]>
Date:   Thu Dec 4 20:57:14 2025 +0100

    repeat flag

commit bf17bfe
Author: Christian Lessig <[email protected]>
Date:   Thu Dec 4 16:53:51 2025 +0100

    Updated config

commit 7745e47
Author: Christian Lessig <[email protected]>
Date:   Thu Dec 4 16:35:19 2025 +0100

    Switched to lists of model / target stratgies

commit 12bae15
Author: Christian Lessig <[email protected]>
Date:   Thu Dec 4 15:01:07 2025 +0100

    Fixes for diffusion

commit 9065219
Author: Christian Lessig <[email protected]>
Date:   Thu Dec 4 13:33:42 2025 +0100

    Changed that model takes sample as input

commit 3f52a8d
Author: Christian Lessig <[email protected]>
Date:   Thu Dec 4 13:32:14 2025 +0100

    Changed core functions to take sample as arg

commit d36367a
Author: Christian Lessig <[email protected]>
Date:   Thu Dec 4 13:31:55 2025 +0100

    Changed args to embedding

commit b69b743
Author: Christian Lessig <[email protected]>
Date:   Thu Dec 4 13:30:41 2025 +0100

    Cleaned up comments and return values a bit

commit 59510dd
Author: Christian Lessig <[email protected]>
Date:   Thu Dec 4 00:01:50 2025 +0100

    Fixed problem with non_blocking=True

commit 69b53a6
Author: Christian Lessig <[email protected]>
Date:   Thu Dec 4 00:00:42 2025 +0100

    Removed old comments

commit 51754fa
Author: Christian Lessig <[email protected]>
Date:   Thu Dec 4 00:00:20 2025 +0100

    Fixed missing non_blocking=True in to_device()

commit 2cd3971
Author: Christian Lessig <[email protected]>
Date:   Wed Dec 3 23:56:41 2025 +0100

    Completed migration to new batch class by removing reference to old list of lists

commit 402b8de
Author: Julian Kuehnert <[email protected]>
Date:   Wed Dec 3 17:11:15 2025 +0100

    1390 - Adapt forward pass of new batch object (ecmwf#1391)

    * Add to device to ModelBatch, etc & adapt model

    TODO adapt validate and inference
    TODO test forecasting and multiple stream because predict changed
    substantially

    * Rename view to sample and fix validate

    * Revert predict function and fix inference

    * Fix invalid access with mask

    * Linting

    * Fixed handling of target_idxs and other minor issues

    ---------

    Co-authored-by: sophiex <[email protected]>
    Co-authored-by: Christian Lessig <[email protected]>

commit 9a1a6a9
Author: Christian Lessig <[email protected]>
Date:   Wed Dec 3 13:12:52 2025 +0100

    Re-enabled multi-source training

commit 3641e1f
Author: Christian Lessig <[email protected]>
Date:   Wed Dec 3 00:20:42 2025 +0100

    Fix for integration test

commit 9f5e49c
Author: Christian Lessig <[email protected]>
Date:   Wed Dec 3 00:20:25 2025 +0100

    Fixed uv.lock

commit 33d9d8d
Merge: 23e0267 c8a2aad
Author: Christian Lessig <[email protected]>
Date:   Wed Dec 3 00:13:05 2025 +0100

    Merge branch 'shmh40/dev/1270-idx-global-local' of github.com:ecmwf/WeatherGenerator into shmh40/dev/1270-idx-global-local

commit 23e0267
Author: Christian Lessig <[email protected]>
Date:   Wed Dec 3 00:11:48 2025 +0100

    Update

commit c8a26d7
Author: Christian Lessig <[email protected]>
Date:   Wed Dec 3 00:11:37 2025 +0100

    Commit

commit 2599ec2
Author: Christian Lessig <[email protected]>
Date:   Wed Dec 3 00:10:13 2025 +0100

    Restructured code so that mask generation and application is cleanly separated

commit c8a2aad
Author: Tim Hunter <[email protected]>
Date:   Tue Dec 2 17:06:56 2025 +0100

    commenting tests

commit 2b2c977
Author: Tim Hunter <[email protected]>
Date:   Tue Dec 2 17:03:41 2025 +0100

    linter warnings

commit dc736e5
Merge: 6fe8561 7ff6e0b
Author: Tim Hunter <[email protected]>
Date:   Tue Dec 2 16:48:24 2025 +0100

    merge with dev

commit 6fe8561
Merge: 15b46e9 f136d60
Author: Christian Lessig <[email protected]>
Date:   Fri Nov 28 14:16:41 2025 +0100

    Merge branch 'develop' of github.com:ecmwf/WeatherGenerator into shmh40/dev/1270-idx-global-local

commit 15b46e9
Author: Sebastian Hickman <[email protected]>
Date:   Fri Nov 28 13:30:54 2025 +0100

    fix indentation of else: assert False in _get_sample msds

commit 4281aff
Author: Sebastian Hickman <[email protected]>
Date:   Fri Nov 28 12:40:24 2025 +0100

    restore loader_num_workers to 8

commit 6ea07e7
Author: Seb Hickman <[email protected]>
Date:   Fri Nov 28 11:34:41 2025 +0000

    restore masking_strategy to random

    Had placeholder for testing, now back to "random" for masking strategy in the base level of default_config

commit 1a37dd1
Author: Sebastian Hickman <[email protected]>
Date:   Fri Nov 28 10:31:43 2025 +0100

    remove unused mask generation in diffusion_forecast

commit 657094a
Author: Christian Lessig <[email protected]>
Date:   Fri Nov 28 08:59:39 2025 +0100

    Fixed problem in engines introduced in recent commits merging develop. This fixes masking training

commit d526dfc
Author: Christian Lessig <[email protected]>
Date:   Fri Nov 28 08:37:02 2025 +0100

    Restored masking as training mode. Not working due to NaN in prediction

commit 6289959
Author: Christian Lessig <[email protected]>
Date:   Fri Nov 28 08:36:38 2025 +0100

    Removed duplicate lines due to mergeing

commit bc8d23e
Author: Christian Lessig <[email protected]>
Date:   Fri Nov 28 08:18:01 2025 +0100

    More linting

commit 47750a5
Author: Christian Lessig <[email protected]>
Date:   Fri Nov 28 08:10:09 2025 +0100

    Restoring masking as training_mode in default_config

commit 0db8b62
Author: Christian Lessig <[email protected]>
Date:   Fri Nov 28 08:09:41 2025 +0100

    Linting

commit e41a575
Author: Christian Lessig <[email protected]>
Date:   Fri Nov 28 08:09:28 2025 +0100

    Linting

commit 03166a2
Author: Christian Lessig <[email protected]>
Date:   Fri Nov 28 08:09:10 2025 +0100

    Linting

commit 652500a
Author: Christian Lessig <[email protected]>
Date:   Fri Nov 28 08:08:53 2025 +0100

    Linting

commit d8998a9
Author: Christian Lessig <[email protected]>
Date:   Fri Nov 28 08:08:38 2025 +0100

    Linting

commit 8ef3a4c
Author: Christian Lessig <[email protected]>
Date:   Fri Nov 28 08:08:04 2025 +0100

    Simplified and clarified handling of default target_aux_calcualtor

commit 3e4de7a
Author: Christian Lessig <[email protected]>
Date:   Fri Nov 28 08:07:51 2025 +0100

    Linting

commit 5f803e5
Merge: b47b0fa 0e2801b
Author: Christian Lessig <[email protected]>
Date:   Fri Nov 28 08:03:02 2025 +0100

    Merge branch 'develop' of github.com:ecmwf/WeatherGenerator into shmh40/dev/1270-idx-global-local

commit b47b0fa
Merge: 9b702c5 26f7b5b
Author: Christian Lessig <[email protected]>
Date:   Fri Nov 28 07:09:19 2025 +0100

    Merge branch 'shmh40/dev/1270-idx-global-local' of github.com:ecmwf/WeatherGenerator into shmh40/dev/1270-idx-global-local

commit 26f7b5b
Author: Sebastian Hickman <[email protected]>
Date:   Thu Nov 27 15:33:22 2025 +0100

    add diffusion forecast option for the data sampling, and with noise_level_rn in the metadata. The Trainer needs to be copied from Sophies branch, currently we only get so far

commit 6d909d6
Author: Sebastian Hickman <[email protected]>
Date:   Thu Nov 27 11:32:32 2025 +0100

    add mask to SampleMetaData and add forecast_dt to Sample so it is accessible. Can specify the loss in the default config with student-teacher views

commit e0d7346
Author: Sebastian Hickman <[email protected]>
Date:   Wed Nov 26 14:31:52 2025 +0100

    remove prints, pdb

commit c27156c
Author: Sebastian Hickman <[email protected]>
Date:   Wed Nov 26 12:35:03 2025 +0100

    add SampleMetaData integration and functionality, and update masker to use SampleMetadata. Pass through source_cell_lens and target_coords_idx to student_teacher_batch in iter, and hence pass through to trainer. source_cell_lens and target_coords_idx are now part of Sample, which is itself the components of ModelBatch. To tidy

commit 4f8f62b
Author: Sebastian Hickman <[email protected]>
Date:   Tue Nov 25 18:56:56 2025 +0100

    instructions for sophie

commit fa24fc1
Author: Sebastian Hickman <[email protected]>
Date:   Tue Nov 25 16:36:52 2025 +0100

    very hacky first pass of full masking_strategy_config for the student and teacher views. Much to fix up

commit b193a50
Author: Sebastian Hickman <[email protected]>
Date:   Mon Nov 24 17:13:37 2025 +0100

    updated configs so code runs. Note default config to be overhauled still

commit af9a3c1
Merge: 2905cb0 b452bd2
Author: Sebastian Hickman <[email protected]>
Date:   Mon Nov 24 16:37:55 2025 +0100

    merge with develop, include trainer idx_inv_rt, merged default_config, rm tokenizer_forecast

commit 2905cb0
Author: Sebastian Hickman <[email protected]>
Date:   Sat Nov 22 13:59:37 2025 +0000

    fix masking for NPP-ATMS by correctly selecting final timestep mask and aligning between source and target. working for num_input_steps = 1, broken for > 1, compute_offsets_scatter_embed not working

commit b9a60f3
Author: Sebastian Hickman <[email protected]>
Date:   Fri Nov 21 18:38:40 2025 +0000

    tidy up, remove unused arguments, types

commit ece1dd0
Author: Sebastian Hickman <[email protected]>
Date:   Fri Nov 21 16:22:27 2025 +0000

    move build_views_for_stream into masker

commit 1a418bf
Author: Sebastian Hickman <[email protected]>
Date:   Fri Nov 21 12:54:33 2025 +0000

    add max_num_samples functionality to tokenizer_masking and pass through in multi_stream_data_sampler. coords_per_cell is a bit nasty

commit 91c3d7a
Author: Sebastian Hickman <[email protected]>
Date:   Fri Nov 21 12:53:31 2025 +0000

    add max_num_targets to era5

commit 647e4b2
Author: Sebastian Hickman <[email protected]>
Date:   Thu Nov 20 18:31:45 2025 +0000

    multiple idxs for each teacher, need to confirm for not student case, and updated ModelBatch for this

commit 1806ae5
Author: Sebastian Hickman <[email protected]>
Date:   Thu Nov 20 16:28:30 2025 +0000

    tidy up, remove unused build_stream_views in tokenizer_masking

commit 9b702c5
Author: Christian Lessig <[email protected]>
Date:   Thu Nov 20 14:34:34 2025 +0100

    Re-enabling inversion of targert ordering.

commit 87ad45f
Author: Sebastian Hickman <[email protected]>
Date:   Thu Nov 20 13:10:34 2025 +0000

    add teacher num_views parameter to config

commit b34b6da
Author: Sebastian Hickman <[email protected]>
Date:   Thu Nov 20 13:09:19 2025 +0000

    collect num_source_samples and num_target_samples, add loop over teacher masks hence allowing multiple teacher views, and add source_target_idx to keep track of which student belongs to which teacher

commit b2be982
Author: Sebastian Hickman <[email protected]>
Date:   Thu Nov 20 13:07:47 2025 +0000

    fix typo in ModelBatch

commit d18cf86
Author: Christian Lessig <[email protected]>
Date:   Thu Nov 20 08:26:40 2025 +0100

    Added todo

commit e8ccb8d
Author: Christian Lessig <[email protected]>
Date:   Thu Nov 20 08:22:26 2025 +0100

    Added required reflexivity between source and target samples to Batch

commit 5d5e999
Author: Christian Lessig <[email protected]>
Date:   Thu Nov 20 08:21:31 2025 +0100

    Linting problems but removed unused ViewMetaData dependence

commit 3bca490
Author: Christian Lessig <[email protected]>
Date:   Thu Nov 20 08:21:13 2025 +0100

    linting

commit 6a96065
Author: Christian Lessig <[email protected]>
Date:   Thu Nov 20 08:20:42 2025 +0100

    Linting

commit c1d32fb
Author: Christian Lessig <[email protected]>
Date:   Thu Nov 20 08:20:21 2025 +0100

    linting

commit 1b1654c
Author: Christian Lessig <[email protected]>
Date:   Wed Nov 19 22:32:05 2025 +0100

    Added basic support for use of ModelBatch class to define rough structure and interface.

commit 848880b
Author: Christian Lessig <[email protected]>
Date:   Wed Nov 19 20:06:41 2025 +0100

    Renaming and minor clean up.

commit 6d685c0
Author: Christian Lessig <[email protected]>
Date:   Wed Nov 19 19:57:46 2025 +0100

    Moved _get_student_teacher_masks() so that masks are generated for all streams first.

commit ed26c02
Author: Christian Lessig <[email protected]>
Date:   Wed Nov 19 19:57:23 2025 +0100

    Changes to have spoofing on a per data reader sample

commit 9fe94f5
Author: Christian Lessig <[email protected]>
Date:   Wed Nov 19 19:30:48 2025 +0100

    Changes necessary for spoofing flag per IOReaderData

commit 4613f7a
Author: Christian Lessig <[email protected]>
Date:   Wed Nov 19 17:58:10 2025 +0100

    Cleaned up parametrization

commit 1235aab
Author: Christian Lessig <[email protected]>
Date:   Wed Nov 19 17:47:40 2025 +0100

    More refactoring. Code working again.

commit 1e70f5c
Author: Christian Lessig <[email protected]>
Date:   Wed Nov 19 17:09:20 2025 +0100

    More refactoring and cleanup

commit 46147d4
Author: Christian Lessig <[email protected]>
Date:   Wed Nov 19 17:01:29 2025 +0100

    More refactoring

commit 81cf929
Author: Christian Lessig <[email protected]>
Date:   Wed Nov 19 15:58:57 2025 +0100

    Changes for better student teacher structure

commit dfc03f2
Merge: a824bfc 31dc658
Author: Christian Lessig <[email protected]>
Date:   Wed Nov 19 15:58:37 2025 +0100

    Merge branch 'shmh40/dev/1270-idx-global-local' of github.com:ecmwf/WeatherGenerator into shmh40/dev/1270-idx-global-local

commit a824bfc
Author: Christian Lessig <[email protected]>
Date:   Wed Nov 19 12:23:47 2025 +0100

    Not working draft for restructuring

commit 31dc658
Author: Sebastian Hickman <[email protected]>
Date:   Wed Nov 19 11:04:29 2025 +0000

    created function for _get_student_teacher_sample_data which returns the streams_data of the teacher and multiple streams_datas for the student views.

commit 2536cec
Author: Sebastian Hickman <[email protected]>
Date:   Tue Nov 18 17:40:26 2025 +0000

    correct imports with new batch.py

commit b3dfa2f
Merge: 11ad4e6 c1580c4
Author: Sebastian Hickman <[email protected]>
Date:   Tue Nov 18 17:36:15 2025 +0000

    merge changes

commit 11ad4e6
Author: Sebastian Hickman <[email protected]>
Date:   Tue Nov 18 17:34:19 2025 +0000

    basic if statement to yield the student and teacher views

commit 36ea287
Author: Sebastian Hickman <[email protected]>
Date:   Tue Nov 18 17:33:53 2025 +0000

    slight restructure of ViewMetadata

commit 66cf9cd
Author: Sebastian Hickman <[email protected]>
Date:   Tue Nov 18 17:33:08 2025 +0000

    added stream id to era5 config

commit 3c26ddc
Author: Sebastian Hickman <[email protected]>
Date:   Tue Nov 18 17:32:00 2025 +0000

    updated default config training_config to allow student-teacher

commit c1580c4
Author: Christian Lessig <[email protected]>
Date:   Tue Nov 18 16:30:44 2025 +0100

    Renaming

commit 85fa139
Author: Christian Lessig <[email protected]>
Date:   Tue Nov 18 16:28:46 2025 +0100

    Comments

commit dd6f85a
Author: Christian Lessig <[email protected]>
Date:   Tue Nov 18 15:30:22 2025 +0100

    Added mode and refactored get_sample_data into separate function.

commit 668912d
Author: Christian Lessig <[email protected]>
Date:   Tue Nov 18 13:47:40 2025 +0100

    Partially enabled correct handling of multiple input steps.

commit c3b5c3b
Author: Christian Lessig <[email protected]>
Date:   Tue Nov 18 12:02:17 2025 +0100

    Added basic support for multi-step sources.

commit ab9eecc
Merge: a934f97 c733280
Author: Christian Lessig <[email protected]>
Date:   Tue Nov 18 10:00:37 2025 +0100

    Merge branch 'shmh40/dev/1270-idx-global-local' of github.com:ecmwf/WeatherGenerator into shmh40/dev/1270-idx-global-local

commit a934f97
Author: Christian Lessig <[email protected]>
Date:   Tue Nov 18 09:58:19 2025 +0100

    NOT WORKING: updating class to handle multiple input steps and improving overall structure

commit c733280
Author: Sebastian Hickman <[email protected]>
Date:   Mon Nov 17 18:32:40 2025 +0000

    change view_metadata to dict in ModelInput

commit 7d5c300
Author: Sebastian Hickman <[email protected]>
Date:   Mon Nov 17 18:22:33 2025 +0000

    draft of training_config in default_config

commit 047b299
Author: Sebastian Hickman <[email protected]>
Date:   Mon Nov 17 18:19:56 2025 +0000

    draft changes to allow global local view generation in masker and tokenizer_masking. generate the mask, otherwise using batchify_source and batchify_target as before, with the capacity to remember what mask we have now when it comes to generating the targets. Update to inputs_metadata structure but not put in to practice

commit 761e263
Author: Sebastian Hickman <[email protected]>
Date:   Mon Nov 17 18:13:57 2025 +0000

    update ViewMetadata spec

commit 7f3c718
Author: Christian Lessig <[email protected]>
Date:   Mon Nov 17 14:51:01 2025 +0100

    Updating config to working version

commit ae5a2e6
Author: Sebastian Hickman <[email protected]>
Date:   Mon Nov 17 11:54:18 2025 +0000

    added file with ModelBatch and SampleMetadata dataclasses

commit debbb8f
Author: Christian Lessig <[email protected]>
Date:   Mon Nov 17 12:28:07 2025 +0100

    Changes to  prepare_logging to apply index inversion

commit 5d127bf
Author: Christian Lessig <[email protected]>
Date:   Sun Nov 16 17:01:08 2025 +0100

    Inversion of target output ordering to match input one in forcast mode. Unclear how to deal with it with MTM

commit 8fa544d
Author: Christian Lessig <[email protected]>
Date:   Fri Nov 14 20:43:57 2025 +0100

    Removed unused parameters

commit ce6c735
Author: Christian Lessig <[email protected]>
Date:   Fri Nov 14 16:56:51 2025 +0100

    Removing centroids options for embedding that was unused and should not be used.

commit 0634105
Author: Christian Lessig <[email protected]>
Date:   Fri Nov 14 09:59:13 2025 +0100

    Enabled support for forecast. Cleaned up some bits and pieces.

commit ec38123
Author: Christian Lessig <[email protected]>
Date:   Fri Nov 14 08:27:21 2025 +0100

    Fixed remaining problems that occured for NPP-ATMS and SYNOP.
    TODO:
    - Forecast still needs to be adapted
    - Some more cleanup of variable naming, return values etc

commit db6f285
Author: Christian Lessig <[email protected]>
Date:   Thu Nov 13 23:26:31 2025 +0100

    Fixed linting

commit 9229e48
Author: Christian Lessig <[email protected]>
Date:   Thu Nov 13 23:19:21 2025 +0100

    Minor cleanup

commit a581405
Author: Christian Lessig <[email protected]>
Date:   Thu Nov 13 23:17:29 2025 +0100

    Working version for ERA5, NPP-ATMS. Problems with SYNOP with empty cell handling

commit e4a9cc0
Author: Christian Lessig <[email protected]>
Date:   Thu Nov 13 18:58:28 2025 +0100

    Masking target is working in principle but errors when feeding data to the model.

commit 51f437f
Author: Christian Lessig <[email protected]>
Date:   Thu Nov 13 07:04:23 2025 +0100

    NOT WORKING: Finished src, target still to be done.

commit 81bd6eb
Author: Christian Lessig <[email protected]>
Date:   Wed Nov 12 09:38:53 2025 +0100

    NOT WORKING: initial draft for index-based masking. Implemented for random and healpix masking. Open issues with _coords_local, centroids and probably other things.
@moritzhauschulz moritzhauschulz changed the base branch from shmh40/dev/1270-idx-global-local to develop January 7, 2026 09:08
@moritzhauschulz
Contributor Author

Fixes #1379

Development

Successfully merging this pull request may close these issues.

Implement hot-fix for single sample training

4 participants