
Multi-domain: A dynamic way of training across domains and resolutions #919

@einrone

Is your feature request related to a problem? Please describe.

Issue:

Currently, anemoi-core does not support dynamically switching between different global or regional datasets, or experimenting with different resolutions, within the same training run without introducing additional encoders and decoders.

Each training run assumes a fixed spatial configuration (same domain and resolution), which makes it difficult to train a model on mixed datasets where the samples in each training iteration come from different regions and resolutions.

Without multi-domain:

  1. Fine-tuning to a new domain requires additional transfer learning steps.
  2. Transfer learning can cause catastrophic forgetting, where features learned in previous steps are lost.
  3. When training with multiple encoders, the similarities between datasets are not used optimally.
  4. Cross-domain or cross-resolution generalization is harder to achieve, as the model does not continuously see samples from different regions (or global samples at different resolutions).
  5. It is harder to get a good prediction on an unseen domain, i.e. a domain that is not contained in the training dataset (the model has not generalized).

Describe the solution you'd like

Quick summary of the solution:

Multi-domain is a concept that allows the user to train a model on different datasets simultaneously without introducing additional model components, e.g. encoders and decoders. Compared to the multi-dataset PR, the model architecture is a single encoder-processor-decoder structure that supports different backbones. This is done by removing all graph and grid dependencies from the model architecture, i.e. the model is not initialized with a graph at creation. It has been tested with both deterministic and ensemble models (implemented in a fork on the DestinE repository). Below is an illustration of how the model trains (see GIF):

During training, the dataloader holds several labeled anemoi dataset objects. For each training iteration, a sample is fetched at random, and the dataloader returns the data (shape = [time, Ensemble, Variable, Grid]) together with its label, which ties the data to its corresponding graph. An image of the dataloader output is provided:

Image
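As a rough illustration of the dataloader behaviour described above, here is a minimal pure-Python sketch. It is not the code from the multi-domain fork: `MultiDomainSampler`, `LabeledBatch`, the labels `global_coarse` and `regional_fine`, and the dictionary-based "graphs" are all hypothetical stand-ins, and picking a dataset uniformly at random is an assumption about how samples are drawn.

```python
import random
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical stand-in for a sample: nested lists indexed as
# [time][ensemble][variable][grid]. Real anemoi datasets return arrays.
Sample = List[List[List[List[float]]]]


@dataclass
class LabeledBatch:
    data: Sample
    label: str  # domain label, used to look up the matching graph


def make_sample(n_time: int, n_ens: int, n_var: int, n_grid: int) -> Sample:
    """Build a zero-filled sample with shape [time, ensemble, variable, grid]."""
    return [
        [[[0.0] * n_grid for _ in range(n_var)] for _ in range(n_ens)]
        for _ in range(n_time)
    ]


def shape(sample: Sample) -> tuple:
    """Return (time, ensemble, variable, grid) for a nested-list sample."""
    return (len(sample), len(sample[0]), len(sample[0][0]), len(sample[0][0][0]))


class MultiDomainSampler:
    """Holds several labeled datasets and yields (data, label) pairs.

    Assumption: a dataset is chosen uniformly at random at every iteration.
    """

    def __init__(self, datasets: Dict[str, List[Sample]], seed: int = 0):
        self.datasets = datasets
        self.labels = sorted(datasets)
        self.rng = random.Random(seed)

    def __iter__(self) -> "MultiDomainSampler":
        return self

    def __next__(self) -> LabeledBatch:
        label = self.rng.choice(self.labels)
        data = self.rng.choice(self.datasets[label])
        return LabeledBatch(data=data, label=label)


# Two hypothetical domains with different grid sizes.
datasets = {
    "global_coarse": [make_sample(2, 1, 3, 8)],
    "regional_fine": [make_sample(2, 1, 3, 20)],
}
# One graph per label; here only the node count is stored.
graphs = {"global_coarse": {"num_nodes": 8}, "regional_fine": {"num_nodes": 20}}

sampler = MultiDomainSampler(datasets, seed=0)
for _ in range(4):
    batch = next(sampler)
    graph = graphs[batch.label]  # the label ties the data to its graph
    assert shape(batch.data)[-1] == graph["num_nodes"]
```

The point of the sketch is only that the label travels with the data, so the training step can select the right graph without the model knowing about it in advance.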

Additional details can be provided in a document or in a meeting.

What needs to be changed/modified/implemented:

Each of these tasks can be a separate PR, or several can be combined into one. This is of course open to discussion.

anemoi-models:

  • layers/mapper.py (remove all grid dependencies)
  • layers/block.py (remove all grid dependencies)
  • layers/processor.py (remove all grid dependencies)
  • interface/__init__.py (add support for multi-domain)
  • models/multi-domain.py (introduce multi-domain model inheriting enc-proc-dec or ens-enc-proc-dec)
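
One way to read "remove all grid dependencies" in the list above is that the layers keep only parameters that are independent of grid size, and receive the graph as an argument at call time instead of at construction. The toy layer below sketches that idea in pure Python; `GraphAgnosticLayer` and `ring` are hypothetical names, and the mean-aggregation update is a simplification chosen for illustration, not the mapper or processor used in anemoi-models.

```python
from typing import List, Tuple

Edges = List[Tuple[int, int]]  # (source node, target node)


class GraphAgnosticLayer:
    """Toy layer whose parameters do not depend on the grid or graph."""

    def __init__(self, w_self: float, w_neigh: float):
        # Only scalar weights are stored; no graph is passed at creation.
        self.w_self = w_self
        self.w_neigh = w_neigh

    def __call__(self, x: List[float], edges: Edges) -> List[float]:
        # The graph is supplied per call, so any grid size is accepted.
        agg = [0.0] * len(x)
        deg = [0] * len(x)
        for src, dst in edges:
            agg[dst] += x[src]
            deg[dst] += 1
        # Combine each node's own value with the mean of its neighbours.
        return [
            self.w_self * xi + self.w_neigh * (a / d if d else 0.0)
            for xi, a, d in zip(x, agg, deg)
        ]


def ring(n: int) -> Edges:
    """Bidirectional ring graph with n nodes (illustrative only)."""
    edges: Edges = []
    for i in range(n):
        j = (i + 1) % n
        edges.append((i, j))
        edges.append((j, i))
    return edges


layer = GraphAgnosticLayer(w_self=0.5, w_neigh=0.5)
out_small = layer([1.0] * 4, ring(4))  # same layer, 4-node graph
out_large = layer([1.0] * 6, ring(6))  # same layer, 6-node graph
```

The same `layer` instance is applied to two graphs of different size, which is the property a single encoder-processor-decoder needs in order to train across domains and resolutions.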

anemoi-training:

  • training/train.py (accommodate multi-domain functionality, multiple dataset and graph objects)
  • need to modify metadata (supporting arrays, indexes, etc.)
  • datamodule and dataloader (either separate modules, or multi-domain functionality built into the existing modules and dataloaders)
  • train/task/base.py (accommodate multi-domain or create a separate class)
  • train/task/ensforcaster.py (accommodate multi-domain or create a separate class)
  • train/task/forcaster.py (accommodate multi-domain or create a separate class)

anemoi-graphs:

  • no change required

Regarding loss functions and scalers: in the current refactor, each domain has its own loss and scaler objects, which avoids modifying all existing scalers and losses. For an upcoming PR that adapts losses and scalers to multi-domain, I am open to suggestions on how to implement this in a way that fits both multi-domain and other use cases.
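
To make the "one loss and scaler object per domain" arrangement concrete, here is a minimal sketch in pure Python. It assumes the scaler is a per-node weight vector and the loss is a weighted mean squared error; `make_weighted_mse`, `compute_loss`, and the domain labels are hypothetical and do not correspond to the anemoi-training loss or scaler classes.

```python
from typing import Callable, Dict, List

Loss = Callable[[List[float], List[float]], float]


def make_weighted_mse(node_weights: List[float]) -> Loss:
    """Build a weighted MSE bound to one domain's per-node scaler."""
    total = sum(node_weights)

    def loss(pred: List[float], target: List[float]) -> float:
        if len(pred) != len(node_weights) or len(target) != len(node_weights):
            raise ValueError("prediction/target size does not match this domain")
        weighted = sum(
            w * (p - t) ** 2 for w, p, t in zip(node_weights, pred, target)
        )
        return weighted / total

    return loss


# One loss object per domain label; the weights are made-up values.
losses: Dict[str, Loss] = {
    "global_coarse": make_weighted_mse([1.0, 1.0, 2.0, 2.0]),
    "regional_fine": make_weighted_mse([1.0] * 6),
}


def compute_loss(label: str, pred: List[float], target: List[float]) -> float:
    """Dispatch to the loss that belongs to the sample's domain label."""
    return losses[label](pred, target)
```

Dispatching on the label keeps each scaler sized to its own grid, at the cost of holding one loss object per domain; a shared loss that takes the scaler as an argument would be one alternative to discuss.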

Describe alternatives you've considered

No response

Additional context

For those who are interested in running test cases or want to take a look at the multi-domain code base, please see the link below:
multi-domain-repo

Config and guidance can be provided; just let me or Sophie know.

Organisation

Norwegian Meteorological Institute
KNMI (Royal Netherlands Meteorological Institute)
