Separate data preprocessing from training/inference

It should be possible to run the "data-prep" task separately from the "model-training" task. On well-monitored, resource-constrained HPC systems, holding a GPU for a long time just to prepare batches is wasteful (and our over-eager admins have implemented GPU monitoring and automated emails). I know the all-in-one solution of MEDS-DEV is attractive, but I think we can have both.
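To make the request concrete, here is a rough sketch of the split I have in mind. Everything in it is hypothetical: `prepare_batches`, `iter_batches`, `train`, and the JSON-files-on-disk layout are placeholders for illustration only, not existing MEDS-DEV functions or formats. The point is just that the first stage needs no GPU and hands off to the second through files.

```python
import json
from pathlib import Path


def prepare_batches(records, out_dir, batch_size):
    """Hypothetical CPU-only stage: group records into batches and write each to disk."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for start in range(0, len(records), batch_size):
        path = out_dir / f"batch_{start // batch_size:05d}.json"
        path.write_text(json.dumps(records[start:start + batch_size]))
        paths.append(path)
    return paths


def iter_batches(batch_dir):
    """Hypothetical loader for the GPU stage: stream prepared batches back in order."""
    for path in sorted(Path(batch_dir).glob("batch_*.json")):
        yield json.loads(path.read_text())


def train(batch_dir):
    """Placeholder for the GPU job; here it only counts what it reads."""
    n_batches = 0
    n_records = 0
    for batch in iter_batches(batch_dir):
        n_batches += 1
        n_records += len(batch)
    return n_batches, n_records
```

With something like this, `prepare_batches` could run as a CPU-only job and `train` as a separate GPU job pointed at the same directory, while an all-in-one entry point could simply call the two in sequence.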