Skip to content

Models/evaluate year#124

Open
tommylees112 wants to merge 15 commits intomasterfrom
models/evaluate_year
Open

Models/evaluate year#124
tommylees112 wants to merge 15 commits intomasterfrom
models/evaluate_year

Conversation

@tommylees112
Copy link
Copy Markdown
Contributor

We are trying here to use the TRAINED model to evaluate specific months.

The major NEW functions are:
src/models/base.py: evaluate_train_timesteps

And changes to:
src/models/neural_networks/base.py: predict

  • adding the ability to choose specific year/months to run the predictions

NOTE: need to update all of the models (non-neural networks too)

@gabrieltseng
Copy link
Copy Markdown
Contributor

@tommylees112 , do you want me to take a look at this now, or once the tests are passing?

@tommylees112
Copy link
Copy Markdown
Contributor Author

when tests are passing please buddy

@tommylees112
Copy link
Copy Markdown
Contributor Author

This currently isn't working for me even though it is passing the test. will need to have a play!

@tommylees112 tommylees112 added model validation modelling wip Work in progress - not ready for merging labels Sep 6, 2019
Copy link
Copy Markdown
Contributor

@gabrieltseng gabrieltseng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Tommy - I know things aren't working yet, but took a quick look and have some initial comments.

I'll also take a look once everything is working!

Whether to load testing or training data. This also affects the way the data is
returned; for train, it is a concatenated array, but for test it is a dict with dates
so that the netcdf file can easily be reconstructed
>>>>>>>>>CHANGE
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: to remove

test_month: Optional[int] = None) -> List[Path]:

data_folder = data_path / f'features/{experiment}/{mode}'
if (test_year is None) and (test_month is None):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API is confusing - there is a year and month associated with the test data too (i.e. if I pass test_year=2018 in our current setup, it wouldn't work).

Perhaps something that makes sense would be a function which determines which years are in the test folder, and which years are in the train folder (which shouldn't be too hard, since that information is in the folder names).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, if test_year is passed, but test_month is not (which would be useful to predict all of 2018), things wouldn't work.

Maybe limiting the granularity to years? And allowing a range of years to be passed?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I see now that this is handled in the base training class, but since the DataLoader is something we would want to expose to users, I think it makes sense to clean up here)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we chat through how to improve this API ? So far I have the ideas:

  1. Have a function to check whether the file exists in the train datafolder
  2. Expose this to the user from the DATALOADER not the base neural_network class
  3. Require making predictions of whole years (don't expose months to users)

Did i miss any?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for 3., would it be possible to expose both, but if test_month is None, then do the whole year?

def predict(self) -> Tuple[Dict[str, Dict[str, np.ndarray]], Dict[str, np.ndarray]]:

test_arrays_loader = DataLoader(data_path=self.data_path, batch_file_size=self.batch_size,
def predict(self, test_year: Optional[int] = None,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above regarding the confusing interface

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this functionality needs to be added for the regression and parsimonious model

else:
preds_xr.to_netcdf(self.model_dir / f'preds_{key}.nc')

def evaluate_train_timesteps(self, years: List[int],
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments in the DataLoader class

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

model validation modelling wip Work in progress - not ready for merging

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants