-
Notifications
You must be signed in to change notification settings - Fork 1.1k
add Kimber soiling model #860
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
with this and #850 do we want to:
|
|
Oops! Sorry, I can move this to Also I was thinking of waiting on this until #850 and #859 are merged, just to make it easier to sync. Thanks! |
Signed-off-by: Mark Mikofski <[email protected]>
* also add a gallery example draft
|
@kanderso-nrel I get this error from my example: But it works on my machine, do I have the wrong matplotlib version? How do you plot timestamps? I guess I can plot from pandas instead of using |
* use pd.to_datetime for plt * start to add manwash test using datetime
|
I installed the same matplotlib and numpy versions as RTD is using and can recreate the error with pandas 1.0.1, but when using pandas 0.25.1 (instead of 1.0.1 like RTD) I get this warning: Switching to 1.0.1 and implementing that matplotlib registration line appears to fix it. I wonder why they stopped registering it by default, that's a little annoying. |
* add test data for manual wash * try to fix example plot using to_pydatetime
|
All tests are passing, the gallery example tenders, I don't know what that patch problem is. Any reviewers? |
kandersolar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very clean code! I wonder if it would be worth trying to vectorize it? Whenever I see for date in series.index it triggers alarm bells, although I suppose since it's over a daily timeseries the scale is somewhat constrained.
* reuse rainfall events * combine zero soil assignments
* correct Adrianne spelling * add table of soiling rates from figure 3 from Kimber paper * fix reference indents in docstring
* fix table malformed table in losses * fix long line in docstring
|
okay, any more comments? @CameronTStark maybe add to v0.7.2? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm OK with the default values. I'd prefer different names for some of the argument, soiling_rate in particular, since the unit is not a mass or mass rate.
A minor quibble: I'd prefer that the file names for the sample data and the example start with soiling_kimber to make it easier to associate these files with the function, not difficult now but as more examples accumulate...
We should tell the user that the index returned may not be the same as the index of the input rainfall Series.
FYI sphinx-gallery defaults to only executing examples if their filename starts with I suggest changing that regex to just match everything -- if for some reason in the future we want to include examples in the gallery without executing them, we can make it more specific at that point. |
* combine panels cleaned today with grace-period, since it includes today
kandersolar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Three last suggestions from me.
* also fix missing "for" in note Co-Authored-By: Kevin Anderson <[email protected]>
* remove extra whitespace
|
@kanderso-nrel you are right, vectorizing this code makes it run about 20 to 30 times faster on my machine (from 130-140ms to 4-6ms, depending on washes), but it requires the input rainfall time-series to be monotonic - which means that it needs some special code to fix TMY3, just like #889 , but this doesn't add much overhead, about 0.5-1 ms, and I don't think it would effect other non-TMY3 time-series. There are some pros/cons to the vectorized version.
Here are some plots to compare against, orange is the vectorized version, blue is not. There are minor differences due to the differences above, but I think they might be improvements? I'd have to save new test data. The code is copied from the HSU model, which was already vectorized: Detailsdef soiling_kimber_vec(rainfall, cleaning_threshold=6, soiling_loss_rate=0.0015,
grace_period=14, max_soiling=0.3, manual_wash_dates=None,
initial_soiling=0):
# convert grace_period to timedelta
grace_period = datetime.timedelta(days=grace_period)
# protect against TMY3 by rolling back and then adding 1
rain_index = np.roll(rainfall.index.values, 1) + np.timedelta64(1, 'h')
periods_per_day = (rain_index[1] - rain_index[0]) / np.timedelta64(24, 'h')
rain_tz = rainfall.index.tz
rain_index = pd.DatetimeIndex(
rain_index).tz_localize('UTC').tz_convert(rain_tz)
rainfall = pd.Series(rainfall.values, index=rain_index, name='rainfall')
daily_rainfall = rainfall.rolling('D').sum() # closed right
norain = np.ones_like(rainfall.values) * soiling_loss_rate * periods_per_day
norain[0] = initial_soiling
norain = np.cumsum(norain)
rain_events = daily_rainfall > cleaning_threshold
grace_windows = rain_events.rolling(grace_period).sum() > 0
soiling = pd.Series(float('NaN'), index=rainfall.index)
soiling.iloc[0] = initial_soiling
soiling[grace_windows] = norain[grace_windows]
# manual wash dates
if manual_wash_dates is not None:
manual_wash_dates = pd.DatetimeIndex(manual_wash_dates).tz_localize(rain_tz)
norain = pd.Series(norain, index=rain_index, name='soiling')
soiling[manual_wash_dates] = norain[manual_wash_dates]
return norain - soiling.ffill() |
|
I guess it goes without saying that the faster, more flexible version is everyone's preference? |
|
Todo
|
As long as it's consistent with the publication, yes for me. |
Yes, they are consistent, the only difference is that when Adie did it, I think she was mainly looking at daily rainfall, so that was the default accumulation period and time-step interval, which meant that everything begins and ends at midnnight. With the vectorized version, we can interpret the accumulation period more broadly, as the total rainfall over the last 24-hours. So it could be 8AM to 8AM instead of always midnight to midnight. The difference occurs if there's rain overnight. For example, if the threshold is 6mm, what if there's 3mm from 8pm to midnight, and then another 3mm from midnight to 6am? In the daily discreet version (unvectorized) this would not clean the panels because the accumulated rainfall 3mm on day-1 and 3mm on day-2 was never greater than the threshold. But in the rolling (vectorized) version, this would clean the panels. Personally I think this is better, and now that hourly rainfall is more widely available, I don't see why we wouldn't prefer it. Here's a copy of the paper: Kimber_2006_Full_Paper_r3.pdf |
The rolling window approach is definitely better than the midnight to midnight approach. Because rainfall threshold is a depth rather than a rate, what about an argument |
an improvement on the model for sure, I like it, thanks! |
* use a rolling window and rain_accum_period to get more accurate rain coverage overnite instead of only discrete days from midnite to midnite * add rain_accum_period to allow deviation from kimber model, recommends only 24h but why not be flexible? * add istmy=False to fix last hour of tmy to be monotonically increasing * add another test for initial condition * fix example input arg cleaning_threshold
|
Ok @CameronTStark, it's ready, any last comments? |
CameronTStark
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks good to go other than the one NIT. I'll merge either way.
| parse_dates=True, index_col='timestamp') | ||
|
|
||
|
|
||
| @needs_pandas_0_22 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't get to try this but I assume these @needs_pandas_0_22 are necessary. If so I'm even more inclined to bump pandas version to unify the output between versions. No change necessary, just thinking out loud mostly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pandas DataFrame.rolling() kwarg closed isn't available for py35-min on Travis but everything else passes, py35-min uses pandas-0.18.1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that the soiling_kimber function does not work at all with pandas 0.18. Correct? If so, that's a different kind of breakage than we've traditionally allowed across dependency versions. The function documentation doesn't give any hint as to why it's going to fail in that scenario either. These @needs* decorators have traditionally been used for only optional dependencies or quirks between versions (like pandas sum of nans = nan or = 0).
I thought we were in agreement that we'd bump the pandas spec for 0.8, not 0.7.2. This suggests that we need to do it for 0.7.2. Not the end of the world, but not ideal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, I didn't realize that. The existing soiling_hsu also used this closed kwarg in DataFrame.rolling()
Line 69 in 64a6498
| accum_rain = rainfall.rolling(rain_accum_period, closed='right').sum() |
but it's pandas dependency was masked in the test by requires_scipy
pvlib-python/pvlib/tests/test_losses.py
Lines 83 to 84 in 64a6498
| @requires_scipy | |
| def test_soiling_hsu(rainfall_input, expected_output_2): |
One idea is just dropping this closed kwarg from DataFrame.rolling() in both soiling functions, which works fine in pandas-0.18.1, the default behavior is 'both' which isn't the end of the world, but probably not strictly faithful to the references, which imply exclusion of starting timestamp.
Any other ideas @cwhanse? @kanderso-nrel?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These @needs* decorators have traditionally been used for only optional dependencies or quirks between versions (like pandas sum of nans = nan or = 0).
Thanks for clarifying this further for us @wholmgren.
I thought we were in agreement that we'd bump the pandas spec for 0.8, not 0.7.2. This suggests that we need to do it for 0.7.2. Not the end of the world, but not ideal.
We can revert this merge and chamber it for v0.8. It'll give time to address some of the issues raised brought up and stay consistent with our plan to bump pandas at v0.8. What do you think @wholmgren?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to revert. @mikofski pointed out that the same issue exists in the already released soiling_hsu. We could add a note to the doc strings, or a try/except with a better error message around the rolling calls, or do nothing. I'm ok with any of those.
We should just be more careful with the requires and needs decorators. I know I've been lazy about them in the past.
This is all also an argument for 1) adding scipy to the base requirements so the test suite doesn't have gotchas like with the hsu function (issue #483) and 2) being more aggressive with bumping dependency specs so that contributors don't have to spend as much time supporting older libraries (issue #828).
|
Thanks! Sorry I didn't have a chance to fix the nit. Maybe we can address it later. |


docs/sphinx/source/api.rstfor API changes.docs/sphinx/source/whatsnewfor all changes. Includes link to the GitHub Issue with:issue:`num`or this Pull Request with:pull:`num`. Includes contributor name and/or GitHub username (link with:ghuser:`user`).generated with the following snippet: