Skip to content

Conversation

@mikofski
Copy link
Member

@mikofski mikofski commented Jan 15, 2020

  • Closes Kimber Dynamic Soiling model #669
  • I am familiar with the contributing guidelines
  • Tests added
  • Updates entries to docs/sphinx/source/api.rst for API changes.
  • Adds description and name entries in the appropriate "what's new" file in docs/sphinx/source/whatsnew for all changes. Includes link to the GitHub Issue with :issue:`num` or this Pull Request with :pull:`num`. Includes contributor name and/or GitHub username (link with :ghuser:`user`).
  • New code is fully documented. Includes numpydoc compliant docstrings, examples, and comments where necessary.
  • Pull request is nearly complete and ready for detailed review.
  • Maintainer: Appropriate GitHub Labels and Milestone are assigned to the Pull Request and linked Issue.

kimber

generated with the following snippet:

from pvlib.losses import soiling_kimber
import pvlib
x = pvlib.iotools.read_tmy3('path/to/722287TYA.CSV', coerce_year=1990)
z = soiling_kimber(x[0].Lprecipdepth)
z.plot()
import datetime
manwash = [datetime.date(1990,11,1), ]
z = soiling_kimber(x[0].Lprecipdepth, manual_wash_dates=manwash)
z.plot()
from matplotlib import pyplot as plt
plt.grid()
plt.title('Kimber Soiling Model')
plt.ylabel('soiling build-up fraction')
x[1]
plt.legend([x[1]['Name'], 'manual wash Nov. 1st'])

@cwhanse
Copy link
Member

cwhanse commented Jan 15, 2020

with this and #850 do we want to:

  1. put all loss functions into losses.py (for now),
  2. put all soiling loss functions into soiling.py and anticipate N x other_loss_type.py, or
  3. create a losses folder to contain soiling.py, etc.?

@mikofski
Copy link
Member Author

Oops! Sorry, I can move this to losses.py. I say we kick this question down the road until we are forced to decide. I don't think it will make a big difference now, and hopefully, we should be able to handle the API changes later without too much stress. Is that okay?

Also I was thinking of waiting on this until #850 and #859 are merged, just to make it easier to sync. Thanks!

@mikofski
Copy link
Member Author

@kanderso-nrel I get this error from my example:

Traceback (most recent call last):
  File "/home/docs/checkouts/readthedocs.org/user_builds/pvlib-python/checkouts/860/docs/examples/plot_greensboro_kimber_soiling.py", line 48, in <module>
    soiling_no_wash.index, soiling_no_wash.values*100.0)
TypeError: float() argument must be a string or a number, not 'Timestamp'

But it works on my machine, do I have the wrong matplotlib version? How do you plot timestamps? I guess I can plot from pandas instead of using pyplot

* use pd.to_datetime for plt
* start to add manwash test using datetime
@kandersolar
Copy link
Member

I installed the same matplotlib and numpy versions as RTD is using and can recreate the error with pandas 1.0.1, but when using pandas 0.25.1 (instead of 1.0.1 like RTD) I get this warning:

/home/kevin/anaconda3/lib/python3.7/site-packages/pandas/plotting/_matplotlib/converter.py:103: 
FutureWarning: Using an implicitly registered datetime converter for a matplotlib plotting method. 
The converter was registered by pandas on import.
Future versions of pandas will require you to explicitly register matplotlib converters.

To register the converters:
	>>> from pandas.plotting import register_matplotlib_converters
	>>> register_matplotlib_converters()
  warnings.warn(msg, FutureWarning)

Switching to 1.0.1 and implementing that matplotlib registration line appears to fix it. I wonder why they stopped registering it by default, that's a little annoying.

* add test data for manual wash
* try to fix example plot using to_pydatetime
@mikofski
Copy link
Member Author

All tests are passing, the gallery example tenders, I don't know what that patch problem is. Any reviewers?

@mikofski mikofski marked this pull request as ready for review February 13, 2020 03:23
Copy link
Member

@kandersolar kandersolar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very clean code! I wonder if it would be worth trying to vectorize it? Whenever I see for date in series.index it triggers alarm bells, although I suppose since it's over a daily timeseries the scale is somewhat constrained.

* reuse rainfall events
* combine zero soil assignments
* correct Adrianne spelling
* add table of soiling rates from figure 3 from Kimber paper
* fix reference indents in docstring
* fix table malformed table in losses
* fix long line in docstring
@mikofski
Copy link
Member Author

okay, any more comments? @CameronTStark maybe add to v0.7.2?

Copy link
Member

@cwhanse cwhanse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK with the default values. I'd prefer different names for some of the argument, soiling_rate in particular, since the unit is not a mass or mass rate.

A minor quibble: I'd prefer that the file names for the sample data and the example start with soiling_kimber to make it easier to associate these files with the function, not difficult now but as more examples accumulate...

We should tell the user that the index returned may not be the same as the index of the input rainfall Series.

@kandersolar
Copy link
Member

A minor quibble: I'd prefer that the file names for the sample data and the example start with soiling_kimber to make it easier to associate these files with the function

FYI sphinx-gallery defaults to only executing examples if their filename starts with plot_. If you want to name gallery examples some other way, you'll need to configure that behavior: https://sphinx-gallery.github.io/stable/configuration.html#parsing-and-executing-examples-via-matching-patterns

I suggest changing that regex to just match everything -- if for some reason in the future we want to include examples in the gallery without executing them, we can make it more specific at that point.

* combine panels cleaned today with grace-period,
 since it includes today
Copy link
Member

@kandersolar kandersolar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Three last suggestions from me.

mikofski and others added 3 commits February 13, 2020 11:49
@mikofski
Copy link
Member Author

@kanderso-nrel you are right, vectorizing this code makes it run about 20 to 30 times faster on my machine (from 130-140ms to 4-6ms, depending on washes), but it requires the input rainfall time-series to be monotonic - which means that it needs some special code to fix TMY3, just like #889 , but this doesn't add much overhead, about 0.5-1 ms, and I don't think it would effect other non-TMY3 time-series.

There are some pros/cons to the vectorized version.

  • it can handle rainfall at any time of the day or the week by using rolling windows, so it picks up some soiling that the discreet-daily (unvectorized) version doesn't
  • also it can start building up soiling at any time of the day, or be cleaned anytime
  • and cleaning is instantaneous, not a day-long process

Here are some plots to compare against, orange is the vectorized version, blue is not. There are minor differences due to the differences above, but I think they might be improvements? I'd have to save new test data.

No Wash
kimber_vec

With Wash
kimber_vec_wash

The code is copied from the HSU model, which was already vectorized:

Details
def soiling_kimber_vec(rainfall, cleaning_threshold=6, soiling_loss_rate=0.0015,
                       grace_period=14, max_soiling=0.3, manual_wash_dates=None,
                       initial_soiling=0):

    # convert grace_period to timedelta
    grace_period = datetime.timedelta(days=grace_period)

    # protect against TMY3 by rolling back and then adding 1
    rain_index = np.roll(rainfall.index.values, 1) + np.timedelta64(1, 'h')
    periods_per_day = (rain_index[1] - rain_index[0]) / np.timedelta64(24, 'h')
    rain_tz = rainfall.index.tz
    rain_index = pd.DatetimeIndex(
        rain_index).tz_localize('UTC').tz_convert(rain_tz)

    rainfall = pd.Series(rainfall.values, index=rain_index, name='rainfall')

    daily_rainfall = rainfall.rolling('D').sum()  # closed right

    norain = np.ones_like(rainfall.values) * soiling_loss_rate * periods_per_day
    norain[0] = initial_soiling
    norain = np.cumsum(norain)

    rain_events = daily_rainfall > cleaning_threshold

    grace_windows = rain_events.rolling(grace_period).sum() > 0

    soiling = pd.Series(float('NaN'), index=rainfall.index)
    soiling.iloc[0] = initial_soiling
    soiling[grace_windows] = norain[grace_windows]

    # manual wash dates
    if manual_wash_dates is not None:
        manual_wash_dates = pd.DatetimeIndex(manual_wash_dates).tz_localize(rain_tz)
        norain = pd.Series(norain, index=rain_index, name='soiling')
        soiling[manual_wash_dates] = norain[manual_wash_dates]

    return norain - soiling.ffill()

@cwhanse cwhanse mentioned this pull request Feb 14, 2020
8 tasks
@CameronTStark CameronTStark modified the milestones: Someday, 0.7.2 Feb 14, 2020
@mikofski
Copy link
Member Author

I guess it goes without saying that the faster, more flexible version is everyone's preference?

@mikofski
Copy link
Member Author

mikofski commented Feb 14, 2020

Todo

  • fix soiling[0] is zero, not initial_condition
  • test for initial condition
  • where greater than max, use max, else soiling less than max
  • fix time roll interval use dt not 1H, or check for TMY3 input arg?

@cwhanse
Copy link
Member

cwhanse commented Feb 14, 2020

I guess it goes without saying that the faster, more flexible version is everyone's preference?

As long as it's consistent with the publication, yes for me.

@mikofski
Copy link
Member Author

mikofski commented Feb 14, 2020

As long as it's consistent with the publication, yes for me.

Yes, they are consistent, the only difference is that when Adie did it, I think she was mainly looking at daily rainfall, so that was the default accumulation period and time-step interval, which meant that everything begins and ends at midnnight.

With the vectorized version, we can interpret the accumulation period more broadly, as the total rainfall over the last 24-hours. So it could be 8AM to 8AM instead of always midnight to midnight.

The difference occurs if there's rain overnight. For example, if the threshold is 6mm, what if there's 3mm from 8pm to midnight, and then another 3mm from midnight to 6am? In the daily discreet version (unvectorized) this would not clean the panels because the accumulated rainfall 3mm on day-1 and 3mm on day-2 was never greater than the threshold. But in the rolling (vectorized) version, this would clean the panels.

Personally I think this is better, and now that hourly rainfall is more widely available, I don't see why we wouldn't prefer it.

Here's a copy of the paper: Kimber_2006_Full_Paper_r3.pdf

@cwhanse
Copy link
Member

cwhanse commented Feb 14, 2020

With the vectorized version, we can interpret the accumulation period more broadly, as the total rainfall over the last 24-hours. So it could be 8AM to 8AM instead of always midnight to midnight.

The rolling window approach is definitely better than the midnight to midnight approach. Because rainfall threshold is a depth rather than a rate, what about an argument accumulation_period that defines the time window over which the threshold applies? If you agree, maybe default to 24 hours to be faithful to the reference.

@mikofski
Copy link
Member Author

what about an argument accumulation_period that defines the time window over which the threshold applies? If you agree, maybe default to 24 hours to be faithful to the reference.

an improvement on the model for sure, I like it, thanks!

* use a rolling window and rain_accum_period to get more accurate rain coverage
 overnite instead of only discrete days from midnite to midnite
* add rain_accum_period to allow deviation from kimber model, recommends
only 24h but why not be flexible?
* add istmy=False to fix last hour of tmy to be monotonically increasing
* add another test for initial condition
* fix example input arg cleaning_threshold
@mikofski
Copy link
Member Author

Ok @CameronTStark, it's ready, any last comments?

Copy link
Contributor

@CameronTStark CameronTStark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good to go other than the one NIT. I'll merge either way.

parse_dates=True, index_col='timestamp')


@needs_pandas_0_22
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't get to try this but I assume these @needs_pandas_0_22 are necessary. If so I'm even more inclined to bump pandas version to unify the output between versions. No change necessary, just thinking out loud mostly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pandas DataFrame.rolling() kwarg closed isn't available for py35-min on Travis but everything else passes, py35-min uses pandas-0.18.1

Copy link
Member

@wholmgren wholmgren Feb 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that the soiling_kimber function does not work at all with pandas 0.18. Correct? If so, that's a different kind of breakage than we've traditionally allowed across dependency versions. The function documentation doesn't give any hint as to why it's going to fail in that scenario either. These @needs* decorators have traditionally been used for only optional dependencies or quirks between versions (like pandas sum of nans = nan or = 0).

I thought we were in agreement that we'd bump the pandas spec for 0.8, not 0.7.2. This suggests that we need to do it for 0.7.2. Not the end of the world, but not ideal.

Copy link
Member Author

@mikofski mikofski Feb 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I didn't realize that. The existing soiling_hsu also used this closed kwarg in DataFrame.rolling()

accum_rain = rainfall.rolling(rain_accum_period, closed='right').sum()

but it's pandas dependency was masked in the test by requires_scipy

@requires_scipy
def test_soiling_hsu(rainfall_input, expected_output_2):

One idea is just dropping this closed kwarg from DataFrame.rolling() in both soiling functions, which works fine in pandas-0.18.1, the default behavior is 'both' which isn't the end of the world, but probably not strictly faithful to the references, which imply exclusion of starting timestamp.

Any other ideas @cwhanse? @kanderso-nrel?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These @needs* decorators have traditionally been used for only optional dependencies or quirks between versions (like pandas sum of nans = nan or = 0).

Thanks for clarifying this further for us @wholmgren.

I thought we were in agreement that we'd bump the pandas spec for 0.8, not 0.7.2. This suggests that we need to do it for 0.7.2. Not the end of the world, but not ideal.

We can revert this merge and chamber it for v0.8. It'll give time to address some of the issues raised brought up and stay consistent with our plan to bump pandas at v0.8. What do you think @wholmgren?

Copy link
Member

@wholmgren wholmgren Feb 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to revert. @mikofski pointed out that the same issue exists in the already released soiling_hsu. We could add a note to the doc strings, or a try/except with a better error message around the rolling calls, or do nothing. I'm ok with any of those.

We should just be more careful with the requires and needs decorators. I know I've been lazy about them in the past.

This is all also an argument for 1) adding scipy to the base requirements so the test suite doesn't have gotchas like with the hsu function (issue #483) and 2) being more aggressive with bumping dependency specs so that contributors don't have to spend as much time supporting older libraries (issue #828).

@CameronTStark
Copy link
Contributor

Thanks for your great contribution @mikofski and thanks to @kanderso-nrel and @cwhanse for your feedback!

@CameronTStark CameronTStark merged commit 64a6498 into pvlib:master Feb 15, 2020
@mikofski
Copy link
Member Author

Thanks! Sorry I didn't have a chance to fix the nit. Maybe we can address it later.

@mikofski mikofski deleted the kimber_soil branch February 15, 2020 15:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment