Open
Labels: feature (New feature or request)
Description
The main differences between KernelDensity.jl and the KDE implementation in Python's arviz are:
- arviz uses an "experimental" bandwidth defined as the average of Silverman's bandwidth and the Improved Sheather-Jones (ISJ) bandwidth (described in https://doi.org/10.1214/10-AOS799). This default is based on a simulation study by @tomicapretto (a version can be found at https://github.com/tomicapretto/density_estimation). Silverman's rule oversmooths and performs poorly for multimodal distributions, while ISJ handles multimodal distributions well but undersmooths. The average of the two is a useful compromise that is not much more expensive to compute.
- KernelDensity.jl does not pad by default, so the density generally either extends well beyond the data limits or wraps around at the data limits. Neither is ideal. One solution is to extend the user-selected grid by ~4 bandwidths on both sides when convolving and then discard the padded parts of the KDE. Rather than simply discarding the padded parts, arviz follows https://doi.org/10.1111/j.2517-6161.1971.tb00855.x and Section 2.10 of https://doi.org/10.1201/9781315140919 and reflects the part of the data set that lies within 4 bandwidths of the boundary. (EDIT: for a normal kernel, the ISJ paper shows that this approach is equivalent to replacing the normal kernel with a diffusion kernel on the interval defined by the data range.)
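To make the two points above concrete, here is a minimal sketch in plain Python. It is not the arviz or KernelDensity.jl code: all function names are hypothetical, the KDE is evaluated directly rather than by FFT convolution, and the ISJ bandwidth is not implemented (it is assumed to come from elsewhere and is only averaged with Silverman's value).

```python
import math
import statistics


def silverman_bandwidth(data):
    """Silverman's rule of thumb: 0.9 * min(std, IQR / 1.34) * n^(-1/5)."""
    n = len(data)
    std = statistics.stdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4)
    return 0.9 * min(std, (q3 - q1) / 1.34) * n ** (-0.2)


def experimental_bandwidth(bw_silverman, bw_isj):
    """Average of the two bandwidths; bw_isj must be computed separately."""
    return 0.5 * (bw_silverman + bw_isj)


def reflect_near_bounds(data, bw, n_bw=4.0):
    """Mirror the points lying within n_bw bandwidths of each data limit."""
    lo, hi = min(data), max(data)
    width = n_bw * bw
    left = [2 * lo - x for x in data if x - lo <= width]
    right = [2 * hi - x for x in data if hi - x <= width]
    return left + list(data) + right


def gaussian_kde(points, grid, bw, n):
    """Direct Gaussian KDE on `grid`, normalized by the ORIGINAL sample size n."""
    norm = 1.0 / (n * bw * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - x) / bw) ** 2) for x in points)
        for g in grid
    ]


def bounded_kde(data, bw=None, n_grid=201):
    """KDE restricted to [min(data), max(data)] using boundary reflection."""
    if bw is None:
        bw = silverman_bandwidth(data)  # stand-in default for this sketch
    lo, hi = min(data), max(data)
    step = (hi - lo) / (n_grid - 1)
    grid = [lo + i * step for i in range(n_grid)]
    augmented = reflect_near_bounds(data, bw)
    return grid, gaussian_kde(augmented, grid, bw, len(data))
```

Because the reflected copies return the kernel mass that would otherwise spill past the data limits, the estimate on the data range still integrates to approximately 1, whereas simply truncating an unreflected KDE at the limits loses mass near the boundaries.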
These features can, and probably should, be upstreamed to KernelDensity.jl. However, we will probably still keep our own kde method that wraps KernelDensity.kde so that we can change the default settings.
Other optional features that could be ported are:
- adaptive KDE
- circular KDE
These could be left for future work, as they are probably not commonly used.