Port Python ArviZ's KDE implementation #6

@sethaxen

The main differences between KernelDensity.jl and Python arviz's KDE implementation are

  • arviz uses an "experimental" bandwidth defined as the average of Silverman's bandwidth and the Improved Sheather-Jones (ISJ) bandwidth (described in https://doi.org/10.1214/10-AOS799). This default is based on a simulation study by @tomicapretto (a version can be found at https://github.com/tomicapretto/density_estimation). Silverman's rule oversmooths and performs poorly on multimodal distributions, whereas ISJ handles multimodal distributions well but undersmooths. The average of the two is a useful compromise that is not much more expensive.
  • KernelDensity.jl does not pad automatically, so the density generally either extends well beyond the data limits or wraps around at them. Neither is desirable. One solution is to pad the user-selected grid by ~4 bandwidths on both sides when convolving. Instead of discarding the padded parts of the KDE, following https://doi.org/10.1111/j.2517-6161.1971.tb00855.x and Section 2.10 of https://doi.org/10.1201/9781315140919, arviz reflects the data set within 4 bandwidths of the boundary. (EDIT: for a normal kernel, the ISJ paper shows that this approach is equivalent to replacing the normal kernel with a diffusion kernel on the interval defined by the data range.)
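
To make the two points above concrete, here is a minimal pure-Python sketch. It is not arviz's or KernelDensity.jl's code: all function names are hypothetical, Silverman's rule is written in its common `0.9 * min(std, IQR / 1.34) * n^(-1/5)` form, the ISJ bandwidth is assumed to be computed elsewhere and passed in, and the KDE is a direct sum over data points rather than an FFT convolution.

```python
import math
import statistics


def silverman_bandwidth(data):
    """Silverman's rule of thumb in its common form (an assumption of this sketch)."""
    n = len(data)
    std = statistics.stdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4)
    return 0.9 * min(std, (q3 - q1) / 1.34) * n ** (-0.2)


def experimental_bandwidth(data, isj_bandwidth):
    """Average of Silverman's bandwidth and an ISJ bandwidth.

    `isj_bandwidth` is a precomputed input: the ISJ solver itself is
    not implemented in this sketch.
    """
    return 0.5 * (silverman_bandwidth(data) + isj_bandwidth)


def reflect_near_boundaries(data, bw, n_bw=4.0):
    """Return the data plus mirror images of points within n_bw * bw of each data limit."""
    lo, hi = min(data), max(data)
    left = [2 * lo - x for x in data if x - lo <= n_bw * bw]
    right = [2 * hi - x for x in data if hi - x <= n_bw * bw]
    return left + list(data) + right


def reflected_gaussian_kde(data, bw, grid_len=200):
    """Gaussian KDE evaluated on [min(data), max(data)] using reflected data.

    The normalization uses the original sample size, so the mass that
    would leak past each limit is folded back into the interval.
    """
    lo, hi = min(data), max(data)
    augmented = reflect_near_boundaries(data, bw)
    norm = 1.0 / (len(data) * bw * math.sqrt(2.0 * math.pi))
    grid = [lo + (hi - lo) * i / (grid_len - 1) for i in range(grid_len)]
    density = [
        norm * sum(math.exp(-0.5 * ((g - x) / bw) ** 2) for x in augmented)
        for g in grid
    ]
    return grid, density
```

With this construction the estimate is defined only on the data range and should integrate to approximately 1 there, which is the property that padding and then discarding the padded parts does not give.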

These features can and probably should be upstreamed to KernelDensity.jl. However, we will probably still have our own kde method that wraps KernelDensity.kde so that we can change the default settings.

Other optional features that could be ported would be

  • adaptive KDE
  • circular KDE

but these could be left for future work, as they are probably not commonly used.

Labels: feature (New feature or request)
