diive is currently under active development with frequent updates.
diive is a Python library for time series processing, in particular of ecosystem data. It was originally developed
by the ETH Grassland Sciences group for Swiss FluxNet.
Recent updates: CHANGELOG
Recent releases: Releases
Classes are available directly from the diive namespace with both PascalCase and snake_case names:
# PascalCase (class name)
from diive.core.plotting import TimeSeries
plot = TimeSeries(series=data)
# snake_case (alias)
import diive as dv
plot = dv.plot_time_series(series=data)
Plotting: time_series, TimeSeries, cumulative, Cumulative, diel_cycle, DielCycle, heatmap_datetime,
HeatmapDateTime, and more
Gap-filling: randomforest_ts, RandomForestTS, xgboost_ts, XGBoostTS, quick_fill_rfts, QuickFillRFTS,
flux_mds, FluxMDS, optimize_params_ts, OptimizeParamsTS, optimize_params_rfts, OptimizeParamsRFTS
Analysis: gridaggregator, GridAggregator, seasonaltrend, SeasonalTrendDecomposition
Eddy Covariance: FluxDetectionLimit, fdl, MaxCovariance, max_covariance, WindRotation2D, wind_rotation_2d
I/O: load_parquet, save_parquet, load_exampledata_parquet, search_files
For the complete list of available aliases, see diive.__all__.
94 executable examples demonstrating common workflows are organized by topic in the examples/ folder:
Run all examples at once (parallelized):
python examples/run_all_examples.py
Run individual examples:
python examples/visualization/heatmap_datetime.py # HeatmapDateTime heatmaps (6 examples)
python examples/analyses/seasonaltrend.py # SeasonalTrendDecomposition (1 example)
python examples/gap_filling/randomforest_ts.py # Random Forest gap-filling & hyperparameter optimization (3 examples)
python examples/gap_filling/xgboost_ts.py # XGBoost gap-filling & hyperparameter optimization (2 examples)
python examples/gap_filling/comparison.py # Three-way comparison: MDS vs RF vs XGBoost (1 example)
python examples/corrections/offsetcorrection.py # Data corrections (4 examples)
python examples/outlierdetection/absolutelimits.py # Absolute limits filtering day/night thresholds (2 examples)
python examples/outlierdetection/hampel.py # Hampel filter robust outlier detection (2 examples)
python examples/outlierdetection/incremental.py # Z-score increments outlier detection (1 example)
python examples/outlierdetection/localsd.py # Local standard deviation rolling window detection (2 examples)
python examples/outlierdetection/lof.py # Local Outlier Factor density-based detection (2 examples)
python examples/outlierdetection/manualremoval.py # Manual data point/range removal for known issues (2 examples)
python examples/createvar/timesince.py # TimeSince time tracking (3 examples)
python examples/createvar/potentialradiation.py # Solar radiation (4 examples)
python examples/echires/fluxdetectionlimit.py # Flux detection limits (2 examples)
python examples/echires/lag.py # Time lag detection (1 example)
python examples/echires/windrotation.py # Wind rotation and tilt correction (1 example)
python examples/fits/fitter.py # Curve fitting with confidence intervals (1 example)
python examples/flux/common.py # Flux variable detection (1 example)
python examples/flux/hqflux.py # CO2 flux quality analysis with Hampel filter (1 example)
Example categories (94 total, 50 files):
- Visualization (22): heatmap_datetime, hexbin, timeseries, cumulative, dielcycle, histogram, ridgeline, scatter
- Analyses (8): correlation, decoupling, gapfinder, gridaggregator, histogram, optimumrange, quantiles, seasonaltrend
- Data Processing (43): binary extraction, corrections (setto, offsetcorrection), outlierdetection (absolutelimits, hampel, incremental, localsd, lof, manualremoval), createvar (air, conversions, daynightflag, laggedvariants, noise, potentialradiation, timesince, vpd)
- Gap-Filling (10): linear_interpolation, mds, mds_comparison, randomforest_ts (3 examples: full, quick, optimize), xgboost_ts (2 examples: full, optimize), comparison (MDS vs RF vs XGB)
- Eddy Covariance & Flux (9): fluxdetectionlimit, lag, windrotation, hqflux, selfheating, uncertainty, ustarthreshold (3 examples)
- Spectral Analysis (2): harmonic (spectrogram analysis)
- Fits (1): fitter
See examples/README.md for a complete index of all examples with descriptions and quick start guides.
Additional examples are available as Jupyter notebooks in notebooks/, with comprehensive workflows and tutorials.
- For many examples see notebooks here: Notebook overview
- More notebooks are added constantly.
- Daily correlation: calculate daily correlation between two time series · func: daily_correlation() (notebook example)
- Decoupling: Investigate binned aggregates (median) of a variable z in binned classes of x and y (notebook example)
- Data gaps identification · class: GapFinder (notebook example)
- Grid aggregator: calculate z-aggregates in bins (classes) of x and y · class: GridAggregator (notebook example)
- Histogram calculation: calculate histogram from Series (notebook example)
- Optimum range: find x range for optimum y
- Percentiles: Calculate percentiles 0-100 for series (notebook example)
- Seasonal-Trend Decomposition: Separate time series into trend, seasonal, and residual components using STL (Seasonal-Trend Loess), classical, or harmonic methods · class: SeasonalTrendDecomposition (notebook example)
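The idea behind aggregating z in bins of x and y can be illustrated with a minimal sketch in plain Python. This is not the GridAggregator implementation: the helper names `bin_index` and `grid_aggregate`, the equal-width binning, and the choice of the median are assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import median


def bin_index(value, vmin, vmax, n_bins):
    """Map a value to an equal-width bin index in [0, n_bins - 1]."""
    if vmax == vmin:
        return 0  # degenerate range: everything falls into one bin
    idx = int((value - vmin) / (vmax - vmin) * n_bins)
    return min(idx, n_bins - 1)  # the maximum value belongs to the last bin


def grid_aggregate(x, y, z, n_bins=2, agg=median):
    """Aggregate z within each (x-bin, y-bin) cell; empty cells are omitted."""
    x_min, x_max = min(x), max(x)
    y_min, y_max = min(y), max(y)
    cells = defaultdict(list)
    for xi, yi, zi in zip(x, y, z):
        cell = (
            bin_index(xi, x_min, x_max, n_bins),
            bin_index(yi, y_min, y_max, n_bins),
        )
        cells[cell].append(zi)
    return {cell: agg(values) for cell, values in sorted(cells.items())}


# Hypothetical toy data: two x-bins and two y-bins
grid = grid_aggregate(x=[0, 1, 2, 3], y=[0, 0, 10, 10], z=[1, 3, 5, 7])
print(grid)  # {(0, 0): 2.0, (1, 1): 6.0}
```

A grid like this is the kind of input that a heatmap of z in classes of x and y would display.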
- Offset correction for measurement: correct measurement by offset in comparison to replicate · class: OffsetCorrection (notebook example)
- Offset correction radiation: correct nighttime offset of radiation data and set nighttime to zero
- Offset correction relative humidity: correct RH values > 100%
- Offset correction wind direction: correct wind directions by offset, calculated based on reference time period · class: WindDirectionOffset (notebook example)
- Set to threshold: set values above or below a threshold value to threshold value · class: SetToThreshold
- Set exact values to missing: set exact values to missing records · class: SetToMissing (notebook example)
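As a small illustration of one of these corrections, the sketch below applies a known offset to wind directions and wraps the result back into the 0-360° range. It only shows the wrap-around arithmetic. How WindDirectionOffset derives the offset from a reference period is not covered, and the function name `apply_wind_direction_offset` is hypothetical.

```python
def apply_wind_direction_offset(wind_directions_deg, offset_deg):
    """Add a constant offset and wrap the result back into [0, 360)."""
    return [(wd + offset_deg) % 360 for wd in wind_directions_deg]


# Hypothetical wind directions in degrees
shifted_forward = apply_wind_direction_offset([350.0, 10.0, 180.0], offset_deg=20.0)
shifted_back = apply_wind_direction_offset([350.0, 10.0, 180.0], offset_deg=-20.0)
print(shifted_forward)  # [10.0, 30.0, 200.0]
print(shifted_back)     # [330.0, 350.0, 160.0]
```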
Functions to create various variables.
- Time since: calculate time since last occurrence, e.g. since last precipitation · class: TimeSince (notebook example)
- Daytime/nighttime flag: calculate daytime flag, nighttime flag and potential radiation from latitude and longitude · class: DaytimeNighttimeFlag (notebook example)
- Vapor pressure deficit: calculate VPD from air temperature and RH · func: calc_vpd_from_ta_rh() (notebook example)
- Calculate ET from LE: calculate evapotranspiration from latent heat flux · func: et_from_le() (notebook example)
- Calculate air temperature from sonic anemometer temperature · func: air_temp_from_sonic_temp() (notebook example)
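To make the VPD and ET conversions concrete, here is a minimal sketch of the underlying arithmetic. It is not the code behind calc_vpd_from_ta_rh() or et_from_le(), which may use different formulas or constants. The sketch assumes the Tetens-type saturation vapor pressure equation as used in FAO-56 and a constant latent heat of vaporization of 2.45 MJ kg-1. All function names are hypothetical.

```python
import math


def saturation_vapor_pressure_kpa(ta_degc):
    """Saturation vapor pressure (kPa) from air temperature (°C), Tetens form (assumed)."""
    return 0.6108 * math.exp(17.27 * ta_degc / (ta_degc + 237.3))


def vpd_kpa(ta_degc, rh_percent):
    """Vapor pressure deficit (kPa): saturation minus actual vapor pressure."""
    return saturation_vapor_pressure_kpa(ta_degc) * (1.0 - rh_percent / 100.0)


def et_mm_per_hour(le_w_m2, latent_heat_j_kg=2.45e6):
    """Evapotranspiration (mm h-1) from latent heat flux (W m-2).

    LE / lambda gives kg m-2 s-1; 1 kg of water per m2 corresponds to 1 mm.
    The constant latent heat is an assumption of this sketch.
    """
    return le_w_m2 / latent_heat_j_kg * 3600.0


vpd = vpd_kpa(ta_degc=20.0, rh_percent=50.0)
et = et_mm_per_hour(le_w_m2=100.0)
print(round(vpd, 2))  # 1.17
print(round(et, 3))   # 0.147
```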
- Flux detection limit: calculate flux detection limit from high-resolution data (20 Hz) · class: FluxDetectionLimit
- Maximum covariance: find maximum covariance between turbulent wind and scalar · class: MaxCovariance
- Turbulence: wind rotation to calculate turbulent departures of wind components and scalar (e.g. CO2) · class: WindRotation2D
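The maximum covariance approach can be sketched as a search over time lags: the scalar series is shifted against the wind series, and the lag with the largest absolute covariance is taken as the time lag. The code below is a simplified illustration in plain Python, not the MaxCovariance class. The function names and the synthetic test signal are assumptions.

```python
import random


def covariance(a, b):
    """Population covariance of two equally long sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)


def find_max_covariance_lag(w, c, max_lag):
    """Return (lag, cov) with the largest |cov(w[t], c[t + lag])| for |lag| <= max_lag."""
    n = len(w)
    best_lag, best_cov = 0, 0.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = w[:n - lag], c[lag:]   # scalar arrives later than wind
        else:
            a, b = w[-lag:], c[:n + lag]  # scalar arrives earlier than wind
        cov = covariance(a, b)
        if abs(cov) > abs(best_cov):
            best_lag, best_cov = lag, cov
    return best_lag, best_cov


# Synthetic data: the scalar is the wind signal delayed by 3 records
random.seed(0)
w = [random.random() for _ in range(200)]
c = [0.0] * 3 + w[:-3]

lag, cov = find_max_covariance_lag(w, c, max_lag=10)
print(lag)  # 3
```

For 20 Hz data, a lag of 3 records would correspond to 0.15 s.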
Input/output functions.
- Detect files: detect expected and unexpected (irregular) files in a list of files · class: FileDetector
- Split files: split multiple files into smaller parts and export them as (compressed) CSV files · class: FileSplitter
- Read single data files: read file using parameters · class: DataFileReader (notebook example)
- Read single data files: read file using pre-defined filetypes · class: ReadFileType (notebook example)
- Read multiple data files: read files using pre-defined filetype · class: MultiDataFileReader (notebook example)
- Bin fitter · class: BinFitterCP (notebook example)
Functions specifically for eddy covariance flux data.
- Flux processing chain · class: FluxProcessingChain (notebook example)
  - The notebook example shows the application of:
    - Post-processing of eddy covariance flux data.
    - Level-2 quality flags
    - Level-3.1 storage correction
    - Level-3.2 outlier removal
    - Level-3.3: USTAR filtering using constant thresholds
    - Level-4.1: gap-filling using long-term random forest, XGBoost, and/or MDS
  - For info about the Swiss FluxNet flux levels, see here.
- **Quick flux processing chain** (notebook example)
- Flux detection limit: calculate flux detection limit from high-resolution eddy covariance data · class: FluxDetectionLimit (notebook example)
- Self-heating correction for open-path IRGA NEE fluxes:
  - create scaling factors table and apply to correct open-path NEE fluxes during a time period of parallel measurements (notebook example)
  - apply previously created scaling factors table to long-term open-path NEE flux data, outside the time period of parallel measurements (notebook example)
- USTAR threshold scenarios: display data availability under different USTAR threshold scenarios
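What "data availability under different USTAR threshold scenarios" means can be shown with a short sketch: for each candidate threshold, count the records that have a flux value and a USTAR at or above the threshold. This is an illustration under simple assumptions (constant thresholds, missing values encoded as `None`) and not diive's implementation. The function name is hypothetical.

```python
def availability_under_ustar_thresholds(flux, ustar, thresholds):
    """Fraction of records retained for each constant USTAR threshold."""
    n_records = len(flux)
    availability = {}
    for threshold in thresholds:
        kept = sum(
            1
            for f, u in zip(flux, ustar)
            if f is not None and u is not None and u >= threshold
        )
        availability[threshold] = kept / n_records
    return availability


# Hypothetical half-hourly records: flux and USTAR (m s-1)
flux = [1.0, None, 2.0, 3.0]
ustar = [0.05, 0.30, 0.15, 0.40]

scenarios = availability_under_ustar_thresholds(flux, ustar, thresholds=[0.1, 0.2])
print(scenarios)  # {0.1: 0.5, 0.2: 0.25}
```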
Format data to specific formats.
- Format: convert EddyPro fluxnet output files for upload to FLUXNET database · class: FormatEddyProFluxnetFileForUpload (notebook example)
- Parquet files: load and save parquet files · funcs: load_parquet(), save_parquet() (notebook example)
Fill gaps in time series with various methods.
Feature Engineering (v0.91.0) · class: FeatureEngineer
- Standalone 8-stage feature engineering pipeline (composable, reusable across models)
  - Stage 1: Lagged features from past and future values
  - Stage 2: Rolling statistics (mean, std, median, min, max, quartiles)
  - Stage 3: Temporal differencing (1st and 2nd order momentum)
  - Stage 4: Exponential Moving Average (EMA) with recent-value emphasis
  - Stage 5: Polynomial expansion (squared, cubed terms)
  - Stage 6: STL decomposition (trend, seasonal, residual components)
  - Stage 7: Timestamp vectorization (season, month, hour, etc.)
  - Stage 8: Continuous record numbering for trend detection
- Pre-engineer features once, reuse across multiple models (RF + XGB simultaneously)
- Independent testing and debugging of feature engineering
- XGBoostTS · class: XGBoostTS (notebook example (minimal), notebook example (more extensive))
  - Use FeatureEngineer to create features, pass pre-engineered data to XGBoostTS
- RandomForestTS · class: RandomForestTS (notebook example)
  - Use FeatureEngineer to create features, pass pre-engineered data to RandomForestTS
- Long-term gap-filling using RandomForestTS · class: LongTermGapFillingRandomForestTS (notebook example)
- Long-term gap-filling using XGBoostTS · class: LongTermGapFillingXGBoostTS (for multi-year data with USTAR scenario support)
- Linear interpolation · func: linear_interpolation() (notebook example)
- Quick random forest gap-filling · class: QuickFillRFTS (notebook example)
- MDS gap-filling of ecosystem fluxes · class: FluxMDS (notebook example), approach by Reichstein et al., 2005
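Of the methods listed above, linear interpolation is the simplest to illustrate. The sketch below fills only short gaps that are bounded by valid values on both sides and leaves longer or open-ended gaps untouched. It is a plain-Python illustration, not the linear_interpolation() function. The name `interpolate_short_gaps`, the `max_gap` parameter, and the use of `None` for missing values are assumptions.

```python
def interpolate_short_gaps(values, max_gap=3):
    """Linearly fill runs of None no longer than max_gap records.

    Gaps at the start or end of the series have no valid neighbor on one
    side and are therefore left unfilled.
    """
    filled = list(values)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < n and filled[i] is None:
            i += 1
        end = i  # index of the first valid value after the gap, or n
        gap_length = end - start
        if start > 0 and end < n and gap_length <= max_gap:
            left, right = filled[start - 1], filled[end]
            step = (right - left) / (gap_length + 1)
            for k in range(gap_length):
                filled[start + k] = left + step * (k + 1)
    return filled


series = [1.0, None, None, 4.0, None, None, None, None, 9.0, None]
result = interpolate_short_gaps(series, max_gap=2)
print(result)
# [1.0, 2.0, 3.0, 4.0, None, None, None, None, 9.0, None]
```

Limiting the gap length keeps linear interpolation from bridging long periods where a straight line is unlikely to represent the data. Such gaps are better left to the model-based methods.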
- FluxProcessingChain examples for CO2 half-hourly flux (NEE) gap-filling:
- Both Random Forest and XGBoost examples are fully activated and comprehensively documented
- Optimized feature engineering for diurnal photosynthetic patterns (lag, rolling, EMA, STL decomposition)
- Feature reduction enabled by default (SHAP-based selection reduces ~45-50 features to ~10-20)
- Hyperparameters tuned for ecosystem flux data with detailed tuning guidance
- Model comparison code to select best algorithm for your site
- See examples/gap_filling/ folder for standalone runnable examples (Phase 2, coming soon)
- Or see diive/pkgs/fluxprocessingchain/fluxprocessingchain.py for detailed inline examples
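The lag, rolling, differencing and EMA features mentioned above can be illustrated with a few lines of plain Python. This is a conceptual sketch only. It does not reproduce FeatureEngineer, its parameters, or its output format. All function names are hypothetical, and the input is assumed to be a non-empty, gap-free list.

```python
def lagged(values, lag):
    """Shift by `lag` records: positive lags use past values, negative lags future values."""
    n = len(values)
    return [values[i - lag] if 0 <= i - lag < n else None for i in range(n)]


def rolling_mean(values, window):
    """Trailing rolling mean; None until a full window is available."""
    return [
        None if i + 1 < window else sum(values[i + 1 - window:i + 1]) / window
        for i in range(len(values))
    ]


def first_difference(values):
    """Change between consecutive records; the first record has no predecessor."""
    return [None] + [b - a for a, b in zip(values, values[1:])]


def ema(values, alpha):
    """Exponential moving average; a larger alpha puts more weight on recent values."""
    smoothed = [values[0]]
    for value in values[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


values = [1.0, 2.0, 4.0, 8.0]
features = {
    "lag_past_1": lagged(values, 1),      # [None, 1.0, 2.0, 4.0]
    "lag_future_1": lagged(values, -1),   # [2.0, 4.0, 8.0, None]
    "rolling_mean_2": rolling_mean(values, 2),  # [None, 1.5, 3.0, 6.0]
    "diff_1": first_difference(values),   # [None, 1.0, 2.0, 4.0]
    "ema_0.5": ema(values, 0.5),          # [1.0, 1.5, 2.75, 5.375]
}
```

Each list has the same length as the input, so the features can be placed next to the target variable as model inputs.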
- Step-wise outlier detection: combine multiple outlier flags to one single overall flag
Create single outlier flags where 0=OK and 2=outlier.
- Absolute limits: define absolute limits · class: AbsoluteLimits (notebook example)
- Absolute limits daytime/nighttime: define absolute limits separately for daytime and nighttime data · class: AbsoluteLimitsDaytimeNighttime (notebook example)
- Hampel filter daytime/nighttime: apply the Hampel filter separately for daytime and nighttime data · class: HampelDaytimeNighttime (notebook example)
- Local standard deviation: Identify outliers based on the local standard deviation from a running median (notebook example)
- Local outlier factor: Identify outliers based on local outlier factor, across all data · class: LocalOutlierFactorAllData (notebook example)
- Local outlier factor daytime/nighttime: Identify outliers based on local outlier factor, separately for daytime and nighttime (notebook example)
- Manual removal: Remove time periods (from-to) or single records from time series (notebook example)
- Missing values: Simply creates a flag that indicates available and missing data in a time series · class: MissingValues (notebook example)
- Trimming: Remove values below threshold and remove an equal amount of records from high end of data (notebook example)
- z-score: Identify outliers based on the z-score across all time series data · class: zScore (notebook example)
- z-score increments daytime/nighttime: Identify outliers based on the z-score of double increments (notebook example)
- z-score daytime/nighttime: Identify outliers based on the z-score, separately for daytime and nighttime · class: zScoreDaytimeNighttime (notebook example)
- z-score rolling: Identify outliers based on the rolling z-score (notebook example)
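The flag convention (0=OK, 2=outlier) can be illustrated with a basic Hampel filter: each value is compared to the median of a centered window, and flagged if it deviates by more than a multiple of the scaled median absolute deviation (MAD). This is a generic sketch, not diive's Hampel classes, and it omits the daytime/nighttime separation. The function name and default parameters are assumptions.

```python
from statistics import median


def hampel_flag(values, half_window=3, n_sigma=3.0):
    """Return one flag per value: 0 = OK, 2 = outlier.

    Note: if the MAD of a window is 0, any deviation from the window
    median is flagged.
    """
    scale = 1.4826  # makes the MAD comparable to a standard deviation for normal data
    n = len(values)
    flags = []
    for i, value in enumerate(values):
        window = values[max(0, i - half_window):min(n, i + half_window + 1)]
        window_median = median(window)
        mad = median([abs(x - window_median) for x in window])
        is_outlier = abs(value - window_median) > n_sigma * scale * mad
        flags.append(2 if is_outlier else 0)
    return flags


# Hypothetical series with one obvious spike
series = [10.0, 10.2, 9.9, 10.1, 50.0, 10.0, 9.8, 10.3, 10.1, 9.9]
flags = hampel_flag(series)
print(flags)  # [0, 0, 0, 0, 2, 0, 0, 0, 0, 0]
```

Because every test yields a flag of the same form, several flags can be combined into one overall flag, for example by taking the maximum per record.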
- Cumulatives across all years for multiple variables · class: Cumulative (notebook example)
- Cumulatives per year · class: CumulativeYear (notebook example)
- Diel cycle per month · class: DielCycle (notebook example)
- Heatmap date/time: showing values (z) of time series as date (y) vs time (x) · class: HeatmapDateTime (notebook example)
- Heatmap year/month: plot monthly ranks across years · class: HeatmapYearMonth (notebook example)
- Heatmap XYZ: show z-values in bins of x and y; pairs naturally with GridAggregator · class: HeatmapXYZ (notebook example)
- Hexbin plot: aggregate flux values into 2D hexagonal bins of driver variables; supports percentile normalization and configurable aggregation functions · class: HexbinPlot (notebook example)
- Histogram: includes options to show z-score limits and to highlight the peak distribution bin · class: HistogramPlot (notebook example)
- Long-term anomalies: calculate and plot long-term anomaly for a variable, per year, compared to a reference period · class: LongtermAnomaliesYear (notebook example)
- Ridgeline plot: looks a bit like a landscape · class: RidgeLinePlot (notebook example)
- Time series plot: Simple (interactive) time series plot · class: TimeSeries (notebook example)
- ScatterXY plot · class: ScatterXY (notebook example)
- Various classes to generate heatmaps, bar plots, time series plots and scatter plots, among others
- Stepwise MeteoScreening from database · class: StepwiseMeteoScreeningDb (notebook example)
- Diel cycle: calculate diel cycle per month · func: diel_cycle() (notebook example)
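A diel cycle per month is essentially a grouped mean: all records that share the same month and the same time of day are averaged. The sketch below shows this grouping with the standard library only. It is not the diel_cycle() function. The name `diel_cycle_by_month` and the use of `None` for missing values are assumptions.

```python
from collections import defaultdict
from datetime import datetime, time
from statistics import fmean


def diel_cycle_by_month(timestamps, values):
    """Mean value for each (month, time of day) combination, ignoring missing values."""
    groups = defaultdict(list)
    for ts, value in zip(timestamps, values):
        if value is not None:
            groups[(ts.month, ts.time())].append(value)
    return {key: fmean(group) for key, group in sorted(groups.items())}


# Hypothetical records: two January noons, one January midnight, one February noon
timestamps = [
    datetime(2023, 1, 1, 12, 0),
    datetime(2023, 1, 2, 12, 0),
    datetime(2023, 1, 1, 0, 0),
    datetime(2023, 2, 1, 12, 0),
]
values = [10.0, 20.0, 2.0, 30.0]

cycle = diel_cycle_by_month(timestamps, values)
print(cycle[(1, time(12, 0))])  # 15.0
```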
- Time series stats · func: sstats() (notebook example)
- Continuous timestamp: create continuous timestamp based on number of records in the file and the file duration · func: continuous_timestamp_freq()
- Time resolution: detect time resolution from data · class: DetectFrequency (notebook example)
- Timestamps: create and insert additional timestamps in various formats · class: TimestampSanitizer
- Vectorize timestamps: add date attributes as columns to dataframe, including sine/cosine variants for cyclical variables (e.g., day of year) · func: vectorize_timestamps() (notebook example)
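The sine/cosine variants exist because cyclical variables such as day of year have no natural start: 31 December and 1 January are neighbors, but their day numbers (365 and 1) are far apart. Mapping the day onto a circle removes this jump. The following sketch illustrates the encoding with the standard library. It is not the vectorize_timestamps() function, and `encode_day_of_year` is a hypothetical name.

```python
import calendar
import math
from datetime import datetime


def encode_day_of_year(ts):
    """Return (sin, cos) of the day of year mapped onto a full circle."""
    days_in_year = 366 if calendar.isleap(ts.year) else 365
    day_index = ts.timetuple().tm_yday - 1  # 0 for 1 January
    angle = 2.0 * math.pi * day_index / days_in_year
    return math.sin(angle), math.cos(angle)


jan_1 = encode_day_of_year(datetime(2023, 1, 1))
dec_31 = encode_day_of_year(datetime(2023, 12, 31))
jul_2 = encode_day_of_year(datetime(2023, 7, 2))

# Neighboring days end up close together, opposite seasons far apart
print(math.dist(jan_1, dec_31) < math.dist(jan_1, jul_2))  # True
```

Both components are needed: the sine alone cannot distinguish between two days that lie symmetrically around its peak.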
diive is currently under active development using Python v3.11.
pip install diive
poetry add diive
Directly use .tar.gz file of the desired version.
pip install https://github.com/holukas/diive/archive/refs/tags/v0.76.2.tar.gz
One way to install and use diive with a specific Python version on a local machine:
- Install miniconda
- Start miniconda prompt
- Create an environment named diive-env that contains Python 3.11: conda create --name diive-env python=3.11
- Activate the new environment: conda activate diive-env
- Install diive using pip: pip install diive
- To start JupyterLab type jupyter lab in the prompt
