FormulaCompiler.jl provides efficient per-row model matrix evaluation for Julia statistical models through position-mapped compilation and compile-time specialization. The package transforms StatsModels.jl formulas into specialized evaluators that achieve zero-allocation performance for repeated row-wise operations.
The package resolves StatsModels.jl formula complexity at compile time, thereby enabling type-stable execution. This approach supports efficient counterfactual analysis, marginal effects computation, and scenarios requiring many model matrix evaluations.
Key characteristics include:
- Zero-allocation evaluation after initial compilation
- O(1) memory overhead for counterfactual analysis via variable override system
- Support for categorical mixtures in marginal effects calculations
- Dual automatic differentiation (via ForwardDiff.jl) and finite difference backends for derivatives
- Built on the StatsModels.jl ecosystem (GLM.jl, MixedModels.jl, StandardizedPredictors.jl), allowing users to work with the existing, flexible formula domain-specific language
The package is registered; the current version can be installed via:

using Pkg
Pkg.add(url="https://github.com/emfeltham/FormulaCompiler.jl")

using FormulaCompiler, GLM, DataFrames, Tables
# Fit model
df = DataFrame(
y = randn(1000),
x = randn(1000),
z = abs.(randn(1000)) .+ 0.1,
group = categorical(rand(["A", "B", "C"], 1000))
)
model = lm(@formula(y ~ x * group + log(z)), df)
# Compile formula once
data = Tables.columntable(df)
compiled = compile_formula(model, data)
row_vec = Vector{Float64}(undef, length(compiled))
# Evaluate rows without allocation
compiled(row_vec, data, 1) # Zero allocations after warmup

The most efficient interface for performance-critical applications:
# Pre-compile for optimal performance
data = Tables.columntable(df)
compiled = compile_formula(model, data)
row_vec = Vector{Float64}(undef, length(compiled))
# Evaluate individual rows
compiled(row_vec, data, 1) # Row 1
compiled(row_vec, data, 100) # Row 100
# Evaluate multiple rows
matrix = Matrix{Float64}(undef, 10, length(compiled))
for i in 1:10
compiled(view(matrix, i, :), data, i)
end

Alternative interface with automatic caching:
# Single row (allocating)
row_values = modelrow(model, data, 1)
# Multiple rows (allocating)
matrix = modelrow(model, data, [1, 5, 10, 50])
# In-place with automatic caching
row_vec = Vector{Float64}(undef, size(modelmatrix(model), 2))
modelrow!(row_vec, model, data, 1)

Stateful evaluator for repeated use:
evaluator = ModelRowEvaluator(model, df)
result = evaluator(1) # Row 1
evaluator(row_vec, 1) # In-place evaluation

FormulaCompiler implements a type-stable counterfactual vector system that enables variable substitution with O(1) memory overhead, avoiding data duplication. This is particularly useful for policy analysis and treatment effect evaluation.
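The underlying idea can be illustrated with a minimal AbstractVector wrapper. This is a simplified sketch, not FormulaCompiler's actual type: only a reference to the base column, one index, and one replacement value are stored, so memory cost is constant regardless of column length.

```julia
# Simplified sketch of the counterfactual-vector idea (hypothetical type,
# not FormulaCompiler's implementation): a lazy view that overrides one entry.
struct OverrideVector{T,V<:AbstractVector{T}} <: AbstractVector{T}
    base::V    # original column, never copied
    idx::Int   # index to override
    value::T   # replacement value
end

Base.size(v::OverrideVector) = size(v.base)
Base.getindex(v::OverrideVector, i::Int) = i == v.idx ? v.value : v.base[i]

x = collect(1.0:5.0)
cf = OverrideVector(x, 3, 42.0)
cf[3]  # → 42.0, without copying x
cf[2]  # → 2.0, reads through to the base column
```

Because the wrapper behaves like an ordinary vector, a compiled evaluator can index it exactly as it would the original column.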
data = Tables.columntable(df)
compiled = compile_formula(model, data)
# Build counterfactual data structure
data_cf, cf_vecs = build_counterfactual_data(data, [:treatment], 1)
treatment_cf = cf_vecs[1]
# Evaluate with counterfactual values
row_vec = Vector{Float64}(undef, length(compiled))
update_counterfactual_replacement!(treatment_cf, true)
compiled(row_vec, data_cf, 1)

# Define analysis grid
treatment_values = [false, true]
dose_values = [50.0, 100.0, 150.0]
region_values = ["North", "South"]
# Create counterfactual data structure
data_cf, cf_vecs = build_counterfactual_data(data, [:treatment, :dose, :region], 1)
treatment_cf, dose_cf, region_cf = cf_vecs
# Evaluate all combinations (2×3×2 = 12 scenarios)
results = Matrix{Float64}(undef, 12, length(compiled))
row_vec = Vector{Float64}(undef, length(compiled))
scenario_idx = 1
for treatment in treatment_values
for dose in dose_values
for region in region_values
update_counterfactual_replacement!(treatment_cf, treatment)
update_counterfactual_replacement!(dose_cf, dose)
update_counterfactual_replacement!(region_cf, region)
compiled(row_vec, data_cf, 1)
results[scenario_idx, :] .= row_vec
scenario_idx += 1
end
end
end

The counterfactual vector system uses O(1) memory regardless of dataset size:
# Traditional approach: full column copy
traditional = copy(data.x)
traditional[500_000] = 42.0
# CounterfactualVector approach: ~32 bytes overhead
cf_vec = counterfactualvector(data.x, 500_000)
update_counterfactual_replacement!(cf_vec, 42.0)
# Identical interface, minimal memory usage
@assert traditional[500_000] == cf_vec[500_000]

using GLM
# Linear models
linear_model = lm(@formula(mpg ~ hp * cyl + log(wt)), mtcars)
compiled = compile_formula(linear_model, Tables.columntable(mtcars))
# Generalized linear models
logit_model = glm(@formula(vs ~ hp + wt), mtcars, Binomial(), LogitLink())
compiled_logit = compile_formula(logit_model, Tables.columntable(mtcars))

The package automatically extracts fixed effects from mixed models:
using MixedModels
mixed_model = fit(MixedModel, @formula(y ~ x + z + (1|group) + (1+x|cluster)), df)
compiled = compile_formula(mixed_model, Tables.columntable(df)) # Fixed effects only

All three StandardizedPredictors.jl transformations are supported: ZScore(), Center(), and Scale():
using StandardizedPredictors
contrasts = Dict(:x => ZScore(), :z => Center(), :w => Scale())
model = lm(@formula(y ~ x + z + w + group), df, contrasts=contrasts)
compiled = compile_formula(model, Tables.columntable(df))

FormulaCompiler supports categorical mixtures (weighted combinations of categorical levels) for efficient profile-based marginal effects computation:
using FormulaCompiler, Margins
# Create mixture specification
reference_grid = DataFrame(
x = [1.0, 2.0, 3.0],
group = mix("Treatment" => 0.6, "Control" => 0.4)
)
# Compile and evaluate
grid_data = Tables.columntable(reference_grid)
compiled = compile_formula(model, grid_data)
output = Vector{Float64}(undef, length(compiled))
compiled(output, grid_data, 1)
# Multiple mixture variables
grid = DataFrame(
age = [30, 40, 50],
treatment = mix("Control" => 0.3, "Drug_A" => 0.4, "Drug_B" => 0.3),
dose = mix("Low" => 0.25, "Medium" => 0.5, "High" => 0.25)
)
compiled = compile_formula(model, Tables.columntable(grid))

Mixture evaluation maintains zero-allocation performance through compile-time specialization. Each mixture specification generates a specialized evaluation method.
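Conceptually, a mixture contributes the weighted average of the design-matrix rows for its levels. A hand-computed sketch (illustration of the semantics, not package code) for a two-level group under dummy coding with Control as the reference level:

```julia
# Hand-computed sketch of mixture semantics: the mixture row is the
# weighted average of the per-level contrast rows.
row_control   = [1.0, 0.0]   # [intercept, group: Treatment] under dummy coding
row_treatment = [1.0, 1.0]

# mix("Treatment" => 0.6, "Control" => 0.4) corresponds to:
row_mixture = 0.6 .* row_treatment .+ 0.4 .* row_control
row_mixture  # → [1.0, 0.6]
```

The compiled evaluator produces this weighted row directly, without materializing the per-level rows.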
FormulaCompiler provides computational primitives for zero-allocation derivative computation; for marginal effects themselves, see Margins.jl.
using FormulaCompiler, GLM
# Build derivative evaluator
vars = [:x, :z]
de = derivativeevaluator(:ad, compiled, data, vars)
# Compute Jacobian: J[i,j] = ∂modelmatrix[i]/∂vars[j]
J = Matrix{Float64}(undef, length(compiled), length(vars))
derivative_modelrow!(J, de, 1) # Row 1

While automatic differentiation is the strongly preferred default option, two backends are available:
- `:ad` (automatic differentiation via ForwardDiff.jl): recommended for standard formulas. Provides machine-precision accuracy and approximately 20% faster performance than finite differences.
- `:fd` (finite differences): recommended for formulas containing boolean predicates. Guarantees zero allocations for all formula types.
Both backends achieve zero-allocation performance through pre-allocated buffers and in-place operations.
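A self-contained sketch of the finite-difference backend, following the same call pattern as the :ad example above (the model, formula, and data here are illustrative, not from the package's docs):

```julia
using FormulaCompiler, GLM, DataFrames, Tables

# Illustrative model and data
df = DataFrame(y = randn(100), x = randn(100), z = abs.(randn(100)) .+ 0.1)
model = lm(@formula(y ~ x + log(z)), df)
data = Tables.columntable(df)
compiled = compile_formula(model, data)

# Finite-difference backend: same API shape, different backend symbol
de_fd = derivativeevaluator(:fd, compiled, data, [:x, :z])
J = Matrix{Float64}(undef, length(compiled), 2)
derivative_modelrow!(J, de_fd, 1)   # Jacobian for row 1
```

Switching backends therefore requires changing only the first argument to derivativeevaluator.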
The following GLM link functions are supported for marginal effects on the mean response μ:
- Identity
- Log
- Logit
- Probit
- Cloglog
- Cauchit
- Inverse
- Sqrt
- InverseSquare
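For context, these links enter marginal effects through the chain rule ∂μ/∂x_j = (dμ/dη)(∂η/∂x_j). A hand-computed sketch for the Logit link (the coefficient value below is made up for illustration):

```julia
# Chain rule behind link-scale marginal effects, sketched for the Logit link.
# mu = linkinv(eta), and for logit the derivative is dmu/deta = mu * (1 - mu).
logistic(eta) = 1 / (1 + exp(-eta))

eta      = 0.5                  # linear predictor value (illustrative)
mu       = logistic(eta)
dmu_deta = mu * (1 - mu)        # inverse-logit derivative
beta_j   = 1.3                  # hypothetical coefficient: deta/dx_j for a linear term
me       = dmu_deta * beta_j    # marginal effect on the mean response mu
```

Each supported link has its own dμ/dη term; the ∂η/∂x_j factor comes from the compiled model-matrix derivatives above.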
The package supports the complete StatsModels.jl formula language:
- Basic terms: `x`, `log(z)`, `x^2`, `(x > 0)`
- Boolean variables: treated as continuous (false → 0.0, true → 1.0)
- Categorical variables: all contrast types, ordered and unordered
- Categorical mixtures: weighted combinations (e.g., `mix("A" => 0.6, "B" => 0.4)`)
- Interactions: two-way and multi-way (`x * group`, `x * y * z`)
- Mathematical functions: `log`, `exp`, `sqrt`, `sin`, `cos`, `abs`, `^`
- Boolean predicates: `(x > 0)`, `(z >= mean(z))`
- Complex formulas: `x * log(z) * group + sqrt(abs(y))`
- Standardized predictors: ZScore, Center, Scale, and custom transformations
- Mixed models: automatic fixed-effects extraction
The package is designed for scenarios where compilation cost is amortized over many evaluations:
# Compile once
data = Tables.columntable(df)
compiled = compile_formula(model, data)
row_vec = Vector{Float64}(undef, length(compiled))
# Evaluate many times (zero allocations)
for i in 1:n_rows
compiled(row_vec, data, i)
end

Column-table format provides optimal performance:

data = Tables.columntable(df) # Convert once, reuse

Pre-allocate output vectors for zero-allocation performance:

row_vec = Vector{Float64}(undef, length(compiled)) # Reuse across calls

Batch evaluation is more efficient than individual allocating calls:
# Efficient: batch evaluation with pre-allocation
matrix = Matrix{Float64}(undef, 1000, length(compiled))
for i in 1:1000
compiled(view(matrix, i, :), data, i)
end
# Inefficient: repeated allocation
results = [modelrow(model, data, i) for i in 1:1000]

Typical performance for formula evaluation after compilation:
using BenchmarkTools
# Traditional approach
@benchmark modelmatrix(model)[1, :] # Constructs entire matrix
# FormulaCompiler approach
data = Tables.columntable(df)
compiled = compile_formula(model, data)
row_vec = Vector{Float64}(undef, length(compiled))
@benchmark compiled(row_vec, data, 1) # Zero allocations

FormulaCompiler typically achieves a significant speedup compared to full model matrix construction for single-row evaluation, with zero allocations after compilation.
Both derivative backends achieve zero-allocation performance:
- Automatic differentiation (`:ad`): approximately 50-60 ns per row for standard formulas
- Finite differences (`:fd`): approximately 65-85 ns per row for standard formulas
All operations use pre-allocated buffers, validated through comprehensive allocation tests.
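The same kind of check can be reproduced with Base's @allocated macro. This is a generic sketch of the validation pattern (the function below is a stand-in, not the package's test suite): warm up first so compilation is excluded, and measure inside a function so global-scope effects do not distort the count.

```julia
# Generic zero-allocation check: warm up, then measure with @allocated.
f!(out, x) = (out .= 2 .* x; out)   # stand-in for a compiled evaluator call

function allocs_after_warmup(out, x)
    f!(out, x)                      # warmup triggers compilation
    return @allocated f!(out, x)    # steady-state allocation count in bytes
end

x = randn(100)
out = similar(x)
allocs_after_warmup(out, x)  # → 0 for an in-place, type-stable call
```

The same pattern applies to compiled evaluators: call compiled(row_vec, data, 1) once, then assert the measured allocations are zero.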
The package uses a compilation pipeline based on position mapping:
- Position mapping: Formula terms are mapped to fixed scratch and output positions during compilation, with all position information encoded in type parameters.
- Type specialization: Each unique formula generates a specialized evaluation method with concrete types throughout.
- Adaptive dispatch: Small formulas (≤25 operations) use recursive dispatch; larger formulas use generated functions to respect Julia's compilation limits.
- Zero-allocation execution: All memory layouts are determined at compile time, enabling allocation-free runtime evaluation.
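The steps above can be illustrated with a toy position-mapped evaluator. The types here are hypothetical, not FormulaCompiler's internals: column names and output slots are encoded as type parameters, so Julia specializes the evaluation code for each op tuple at compile time and writes into a preallocated output vector.

```julia
# Toy position-mapped evaluator (illustration only, not FormulaCompiler's code).
struct LoadOp{Col,Dest} end          # out[Dest] = data[Col][row]
struct MulOp{A,B,Dest} end           # out[Dest] = out[A] * out[B]

@inline apply!(out, data, row, ::LoadOp{Col,Dest}) where {Col,Dest} =
    (out[Dest] = data[Col][row])
@inline apply!(out, data, row, ::MulOp{A,B,Dest}) where {A,B,Dest} =
    (out[Dest] = out[A] * out[B])

# "Compiled" representation of the term x * z: positions fixed in the types
ops = (LoadOp{:x, 1}(), LoadOp{:z, 2}(), MulOp{1, 2, 3}())

function evaluate!(out, data, row, ops)
    foreach(op -> apply!(out, data, row, op), ops)
    return out
end

data = (x = [2.0, 3.0], z = [10.0, 20.0])   # column-table (NamedTuple) form
out = Vector{Float64}(undef, 3)
evaluate!(out, data, 1, ops)   # → [2.0, 10.0, 20.0]
```

Because every position lives in a type parameter, no runtime lookup or allocation is needed once the op tuple is compiled.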
The package is designed for applications requiring many model matrix evaluations:
- Marginal effects computation requiring numerical derivatives
- Monte Carlo simulations requiring millions of model evaluations
- Bootstrap resampling with repeated matrix construction
- Applications requiring low-latency prediction
- Large-scale inference with memory constraints
- Margins.jl: Marginal effects computation (uses FormulaCompiler.jl)
- GLM.jl: Generalized linear models
- MixedModels.jl: Mixed-effects models
- StandardizedPredictors.jl: Standardized predictors
- StatsModels.jl: Statistical formula interface
- ForwardDiff.jl: Automatic differentiation
Additional documentation is available in the docs/ directory and the online documentation.
Contributions are welcome. Please open an issue or pull request on GitHub.
MIT License. See LICENSE for details.
@misc{feltham_formulacompilerjl_2026,
title = {{{FormulaCompiler}}.Jl and {{Margins}}.Jl: {{Efficient Marginal Effects}} in {{Julia}}},
shorttitle = {{{FormulaCompiler}}.Jl and {{Margins}}.Jl},
author = {Feltham, Eric},
year = {2026},
month = jan,
number = {arXiv:2601.07065},
eprint = {2601.07065},
primaryclass = {stat},
publisher = {arXiv},
doi = {10.48550/arXiv.2601.07065},
urldate = {2026-01-13},
abstract = {Marginal effects analysis is fundamental to interpreting statistical models, yet existing implementations face computational constraints that limit analysis at scale. We introduce two Julia packages that address this gap. Margins.jl provides a clean two-function API organizing analysis around a 2-by-2 framework: evaluation context (population vs profile) by analytical target (effects vs predictions). The package supports interaction analysis through second differences, elasticity measures, categorical mixtures for representative profiles, and robust standard errors. FormulaCompiler.jl provides the computational foundation, transforming statistical formulas into zero-allocation, type-specialized evaluators that enable O(p) per-row computation independent of dataset size. Together, these packages achieve 622x average speedup and 460x memory reduction compared to R's marginaleffects package, with successful computation of average marginal effects and delta-method standard errors on 500,000 observations where R fails due to memory exhaustion, providing the first comprehensive and efficient marginal effects implementation for Julia's statistical ecosystem.},
archiveprefix = {arXiv},
keywords = {Statistics - Computation},
}