`swaglm` Overview

Overview

The swaglm package is a fast implementation of the Sparse Wrapper Algorithm (SWAG) for Generalized Linear Models (GLM). SWAG is a meta-learning procedure that combines screening and wrapper methods to efficiently find strong low-dimensional attribute combinations for prediction. Additionally, the package provides functions to visualize and extract information from the selected models as well as a statistical test to assess whether the selected models extract meaningful information from the data. For more details, see the arXiv preprint.

Features

Efficiently finds a set of low-dimensional learners with high predictive accuracy.
Follows a forward-step method to iteratively build strong learners.
Provides a permutation-based statistical test (swaglm_test) to determine if the obtained models capture meaningful structure in the data.
Uses entropy-based network measures (entropy of frequency and entropy of eigenvalue centrality) to compare SWAG models against randomized models.
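To give a feel for the two entropy measures named above, here is a minimal base R sketch on a toy set of models. It is not the package's implementation: the list `models`, the helper `shannon_entropy`, and the choice of a co-occurrence network with eigenvector centrality taken from the leading eigenvector are all assumptions made for illustration.

```r
# Hypothetical selected models: each element lists the variable indices of one model
models <- list(c(1, 2), c(1, 3), c(1, 2, 3), c(2, 4), c(1, 4))
p <- 5  # number of candidate variables (variable 5 is never selected)

# Shannon entropy of non-negative weights (zero weights contribute nothing)
shannon_entropy <- function(w) {
  prob <- w[w > 0] / sum(w)
  -sum(prob * log(prob))
}

# Entropy of frequency: how evenly the variables appear across the models
freq <- tabulate(unlist(models), nbins = p)
entropy_freq <- shannon_entropy(freq)

# Co-occurrence network: edge weight = number of models containing both variables
adj <- matrix(0, p, p)
for (m in models) {
  if (length(m) > 1) {
    pairs <- combn(m, 2)
    for (k in seq_len(ncol(pairs))) {
      i <- pairs[1, k]
      j <- pairs[2, k]
      adj[i, j] <- adj[i, j] + 1
      adj[j, i] <- adj[j, i] + 1
    }
  }
}

# Eigenvector centrality: leading eigenvector of the symmetric adjacency matrix
centrality <- abs(eigen(adj, symmetric = TRUE)$vectors[, 1])
entropy_eigen <- shannon_entropy(centrality)

print(freq)
print(c(entropy_freq = entropy_freq, entropy_eigen = entropy_eigen))
```

Under these assumptions, a selection dominated by a few variables gives a low entropy, whereas a selection spread evenly over many variables gives a high one.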

Below are instructions on how to install and make use of the swaglm package.

Installation Instructions

The swaglm package is available on both CRAN and GitHub. The CRAN version is considered stable while the GitHub version is subject to modifications/updates which may lead to installation problems or broken functions. You can install the stable version of the swaglm package with:

install.packages("swaglm")

Users who want the latest developments can install the GitHub version, although it requires additional dependencies.

# Install dependencies
install.packages("devtools")

# Install/Update the package from GitHub
devtools::install_github("SMAC-Group/swaglm")

# Install the package with Vignettes/User Guides 
devtools::install_github("SMAC-Group/swaglm", build_vignettes = TRUE)

External `R` libraries

The swaglm package relies on a limited number of external libraries, notably Rcpp and RcppArmadillo, which require a C++ compiler (such as gcc) for installation.
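As a quick check before installing, the sketch below reports whether Rcpp and RcppArmadillo are already installed and whether a gcc executable is on the PATH. It is only an illustration in base R and not part of swaglm; the names `deps`, `installed` and `has_gcc` are hypothetical. On macOS or Windows the compiler is typically provided by clang or Rtools rather than gcc, so a FALSE result for gcc does not necessarily mean that no compiler is available.

```r
# Packages named above as requiring compilation
deps <- c("Rcpp", "RcppArmadillo")

# TRUE for each package whose namespace can be loaded, FALSE otherwise
installed <- vapply(deps, requireNamespace, logical(1), quietly = TRUE)
print(installed)

# Sys.which() returns "" when the executable is not found on the PATH
has_gcc <- nzchar(Sys.which("gcc"))
print(has_gcc)
```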

Getting started

library(swaglm)

# Simulated data (the seed is set first so that both X and y are reproducible)
set.seed(12345)
n <- 2000
p <- 50
X <- MASS::mvrnorm(n = n, mu = rep(0, p), Sigma = diag(rep(1 / p, p)))
beta <- c(-15, -10, 5, 10, 15, rep(0, p - 5))

# Generate the response from a logistic regression model
z <- 1 + X %*% beta
pr <- 1 / (1 + exp(-z))
y <- as.numeric(rbinom(n, 1, pr))

# Run SWAG
swaglm_obj <- swaglm(
  X = X, y = y, p_max = 20, family = binomial(),
  alpha = 0.15, verbose = TRUE, seed = 123
)

## Completed models of dimension 1
## Completed models of dimension 2
## Completed models of dimension 3
## Completed models of dimension 4
## Completed models of dimension 5
## Completed models of dimension 6
## Completed models of dimension 7
## Completed models of dimension 8

print(swaglm_obj)

## SWAGLM results :
## -----------------------------------------
## Input matrix dimension:  2000 50 
## Number of explored models:  129 
## Number of dimensions explored:  8

# plot network
swaglm_network_obj <- compute_network(swaglm_obj)
plot(swaglm_network_obj, scale_vertex = 1)

# Run statistical test
B <- 20
test_results <- swaglm_test(swaglm_obj, B = B, verbose = TRUE)

# View p-values for both entropy-based measures
print(test_results)

## SWAGLM Test Results:
## ----------------------
## p-value (Eigen): 0.34625 
## p-value (Freq): 0.0046

Find vignettes with detailed examples as well as the user’s manual at the package website.

How the statistical test works

The function swaglm_test() performs a permutation test to evaluate whether the selected variables contain meaningful information or are randomly selected.

Null Hypothesis: The selected models are no different from randomly chosen ones.

Procedure:

The response variable is shuffled to break its true relationship with predictors.
SWAG is applied to these shuffled datasets.
The entropy of variable frequency and the entropy of eigenvalue centrality are computed for the null models.
p-values are computed by comparing the SWAG network with these null models.
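The steps above can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the code behind swaglm_test(): `toy_selection_entropy` is a hypothetical stand-in for SWAG that screens single variables by AIC and keeps the best pairs, only the entropy of frequency is used, and the p-value assumes that a lower entropy than in the null models indicates structure.

```r
set.seed(1)

# Toy data: 5 informative predictors out of 20 (smaller than the example above)
n <- 300
p <- 20
X <- matrix(rnorm(n * p), n, p)
beta <- c(-2, -1.5, 1, 1.5, 2, rep(0, p - 5))
y <- rbinom(n, 1, plogis(drop(X %*% beta)))

# Shannon entropy of a vector of counts (zero counts contribute nothing)
shannon_entropy <- function(counts) {
  prob <- counts[counts > 0] / sum(counts)
  -sum(prob * log(prob))
}

# Hypothetical stand-in for SWAG: screen single variables by AIC, then keep
# the best pairs built from the screened variables and return the entropy
# of the variable frequencies among those pairs.
toy_selection_entropy <- function(X, y, n_keep = 6, n_pairs = 5) {
  aic_single <- vapply(seq_len(ncol(X)), function(j) {
    AIC(glm(y ~ X[, j], family = binomial()))
  }, numeric(1))
  screened <- order(aic_single)[seq_len(n_keep)]
  pairs <- combn(screened, 2)
  aic_pair <- apply(pairs, 2, function(idx) {
    AIC(glm(y ~ X[, idx], family = binomial()))
  })
  best_pairs <- pairs[, order(aic_pair)[seq_len(n_pairs)], drop = FALSE]
  shannon_entropy(tabulate(as.vector(best_pairs), nbins = ncol(X)))
}

# Statistic on the original data
observed <- toy_selection_entropy(X, y)

# Steps 1 to 3: shuffle the response, rerun the selection, compute the entropy
B <- 20
null_entropy <- replicate(B, toy_selection_entropy(X, sample(y)))

# Step 4: share of null statistics at least as extreme as the observed one
# (the +1 terms keep the p-value strictly positive)
p_value <- (1 + sum(null_entropy <= observed)) / (B + 1)
print(p_value)
```

With only B = 20 permutations the smallest attainable p-value in this sketch is 1 / 21, so a larger B is needed for finer resolution.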

Interpretation:

Small p-value (< 0.05): The selected variables are likely informative.
Large p-value (≥ 0.05): The selection may be random.

License

This source code is released under the GNU Affero General Public License (AGPL) v3.0.

References

Molinari, R. et al. SWAG: A Wrapper Method for Sparse Learning (2021)
