Available methods
MULTIVARIATE EXPLORATORY DATA ANALYSES
Principal component analysis (PCA)
Usual
- pcasvd SVD decomposition
- pcaeigen Eigen decomposition
- pcaeigenk Eigen decomposition for wide matrices (kernel form)
- pcanipals NIPALS algorithm
Allow missing data
- pcanipalsmiss NIPALS algorithm allowing missing data
Robust
- pcasph Spherical (with spatial median)
- pcapp Projection pursuit
- pcaout Outlierness
Sparse
- spca sPCA Shen & Huang 2008
Non linear
- kpca Kernel (KPCA) Schölkopf et al. 2002
Utilities (PCA and PLS)
- xfit X-matrix fitting
- xresid X-residual matrix
Random projections
- rp Random projection
- rpmatgauss Gaussian random projection matrix
- rpmatli Sparse random projection matrix
Manifold
Wrapper to UMAP.jl
- umap Uniform manifold approximation and projection for dimension reduction
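For orientation, the sketch below shows what an SVD-based PCA (as in pcasvd above) computes in principle. It is a plain-Julia illustration, not the package implementation.

```julia
using LinearAlgebra, Statistics

# Conceptual SVD-based PCA: center, factorize, keep nlv components.
X = rand(20, 5)                      # toy data: 20 observations, 5 variables
Xc = X .- mean(X, dims = 1)          # column centering
U, s, V = svd(Xc)
nlv = 2                              # number of principal components kept
T = U[:, 1:nlv] .* s[1:nlv]'         # scores
P = V[:, 1:nlv]                      # loadings
explvar = s .^ 2 ./ sum(s .^ 2)      # proportion of variance explained
```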
Factorial discrimination analysis (FDA)
- fda Eigen decomposition of the consensus "inter/intra"
- fdasvd Weighted SVD of the class centers
Multiblock
2 blocks
- cca Canonical correlation analysis (CCA and RCCA)
- ccawold CCA and RCCA - Wold (1984) NIPALS algorithm
- plscan Canonical partial least squares regression (Symmetric PLS)
- plstuck Tucker's inter-battery method of factor analysis (PLS-SVD)
- rasvd Redundancy analysis (RA), a.k.a PCA on instrumental variables (PCAIV)
2 or more blocks
- mbconcat Concatenation of multi-block X-data
- mbpca Multiblock PCA (MBPCA), a.k.a Consensus principal component analysis (CPCA)
- comdim Common components and specific weights analysis (ComDim), a.k.a CCSWA or HPCA
Utilities
- mblock Make blocks from a matrix
- rd Redundancy coefficients between two matrices
- rv RV correlation coefficient
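As a reminder of what the rv utility measures, the RV coefficient between two column-centered matrices can be written as below (conceptual sketch; implementation details may differ).

```julia
using LinearAlgebra, Statistics

# RV coefficient between two column-centered matrices sharing the same rows.
X = rand(15, 4); Y = rand(15, 6)
Xc = X .- mean(X, dims = 1)
Yc = Y .- mean(Y, dims = 1)
Wx = Xc * Xc'; Wy = Yc * Yc'
rvcoef = tr(Wx * Wy) / sqrt(tr(Wx * Wx) * tr(Wy * Wy))
```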
REGRESSION
Ordinary least squares (OLS)
Multiple linear regression (MLR)
- mlr QR algorithm
- mlrchol Normal equations and Cholesky factorization
- mlrpinv Pseudo-inverse
- mlrpinvn Normal equations and pseudo-inverse
- mlrvec Simple linear regression (univariate x)
ANOVA
- aov1 One-factor ANOVA
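The MLR entries above correspond to classical least-squares solvers; the sketch below illustrates the QR and normal-equations/Cholesky routes on a toy design matrix (conceptual, not the package code).

```julia
using LinearAlgebra

# Two classical ways of solving multiple linear regression,
# matching the algorithms named above (QR; normal equations + Cholesky).
n, p = 50, 4
X = [ones(n) rand(n, p)]                      # design matrix with intercept
y = rand(n)
b_qr   = X \ y                                # QR-based least squares
b_chol = cholesky(Symmetric(X'X)) \ (X'y)     # normal equations + Cholesky
```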
Partial least squares (PLSR)
Usual (asymmetric regression mode)
- plskern Fast "improved kernel #1" algorithm of Dayal & McGregor 1997
- plsnipals NIPALS
- plswold NIPALS, Wold 1984
- plsrosa ROSA Liland et al. 2016
- plssimp SIMPLS de Jong 1993
Variants of regularization using latent variables
- cglsr Conjugate gradient for the least squares normal equations (CGLS)
- pcr Principal components regression (SVD factorization)
- rrr Reduced rank regression (RRR), a.k.a Redundancy analysis regression
Robust
- plsrout Outlierness
Sparse
- splsr sPLSR Lê Cao et al. 2008, and Covsel regression Roger et al. 2011
- spcr sPCR Shen & Huang 2008
Averaging PLSR models of different dimensionalities
- plsravg PLSR-AVG
Non linear
- kplsr Non linear kernel (KPLSR) Rosipal & Trejo 2001
- dkplsr Direct non linear kernel (DKPLSR) Bennett & Embrechts 2003
Multiblock
- mbplsr Multiblock PLSR (MBPLSR) - Fast version (PLSR on concatenated blocks)
- mbplswest MBPLSR - NIPALS algorithm Westerhuis et al. 1998
- rosaplsr ROSA Liland et al. 2016
- soplsr Sequentially orthogonalized (SO-PLSR)
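To make the PLSR entries concrete, here is a conceptual single-response PLS with NIPALS-style deflation (in the spirit of plsnipals). It illustrates the scores, weights and loadings computed by these functions, but it is not the package implementation.

```julia
using LinearAlgebra, Statistics

# Conceptual PLS1 (single y) with deflation; toy sketch only.
function pls1(X, y, nlv)
    Xc = X .- mean(X, dims = 1)
    yc = y .- mean(y)
    n, p = size(Xc)
    T = zeros(n, nlv); W = zeros(p, nlv); P = zeros(p, nlv); c = zeros(nlv)
    for a in 1:nlv
        w = Xc' * yc; w ./= norm(w)          # weights
        t = Xc * w                           # scores
        p_ = Xc' * t / (t' * t)              # X-loadings
        c[a] = (yc' * t) / (t' * t)          # y-loading
        Xc .-= t * p_'; yc .-= c[a] * t      # deflation
        T[:, a] = t; W[:, a] = w; P[:, a] = p_
    end
    (T = T, W = W, P = P, c = c)
end

res = pls1(rand(30, 10), rand(30), 3)
```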
Ridge (RR, KRR)
RR
- rr SVD factorization
- rrchol Cholesky factorization
Non linear
- krr Non linear kernel (KRR), a.k.a Least squares SVM (LS-SVMR)
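The ridge entries solve the familiar penalized least-squares problem; a closed-form sketch on centered data (conceptual, not the package code):

```julia
using LinearAlgebra, Statistics

# Ridge regression coefficients in closed form on centered data.
X = rand(40, 8); y = rand(40)
xmeans = vec(mean(X, dims = 1))
Xc = X .- xmeans'; yc = y .- mean(y)
lambda = 0.1                                  # regularization parameter
b = (Xc'Xc + lambda * I) \ (Xc'yc)            # ridge coefficients
intercept = mean(y) - dot(xmeans, b)
```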
Local models
- loessr LOESS regression model (with package Loess.jl)
kNN
- knnr kNN weighted regression (kNNR)
- lwmlr kNN locally weighted MLR (kNN-LWMLR)
- lwplsr kNN locally weighted PLSR (kNN-LWPLSR)
Averaging
- lwplsravg kNN-LWPLSR-AVG
Support vector machines
Wrapper to LIBSVM.jl
- svmr Epsilon-SVR (SVM-R)
Trees
Wrapper to DecisionTree.jl
- treer Single tree
- rfr Random forest
DISCRIMINATION ANALYSIS (DA)
Based on the prediction of the Y-dummy table
Linear
- mlrda MLR-DA
- plsrda PLSR-DA, a.k.a usual PLSDA
- rrda RR-DA
Sparse
- splsrda Sparse PLSR-DA
Non linear
- kplsrda KPLSR-DA
- dkplsrda DKPLSR-DA
- krrda KRR-DA
Multiblock
- mbplsrda MBPLSR-DA
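These methods share one principle: the qualitative response is expanded into a Y-dummy (one-hot) table, a regression model is fitted on it, and the predicted class is the column with the largest predicted value. A minimal sketch of the dummy table:

```julia
# Build a one-hot (dummy) table from a class vector; conceptual sketch.
y = ["a", "b", "a", "c", "b"]
lev = sort(unique(y))                          # class levels
Ydummy = Float64.(permutedims(lev) .== y)      # n x nlev dummy table
# A regression model (e.g. PLSR for plsrda) is then fitted on (X, Ydummy);
# a new observation is assigned to lev[argmax] of its predicted row.
```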
Probabilistic DA
Parametric
- lda Linear discriminant analysis (LDA)
- qda Quadratic discriminant analysis (QDA, with continuum towards LDA)
- rda Regularized discriminant analysis (RDA)
Non parametric
- kdeda DA by kernel Gaussian density estimation (KDE-DA)
On PLS latent variables
PLSDA
- plslda PLS-LDA
- plsqda PLS-QDA (with continuum)
- plskdeda PLS-KDEDA
Sparse
- splslda Sparse PLS-LDA
- splsqda Sparse PLS-QDA
- splskdeda Sparse PLS-KDEDA
Non linear
- kplslda KPLS-LDA
- kplsqda KPLS-QDA
- kplskdeda KPLS-KDEDA
- dkplslda Direct KPLS-LDA
- dkplsqda Direct KPLS-QDA
- dkplskdeda Direct KPLS-KDEDA
Multiblock
- mbplslda MBPLS-LDA
- mbplsqda MBPLS-QDA
- mbplskdeda MBPLS-KDEDA
Local models
- knnda kNN-DA (Vote within neighbors)
- lwmlrda kNN locally weighted MLR-DA (kNN-LWMLR-DA)
- lwplsrda kNN locally weighted PLSR-DA (kNN-LWPLSR-DA)
- lwplslda kNN locally weighted PLS-LDA (kNN-LWPLS-LDA)
- lwplsqda kNN locally weighted PLS-QDA (kNN-LWPLS-QDA, with continuum)
Support vector machines
Wrapper to LIBSVM.jl
- svmda C-SVC (SVM-DA)
Trees
Wrapper to DecisionTree.jl
- treeda Single tree
- rfda Random forest
ONE-CLASS CLASSIFICATION (OCC)
From a PCA or PLS score space
- occsd Score distance (SD)
- occod Orthogonal distance (OD)
- occsdod Compromise between SD and OD (a.k.a SIMCA approach)
Other methods
- occstah Stahel-Donoho outlierness
Utilities
- outstah Stahel-Donoho outlierness
- outeucl Outlierness from Euclidean distances to the center
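occsd and occod rely on two classical distances from a PCA (or PLS) model: the score distance (SD), measured within the score space, and the orthogonal distance (OD), measured to the score space. A conceptual sketch (scalings and cutoffs differ in practice):

```julia
using LinearAlgebra, Statistics

# SD and OD from a PCA model; toy sketch, not the package implementation.
X = rand(50, 10)
Xc = X .- mean(X, dims = 1)
U, s, V = svd(Xc)
nlv = 3
T = U[:, 1:nlv] .* s[1:nlv]'                   # training scores
P = V[:, 1:nlv]                                # loadings
SD = vec(sqrt.(sum((T ./ std(T, dims = 1)) .^ 2, dims = 2)))  # within the score space
E = Xc - T * P'                                # residuals after projection
OD = vec(sqrt.(sum(E .^ 2, dims = 2)))         # distance to the score space
```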
DISTRIBUTIONS
- dmnorm Normal probability density estimation
- dmnormlog Logarithm of the normal probability density estimation
- dmkern Gaussian kernel density estimation (KDE)
- pval Compute p-value(s) for a distribution, a vector or an ECDF
- out Return whether the elements of a vector are strictly outside a given range
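For reference, dmnorm estimates the multivariate normal density; written out explicitly below (dnorm_at is a hypothetical helper for illustration, not a package function):

```julia
using LinearAlgebra, Statistics

# Multivariate normal density at a point x, given a mean vector and covariance.
function dnorm_at(x, mu, S)
    d = length(mu)
    (2π)^(-d / 2) * det(S)^(-1 / 2) * exp(-0.5 * dot(x - mu, S \ (x - mu)))
end

X = rand(100, 3)
mu = vec(mean(X, dims = 1)); S = cov(X)
dnorm_at(X[1, :], mu, S)
```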
VARIABLE IMPORTANCE
- isel! Interval variable selection (e.g. Interval PLSR)
- vip Variable importance on projections (VIP)
- viperm! Variable importance by direct permutations
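For context, one standard form of the VIP computed from A latent variables (p is the number of X-variables, w_a the weights, t_a the scores and c_a the y-loadings; slight variants exist in the literature):

```latex
\mathrm{VIP}_j =
\sqrt{\; p \,
  \frac{\sum_{a=1}^{A} \left(c_a^2\, \mathbf{t}_a^\top \mathbf{t}_a\right)
        \left(w_{ja} / \lVert \mathbf{w}_a \rVert\right)^2}
       {\sum_{a=1}^{A} c_a^2\, \mathbf{t}_a^\top \mathbf{t}_a}}
```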
TUNING MODELS
Test-set validation
- gridscore Compute an error rate over a grid of parameters
Cross-validation (CV)
- gridcv Compute an error rate over a grid of parameters
Utilities
- mpar Expand a grid of parameter values
- segmkf Build segments for K-fold CV
- segmts Build segments for test-set validation
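As an illustration of the kind of object segmkf builds, here is a conceptual K-fold segmentation in plain Julia (not the package's exact output format):

```julia
using Random

# Split 1:n into K disjoint, shuffled segments for K-fold CV.
function kfold_segments(n, K; rng = Random.default_rng())
    idx = shuffle(rng, 1:n)
    [idx[k:K:n] for k in 1:K]                  # K segments covering 1:n
end

segm = kfold_segments(20, 4)                   # e.g. 4 segments of 5 observations
```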
Performance scores
Regression
- ssr SSR
- msep MSEP
- rmsep, rmsepstand RMSEP (raw and standardized)
- rrmsep Relative RMSEP
- sep SEP
- bias Bias
- cor2 Squared correlation coefficient
- r2 R2
- rpd, rpdr Ratio of performance to deviation
- mse Summary for regression
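The regression scores above are the usual prediction-error summaries; one common set of definitions (conceptual sketch):

```julia
using Statistics

# Prediction-error summaries on a toy prediction vector.
pred = [1.1, 2.0, 2.9, 4.2]; y = [1.0, 2.0, 3.0, 4.0]
e = pred .- y
bias  = mean(e)
sep   = std(e)                                 # SEP: dispersion of the residuals
rmsep = sqrt(mean(e .^ 2))                     # RMSEP (rmsep^2 ≈ sep^2 + bias^2)
r2    = 1 - sum(e .^ 2) / sum((y .- mean(y)) .^ 2)
```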
Discrimination
- conf Confusion matrix
- errp Classification error rate
- merrp Mean intra-class classification error rate
Model dimensionality
- aicplsr AIC and Cp for PLSR
- selwold Wold's criterion to select dimensionality in LV models (e.g. PLSR)
DATA PROCESSING
Checking
- finduniq Find the indexes that make the IDs in an ID vector unique
- dupl Find replicated rows in a dataset
- tabdupl Tabulate duplicated values in a vector
- findmiss Find rows with missing data in a dataset
Pre-processing
De-trend transformations (baseline correction)
- detrend_pol Polynomial linear regression
- detrend_lo LOESS
- detrend_asls Asymmetric least squares (ASLS)
- detrend_airpls Adaptive iteratively reweighted penalized least squares (AIRPLS)
- detrend_arpls Asymmetrically reweighted penalized least squares smoothing (ARPLS)
- snv Standard normal variate (SNV) transformation
- fdif Finite differences
- mavg Smoothing by moving average
- savgk, savgol Savitzky-Golay filtering
- rmgap Remove vertical gaps in spectra, e.g. for ASD NIR data
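As an example of these row-wise pre-processing transforms, SNV centers and scales each spectrum by its own mean and standard deviation (conceptual sketch of what snv does):

```julia
using Statistics

# SNV: row-wise centering and scaling of spectra.
X = rand(5, 100)                               # 5 spectra, 100 wavelengths
Xsnv = (X .- mean(X, dims = 2)) ./ std(X, dims = 2)
```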
Scaling
- center Column centering
- scale Column scaling
- cscale Column centering and scaling
- blockscal Scaling of multiblock data
Interpolation
- interpl Sampling of spectra by interpolation (with package DataInterpolations.jl)
Calibration transfer
- difmean Compute a detrimental matrix (for calibration transfer) as the difference between the column means of two matrices
- eposvd Compute an orthogonalization matrix for calibration transfer
- calds Direct standardization (DS)
- calpds Piecewise direct standardization (PDS)
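The general idea behind direct standardization (calds) is to estimate a transfer matrix mapping spectra measured on a secondary ("slave") instrument onto the primary ("master") instrument, from the same standard samples measured on both. A conceptual sketch (in practice the transfer matrix is usually estimated with a regularized regression such as PCR or PLSR):

```julia
using LinearAlgebra

# Direct standardization idea: least-squares transfer matrix from paired spectra.
Xmaster = rand(12, 100); Xslave = rand(12, 100)  # same 12 samples on both instruments
F = pinv(Xslave) * Xmaster                       # transfer matrix
Xnew_slave = rand(3, 100)                        # new spectra from the slave instrument
Xnew_corrected = Xnew_slave * F                  # expressed in the master's space
```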
Build training vs. test sets by sampling
- samprand Random sampling (without replacement)
- sampsys Systematic sampling over a quantitative variable
- sampcla Stratified sampling by class
- sampdf Sampling from each column of a dataframe (where missing values are allowed)
- sampks Kennard-Stone sampling
- sampdp Duplex sampling
- sampwsp WSP sampling
PLOTTING
- plotsp Plot spectra
- plotxy x-y scatter plot
- plotgrid Plot error/performance rates of a model
- plotconf Plot confusion matrix
MODELS AND PIPELINES
- model Build a model
- pip Build a pipeline of models
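The functions listed throughout this page are used through this common interface. The sketch below shows the assumed pattern (build a model object, fit!, then predict); keyword syntax may differ between package versions, so treat it as an illustration and check the current documentation. pip chains several such models (e.g. pre-processing steps followed by a predictive model) into one pipeline.

```julia
using Jchemo

# Assumed usage pattern (check the current Jchemo docs for exact syntax):
# build a model object, fit it on training data, then predict on new data.
Xtrain = rand(100, 50); ytrain = rand(100)
Xtest = rand(10, 50)
mod = model(plskern; nlv = 15)       # PLSR model with 15 latent variables
fit!(mod, Xtrain, ytrain)
pred = predict(mod, Xtest).pred      # predictions for the test set
```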
UTILITIES
Macros
- @head Display the first rows of a dataset
- @mod Shortcut for function parentmodule
- @names Return the names of the sub-objects contained in an object
- @pars Display the keyword arguments (with their default values) of a function
- @plist Display each element of a list
- @type Display the type and size of a dataset
Others
aggmean Compute column-wise means by class in a dataset
aggstat Compute column-wise statistics by class in a dataset
aggsumv Compute sub-total sums by class of a categorical variable
sumv, meanv, stdv, varv, madv, iqrv, normv Vector operations
covv, covm, corv, corm Weighted covariances and correlations
cosv, cosm Cosine
colmad, colmean, colmed, colnorm, colstd, colsum, colvar Column-wise operations
colmeanskip, colstdskip, colsumskip, colvarskip Column-wise operations allowing missing data
convertdf Convert the columns of a dataframe to given types
dummy Build dummy table
euclsq, mahsq, mahsqchol Distances (Euclidean, Mahalanobis) between rows of matrices
fblockscal_col, _frob, _mfa, _sd Scale blocks
fcenter, fscale, fcscale Column-wise centering and scaling of a matrix
fconcat Concatenate multiblock data
findmax_cla Find the most frequent level in a categorical variable
frob, frob2 Frobenius norm of a matrix
fweight Weight each row of a matrix
getknn Find nearest neighbours between rows of matrices
iqrv Inter-quartile interval
krbf, kpol Build kernel Gram matrices
locw Working function for local (kNN) models
mad Median absolute deviation (not exported)
matB, matW Between- and within-class covariance matrices
mlev Return the sorted levels of a vector or a dataset
mweight Normalize a vector to sum to 1
mweightcla Compute observation weights for a categorical variable, given specified sub-total weights for the classes
nco, nro Number of columns and rows of an object
normv Norm of a vector
parsemiss Parse a string vector allowing missing data
pval Compute p-value(s) for a distribution, an ECDF or vector
recod_catbydict Recode a categorical variable to dictionary levels
recod_catbyind Recode a categorical variable to indexes of levels
recod_catbyint Recode a categorical variable to integers
recod_catbylev Recode a categorical variable to levels
recod_indbylev Recode an index variable to levels
recod_numbyint Recode a continuous variable to integers
recod_miss Declare data as missing in a dataset
rmcol Remove the columns of a matrix (or the components of a vector) with given indexes
rmrow Remove the rows of a matrix (or the components of a vector) with given indexes
rowmean, rownorm, rowstd, rowsum, rowvar Row-wise operations
rowmeanskip, rowstdskip, rowsumskip, rowvarskip Row-wise operations allowing missing data
thresh_soft, thresh_hard Thresholding functions
softmax Softmax function
sourcedir Include all the files contained in a directory
summ Summarize the columns of a dataset
tab, tabdupl Tabulations for categorical variables
vcatdf Vertical concatenation of a list of dataframes
wdis Different functions to compute weights from distances
wtal Compute weights from distances using the 'talworth' distribution
winvs Compute weights from distances using an inverse scaled exponential function
Other utility functions are available in the source files (e.g. _util.jl).