Available methods

MULTIVARIATE EXPLORATORY DATA ANALYSES

Principal component analysis (PCA)

Usual

  • pcasvd SVD decomposition
  • pcaeigen Eigen decomposition
  • pcaeigenk Eigen decomposition for wide matrices (kernel form)
  • pcanipals NIPALS algorithm
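
For orientation, here is a minimal pure-Python sketch of what the NIPALS entry computes for the first component. It alternates loading and score updates on the column-centered matrix until the scores stop changing. It is not the package's Julia code. The name `nipals_pc1`, the starting score (the column with the largest sum of squares) and the stopping tolerance are illustrative assumptions.

```python
def center_cols(X):
    """Subtract column means from a matrix given as a list of rows."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[v - m for v, m in zip(row, means)] for row in X]


def nipals_pc1(X, tol=1e-12, max_iter=500):
    """First PCA component by NIPALS: returns (scores t, unit-norm loadings p)."""
    Xc = center_cols(X)
    n, p_dim = len(Xc), len(Xc[0])
    # Start the score vector from the column with the largest sum of squares
    ss = [sum(row[j] ** 2 for row in Xc) for j in range(p_dim)]
    j0 = ss.index(max(ss))
    t = [row[j0] for row in Xc]
    for _ in range(max_iter):
        tt = sum(v * v for v in t)
        # Loadings: p = X't / t't, then normalized to unit length
        p = [sum(Xc[i][j] * t[i] for i in range(n)) / tt for j in range(p_dim)]
        norm = sum(v * v for v in p) ** 0.5
        p = [v / norm for v in p]
        # Scores: t = Xp
        t_new = [sum(Xc[i][j] * p[j] for j in range(p_dim)) for i in range(n)]
        diff = sum((a - b) ** 2 for a, b in zip(t_new, t))
        t = t_new
        if diff < tol:
            break
    return t, p
```

Further components would be obtained by deflating the centered matrix (X - tp') and repeating the loop on the residuals.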

Allowing missing data

  • pcanipalsmiss NIPALS algorithm allowing missing data

Robust

  • pcasph Spherical (with spatial median)

Sparse

  • spca sPCA Shen & Huang 2008

Nonlinear

  • kpca Kernel PCA (KPCA) Schölkopf et al. 2002

Utilities (PCA and PLS)

  • xfit X-matrix fitting
  • xresid X-residual matrix

Random projections

  • rp Random projection
  • rpmatgauss Gaussian random projection matrix
  • rpmatli Sparse random projection matrix
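
The two kinds of projection matrix can be sketched in pure Python as follows. This is an illustration, not the package's implementation. The 1/sqrt(nlv) scaling, the signatures, and the sparse scheme (the "very sparse" projections of Li, Hastie and Church 2006, with default s = sqrt(p)) are assumptions about what rpmatgauss and rpmatli compute.

```python
import math
import random


def rpmatgauss(p, nlv, seed=None):
    """p x nlv matrix of i.i.d. N(0, 1) draws scaled by 1/sqrt(nlv) (assumed scaling)."""
    rng = random.Random(seed)
    c = 1.0 / math.sqrt(nlv)
    return [[rng.gauss(0.0, 1.0) * c for _ in range(nlv)] for _ in range(p)]


def rpmatli(p, nlv, s=None, seed=None):
    """Very sparse projection matrix in the style of Li et al. (2006).

    Each entry is +sqrt(s/nlv) with prob. 1/(2s), -sqrt(s/nlv) with prob. 1/(2s),
    and 0 otherwise; s defaults to sqrt(p) (assumed default, must be >= 1).
    """
    s = math.sqrt(p) if s is None else s
    rng = random.Random(seed)
    c = math.sqrt(s / nlv)
    P = []
    for _ in range(p):
        row = []
        for _ in range(nlv):
            u = rng.random()
            row.append(c if u < 1 / (2 * s) else -c if u < 1 / s else 0.0)
        P.append(row)
    return P


def rp(X, P):
    """Project the rows of X (n x p) onto the nlv columns of P: T = XP."""
    return [[sum(x[j] * P[j][a] for j in range(len(x))) for a in range(len(P[0]))]
            for x in X]
```

The sparse matrix trades a little distortion for far fewer multiplications, since most entries of P are zero.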

Multiblock

2 blocks

  • cca Canonical correlation analysis (CCA and RCCA)
  • ccawold CCA and RCCA - Wold (1984) Nipals algorithm
  • plscan Canonical partial least squares regression (Symmetric PLS)
  • plstuck Tucker's inter-battery method of factor analysis (PLS-SVD)
  • rasvd Redundancy analysis (RA), aka PCA on instrumental variables (PCAIV)

2 or more blocks

  • mbconcat Transformer concatenating multi-block X-data
  • mbpca Multiblock PCA (MBPCA), aka Consensus principal component analysis (CPCA)
  • comdim Common components and specific weights analysis (ComDim), aka CCSWA or HPCA

Utilities

  • mblock Make blocks from a matrix
  • rd Redundancy coefficients between two matrices
  • lg Lg coefficient
  • rv RV correlation coefficient
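
As a reminder of what the RV coefficient measures, the sketch below computes RV(X, Y) = tr(XX'YY') / sqrt(tr(XX'XX') tr(YY'YY')) on column-centered matrices, using the identity tr(XX'YY') = ||X'Y||_F^2. It is a pure-Python illustration, not the package's API; the helper names `center`, `crossprod` and `frob2` are hypothetical.

```python
import math


def center(X):
    """Column-center a matrix given as a list of rows."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[v - m for v, m in zip(row, means)] for row in X]


def crossprod(A, B):
    """A'B for A (n x p) and B (n x q), both given as lists of rows."""
    return [[sum(a[i] * b[j] for a, b in zip(A, B)) for j in range(len(B[0]))]
            for i in range(len(A[0]))]


def frob2(M):
    """Squared Frobenius norm."""
    return sum(v * v for row in M for v in row)


def rv(X, Y):
    """RV coefficient between two matrices sharing the same rows."""
    Xc, Yc = center(X), center(Y)
    num = frob2(crossprod(Xc, Yc))
    den = math.sqrt(frob2(crossprod(Xc, Xc)) * frob2(crossprod(Yc, Yc)))
    return num / den
```

RV lies in [0, 1] and is unchanged by rotations or isotropic rescaling of either block, which is why it is used to compare configurations rather than individual variables.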

Factorial discrimination analysis (FDA)

  • fda Eigen decomposition of the compromise "inter/intra"
  • fdasvd Weighted SVD of the class centers

REGRESSION

Ordinary least squares (OLS)

Multiple linear regression (MLR)

  • mlr QR algorithm
  • mlrchol Normal equations and Cholesky factorization
  • mlrpinv Pseudo-inverse
  • mlrpinvn Normal equations and pseudo-inverse
  • mlrvec Simple linear regression (Univariate x)
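
The normal-equations route behind mlrchol can be sketched in pure Python. Center X and y, form X'X and X'y, factor X'X = LL' by Cholesky, then solve two triangular systems. This is a sketch under assumptions (intercept handled by centering, no observation weights, X'X positive definite). `mlr_chol` and `cholesky` are illustrative names, not the package's API.

```python
import math


def cholesky(A):
    """Lower-triangular L with A = LL' for a symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def mlr_chol(X, y):
    """MLR with intercept: solve the centered normal equations (Xc'Xc) b = Xc'yc."""
    n, p = len(X), len(X[0])
    xm = [sum(row[j] for row in X) / n for j in range(p)]
    ym = sum(y) / n
    Xc = [[row[j] - xm[j] for j in range(p)] for row in X]
    yc = [v - ym for v in y]
    XtX = [[sum(r[i] * r[j] for r in Xc) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * v for r, v in zip(Xc, yc)) for i in range(p)]
    L = cholesky(XtX)
    # Forward substitution: L z = X'y
    z = [0.0] * p
    for i in range(p):
        z[i] = (Xty[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    # Back substitution: L' b = z
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (z[i] - sum(L[k][i] * b[k] for k in range(i + 1, p))) / L[i][i]
    intercept = ym - sum(m * c for m, c in zip(xm, b))
    return intercept, b
```

Forming X'X squares the condition number of X, which is the usual reason to prefer a QR-based solver (as in mlr) for ill-conditioned data.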

Anova

  • aov1 One-factor ANOVA

Partial least squares (PLSR)

Usual (asymmetric regression mode)

  • plskern Fast "improved kernel #1" algorithm of Dayal & McGregor 1997
  • plsnipals Nipals
  • plswold Nipals Wold 1984
  • plsrosa ROSA Liland et al. 2016
  • plssimp SIMPLS de Jong 1993

Variants of regularization using latent variables

  • cglsr Conjugate gradient for the least squares normal equations (CGLS)
  • pcr Principal components regression (SVD factorization)
  • rrr Reduced rank regression (RRR), aka Redundancy analysis regression

Sparse

  • splskern
    • sPLSR Lê Cao et al. 2008
    • Covsel regression Roger et al. 2011

Averaging PLSR models of different dimensionalities

  • plsravg PLSR-AVG

Nonlinear

  • kplsr Nonlinear kernel PLSR (KPLSR) Rosipal & Trejo 2001
  • dkplsr Direct nonlinear kernel PLSR (DKPLSR) Bennett & Embrechts 2003

Multiblock

  • mbplsr Multiblock PLSR (MBPLSR) - Fast version (PLSR on concatenated blocks)
  • mbplswest MBPLSR - Nipals algorithm Westerhuis et al. 1998
  • rosaplsr ROSA Liland et al. 2016
  • soplsr Sequentially orthogonalized (SO-PLSR)

Ridge (RR, KRR)

RR

  • rr SVD factorization
  • rrchol Cholesky factorization

Nonlinear

  • krr Nonlinear kernel ridge regression (KRR), aka least-squares SVM regression (LS-SVMR)

Local models

  • knnr kNN weighted regression (kNNR)
  • lwmlr kNN locally weighted MLR (kNN-LWMLR)
  • lwplsr kNN locally weighted PLSR (kNN-LWPLSR)
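
kNNR predicts a new observation from the responses of its k nearest training observations. The pure-Python sketch below uses Euclidean distances and a weighted mean. The `weights` hook stands in for the distance-to-weight step (cf. fweight and wdist under UTILITIES). It defaults to uniform weights, which is an assumption, not the package's default. The locally weighted variants listed above fit an MLR or PLSR model on the neighbors instead of averaging their responses.

```python
import math


def knnr_predict(Xtrain, ytrain, xnew, k=3, weights=None):
    """Predict y at xnew as a weighted mean of the k nearest training responses.

    weights: optional function mapping the list of neighbor distances to
    non-negative weights; uniform weights are used when it is None (assumption).
    """
    d = [math.dist(x, xnew) for x in Xtrain]
    idx = sorted(range(len(d)), key=d.__getitem__)[:k]
    dk = [d[i] for i in idx]
    w = [1.0] * len(idx) if weights is None else weights(dk)
    return sum(wi * ytrain[i] for wi, i in zip(w, idx)) / sum(w)
```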

Averaging

  • lwplsravg kNN-LWPLSR-AVG

Wrappers to other packages

SVM regression – with LIBSVM.jl

  • svmr Epsilon-SVR (SVM-R)

Regression trees – with DecisionTree.jl

  • treer_dt Single tree
  • rfr_dt Random forest

DISCRIMINANT ANALYSIS (DA)

Based on the prediction of the Y-dummy table

Linear

  • mlrda MLR-DA
  • plsrda PLSR-DA, aka usual PLSDA
  • rrda RR-DA

Sparse

  • splsrda Sparse PLSR-DA

Nonlinear

  • kplsrda KPLSR-DA
  • dkplsrda DKPLSR-DA
  • krrda KRR-DA

Multiblock

  • mbplsrda MBPLSR-DA

Probabilistic DA

Parametric

  • lda Linear discriminant analysis (LDA)
  • qda Quadratic discriminant analysis (QDA, with continuum towards LDA)
  • rda Regularized discriminant analysis (RDA)

Nonparametric

  • kdeda DA by Gaussian kernel density estimation (KDE-DA)

On PLS latent variables

  • PLSDA
    • plslda PLS-LDA
    • plsqda PLS-QDA (with continuum)
    • plskdeda PLS-KDEDA
  • Sparse
    • splslda Sparse PLS-LDA
    • splsqda Sparse PLS-QDA
    • splskdeda Sparse PLS-KDEDA
  • Nonlinear
    • kplslda KPLS-LDA
    • kplsqda KPLS-QDA
    • kplskdeda KPLS-KDEDA
    • dkplslda Direct KPLS-LDA
    • dkplsqda Direct KPLS-QDA
    • dkplskdeda Direct KPLS-KDEDA
  • Multiblock
    • mbplslda MBPLS-LDA
    • mbplsqda MBPLS-QDA
    • mbplskdeda MBPLS-KDEDA

Local models

  • knnda kNN-DA (vote among neighbors)
  • lwmlrda kNN locally weighted MLR-DA (kNN-LWMLR-DA)
  • lwplsrda kNN locally weighted PLSR-DA (kNN-LWPLSR-DA)
  • lwplslda kNN locally weighted PLS-LDA (kNN-LWPLS-LDA)
  • lwplsqda kNN locally weighted PLS-QDA (kNN-LWPLS-QDA, with continuum)

Wrappers to other packages

SVM classification – with LIBSVM.jl

  • svmda C-SVC (SVM-DA)

Classification trees – with DecisionTree.jl

  • treeda_dt Single tree
  • rfda_dt Random forest

One-Class Classification (OCC)

From a PCA or PLS score space

  • occsd Score distance (SD)
  • occod Orthogonal distance (OD)
  • occsdod Compromise between SD and OD (aka SIMCA approach)

Other methods

  • stah Compute Stahel-Donoho outlyingness
  • occstah One-class classification based on Stahel-Donoho outlyingness

DISTRIBUTIONS

  • dmnorm Normal probability density estimation
  • dmnormlog Logarithm of the normal probability density estimation
  • dmkern Gaussian kernel density estimation (KDE)
  • pval Compute p-value(s) for a distribution, a vector or an ECDF
  • out Return whether the elements of a vector are strictly outside a given range
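
Univariate pure-Python versions of the density helpers are sketched below, for illustration only. The listed dmnorm and dmnormlog may handle the multivariate case. The upper-tail definition in `pval_ecdf` (share of reference values strictly above q, i.e. 1 - ECDF(q)) is an assumption about what pval returns for a vector; `pval_ecdf` is a hypothetical name.

```python
import math


def dmnorm(x, mu=0.0, sd=1.0):
    """Univariate normal density."""
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def dmnormlog(x, mu=0.0, sd=1.0):
    """Log of the normal density, computed directly to avoid underflow."""
    z = (x - mu) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2 * math.pi)


def dmkern(x, data, h):
    """Gaussian kernel density estimate at x: mean of N(xi, h^2) densities."""
    return sum(dmnorm(x, xi, h) for xi in data) / len(data)


def pval_ecdf(ref, q):
    """Empirical upper-tail p-value: share of reference values strictly above q."""
    return sum(r > q for r in ref) / len(ref)
```

The log form matters in discriminant analysis, where products of many small densities underflow while sums of log-densities do not.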

VARIABLE IMPORTANCE

  • isel! Interval variable selection (e.g. interval PLSR)
  • vip Variable importance in projection (VIP)
  • viperm! Variable importance by direct permutations

TUNING MODELS

Test-set validation

  • gridscore Compute an error rate over a grid of parameters

Cross-validation (CV)

  • gridcv Compute an error rate over a grid of parameters

Utilities

  • mpar Expand a grid of parameter values
  • segmkf Build segments for K-fold CV
  • segmts Build segments for test-set validation
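
The three utilities can be sketched in pure Python as follows: a full grid of parameter combinations, K-fold segments (disjoint index sets covering all observations), and random test-set segments. Signatures and return shapes are assumptions for illustration, not the package's API.

```python
import itertools
import random


def mpar(**grid):
    """Expand named parameter vectors into all combinations (one list per name)."""
    names = list(grid)
    combos = list(itertools.product(*grid.values()))
    return {name: [c[i] for c in combos] for i, name in enumerate(names)}


def segmkf(n, k, rep=1, seed=None):
    """K-fold CV segments: rep replications, each a list of k disjoint index lists."""
    rng = random.Random(seed)
    out = []
    for _ in range(rep):
        idx = list(range(n))
        rng.shuffle(idx)
        out.append([sorted(idx[f::k]) for f in range(k)])
    return out


def segmts(n, m, rep=1, seed=None):
    """Test-set segments: rep random subsets of m indexes drawn without replacement."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(n), m)) for _ in range(rep)]
```

A grid search then loops over the i-th entry of each list returned by mpar and scores the model on every segment.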

Performance scores

Regression

  • ssr SSR
  • msep MSEP
  • rmsep, rmsepstand RMSEP
  • sep SEP
  • bias Bias
  • cor2 Squared correlation coefficient
  • r2 R2
  • rpd, rpdr Ratio of performance to deviation
  • mse Summary for regression

Discrimination

  • errp Classification error rate
  • merrp Mean intra-class classification error rate
  • conf Confusion matrix
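
Common definitions of several of these scores are sketched below in pure Python. Conventions vary between packages. The divisor n (rather than n - 1), SEP as sqrt(MSEP - bias^2), and R2 as 1 - MSEP/Var(y) are assumptions here, not a statement of how the listed functions compute them.

```python
import math


def msep(pred, y):
    return sum((p - v) ** 2 for p, v in zip(pred, y)) / len(y)


def rmsep(pred, y):
    return math.sqrt(msep(pred, y))


def bias(pred, y):
    return sum(p - v for p, v in zip(pred, y)) / len(y)


def sep(pred, y):
    """SEP as the bias-corrected RMSEP: sqrt(MSEP - bias^2) (assumed convention)."""
    return math.sqrt(max(msep(pred, y) - bias(pred, y) ** 2, 0.0))


def r2(pred, y):
    """R2 = 1 - MSEP / Var(y), with the population variance of y."""
    ym = sum(y) / len(y)
    return 1 - msep(pred, y) / (sum((v - ym) ** 2 for v in y) / len(y))


def errp(pred, y):
    """Overall classification error rate."""
    return sum(p != v for p, v in zip(pred, y)) / len(y)


def merrp(pred, y):
    """Mean of the per-class error rates, so small classes count as much as large ones."""
    rates = []
    for c in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == c]
        rates.append(sum(pred[i] != c for i in idx) / len(idx))
    return sum(rates) / len(rates)
```

With these definitions MSEP = SEP^2 + bias^2, which separates systematic error from dispersion.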

Model dimensionality

  • aicplsr AIC and Cp for PLSR
  • selwold Wold's criterion to select dimensionality in LV models (e.g. PLSR)

DATA PROCESSING

Checking

  • dupl Find replicated rows in a dataset
  • tabdupl Tabulate duplicated values in a vector
  • miss Find rows with missing data in a dataset

Pre-processing

  • detrend Polynomial detrend
  • snv Standard normal variate (SNV) transformation
  • mavg Smoothing by moving average
  • fdif Finite differences
  • savgk, savgol Savitzky-Golay filtering
  • rmgap Remove vertical gaps in spectra, e.g. for ASD NIR data
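
Three of these row-wise transformations are simple enough to sketch in pure Python for a single spectrum. This is an illustration, not the package's code. The sample standard deviation in `snv`, the "edges dropped" behavior of `mavg`, and the `npoint` convention in `fdif` are assumptions.

```python
import statistics


def snv(row):
    """Standard normal variate: center a spectrum and divide by its standard deviation."""
    m = statistics.mean(row)
    s = statistics.stdev(row)  # sample SD (n - 1); the package may use another divisor
    return [(v - m) / s for v in row]


def mavg(row, npoint):
    """Moving average over windows of npoint consecutive values (edges dropped)."""
    return [sum(row[j:j + npoint]) / npoint for j in range(len(row) - npoint + 1)]


def fdif(row, npoint=2):
    """Finite differences between values npoint - 1 positions apart."""
    lag = npoint - 1
    return [row[j + lag] - row[j] for j in range(len(row) - lag)]
```

Applied to a matrix, each function is mapped over the rows (one spectrum per row).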

Scaling

  • center Column centering
  • scale Column scaling
  • cscale Column centering and scaling
  • blockscal Scaling of multiblock data

Interpolation

  • interpl Resample spectra by interpolation (from DataInterpolations.jl)

Calibration transfer

  • difmean Compute a detrimental matrix (for calibration transfer) as the difference between the column means of two matrices
  • eposvd Compute an orthogonalization matrix for calibration transfer
  • calds Direct standardization (DS)
  • calpds Piecewise direct standardization (PDS)

Build training vs. test sets by sampling

  • samprand Random (without replacement)
  • sampsys Systematic over a quantitative variable
  • sampcla Stratified by class
  • sampdf From each column of a dataframe (where missing values are allowed)
  • sampks Kennard-Stone
  • sampdp Duplex
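
The Kennard-Stone rule can be sketched in pure Python. Start from the two most distant observations, then repeatedly add the observation whose distance to its nearest already-selected observation is largest. The sketch returns only the selected (typically training) indexes and uses Euclidean distances. Both are assumptions; the listed sampks may return both sets or use another metric.

```python
import math


def sampks(X, k):
    """Kennard-Stone selection of k row indexes (Euclidean distance)."""
    n = len(X)
    d = [[math.dist(X[i], X[j]) for j in range(n)] for i in range(n)]
    # Seed with the most distant pair of rows
    i0, j0 = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                 key=lambda ij: d[ij[0]][ij[1]])
    sel = [i0, j0]
    # Distance from each row to its nearest selected row
    mind = [min(d[i][i0], d[i][j0]) for i in range(n)]
    while len(sel) < k:
        cand = [i for i in range(n) if i not in sel]
        nxt = max(cand, key=lambda i: mind[i])
        sel.append(nxt)
        mind = [min(mind[i], d[i][nxt]) for i in range(n)]
    return sel
```

Because it always picks the most isolated remaining point, Kennard-Stone spreads the selection over the whole X-space, including its boundary.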

PLOTTING

  • plotsp Plot spectra
  • plotxy x-y scatter plot
  • plotgrid Plot error/performance rates of a model
  • plotconf Plot a confusion matrix

MODELS AND PIPELINES

  • model Build a model
  • pip Build a pipeline of models

UTILITIES

  • aggstat Compute column-wise statistics by class in a dataset
  • aggsum Compute sub-total sums by class of a categorical variable
  • colmad, colmean, colmed, colnorm, colstd, colsum, colvar Column-wise operations
  • colmeanskip, colstdskip, colsumskip, colvarskip Column-wise operations allowing missing data
  • covm, corm Weighted covariance and correlation matrices
  • cosv, cosm Cosine between vectors
  • dummy Build a dummy table
  • euclsq, mahsq, mahsqchol Distances (Euclidean, Mahalanobis) between rows of matrices
  • fblockscal_col, _frob, _mfa, _sd Scale blocks
  • fcenter, fscale, fcscale Column-wise centering and scaling of a matrix
  • findmax_cla Find the most frequent level in a categorical variable
  • frob Frobenius norm of a matrix
  • fweight Compute weights from distances
  • getknn Find nearest neighbours between rows of matrices
  • head, @head Display the first rows of a dataset
  • iqr Interquartile range
  • krbf, kpol Build kernel Gram matrices
  • locw Working function for local (kNN) models
  • mad Median absolute deviation (not exported)
  • matB, matW Between- and within-class covariance matrices
  • mlev Return the sorted levels of a vector or a dataset
  • mweight Normalize a vector to sum to 1
  • mweightcla Compute observation weights for a categorical variable, given specified sub-total weights for the classes
  • nco, nro Number of columns and rows of an object
  • normw Weighted norm of a vector
  • plist Print each element of a list
  • pnames Return the names of the elements of an object
  • psize Return the type and size of a dataset
  • findindex Replace a vector containing levels by the indexes of a set of levels
  • recodcat2int Recode a categorical variable to an integer variable
  • recodnum2int Recode a continuous variable to integer classes
  • replacebylev Replace the elements of a vector by levels of corresponding order
  • replacebylev2 Replace the elements of an index-vector by levels
  • replacedict Replace the elements of a vector by levels defined in a dictionary
  • rmcol Remove the columns of a matrix, or the components of a vector, with indexes s
  • rmrow Remove the rows of a matrix, or the components of a vector, with indexes s
  • rowmean, rownorm, rowstd, rowsum, rowvar Row-wise operations
  • rowmeanskip, rowstdskip, rowsumskip, rowvarskip Row-wise operations allowing missing data
  • soft Soft thresholding
  • softmax Softmax function
  • sourcedir Include all the files contained in a directory
  • ssq Total inertia of a matrix
  • summ Summarize the columns of a dataset
  • tab, tabdf, tabdupl Tabulations for categorical variables
  • vcatdf Vertical concatenation of a list of dataframes
  • wdist Compute weights from distances
  • Other utility functions in file utility.jl
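
A few of the small utilities are sketched below in pure Python to make their definitions concrete. These are illustrative re-implementations under assumed signatures, not the package's Julia functions. Soft thresholding is sign(x) * max(|x| - delta, 0). The softmax is shifted by its maximum for numerical stability.

```python
import math


def soft(x, delta):
    """Soft thresholding: shrink x toward 0 by delta, giving 0 when |x| <= delta."""
    return math.copysign(max(abs(x) - delta, 0.0), x)


def softmax(v):
    """Softmax, shifted by the max for numerical stability."""
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]


def dummy(y):
    """Dummy (one-hot) table: sorted levels and one 0/1 row per observation."""
    lev = sorted(set(y))
    return lev, [[int(v == lv) for lv in lev] for v in y]


def mweight(w):
    """Normalize a vector of weights so that it sums to 1."""
    s = sum(w)
    return [a / s for a in w]
```

The dummy table is the Y-matrix regressed on in the "prediction of the Y-dummy table" DA methods, and soft thresholding is the operation that zeroes loadings in the sparse methods.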