This example illustrates the principles underlying the fit of three different probabilistic PLSDA models: PLS-LDA, PLS-QDA and PLS-KDEDA (see functions plslda, plsqda, plskdeda).
Step-1: PLS scores are computed on the training data {X, Ydummy}, where Ydummy is the table of binary variables coding the classes contained in the categorical response y. The observations X are then projected onto this score space, and the multivariate probability density of these projections is estimated for each class. The estimation assumes Gaussian distributions for LDA and QDA (parametric methods) and uses non-parametric kernels for KDE.
Step-2: The prediction of a given new observation, xnew, is as follows. Observation xnew is projected onto the training score space computed in Step-1 and, for each class, its location is compared to the density estimated for that class. This determines the probability that xnew belongs to the class. The class showing the highest probability is chosen as the prediction.
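These two steps are exactly what the functions named above wrap. For reference before unpacking the internals, here is a minimal sketch of the direct usage, assuming the Xtrain/ytrain/Xtest split built further below and the generic fit!/predict pattern used throughout this example:

nlv = 2
model = plslda(; nlv)        # or plsqda(; nlv), plskdeda(; nlv)
fit!(model, Xtrain, ytrain)  # Step-1: PLS scores and per-class density estimates
res = predict(model, Xtest)  # Step-2: most probable class for each observation
res.pred                     # predicted classes

The remainder of the example reproduces these two steps by hand.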
In practice, the PLS score space used for discrimination can have many dimensions, and their number would typically be tuned on validation data (a sketch is given below). In the present example, to simplify the graphical illustrations, a 2-D PLS score space (2 latent variables) is considered.
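A hedged sketch of such a tuning, assuming Jchemo's gridscore and errp (classification error rate) have the usage shown here, and using the Xtrain/ytrain split defined further below:

## Assumption: gridscore(model, Xcal, ycal, Xval, yval; score, nlv) and errp
## are available with this usage; adapt if the API differs.
nval = 30
sv = samprand(ntrain, nval)
res = gridscore(plslda(), Xtrain[sv.train, :], ytrain[sv.train],
    Xtrain[sv.test, :], ytrain[sv.test]; score = errp, nlv = 1:10)
res  # error rate for each nlv; pick the value minimizing it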
using Jchemo, JchemoData
using JLD2, CairoMakie
path_jdat = dirname(dirname(pathof(JchemoData)))
db = joinpath(path_jdat, "data", "iris.jld2")
@load db dat
@names dat
(:X,)
summ(dat.X)
(res = 5×7 DataFrame
 Row │ variable      mean    std     min     max        n      nmissing
     │ Symbol        Union…  Union…  Any     Any        Int64  Int64
─────┼──────────────────────────────────────────────────────────────────
   1 │ sepal_length  5.843   0.828   4.3     7.9        150    0
   2 │ sepal_width   3.057   0.436   2.0     4.4        150    0
   3 │ petal_length  3.758   1.765   1.0     6.9        150    0
   4 │ petal_width   1.199   0.762   0.1     2.5        150    0
   5 │ species                       setosa  virginica  150    0,
 ntot = 150)
X = dat.X[:, 1:4]
y = dat.X[:, 5]    # the classes (species)
ntot = nro(X)
150
tab(y)
OrderedCollections.OrderedDict{String, Int64} with 3 entries:
"setosa" => 50
"versicolor" => 50
"virginica" => 50
lev = mlev(y)
nlev = length(lev)
3
ntest = 30
s = samprand(ntot, ntest)
Xtrain = X[s.train, :]
ytrain = y[s.train]
Xtest = X[s.test, :]
ytest = y[s.test]
ntrain = ntot - ntest
(ntot = ntot, ntrain, ntest)
tab(ytrain)
tab(ytest)
OrderedCollections.OrderedDict{String, Int64} with 3 entries:
"setosa" => 8
"versicolor" => 7
"virginica" => 15
Ytrain_dummy = dummy(ytrain).Y
120×3 BitMatrix:
 1  0  0
 1  0  0
 1  0  0
 1  0  0
 1  0  0
 1  0  0
 1  0  0
 1  0  0
 1  0  0
 1  0  0
 ⋮
 0  0  1
 0  0  1
 0  0  1
 0  0  1
 0  0  1
 0  0  1
 0  0  1
 0  0  1
 0  0  1
nlv = 2
model = plskern(; nlv)    # PLS2 regression on the dummy table
fit!(model, Xtrain, Ytrain_dummy)
Scores
@head Ttrain = model.fitm.T
@head Ttest = transf(model, Xtest)
3×2 Matrix{Float64}:
2.6053 -0.286791
2.61995 0.232042
2.79383 0.0994609
... (120, 2)
3×2 Matrix{Float64}:
2.64409 0.252782
2.21507 -0.745692
2.59655 -0.996732
... (30, 2)
i = 1
plotxy(Ttrain[:, i], Ttrain[:, i + 1], ytrain; title = "PLS2 space - Train",
    xlabel = string("LV", i), ylabel = string("LV", i + 1),
    leg_title = "Species", zeros = true, ellipse = false).f
The new observation is projected onto the score space (blue star in the figures below), and its location is compared to the training observations (red points) and to the class densities.
k = 1    # example: projection of the first observation of Test
1
f = Figure(size = (900, 300))
ax = list(nlev)
for i in eachindex(lev)
    zs = ytrain .== lev[i]
    zT = Ttrain[zs, :]
    ax[i] = Axis(f[1, i]; title = lev[i], xlabel = "PLS-LV1", ylabel = "PLS-LV2")
    scatter!(ax[i], zT[:, 1], zT[:, 2], color = :red, markersize = 10)
    ## New point to predict
    scatter!(ax[i], Ttest[k, 1], Ttest[k, 2], color = :blue, marker = :star5, markersize = 15)
    hlines!(ax[i], 0; linestyle = :dash, color = :grey)
    vlines!(ax[i], 0; linestyle = :dash, color = :grey)
    xlims!(ax[i], -4, 4)
    ylims!(ax[i], -1.7, 1.7)
end
f
The same representation can be drawn with the probability densities estimated for each class by the three methods, instead of the raw observations shown above.
LDA estimate
In LDA, the Gaussian probability densities are estimated assuming that the classes share the same PLS-score covariance matrix W (see function dmnorm). Below, the pooled within-class matrix returned by matW is multiplied by ntrain / (ntrain - nlev) to obtain an unbiased estimate.
weights = mweight(ones(ntrain))    # uniform observation weights
res = matW(Ttrain, ytrain, weights)
W = res.W * ntrain / (ntrain - nlev)    # unbiased pooled within-class covariance
npoints = 2^7
x1 = LinRange(-4, 4, npoints)
x2 = LinRange(-2, 2, npoints)
z = mpar(x1 = x1, x2 = x2)
grid = reduce(hcat, z)    # grid covering the score space
f = Figure(size = (900, 400))
ax = list(nlev)
for i in eachindex(lev)
    zs = ytrain .== lev[i]
    zT = Ttrain[zs, :]
    zfitm = dmnorm(colmean(zT), W)    # Gaussian estimate: class mean, common W
    zres = predict(zfitm, grid)
    pred_grid = vec(zres.pred)
    ax[i] = Axis(f[1, i]; title = lev[i], xlabel = "LV1", ylabel = "LV2")
    co = contourf!(ax[i], grid[:, 1], grid[:, 2], pred_grid; levels = 10)
    scatter!(ax[i], zT[:, 1], zT[:, 2], color = :red, markersize = 10)
    scatter!(ax[i], Ttest[k, 1], Ttest[k, 2], color = :blue, marker = :star5, markersize = 15)
    hlines!(ax[i], 0; linestyle = :dash, color = :grey)
    vlines!(ax[i], 0; linestyle = :dash, color = :grey)
    xlims!(ax[i], -4, 4)
    ylims!(ax[i], -1.7, 1.7)
    Colorbar(f[2, i], co; label = "Density", vertical = false)
end
f
QDA estimate
In QDA, the classes are assumed to have different covariance matrices.
res = matW(Ttrain, ytrain, weights)
Wi = res.Wi    # per-class covariance matrices
ni = res.ni    # class sizes
npoints = 2^7
x1 = LinRange(-4, 4, npoints)
x2 = LinRange(-2, 2, npoints)
z = mpar(x1 = x1, x2 = x2)
grid = reduce(hcat, z)
f = Figure(size = (900, 400))
ax = list(nlev)
for i in eachindex(lev)
    zs = ytrain .== lev[i]
    zT = Ttrain[zs, :]
    zS = Wi[i] * ni[i] / (ni[i] - 1)    # unbiased class covariance
    zfitm = dmnorm(colmean(zT), zS)     # Gaussian estimate: class mean, class covariance
    zres = predict(zfitm, grid)
    pred_grid = vec(zres.pred)
    ax[i] = Axis(f[1, i]; title = lev[i], xlabel = "LV1", ylabel = "LV2")
    co = contourf!(ax[i], grid[:, 1], grid[:, 2], pred_grid; levels = 10)
    scatter!(ax[i], zT[:, 1], zT[:, 2], color = :red, markersize = 10)
    scatter!(ax[i], Ttest[k, 1], Ttest[k, 2], color = :blue, marker = :star5, markersize = 15)
    hlines!(ax[i], 0; linestyle = :dash, color = :grey)
    vlines!(ax[i], 0; linestyle = :dash, color = :grey)
    xlims!(ax[i], -4, 4)
    ylims!(ax[i], -1.7, 1.7)
    Colorbar(f[2, i], co; label = "Density", vertical = false)
end
f
Non-parametric KDE estimate
In KDE, the densities are estimated by kernel estimators (see function dmkern).
npoints = 2^7
x1 = LinRange(-4, 4, npoints)
x2 = LinRange(-2, 2, npoints)
z = mpar(x1 = x1, x2 = x2)
grid = reduce(hcat, z)
f = Figure(size = (900, 400))
ax = list(nlev)
for i in eachindex(lev)
    zs = ytrain .== lev[i]
    zT = Ttrain[zs, :]
    zfitm = dmkern(zT; a = .5)    # KDE estimate (a: bandwidth/smoothing parameter)
    zres = predict(zfitm, grid)
    pred_grid = vec(zres.pred)
    ax[i] = Axis(f[1, i]; title = lev[i], xlabel = "LV1", ylabel = "LV2")
    co = contourf!(ax[i], grid[:, 1], grid[:, 2], pred_grid; levels = 10)
    scatter!(ax[i], zT[:, 1], zT[:, 2], color = :red, markersize = 10)
    scatter!(ax[i], Ttest[k, 1], Ttest[k, 2], color = :blue, marker = :star5, markersize = 15)
    hlines!(ax[i], 0; linestyle = :dash, color = :grey)
    vlines!(ax[i], 0; linestyle = :dash, color = :grey)
    xlims!(ax[i], -4, 4)
    ylims!(ax[i], -1.7, 1.7)
    Colorbar(f[2, i], co; label = "Density", vertical = false)
end
f
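To make Step-2 concrete, the class densities at the location of the projected test observation can be evaluated directly, and the most probable class selected. A minimal sketch using the LDA estimates computed above (assumption: equal class priors, whereas the actual prediction functions may also weight the densities by class priors):

d = zeros(nlev)
for i in eachindex(lev)
    zs = ytrain .== lev[i]
    zfitm = dmnorm(colmean(Ttrain[zs, :]), W)          # class density (LDA estimate)
    d[i] = vec(predict(zfitm, Ttest[k:k, :]).pred)[1]  # density at the projected xnew
end
lev[argmax(d)]    # most probable class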
In the present example, all the above figures show that the new observation falls in the middle of the class 'setosa'; all three PLSDA methods will therefore predict it as belonging to this class.
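This can be checked with the high-level functions directly. A minimal sketch (same assumptions as the sketch given after Step-2 above):

model = plslda(; nlv = 2)    # or plsqda(; nlv = 2), plskdeda(; nlv = 2)
fit!(model, Xtrain, ytrain)
res = predict(model, Xtest)
res.pred[k]                  # prediction for the observation projected above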