Calculations are delegated to MASS::lda

stat_lda0(x, f, full = FALSE, ...)

stat_lda(x, f, ...)

stat_lda_bootstrap(x, f, ..., k = 1000)

Arguments

x

matrix or tibble containing the explanatory variables, forwarded to MASS::lda's x

f

factor forwarded to MASS::lda's grouping. If missing, the first column of x is used as the grouping factor and removed from x.

full

logical; whether to prepare additional useful components (defaults to FALSE)

...

additional parameters forwarded to MASS::lda (see the sketch after this argument list)

k

integer; number of permutations for stat_lda_bootstrap (defaults to 1000)
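
A minimal sketch of the ... pass-through, assuming the prepared object x from the Examples below (its grouping factor has 12 levels); prior is MASS::lda's own argument, not one defined here:

x <- stat_lda_prepare(dummy_df, foo2_NA, a:e)
# flat prior over the 12 levels, forwarded untouched to MASS::lda
stat_lda0(x$coe_naked, x$f_naked, prior = rep(1/12, 12))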

Details

With full = FALSE, stat_lda0 is roughly 6 times faster, which justifies both the existence of stat_lda0 and the full argument. stat_lda_bootstrap typically benefits from this speed-up.
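
Reusing the prepared object x from the sketch above, a rough base-R timing check of that claim (the exact factor depends on the data):

# full = FALSE skips the extra components, so the first call should run
# noticeably faster than the second
system.time(replicate(100, stat_lda0(x$coe_naked, x$f_naked, full = FALSE)))
system.time(replicate(100, stat_lda0(x$coe_naked, x$f_naked, full = TRUE)))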

Functions

  • stat_lda0: Vanilla lda

  • stat_lda: Wrapped lda

  • stat_lda_bootstrap: Bootstrapped lda

References

stat_lda_bootstrap is based on: Evin, Cucchi, Cardini, Vidarsdottir, Larson and Dobney (2013) "The long and winding road: identifying pig domestication through molar size and shape." Journal of Archaeological Science 40(1): 735–743. https://doi.org/10.1016/j.jas.2012.08.005

Examples

(x <- dummy_df %>% stat_lda_prepare(foo2_NA, a:e))
#> ℹ stat_lda_prepare: dropping NAs
#> ℹ stat_lda_prepare: d were removed (var <1e-05)
#> ℹ stat_lda_prepare: b were removed (cor > 0.99999)
#> $df
#> # A tibble: 98 x 4
#>    foo2_NA      a      c     e
#>    <fct>    <dbl>  <dbl> <dbl>
#>  1 G        0.445 0.179   3.11
#>  2 C        1.50  0.775   3.20
#>  3 D       -1.98  0.841   3.20
#>  4 F        0.125 0.287   3.19
#>  5 D       -1.41  0.974   3.12
#>  6 F       -1.07  0.680   3.13
#>  7 E        0.752 0.435   3.19
#>  8 I        0.588 0.0854  3.15
#>  9 B       -0.158 0.315   3.15
#> 10 F        1.04  0.538   3.12
#> # … with 88 more rows
#>
#> $f_naked
#>  [1] G C D F D F E I B F D L A B K G D H A L G A D C F J H E E G F E D I G B A F
#> [39] F G I J K D B B I I K B B C F B B C D F C D I E D E I B H L C G K A J B D G
#> [77] D E H C K C H H H A H A L B H D H B E H H F
#> Levels: A B C D E F G H I J K L
#>
#> $coe_naked
#> # A tibble: 98 x 3
#>         a      c     e
#>     <dbl>  <dbl> <dbl>
#>  1  0.445 0.179   3.11
#>  2  1.50  0.775   3.20
#>  3 -1.98  0.841   3.20
#>  4  0.125 0.287   3.19
#>  5 -1.41  0.974   3.12
#>  6 -1.07  0.680   3.13
#>  7  0.752 0.435   3.19
#>  8  0.588 0.0854  3.15
#>  9 -0.158 0.315   3.15
#> 10  1.04  0.538   3.12
#> # … with 88 more rows
#>
#> $cols_constant
#> [1] "d"
#>
#> $cols_collinear
#> [1] "b"
#>
#> $cols_NA
#> foo2_NA       a       b       c       d       e
#>       2       0       0       0       0       0
#>
#> $rows_NA
#>  [1] 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
#> [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#>
stat_lda0(x$coe_naked, x$f_naked)
#> # A tibble: 98 x 4
#>    actual predicted correct posterior
#>    <fct>  <fct>     <lgl>       <dbl>
#>  1 A      B         FALSE       0.208
#>  2 A      B         FALSE       0.149
#>  3 A      D         FALSE       0.239
#>  4 A      D         FALSE       0.183
#>  5 A      D         FALSE       0.188
#>  6 A      H         FALSE       0.149
#>  7 A      H         FALSE       0.151
#>  8 B      B         TRUE        0.183
#>  9 B      B         TRUE        0.152
#> 10 B      B         TRUE        0.176
#> # … with 88 more rows
stat_lda(dummy_df, foo2_NA, a:e)
#> ℹ stat_lda_prepare: dropping NAs
#> ℹ stat_lda_prepare: d were removed (var <1e-05)
#> ℹ stat_lda_prepare: b were removed (cor > 0.99999)
#>
#> ── Linear Discriminant Analysis ────────────────────────────────────────────────
#> - 98 observations
#> - 3 variables
#> - 12 levels, unbalanced (N ranges from 3 to 13)
#>
#> - Accuracy: 0.163
#> - Within classes: min: 0.0306 (J), median: 0.0816 (C, E, G), max: 0.133 (B, D)
#> - Posterior (correct): min: 0.127, median: 0.175, max: 0.4
b <- stat_lda_bootstrap(dummy_df, foo2_NA, a:e, k=10)
#> ℹ stat_lda_prepare: dropping NAs
#> ℹ stat_lda_prepare: d were removed (var <1e-05)
#> ℹ stat_lda_prepare: b were removed (cor > 0.99999)
gg_stat_lda_bootstrap(b)
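# k = 10 above is only to keep this example quick; for real analyses the
# default k = 1000 gives more stable bootstrap estimates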