Calculations are delegated to MASS::lda

stat_lda0(x, f, full = FALSE, ...)

stat_lda(x, f, ...)

stat_lda_bootstrap(x, f, ..., k = 1000)

Arguments

x

matrix or tibble containing the explanatory variables, forwarded to MASS::lda's x

f

factor forwarded to MASS::lda's grouping. If missing, the first column of x is used as the grouping factor and removed from x.

full

logical; whether to prepare additional useful components (defaults to FALSE)

...

additional parameters forwarded to MASS::lda (see the sketch after this argument list)

k

integer; number of permutations for stat_lda_bootstrap (defaults to 1000)
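
A minimal sketch of the ... pass-through, assuming the prepared object x from the Examples below (its grouping factor has 12 levels); prior is MASS::lda's own argument, not one defined here:

x <- stat_lda_prepare(dummy_df, foo2_NA, a:e)
# flat prior over the 12 levels, forwarded untouched to MASS::lda
stat_lda0(x$coe_naked, x$f_naked, prior = rep(1/12, 12))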

Details

With full = FALSE, stat_lda0 is roughly 6 times faster, which justifies both the existence of stat_lda0 and the full argument. stat_lda_bootstrap typically benefits from this speed-up.
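
Reusing the prepared object x from the sketch above, a rough base-R timing check of that claim (the exact factor depends on the data):

# full = FALSE skips the extra components, so the first call should run
# noticeably faster than the second
system.time(replicate(100, stat_lda0(x$coe_naked, x$f_naked, full = FALSE)))
system.time(replicate(100, stat_lda0(x$coe_naked, x$f_naked, full = TRUE)))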

Functions

  • stat_lda0: Vanilla lda

  • stat_lda: Wrapped lda

  • stat_lda_bootstrap: Bootstrapped lda

References

stat_lda_bootstrap is based on: Evin, Cucchi, Cardini, Vidarsdottir, Larson and Dobney (2013) "The long and winding road: identifying pig domestication through molar size and shape." Journal of Archaeological Science 40(1): 735–743. https://doi.org/10.1016/j.jas.2012.08.005

Examples

(x <- dummy_df %>% stat_lda_prepare(foo2_NA, a:e))
#> ℹ stat_lda_prepare: dropping NAs
#> ℹ stat_lda_prepare: d were removed (var <1e-05)
#> ℹ stat_lda_prepare: b were removed (cor > 0.99999)
#> $df
#> # A tibble: 98 x 4
#>    foo2_NA      a      c     e
#>    <fct>    <dbl>  <dbl> <dbl>
#>  1 G        0.445 0.179   3.11
#>  2 C        1.50  0.775   3.20
#>  3 D       -1.98  0.841   3.20
#>  4 F        0.125 0.287   3.19
#>  5 D       -1.41  0.974   3.12
#>  6 F       -1.07  0.680   3.13
#>  7 E        0.752 0.435   3.19
#>  8 I        0.588 0.0854  3.15
#>  9 B       -0.158 0.315   3.15
#> 10 F        1.04  0.538   3.12
#> # … with 88 more rows
#>
#> $f_naked
#>  [1] G C D F D F E I B F D L A B K G D H A L G A D C F J H E E G F E D I G B A F
#> [39] F G I J K D B B I I K B B C F B B C D F C D I E D E I B H L C G K A J B D G
#> [77] D E H C K C H H H A H A L B H D H B E H H F
#> Levels: A B C D E F G H I J K L
#>
#> $coe_naked
#> # A tibble: 98 x 3
#>         a      c     e
#>     <dbl>  <dbl> <dbl>
#>  1  0.445 0.179   3.11
#>  2  1.50  0.775   3.20
#>  3 -1.98  0.841   3.20
#>  4  0.125 0.287   3.19
#>  5 -1.41  0.974   3.12
#>  6 -1.07  0.680   3.13
#>  7  0.752 0.435   3.19
#>  8  0.588 0.0854  3.15
#>  9 -0.158 0.315   3.15
#> 10  1.04  0.538   3.12
#> # … with 88 more rows
#>
#> $cols_constant
#> [1] "d"
#>
#> $cols_collinear
#> [1] "b"
#>
#> $cols_NA
#> foo2_NA       a       b       c       d       e
#>       2       0       0       0       0       0
#>
#> $rows_NA
#>  [1] 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
#> [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#>
stat_lda0(x$coe_naked, x$f_naked)
#> # A tibble: 98 x 4
#>    actual predicted correct posterior
#>    <fct>  <fct>     <lgl>       <dbl>
#>  1 A      B         FALSE       0.208
#>  2 A      B         FALSE       0.149
#>  3 A      D         FALSE       0.239
#>  4 A      D         FALSE       0.183
#>  5 A      D         FALSE       0.188
#>  6 A      H         FALSE       0.149
#>  7 A      H         FALSE       0.151
#>  8 B      B         TRUE        0.183
#>  9 B      B         TRUE        0.152
#> 10 B      B         TRUE        0.176
#> # … with 88 more rows
stat_lda(dummy_df, foo2_NA, a:e)
#> ℹ stat_lda_prepare: dropping NAs
#> ℹ stat_lda_prepare: d were removed (var <1e-05)
#> ℹ stat_lda_prepare: b were removed (cor > 0.99999)
#>
#> ── Linear Discriminant Analysis ────────────────────────────────────────────────
#> - 98 observations
#> - 3 variables
#> - 12 levels, unbalanced (N ranges from 3 to 13)
#>
#> - Accuracy: 0.163
#> - Within classes: min: 0.0306 (J), median: 0.0816 (C, E, G), max: 0.133 (B, D)
#> - Posterior (correct): min: 0.127, median: 0.175, max: 0.4
b <- stat_lda_bootstrap(dummy_df, foo2_NA, a:e, k=10)
#> ℹ stat_lda_prepare: dropping NAs
#> ℹ stat_lda_prepare: d were removed (var <1e-05)
#> ℹ stat_lda_prepare: b were removed (cor > 0.99999)
gg_stat_lda_bootstrap(b)
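# k = 10 above is only to keep this example quick; for real analyses the
# default k = 1000 gives more stable bootstrap estimates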