Note: Move it to MomX when it’s ready
The aim of this vignette is two-fold:
This section is an essay on the programming side of morphometrics, yet I believe it has more general merits.
Morphometrics is the study of shape (co)variation. It typically consists of mapping shapes acquired from images into synthetic descriptors.
A universal scheme would be¹:
raw + cov -E-> coo + cov -M-> coe + cov
- `-E->` is the process of acquiring information, such as landmark or outline coordinates, from the raw object, typically captured as an image.
- `-M->` is the process of turning these coordinates into more suitable variables, named coefficients.

Dimensions of `raw`, `coo` and `coe` usually decrease along the pipe: they get more and more synthetic. Dimensions can be arbitrary, though, so let’s go through some examples.
Typically:

- `raw`: images (an `array` for RGB images, or a `matrix` for grey images);
- `coo`: `matrices` of homogeneous length (landmarks) or not (raw outline coordinates); this could be anything else, the most obvious being 3D methods.

This section gives the shared principles of this grammar implementation in MomX packages. For details, consult each package's vignettes. You can also skip the Momocs history, yet it contextualises the “why” before the “how”.
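To make the dimension examples above concrete, here is a minimal sketch in base R. All sizes are made up for illustration; nothing here is a MomX object or function.

```r
# Toy objects (all sizes hypothetical), one per stage of the pipe
raw <- array(0, dim = c(200, 300, 3))   # RGB image: height x width x 3 channels
coo <- matrix(0, nrow = 120, ncol = 2)  # outline: 120 (x, y) points
coe <- numeric(24)                      # 24 coefficients

# Number of values carried at each stage: it shrinks along the pipe
sizes <- c(raw = length(raw), coo = length(coo), coe = length(coe))
sizes
```

Here `sizes` goes from 180000 values for the image down to 24 for the coefficients, which is what "more and more synthetic" means in practice.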
I started what would become Momocs in 2011 and, until spring 2020, I was never really happy with its general architecture. It (or rather I) lacked a theoretical view of what morphometrics is on the developer side (this section), one that could also be implemented in a user-friendly way (next one).
My attempts were numerous and some were painful. Among these attempts, two Momocs generations were available: first, the early S4 version (the one published in the Journal of Statistical Software); then the S3 version, also dealing with open curves and landmarks (all Momocs versions >= 0.3).
I was not happy with these since `coo` and `coe` were kind of free-floating and not natively tied together. The programmatic glue was incomplete, error-prone and overall boring for both programmer-me and front-user-me (and likely front-user you too).
In the meantime, the tidyverse became real and then matured. I also realized that list-columns had existed since the early days, including in good old `data.frame`s². It then took me a little more time to understand how to code with them in a tidyeval framework.
The changes were too deep to be released as Momocs version 2.0. I was kind of reluctant at first, but it became Momocs2 instead. I also seized the occasion³ to split the beast into smaller beauties that fed the first versions of other MomX packages.
In brief, everything is a tibble now.
Across MomX, the main objects are `mom` objects: tibbles, slightly augmented to carry list-column information. Typically, one or more columns in these tibbles are lists. Lists are vectors that can carry numerics, factors, logicals and nested lists. These list-columns can thus contain things of very different types and dimensions, such as image paths, images, shape coordinates, coefficients, statistical juice, etc.
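As an illustration of the list-column idea, here is a minimal sketch using base R only, so it runs without any package. The shapes, the `id` and `group` columns and the object names are all hypothetical; a real `mom` object is a tibble and may be organised differently.

```r
# Two toy outlines with different numbers of points (hypothetical data)
shapes <- list(
  cbind(x = c(0, 1, 1, 0), y = c(0, 0, 1, 1)),  # 4 points
  cbind(x = c(0, 2, 1),    y = c(0, 0, 2))      # 3 points
)

# I() protects the list so data.frame() stores it as one list-column
df <- data.frame(
  id    = c("square", "triangle"),
  group = c("a", "b"),          # a covariate living on the same row
  coo   = I(shapes)
)

nrow(df)               # one row per shape
sapply(df$coo, nrow)   # each cell of `coo` holds a whole matrix
```

Each shape sits on the same row as its covariates, so filtering or reordering rows can never desynchronise them.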
Brush up your list skills reading this vignette (todo)
Working with such a wonderful data structure, firmly tied up with the other individual variables carried (or not), comes at a very special price: it’s not cheap, it’s not free, it’s a cash machine! For you, my beloved front-user, it only brings significant pay-offs (at least compared to Momocs senior):

- […] `ggplot2`, `dplyr`, `purrr`, `stringr` and friends on them;
- […] `data.frame` row). Before, they were kind of free-floating in `$coo`/`$coe` on one hand, and in `$fac` on the other hand. This was boring to program and use, limiting and risky.
- […] `coo -> coe` mapping method (eg `efourier`), possibly an inverse method (`coe -> coo`) to reconstruct coordinates from coefficients.
- […] `*_single` method (eg `img_single`, which would overall behave like `coo_single` does).

For the programmer-me, there are gazillions of smaller technical benefits. In brief, with a `tibble` nature, tidyverse comes for free. This massively simplifies code architecture and readability, and makes a huge difference for maintenance and future development.
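The `coo -> coe` mapping and its inverse mentioned in the list above can be sketched with a deliberately simpler technique than `efourier`: a plain complex Fourier transform of the outline, using base R's `fft()`. The helpers `toy_fourier` and `toy_fourier_i` are hypothetical names made up for this example, not functions from any MomX package.

```r
# Hypothetical helpers, not part of any MomX package.
# coo -> coe: complex Fourier coefficients of a closed outline
toy_fourier <- function(coo) {
  z <- complex(real = coo[, 1], imaginary = coo[, 2])
  fft(z) / length(z)
}

# coe -> coo: rebuild the (x, y) coordinates from the coefficients
toy_fourier_i <- function(coe) {
  z <- fft(coe, inverse = TRUE)   # R's inverse fft is unnormalized
  cbind(x = Re(z), y = Im(z))
}

# A toy ellipse sampled at 32 points
theta <- seq(0, 2 * pi, length.out = 33)[-33]
coo   <- cbind(x = 2 * cos(theta), y = sin(theta))

coe  <- toy_fourier(coo)      # mapping
back <- toy_fourier_i(coe)    # inverse
all.equal(coo, back)          # round trip, up to floating-point error
```

With such a pair in hand, both functions can be applied to every cell of a list-column, which is the point of keeping shapes and coefficients in the same table.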
You are already familiar with tidyeval: it allows the following to work:
dplyr::filter(iris, Species == "setosa")
Try typing `Species` in the console: it does not exist; it only does within the `iris` environment. In practice, this does not only mean removing quotes: it simplifies your code overall and helps you focus more on what you want than on how to get it.
On the programming side, this behaviour is handled by tidy evaluation, which delays evaluation: R waits a bit before turning `Species` into what it is supposed to mean, and only evaluates it when we ask, within the `filter` context in this example.
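The idea of delayed evaluation can be sketched in base R with `substitute()` and `eval()`. This is only an analogy: tidy evaluation itself relies on rlang's machinery, which is more robust. The helper `keep_rows` is a hypothetical name made up for this example.

```r
# Hypothetical helper mimicking the idea behind filter(); base R only
keep_rows <- function(df, condition) {
  expr <- substitute(condition)            # capture the expression, unevaluated
  keep <- eval(expr, df, parent.frame())   # evaluate it with df's columns in scope
  df[keep, , drop = FALSE]
}

# `Species` is never looked up in the global environment:
# it is only resolved inside `iris`, when eval() is called
setosa <- keep_rows(iris, Species == "setosa")
nrow(setosa)   # 50 rows, the setosa flowers of iris
```

The expression `Species == "setosa"` travels into the function as code, not as a value, and that is what allows the column name to be written without quotes.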
Remember this?
raw + cov -E-> coo + cov -M-> coe + cov
MomX packages have a strong scope on what they handle, for instance:

- […] `-E->` […] `raw` or turning them into `coo` for foreign formats (eg `.tps`, `.txt` or `.mom`);
- […] `-M->`, ie shape manipulation and shape morphometrics;
- […] `coe x cov` relationships. Momecs makes this exploration interactive;
- […] `raw`, `coo` and `coe`.
Momocs2 is a rewrite of Momocs. The main changes are:
You can pass single shapes or lists of shapes to all `coo_*` methods. You’re not bound to Momocs: you can come, use it and return to your preferred environment.
example here
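As a placeholder illustration of the "single shape or list of shapes" behaviour, here is a minimal sketch in base R. `toy_center` is a hypothetical function made up for this example; it is not a Momocs2 `coo_*` method and the real dispatch may work differently.

```r
# Hypothetical function, not the Momocs2 API: centers a shape on the origin
toy_center <- function(x) {
  if (is.list(x)) {
    return(lapply(x, toy_center))   # list of shapes: recurse on each one
  }
  sweep(x, 2, colMeans(x))          # single matrix: subtract column means
}

square   <- cbind(x = c(0, 2, 2, 0), y = c(0, 0, 2, 2))
triangle <- cbind(x = c(0, 3, 0),    y = c(0, 0, 3))

one  <- toy_center(square)                   # a matrix in, a matrix out
many <- toy_center(list(square, triangle))   # a list in, a list out

colMeans(one)            # centroid of the centered square
sapply(many, colMeans)   # centroids of all centered shapes
```

Because plain matrices and plain lists go in and come out, nothing ties the result to a particular package.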
1. I think this is universal whatever the morphometrics flavour, but I would be happy to be wrong.
2. Try `data.frame(x=I(list(1, 2)), y=3)`; the `I()` is needed, otherwise `data.frame()` spreads the list into separate columns. The printing method is less nice than with `tibble::tibble(x=list(1, 2), y=3)` but still, it works.
3. And the covid19 confinement happened at the right time.