Note: Move it to MomX when it’s ready

The aim of this vignette is two-fold:

Theorize what a minimalist and universal grammar for morphometrics could be. It is strongly flavoured with programmer perspective and enamoured with tibbles and particularly list-columns.
Glimpse at its implemenation in the MomX ecosystem

A grammar for morphometrics

This section is an essay on the programming side of morphometrics yet I believe it has more general merits.

Morphometrics is the study of shape (co)variation. It typically consists in mapping shapes acquired from images into synthetic descriptors.

A universal scheme would be¹:

raw + cov -E-> coo + cov -M-> coe + cov

raw, coo, coe are typically images, coordinates and coefficients, respectively
covariates can be absent, but are usually carried along the pipeline to explore covariations with coe
-E-> is the process of acquiring information such as landmark or outline coordinates from the raw object, typically captured as an image.
-M-> is the process of turning these coordinates into more suitable variables, named coefficients.

Dimensions of raw, coo and coe usually decreases along the pipe, they get more and more synthetic. Dimensions can be arbitrary though but let’s brush some examples.

Typically :

raw are images (array of RGB images, or matrix of grey images)
coo are coordinates (2-columns matrices of homogeneous length (landmarks) or maybe not (raw outlines coordinates) length, but this could anything else the most obvious beign 3D methods)
coe are coefficients whose dimensions are usually wider with less rows but more columns (think of Fourier coefficients)

MomX implementation

This section gives the shared principles of this grammar implementation in MomX packages. For details, consult each package vignettes. Also you can skip Momocs history yet it contextualises the “why” before the “how”.

A word on Momocs history

I started what would become Momocs in 2011 and still, until spring 2020, I have never been really happy with its general architecture. It (or rather I) lacked a theoretical thought of what morphometrics are, on the developper-side (this section), that could also be implemented in an user-friendly way (next one).

My attempts were numerous and some were painful. Among these attempts, two Momocs generations were available: first, the early S4 version (the one published in the Journal of Statistical Software); then the S3 version alos dealing with open curves and landmarks (all Momocs version >= 0.3).

I was not happy with these since coo and coe were kind of free-floating and were not natively tight together and the programmatic glue was incomplete, error-prone and overall boring for both programmer-me and front-user-me (and likely front-user you too).

In the meantime, the tidyverse became real and then matured. I also realized list-columns existed since the early days, including in in gold old data.frames². And then it took me a little more time to understand how to code using them in a tidyeval framework.

Changes were too strong for releasing a Momocs version 2.0. I was kind of reluctant to it at first, but it became Momocs2 instead. I also seized the occasion³ to split the beast into smaller beauties that fed first versions of other MomX packages.

A love declaration to list-columns

In brief, everything is a tibble now.

Across MomX, the main object is mom, are tibble, slightly augmented to carry list-columns informations. Typically, one or more columns in these tibbles are lists. Lists are vectors that can carry numeric, factors, logical and nested lists. These list-columns can thus contain things with very different type and dimensions like image paths, images, shape coordinates, coefficients, statistical juice, etc.

Brush up your list skills reading this vignette (todo)

Working with such wonderful data structure, firmly tied up with other individual variables carried (or not) comes at a very special price: it’s not cheap, it’s not free, it’s a cash machine! For you, my beloved front-user, it only bring significant pay-offs (at least compared to Momocs senior):

All MomX objects are tidyverse-ready: you can natively use ggplot2, dplyr, purrr, stringr and friends on them;
Coordinates, coefficients, etc. are tied to their covariates in their natural structure (a data.frame row). Before, they were kind of free-floating in $coo/$coe in one hand, and in $fac on the other hand. This was boring to program and use, limiting and risky.
The number of such partitions is no longer limited. For example, For a single shape, you can now have as many variants of their original coordinates, and as many coefficients calculated from them.
Adding new morphometrics methods resumes to writing a coo -> coe mapping method (eg efourier), possibly and inverse method (coe -> coe) to reconstruct coordinates from coefficients.
Adding new coordinates shapes, eg images rather than coordinates resumes to writing a new *_single method (eg img_single that would, overall behave like coo_single does)

For the programmer-me, there are gazillions of smaller technical benefits. In brief, with a tibble nature, tidyverse comes for free. This massively simplifies code architecture and readability and makes a huge difference for maintenance and future development.

On the benefits of tidyeval

You are already familiar with tidyeval: it allows the following to work:

filter(iris, Species=="setosa")

Try typing Species on the console: it does not exist; it only does in the iris environment. Practically it does not only means removing quotes, it overall simplify your code and help you focus more on what you want rather than on how to get it.

On the programming side, this behaviour is handled by tidy evaluation that delays evaluation: R waits a bit to turn Species into what it is suppose to means, it only evaluates it when we ask it, within the select context in this example.

MomX package boundaries

Remember that ?

raw + cov -E-> coo + cov -M-> coe + cov

MomX packages have a strong scope on what they handle, for instance:

Momacs handles -E->
Momit handles harvest of raw or turning them into coo for foreign formats (eg .tps, .txt or .mom)
Momocs2 handles -M->, ie shape manipulation and shape morphometrics
Momstat helps exploring coe x cov relationships. Momecs make this exploration interactive.
Momdata provide toy raw, coo and coe
etc.

Momocs architecture

Momocs2 is a rewrite of Momocs. The main changes are:

Momocs2 now only deals with core morphometrics operations: typically coo manipulation and morphometrics methods that turn them into coe.
Importing methods are in Momit, stat methods in Momstats. More generally see MomX. This is a much easier way for me to maintain packages and develop more.
Coo and Coe classes are now tibble, in other words tidyverse-ready
Restrictions are alleviated so that additional methods should be easier to add.
All graphics are ggplot2
You’re not bound to Momocs: you can come, use some of it and return to your preferred environment 😄

coo functions

coo_modifyers modify the shapes: return coo_single on coo_single, modify on place coo columns for coo_tbl
coo_descriptors describe the shape: return value on coo_single, create new columns on coo_tbl.

You can pass single shapes or list of shapes to all coo_* methods. You’re not bound to Momocs, you can come, use it and return to your preferred environment.

example here

I think this is universal whatever the morphometrics flavour, but I would be happy to be wrong↩︎
try data.frame(x=list(1, 2), y=3). The printing method is less nicer than with tibble::tibble(x=list(1, 2), y=3) but still, it would work↩︎
and the covid19 confinment happened timely↩︎

Philosophy