On Functional Diversity metrics

Summary: FD is getting very popular, so I figured it would be good to post not only the code (mostly borrowed) I am using: https://github.com/ibartomeus/fundiv, but also what I have learnt from it. This post assumes you have read a bit about FD calculation.

I’ve been working with functional diversity (FD) for a couple of years now, and some papers applying this concept to pollinators (and other insects) are on the way. You can find some help on calculating FD on Owen Petchey‘s webpage and in the FD package. However, Petchey et al.’s FD is not available as an easy-to-use R function, so I uploaded some wrapper R functions to calculate the principal indices, along with some null models and other utilities. There is a brief explanation of how to install and use the package there. But first, some background: FD measures the diversity of traits present in a community, and it is a cool concept because, at the same number of species, a community can have quite homogeneous or quite diverse traits, and this can be important for depicting mechanisms of ecosystem functioning or responses to global change. However, I have a love-hate relationship with the concept.
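The “same richness, different trait diversity” point can be sketched in a few lines of R. This is a toy illustration using mean pairwise trait distance as a crude stand-in for FD (not the method any of the packages use), with invented body-size values:

```r
# Two communities with identical species richness (4 species each) but
# different trait diversity. Mean pairwise distance is a crude FD
# stand-in here; all trait values are invented for illustration.
homogeneous <- matrix(c(10, 11, 10.5, 11.2), ncol = 1)  # similar body sizes
diverse     <- matrix(c(2, 10, 25, 60), ncol = 1)       # very spread body sizes

mean(dist(homogeneous))  # low trait diversity
mean(dist(diverse))      # high trait diversity, same richness
```

Species richness alone cannot tell these two communities apart; any reasonable FD metric will.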

The first obvious problem is that which traits are considered will directly affect the conclusions you reach. The garbage in -> garbage out principle applies here, but even if you try to select appropriate traits, trait information quality is usually poor for most species, we don’t know which traits are responsible for which functions, and any choice of traits is more subjective than a simple tally of species richness (or phylogenetic diversity). However, if you identify traits that are meaningful for your question (e.g. body size is the usual suspect), FD can potentially be a very powerful tool.

A second big problem is that there are quite a few metrics out there. Just reading the literature is quite daunting, so I’ll give you my opinion after implementing them in R and using them on a lot of different datasets. The first issue is that functional diversity calculated from a dendrogram (Petchey’s approach) and from a trait space (the FD package approach) are not equivalent (they are correlated, but not strongly), and this bothers me. If I have to choose, I prefer dendrogram-based solutions. Why? You can calculate FD for communities with very few species (even a richness of 1), which is sometimes useful, especially in simulations of random species removal. You can easily visualize a dendrogram, but not an 8-dimensional trait space, and for me dendrograms are also easier to understand than PCoAs. Moreover, the original distance matrix tends to be better described by dendrograms than by PCoAs (in several datasets I have tried). Space-based calculations also crash more often than one would want, you are forced to drop lots of axes to calculate hypervolumes, and for some datasets it is impossible to obtain some of the metrics at all. In addition, FEve and FDiv are hard for me to interpret, even after reading about them several times.

That said, I have to admit that we found FEve to predict function fairly well, and better than dendrogram-based indices. Dendrogram-based indices are not perfect either; in particular, they need better coverage of complementary metrics, including not only FRic but also equivalents of FEve and FDis. I developed a weighted version of FRic (see the link to the package above), and an evenness equivalent can easily be calculated using phylogenetic imbalance metrics (or see treeNODF!). If anyone is interested in exploring these options further, drop me an email.
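To make the dendrogram argument concrete, here is a minimal base-R sketch of a Petchey-style FD (total branch length of a UPGMA dendrogram built on a trait-distance matrix), used in a toy random-removal simulation. The function name and trait values are mine, invented for illustration; this is not the fundiv implementation:

```r
# Minimal dendrogram-based FD: total branch length of a UPGMA tree
# built on Euclidean distances between standardized traits
# (Petchey-style sketch, not the fundiv package's code).
branch_length_fd <- function(traits) {
  if (nrow(traits) < 2) return(0)  # a single species spans no branches
  hc <- hclust(dist(scale(traits)), method = "average")  # UPGMA
  total <- 0
  for (i in seq_len(nrow(hc$merge))) {   # walk every merge event
    for (child in hc$merge[i, ]) {       # negative entry = leaf (height 0)
      child_height <- if (child > 0) hc$height[child] else 0
      total <- total + hc$height[i] - child_height
    }
  }
  total
}

# Random species removal: the dendrogram FD is defined all the way down
# to 1 species, which convex-hull volumes (FRic) cannot manage once
# richness drops to the number of traits or below.
set.seed(1)
pool <- matrix(rnorm(10 * 4), nrow = 10)  # 10 species x 4 invented traits
for (k in c(10, 5, 2, 1)) {
  keep <- sample(nrow(pool), k)
  cat(k, "species -> FD =",
      round(branch_length_fd(pool[keep, , drop = FALSE]), 2), "\n")
}
```

Because the metric degrades gracefully to zero at one species, removal simulations never hit the “cannot compute” wall that hypervolume-based indices run into.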


About Data

I recently ran a workshop about data with PhD students. It was a great opportunity to organize my thoughts and put together a lot of good resources. All the material used (with lots of links) is available on GitHub: https://github.com/ibartomeus/Data. Topics range from hardcore R code to read, clean and transform data, to discussions about data sharing and reproducibility. Few PhD students had thought about sharing their data before the workshop, and while I hope I convinced some of them, I am not sure I did. However, I clearly convinced them to adopt reproducible workflows and ditch Excel, which is a win.

In the last few years I have worked with lots of different data (mine and other people’s, big and small), and I have to say most datasets were poorly formatted, maintained and documented. I think we do not give enough importance to data curation practices, even though they are central to the scientific process. The feedback from the students was that the workshop was very enlightening and that they had never received formal or informal advice on how to deal with data. If things are going to change, we need to make an active effort to train the new generation of scientists in an open data culture.

Short guideline on multi-authored papers.

After being on both sides of the story (first author, and one among dozens of co-authors), I have already made a few errors that someone may find useful to know about, especially since multi-author papers (more than 10–15 authors from different institutes) are becoming normal (and I am not judging whether this is good or bad*; it is just happening).

1. Talk about co-authorship early on, but with conditions. These things should be discussed at the beginning of the collaboration, because there is nothing more awkward than someone thinking that he/she is co-authoring a paper while the lead author thinks that he/she is a data provider. However, do not grant co-authorship before the project has even started. Make it clear that someone will be a co-author if his/her contribution is [fill in your expectations here] (e.g. the data provided ends up being critical for the paper, he/she is engaged in writing the manuscript, etc.). By clear, I mean very clear.

2. Establish feedback points. This one is very tricky, because first authors (or the core team leading the paper) do not want 50 people commenting on every decision, but they also do not want co-authors to end up contributing little. On the other hand, some co-authors want to be more involved than others, but they need to be offered the opportunity to contribute in order to do so. I would recommend fixing at least three points at which to provide feedback: first, a draft of the questions, the hypotheses to be tested and the approach to be used; second, a draft with the main results/figures; and third, a first draft of the paper. Even though this seems like a bare minimum, I made the mistake myself of sending some co-authors almost nothing before the complete draft of the paper was ready.

3. Make all correspondence open. Always include all co-authors in the emails with drafts or results to discuss. All co-authors should be able to see other people’s comments. This is especially important when two co-authors disagree on something. The lead author has the final word, but the co-authors concerned should discuss the disagreement between themselves (and hopefully agree on something) in front of all the other co-authors.

4. Be clear about what you want. This, too, applies to both sides. As a first author, it is very useful to tell people what you want from them. Instead of letting people comment on whatever they want (they will do that anyway), ask specific questions: can a native speaker check my grammar? Can you go through the mathy part and make sure it is correct? With several co-authors you run the risk that everyone hopes someone else will look at the three pages of equations, and no one ends up doing it. As a co-author, it is also nice to state what you want to contribute. Even if you think you will be “near the end of the list”, if you want to be more engaged and have clear ideas on how to redo an analysis or enhance a figure, say it! (Author order is/should be flexible, so you may end up among the first authors if you contribute.)

Lastly, these are just suggestions, and all of them come down to one basic idea: enhance communication.

*I do think having more than 10 authors is rarely useful…