P-hacking and Paul Feyerabend

P-hacking, or researcher degrees of freedom, is a worrying issue in science, especially because it is not a black and white issue. On the blackest side there is deliberate p-hacking with the sole purpose of advancing your career. This is bad, but I hope it’s rare*. The grey area is more intriguing, because it concerns researchers who are not doing it consciously. I used to think that this included researchers who never had proper statistical training, who are under too much pressure to publish datasets that are too small, or who fool themselves into thinking that this new analysis/subset of the data is what they should have been testing in the first place, so the 200 previous analyses/subsets don’t really matter (which is false, they matter!). This is equally bad for science (even if the motivation is not as bad).
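To see why the previous analyses matter, here is a minimal sketch in Python of how quickly false positives pile up. It assumes that every null hypothesis is true and that the tests are independent, which subsets of the same dataset are not, so take it as a rough illustration. The function names and the 0.05 threshold are my own choices for the example.

```python
import random

def familywise_error(k, alpha=0.05):
    """Chance of at least one p < alpha among k independent tests
    when there is no real effect in any of them."""
    return 1 - (1 - alpha) ** k

def simulate_familywise_error(k, alpha=0.05, n_studies=5000, seed=1):
    """Same quantity by simulation: under the null hypothesis,
    p-values are uniformly distributed between 0 and 1."""
    rng = random.Random(seed)
    hits = sum(
        any(rng.random() < alpha for _ in range(k))
        for _ in range(n_studies)
    )
    return hits / n_studies

for k in (1, 10, 200):
    print(k, round(familywise_error(k), 5))
```

With 200 analyses the chance of finding at least one "significant" result by luck alone is above 0.9999, which is why the analysis that finally "worked" cannot be read as if it were the only one.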

But then I read “Against Method” by Paul Feyerabend**. Although some passages are really slow and repetitive, I liked it. A big part of the book tells the story of Galileo Galilei. Galileo changed the paradigm based on an incomplete theory, iffy data and measurement tools, and lots of propaganda. He relied more on his intuition than on a proper scientific method. He was still right, and most of his ideas were confirmed years later.

And that rang a bell. I have heard scientists say things like “well, we can’t measure it accurately, but trust me, I know the system and this is what is happening”. From here, it is a small step to a bit of conscious or unconscious p-hacking to support your hypothesis. These researchers are using intuition, hours of thought and lots of knowledge. These scientists are putting forward their ideas. They believe in these ideas, but they can’t prove them unequivocally with the data at hand because of the complexity of the problem.

Paul Feyerabend said that “anything goes” if it advances science. I am not justifying p-hacking to support something that is hard to prove but that you think is true. But after reading Feyerabend I am also less worried about adding some subjectivity to the scientific method, because being completely objective and following the method strictly may also slow down science. Maybe the middle ground is being able to recognize when something is an opinion, not a fact, and to avoid sticking a p-value on that opinion, while still defending it in the light of the available data and pushing the agenda forward to get better data, better methods, or whatever you need to support it. It’s complicated.


*People who only want to advance their careers chose politics in the first place, not science, right? I know this is probably a wrong assumption.

**In a nutshell, he argues that an objective scientific method is unattainable and rarely applied, and that we should free ourselves from using it as the single tool to do science. I liked, for example, the idea of aiming to create a plethora of theories (with no historical constraints or resistance from the status quo to accepting compatible alternative explanations) that can cohabit, and letting time do the thinning a posteriori. More on Wikipedia.

Preferring a preference index II: null models

This is a guest post by my PhD student Miguel Ángel Collado. My last post on preferring a preference index was not satisfactory to us, and we now have better options. Read Miguel Ángel’s solution below.


We are working on the ecological value of various habitats or sites. In addition to the classical biodiversity indexes, we want to know whether some sites that are not especially diverse nevertheless have ecologically important species attached to them. We could measure this through preference analyses, using null models to compare with our data.

We can define “preference” for a species if the presence of this species at a given site is higher than expected at random. One way to find out is to compare the data to null models, establishing an upper threshold for preference and a lower one for avoidance. This way we would know whether some species of interest have an affinity for some sites or just use them as expected.
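The idea of thresholds from a null model can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not necessarily the analysis used in the post: the null model redistributes the records of the focal species among sites in proportion to the total number of records per site (a proxy for sampling effort), and the thresholds are the 2.5% and 97.5% quantiles of the simulated counts. The function name, the site names and the numbers are made up.

```python
import random

def preference_by_null_model(species_counts, site_totals,
                             n_sim=2000, alpha=0.05, seed=1):
    """Return {site: (observed, lower, upper, status)} for one species.

    species_counts: site -> observed records of the focal species
    site_totals:    site -> records of all species (sampling effort proxy)
    """
    rng = random.Random(seed)
    sites = list(site_totals)
    weights = [site_totals[s] for s in sites]
    n_records = sum(species_counts.get(s, 0) for s in sites)

    # Null model: place the same number of records at random,
    # with sites drawn in proportion to their total records.
    simulated = {s: [] for s in sites}
    for _ in range(n_sim):
        draw = rng.choices(sites, weights=weights, k=n_records)
        for s in sites:
            simulated[s].append(draw.count(s))

    lower_idx = int((alpha / 2) * n_sim)
    upper_idx = int((1 - alpha / 2) * n_sim) - 1
    result = {}
    for s in sites:
        null = sorted(simulated[s])
        lower, upper = null[lower_idx], null[upper_idx]
        observed = species_counts.get(s, 0)
        if observed > upper:
            status = "preference"
        elif observed < lower:
            status = "avoidance"
        else:
            status = "as expected"
        result[s] = (observed, lower, upper, status)
    return result

# Made-up data: three equally sampled sites, one species with 60 records.
site_totals = {"forest": 100, "meadow": 100, "urban": 100}
species_counts = {"forest": 55, "meadow": 4, "urban": 1}
result = preference_by_null_model(species_counts, site_totals)
for site, row in result.items():
    print(site, row)
```

In this toy dataset the species is far more frequent in the forest than the null model allows, and far less frequent in the other two sites. A real analysis would need a null model that matches how the data were collected.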

To see an example of this

Food consumption and global change

A conversation today at lunch time made me think about some notes I took on this topic, which I reproduce here:

Jonathan Foley gave a pretty convincing talk at ESA 2013 showing that meat consumption is unsustainable for the environment (i.e. land use + CO2 emissions). This was “the straw that broke the camel’s back”* for me, and since then I have reduced my meat consumption quite drastically.

However, a few days ago I read this paper showing that swapping meat for vegetables and fruits can be even worse if you also take into account water footprint and energy use (e.g. transport and storage). I will skip the details, but the bottom line is that the story is complicated and the best way to save the world is to reduce calorie intake and eat lots of grains. Here is Figure 2 from the paper (the paper style and figures are quite poor, by the way).

[Image: Tom_2015flexiterian_pdf__page_5_of_12_.png]

It’s hard, because even if you want to do your best, it is not easy. Is it better for the environment to use bacon or eggplant with my pasta? No idea!**. If I knew the Y axis of the following graph, things would be easier.

[Image: Blank.png]

*This is what Google suggests for translating “la gota que colma el vaso”.

**Is the bacon from pigs next door? Is the eggplant from Nicaragua?

Climate change, phenology match and the big unknown

This year was crazy in Seville, with plants flowering 2-3 months earlier than last year. So we went to sample, and guess what: bees were there too. Although expectations of phenological “mis-match” are raised here and there, we don’t find a big phenological mismatch between plants and pollinators*. I am not talking here about specific species, but taking a community approach. However, this is not the end of the story. It is good that plants and pollinators are in sync, but this alone doesn’t guarantee healthy ecosystem functioning.

Why not? My main worry is that after a mild January and beginning of February, we now have “normal cold days” again. Consequently, we also find little bee activity (today we are sampling at 14ºC just to make sure this is true). Hence, both plants and bees are likely to suffer. The demographic implications of this are hard to predict. Maybe it is not a big deal if it happens only one year, but if it happens often, I presume it can be quite bad. All in all, it’s hard to quantify, but I suspect that we need to go back to population dynamics if we want to understand climate change impacts beyond phenological overlaps.

*Don’t take this blog’s word for it: there are plenty of good papers showing it (here and here), including my own (here and here), and very few showing a clear mismatch, most of those on specialized systems.

Ecoflor 2016

Ecoflor is an annual Spanish meeting on everything related to flowers (from evolution to pollinators). The level is amazingly high for a small “unorganized” local meeting, and the most important part is that it is a fun forum to discuss crazy ideas, and not just finished work. Here are some of the things I learnt this year, in no particular order:

  • You can do biogeography using Arabidopsis thaliana. Moreover, flowering time can be regulated by photoperiod or vernalization, and you can map the responsible genes across regions (by X. Picò).
  • Plants can cooperate or be selfish depending on their genotype (by R. Torices).
  • The coolest talk was on epigenetics, which can redirect the course of evolution, with experimental data on radish exposed to herbivory (by M. Sobral).
  • Invasive Oxalis pes-caprae was thought to have only one morph in its invasive range, and hence to reproduce vegetatively only, but the second morph has arrived (and it’s here to stay) (by S. Castro).
  • Plant-pollinator networks can be plotted better than with bipartite (by J. Galeano).
  • And it was the first time one of my students talked in public. Definitely a great talk by Miguel Ángel Collado on pollinator habitat preferences.

Next year it will be in Seville, join us*!

*You probably need to know some Spanish, but some talks are always in English and all slides are in English.

Fun Data for teaching R

I’ll be running an R course soon and I am looking for fun (public) datasets to use in data manipulation and visualization. I would like to use a single dataset that has some easy variables for the first days, but also some more challenging ones for the final days. And when I set exercises, I want the students* to be curious about finding out the answer.

[*in this case students are not ecologists]

Ideas:

-Movies. How many movies has Woody Allen made? Is the number of movies per year increasing linearly or exponentially? That is a good theme with lots of options. IMDB releases some data, and processing their terribly formatted txt files and assembling them would be an excellent exercise for an advanced class, but not for beginners. OMDB has an API to make searches, and if you donate you can get the full database. And of course, there is an R package to use the API. This is a better option for beginners.

-Music. Everyone likes music and there are 300 GB of data here. You can also get just a chunk, but even 2 GB of data is probably too much for beginners.

-Football: I discarded this one for myself because I know nothing about it, but I am sure it would be highly popular in Spain. An open database here.

Kaggle datasets are also awesome. To download them you just have to register. I may use the baby names per year and US state. Everyone is curious about the most popular name in the year they were born, for example.

-Earthquakes: This one also needs some parsing of the txt files (easier than IMDB) and will do for pretty visualizations.

-Datasets already in R: Along with the classic datasets on Iris flowers (used by Fisher!) or the cars dataset, there are cooler options. For example, there are lots of datasets for econometrics (some are curious), and RStudio also released some cool ones recently (e.g. flights).

-Other: The Internet is full of data, like real time series, lots of small data examples, M&M’s colors by bag, Jeopardy questions, Marvel social networks, Dolphins social networks, …

Please, add your ideas in the comments, especially if you have used them with success for teaching R. Thanks!