random factors | Bartomeus lab

My datasets tend to have random factors. I try to stick to general models whenever I can to avoid dealing with both random factors and complex error distributions (not always possible). I am compiling some notes here to avoid visiting the same R-forums every time I work in a new project. Those are some notes to self, but may be they are useful to someone else. Note that those are oversimplified notes, they assume you know your stats and your R and only want a cheat sheet. Also, they may contain errors/discrepancies with your philosophy (please notify me if you find errors, so I can update it!)

Only piece of advice. Plot everything and understand your data.

I’ll designate:
Response variable –this is what we want to model as y.
Predictors –This is what we think is affecting the response variable as x, z when continuos, and A, B when discrete with levels denoted by A1, A2, etc…

Models abbreviated as m.

Let’s start assuming normal error distributions:

1) Easy linear model with one predictor:

`m <- lm(y ~ x)`

then check residuals. You want to see no pattern in the following plot:

scatter.smooth(residuals(m)~fitted(m))

and a straight line here

qqPlot(residuals(m))

qqline(residuals(m))

more diagnostics here: http://www.statmethods.net/stats/rdiagnostics.html

2) Let’s say you have also a Covariate `z`

m <- lm(y ~ z + x)

summary(m)

coef(m)

confint(m)

If you want the anova table

anova(m)

But the covariable should be added first because order matters. The result will be the variance explained by x after accounting for the variance due to A (that’s call a Type 1 error)

3) Otherwise what you actually have is not a covariable, but two predictors:

m <- lm(y ~ z + x)

summary(m) #same here

library(car)

Anova(m, Type = "II")

If there is indeed no interaction, then type II is statistically more powerful than type III

If the interaction is significant type III is more appropriate

m <- lm(y ~ z * x)

summary(m)

library(car)

Anova(m, Type = "III")

note: when your design is classic balanced ANOVA design, it doesn’t matter as all types gives you the same answer. More info here: http://mcfromnz.wordpress.com/2011/03/02/anova-type-iiiiii-ss-explained/

To plot interactions:

interaction.plot(A, B, y) #only works with A and B factors, so it rarely works for me

Or customize it by discretizing one of the variables (e.g. cut(z)) and fit lines for the two/three levels of z using the coefficients.

UPDATE: use predict for this

newdata <- data newdata$z <- 0 #chose a representative low value to fix model_pred_lowz <- predict(m, newdata, level = 0) #use level 0 to see the population level (i.e. not interested in random factors) plot(y ~ x) lines(x, model_pred_lowz) newdata$z <- 1 #chose a high value now model_pred_highz <- predict(m, newdata, level = 0) lines(x, model_pred_highz)

Also, if you have NA values, use na.exclude, not na.omit in the model, to add NA’s to the residuals.

4) How? By understanding the output, tricky specially when covariate is a factor.

m <- lm(y ~ A * x)

summary(m)

(Intercept) Value is the intercept for A1 (v1)
A2 value (v2) is the difference in the intercept between A1 and A2 (the intercept for A2 is calculated as v1-v2)
x Value is the slope for x given A1 (v3)
x:A2 Value (v4) is the difference in the slope for x between A1 and A2 (the slope of x for A2 is calculated as v3-v4)

5) Select models (by `AIC` or compare two models `anova` I know confusing name)

AIC(m)

library(MuMIn)

AICc(m) #for small sample size

anova(m1,m2)

But see below with random factors

6) Until now we wanted to see if there were differences among levels of `A.`

A included as fixed factor -> “the expectation of the distribution of effects”

lm(x ~ A + z)

But if we want to see if there are general trends regardless of A, but we want to control for the variability due to A. –> “average effect in the population”, we use:

a) random intercept model (we expect the trend to be equal, but some plants will differ on the intercept)

library(nlme)

lme(y ~ x, random = ~1 | A)

b) random slope model (we expect the trend to be dependent on species identity)

lme(y ~ x, random = ~1 + x | A)

In addition, now we can extract the random effects

m$coefficients

fixed and random factors, so we can see the random intercepts (and slopes) as well as

intervals(m)

7) To compare models in nlme

With different radom structures you should use method="REML" as an option. To compare models with different fix effects structure using anova use ML, but give final p-values with REML

8) With complex models you can drege them to find the best ones

library(MuMIn)

m_set = dredge(m_full)

(top_m = get.models(m_set, subset = delta<3))

and compute averages the regular way without shrinkage:

model.avg(top_m)$avg.model

See also names and str(model.avg(top_m))

Or estimates with shrinkage (zero method) – this makes the most difference for the less important predictors since they will get averaged with a bunch of zeros, probably: model.avg(top_m)$coef.shrinkage

9) But what if residuals are bad?

Don’t panic yet. Some times is because variance is different across groups/values, you can investigate that by:

boxplot(A, residuals(m, type = "norm"))

plot(A, residuals(m, type = "norm"))

and fix it by incuding a variance factor in the models weights=varIdent(form=~1|A) (for factors) or weights= varPower(form=~x) for continous variables (you can also try VarExp or varConstPower)

And to check if it worked:

plot(residuals(m, type = "normalized") ~ fitted(m))

The ref is Cleasby IR, Nakagawa S. 2011. Neglected biological patterns in the residuals. Behavioral Ecology and Sociobiology, 65(12), 2361-2372.

note that lm can not use the weights option, but I think, but you can use gls, instead.

10) Is also recomended to center `scale(x, scale = FALSE)` or scale `scale(x)` predictors.

Reference: Schielzeth, H. (2010), Simple means to improve the interpretability of regression coefficients. Methods in Ecology and Evolution, 1: 103–113.

11) You should also worry about co-linearity

library(car)

vif(m)

Should be lower than 2.5, and you should be very concerned when is over 10

or for lme’s load this little function from here (https://github.com/aufrank/R-hacks/blob/master/mer-utils.R)

vif.mer <- function (fit) { ## adapted from rms::vif v <- vcov(fit) nam <- names(fixef(fit)) ## exclude intercepts ns <- sum(1 * (nam == "Intercept" | nam == " (Intercept)")) if (ns > 0) { v <- v[-(1:ns), -(1:ns), drop = FALSE] nam <- nam[-(1:ns)] } d <- diag(v)^0.5 v <- diag(solve(v/(d %o% d))) names(v) <- nam v }

vif.mer(m)

12) and you may want to get R2 for glm also…

Based in Nakagawa, S., Schielzeth, H. (2013), A general and simple method for obtaining R² from generalized linear mixed-effects models. Methods in Ecology and Evolution, 4: 133–142.

library(MuMIn)

r.squaredGLMM(m)

——————————————————————————————————

More details and plotting here: https://www.zoology.ubc.ca/~schluter/R/fit-model/

For GLMM’s (generalized) there is a whole new set of tricky things, maybe will be another gist.

Bartomeus lab

Ecology, global change and pollinators

Tag Archives: random factors

Quick and dirty notes on General Linear Mix Models

1) Easy linear model with one predictor:

`m <- lm(y ~ x)`

2) Let’s say you have also a Covariate `z`

3) Otherwise what you actually have is not a covariable, but two predictors:

4) How? By understanding the output, tricky specially when covariate is a factor.

5) Select models (by `AIC` or compare two models `anova` I know confusing name)

6) Until now we wanted to see if there were differences among levels of `A.`

a) random intercept model (we expect the trend to be equal, but some plants will differ on the intercept)

b) random slope model (we expect the trend to be dependent on species identity)

7) To compare models in nlme

8) With complex models you can drege them to find the best ones

9) But what if residuals are bad?

10) Is also recomended to center `scale(x, scale = FALSE)` or scale `scale(x)` predictors.

11) You should also worry about co-linearity

12) and you may want to get R2 for glm also…

1) Easy linear model with one predictor:

m <- lm(y ~ x)

2) Let’s say you have also a Covariate z

3) Otherwise what you actually have is not a covariable, but two predictors:

4) How? By understanding the output, tricky specially when covariate is a factor.

5) Select models (by AIC or compare two models anova I know confusing name)

6) Until now we wanted to see if there were differences among levels of A.

a) random intercept model (we expect the trend to be equal, but some plants will differ on the intercept)

b) random slope model (we expect the trend to be dependent on species identity)

7) To compare models in nlme

8) With complex models you can drege them to find the best ones

9) But what if residuals are bad?

10) Is also recomended to center scale(x, scale = FALSE) or scale scale(x) predictors.

11) You should also worry about co-linearity

12) and you may want to get R2 for glm also…

`m <- lm(y ~ x)`

2) Let’s say you have also a Covariate `z`

5) Select models (by `AIC` or compare two models `anova` I know confusing name)

6) Until now we wanted to see if there were differences among levels of `A.`

10) Is also recomended to center `scale(x, scale = FALSE)` or scale `scale(x)` predictors.