Quick and dirty notes on General Linear Mix Models

My datasets tend to have random factors. I try to stick to general models whenever I can to avoid dealing with both random factors and complex error distributions (not always possible). I am compiling some notes here to avoid visiting the same R-forums every time I work in a new project. Those are some notes to self, but may be they are useful to someone else. Note that those are oversimplified notes, they assume you know your stats and your R and only want a cheat sheet. Also, they may contain errors/discrepancies with your philosophy (please notify me if you find errors, so I can update it!)

Only piece of advice. Plot everything and understand your data.

I’ll designate:
Response variable –this is what we want to model as y.
Predictors –This is what we think is affecting the response variable as xz when continuos, and AB when discrete with levels denoted by A1A2, etc… 

Models abbreviated as m.

Let’s start assuming normal error distributions:

1) Easy linear model with one predictor:

    m <- lm(y ~ x)

then check residuals. You want to see no pattern in the following plot:


and a straight line here 



more diagnostics here: http://www.statmethods.net/stats/rdiagnostics.html

2) Let’s say you have also a Covariate z

    m <- lm(y ~ z + x)




If you want the anova table


But the covariable should be added first because order matters. The result will be the variance explained by x after accounting for the variance due to A (that’s call a Type 1 error)

3) Otherwise what you actually have is not a covariable, but two predictors:

    m <- lm(y ~ z + x)

    summary(m) #same here


    Anova(m, Type = "II")

If there is indeed no interaction, then type II is statistically more powerful than type III

If the interaction is significant type III is more appropriate

    m <- lm(y ~ z * x)



    Anova(m, Type = "III")

note: when your design is classic balanced ANOVA design, it doesn’t matter as all types gives you the same answer. More info here: http://mcfromnz.wordpress.com/2011/03/02/anova-type-iiiiii-ss-explained/

To plot interactions:

    interaction.plot(A, B, y)
    #only works with A and B factors, so it rarely works for me

Or customize it by discretizing one of the variables (e.g. cut(z)) and fit lines for the two/three levels of z using the coefficients.

UPDATE: use predict for this

newdata <- data
newdata$z <- 0 #chose a representative low value to fix
model_pred_lowz <- predict(m, newdata, level = 0) #use level 0 to see         the population level (i.e. not interested in random factors)
plot(y ~ x)
lines(x, model_pred_lowz)
newdata$z <- 1 #chose a high value now
model_pred_highz <- predict(m, newdata, level = 0)
lines(x, model_pred_highz)

Also, if you have NA values, use na.exclude, not na.omit in the model, to add NA’s to the residuals.

4) How? By understanding the output, tricky specially when covariate is a factor.

    m <- lm(y ~ A * x)


  • (Intercept) Value is the intercept for A1 (v1)
  • A2 value (v2) is the difference in the intercept between A1 and A2 (the intercept for A2 is calculated as v1-v2)
  • x Value is the slope for x given A1 (v3)
  • x:A2 Value (v4) is the difference in the slope for x between A1 and A2 (the slope of x for A2 is calculated as v3-v4)

5) Select models (by AIC or compare two models anova I know confusing name)



    AICc(m) #for small sample size


But see below with random factors

6) Until now we wanted to see if there were differences among levels of A.

A included as fixed factor -> “the expectation of the distribution of effects”

    lm(x ~ A + z)

But if we want to see if there are general trends regardless of A, but we want to control for the variability due to A. –> “average effect in the population”, we use:

a) random intercept model (we expect the trend to be equal, but some plants will differ on the intercept)


    lme(y ~ x, random = ~1 | A)

b) random slope model (we expect the trend to be dependent on species identity)

    lme(y ~ x, random = ~1 + x | A)

In addition, now we can extract the random effects


fixed and random factors, so we can see the random intercepts (and slopes) as well as


7) To compare models in nlme

With different radom structures you should use method="REML" as an option. To compare models with different fix effects structure using anova use ML, but give final p-values with REML

8) With complex models you can drege them to find the best ones


    m_set = dredge(m_full)

    (top_m = get.models(m_set, subset = delta<3))

and compute averages the regular way without shrinkage:


See also names and str(model.avg(top_m))

Or estimates with shrinkage (zero method) – this makes the most difference for the less important predictors since they will get averaged with a bunch of zeros, probably: model.avg(top_m)$coef.shrinkage

9) But what if residuals are bad?

Don’t panic yet. Some times is because variance is different across groups/values, you can investigate that by:

    boxplot(A, residuals(m, type = "norm"))

    plot(A, residuals(m, type = "norm"))

and fix it by incuding a variance factor in the models weights=varIdent(form=~1|A) (for factors) or weights= varPower(form=~x) for continous variables (you can also try VarExp or varConstPower)

And to check if it worked:

    plot(residuals(m, type = "normalized") ~ fitted(m))

The ref is Cleasby IR, Nakagawa S. 2011. Neglected biological patterns in the residuals. Behavioral Ecology and Sociobiology, 65(12), 2361-2372.

note that lm can not use the weights option, but I think, but you can use gls, instead.

10) Is also recomended to center scale(x, scale = FALSE) or scale scale(x) predictors.

Reference: Schielzeth, H. (2010), Simple means to improve the interpretability of regression coefficients. Methods in Ecology and Evolution, 1: 103–113.

11) You should also worry about co-linearity



Should be lower than 2.5, and you should be very concerned when is over 10

or for lme’s load this little function from here (https://github.com/aufrank/R-hacks/blob/master/mer-utils.R)

    vif.mer <- function (fit) {
                  ## adapted from rms::vif
v <- vcov(fit)
nam <- names(fixef(fit))
## exclude intercepts
ns <- sum(1 * (nam == "Intercept" | nam == " (Intercept)"))
if (ns > 0) {
v <- v[-(1:ns), -(1:ns), drop = FALSE]
nam <- nam[-(1:ns)] }
d <- diag(v)^0.5
v <- diag(solve(v/(d %o% d)))
names(v) <- nam
                                    v }


12) and you may want to get R2 for glm also…

Based in Nakagawa, S., Schielzeth, H. (2013), A general and simple method for obtaining R2 from generalized linear mixed-effects models. Methods in Ecology and Evolution, 4: 133–142.




More details and plotting here: https://www.zoology.ubc.ca/~schluter/R/fit-model/

For GLMM’s (generalized) there is a whole new set of tricky things, maybe will be another gist.


3 thoughts on “Quick and dirty notes on General Linear Mix Models

  1. Pingback: Getting the most of mix models with random slopes | Bartomeus lab

  2. Pingback: Getting the most of mix models with random slopes | Nicely Kerned


Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s