Institute of Mathematical Statistics | Terence’s Stuff: Multiple Linear Regression, 1

Terence’s Stuff: Multiple Linear Regression, 1

July 18, 2012

It’s time to respond to: I’m curious about what you tell PhD students about multiple linear regression. I tend to focus first on regression coefficients: what they are and are not, why we might care, and how we compute them. Almost fifty years ago, I was lucky enough to be introduced to Yule’s new system of notation, new in 1907, that is. (Thank you, Dr Geoffrey Jowett.) Given a collection X₁, X₂, … , X_p of random variables, the expression b_12•3…p denotes the (linear least-squares) regression coefficient of X₁ on X₂, when X₃, … , X_p are also in the regression equation. As Yule put it in his paper, the first subscript gives the dependent variable, the second the variable of which the given regression is the coefficient, and the subscripts after the period show the remaining independent variables which enter into the equation. This avoids having to emphasize that the regression coefficient of X₁ on X₂ depends on the other variables in the equation: it’s right there in the notation! Mosteller and Tukey say it another way in chapter 13, Woes of regression coefficients, of their magnificent 1977 book: “a coefficient in a multiple regression – either in a theory or in a fit – depends on MORE than just: the set of data and the method of fitting [and] the carrier it multiplies. It also depends on: what else is offered as part of the fit.”

Having got this point clear, we now need to address the vexed question of how we interpret b_12•3, that is, the words we use when we say informally what it means. As we all know, some people call it the regression coefficient of X₁ on X₂, controlling for X₃. But we also know that in general X’s in regressions are not under any control, so this cannot be a good description. My preference is to say adjusting for X₃. This is vague, but less likely to mislead, and definitely conveys the fact that X₃ is in the model along with X₂. It is also connected to the use of regression for linear adjustment. But what exactly is a regression coefficient? Again we all know the simplistic interpretation of b_12•3 as the average change in X₁ per unit change in X₂, when X₃ is held fixed. Why simplistic? At times “held fixed” makes no sense, an example being X₃ = X₂².
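To make the X₃ = X₂² case concrete, here is a minimal Python sketch. The data are made up, and the helper names simple_slope and two_predictor_slopes are illustrative assumptions rather than anything from Yule or from Mosteller and Tukey. X₁ is constructed to depend on X₂ only through X₂², so the coefficient of X₂ changes sharply once X₃ is offered to the fit.

```python
# Illustrative only: made-up data in which X3 is an exact function of X2.

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    """Sample covariance of two equal-length lists (divisor n - 1)."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

def simple_slope(y, x):
    """Least-squares slope of y on x alone, with an intercept: b_yx."""
    return cov(y, x) / cov(x, x)

def two_predictor_slopes(y, x, z):
    """Least-squares slopes of y on x and z together, with an intercept.

    Returns (b_yx.z, b_yz.x), found by solving the 2x2 normal equations.
    """
    sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
    sxy, szy = cov(x, y), cov(z, y)
    det = sxx * szz - sxz ** 2
    return (sxy * szz - szy * sxz) / det, (szy * sxx - sxy * sxz) / det

x2 = [0.0, 1.0, 2.0, 3.0, 4.0]
x3 = [v ** 2 for v in x2]        # X3 = X2^2: X2 cannot move while X3 stays fixed
x1 = [1.0 + v ** 2 for v in x2]  # X1 depends on X2 only through X2^2

b12 = simple_slope(x1, x2)                        # X3 not in the equation
b12_3, b13_2 = two_predictor_slopes(x1, x2, x3)   # X3 in the equation

print(f"b12   = {b12:.3f}")
print(f"b12.3 = {b12_3:.3f}, b13.2 = {b13_2:.3f}")
```

With these numbers the simple slope b₁₂ is 4, while b_12•3 is 0 and b_13•2 is 1. Neither value describes a unit change in X₂ "with X₃ held fixed", because no such change exists in these data.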

What can we say? A lengthy, but basically correct, interpretation goes like this: b_12•3 tells us how X₁ responds, on average, to change in X₂, after allowing for simultaneous linear change in X₃ in the data at hand.

Mosteller and Tukey point out that sometimes X’s can be held constant, and then the important thing is to recognize just how large the difference can be between (i) X₂ changing while X₃ is not otherwise disturbed or clamped, and (ii) changing X₂ while holding X₃ fast. The first corresponds to the interpretation I gave, and the second is what people usually wish for. Complicated? Indeed, but as Oscar Wilde told us, “The truth is rarely pure and never simple.”

Yule also introduced the notation X_1•23…p = X₁ − b_12•3…p X₂ − … − b_1p•2…(p−1) X_p for the residual of X₁ after regression on X₂, … , X_p. This can be very helpful when we want to show that multiple linear regression may be viewed as a sequence of simple linear regressions, of residuals on residuals. It is closely related to added variable plots. I think it’s important for students to know this, and how to derive it using the fact that (least-squares) residuals are orthogonal to all the variables after the period. For example, one can easily derive the identity
b_12•3 = b₁₂ − b_13•2b₃₂, which I have found extremely useful over the years. Here’s one thing you can see from this identity: the regression coefficient of X₁ on X₂ doesn’t change when X₃ is added into the regression equation, if either b₃₂ = 0, i.e., if X₂ and X₃ are orthogonal, or b_13•2 = 0. Another is the relation between adjusted and unadjusted means in ANCOVA. These identities are not hard to understand if you learn them when you are doing all your multiple regression computations with a mechanical calculator. Jowett showed us that if we use Jordan’s procedure for matrix inversion, “every intermediate quantity occurring in the calculation is either a partial regression coefficient or a partial covariance, and therefore of potential interest.” Try this step-by-step in R.
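The same step-by-step exercise can be sketched in plain Python. This is a minimal illustration under stated assumptions: the data are made up, and the pivot function uses one common sweep-style bookkeeping for Gauss-Jordan elimination, which is not necessarily the layout Jowett used. It pivots the covariance matrix of (X₁, X₂, X₃) first on X₂ and then on X₃, reads off the intermediate quantities, and checks both the identity above and the residuals-on-residuals view.

```python
# Illustrative only: made-up data; pivot() is one sweep-style convention.

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    """Sample covariance of two equal-length lists (divisor n - 1)."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

def pivot(a, k):
    """One Gauss-Jordan (sweep-style) pivot of square matrix a on index k.

    Returns a new matrix and leaves a unchanged.
    """
    n = len(a)
    d = a[k][k]
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == k and j == k:
                out[i][j] = -1.0 / d
            elif i == k or j == k:
                out[i][j] = a[i][j] / d          # becomes a regression coefficient
            else:
                out[i][j] = a[i][j] - a[i][k] * a[k][j] / d  # partial (co)variance
    return out

def residuals(y, x):
    """Residuals from the simple least-squares regression of y on x."""
    slope = cov(y, x) / cov(x, x)
    my, mx = mean(y), mean(x)
    return [(yi - my) - slope * (xi - mx) for yi, xi in zip(y, x)]

x1 = [3.1, 4.0, 6.2, 6.9, 9.8, 9.1, 12.5, 12.0]
x2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x3 = [2.0, 1.0, 4.0, 3.0, 7.0, 5.0, 9.0, 6.0]

cols = [x1, x2, x3]                      # indices 0, 1, 2 stand for X1, X2, X3
s = [[cov(u, v) for v in cols] for u in cols]

step1 = pivot(s, 1)                      # bring X2 into the equation
b12, b32 = step1[0][1], step1[2][1]      # simple regression coefficients on X2
s13_2 = step1[0][2]                      # partial covariance of X1, X3 given X2

step2 = pivot(step1, 2)                  # then bring X3 in as well
b12_3, b13_2 = step2[0][1], step2[0][2]  # partial regression coefficients

# Residuals on residuals: regress X_1.3 on X_2.3 to recover b_12.3.
r1, r2 = residuals(x1, x3), residuals(x2, x3)
b12_3_resid = cov(r1, r2) / cov(r2, r2)

print(f"b12 = {b12:.4f}, b32 = {b32:.4f}, s13.2 = {s13_2:.4f}")
print(f"b12.3 = {b12_3:.4f}, b13.2 = {b13_2:.4f}")
print(f"b12 - b13.2 * b32 = {b12 - b13_2 * b32:.4f}")
print(f"residual-on-residual slope = {b12_3_resid:.4f}")
```

In this bookkeeping the second pivot updates the X₁, X₂ entry from b₁₂ to b₁₂ − b_13•2b₃₂, so the identity is simply one step of the elimination written out.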

In a sense, our problems in interpreting regression coefficients are consequences of their simplicity when (X₁, X₂, … , X_p) are jointly normally distributed. In that case, everything works out so beautifully that we are seduced into thinking it applies more generally. But it doesn’t.

Next column: it’s why and how.

4 comments on “Terence’s Stuff: Multiple Linear Regression, 1”

Terence’s Stuff: Multiple Linear Regression, part 2 « IMS Bulletin

September 6, 2012 at 12:40 pm

[...] does it to my liking? I mentioned Mosteller & Tukey in my last piece on this topic, and once again I’m happy to say that they do a fine job on the different questions that lead us [...]

Understanding regression models and regression coefficients « Statistical Modeling, Causal Inference, and Social Science

January 5, 2013 at 2:43 pm

[...] connection with partial correlation and partial regression, Terry Speed’s column in the August IMS Bulletin (attached) is [...]

College Posts

July 18, 2013 at 10:56 pm

I always spend half an hour reading this weblog's articles, along with a mug of coffee.

Multiple Linear Regression Revisited | Honglang Wang's Blog

November 10, 2014 at 2:26 am

[…] from the multiple linear regression knowledge, we […]
