Question:
Least-squares regression line help?
lickitysplit
2007-03-12 16:17:08 UTC
Okay, at any point correct me if I'm wrong.

y = mx + b, correct?
m = r (sy/sx) and b = y-bar - m(x-bar).

Does the r mean residual? If so, according to my book, the residual is observed y minus predicted y.
Well, the problem I am trying to work through gives height (x) [27.75; 24.5; 25.5; 26; 25; 27.75; 26.5; 27; 26.75; 26.75; and 27.5] and head circumference (y) [17.5; 17.1; 17.1; 17.3; 16.9; 17.6; 17.3; 17.5; 17.3; 17.5; and 17.5]. They want me to find the least-squares regression line, but how am I supposed to do that if I don't know what the observed y and predicted y are? (I'm using Minitab, by the way.) Now, the answer in the back of the book is Head = 0.1827(Height) + 12.4932. But it doesn't say how it got that answer (specifically, how to get a residual from those two sets of numbers), and I need to know.

Any help would be much appreciated, particularly explaining this stuff in a way that a non-mathematical person would understand.
Three answers:
Hy
2007-03-12 16:28:34 UTC
I'm very rusty on this stuff, but no-one else has answered yet so let me inflict my ignorance on you.

I have a dim recollection that r is the correlation coefficient of x and y. ??? If that's wrong I can't help, because I don't remember anything about residuals (too many decades ago!)
bloggerdude2005
2007-03-12 16:39:26 UTC
Let me make your life easier before I go into all the mumbo jumbo. Go to Excel or any other tabular format and do this:



1. Plug in the observed X values into the formula:

.1827(height)+12.4932



And whatever value you get is the Predicted Y.

2. Place those values in a separate column.

3. The difference between the Actual Y (your Head Circumference values) and that column (the Predicted Head Circumference) is the residual.

4. In a separate column, square the residual.

5. Finally, add up the squared residuals. This is your SUM OF SQUARED ERRORS. Linear regression picks the line that minimizes this value, which is why it is called least-squares regression.
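If you would rather not build the columns by hand, the same five steps can be sketched in a few lines of Python. This is only an illustration: the variable names are made up here, and the slope and intercept are the rounded values from the back of the book.

```python
# Height (x) and head circumference (y) from the question.
height = [27.75, 24.5, 25.5, 26, 25, 27.75, 26.5, 27, 26.75, 26.75, 27.5]
head = [17.5, 17.1, 17.1, 17.3, 16.9, 17.6, 17.3, 17.5, 17.3, 17.5, 17.5]

# Steps 1-2: predicted Y from the book's (rounded) regression line.
predicted = [0.1827 * x + 12.4932 for x in height]

# Step 3: residual = observed Y - predicted Y, one per observation.
residuals = [y - p for y, p in zip(head, predicted)]

# Steps 4-5: square each residual, then add up the squares.
sse = sum(e ** 2 for e in residuals)

print(f"sum of squared errors = {sse:.4f}")
```

Each entry of `residuals` lines up with one height/head pair, just like the spreadsheet column would.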





You asked about m = r (sy/sx)



r here is the CORRELATION BETWEEN X AND Y!

sy and sx are the STANDARD DEVIATIONS of Y and X.

The Standard DEVIATION is simply:

Square Root of [ SUM of (Actual Value - Mean Value)^2 / (N-1) ]



where N is the number of observations you have, I think 11.

So you sum up the squared deviations from the mean, divide by N-1, and then take the square root. That is the Standard Deviation. Do this for both Y and X.
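Here is a minimal sketch of that standard deviation recipe in Python, applied to the height values. The variable names are just for illustration; `statistics.stdev` from the standard library uses the same N-1 formula, so it is printed alongside as a cross-check.

```python
import math
import statistics

height = [27.75, 24.5, 25.5, 26, 25, 27.75, 26.5, 27, 26.75, 26.75, 27.5]

n = len(height)                    # number of observations (11)
mean_x = sum(height) / n           # arithmetic mean

# Squared deviation of each value from the mean.
squared_devs = [(x - mean_x) ** 2 for x in height]

# Sum the squared deviations, divide by N-1, take the square root.
sd_x = math.sqrt(sum(squared_devs) / (n - 1))

print(f"by hand:           {sd_x:.4f}")
print(f"statistics.stdev:  {statistics.stdev(height):.4f}")
```

Swap in the head circumference values to get sy the same way.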

Step by step:



1. Find the Mean Y and Mean X (simply add up all X's or Y's, and divide by the number you have, to get the arithmetic mean).

2. For each Y or X, find the difference between that value and the mean you just calculated.

3. In a separate column, input the squared difference (answer to step 2 squared).

4. Add up the entire column of squared differences and divide by N-1, where N = the number of observations you have, 11.

5. Take the square root of that. Do this for both the X and Y values. These are sy (the standard deviation of Y) and sx (the standard deviation of X).

6. Do a correlation procedure, which I think Minitab should have as part of its features.

7. Multiply the correlation by the value (sy/sx), which you have from what you calculated.

8. This is your Beta coefficient, which means how much Y is expected to change for each 1 unit change in X.

9. The Intercept tells you where the regression line crosses the Y axis (that is, when x=0). The Intercept is 12.4932.
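The steps above can be sketched in Python roughly as follows. This is an illustration, not what Minitab does internally; the correlation is computed with the standard Pearson formula (sum of the products of the deviations, divided by (N-1)*sx*sy), which is an addition here since the steps just say to use a correlation procedure.

```python
import math

height = [27.75, 24.5, 25.5, 26, 25, 27.75, 26.5, 27, 26.75, 26.75, 27.5]
head = [17.5, 17.1, 17.1, 17.3, 16.9, 17.6, 17.3, 17.5, 17.3, 17.5, 17.5]
n = len(height)

# Step 1: means.
mean_x = sum(height) / n
mean_y = sum(head) / n

# Steps 2-5: sample standard deviations (divide by N-1).
sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in height) / (n - 1))
sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in head) / (n - 1))

# Step 6: Pearson correlation between X and Y.
cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(height, head))
r = cross / ((n - 1) * sd_x * sd_y)

# Steps 7-8: slope (Beta coefficient) = r * (sy / sx).
m = r * sd_y / sd_x

print(f"r = {r:.4f}, sy = {sd_y:.4f}, sx = {sd_x:.4f}, slope = {m:.4f}")
```

The slope should agree with the book's 0.1827 to four decimals, give or take rounding.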

How do you manually calculate the intercept? The regression line always passes through the point (Mean X, Mean Y), so plug in the Mean X as your X and the Mean Y as your Y, together with the Beta Coefficient you just calculated:



Mean Y=Intercept +Beta Coefficient*(Mean X)



Then simply do:

Y(mean)-BetaCoefficient*X(mean)

=Intercept.



Voila, you have your regression formula!
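Putting the whole thing together, a small sketch in Python might look like this. The function name `least_squares_line` is made up for this example, and it only covers the one-x case described here.

```python
import statistics

def least_squares_line(xs, ys):
    """Return (slope, intercept) for simple regression of ys on xs."""
    n = len(xs)
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    sd_x = statistics.stdev(xs)   # sample SD, divides by N-1
    sd_y = statistics.stdev(ys)
    # Pearson correlation between xs and ys.
    r = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / ((n - 1) * sd_x * sd_y)
    slope = r * sd_y / sd_x
    # Intercept = Mean Y - slope * Mean X.
    intercept = mean_y - slope * mean_x
    return slope, intercept

height = [27.75, 24.5, 25.5, 26, 25, 27.75, 26.5, 27, 26.75, 26.75, 27.5]
head = [17.5, 17.1, 17.1, 17.3, 16.9, 17.6, 17.3, 17.5, 17.3, 17.5, 17.5]

slope, intercept = least_squares_line(height, head)
print(f"Head = {slope:.4f}(Height) + {intercept:.4f}")
```

Note that the intercept uses the unrounded slope. If you plug in the rounded 0.1827 instead, the last decimal or two of the intercept can come out slightly different from the book's answer.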

Keep in mind these procedures only apply to the simple regression, where you have only one x. When you have more than one x, you cannot use the formula for the Beta coefficient from above.

All the data you already have: you know the Y and X mean values, and you just calculated the Beta Coefficient in Step 7.



This is all you need!





More Mumbo Jumbo (Optional)



The simple regression model is a regression of head circumference (y) on height (x). What this really means is that we want to find out what effect, if any, height has on head circumference.



The "m" here is the slope. As you may know, the slope defines the "rise over run". In regression models, m is the beta coefficient: for a 1 unit change in X, Y is expected to change by "beta".



Do you have Excel? I am not too familiar with Minitab, but Excel does have a regression option. In some regressions we assume that the Y intercept is 0, but in this case we don't make that assumption. Simply place all values for x and y in columns, and do a regression of the y on x.



The regression line here will be the line that minimizes the squared deviations of the predicted y values from the observed y values (that is, it minimizes the sum of squared errors, "sum of e^2"). The computer program will create its own predicted Y values: you don't have to do that part yourself. The entire purpose of the model is to get the predicted Y values as close as possible to the actual observed values, hence "least squares regression". Again, the computer does this part for you (it uses some tricky matrix algebra that most people would rack their brains over doing by hand).
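To see the "least squares" idea in action, here is a rough Python sketch that compares the sum of squared errors for the book's line against a few slightly different lines. The nudged slopes and intercepts are arbitrary values picked for illustration, and `sse` is a made-up helper name.

```python
height = [27.75, 24.5, 25.5, 26, 25, 27.75, 26.5, 27, 26.75, 26.75, 27.5]
head = [17.5, 17.1, 17.1, 17.3, 16.9, 17.6, 17.3, 17.5, 17.3, 17.5, 17.5]

def sse(intercept, slope):
    """Sum of squared errors of the line y = intercept + slope * x on the data."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(height, head))

# The book's line (rounded coefficients).
book = sse(12.4932, 0.1827)

# A few arbitrary nearby lines for comparison.
nudged = {
    "steeper slope": sse(12.4932, 0.1927),
    "flatter slope": sse(12.4932, 0.1727),
    "higher intercept": sse(12.5432, 0.1827),
    "lower intercept": sse(12.4432, 0.1827),
}

print(f"book's line: {book:.4f}")
for label, value in nudged.items():
    print(f"{label}: {value:.4f}")
```

Every nudged line should come out with a larger sum of squared errors than the book's line, which is the whole point of the method.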



So, for example, we model Y on X and get:



Y-actual = 12.4932 + .1827x + e   (in regressions we tend to place the Y intercept first)



Ok so that is the Y-actual, what we really have. Notice that above I added an "e" term, which is the error. That is because the regression almost never predicts the Y value exactly as observed. Does this make sense? If the line for the regression passed through each and every observed y value (the values you actually measure), there would be no error.



error=0.

Because:

[Y-observed=Intercept+BetaX+error]

-[Y-predicted=Intercept+BetaX]

=error



But in reality the errors are almost never all 0: some are positive and some are negative. Your residual is just the difference between the Observed Y and the Predicted Y. How do you get the residual? Very EASY. Just plug the X's into the model and get a Predicted Y. Observed minus Predicted is the residual (it's the error for each observation).
Eoas
2007-03-12 16:41:28 UTC
I just finished my assignment on regression and my teacher describes it in a different way. I don't think they mean residual by r in that formula. The formulas rely on the fact that the point (average x, average y) falls on the regression line, which makes sense because that is the line we are trying to fit. Measuring every x and y as a deviation from its average, the slope m is the summation of the products of the x,y pairs divided by the summation of the squared x values. With the slope you can get the intercept b, which is the average of the y values minus the slope times the average of the x values.
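A small sketch of this slope formula in Python, with x and y measured as deviations from their averages. For contrast it also computes the same ratio on the raw values, which gives the slope of a line forced through the origin rather than the regression slope. Variable names are made up for the example.

```python
height = [27.75, 24.5, 25.5, 26, 25, 27.75, 26.5, 27, 26.75, 26.75, 27.5]
head = [17.5, 17.1, 17.1, 17.3, 16.9, 17.6, 17.3, 17.5, 17.3, 17.5, 17.5]
n = len(height)

mean_x = sum(height) / n
mean_y = sum(head) / n

# Deviations from the averages.
dx = [x - mean_x for x in height]
dy = [y - mean_y for y in head]

# Slope = sum of products of deviations / sum of squared x deviations.
m = sum(a * b for a, b in zip(dx, dy)) / sum(a ** 2 for a in dx)

# Intercept = average y - slope * average x.
b = mean_y - m * mean_x

# Same ratio on the raw values: a line through the origin, not the same thing.
m_raw = sum(x * y for x, y in zip(height, head)) / sum(x ** 2 for x in height)

print(f"deviation form: slope = {m:.4f}, intercept = {b:.4f}")
print(f"raw values:     slope = {m_raw:.4f}")
```

The deviation form gives the same slope as r(sy/sx), because the N-1 factors cancel out.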



hope this helps


This content was originally posted on Y! Answers, a Q&A website that shut down in 2021.