## Want to join the conversation?

Ethan Dlugie

Posted 12 years ago.

How do we know that the squared error of the line is always less than (or equal to) the variance in the y values?

(40 votes)

Mohammed Rahman

Posted 11 years ago.

The regression line is, by construction, the best-fitting line for a given scatter plot. If you draw the mean of Y, that is just a horizontal line in the plot, and that horizontal line can't fit better than the regression line, because it is itself one of the candidate lines (slope 0, intercept equal to the mean of Y) that the regression line beat or tied. So the total squared error from the mean (SE Y) is always greater than or equal to the total squared error from the regression line (SE Line). Therefore SE Line / SE Y can never be greater than 1, and hence R^2 can't be negative. Hope that helps.

(70 votes)
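
The argument above can be sketched in symbols. The notation $SE_{\text{line}}$ and $SE_{\bar{y}}$ is an assumption introduced here, standing for the squared error from the regression line and from the mean of y. The horizontal line through the mean is the candidate line with $m = 0$ and $b = \bar{y}$, so the minimum over all lines cannot exceed its error:

```latex
\[
SE_{\text{line}}
  = \min_{m,\,b} \sum_{i=1}^{n} \bigl(y_i - (m x_i + b)\bigr)^2
  \;\le\; \sum_{i=1}^{n} \bigl(y_i - (0 \cdot x_i + \bar{y})\bigr)^2
  = SE_{\bar{y}}
\]
\[
\Rightarrow\quad 0 \le \frac{SE_{\text{line}}}{SE_{\bar{y}}} \le 1
\quad\text{and}\quad
0 \le R^2 = 1 - \frac{SE_{\text{line}}}{SE_{\bar{y}}} \le 1
\qquad (\text{assuming } SE_{\bar{y}} > 0).
\]
```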

daniel.corazza.89

Posted 9 years ago.

Hello everyone. I have some troubles understanding the concepts explained in this video on a deeper level. I would like to discuss some things.

In this video Sal talks a lot about the "variation in y" and "how much of the variation in y is described". However, I found myself wondering what is really this variation in y, what does it describe? Why do we care about this number?

I've always thought that the variance (or variation) of something is important when that something has a central tendency, and points tend to scatter randomly around that centre. The variance helps you quantify how much those points scatter around.

Here, however, we have y's that are positively correlated to the x's, which means that if you pick higher and higher values for x, you also get higher and higher values for y. So, there is really no central tendency for the y values, and in fact, the values you calculate for the "mean_y" and the "variation in y" will vary depending on which x values you choose. If we take points that have higher x's, our mean_y will increase, and if we take points with a wider range in x, our "variation in y" will also increase! So it seems to me that this "variation in y" has really no meaning in this context - it's an arbitrary number that depends on which x values we happen to choose. So why would we care about how big this random number we calculate and call "variation in y" is, and how much of it is "explained" - whatever that even means?

Now, I've managed to explain to myself what's been done here in a different way, and this kind of makes intuitive sense to me. Unlike the variation in y, the squared error of the line (SE_line) is a much more significant concept in this context. It measures the error that one commits with their estimate of the relation between x and y (the regression line).

The variation in y, as it was defined, measures the error from the mean_y. So, this is equivalent to the error that one commits if they fit the points with a horizontal line y = mean_y. Now that makes sense to me, it's what one would do if they had no better tools for fitting lines to points than saying "we want to fit a line to a bunch of points? Hey, why don't we just take a horizontal line that goes through the mean of the y values we have".

In fact, y = mean_y is the line of the form y = constant that minimizes the SE. So, it's the best line of its kind. Still, a constant line is the most basic model one could come up with, as a linear function, an exponential function, a quadratic function all can adapt better to points and have more "degrees of freedom" (more parameters to be played with) than a line y = constant. So, SE_y can be seen as the error that is committed by fitting points with the worst - or most basic - model available.

If we see things this way, SE_line / SE_y kind of measures how much of a better fit we have with our model compared to the most-basic model available.

Does that not make more intuitive sense?

(20 votes)
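
The claim in the post above that y = mean_y is the best line of the form y = constant can be checked with a short calculus sketch. The function name $f$ and the constant $c$ are notation introduced here for illustration only:

```latex
\[
f(c) = \sum_{i=1}^{n} (y_i - c)^2,
\qquad
f'(c) = -2 \sum_{i=1}^{n} (y_i - c) = 0
\;\Longrightarrow\;
c = \frac{1}{n} \sum_{i=1}^{n} y_i = \bar{y},
\qquad
f''(c) = 2n > 0 .
\]
```

Since the second derivative is positive, $c = \bar{y}$ is the minimum, which is why the squared error from the mean is a natural baseline to compare the regression line against.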

Pranu

Posted a year ago.

Yes, that was my question as well! I hope it gets answered soon.

(2 votes)

Anshu Dwibhashi

Posted 10 years ago.

Why do we minimize r squared instead of r? I mean, why do we minimize the square distance instead of just the distance? Is it just for accuracy or something deeper?

(8 votes)

Madhur Devkota

Posted 11 years ago.

Why are we comparing squared error (**S.E. line**) with total variation in y (**S.E. y**)?

Though it seems convincing and logical to compare it with **S.E. y**, my question is: **S.E. y** may not be a perfectly absolute domain for **S.E. line**.

(12 votes)

shimmichimmi

Posted 11 years ago.

Can we say that the higher the value of R2, the greater the probability the model is correct?

And is the most important factor when comparing a model with any others to find the highest R2?

(5 votes)

fosterz

Posted 11 years ago.

Not necessarily. R2 only measures how well a line approximates points on a graph. It is NOT a probability value. How likely it is that a model is correct depends on many things and is the subject of hypothesis testing (covered in future videos). It is possible (and common, even in science) that a linear model describes the data very well even though it is the wrong model for whatever process generated the data.

Say I am trying to model outdoor air temperature over time, but I only measure air temperatures once a day, and only during spring. If the data turns out linear (it probably would not), a best fit line could have a high R2, but the line would not describe the 24h variation in temperature caused by night/day cycles. More importantly, the line would predict that temperature increases forever (since it was warming in spring, when we sampled), which clearly is not true, even under the most dire global warming predictions ;). R2 only matters if you pick the right model and sample at the right resolution.

(13 votes)
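
A minimal sketch of the point above, under stated assumptions: the data below are hypothetical (y = x^2 sampled at x = 1..10, not temperature readings), and the variable names are illustrative. It fits a least-squares line to data that is clearly not linear and shows that R2 can still come out high.

```python
# Hypothetical data: y = x^2 sampled at x = 1..10 (a curve, not a line).
xs = [float(x) for x in range(1, 11)]
ys = [x * x for x in xs]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares slope and intercept (centered form of the usual formulas).
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
s_xx = sum((x - mean_x) ** 2 for x in xs)
m = s_xy / s_xx
b = mean_y - m * mean_x

# Squared error from the line, squared error from the mean, and R^2.
residuals = [y - (m * x + b) for x, y in zip(xs, ys)]
se_line = sum(r ** 2 for r in residuals)
se_y = sum((y - mean_y) ** 2 for y in ys)
r2 = 1 - se_line / se_y

print(round(r2, 3))                      # high, although the true relationship is quadratic
print([round(r, 1) for r in residuals])  # positive at both ends, negative in the middle
```

The systematic pattern in the residuals (rather than random scatter) is the hint that a straight line is the wrong model here, despite the high R2.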

Bharath Rangarajan

Posted 13 years ago.

Issue: Variation in x does not refer to the line. Solution: M and b provide the best match to variation in y using a straight line model. Since data is not on a line, a line is not a perfect explanation of the data or a perfect match to variation in y. R-squared is comparing how much of true variation is in fact explained by the best straight line provided by the regression model. If R-squared is very small then it indicates you should consider models other than straight lines.

(7 votes)

tbeatty

Posted 13 years ago.

If R-squared is close to zero, a line may not be appropriate (if the data is non-linear), or the explanatory variable just doesn't do much explaining when it comes to the response variable (y-variable). In that case, you should consider adding another explanatory variable (multiple regression), or find a new explanatory variable altogether.

(5 votes)

sherrellbc

Posted 11 years ago.

6:40

This makes absolutely no sense at all. How does the mean of y (y_bar) subtracted from any given y represent an error? What does the average y (y_bar) represent? What if, as x increases, there *IS* an upward trend?

This would make sense if the y value was a constant, say 6. You could measure the total error by taking the difference of each measured y and the value 6. The average, at least to me, really does not represent anything. So, how can a measured value of y over the average of all measured y's represent an error of anything? If the measured y's were for the same x value, then a variation in y could be measured as an error. But if the y has a relationship with x such that it increases as x increases, how does y/y_bar represent error in any sense?

-----------------------------------------

For example: You are given an unknown resistance. You decide to experimentally determine the resistance of the component by measuring its i-V (current, voltage) curve (response).

Given that X is voltage, and Y is current, you may measure something like this:

*In an ideal case:*

X = 10V, Y = 1Amp

X = 20V, Y = 2Amp

X = 30V, Y = 3Amp

If you plot this curve, there is quite obviously a linear relationship. And, if you are familiar with Ohm's relationship (LAW, if you like), we have the resistance = 10 Ohms. The point is, as voltage increases, current increases as well for any constant resistance R. So, we have a positively sloping linear relationship.

So, from the ideal case above.

y_bar = 2 Amps.

So, given what we have in this video:

The total error associated with our measured values(current, Y), is given by:

(y1-y_bar)^2 + (y2-y_bar)^2 + (y3-y_bar)^2 = (1-2)^2 + (2-2)^2 + (3-2)^2 = 2

Given an ideal world, where the resistance was EXACTLY equal to 10 Ohms, and we measured precisely the expected values of current needed to resolve this, how can we say that the measured data had a total error associated with our measured values of current equal to 2?

(2 votes)

Dr C

Posted 11 years ago.

You raised a number of points here, I'll try to address them all:

> "How does the mean of y (y_bar) subtracted from any given y represent an error?"

When we say "error" we're really meaning "deviation," specifically, deviation *from the mean*. Ybar is a measure of center, or a "typical" value, and the deviations of (Y - Ybar) can be used to give us some idea of the spread around that measure of center.

> "What does the average y (y_bar) represent?"

It represents the norm, or a "typical" y-value.

> "What if, as x increases, there IS an upward trend?"

That would be an indication of positive correlation between X and Y.

> "This would make sense if the y value was a constant, say 6. You could measure the total error by taking the difference of each measured y and the value 6."

I think this is where you may be going astray. If the y-value was a constant, like 6, there would be no variability, all the y-values would be 6. Just associate Ybar, the average, with this value of 6 that you conjecture.

The idea in correlation is to measure above average vs below average for both X and Y. Correlation is looking at when values are above/below average - meaning: higher than normal or lower than normal, and it is looking at this for both X and Y *simultaneously*. In a sense, it's asking the question "*Are larger above-average Y-values associated with above-average X-values?*" This is why we care about and need the average.

In your example with Ohms, you only calculated what we'd call the Sum of Squares for Y: SUM{ (Yi - Ybar)^2 }. This is close to the variance. You obtained a 2 (which is correct). And this IS a measure of variability for Y: not all of the Y-values are equal, so there is variability among the Y's! When we take X into account, we'd see that we have a *deterministic* relationship, but if we look at each variable alone, there are differences, and hence variability.

(11 votes)
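
As a rough check of the resistor example discussed above, the sketch below computes both quantities for the three ideal (voltage, current) points from the question. The variable names are illustrative assumptions; the numbers are the ones given in the thread.

```python
# Data from the ideal-resistor example in the thread: voltage (X) and current (Y).
volts = [10.0, 20.0, 30.0]
amps = [1.0, 2.0, 3.0]

n = len(volts)
mean_x = sum(volts) / n
mean_y = sum(amps) / n

# Total variation in Y alone: sum of squared deviations from the mean of Y.
se_y = sum((y - mean_y) ** 2 for y in amps)

# Least-squares line through the points.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(volts, amps))
s_xx = sum((x - mean_x) ** 2 for x in volts)
m = s_xy / s_xx
b = mean_y - m * mean_x

# Variation left over once X is taken into account.
se_line = sum((y - (m * x + b)) ** 2 for x, y in zip(volts, amps))
r2 = 1 - se_line / se_y

print(se_y, se_line, r2)
```

The total variation in Y is 2, as computed in the question, but essentially none of it is left unexplained by the line, so R^2 comes out at (or within rounding of) 1. The "2" measures how much the currents differ from one another, not how wrong the measurements are.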

Jeffrey Wan

Posted 11 years ago.

I don't exactly see why we are comparing SEline to SEy. Why do we care about SEy here?

(5 votes)

fosterz

Posted 11 years ago.

The r-squared coefficient is the percentage of the variation in y that is "explained" by the line, compared to how much the average of y explains. You could also think of it as how much closer the line is to the points, overall, than the average value of y is. SEy is the total variation in y (the sum of squared distances from the mean of y) and tells you how much the data deviates from the mean of y. The variation in y gives you a baseline by which to judge how much better the best fit line fits the data compared to the y average.

(4 votes)

Lois Duhourcau

Posted 7 years ago.

First of all, I feel this video is genius with the pictorial description of the errors vs. the mean and the errors vs. the regression line. However, I am slow, and I am lost at the following step: when we say that SE(Line) shows what is NOT explained by the regression line, and therefore SE(Line)/SE(y) is the % of variation that is not explained by the variation in x.

A few questions:

1. Why is the variation in x equivalent to the regression line? Is this just linguistics to say that the regression line is the result of independent variable x to obtain y, so the regression line describes the variation in x and its impact on y?

2. SE(Line) / SE(y) essentially is equivalent to: the errors of the regression (which are thus not explained by the regression) divided by the total variation. For that reason, we know what % is not explained by the regression, and then we can deduce what is explained by the regression. Is that the right way to think about it?

(6 votes)

Parsa Abangah

Posted a year ago.

1. Yes, the variation in x is equivalent to the regression line because the regression line is a function of the independent variable x. The regression line describes the relationship between x and y and summarizes the overall pattern of the data. The variation in x is important to consider because it helps explain the variability in y.

2. SE(Line)/SE(y) gives the proportion of the total variation in y that is not explained by the regression line. This proportion represents the amount of variation in y that is not accounted for by the variation in x. Therefore, we can say that the remaining proportion of variation in y, 1 - SE(Line)/SE(y), is explained by the regression line. SE(Line)/SE(y) is essentially a measure of how poorly the regression line fits the data: the closer it is to 0 (equivalently, the closer R^2 = 1 - SE(Line)/SE(y) is to 1), the better the fit.

(2 votes)

Adriano Viann

Posted 10 years ago.

What if the SE is higher than 1? Will (1 - SE/SEY) still make sense?

(2 votes)

Dr C

Posted 10 years ago.

The ratio of SE(line) / SE(Y) cannot be larger than 1. If the X-variable tells us absolutely nothing about the Y-variable, then the two quantities will be equal, so the largest value SE(line) / SE(Y) can take on is 1.

(4 votes)
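
A small sketch of the boundary case described above, using hypothetical data chosen so that x carries no linear information about y. The numbers and variable names are assumptions for illustration; in this case the best-fit line is flat and sits exactly on the mean of y.

```python
# Hypothetical data in which x has no linear relationship with y.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 2.0, 2.0, 1.0]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares slope and intercept.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
s_xx = sum((x - mean_x) ** 2 for x in xs)
m = s_xy / s_xx          # slope is zero for this data
b = mean_y - m * mean_x  # so the line is just y = mean_y

se_line = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
se_y = sum((y - mean_y) ** 2 for y in ys)
ratio = se_line / se_y

print(m, ratio, 1 - ratio)
```

Here SE(line) equals SE(Y), so the ratio reaches its largest possible value of 1 and R-squared is 0.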

## Video transcript

In the last few videos, we saw that if we had n points, each of them has x and y-coordinates. Let me draw n of those points. So let's call this point one. It has coordinates x1, y1. You have the second point over here. It has coordinates x2, y2. And we keep putting points up here and eventually we get to the nth point. That has coordinates xn, yn.

What we saw is that there is a line that we can find that minimizes the squared distance. This line right here, I'll call it y, is equal to mx plus b. There's some line that minimizes the squared distance to the points. And let me just review what those squared distances are. Sometimes, it's called the squared error. So this is the error between the line and point one. So I'll call that error one. This is the error between the line and point two. We'll call this error two. This is the error between the line and point n.

So if you wanted the total error, if you want the total squared error-- this is actually how we started off this whole discussion-- the total squared error between the points and the line, you literally just take the y value of each point. So for example, you would take y1. That's this value right over here. You take y1 minus the y value at this point in the line. Well, that point in the line is, essentially, the y value you get when you substitute x1 into this equation. So I'll just substitute x1 into this equation. So minus (m x1 plus b). This right here, that is this y value right over here. That is m x1 plus b. I don't want to get my graph too cluttered. So I'll just delete that there. That is error one right over there.

And we want the squared errors between each of the points and the line. So that's the first one. Then you do the same thing for the second point. And we started our discussion this way. y2 minus (m x2 plus b), squared, all the way-- I'll do dot dot dot to show that there are a bunch of these that we have to do until we get to the nth point-- all the way to yn minus (m xn plus b), squared.
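
Written out in symbols, the total squared error described above looks like the following. The label $SE_{\text{line}}$ is notation assumed here for the "squared error of the line" in the transcript:

```latex
\[
SE_{\text{line}}
  = \bigl(y_1 - (m x_1 + b)\bigr)^2
  + \bigl(y_2 - (m x_2 + b)\bigr)^2
  + \cdots
  + \bigl(y_n - (m x_n + b)\bigr)^2
  = \sum_{i=1}^{n} \bigl(y_i - (m x_i + b)\bigr)^2
\]
```
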
And now that we actually know how to find these m's and b's, I showed you the formula. And in fact, we've proved the formula. We can find this line. And if we want to say, well, how much error is there? We can then calculate it. Because we now know the m's and the b's. So we can calculate it for a certain set of data.

Now, what I want to do is kind of come up with a more meaningful estimate of how good this line is fitting the data points that we have. And to do that, we're going to ask ourselves the question, what percentage of the variation in y is described by the variation in x? So let's think about this. How much of the total variation in y-- there's obviously variation in y. This y value is over here. This point's y value is over here. There is clearly a bunch of variation in the y. But how much of that is essentially described by the variation in x? Or described by the line? So let's think about that.

First, let's think about what the total variation is. How much of the total variation in y? So let's just figure out what the total variation in y is. It's really just a tool for measuring. When we think about variation, and this is even true when we thought about variance, which was the mean variation in y. If you think about the squared distance from some central tendency, and the best central measure we can have of y is the arithmetic mean. So we could just say, the total variation in y is just going to be the sum of the squared distances of each of the y's from their mean. So you get (y1 minus the mean of all the y's) squared. Plus (y2 minus the mean of all the y's) squared. Plus, and you just keep going all the way to the nth y value. To (yn minus the mean of all the y's) squared. This gives you the total variation in y.

You can just take out all the y values. Find their mean. It'll be some value, maybe it's right over here someplace. And so you can even visualize it the same way we visualized the squared error from the line. So if you visualize it, you can imagine a line that's y is equal to the mean of y.
Which would look just like that. And what we're measuring over here, this error right over here, is the square of this distance right over here. Between this point vertically and this line. The second one is going to be this distance. Just right up to the line. And the nth one is going to be the distance from there all the way to the line right over there. And there are these other points in between. This is the total variation in y. Makes sense. If you divide this by n, you're going to get what we typically associate as the variance of y, which is kind of the average squared distance. Now, we have the total squared distance.

So what we want to do is-- how much of the total variation in y is described by the variation in x? So maybe we can think of it this way. So our denominator, we want what percentage of the total variation in y? Let me write it this way. Let me call this the squared error from the average. Maybe I'll call this the squared error from the mean of y. And this is really the total variation in y. So let's put that as the denominator. The total variation in y, which is the squared error from the mean of the y's.

Now we want to know what percentage of this is described by the variation in x. Now, what is not described by the variation in x? We want to know how much is described by the variation in x. But what if we want to know how much of the total variation is not described by the regression line? Well, we already have a measure for that. We have the squared error of the line. This tells us the square of the distances from each point to our line. So it is exactly this measure. It tells us how much of the total variation is not described by the regression line. So if you want to know what percentage of the total variation is not described by the regression line, it would just be the squared error of the line, because this is the total variation not described by the regression line, divided by the total variation. So let me make it clear.
This, right over here, tells us what percentage of the total variation is not described by the variation in x. Or by the regression line. So to answer our question, what percentage is described by the variation? Well, the rest of it has to be described by the variation in x. Because our question is what percent of the total variation is described by the variation in x. This is the percentage that is not described. So if this number is 30%-- if 30% of the variation in y is not described by the line, then the remainder will be described by the line. So we could essentially just subtract this from 1.

So if we take 1 minus the squared error between our data points and the line over the squared error between the y's and the mean y, this actually tells us what percentage of total variation is described by the line. You can either view it as described by the line or by the variation in x. And this number right here, this is called the coefficient of determination. It's just what statisticians have decided to name it. And it's also called R-squared. You might have even heard that term when people talk about regression.

Now let's think about it. If the squared error of the line is really small, what does that mean? It means that these errors, right over here, are really small. Which means that the line is a really good fit. So let me write it over here. If the squared error of the line is small, it tells us that the line is a good fit. Now, what would happen over here? Well, if this number is really small, this is going to be a very small fraction over here. 1 minus a very small fraction is going to be a number close to 1. So then, our R-squared will be close to 1, which tells us that a lot of the variation in y is described by the variation in x. Which makes sense, because the line is a good fit.

You take the opposite case. If the squared error of the line is huge, then that means there's a lot of error between the data points and the line.
So if this number is huge, then this number over here is going to be huge. Or it's going to be a percentage close to 1. And 1 minus that is going to be close to 0. And so if the squared error of the line is large, this whole thing's going to be close to 1. And if this whole thing is close to 1, the whole coefficient of determination, the whole R-squared, is going to be close to 0, which makes sense. That tells us that very little of the total variation in y is described by the variation in x, or described by the line.

Well, anyway, everything I've been dealing with so far has been a little bit in the abstract. In the next video, we'll actually look at some data samples and calculate their regression line. And also calculate the R-squared, and see how good of a fit it really is.
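
The recipe in the transcript, R-squared = 1 - (squared error of the line) / (squared error from the mean of y), can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name `r_squared` and the five data points are hypothetical (not from the video), and the slope is computed with a centered form of the least-squares formula, which should be algebraically equivalent to the one derived in the earlier videos.

```python
def r_squared(xs, ys):
    """Return (m, b, r2) for the least-squares line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Least-squares slope and intercept.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    m = s_xy / s_xx
    b = mean_y - m * mean_x

    # Squared error from the line (variation NOT described by the line).
    se_line = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    # Squared error from the mean of y (total variation in y).
    se_y = sum((y - mean_y) ** 2 for y in ys)

    return m, b, 1 - se_line / se_y


# Hypothetical sample data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

m, b, r2 = r_squared(xs, ys)
print(round(m, 3), round(b, 3), round(r2, 3))  # 0.6 2.2 0.6
```

For this sample, about 60% of the total variation in y is described by the line, and the remaining 40% is the squared error of the line relative to the total variation.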