Who invented linear regression?
Why not just use a constant forecast, such as the variable's own mean? More precisely, we hope to find a model whose prediction errors are smaller, in a mean square sense, than the deviations of the original variable from its own mean. In using linear models for prediction, it turns out, very conveniently, that the only statistics of interest (at least for the purpose of estimating coefficients) are the mean and variance of each variable and the correlation coefficient between each pair of variables.
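As a minimal sketch of that claim (the numbers are made up purely for illustration, and the slope/intercept formula used here is the one derived later in this section), the fitted line can be rebuilt from just the two means, the two standard deviations, and the correlation, without looking at the raw data again:

```python
import numpy as np

# Made-up data for two variables X and Y.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.8])

# The only summaries needed to estimate the coefficients.
mean_x, mean_y = x.mean(), y.mean()
sd_x, sd_y = x.std(), y.std()
r = np.corrcoef(x, y)[0, 1]

# Slope and intercept built from the summaries alone
# (this formula is derived later in the section).
slope = r * sd_y / sd_x
intercept = mean_y - slope * mean_x

# They match a direct least-squares fit on the raw data.
slope_direct, intercept_direct = np.polyfit(x, y, deg=1)
print(slope, intercept)
print(slope_direct, intercept_direct)
```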
The coefficient of correlation between X and Y is commonly denoted by the letter r. The correlation coefficient between two variables is a statistic that measures the strength of the linear relationship between them on a relative, unit-free scale. That is, it measures the extent to which a linear model can be used to predict the deviation of one variable from its mean, given knowledge of the other's deviation from its mean at the same point in time.
The correlation coefficient is most easily computed if we first standardize each of the variables--that is, convert each value into its number of standard deviations from its own mean, (x − x̄)/sX for X and (y − ȳ)/sY for Y. The correlation coefficient is then simply the average product of the standardized values of the two variables: pair up the standardized values of X and Y observation by observation, multiply each pair, and average the products. That average is the correlation between X and Y.
If the two variables tend to vary on the same sides of their respective means at the same time, then the average product of their deviations, and hence the correlation between them, will be positive, since the product of two numbers with the same sign is positive. Conversely, if they tend to vary on opposite sides of their respective means at the same time, their correlation will be negative. If they vary independently with respect to their means--that is, if one is equally likely to be above or below its mean regardless of what the other is doing--then the correlation will be zero.
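A minimal sketch of this computation (the data are invented for illustration): standardize each variable, average the pairwise products, and compare with the usual correlation routine; flipping one variable's direction flips the sign.

```python
import numpy as np

x = np.array([3.0, 5.0, 2.0, 8.0, 7.0, 6.0])
y = np.array([30.0, 44.0, 25.0, 71.0, 60.0, 52.0])

# Standardize: deviations from the mean, in units of standard deviations.
x_std = (x - x.mean()) / x.std()
y_std = (y - y.mean()) / y.std()

# Correlation as the average product of standardized values.
r = np.mean(x_std * y_std)
print(r)                          # close to +1: X and Y move together
print(np.corrcoef(x, y)[0, 1])    # same value from numpy's built-in

# Reversing one variable reverses the sign of the correlation.
print(np.mean(x_std * (-y_std)))  # equals -r
```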
The correlation coefficient is not only the average product of the standardized values; it is also the "best" coefficient by which to multiply the standardized value of one variable in order to predict the standardized value of the other. Thus, if X is observed to be 1 standard deviation above its own mean, then we should predict that Y will be r standard deviations above its own mean; if X is 2 standard deviations below its own mean, then we should predict that Y will be 2r standard deviations below its own mean, and so on.
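To see why r is the optimal multiplier, here is a minimal derivation for the single-predictor case (the symbols x*, y* for the standardized values, b for the candidate coefficient, and E[·] for the average over the observations are introduced here for convenience):

```latex
% x^* and y^* are the standardized values of X and Y, so
%   E[x^*] = E[y^*] = 0,  E[(x^*)^2] = E[(y^*)^2] = 1,  E[x^* y^*] = r.
% For a candidate prediction  \hat{y}^* = b\,x^* , the mean squared error is
f(b) = E\!\left[(y^* - b\,x^*)^2\right]
     = E[(y^*)^2] - 2b\,E[x^* y^*] + b^2\,E[(x^*)^2]
     = 1 - 2br + b^2 .
% Setting the derivative to zero,
f'(b) = 2b - 2r = 0 \quad\Longrightarrow\quad b = r ,
% and since  f''(b) = 2 > 0 , the choice  b = r  minimizes the mean squared error.
```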
Note: this fact is not supposed to be obvious, but it is easily proved with elementary differential calculus (the sketch above handles the single-predictor case). If we want a formula for predicting Y from X in unstandardized terms, we just need to substitute the formulas for the standardized values into the preceding prediction rule, which then becomes:

(Ŷ − Ȳ)/sY = r · (X − X̄)/sX,

where X̄ and Ȳ are the sample means and sX and sY are the sample standard deviations.
If we now rearrange this equation and collect constant terms, we obtain:

Ŷ = a + b·X, where b = r·(sY/sX) and a = Ȳ − b·X̄.

In other words, the slope and intercept of the prediction line are determined entirely by the means, standard deviations, and correlation mentioned earlier.

At the turn of the 19th century, improving ocean navigation was perhaps the most important practical scientific problem of the day.
The Age of Discovery had led to great riches and lucrative trade, but sea travel was still dangerous and navigation imprecise. Improved technology in this area was worth a lot of money.
With greater navigational precision, ships -- and their cargo -- would be more likely to reach their intended destination safely and quickly. Given the massive economic rewards of better navigation, geodesy, the science of measuring the Earth, was all the rage.
This led to better mapping and improved knowledge of location, which in turn made it easier to find your way quickly and safely from Portugal to India. Monarchs and noblemen were happy to support research in this area. It was in this historical context that the mathematicians Carl Friedrich Gauss and Adrien-Marie Legendre independently discovered the method of least squares, the essential feature of statistical regression.
Least squares is a way to use data to make quantitative predictions.

[Figure: the dataset used for the first publicly demonstrated statistical regression, by the early 19th-century mathematician Adrien-Marie Legendre.]

Imagine that you have a classroom of 5th graders. You are given the gender, height, and weight of all the students, and you want a rule for predicting a student's weight from his or her height. There are all sorts of optimality criteria you could choose.
You might like the criterion that minimizes the average absolute error of your guesses, or maybe one that minimizes the chance of being off by more than 10 pounds.
The least squares method chooses the prediction rule that minimizes the sum of squared errors. So what makes squared error so special? Why did both Gauss and Legendre choose it, independently? There are two main reasons that squared error was almost immediately accepted by the mathematical community.
First, at that time, and to a lesser degree today, it was comparatively easy to compute: minimizing squared error leads to a system of linear equations with a closed-form solution. Second, the estimate based on least squares has some nifty statistical properties; for example, when the errors are normally distributed it is also the maximum-likelihood estimate.
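To make the contrast between criteria concrete, here is a minimal sketch on the classroom example (all height and weight numbers are invented, and scipy's general-purpose optimizer stands in for a dedicated least-absolute-deviation routine): the squared-error line has a closed-form solution, while the absolute-error line must be found by numerical search, and the two lines generally differ slightly.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical 5th-grade data: heights in inches, weights in pounds.
height = np.array([52.0, 54, 55, 57, 58, 60, 61, 63])
weight = np.array([63.0, 70, 74, 76, 82, 88, 95, 96])

def sum_abs_error(params):
    a, b = params
    return np.sum(np.abs(weight - (a + b * height)))

# Squared error has a closed-form minimizer (the least-squares line) ...
b_ls, a_ls = np.polyfit(height, weight, deg=1)

# ... while the absolute-error criterion needs a numerical search.
res = minimize(sum_abs_error, x0=[a_ls, b_ls], method="Nelder-Mead")
a_lad, b_lad = res.x

print(f"least squares:            weight ~ {a_ls:.1f} + {b_ls:.2f} * height")
print(f"least absolute deviation: weight ~ {a_lad:.1f} + {b_lad:.2f} * height")
```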
So the name "linear regression" historically refers both to the linear algebra used to solve the estimation equations and to the linear form of the model itself. Linear equations can also be solved by iterative, non-linear methods, but there is seldom any reason to do so now.
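For instance, here is a minimal sketch (made-up data again) contrasting the closed-form least-squares solution with a plain gradient-descent fit; both arrive at essentially the same line, which is why the iterative route is rarely needed for a purely linear model.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.0, 2.8, 4.1, 5.2, 5.9, 7.1])

# Closed-form solution via the normal equations.
X = np.column_stack([np.ones_like(x), x])          # design matrix [1, x]
coef_closed, *_ = np.linalg.lstsq(X, y, rcond=None)

# Iterative solution: plain gradient descent on the mean squared error.
coef_gd = np.zeros(2)
lr = 0.01
for _ in range(20000):
    resid = X @ coef_gd - y
    grad = 2 * X.T @ resid / len(y)
    coef_gd -= lr * grad

print(coef_closed)   # [intercept, slope]
print(coef_gd)       # essentially the same numbers
```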