Linear regression analysis
Regression analysis goes beyond correlation and attempts to identify the extent to which the independent variable determines (or predicts or forecasts) the dependent variable. Although this technique identifies a functional relationship between the independent and dependent variables, it does not require the independent variable to be the cause of changes in the dependent variable.
For example, it may be possible to use regression analysis to identify a linear relationship between a person's age and his or her annual income. In this example, age is the independent variable and annual income is the dependent variable. Age could be used to predict annual income, but annual income does not predict age! Furthermore, age is not necessarily the cause of changes in annual income.
Linear regression is represented as a function, with X as the independent variable and Y as the dependent variable:
Y = a + bX
In the regression function, b is the slope and a is the intercept value.
The regression function is calculated in an R script by using the lm command.
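As a minimal sketch of the lm command, the following R script fits the regression function to a small set of hypothetical age and income values. The data and the variable names age, income, and fit are assumptions made for illustration only; they are not real measurements.

```r
# Hypothetical illustrative data (not real measurements)
age    <- c(25, 30, 35, 40, 45)                 # independent variable X
income <- c(30000, 36000, 41000, 47000, 52000)  # dependent variable Y

# Fit Y = a + bX by ordinary least squares
fit <- lm(income ~ age)

a <- coef(fit)[["(Intercept)"]]  # intercept a
b <- coef(fit)[["age"]]          # slope b

# Show both coefficients
print(coef(fit))
```

The formula income ~ age reads as "income explained by age", so the dependent variable goes on the left of the tilde and the independent variable on the right.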
The regression function can be used to determine (predict or forecast) the dependent value for a given independent value. However, the function should not simply be inverted to determine the independent value from a given dependent value! Regressing X on Y generally produces a different line than regressing Y on X. Additionally, forecasting the dependent value is not the same as causing the change in the dependent value.
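Forecasting a dependent value might look like the following sketch, which uses the predict command on a fitted model. The data are the same hypothetical age and income values, and the names new_ages and predicted are assumptions chosen for this example.

```r
# Hypothetical illustrative data (not real measurements)
age    <- c(25, 30, 35, 40, 45)
income <- c(30000, 36000, 41000, 47000, 52000)

fit <- lm(income ~ age)

# New independent values, inside the range of the observed ages.
# The column name must match the variable name used in the formula.
new_ages <- data.frame(age = c(33, 38))

# Forecast the dependent value (income) for each new age
predicted <- predict(fit, newdata = new_ages)
print(predicted)
```

Forecasts for ages far outside the observed range are extrapolations and should be treated with more caution than forecasts inside it.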
The variables and regression line may be combined in one chart for visual comparison.
The regression line may be added to an existing chart in an R script by using the abline command.
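One way to combine the data and the regression line in a single chart is sketched below. It assumes the same hypothetical age and income values; the axis labels, title, and line color are arbitrary choices for this example.

```r
# Hypothetical illustrative data (not real measurements)
age    <- c(25, 30, 35, 40, 45)
income <- c(30000, 36000, 41000, 47000, 52000)

fit <- lm(income ~ age)

# Scatter plot of the observed values
plot(age, income,
     xlab = "Age (years)",
     ylab = "Annual income",
     main = "Annual income by age")

# Draw the fitted regression line on top of the existing plot
abline(fit, col = "blue")
```

The abline command draws on the current plot, so plot must be called first.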
The coefficient of determination is calculated and symbolized as r². This coefficient is the proportion, often expressed as a percentage, of the total variation of the values of the dependent variable about its average value that is explained by the regression line. The coefficient of determination will be 100% if all the values of the dependent variable are found on the regression line and the slope is not equal to zero. Otherwise, the coefficient of determination decreases as the deviations between the values of the dependent variable and the regression line increase.
The coefficient of determination is calculated in an R script by using the cor command and the exponent operator.
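The calculation can be sketched as follows, again with the hypothetical age and income values. The name r_squared is an assumption for this example. As a cross-check, the script compares the result with the value that summary reports for the fitted model; for a regression with a single independent variable and an intercept, the two agree.

```r
# Hypothetical illustrative data (not real measurements)
age    <- c(25, 30, 35, 40, 45)
income <- c(30000, 36000, 41000, 47000, 52000)

# Correlation coefficient, squared with the exponent operator
r_squared <- cor(age, income)^2
print(r_squared)

# Cross-check against the value reported for the fitted model
fit <- lm(income ~ age)
print(summary(fit)$r.squared)
```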
Linear regression assumes that the residuals (the deviations of the dependent variable from the regression line) are approximately normally distributed and that the independent variable is measured with little to no error.
http://hughesbennett.co.uk/LinearRegression