# Test for linear relationship

### Regression Slope Test

This lesson describes how to conduct a hypothesis test to determine whether there is a significant linear relationship between an independent variable X and a dependent variable Y. Before running the test, check the assumptions behind the regression model: linearity and additivity of the predictive relationship, independence (lack of correlation) of the errors, and homoscedasticity (constant variance) of the errors.

There needs to be a linear relationship between the two variables. Whilst there are a number of ways to check whether a linear relationship exists between your two variables, we suggest creating a scatterplot using SPSS Statistics, where you plot the dependent variable against your independent variable and then visually inspect the scatterplot to check for linearity.

If the relationship displayed in your scatterplot is not linear, you will have to either run a non-linear regression analysis, perform a polynomial regression or "transform" your data, which you can do using SPSS Statistics.

There should be no significant outliers. An outlier is an observed data point that has a dependent variable value that is very different to the value predicted by the regression equation.

As such, an outlier will be a point on a scatterplot that is vertically far away from the regression line, indicating that it has a large residual. The problem with outliers is that they can have a negative effect on the regression analysis.

This will change the output that SPSS Statistics produces and reduce the predictive accuracy of your results. Fortunately, when using SPSS Statistics to run a linear regression on your data, you can easily include criteria to help you detect possible outliers.

You should have independence of observations, which you can easily check using the Durbin-Watson statistic, a simple test to run using SPSS Statistics. We explain how to interpret the result of the Durbin-Watson statistic in our enhanced linear regression guide.

Your data needs to show homoscedasticity, which is where the variances along the line of best fit remain similar as you move along the line.

We explain more about what this means and how to assess the homoscedasticity of your data in our enhanced linear regression guide. Simple scatterplot examples help to illustrate the difference between data that meets and data that violates the assumption of homoscedasticity, but real-world data can be a lot messier and show different patterns of heteroscedasticity.
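The Durbin-Watson statistic mentioned above for checking independence of observations can be sketched outside SPSS Statistics as well. The following is a minimal illustration, not the SPSS implementation: the helper names `fit_line` and `durbin_watson` and the data are made up for this example. The statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals; it lies between 0 and 4, and values near 2 suggest little first-order autocorrelation.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def durbin_watson(residuals):
    """Squared successive differences of the residuals over their squared sum."""
    num = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    den = sum(e ** 2 for e in residuals)
    return num / den


# Hypothetical data in observation order, for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

b0, b1 = fit_line(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
dw = durbin_watson(residuals)
print(f"Durbin-Watson statistic: {dw:.3f}")
```

The residuals must be kept in time or run order for the statistic to be meaningful, since it compares each residual with the one before it.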

## Hypothesis Test for Regression Slope

Finally, you need to check that the residuals (errors) of the regression line are approximately normally distributed. Two common methods to check this assumption include using either a histogram with a superimposed normal curve or a Normal P-P Plot.

Residuals are usually plotted against the fitted values, against the predictor variable values, and against time or run-order sequence, in addition to the normal probability plot. Plots of residuals are used to check for the following:

• Residuals follow the normal distribution.
• Residuals have a constant variance.
• Regression function is linear.
• A pattern does not exist when residuals are plotted in a time or run-order sequence.
• There are no outliers.

Examples of residual plots are shown in the following figure. A plot like (a) indicates an appropriate regression model. A plot like (b) indicates an increase in the variance of the residuals, so the assumption of constant variance is violated; a transformation on the response may be helpful in this case (see Transformations). If the residuals follow the pattern of (c) or (d), then this is an indication that the linear regression model is not adequate. Addition of higher order terms to the regression model, or a transformation on the predictor or the response, may be required in such cases. A plot of residuals may also show a pattern as seen in (e), indicating that the residuals increase or decrease as the run order sequence or time progresses. This may be due to factors such as operator-learning or instrument-creep and should be investigated further.

Example: Residual plots for the data of the preceding table are shown in the following figures.

### Test Practical Utility of the Regression Model

This step is of great importance to managers using regression for practical applications. While regression models have to pass statistical utility tests and assumptions, if a model has no practical utility it should not be used. To test practical utility, we look at two statistics, R2 and Standard Error.


The first statistic, R2, was already introduced in Module 2. It is chosen since it equates to an approximate 0. It is also provided in the Summary Output portion of the Regression output, in the section titled Regression Statistics in Worksheet 2. Note its value is 0. Right above the R Square is the Multiple R, the correlation coefficient that we computed in Module 2. I will talk about this in Module 3 - we can ignore it for simple linear regression analysis.

I realize that when I import an Excel worksheet into the Web Site, it is imported as a table and doesn't have exactly the same format as in Excel.

Recall that to interpret the R2 we say what share of the variation in Y (external hours) is explained by client assets. This is not hard to compute - just tedious.

I am going to illustrate how it is computed - don't be alarmed - the computer program does this for us. Variation explained by the regression model is computed by finding the difference, or variation, between the predicted value of Y and the average value of Y for every observation in the data set. For example, take the first observation shown in Worksheet 2. The variation attributed to regression for this observation is the predicted value of Y (External Hours) minus the average value of Y. The computer program then computes a similar variation for all of the other predicted values of Y for each observation in the data set.

These variations are then squared and summed. This is the variation explained by, or attributed to, the regression model and is called the Sum of Squares Regression (SSR).

The value of this squared variation can be seen in Worksheet 2. Next, we find the total variation by finding the difference between the actual value of Y and the average value of Y for each observation in the data set. The computer program then squares these differences for all of the observations and sums them up. R2 is the ratio of the explained variation (SSR) to this total variation.
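The computation described above can be sketched in a few lines. This is an illustration under stated assumptions, not the spreadsheet program's own routine: the function name `r_squared_parts` and the five data points are hypothetical, and the name SST for the total variation is a common convention rather than a term from the worksheet.

```python
def r_squared_parts(x, y):
    """Return (SSR, SST, R2) for a simple least squares line fitted to x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Least squares slope and intercept.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    predicted = [b0 + b1 * xi for xi in x]

    # Explained variation: predicted Y minus average Y, squared and summed.
    ssr = sum((p - mean_y) ** 2 for p in predicted)
    # Total variation: actual Y minus average Y, squared and summed.
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return ssr, sst, ssr / sst


# Made-up observations, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

ssr, sst, r2 = r_squared_parts(x, y)
print(f"SSR={ssr:.2f}, SST={sst:.2f}, R2={r2:.2f}")  # SSR=3.60, SST=6.00, R2=0.60
```

Here X explains 60 percent of the variation in Y for the made-up data.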

The second statistic is the Standard Error of the Estimate. Since we use the regression model to compute the estimate, some refer to this standard error as the Standard Error of the Regression Model. The interpretation is similar to the interpretation of the standard deviation of an observation and the standard error of the mean, as we learned in Module 1.

So, if we predict a value for external hours, the actual value could fall anywhere within a range around that prediction, and the standard error tells us how wide that range is. This is really important. So many times in regression analysis, people make a prediction and go with it without ever looking at the standard error. This measure of practical utility gives us an indication of how reliable the regression model will be. The tough part is that I cannot give you (nor can a text) a good benchmark - it is a management call on how much error is acceptable. Obviously, there will be error, since not every observation in a sample of data falls on the regression line.

For the example above, the error percent for an actual value of Y at the high end is 6 percent, and for the average value of Y it is about 8 percent. For planning purposes, this range of error may be tolerable. For precise prediction purposes, being off by up to 13 percent may not be tolerable. Oftentimes, we can use the standard error as a comparison tool.

Let's say we ran another model with a different independent variable and got a different standard error. It would be much better to have an error of 45 on a prediction than a larger one. The point is, without the measure of average or standard error of the prediction, we would not be able to compare models.

To compute the standard error of the estimate, the computer program first finds the error, which is also called the residual, for each observation in the data set. The error is the difference between the actual value of Y and the predicted value of Y, or Y - Ŷ. For the first observation, for example, the error is the actual value of Y (external hours) minus the predicted value of Y. In a similar manner, all of the errors are computed for each observation, then squared, then summed to get the Sum of Squares Error (SSE).

SSE is a measure of the unexplained variation in the regression and is the variation around the regression line. To get the standard error of the estimate, the computer program divides the SSE by the sample size minus 2 (to adjust for the degrees of freedom in simple regression) and then takes the square root.
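As a rough sketch of the steps just described (residuals, SSE, division by the sample size minus 2, square root), the following uses a hypothetical helper `standard_error_of_estimate` and made-up data; it is not the worksheet's own calculation.

```python
import math


def standard_error_of_estimate(x, y):
    """Return (SSE, standard error of the estimate) for a simple regression line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Least squares slope and intercept.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x

    # Residual = actual Y minus predicted Y, for every observation.
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)

    # Divide by n - 2 degrees of freedom, then take the square root.
    return sse, math.sqrt(sse / (n - 2))


# Made-up observations, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

sse, se = standard_error_of_estimate(x, y)
print(f"SSE={sse:.2f}, standard error={se:.3f}")  # SSE=2.40, standard error=0.894
```

Dividing `se` by a chosen value of Y gives an error percent that can be compared across candidate models, which is one way to read the comparison discussed earlier.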

Regression models that have lower Standard Errors and higher R2's have greater practical utility compared to models with higher Standard Errors and lower R2's. However, these are judgment calls rather than precise statistical standards. The important thing is that analysts have an ethical standard to report the Standard Error and R2 values to their audiences. Did you note low standard errors would be associated with high R2's, and vice versa?


This is simply because regression models in which the data are tightly grouped around the regression line have little error, and X has high predictive value (movements in X result in predictable movements in Y). This can also be explained by the equation for R2, Equation 2. A lower SSE results in a higher R2, and a lower SSE also results in lower standard errors.

### Test the Statistical Utility of the Regression Model

There are two inferential methods of testing statistical utility of the regression model. The parameter of interest in determining if a regression is statistically significant or useful is the slope.

### Testing a Hypothesis for a Population Slope

The five-step process for testing a hypothesis for a population slope is identical to that for a population mean; we just change the parameter from the mean to the slope.

State Null and Alternative Hypotheses. The null hypothesis in regression is that the population slope equals zero. That is, the regression line is horizontal, meaning that Y does not change when X changes. The alternative hypothesis is that the population slope is not zero.
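One common way to test the null hypothesis of a horizontal regression line is a t statistic: the estimated slope divided by its standard error, with n - 2 degrees of freedom. The sketch below assumes that approach; the function name `slope_t_statistic`, the data, and the choice of a two-sided test at the 5 percent level are illustrative assumptions, not details taken from the lesson.

```python
import math


def slope_t_statistic(x, y):
    """Return (slope, standard error of slope, t statistic, degrees of freedom)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Least squares slope and intercept.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x

    # Standard error of the estimate, then standard error of the slope.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    se_b1 = s / math.sqrt(sxx)

    return b1, se_b1, b1 / se_b1, n - 2


# Made-up observations, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b1, se_b1, t_stat, df = slope_t_statistic(x, y)

# Two-sided 5 percent critical value of t with 3 degrees of freedom,
# taken from a standard t table (assumed significance level).
T_CRITICAL = 3.182
reject_null = abs(t_stat) > T_CRITICAL
print(f"slope={b1:.3f}, t={t_stat:.3f}, df={df}, reject H0: {reject_null}")
# slope=0.600, t=2.121, df=3, reject H0: False
```

With only five made-up observations the slope is not significantly different from zero, even though R2 is 0.60, which shows why the statistical test is run alongside the practical utility measures.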

Let's look at the regression equation again: