
# Beginner to Advanced Level – Steps to Make a Regression Model, Part 2

In the previous article of this series, we learned how to calculate the values of the coefficients, how to test the slope coefficients, and how to carry out hypothesis tests.

Let us continue where we left off, with:

• ANOVA
• Coefficient of Determination

#### What is ANOVA?

The basic idea behind ANOVA, that of partitioning variation, is a fundamental idea of experimental statistics. ANOVA belies its name in that it is not concerned with analyzing variances but rather with analyzing variation in means.
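To make "partitioning variation" concrete, here is a minimal one-way ANOVA sketch in plain Python. The helper name `one_way_anova` and the three groups of numbers are made up for illustration. It splits the total sum of squares into a between-group part (how far the group means sit from the grand mean) and a within-group part (scatter inside each group), then forms the F statistic from their mean squares.

```python
def one_way_anova(groups):
    """Partition the total variation of grouped data and compute the F statistic.

    groups: list of lists, one inner list of observations per group.
    """
    all_values = [x for g in groups for x in g]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Total variation of every observation around the grand mean
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    # Variation of the group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of each observation around its own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    ms_between = ss_between / (k - 1)  # between-group degrees of freedom: k - 1
    ms_within = ss_within / (n - k)    # within-group degrees of freedom: n - k
    return {
        "ss_total": ss_total,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "f_stat": ms_between / ms_within,
    }


groups = [[4, 5, 6], [6, 7, 8], [9, 10, 11]]  # hypothetical data
result = one_way_anova(groups)
print(round(result["ss_between"], 6), round(result["ss_within"], 6), round(result["f_stat"], 6))
# 38.0 6.0 19.0
```

Here SS(total) = 44 = 38 + 6, so the variation splits exactly into the two parts. A large F means the group means differ by more than the within-group scatter would suggest; `scipy.stats.f_oneway` computes the same F statistic together with a p-value if SciPy is available.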

There are two types of ANOVA:

• One way ANOVA
• Two way ANOVA

I have explained One way ANOVA and Two way ANOVA separately.

Now let's discuss the coefficient of determination.

#### What is Coefficient of Determination?

The coefficient of determination, denoted by R² or r² and pronounced "R-squared", is a ratio of sums of squares:

`R² = SS(reg) / SS(t)`

where SS(reg) is the regression (explained) sum of squares and SS(t) is the total sum of squares.
• R² is a statistic that gives information about the goodness of fit of a model.
• In regression, R² measures how well the fitted model explains the variation in the dependent variable, i.e. how strong the relationship between the dependent and independent variables is.
• For a least-squares fit with an intercept, R² lies in the range [0, 1].
• An R² of 1 indicates that the model explains all of the variability in the dependent variable (a perfect fit).
• An R² of 0.8 means that the model explains 80% of the variability in the dependent variable.
• An R² of 0 indicates that the model explains none of the variability, i.e. there is no linear relationship between the variables.
• R² does not tell you whether the independent variable causes the change in the dependent variable.
• R² does not tell you whether the correct regression model was used.
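As a small worked example, the sketch below fits a simple linear regression by ordinary least squares and computes R² from the sums of squares. The function names and the five data points are hypothetical. It also checks that SS(reg)/SS(t) equals 1 − SS(res)/SS(t), which holds because SS(t) = SS(reg) + SS(res) for a least-squares line with an intercept.

```python
def simple_linear_regression(x, y):
    """Fit y = b0 + b1 * x by ordinary least squares."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1


def sums_of_squares(x, y):
    """Return (SS(reg), SS(res), SS(t)) for a simple linear regression of y on x."""
    b0, b1 = simple_linear_regression(x, y)
    y_mean = sum(y) / len(y)
    y_hat = [b0 + b1 * xi for xi in x]
    ss_t = sum((yi - y_mean) ** 2 for yi in y)                # total variation
    ss_reg = sum((yh - y_mean) ** 2 for yh in y_hat)          # explained by the line
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # left unexplained
    return ss_reg, ss_res, ss_t


x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]
ss_reg, ss_res, ss_t = sums_of_squares(x, y)
r2 = ss_reg / ss_t
print(round(r2, 4))               # 0.6
print(round(1 - ss_res / ss_t, 4))  # 0.6, the same value
```

Here SS(reg) = 3.6, SS(res) = 2.4 and SS(t) = 6, so the line explains 60% of the variability in y.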

R² never decreases when an extra regressor variable is added to the model, even if that variable is irrelevant, so we cannot rely too much on R² alone.
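The sketch below illustrates this property. It fits the same response twice by least squares, once with one regressor and once with an extra regressor `z` whose values are arbitrary and unrelated to `y`. The helpers `ols_fit` and `r_squared_ols` are hypothetical, written in plain Python (normal equations solved by Gauss-Jordan elimination) so the example has no dependencies; in practice a library solver such as `numpy.linalg.lstsq` would be used.

```python
def ols_fit(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y.

    X is a list of rows; each row starts with 1.0 for the intercept.
    Solved with Gauss-Jordan elimination and partial pivoting, which is
    adequate for small, well-conditioned examples like this one.
    """
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    a = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Swap in the row with the largest pivot for numerical stability
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        # Eliminate this column from every other row
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def r_squared_ols(X, y):
    """R² = 1 - SS(res)/SS(t), equal to SS(reg)/SS(t) for OLS with an intercept."""
    b = ols_fit(X, y)
    y_hat = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    ss_t = sum((yi - y_mean) ** 2 for yi in y)
    return 1 - ss_res / ss_t


x = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical data
y = [2, 4, 5, 4, 5, 7, 8, 9]
z = [3, 1, 4, 1, 5, 9, 2, 6]  # hypothetical regressor with no real link to y

r2_simple = r_squared_ols([[1.0, xi] for xi in x], y)
r2_with_z = r_squared_ols([[1.0, xi, zi] for xi, zi in zip(x, z)], y)
print(round(r2_simple, 4), round(r2_with_z, 4))  # the second value is never smaller
```

Because the larger model can always reproduce the smaller one by setting the coefficient of `z` to zero, its residual sum of squares cannot be larger, so its R² cannot be smaller.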

If R² alone is not enough, is there another way to measure the coefficient of determination of a model? Yes, there is: the adjusted R².

Most of the properties listed above for R² also apply to the adjusted R², with the following differences:

• The adjusted R² can be negative, and its value is always less than or equal to that of R².
• The adjusted R² increases only when the increase in R² (due to the addition of a new regressor variable) is larger than would be expected by chance.

The adjusted R² is defined as

$$\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}$$

where

• p is the total number of regressor variables in the model (not including the constant term)
• n is the sample size.
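A minimal sketch of this formula, assuming R² has already been computed for a model with p regressors fitted on n observations. The function name `adjusted_r_squared` and the input values are hypothetical.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1).

    r2: ordinary R² of the fitted model
    n:  sample size
    p:  number of regressor variables, not counting the intercept
    """
    if n - p - 1 <= 0:
        raise ValueError("need more observations than p + 1")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


print(round(adjusted_r_squared(0.8, n=50, p=3), 4))   # 0.787
print(round(adjusted_r_squared(0.05, n=20, p=5), 4))  # -0.2893
```

The second call shows how a weak fit with several regressors and few observations can push the adjusted R² below zero, while the first shows it staying slightly below the ordinary R² of 0.8.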

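Adjusted R² can also be written as

$$\bar{R}^2 = 1 - \frac{SS(res)/df_e}{SS(t)/df_t}$$

where SS(res) is the residual sum of squares, SS(t) is the total sum of squares, and: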

• df_t = n − 1 is the total degrees of freedom, i.e. the degrees of freedom of the estimate of the population variance of the dependent variable.
• df_e = n − p − 1 is the error (residual) degrees of freedom, i.e. the degrees of freedom of the estimate of the underlying population error variance.
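Substituting df_t = n − 1 and df_e = n − p − 1 into the degrees-of-freedom form recovers the expression in terms of R². The last step uses R² = 1 − SS(res)/SS(t), which holds for a least-squares fit with an intercept because SS(t) = SS(reg) + SS(res):

$$
\begin{aligned}
\bar{R}^2 &= 1 - \frac{SS(res)/df_e}{SS(t)/df_t}
= 1 - \frac{SS(res)}{SS(t)} \cdot \frac{df_t}{df_e} \\
&= 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
\end{aligned}
$$

Read this way, the adjusted R² compares the residual variance estimate with the total variance estimate, each divided by its own degrees of freedom, so every added regressor costs one degree of freedom in df_e.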

Next come model adequacy checking, multicollinearity, and the selection of significant explanatory variables.

We will discuss these remaining topics in the next article of this series. Until then, if you have any questions or suggestions, please feel free to email me at khanirfan.khan21@gmail.com or leave a comment.