# Linear Regression Analysis using R

One of the most frequently used techniques in statistics is linear regression, where we investigate the potential relationship between a variable of interest (often called the response variable, although many other names are in use) and a set of one or more other variables (known as the independent variables, among other terms). Unsurprisingly, R has flexible facilities for fitting a range of linear models, from the simple case of a single variable to more complex relationships.

In this post we will consider the case of simple linear regression with one response variable and a single independent variable. For this example we will use a small dataset on pizza franchises, described below.

The purpose of using this data is to determine whether there is a relationship between the two variables that can be described by a simple linear regression model.

The steps are as follows:

• Open a new R script.
• Set the working directory to the folder that holds your data files.
• Load the dataset into R (.txt, .xlsx or .csv).
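For plain-text and CSV files, base R is enough. The sketch below is only an assumption about how the loading step might look: the file names `franchise.csv` and `franchise.txt` and the values in them are made up for illustration, and they are written to a temporary folder so that the example runs on its own.

```r
# Hypothetical example data, written to a temporary file so the sketch is self-contained
csv_path <- file.path(tempdir(), "franchise.csv")
write.csv(data.frame(X = c(1, 2, 3), Y = c(10, 20, 30)), csv_path, row.names = FALSE)

# .csv file: comma-separated values with a header row
dat_csv <- read.csv(csv_path)

# .txt file: read.table() handles whitespace- or tab-delimited text
txt_path <- file.path(tempdir(), "franchise.txt")
write.table(dat_csv, txt_path, row.names = FALSE)
dat_txt <- read.table(txt_path, header = TRUE)

str(dat_csv)  # check column names and types after loading
```

With your own data, replace the temporary paths with the name of your file in the working directory.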

If your file has the ‘.xlsx’ extension, use these commands:

library(xlsx)

var1 <- read.xlsx("<filename with proper extension>", sheetIndex = <sheet number>)

# sheetIndex is the position of the worksheet in the Excel workbook (1 for the first sheet). To select a sheet by its title, use sheetName = "<sheet name>" instead.

Now we are ready to build a linear regression model in R.

My R script

As the image shows, I first checked my working directory and then changed it to another directory, because my data files are stored in a different location.

The working directory can be set in two ways: 1) through the File -> Change dir... menu item of the R GUI, or 2) with setwd("<path>").
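The check-then-change sequence described above might look like the following sketch. Here `tempdir()` is only a stand-in for the folder that holds your data files, and the names `old_dir` and `data_dir` are introduced purely for illustration.

```r
old_dir  <- getwd()     # check (and remember) the current working directory
data_dir <- tempdir()   # stand-in for the folder that holds your data files

setwd(data_dir)         # change the working directory
getwd()                 # confirm that the change took effect
list.files()            # files R can now find by name alone

setwd(old_dir)          # switch back when finished
```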

The data describe pizza franchises, where
X = annual franchise fee (\$1000)
Y = start-up cost (\$1000)

You can see the details in the image below.

In this image you can see that the R² value is very low, which means that the linear relationship between the two variables is weak.
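The commands behind such an output are not reproduced in the text, so the following is only a minimal sketch of how a simple linear regression is typically fit in R. The numbers are made up for illustration and are not the pizza franchise data; the names `franchise` and `fit` are likewise assumptions.

```r
# Made-up illustrative values, NOT the pizza franchise data from this post
franchise <- data.frame(
  X = c(2, 4, 5, 7, 8, 10),       # annual franchise fee ($1000)
  Y = c(12, 15, 14, 20, 19, 25)   # start-up cost ($1000)
)

fit <- lm(Y ~ X, data = franchise)  # model: Y = b0 + b1 * X + error

summary(fit)  # coefficients, standard errors, R-squared, p-values
coef(fit)     # intercept (b0) and slope (b1) only
```

The "Multiple R-squared" line of the `summary()` output is the R² value discussed in this post.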

How good is your Regression model?

• The R² value tells us how well the model explains the data.
• The difference between an observed value and the value predicted by the model (the part not explained by the model) is the error term, or residual.
• In the regression model above, R² is approximately 0.22: the model explains about 22% of the variance of the dependent variable, and the remaining 78% is unexplained (residual) variance.
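To make the link between residuals and R² concrete, the sketch below computes R² by hand as one minus the ratio of unexplained to total variation, and compares it with the value reported by `lm`. The data are made up for illustration (they are not the data used in this post), so the resulting R² will differ from 0.22.

```r
# Made-up illustrative values, not the data used in this post
x <- c(2, 4, 5, 7, 8, 10)
y <- c(12, 15, 14, 20, 19, 25)
fit <- lm(y ~ x)

res    <- residuals(fit)          # observed minus fitted values
ss_res <- sum(res^2)              # variation left unexplained by the model
ss_tot <- sum((y - mean(y))^2)    # total variation in y
r2     <- 1 - ss_res / ss_tot     # share of variance explained

# Compare the manual value with the one reported by summary()
all.equal(r2, summary(fit)$r.squared)
```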