One of the most frequently used techniques in statistics is linear regression, where we investigate the potential relationship between a variable of interest (often called the response variable, though many other names are in use) and a set of one or more other variables (known as the independent variables, among other terms). Unsurprisingly, there are flexible facilities in **R** for fitting a range of linear models, from the simple case of a single independent variable to more complex relationships.

In this post we will consider the case of simple linear regression with one response variable and a single independent variable. For this example we will use a small dataset, described below.

The purpose of using this data is to determine whether the relationship between the two variables can be described by a simple linear regression model.

First, here are the preparatory steps:

- Open a new R script.
- Set the working directory to the folder where you keep your data files.
- Load the dataset into R (.txt, .xlsx or .csv).
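
For the plain-text formats, the loading step can be sketched with base R alone. This is only a minimal sketch, not the script used in this post: the file name and the values in it are made up for illustration, and the file is first written to a temporary folder so that the example runs anywhere.

```r
# Hypothetical example file: write a tiny CSV to a temporary folder first,
# so the read step below has something to load. The numbers are made up.
csv_path <- file.path(tempdir(), "example_data.csv")
writeLines(c("X,Y", "1,10", "2,12", "3,15"), csv_path)

# Comma-separated file with a header row
dat_csv <- read.csv(csv_path)

# The same file via read.table(), the usual choice for .txt files;
# set sep to match your delimiter (for example "\t" for tab-separated text)
dat_txt <- read.table(csv_path, header = TRUE, sep = ",")

str(dat_csv)  # check the column names and types
```

With your own data, replace csv_path with the name of a file in your working directory.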

If your file has the extension '.xlsx', use these commands:

library(xlsx)

var1 <- read.xlsx("<filename with proper extension>", sheetIndex = <sheet number>)

# sheetIndex is the position of the worksheet in the Excel workbook (1 for the first sheet); to pick a sheet by its title, use the sheetName argument instead.

Now we are ready to build a linear regression model in R.

My R script

Now load the data

As you can see in the image, I first checked my working directory and then changed it to another directory. My data files are stored in a different location, so I switched to that folder for convenience.

The working directory can be set in two ways: 1) through the menu (File -> Change directory), or 2) with setwd("<path>").
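
The second method can be sketched as follows. This is an illustration only: tempdir() stands in for your own data folder, which this post does not specify, and the variable names old_dir and data_dir are made up.

```r
old_dir <- getwd()       # remember the current working directory
print(old_dir)

data_dir <- tempdir()    # hypothetical stand-in for your own data folder
setwd(data_dir)          # switch to it
print(getwd())           # confirm the change

list.files()             # see which files R can now find by name alone

setwd(old_dir)           # switch back to where we started
```

Storing the old directory before changing it makes it easy to return afterwards.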

The following data are for a pizza franchise, with two variables:

- X = annual franchise fee ($1000)
- Y = start-up cost ($1000)

You can see the details in the image below.
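
Because the output is shown only as an image, here is a minimal sketch of the kind of commands that fit and summarise a simple linear regression in R. The data frame is simulated purely so that the code runs on its own: it is not the pizza franchise data, and the names pizza and fit, as well as the numbers used to generate the values, are assumptions for illustration.

```r
set.seed(1)  # make the simulated numbers reproducible

# Simulated stand-in data (NOT the real franchise data)
n <- 30
pizza <- data.frame(X = runif(n, min = 500, max = 1500))
pizza$Y <- 100 + 0.05 * pizza$X + rnorm(n, sd = 40)

# Fit the simple linear regression Y = b0 + b1 * X + error
fit <- lm(Y ~ X, data = pizza)

coef(fit)      # estimated intercept and slope
summary(fit)   # standard errors, t-tests, R-squared and F-statistic
```

With real data you would replace the simulated data frame with the one you loaded, for example var1, and keep the lm() and summary() calls unchanged apart from the column names.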

In this image you can see that the R² value is low, which means that the linear relationship between the two variables is weak.

**How good is your Regression model?**

- The R² value tells us how much of the variation in the response variable the model explains.
- The difference between an observed value and the value the model predicts for it (the part the model does not explain) is the error term or residual.
- In the regression model above, R² is roughly 0.22: about 22% of the variance of the dependent variable is explained by the model, and the remaining 78% is unexplained and left in the error term (residuals).
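
The link between R² and the residuals can be checked by hand. The sketch below uses simulated data (not the data from this post; the names dat and fit are made up) and computes R² as 1 - SSE/SST, then compares it with the value reported by summary().

```r
set.seed(42)

# Simulated data with a deliberately noisy relationship (illustrative only)
dat <- data.frame(x = rnorm(50))
dat$y <- 2 + 0.5 * dat$x + rnorm(50)

fit <- lm(y ~ x, data = dat)

res <- residuals(fit)                  # observed y minus fitted y
sse <- sum(res^2)                      # variation the model does not explain
sst <- sum((dat$y - mean(dat$y))^2)    # total variation in y

r2_manual  <- 1 - sse / sst            # share of variation explained
r2_summary <- summary(fit)$r.squared   # value reported by R

all.equal(r2_manual, r2_summary)       # check that the two calculations agree
```

The unexplained share, sse / sst, is what the 78% in the example above refers to.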

