How to Calculate the Coefficient of Determination (R-Squared) Step-by-Step

The coefficient of determination, also known as R-squared (R²), is a statistic that measures how well a regression model fits the data. R-squared gives the proportion of variation in the response variable that can be explained by the predictors in the model. Calculating it is simple once you understand the basic formula and its components.

In this comprehensive guide, I’ll walk you through the step-by-step process of calculating R-squared from its formula. I’ll also provide examples using regression analysis in statistics software to demonstrate how to interpret R-squared. Let’s get started!

What is the Coefficient of Determination?

The coefficient of determination (R²) measures the degree to which the variance of the response variable Y is explained by the predictors (the X variables) in a regression model.

R-squared is defined as the proportion of total variation in Y that is explained by the regression model. It ranges from 0 to 1, with higher values indicating more of the response variable variation is accounted for by the predictors.

  • R² near 0 means the model does not fit the data well (low explanatory power).
  • R² near 1 indicates the regression model fits the data very well.

R-squared is also called the explained variation because it represents the amount of variation in Y that is explained by the X variables in the model. The remaining unexplained variation is captured by the error term.

R-Squared Formula

R-squared is calculated using the formula:

R² = SSR / SST

Where:

  • SSR = Regression sum of squares (variation explained by model)
  • SST = Total sum of squares (total variation in Y)

This formula is based on partitioning the total sum of squares (SST) into:

  • Sum of squares regression (SSR): Variation explained by X variables
  • Sum of squares error (SSE): Unexplained variation (residuals)

SST = SSR + SSE

Substituting SST in the formula gives:

R² = SSR / (SSR + SSE)

Which is the proportion of explained variation out of total variation.

How to Calculate R-Squared Step-by-Step

Follow these five steps to calculate the coefficient of determination (R-squared):

Step 1: Perform Regression Analysis

First, perform a regression analysis between the response (Y) and predictor variables (X). This gives the regression equation relating X and Y.

For example, let’s say we perform a simple linear regression of Y on X. This gives:

Y = b₀ + b₁X

Step 2: Calculate SST

Find the total sum of squares (SST) using the formula:

SST = Σ (yᵢ − ȳ)²

Where:

  • yᵢ = observed response value
  • ȳ = mean of the response values

This sums the squared deviation of each y value from the mean, measuring the total variation in the response.

Step 3: Calculate SSR

Next, calculate the regression sum of squares (SSR) using:

SSR = Σ (ŷᵢ − ȳ)²

Where ŷᵢ are the predicted y values from the regression equation.

SSR measures the variation explained by the model.

Step 4: Calculate SSE

The sum of squares error (SSE) is found by:

SSE = Σ (yᵢ − ŷᵢ)²

SSE represents the residual variation not explained by the model.

Step 5: Compute R-Squared

Finally, compute R-squared using its formula:

R² = SSR / SST = SSR / (SSR + SSE)

Substitute the values calculated for SSR and SST. This gives the coefficient of determination.

And we’re done! Those are the key steps involved in calculating R-squared manually from a regression analysis. Now let’s look at some examples.
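Put together, the five steps can be sketched in a few lines of Python with NumPy (a minimal version, using the same small data set as the Excel example that follows):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([3, 5, 7, 8, 12])

# Step 1: fit a simple linear regression by least squares
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

# Steps 2-4: the three sums of squares
sst = np.sum((y - y.mean()) ** 2)      # total variation
ssr = np.sum((y_hat - y.mean()) ** 2)  # explained variation
sse = np.sum((y - y_hat) ** 2)         # residual variation

# Step 5: coefficient of determination
r_squared = ssr / sst
print(round(r_squared, 3))  # → 0.959
```

Note that ssr + sse equals sst, exactly as the decomposition requires.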

R-Squared Example in Excel

Let’s see how to find R-squared for a simple linear regression example in Excel.

Given this X and Y data:

X Y
1 3
2 5
3 7
4 8
5 12

We perform a linear regression of Y on X, giving the regression equation:

Y = 0.7 + 2.1X

Use Excel to find the R-squared value as follows:

  1. Input the X and Y data

  2. Click the Data Analysis button and select Regression

  3. Select the Y and X input ranges

  4. Check the Residuals box

  5. Click OK

This gives the following regression output with R-squared and the sums of squares:

Regression Statistics
R-squared     0.959
Observations  5

ANOVA           SS
Regression    44.1
Residual       1.9
Total         46.0

  • SST (Total SS) = 46
  • SSR (Regression SS) = 44.1
  • SSE (Residual SS) = 1.9

Finally, calculate R-squared using the formula:

R² = SSR / SST = 44.1 / 46 ≈ 0.959

We get the same R-squared value of 0.959 as in the Excel output, confirming our manual calculation.

R-Squared in R Programming

Calculating R-squared in R is straightforward using the lm() function for linear regression.

Let’s use the same X and Y data from the Excel example:

```r
# Store data in vectors
X <- c(1, 2, 3, 4, 5)
Y <- c(3, 5, 7, 8, 12)

# Linear model
model <- lm(Y ~ X)

# Summary gives R-squared
summary(model)$r.squared
```

This prints an R-squared value of about 0.959. Calling summary() on the fitted model returns a detailed results table, and $r.squared extracts the R-squared value from it.

We don’t have to manually calculate the sums of squares – R computes them automatically behind the scenes!

R-Squared in Python

Here is how to find R-squared in Python using sklearn:

```python
from sklearn.linear_model import LinearRegression

# X and Y data
X = [[1], [2], [3], [4], [5]]
y = [3, 5, 7, 8, 12]

# Create and fit model
model = LinearRegression()
model.fit(X, y)

# R-squared score
r_squared = model.score(X, y)
print(r_squared)
```

The LinearRegression.score() method returns the R-squared value. We get about 0.959 again for this data.
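For reference, sklearn.metrics.r2_score computes the same quantity from observed and predicted values, which is handy when you already have predictions in hand:

```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

X = [[1], [2], [3], [4], [5]]
y = [3, 5, 7, 8, 12]
model = LinearRegression().fit(X, y)

# r2_score(observed, predicted) agrees with model.score(X, y)
score = r2_score(y, model.predict(X))
print(score)
```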

Interpreting R-Squared

When interpreting R-squared, consider:

  • Scale: R-squared is measured on a 0 to 1 scale. Values close to 1 indicate a better model fit.

  • Context: The acceptability of an R-squared value depends on the problem context. Higher is generally better, but a high value may not be achievable or expected in every domain.

  • Use cautiously: While R-squared indicates model fit, a higher R-squared does not necessarily mean the model has better predictive performance. It also does not indicate causation between X and Y variables.

  • Multiple regression: With multiple X variables, adjusted R-squared is used to account for the number of predictors in the model. But interpretation remains similar.

Overall, R-squared gives the percentage of variation explained by the model – a valuable statistic for evaluating and comparing regression analyses. Used properly, it can aid model selection, improvement, and predictive accuracy.

Common Questions about R-Squared

Here are answers to some frequently asked questions about the coefficient of determination:

How is R-squared calculated for multiple regression?

The R-squared formula remains the same for multiple regression models with several X variables. The process for calculating SST, SSR and SSE is identical.
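As a small sketch with made-up data, fitting two predictors in scikit-learn works exactly the same way, and score() still returns R-squared from the same SSR/SST decomposition:

```python
from sklearn.linear_model import LinearRegression

# Hypothetical data: two predictor values per observation
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]]
y = [4, 5, 10, 11, 16]  # happens to equal 2*x1 + x2

model = LinearRegression().fit(X, y)
print(model.score(X, y))  # R-squared for the multiple regression
```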

What is the difference between R-squared and Adjusted R-squared?

Adjusted R-squared penalizes model complexity, so it will always be lower than R-squared when the model has at least one predictor. It is used more for comparing models than for measuring fit.
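Concretely, adjusted R² = 1 − (1 − R²)(n − 1) / (n − p − 1), where n is the number of observations and p the number of predictors. A minimal sketch:

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Example: R-squared of 0.959 with 5 observations and 1 predictor
print(round(adjusted_r2(0.959, 5, 1), 3))  # → 0.945
```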

Can R-squared be negative?

For an ordinary least squares model with an intercept, evaluated on its own training data, R-squared falls between 0 and 1. Under the more general definition R² = 1 − SSE/SST, however, it can be negative whenever a model fits worse than simply predicting the mean of Y, which can happen on held-out data or with a model fit without an intercept.
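A quick toy demonstration with scikit-learn: a model fit to an increasing trend gets a negative score when evaluated against targets that run the other way, because it then predicts worse than the mean would:

```python
from sklearn.linear_model import LinearRegression

X = [[1], [2], [3]]
model = LinearRegression().fit(X, [1, 2, 3])  # learns y = x

# Predictions are 1, 2, 3, but these targets decrease,
# so the model does worse than just predicting their mean (2)
score = model.score(X, [3, 2, 1])
print(score)  # negative R-squared
```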
