## Sum of Squares – definitional

## Sum of squares formula

A mathematical method for identifying the model that deviates least from the data is the sum of squares formula. It's important to note that experts occasionally refer to the sum of squares as "the variation." The formula for computing the total sum of squares, the most common application of this calculation, is as follows:

SS = Σ(xᵢ − x̄)²

In this equation:

- xᵢ is each individual measurement
- x̄ is the mean of all the measurements
- Σ indicates that you sum the squared deviations across all n measurements
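As a quick sketch, the formula translates directly into a few lines of Python (the function name is illustrative, not from the article):

```python
def sum_of_squares(data):
    """Total sum of squares: the sum of squared deviations from the mean."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data)

print(sum_of_squares([2, 4, 6]))  # 8.0
```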

## What is the sum of squares?

Scientists and statisticians use the sum of squares (SS) to assess the overall variance of a data set from its mean. This statistical tool, especially when used with regression analysis, displays how well the data fit the model.

A smaller sum of squares indicates a better-fitting model, and a larger sum of squares indicates a worse one, making SS one of the most important outputs in regression analysis. A small sum means the individual data points lie close to the mean, while a large sum means they deviate widely from it. The sum equals zero only if your model fits the data perfectly.

## How to calculate the sum of squares

You can use the steps listed below to determine the sum of squares:

**1. Count the number of measurements**

The sample size, written n, is simply the number of measurements in your data set.

**2. Calculate the mean**

The mean is the arithmetic average of the sample. Add all the measurements and divide by n, the sample size, to arrive at this result.

**3. Subtract the mean from each measurement**

Some results will be negative when a measurement falls below the mean, which is acceptable because the next step squares each value. You should now have a series of n individual deviations from the mean.

**4. Square the difference of each measurement from the mean**

Any negative numbers from the previous step now become positive, since squaring a real number always yields a non-negative result. You should have a series of n non-negative numbers.

**5. Add the squares together**

Adding the squared deviations gives the sum of squares. If you then divide by (n − 1), you get the sample variance, and taking the square root of the variance gives the sample standard deviation; the sum of squares itself is the total before those extra steps.
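The five steps can be sketched in Python, one line per step (the data set here is illustrative):

```python
data = [3, 5, 7, 9]

# Step 1: count the measurements
n = len(data)

# Step 2: calculate the mean
mean = sum(data) / n

# Step 3: subtract the mean from each measurement
deviations = [x - mean for x in data]

# Step 4: square each deviation
squared = [d ** 2 for d in deviations]

# Step 5: add the squares together
ss = sum(squared)

# Dividing by (n - 1) gives the sample variance, a separate quantity
variance = ss / (n - 1)
print(ss)  # 20.0
```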

## Sum of squares example

Here is an illustration of a situation where the methods described above are used to solve the sum of squares for the numbers 2, 4, and 6:

**1. Count**

Count the number of measurements. The sample size, represented by the letter "n," equals the number of measurements: here, n = 3.

**2. Calculate**

To determine the mean, add up all the measurements and divide by the sample size: (2 + 4 + 6) ÷ 3 = 4.

**3. Subtract**

Subtract the mean from each measurement: 2 − 4 = −2, 4 − 4 = 0, and 6 − 4 = 2.

**4. Square**

Square each measurement's deviation from the mean to produce a series of n non-negative numbers: (−2)² = 4, 0² = 0, and 2² = 4.

**5. Add**

Add the squares together to obtain the sum of squares: 4 + 0 + 4 = 8. Dividing by (n − 1) = 2 gives the sample variance, 4, and taking its square root gives the sample standard deviation, 2.
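The worked example can be double-checked with Python's standard library, which computes the related quantities directly:

```python
import statistics

data = [2, 4, 6]
mean = statistics.mean(data)                 # 4
ss = sum((x - mean) ** 2 for x in data)      # sum of squares
variance = statistics.variance(data)         # sample variance = ss / (n - 1)
stdev = statistics.stdev(data)               # sample standard deviation

print(ss, variance, stdev)  # 8 4 2.0
```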

## Types of sum of squares

Total sum of squares, regression sum of squares, and residual sum of squares are the three main types of sum of squares. Here is a brief explanation of each type:

**Total sum of squares**

The total sum of squares formula, as shown above, quantifies the total variation of a sample and indicates how much variation there is in the dependent variable.

On some graphs, the total sum of squares along the regression line is represented by literal squares drawn on the plot. Although a diagram, such as a regression line on a graph, is not required, having one makes the calculation easier to visualize. The total sum of squares is often written as SST = Σ(Yᵢ − Ȳ)², where Yᵢ is each observed value of the dependent variable and Ȳ is its mean.

**Regression sum of squares**

A regression model's regression sum of squares demonstrates how much of the data's variation the model captures. Computing it by hand is tedious for real data sets, so professionals rarely perform this calculation manually; instead, they use software to calculate the results.

The regression sum of squares measures how much of the total variation the model explains: a higher regression sum of squares relative to the total sum of squares means the model accounts for more of the variation in the data, while a lower one means the model explains little and likely needs improvement.

**Residual sum of squares**

The residual sum of squares demonstrates how much of the variation in the dependent variable your model does not account for. It measures the variation of the errors in a regression model: the total of the squared discrepancies between the observed and predicted values of Y.

A lower residual sum of squares indicates that the regression model explains the data well. When the regression model fails to adequately explain the data, the residual sum of squares increases.
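To show how the three types relate, here is a minimal simple-linear-regression sketch in Python (the data set is illustrative). For a least-squares fit, the total sum of squares decomposes exactly into the regression and residual sums: SST = SSR + SSE.

```python
# Simple least-squares line fit, then the three sums of squares.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope and intercept
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope * mean_x
predicted = [intercept + slope * xi for xi in x]

sst = sum((yi - mean_y) ** 2 for yi in y)                   # total
ssr = sum((pi - mean_y) ** 2 for pi in predicted)           # regression (explained)
sse = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))   # residual (unexplained)

print(round(sst, 3), round(ssr + sse, 3))  # 6.0 6.0 — the decomposition holds
```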