
Degrees of Freedom


Degrees of freedom are conceptually difficult, but they are important to report and essential to understanding statistical analysis. For example, without degrees of freedom we cannot calculate or interpret the underlying population variability. In bivariate and multivariate analyses, degrees of freedom are a function of the sample size, the number of variables, and the number of parameters to be estimated.

Definition

“Degrees of freedom are the number of values in a distribution that are free to vary for any particular statistic”. In any statistical analysis the goal is to understand how the variables (or parameters to be estimated) and observations are linked. Hence, degrees of freedom are a function of both sample size (N) and the number of independent variables (k) in one’s model.

The degrees of freedom are equal to the number of independent observations (N), or the number of subjects in the data, minus the number of parameters (k) estimated.

For example, a teacher records the marks of N students from a class. Here he or she has N independent pieces of information (that is, N marks) and one variable, marks; in any subsequent analysis of this data set, the degrees of freedom are associated with both N and k. For instance, if this teacher wants to calculate the sample variance to understand the extent to which marks vary in this class, the degrees of freedom equal N - k.

The relationship between sample size and degrees of freedom is positive: as the sample size increases, so do the degrees of freedom. On the other hand, the relationship between the degrees of freedom and the number of parameters to be estimated is negative: the degrees of freedom decrease as the number of parameters to be estimated increases.

A Single Observation with One Parameter to Be Estimated

If a teacher has measured marks (k = 1) for one observation (N = 1) from a class, the sample mean mark is the same as the value of this observation. With this value, the teacher has some idea about the mean marks of this class but knows nothing about the population spread or variability.

Also, the teacher has only one independent observation (one mark) with one parameter that he or she needs to estimate, so the degrees of freedom here equal N - k.

Thus, there are no degrees of freedom in this example (1 - 1 = 0). In other words, the data point has no freedom to vary, and the analysis is limited to reporting the value of this data point.

For us to understand data variability, N must be larger than 1.
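
To see this concretely, here is a minimal Python sketch (the single mark of 75 is invented) showing that one observation fixes the mean but leaves zero degrees of freedom for estimating spread:

```python
import numpy as np

# One observation (N = 1), one estimated parameter (the mean), so
# df = N - k = 1 - 1 = 0: no information is left to estimate spread.
single_mark = np.array([75.0])        # hypothetical single mark

print(np.mean(single_mark))           # 75.0 -- the mean equals the value itself
print(np.var(single_mark, ddof=1))    # nan  -- 0 df, so the sample variance
                                      # is undefined (NumPy also warns
                                      # "Degrees of freedom <= 0")
```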

Multiple Observations (N) with One Parameter to Be Estimated

Suppose there are N observations for marks. To examine the variability in marks, we must first estimate one parameter, the sample mean (k = 1), leaving N - k degrees of freedom for the variance. Because we estimate only this one parameter, we may say that we have a total of N - 1 degrees of freedom. Therefore, all univariate sample statistics that are computed from the sum of squares, including the standard deviation and variance, have N - 1 degrees of freedom.

Example: a teacher wants to compute the variability of the marks of 100 students in a subject. The variable is marks, and there are N = 100 observations. The degrees of freedom are N - k = 100 - 1 = 99.
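
A short Python sketch of this calculation, using NumPy with randomly simulated marks in place of real data:

```python
import numpy as np

rng = np.random.default_rng(0)
marks = rng.normal(loc=65, scale=12, size=100)   # 100 simulated marks

N = len(marks)                 # 100 observations
k = 1                          # one estimated parameter (the sample mean)
df = N - k                     # 99 degrees of freedom

# Sample variance: sum of squared deviations divided by df, not by N.
sample_var = np.sum((marks - marks.mean()) ** 2) / df
print(df, sample_var)
print(np.var(marks, ddof=1))   # ddof=1 applies the same N - 1 divisor
```

Dividing by the degrees of freedom rather than by N is what makes this an unbiased estimate of the population variance.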

Degrees of freedom vary from one statistical test to another as we move from univariate to bivariate and multivariate statistical analysis, depending on the nature of the restrictions applied, even when the sample size remains unchanged.

Two Samples with One Parameter

Suppose that the teacher has two samples, boys and girls, with n1 + n2 observations in total. Here, one can use an independent-samples t test to analyse whether the mean marks of these two groups differ. In comparing these two independent means, the teacher will have n1 + n2 - 2 degrees of freedom: the total degrees of freedom are the number of cases in group 1 plus the number of cases in group 2, minus the number of groups.

Example: a teacher wants to compare the marks of 100 boys and 100 girls in a subject. There are two groups (boys and girls) and N = 200 observations. The degrees of freedom are N - k = 200 - 2 = 198.
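
Here is a sketch with SciPy; the marks for both groups are simulated, and the group means and spreads below are invented for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
boys = rng.normal(loc=68, scale=10, size=100)    # simulated marks, n1 = 100
girls = rng.normal(loc=71, scale=10, size=100)   # simulated marks, n2 = 100

# Classic equal-variance independent-samples t test; its reference
# distribution has n1 + n2 - 2 = 198 degrees of freedom.
t_stat, p_value = stats.ttest_ind(boys, girls, equal_var=True)
df = len(boys) + len(girls) - 2
print(f"t = {t_stat:.3f}, p = {p_value:.4f}, df = {df}")
```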

Comparing the Means of g Groups with One Parameter (Analysis of Variance)

Let us assume that we have g groups with n1 + … + ng observations in total.
We can test the variability of the group means by using the analysis of variance (ANOVA).

The ANOVA procedure produces three different types of degrees of freedom:

  • Between groups
  • Within groups
  • Total (corrected total)

The first type of degrees of freedom

This is called the between-groups degrees of freedom and is determined by the number of group means we want to compare. The ANOVA procedure tests the null hypothesis that the g groups have equal means, that is, that the individual group means are not statistically different from the overall population mean. There are g - 1 model degrees of freedom for testing this null hypothesis and for assessing variability among the g means.

The second type of degrees of freedom

This is called the within-groups degrees of freedom and is derived by subtracting the between-groups degrees of freedom from the corrected total degrees of freedom. The within-groups degrees of freedom equal the total number of observations minus the number of groups to be compared: n1 + … + ng - g, or N - g.

Total Degrees of Freedom

We know that the deviations from the mean sum to zero: ∑(Y - Ȳ) = 0. Therefore, in the total sum of squares ∑(Y - Ȳ)², only N - 1 of the deviations are free to vary; the last one is determined by the others. With the total sample size we can thus obtain the total degrees of freedom, or corrected total degrees of freedom, by the formula N - 1.
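
The ANOVA bookkeeping can be verified in Python; this sketch uses three simulated groups whose sizes are chosen arbitrarily:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# g = 3 hypothetical groups with n1 = 30, n2 = 35, n3 = 40 observations
groups = [rng.normal(loc=70, scale=10, size=n) for n in (30, 35, 40)]

N = sum(len(grp) for grp in groups)   # 105 observations in total
g = len(groups)                       # 3 groups

df_between = g - 1                    # 2
df_within = N - g                     # 102
df_total = N - 1                      # 104 = df_between + df_within
print(df_between, df_within, df_total)

# f_oneway evaluates the F statistic against F(df_between, df_within)
f_stat, p_value = stats.f_oneway(*groups)
print(f"F({df_between}, {df_within}) = {f_stat:.3f}, p = {p_value:.4f}")
```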

Degrees of Freedom in Multiple Regression Analysis

One must understand four different types of degrees of freedom in multiple regression.

The first type is the model degrees of freedom

Model degrees of freedom are associated with the number of independent variables in the model.

A null model, that is, a model without independent variables, has no slope parameters to be estimated. The predicted Y is therefore simply the mean of Y, and the model degrees of freedom equal 0.

A model with one independent variable has one predictor, or one piece of useful information (k = 1), for estimating the variability in Y. This model must also estimate the point where the regression line originates, the intercept. Hence, in a model with one predictor there are k + 1 parameters to be estimated: k regression coefficients plus an intercept. Therefore, there are (k + 1) - 1 = k degrees of freedom for testing this regression model. In other words, the model degrees of freedom equal the number of useful pieces of information available for estimating variability in the dependent variable.

The second type is the residual, or error, degrees of freedom

Residual degrees of freedom in multiple regression involve both the sample size and the number of predictor variables; in addition, we must account for the intercept. For example, if our sample size equals N, we need to estimate k + 1 parameters: one regression coefficient for each of the k predictor variables plus one for the intercept. The residual degrees of freedom are therefore N - (k + 1). This is the same as the formula for the error, or within-groups, degrees of freedom in ANOVA. It is important to note that increasing the number of predictor variables has implications for the residual degrees of freedom: each additional parameter to be estimated costs one residual degree of freedom. The remaining residual degrees of freedom are used to estimate variability in the dependent variable.

The third type of degrees of freedom is the total degrees of freedom

As in ANOVA, this is calculated as N - 1.

The fourth type is the degrees of freedom for each parameter estimate

The null hypothesis for each estimate is that there is no relationship between that independent variable and the dependent variable. The degrees of freedom for testing each such relationship are always 1.
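
These four quantities can be tallied in a short Python sketch; the data, the number of predictors, and the true coefficients below are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
N, k = 100, 3                           # 100 observations, 3 predictors
X = rng.normal(size=(N, k))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(size=N)  # invented betas

df_model = k                            # one df per predictor
df_residual = N - (k + 1)               # k coefficients plus the intercept
df_total = N - 1                        # equals df_model + df_residual
print(df_model, df_residual, df_total)  # 3 96 99

# Fit by ordinary least squares (the column of ones supplies the intercept)
X1 = np.column_stack([np.ones(N), X])
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
print(beta)                             # intercept, then k slope estimates
```

The t test for each slope against zero would then use 1 degree of freedom for the parameter, evaluated against a t distribution with df_residual degrees of freedom.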

Degrees of Freedom in a Nonparametric Test

Pearson’s chi-square, or simply the chi-square statistic, is an example of a nonparametric test that is widely used to examine the association between nominal-level variables. In a contingency table, one row and one column are fixed, and the remaining cells are independent and free to vary.
Therefore, the chi-square distribution has (r - 1) × (c - 1) degrees of freedom, where r is the number of rows and c is the number of columns in the analysis. We subtract one from both the number of rows and the number of columns because, once the values in the other cells are known, the values in the last row and column are determined; these last cells are not independent.
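
As an illustration, SciPy's chi2_contingency reports exactly these degrees of freedom; the 2 × 3 table below is made up:

```python
import numpy as np
from scipy import stats

# Hypothetical 2 x 3 contingency table (say, gender by grade band)
table = np.array([[20, 30, 25],
                  [15, 35, 30]])

chi2, p_value, df, expected = stats.chi2_contingency(table)
r, c = table.shape
print(df, (r - 1) * (c - 1))   # both are (2 - 1) x (3 - 1) = 2
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}")
```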
