Statistics with R - Measures of Central Tendency and Measures of Dispersion

sc0v0ne - Aug 17 - - Dev Community

mtcars

data(mtcars)
head(mtcars)

Loads and displays the first few rows of the mtcars dataset.

str(mtcars)

Displays the structure of the mtcars dataset, showing the type of each column.

summary(mtcars)

Displays summary statistics (minimum, quartiles, median, mean, and maximum) for each column.

Measures of Central Tendency


Mean

$$\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$$

Calculates the mean of a sequence of numbers.

n = c(1,2,4,5,6)

print(n)

mean_ = sum(n) / length(n)

print(mean_)

Calculates the mean of the cyl (number of cylinders) column of mtcars.

mean_cyl = sum(mtcars$cyl) / length(mtcars$cyl) 

print(mean_cyl)
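
As a quick cross-check, the manual formula can be compared with R's built-in mean(). This is only a minimal sketch; the names manual_mean, builtin_mean, and with_na are illustrative and do not come from the original post.

```r
n <- c(1, 2, 4, 5, 6)

# Manual mean: sum of the values divided by how many there are
manual_mean <- sum(n) / length(n)

# Built-in equivalent
builtin_mean <- mean(n)

print(manual_mean)
print(builtin_mean)

# mean() returns NA when any value is missing, unless na.rm = TRUE is set
with_na <- c(n, NA)
print(mean(with_na))
print(mean(with_na, na.rm = TRUE))
```

Both approaches should agree on complete data; the na.rm argument only matters when the vector contains missing values.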

Median

  • If N is odd:
$$\text{Med} = x_{\left(\frac{N+1}{2}\right)}$$
  • If N is even:
$$\text{Med} = \frac{x_{\left(\frac{N}{2}\right)} + x_{\left(\frac{N}{2} + 1\right)}}{2}$$

Calculates the median of a sequence of numbers with an odd size.

data_odd <- c(7, 13, 19, 33, 67)

# Built-in median
median_ <- median(data_odd)
print(median_)

# Manual calculation: sort first, then take the middle element
data_odd <- sort(data_odd)
n <- length(data_odd)
median_ <- data_odd[(n + 1) / 2]
print(median_)

Calculates the median of a sequence of numbers with an even size.

data_even <- c(2, 34, 76, 92)

# Built-in median
median_ <- median(data_even)
print(median_)

# Manual calculation: sort first, then average the two middle elements
data_even <- sort(data_even)
n <- length(data_even)
median_ <- (data_even[n / 2] + data_even[n / 2 + 1]) / 2
print(median_)

Calculates the median of the cyl and qsec columns of mtcars.

median(mtcars$cyl)
median(mtcars$qsec)
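
The two cases of the formula can be combined into one helper. The sketch below is a hypothetical function (manual_median is not part of base R or of the original post) that sorts its input first, since the formula assumes ordered values, and then picks the odd or even branch.

```r
# Hypothetical helper: median from the textbook definition
manual_median <- function(x) {
  x <- sort(x)          # the formula assumes ordered values
  n <- length(x)
  if (n %% 2 == 1) {
    x[(n + 1) / 2]                   # odd N: the middle element
  } else {
    (x[n / 2] + x[n / 2 + 1]) / 2    # even N: mean of the two middle elements
  }
}

# Unsorted inputs, one odd-length and one even-length
print(manual_median(c(67, 7, 33, 13, 19)))
print(manual_median(c(92, 2, 76, 34)))

# Should agree with the built-in median() on a real column
print(manual_median(mtcars$qsec) == median(mtcars$qsec))
```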

Mode

$$\text{Mode} = \underset{x_i}{\operatorname{argmax}} \ f(x_i)$$

Creates a frequency table for a sequence of numbers.

numbers <- c(1, 233, 233, 10101, 342, 1, 2, 1111, 1, 55)

tnumbers <- table(numbers)
print(numbers)
print(tnumbers)
mode_ <- as.numeric(names(tnumbers)[tnumbers == max(tnumbers)])
print(mode_)

Identifies the most frequent value(s) in the sequence of numbers.

library(DescTools)
mode_ <- Mode(numbers)  # pass the raw vector, not the frequency table
print(mode_)
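
If installing DescTools is not an option, the same idea works in base R. The following is a rough sketch with a hypothetical helper, all_modes (not a base R or DescTools function), which returns every value tied for the highest frequency.

```r
# Hypothetical helper: all values that share the highest frequency
all_modes <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[counts == max(counts)])
}

# A vector with a tie: 2 and 5 both appear twice
print(all_modes(c(2, 2, 5, 5, 9)))

# Most common number of cylinders in mtcars
print(all_modes(mtcars$cyl))
```

Returning a vector rather than a single number makes multimodal data visible instead of silently picking one value.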

Measures of Dispersion

Defines a sequence of numbers.

n_arr = c(1,2,4,5,6)
print(n_arr)

Variance

$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$$

Calculates the variance of a sequence of numbers.

mean_ <- mean(n_arr)

print('Mean')
print(mean_)

print('Variance')
var_ <- sum((n_arr - mean_)^2) / length(n_arr)

print((n_arr - mean_))
print((n_arr - mean_)^2)
print(sum((n_arr - mean_)^2))
print(length(n_arr))
print(var_)
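
The formula above is the population variance, which divides by N. R's built-in var() divides by N - 1 instead (the sample variance), so the two results differ. A minimal sketch of the comparison, with illustrative variable names pop_var and sample_var:

```r
n_arr <- c(1, 2, 4, 5, 6)
N <- length(n_arr)

# Population variance: divide by N, as in the formula above
pop_var <- sum((n_arr - mean(n_arr))^2) / N

# Sample variance: var() divides by N - 1
sample_var <- var(n_arr)

print(pop_var)
print(sample_var)

# Rescaling the sample variance recovers the population variance
print(sample_var * (N - 1) / N)
```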

Standard Deviation

$$\sigma = \sqrt{\sigma^2}$$

Calculates the standard deviation, which is the square root of the variance.

print('Variance')
var_ <- sum((n_arr - mean_)^2) / length(n_arr)
print((n_arr - mean_))
print((n_arr - mean_)^2)
print(sum((n_arr - mean_)^2))
print(length(n_arr))
print(var_)

print('Standard Deviation')
std_ <- sqrt(var_)
print(std_)

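Calculates the standard deviation using R's sd function. Note that sd divides by N - 1 (the sample standard deviation), so its result is slightly larger than the population value computed from the formula.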

std_ <- sd(n_arr)
print(std_)
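
To get the population standard deviation from the formula without repeating the intermediate steps, a small helper can be used. This is a sketch only; pop_sd is a hypothetical name, not a base R function.

```r
# Hypothetical helper: population standard deviation (divides by N)
pop_sd <- function(x) {
  sqrt(sum((x - mean(x))^2) / length(x))
}

n_arr <- c(1, 2, 4, 5, 6)

print(pop_sd(n_arr))   # population standard deviation
print(sd(n_arr))       # sample standard deviation (divides by N - 1)
```

For large N the two values converge, but on a five-element vector like this one the gap is clearly visible.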

Range

$$\text{Range} = x_{\text{max}} - x_{\text{min}}$$

Calculates the range, which is the difference between the maximum and minimum values.

range_ <- max(n_arr) - min(n_arr)
print('Range')
print(max(n_arr))
print(min(n_arr))
print(range_)

Calculates the range using the range and diff functions: range returns the minimum and maximum, and diff takes the difference between them.

range_ <- diff(range(n_arr))
print(range_)
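
The same two-step pattern applies to a column of mtcars. A short sketch, using the illustrative names mpg_limits and mpg_range:

```r
# range() returns a vector of length two: c(minimum, maximum)
mpg_limits <- range(mtcars$mpg)
print(mpg_limits)

# diff() subtracts the first element from the second
mpg_range <- diff(mpg_limits)
print(mpg_range)
```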

Coefficient of Variation

$$\text{CV} = \frac{\sigma}{\mu}$$

Calculates the coefficient of variation, which is the ratio of the standard deviation to the mean.

mean_ <- mean(n_arr)
print('Mean')
print(mean_)

print('Variance')
var_ <- sum((n_arr - mean_)^2) / length(n_arr)
print((n_arr - mean_))
print((n_arr - mean_)^2)
print(sum((n_arr - mean_)^2))
print(length(n_arr))
print(var_)

print('Standard Deviation')
std_ <- sqrt(var_)
print(std_)

print('Coefficient of Variation')
cv <- std_ / mean_
print(cv)
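
Because the coefficient of variation is a ratio, it has no units, which makes it handy for comparing columns measured on different scales. The sketch below wraps the steps in a hypothetical helper, coef_var (not a base R function), using the population standard deviation as in the formula.

```r
# Hypothetical helper: coefficient of variation with the population formula
coef_var <- function(x) {
  mu <- mean(x)
  sigma <- sqrt(sum((x - mu)^2) / length(x))
  sigma / mu
}

n_arr <- c(1, 2, 4, 5, 6)

# Rescaling the data leaves the coefficient unchanged
print(coef_var(n_arr))
print(coef_var(n_arr * 100))

# Compare relative variability of two mtcars columns with different units
print(coef_var(mtcars$mpg))
print(coef_var(mtcars$hp))
```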

About the author:

A little more about me...

I have a Bachelor's degree in Information Systems, and in college I worked with several different technologies. Along the way, I took an Artificial Intelligence course, where I first encountered machine learning and Python, and learning about this area became my passion. Today I work with machine learning and deep learning, developing communication software. I also started a blog where I write about the subjects I am studying and share them to help other users.

I'm currently learning TensorFlow and Computer Vision

Curiosity: I love coffee
