How to find the standard deviation. Calculation of variance, root mean square (standard) deviation, coefficient of variation in Excel

Standard deviation is one of those statistical terms in the corporate world that raises the profile of people who manage to work it into a conversation or presentation, and leaves those who don't know what it is, but are embarrassed to ask, vaguely confused. In fact, most managers don't understand the concept of standard deviation, and if you're one of them, it's time to stop living the lie. In today's article, I'll show you how this underrated statistic can help you better understand the data you're working with.

What does standard deviation measure?

Imagine that you own two stores. To avoid losses, stock levels need to be tightly controlled. To find out which stock manager is better, you decide to analyze the stock over the past six weeks. The average weekly cost of the stock in both stores is approximately the same, about 32 conventional units. At first glance, the averages suggest that both managers perform equally well.

But if you take a closer look at the activity of the second store, you can see that although the average is about the same, the stock variability is very high (from 10 to 58 USD). Thus, the mean alone does not always describe the data well. This is where the standard deviation comes in.

The standard deviation shows how the values are distributed relative to the mean in our sample. In other words, it tells you how much the stock varies from week to week.

In our example, we used the Excel function STDEV to calculate the standard deviation along with the mean.

In the case of the first manager, the standard deviation was 2. This tells us that the values in the sample typically deviate from the mean by about 2. Is that good? Let's look at the question from a different angle: a standard deviation of 0 tells us that every value in the sample equals the mean (in our case, 32.2). A standard deviation of 2 is not far from 0, which indicates that most of the values are close to the mean. The closer the standard deviation is to 0, the more reliable the mean. Moreover, a standard deviation close to 0 indicates little variability in the data. That is, a stock value with a standard deviation of 2 indicates the first manager's incredible consistency.

In the case of the second store, the standard deviation was 18.9. That is, the cost of the stock typically deviates from the average by about 18.9 from week to week. Crazy spread! The further the standard deviation is from 0, the less representative the mean. In our case, the figure of 18.9 indicates that the average value ($32.8 per week) simply cannot be trusted. It also tells us that the weekly stock is highly variable.

This is the concept of standard deviation in a nutshell. Although it does not provide insight into other important statistical measurements (Mode, Median…), in fact the standard deviation plays a crucial role in most statistical calculations. Understanding the principles of standard deviation will shed light on the essence of many processes in your activity.

How to calculate standard deviation?

So, now we know what the standard deviation figure tells us. Let's see how it is calculated.

Consider a data set from 10 to 70 in increments of 10. As you can see, I have already calculated the standard deviation for them using the STDEV function in cell H2 (orange).

Below are the steps Excel takes to arrive at 21.6.

Please note that all calculations are visualized for better understanding. In fact, in Excel, the calculation is instantaneous, leaving all the steps behind the scenes.

Excel first finds the mean of the sample. In our case, the average turned out to be 40, which is subtracted from each sample value in the next step. Each resulting difference is squared and summed up. We got the sum equal to 2800, which must be divided by the number of sample elements minus 1. Since we have 7 elements, it turns out that we need to divide 2800 by 6. From the result we find the square root, this figure will be the standard deviation.
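The steps just described can be reproduced outside Excel as a cross-check. The following is a minimal Python sketch, not Excel's actual implementation; the variable names are illustrative assumptions, and the data is the 10 to 70 series from the text.

```python
import math
import statistics

# The data set from the text: 10 to 70 in increments of 10.
data = [10, 20, 30, 40, 50, 60, 70]

mean = sum(data) / len(data)                      # step 1: the mean of the sample
squared_diffs = [(x - mean) ** 2 for x in data]   # step 2: subtract the mean, square each difference
total = sum(squared_diffs)                        # step 3: sum the squares
sample_variance = total / (len(data) - 1)         # step 4: divide by the number of elements minus 1
sample_sd = math.sqrt(sample_variance)            # step 5: take the square root

print(mean, total, round(sample_sd, 1))
```

The standard library function `statistics.stdev(data)` applies the same n - 1 rule and can be used to confirm the hand calculation.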

For those who find the visualization of the calculation unclear, here is the same procedure in mathematical form, where n is the number of sample elements, x_i is the i-th element and \bar{x} is the sample mean:

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

Standard deviation calculation functions in Excel

There are several varieties of standard deviation formulas in Excel. You just need to type =STDEV and you will see for yourself.

It is worth noting that the functions STDEV.S and STDEV.P (the first and second functions in the list) duplicate the functions STDEV and STDEVP (the fifth and sixth functions in the list), respectively, which were retained for compatibility with earlier versions of Excel.

In general, the .S and .P endings indicate whether the standard deviation is calculated for a sample or for a population. I already explained the difference between the two in the previous article.

A feature of the STDEVA and STDEVPA functions (the third and fourth functions in the list) is that logical and text values are taken into account when calculating the standard deviation of an array. TRUE counts as 1, while text and FALSE count as 0. It's hard for me to imagine a situation where I would need these two functions, so I think they can be ignored.

Good afternoon!

In this article, I decided to look at how the standard deviation works in Excel using the STDEV function. Partly because I have not written one of these walkthroughs for a long time, and partly because this is a very useful function for anyone studying higher mathematics. And helping students is sacred; I know from my own experience how difficult the subject is to master. In practice, the standard deviation functions can be used to determine the stability of product sales, set prices, adjust or build an assortment, and carry out other equally useful analyses of your sales.

Excel offers several variants of this function:


Mathematical theory

To begin with, a little theory: how the standard deviation can be described in mathematical language, so that it can then be applied in Excel to analyze, for example, sales statistics (more on that later). I warn you right away, I will write a lot of incomprehensible words :) If anything below is unclear, skip straight to the practical application in the program.

What exactly does the standard deviation do? It estimates the root-mean-square deviation of a random variable X relative to its mathematical expectation, based on an unbiased estimate of its variance. I agree, it sounds confusing, but I think students will understand what it is actually about!

To begin with, we need to define the root-mean-square deviation, so that we can then calculate the standard deviation. The formula will help us with this:

$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

The formula can be described as follows: the result is measured in the same units as the random variable itself and is used when calculating the standard error of the arithmetic mean, when constructing confidence intervals, when testing statistical hypotheses, or when analyzing a linear relationship between random variables. It is defined as the square root of the variance of the random variable.

Now we can define the standard deviation: it is an estimate of the root-mean-square deviation of a random variable X relative to its mathematical expectation, based on an unbiased estimate of its variance. The formula is written like this:

$$s = \sqrt{\frac{n}{n-1}\sigma^2} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

Note that both estimates are biased. In the general case, it is not possible to construct an unbiased estimate. But an estimate based on an unbiased variance estimate will be consistent.
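The link between the two estimates (dividing by n versus dividing by n - 1) can be checked numerically. This is a small Python sketch under stated assumptions: the sample values are hypothetical, and `statistics.pstdev` and `statistics.stdev` are used as stand-ins for the divide-by-n and divide-by-(n - 1) formulas.

```python
import math
import statistics

# Hypothetical sample, used only to illustrate the relationship.
sample = [12.0, 15.0, 11.0, 18.0, 14.0]
n = len(sample)

sigma = statistics.pstdev(sample)  # root-mean-square deviation: divides by n
s = statistics.stdev(sample)       # standard deviation: divides by n - 1

# Rescaling sigma by sqrt(n / (n - 1)) should reproduce s.
rescaled = math.sqrt(n / (n - 1)) * sigma

print(f"sigma = {sigma:.4f}, s = {s:.4f}, rescaled sigma = {rescaled:.4f}")
```

For any sample with more than one distinct value, s comes out slightly larger than sigma, and the gap shrinks as n grows.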

Practical implementation in Excel

Well, now let's move away from boring theory and see in practice how the STDEV function works. I will not consider all the variations of the standard deviation function in Excel; one is enough, with examples. As an example, consider how the stability of sales is determined.

First, look at the syntax of the function; as you can see, it is very simple:

STDEV.P(number1;number2;...)


Now let's create an example file and use it to look at how this function works. Since analytical calculations need at least three values, as in principle does any statistical analysis, I took 3 periods; a period can be a year, quarter, month or week. In my case it is a month. For the greatest reliability, I recommend taking as many periods as possible, but not fewer than three. All the data in the table is deliberately simple, to make the operation of the formula clear.

To begin with, we need to calculate the average value across the months. We use the AVERAGE function for this and get the formula: =AVERAGE(C4:E4).
Now we can find the standard deviation using the STDEV.P function, whose arguments are the sales of the product for each period. The result is a formula of the following form: =STDEV.P(C4;D4;E4).
Well, that's half the work done. In the next step, we form the "Variation" column: divide the standard deviation by the average value and convert the result into a percentage. We get the following table:
Well, the main calculations are over; it remains to figure out whether sales are stable or not. Let us take as a condition that deviations below 10% are considered stable, from 10 to 25% are small deviations, and everything above 25% is no longer stable. To get the result according to these conditions, we use the logical IF function and write the formula:

=IF(H4<0,1;"stable";IF(H4<0,25;"normal";"not stable"))
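The logic of the nested IF can also be written out as an ordinary function, which makes the threshold boundaries easier to see. This is a Python sketch rather than anything Excel runs; the function name `classify` is a hypothetical helper, and the 10% and 25% limits are the conditional ones taken above.

```python
def classify(variation: float) -> str:
    """Mirror the nested IF: variation is a fraction, e.g. 0.2 for 20%."""
    if variation < 0.10:       # first IF: below 10%
        return "stable"
    if variation < 0.25:       # second IF: from 10% up to, but not including, 25%
        return "normal"
    return "not stable"        # everything else

for value in (0.05, 0.10, 0.20, 0.25):
    print(value, classify(value))
```

Note that with strict comparisons a variation of exactly 10% falls into the middle band and exactly 25% into the last one, just as in the formula.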

All ranges are chosen arbitrarily for clarity; your tasks may have completely different conditions.
To improve data visualization when your table has thousands of rows, you can use conditional formatting to highlight particular results with a color scheme, which makes the table much easier to read.

First, select the cells you want to apply conditional formatting to. On the "Home" tab, select "Conditional Formatting", then "Highlight Cells Rules" in the drop-down menu, and click "Text that Contains...". A dialog box appears in which you enter your conditions.

After the conditions are written, for example, “stable” - green, “normal” - yellow and “not stable” - red, we get a beautiful and understandable table in which you can see what to pay attention to first of all.

Using VBA for the STDEV.P Function

Those who are interested can automate their calculations using macros and use the following function:

Function MyStDevP(Arr)
    Dim x, aCnt&, aSum#, aAver#, tmp#
    For Each x In Arr
        aSum = aSum + x    ' calculate the sum of the array elements
        aCnt = aCnt + 1    ' count the number of elements
    Next x
    aAver = aSum / aCnt    ' average value
    For Each x In Arr
        tmp = tmp + (x - aAver) ^ 2    ' sum of the squared differences between the elements and the average
    Next x
    MyStDevP = Sqr(tmp / aCnt)    ' same result as STDEV.P()
End Function


Among the many indicators that are used in statistics, it is necessary to highlight the calculation of variance. It should be noted that manually performing this calculation is a rather tedious task. Fortunately, there are functions in Excel that allow you to automate the calculation procedure. Let's find out the algorithm for working with these tools.

Variance is a measure of variation: the mean of the squared deviations from the mathematical expectation. Thus, it expresses the spread of numbers about the mean. Variance can be calculated both for the population and for a sample.

Method 1: calculation for the population

To calculate this indicator in Excel for the population, the VAR.P function is used. The syntax for this expression is as follows:

VAR.P(Number1;Number2;…)

In total, from 1 to 255 arguments can be used. Arguments can be either numeric values or references to the cells that contain them.

Let's see how to calculate this value for a range of numeric data.


Method 2: sample calculation

In contrast to the calculation for the population, in the calculation for a sample the denominator is not the total count of numbers but one less. This is done to correct the bias. Excel accounts for this nuance in a special function designed for this type of calculation, VAR.S. Its syntax is as follows:

VAR.S(Number1;Number2;…)

The number of arguments, as in the previous function, can also range from 1 to 255.
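The only difference between the two methods is the denominator, which a few lines of Python make visible. This is a sketch with hypothetical data; `statistics.pvariance` and `statistics.variance` are used as reference implementations of the population and sample formulas (known as VAR.P and VAR.S in English-language Excel).

```python
import math
import statistics

# Hypothetical data, used only to compare the two denominators.
values = [4.0, 8.0, 15.0, 16.0, 23.0, 42.0]
n = len(values)

mean = sum(values) / n
sum_sq = sum((v - mean) ** 2 for v in values)  # sum of squared deviations from the mean

population_variance = sum_sq / n        # Method 1: divide by the count of numbers
sample_variance = sum_sq / (n - 1)      # Method 2: divide by one less

print(f"population: {population_variance:.3f}, sample: {sample_variance:.3f}")
```

The sample variance is always the larger of the two, by a factor of n / (n - 1).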


As you can see, the Excel program is able to greatly facilitate the calculation of the variance. This statistic can be calculated by the application for both the population and the sample. In this case, all user actions are actually reduced only to specifying the range of numbers to be processed, and Excel does the main work itself. Of course, this will save a significant amount of time for users.

Instruction

Suppose there are several numbers characterizing homogeneous quantities, for example the results of measurements, weighings, or statistical observations. All of the quantities must be expressed in the same unit of measurement. To find the standard deviation, do the following.

Determine the arithmetic mean of all numbers: add all the numbers and divide the sum by the total number of numbers.

Find the deviation of each number from the mean by subtracting the mean from it.

Determine the dispersion (variance) of the numbers: add up the squares of the deviations found earlier and divide the resulting sum by the number of numbers.

Take the square root of the variance. The result is the standard deviation.

There are seven patients in the ward with temperatures of 34, 35, 36, 37, 38, 39 and 40 degrees Celsius.

It is required to determine the standard deviation from the mean.
Solution:
Mean temperature in the ward: (34+35+36+37+38+39+40)/7 = 37 ºС;

Temperature deviations from the mean (in this case, the normal value): 34-37, 35-37, 36-37, 37-37, 38-37, 39-37, 40-37, which gives: -3, -2, -1, 0, 1, 2, 3 (ºС);

Variance: (9+4+1+0+1+4+9)/7 = 28/7 = 4; standard deviation: √4 = 2 ºС.
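The ward example can be checked with a short Python sketch that follows the instruction step by step. The variable names are illustrative assumptions; the division is by the number of values, as the instruction above prescribes.

```python
import math

temperatures = [34, 35, 36, 37, 38, 39, 40]
n = len(temperatures)

mean = sum(temperatures) / n                       # arithmetic mean of the ward
deviations = [t - mean for t in temperatures]      # deviation of each value from the mean
variance = sum(d ** 2 for d in deviations) / n     # mean of the squared deviations
standard_deviation = math.sqrt(variance)           # square root of the variance

print(mean, deviations, variance, standard_deviation)
```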

Divide the sum of the numbers obtained earlier by their count. For accuracy, it is better to use a calculator. The result of the division is the arithmetic mean of the summands.

Pay close attention to every stage of the calculation, since an error at any stage leads to an incorrect final figure. Check the results at each stage. The arithmetic mean has the same unit as the summands; for example, if you determine average attendance, all figures are measured in persons.

This method of calculation is the one used in mathematical and statistical calculations. In computer science, for example, the arithmetic mean may be computed with a different algorithm. The arithmetic mean is a rather rough indicator: it describes a phenomenon well only when a single factor or indicator is involved. For a more in-depth analysis, many factors must be taken into account, and more general quantities are calculated for this purpose.

The arithmetic mean is one of the measures of central tendency, widely used in mathematics and statistical calculations. Finding the arithmetic mean of several values is very simple, but each task has its own nuances, which you need to know in order to calculate correctly.


How to find the arithmetic mean

Finding the arithmetic mean of an array of numbers begins with determining the algebraic sum of the values. For example, if the array contains the numbers 23, 43, 10, 74 and 34, then their algebraic sum is 184. In writing, the arithmetic mean is denoted by the letter μ (mu) or x̄ (x with a bar). Next, the algebraic sum is divided by the count of numbers in the array. In this example there are five numbers, so the arithmetic mean is 184/5 = 36.8.

Features of working with negative numbers

If there are negative numbers in the array, the arithmetic mean is found using the same algorithm. There is a difference only when calculating in a programming environment, or if the task sets additional conditions. In these cases, finding the arithmetic mean of numbers with different signs comes down to three steps:

1. Finding the overall arithmetic mean by the standard method;
2. Finding the arithmetic mean of the negative numbers;
3. Finding the arithmetic mean of the positive numbers.

The result of each step is written down, separated by commas.
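The three steps can be sketched in Python as follows. The array is hypothetical and the helper name `mean` is an assumption made for illustration; zeros, if present, would belong to neither the negative nor the positive group in this sketch.

```python
def mean(values):
    """Arithmetic mean: algebraic sum divided by the count of values."""
    return sum(values) / len(values)

# Hypothetical array with numbers of both signs.
numbers = [-6, 4, -2, 10, 8]

overall = mean(numbers)                              # step 1: overall mean
negatives = mean([x for x in numbers if x < 0])      # step 2: mean of the negative numbers
positives = mean([x for x in numbers if x > 0])      # step 3: mean of the positive numbers

# The three results, written separated by commas.
print(f"{overall:.2f}, {negatives:.2f}, {positives:.2f}")
```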

Common and decimal fractions

If the array of numbers consists of decimal fractions, the solution follows the same method as for the arithmetic mean of integers, but the result is rounded to the accuracy that the problem requires.

When working with common fractions, bring them to a common denominator, which is then multiplied by the count of numbers in the array. The numerator of the answer is the sum of the numerators of the converted fractions.
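The common-denominator rule can be checked with Python's `fractions` module. This is a sketch with hypothetical fractions; the helper `lcm` is defined here for illustration and is not part of the original text.

```python
import math
from fractions import Fraction
from functools import reduce

def lcm(a: int, b: int) -> int:
    """Least common multiple of two positive integers."""
    return a * b // math.gcd(a, b)

# Hypothetical array of common fractions.
fracs = [Fraction(1, 2), Fraction(1, 3), Fraction(3, 4)]

# Bring every fraction to a common denominator.
common_denominator = reduce(lcm, (f.denominator for f in fracs))
numerators = [int(f * common_denominator) for f in fracs]

# Numerator: sum of the converted numerators.
# Denominator: common denominator multiplied by the count of fractions.
manual_mean = Fraction(sum(numerators), common_denominator * len(fracs))

# Direct calculation for comparison.
direct_mean = sum(fracs) / len(fracs)

print(manual_mean, direct_mean)
```

`Fraction` reduces the result to lowest terms automatically, so both routes give the same fraction.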

The standard deviation function belongs to the branch of higher mathematics that deals with statistics. In Excel, there are several variants of the standard deviation function:

  • STDEV function.
  • STDEV function.
  • STDEV function

We will need these functions in sales statistics to identify the stability of sales (XYZ analysis). This data can be used both for pricing and for the formation (adjustment) of the assortment matrix and for other useful sales analyzes, which I will definitely talk about in future articles.

Foreword

Let's look at the formulas first in mathematical language, and then (below in the text) we will analyze the formula in Excel in detail and how the resulting result is applied in the analysis of sales statistics.

So, the root-mean-square deviation is an estimate of the standard deviation of a random variable x relative to its mathematical expectation, based on a biased estimate of its variance :) Do not be afraid of incomprehensible words, be patient and you will understand everything!

$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

Description of the formula: the root-mean-square deviation is measured in the units of the random variable itself and is used when calculating the standard error of the arithmetic mean, when constructing confidence intervals, when statistically testing hypotheses, and when measuring a linear relationship between random variables. It is defined as the square root of the variance of a random variable.

Now the standard deviation: an estimate of the root-mean-square deviation of a random variable x relative to its mathematical expectation, based on an unbiased estimate of its variance:

$$s = \sqrt{\frac{n}{n-1}\sigma^2} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

where:

$\sigma^2$ is the variance;

$x_i$ is the i-th sample element;

$n$ is the sample size;

$\bar{x}$ is the sample arithmetic mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

It should be noted that both estimates are biased. In the general case, it is impossible to construct an unbiased estimate. However, an estimate based on an unbiased variance estimate is consistent.

Three-sigma rule (3σ): almost all values of a normally distributed random variable lie in the interval (x̄ − 3σ; x̄ + 3σ). More strictly, with a probability of approximately 0.9973, the value of a normally distributed random variable lies in this interval (provided that the value x̄ is the true one, and not obtained as a result of sample processing). We will use a rounded interval of 0.1.

If the true value of x̄ is unknown, then you should use not σ but s. Thus, the rule of three sigma is transformed into the rule of three s. It is this rule that will help us determine the stability of sales, but more on that later...
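The 0.9973 figure can be reproduced from the normal distribution itself. This is a small Python sketch; the function name `prob_within` is a hypothetical helper, and it relies on the standard identity that a normal variable falls within k standard deviations of its mean with probability erf(k / √2).

```python
import math

def prob_within(k: float) -> float:
    """Probability that a normally distributed value lies within k sigma of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sigma: {prob_within(k):.4f}")
```

The result assumes the true mean and sigma are known; with sample estimates the coverage is only approximate, which is the caveat the text makes.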

Now Standard Deviation Function in Excel

I hope I didn't overwhelm you with math? Perhaps someone will need this information for an abstract or some other purpose. Now let's chew on how these formulas work in Excel...

To determine the stability of sales, we do not need to delve into all the options for standard deviation functions. We will use only one:

STDEVP function

STDEVP(number1;number2;...)

Number1, Number2, ... are from 1 to 30 numerical arguments corresponding to the population.

Now let's look at an example:

Let's create a workbook and a makeshift spreadsheet. You can download this example in Excel at the end of the article.

To be continued!!!

Hello again. Well!? Got a free minute. Let's continue?

So, sales stability using the STDEVP function

For clarity, let's take a few made-up products:

In analytics, whether it is a forecast, research, or anything else related to statistics, you always need at least three periods. A period can be a week, month, quarter or year. It is possible, and even best, to take as many periods as possible, but not fewer than three.

I specifically showed exaggerated sales, where you can see with the naked eye what is being sold consistently and what is not. This will make it easier to understand how the formulas work.

So we have the sales; now we need to calculate the average sales values by period.

The average value formula is AVERAGE(period data); in my case, the formula looks like this: =AVERAGE(C6:E6)

We stretch the formula to all products. This can be done by dragging the lower-right corner of the selected cell to the end of the list. Or put the cursor in the column with the product names and use the following key combinations:

Ctrl + Down moves the cursor to the bottom of the list.

Ctrl + Right moves the cursor to the right edge of the table. Press Right once more to get to the column with the formula.

Now hold Ctrl + Shift and press Up. This selects the range to fill with the formula.

And the key combination Ctrl + D fills the formula down where we need it.

Remember these combinations, they really increase your speed in Excel, especially when you work with large arrays.

The next step is the standard deviation function itself; as I said, we will use only one, STDEVP.

We enter the function and pass the sales values of each period as its arguments. If the sales in your table sit next to each other, you can use a range, as in my formula =STDEVP(C6:E6), or list the required cells separated by semicolons: =STDEVP(C6;D6;E6)

That is all the calculations done. But how do you know what sells consistently and what doesn't? Let's use the XYZ notation, where

X is stable

Y - with small deviations

Z - not stable

To do this, we use intervals for the variation (the standard deviation divided by the average value). If fluctuations stay within 10%, we will assume that sales are stable.

If between 10 and 25 percent, it will be Y.

And if the variation exceeds 25%, this is not stability.

To set the correct letter for each product, we use the IF function. In my table, it looks like this:

=IF(H6<0,1;"X";IF(H6<0,25;"Y";"Z"))

Accordingly, we stretch all the formulas for all names.

I will try to answer right away the question: why intervals of 10% and 25%?

In fact, the intervals may be different; it all depends on the specific task. I deliberately showed you exaggerated sales values, where the difference is visible to the naked eye. It is obvious that Product 1 does not sell consistently, but the dynamics show growth in sales. We leave this product alone...

But with Product 2 the destabilization is plain to see. And our calculations show Z, which tells us that sales are unstable. Product 3 and Product 5 show stable performance; note that the variation is within 10%.

That is, Product 5 with sales of 45, 46, and 45 shows a 1% variation, which is a stable number series.

But Product 2 with sales of 10, 50, and 5 shows a 93% variation, which is NOT a stable number series.

After all the calculations, you can apply a filter and filter by stability, so if your table consists of several thousand items, you can easily pick out those with unstable sales or, on the contrary, those that are stable.

"Y" did not come up in my table; I think it should be added to make the number series clearer. I will add Product 6...

You see, the number series 40, 50 and 30 shows a 20% variation. The error does not seem large, but the spread is still significant...

And so to sum it up:

10,50,5 - Z is not stable. Variation over 25%

40,50,30 - Y you can pay attention to this product and improve its sales. Variation less than 25% but greater than 10%

45,46,45 - X is stability, nothing needs to be done with this product yet. Variation less than 10%
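The three series from the summary can be run through the same calculation in Python. This is a sketch under an assumption: the population standard deviation (`statistics.pstdev`) is used because it reproduces the percentages quoted in the text, whereas the sample formula would give larger values (for example 25% rather than 20% for 40, 50, 30). The function names `variation` and `xyz` are hypothetical helpers.

```python
import statistics

def variation(sales):
    """Coefficient of variation: population standard deviation divided by the mean."""
    return statistics.pstdev(sales) / statistics.mean(sales)

def xyz(v: float) -> str:
    """Assign the XYZ letter using the 10% and 25% thresholds from the text."""
    if v < 0.10:
        return "X"
    if v < 0.25:
        return "Y"
    return "Z"

series = [[10, 50, 5], [40, 50, 30], [45, 46, 45]]

for sales in series:
    v = variation(sales)
    print(f"{sales}: variation {v:.0%}, group {xyz(v)}")
```

As in the article, the thresholds are conditional; for other tasks only the two constants in `xyz` would need to change.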

That's all! I hope I explained everything clearly; if not, ask about whatever is unclear. I will be grateful for every comment, whether praise or criticism. That way I will know that you are reading me and, which is very IMPORTANT, that you find it interesting. And accordingly, new lessons will appear.
