Correlation analysis using the Spearman method (Spearman rank correlation)

Spearman's rank correlation coefficient is a non-parametric method used to study the statistical relationship between phenomena. It determines the actual degree of parallelism between two quantitative series of the studied characteristics and assesses the closeness of the established relationship with a quantitatively expressed coefficient.

1. History of the development of the rank correlation coefficient

This criterion was developed and proposed for correlation analysis in 1904 by Charles Edward Spearman, an English psychologist and professor at University College London.

2. What is the Spearman coefficient used for?

Spearman's rank correlation coefficient is used to identify and evaluate the closeness of the relationship between two series of compared quantitative indicators. If the ranks of the indicators, ordered by degree of increase or decrease, coincide in most cases (a greater value of one indicator corresponds to a greater value of the other - for example, when comparing a patient's height and body weight), the conclusion is that there is a direct correlation. If the ranks run in opposite directions (a higher value of one indicator corresponds to a lower value of the other - for example, when comparing age and heart rate), the correlation between the indicators is said to be inverse.

    The Spearman correlation coefficient has the following properties:
  1. The correlation coefficient can take values from minus one to one: at r_s = 1 there is a strictly direct relationship, and at r_s = -1 a strictly inverse relationship.
  2. A negative correlation coefficient indicates an inverse relationship; a positive one, a direct relationship.
  3. If the correlation coefficient is zero, there is practically no connection between the quantities.
  4. The closer the absolute value of the correlation coefficient is to one, the stronger the relationship between the measured quantities.
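Assuming a Python environment with NumPy and SciPy is available, these properties are easy to verify on small hypothetical monotonic series:

```python
import numpy as np
from scipy.stats import spearmanr

# Strictly monotonic hypothetical data: an increasing relationship
# gives r_s = +1, a decreasing one gives r_s = -1.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
rs_direct, _ = spearmanr(x, x ** 2)    # ranks coincide  -> r_s = +1
rs_inverse, _ = spearmanr(x, -x ** 2)  # ranks reversed  -> r_s = -1
```

Any strictly increasing transformation of one series leaves the ranks, and hence r_s, unchanged.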

3. In what cases can the Spearman coefficient be used?

Because the coefficient is a non-parametric method of analysis, no test for normality of the distribution is required.

The compared indicators can be measured either on a continuous scale (for example, the number of red blood cells in 1 μl of blood) or on an ordinal scale (for example, expert assessment points from 1 to 5).

The efficiency and quality of the Spearman estimate decrease if the difference between the different values of either measured quantity is sufficiently large. The Spearman coefficient is also not recommended when the values of a measured quantity are unevenly distributed.

4. How to calculate the Spearman coefficient?

Calculation of the Spearman rank correlation coefficient includes the following steps:

  1. Rank the values of each characteristic separately, assigning the lower rank to the lower value.
  2. For each pair of observations, compute the difference d between the two ranks.
  3. Square the differences and sum them to obtain Σd².
  4. Substitute the result into the formula r_s = 1 - (6 · Σd²) / (N · (N² - 1)), where N is the number of paired observations.
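This calculation can be sketched in Python (NumPy/SciPy assumed); the paired observations below are hypothetical and contain no repeated values, so the simple formula applies exactly:

```python
import numpy as np
from scipy.stats import rankdata, spearmanr

# Hypothetical paired observations for N = 10 subjects (no repeated values)
errors = np.array([29, 54, 13, 8, 14, 26, 9, 20, 2, 17])
verbal = np.array([131, 106, 121, 127, 136, 124, 134, 138, 132, 142])

# Step 1: rank each series separately (lower value -> lower rank)
rank_a = rankdata(errors)
rank_b = rankdata(verbal)

# Steps 2-3: rank differences d and the sum of their squares
d = rank_a - rank_b
sum_d2 = np.sum(d ** 2)

# Step 4: Spearman's formula (exact when there are no tied ranks)
n = len(errors)
rs = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
```

With no tied ranks, `rs` coincides with the value returned by `scipy.stats.spearmanr(errors, verbal)`.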

5. How to interpret the Spearman coefficient value?

When using the rank correlation coefficient, the closeness of the connection between characteristics is assessed conventionally: coefficient values of 0.3 or less indicate a weak connection; values greater than 0.4 but less than 0.7 indicate a connection of moderate closeness; and values of 0.7 or more indicate a close connection.

The statistical significance of the obtained coefficient is assessed using Student's t-test. If the calculated t value is less than the tabulated value for the given number of degrees of freedom, the observed relationship is not statistically significant. If it is greater, the correlation is considered statistically significant.
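The significance check can be sketched as follows (Python with SciPy assumed; the values r_s = 0.64 and n = 10 are hypothetical, and the t statistic uses df = n - 2):

```python
import numpy as np
from scipy import stats

# Hypothetical result: r_s = 0.64 obtained on n = 10 paired observations
rs, n = 0.64, 10

# Student's t statistic for a correlation coefficient, df = n - 2
t_emp = rs * np.sqrt((n - 2) / (1 - rs ** 2))

# Two-sided tabulated (critical) value at alpha = 0.05
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 2)

significant = abs(t_emp) > t_crit  # True here: t_emp ~ 2.36 > 2.31
```

Here `stats.t.ppf` looks up the same critical value one would otherwise read from a t-table.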

Purpose of the rank correlation coefficient

The Spearman rank correlation method allows one to determine the closeness (strength) and direction of the correlation between two characteristics or between two profiles (hierarchies) of characteristics.

Description of the method

To calculate a rank correlation, one needs two series of values that can be ranked. Such series of values could be:

1) two characteristics measured in the same group of subjects;

2) two individual hierarchies of characteristics identified in two subjects using the same set of characteristics (for example, personality profiles on R. B. Cattell's 16-factor questionnaire, hierarchies of values by M. Rokeach's method, sequences of preferences when choosing among several alternatives, etc.);

3) two group hierarchies of traits;

4) individual and group hierarchies of characteristics.

First, the indicators are ranked separately for each characteristic. As a rule, the lower rank is assigned to the lower value of the characteristic.

Let's consider case 1 (two characteristics). Here the individual values obtained by different subjects on the first characteristic are ranked, and then the individual values on the second characteristic.

If two characteristics are positively related, then subjects who have low ranks on one of them will have low ranks on the other, and subjects who have high ranks on one characteristic will also have high ranks on the other. To compute r_s, it is necessary to determine the differences (d) between the ranks obtained by each subject on the two characteristics. These d values are then transformed in a certain way and subtracted from 1. The smaller the differences between the ranks, the larger r_s will be, and the closer it will be to +1.

If there is no correlation, all the ranks will be mixed and there will be no correspondence between them. The formula is designed so that in this case r_s will be close to 0.

In the case of a negative correlation, low ranks of subjects on one attribute will correspond to high ranks on another attribute, and vice versa.

The greater the discrepancy between the subjects' ranks on the two variables, the closer r_s is to -1.

Let's consider case 2 (two individual profiles). Here the individual values obtained by each of the two subjects on a certain set of characteristics (identical for both) are ranked. The first rank is given to the characteristic with the lowest value, the second rank to the characteristic with a higher value, and so on. Obviously, all characteristics must be measured in the same units, otherwise ranking is impossible. For example, it is impossible to rank indicators on the Cattell Personality Inventory (16PF) if they are expressed in "raw" points, since the ranges of values differ from factor to factor: from 0 to 13, from 0 to 20, and from 0 to 26. We cannot say which factor will take first place in severity until we reduce all the values to a single scale (most often the sten scale).

If the individual hierarchies of two subjects are positively related, then characteristics that have low ranks in one of them will have low ranks in the other, and vice versa. For example, if factor E (dominance) has the lowest rank for one subject, it should have a low rank for the other; if factor C (emotional stability) has the highest rank for one subject, the other subject should also give this factor a high rank; and so on.

Let's consider case 3 (two group profiles). Here the average group values ​​obtained in 2 groups of subjects are ranked according to a certain set of characteristics, identical for the two groups. In what follows, the line of reasoning is the same as in the previous two cases.

Let's consider case 4 (individual and group profiles). Here the individual values of a single subject and the group average values are ranked separately on the same set of characteristics. The group averages are obtained, as a rule, with this individual subject excluded - he does not contribute to the average group profile with which his individual profile will be compared. Rank correlation tests how consistent the individual and group profiles are.

In all four cases, the significance of the resulting correlation coefficient is determined by the number of ranked values N. In the first case, this number coincides with the sample size n. In the second case, it is the number of features making up the hierarchy. In the third and fourth cases, N is likewise the number of compared features, not the number of subjects in the groups. Detailed explanations are given in the examples.

If the absolute value of r s reaches or exceeds a critical value, the correlation is reliable.

Hypotheses

There are two possible hypotheses. The first applies to case 1, the second to the other three cases.

First version of hypotheses

H 0: The correlation between variables A and B does not differ from zero.

H 1: The correlation between variables A and B is significantly different from zero.

Second version of hypotheses

H 0: The correlation between hierarchies A and B does not differ from zero.

H 1: The correlation between hierarchies A and B is significantly different from zero.

Graphical representation of the rank correlation method

Most often, the correlation relationship is presented graphically in the form of a cloud of points or in the form of lines reflecting the general tendency of placing points in the space of two axes: the axis of feature A and feature B (see Fig. 6.2).

Let's try to depict rank correlation as two rows of ranked values connected pairwise by lines (Fig. 6.3). If the ranks for characteristic A and characteristic B coincide, they are connected by a horizontal line; if the ranks do not coincide, the line is slanted. The greater the discrepancy between the ranks, the steeper the line. On the left of Fig. 6.3 is the highest possible positive correlation (r_s = +1.0) - in effect a "ladder". In the center is zero correlation - a braid with irregular weaves; all the ranks are mixed up here. On the right is the highest negative correlation (r_s = -1.0) - a web with a regular interweaving of lines.

Fig. 6.3. Graphical representation of rank correlation:

a) high positive correlation;

b) zero correlation;

c) high negative correlation

Limitations of the rank correlation coefficient

1. At least 5 observations must be available for each variable. The upper limit of the sample size is determined by the available tables of critical values (Table XVI of Appendix 1), namely N ≤ 40.

2. With a large number of tied (identical) ranks on one or both compared variables, Spearman's rank correlation coefficient r_s gives coarsened values. Ideally, both correlated series should be sequences of non-repeating values. If this condition is not met, a correction for tied ranks must be applied. The corresponding formula is given in Example 4.
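The coarsening effect of tied ranks can be illustrated in Python (hypothetical data): the uncorrected formula and the tie-aware computation (Pearson's r on average ranks) no longer agree once repeated values appear.

```python
import numpy as np
from scipy.stats import rankdata, spearmanr

# Hypothetical series in which y contains many repeated values
x = np.array([1, 2, 3, 4, 5, 6])
y = np.array([10, 10, 20, 20, 30, 30])

# Tied values receive average ranks: [1.5, 1.5, 3.5, 3.5, 5.5, 5.5]
d = rankdata(x) - rankdata(y)
n = len(x)
rs_naive = 1 - 6 * np.sum(d ** 2) / (n * (n ** 2 - 1))  # uncorrected formula
rs_exact, _ = spearmanr(x, y)  # Pearson's r on the ranks, correct with ties
```

The two values differ slightly here; with many more ties the discrepancy grows, which is why the correction is needed.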

Example 1 - correlation between two characteristics

In a study simulating the activity of an air traffic controller (Oderyshev B.S., Shamova E.P., Sidorenko E.V., Larchenko N.N., 1978), a group of subjects - students of the Faculty of Physics of Leningrad State University - underwent training before starting work on the simulator. The subjects had to solve problems of choosing the optimal runway type for a given type of aircraft. Is the number of errors made by subjects in a training session related to indicators of verbal and nonverbal intelligence measured by D. Wechsler's method?

Table 6.1

Indicators of the number of errors in the training session and indicators of the level of verbal and non-verbal intelligence among physics students (N=10)

(Table columns: Subject; Number of errors; Verbal intelligence score; Nonverbal intelligence score. The individual values are not reproduced here.)

First, let's try to answer the question of whether indicators of the number of errors and verbal intelligence are related.

Let's formulate hypotheses.

H 0: The correlation between the number of errors in a training session and the level of verbal intelligence does not differ from zero.

H 1: The correlation between the number of errors in a training session and the level of verbal intelligence is statistically significantly different from zero.

Next, we need to rank both indicators, assigning a lower rank to the smaller value, then calculate the differences between the ranks that each subject received for the two variables (attributes), and square these differences. Let's make all the necessary calculations in the table.

In Table 6.2, the first column on the left shows the values for the number of errors; the next column shows their ranks. The third column from the left shows the scores for verbal intelligence; the next column shows their ranks. The fifth column presents the differences d between the rank on variable A (number of errors) and variable B (verbal intelligence). The last column presents the squared differences d².

Table 6.2

Calculation of d² for Spearman's rank correlation coefficient r_s when comparing indicators of the number of errors and verbal intelligence among physics students (N=10)

(Table columns: Subject; Variable A, number of errors - individual values and ranks; Variable B, verbal intelligence - individual values and ranks; d (rank A - rank B); d². The individual values are not reproduced here.)

Spearman's rank correlation coefficient is calculated using the formula:

r_s = 1 - (6 · Σd²) / (N · (N² - 1)),

where d is the difference between the ranks on the two variables for each subject, and N is the number of ranked values, in this case the number of subjects.

Let's calculate the empirical value of r s:

The obtained empirical value of r_s is close to 0. Nevertheless, we determine the critical values of r_s at N = 10 from Table XVI of Appendix 1:

Answer: H 0 is accepted. The correlation between the number of errors in a training session and the level of verbal intelligence does not differ from zero.

Now let’s try to answer the question of whether indicators of the number of errors and nonverbal intelligence are related.

Let's formulate hypotheses.

H 0: The correlation between the number of errors in a training session and the level of nonverbal intelligence does not differ from 0.

H 1: The correlation between the number of errors in a training session and the level of nonverbal intelligence is statistically significantly different from 0.

The results of ranking and comparison of ranks are presented in Table 6.3.

Table 6.3

Calculation of d² for Spearman's rank correlation coefficient r_s when comparing indicators of the number of errors and nonverbal intelligence among physics students (N=10)

(Table columns: Subject; Variable A, number of errors - individual values and ranks; Variable B, nonverbal intelligence - individual values and ranks; d (rank A - rank B); d². The individual values are not reproduced here.)

Recall that in determining the significance of r_s it does not matter whether it is positive or negative; only its absolute value matters. In this case:

Answer: H 0 is accepted. The correlation between the number of errors in a training session and the level of nonverbal intelligence is random, r s does not differ from 0.

However, we may note a certain trend toward a negative relationship between these two variables. We might be able to confirm it at a statistically significant level if we increased the sample size.

Example 2 - correlation between individual profiles

In a study devoted to the problems of value reorientation, hierarchies of terminal values were identified by M. Rokeach's method among parents and their adult children (Sidorenko E.V., 1996). The ranks of terminal values obtained in examining a mother-daughter pair (mother 66 years old, daughter 42 years old) are presented in Table 6.4. Let's try to determine how these two value hierarchies correlate with each other.

Table 6.4

Ranks of terminal values according to M. Rokeach's list in the individual hierarchies of mother and daughter

(Table columns: Terminal values; Rank in mother's hierarchy; Rank in daughter's hierarchy; d; d². The rank values are not reproduced here; the row labels follow.)

1 Active, activity-filled life

2 Life wisdom

3 Health

4 Interesting work

5 The beauty of nature and art

7 Financially secure life

8 Having good and loyal friends

9 Public recognition

10 Cognition

11 Productive life

12 Development

13 Entertainment

14 Freedom

15 Happy family life

16 The happiness of others

17 Creativity

18 Self-confidence

Let's formulate hypotheses.

H 0: The correlation between mother and daughter terminal value hierarchies is not different from zero.

H 1: The correlation between mother and daughter terminal value hierarchies is statistically significantly different from zero.

Since the research procedure itself presupposes ranking of the values, we need only calculate the differences between the ranks of the 18 values in the two hierarchies. The 3rd and 4th columns of Table 6.4 present the differences d and their squares d².

We determine the empirical value of r_s using the formula:

r_s = 1 - (6 · Σd²) / (N · (N² - 1)),

where d is the difference between the ranks for each variable, in this case for each terminal value, and N is the number of variables forming the hierarchy, in this case the number of values.

For this example:

Using Table XVI of Appendix 1, we determine the critical values:

Answer: H 0 is rejected, H 1 is accepted. The correlation between the hierarchies of terminal values of mother and daughter is statistically significant (p < 0.01) and positive.

From Table 6.4 we can determine that the main differences occur for the values "Happy family life", "Public recognition" and "Health"; the ranks of the other values are quite close.
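Case 2 (two individual hierarchies) can be sketched in Python. The rank vectors below are hypothetical, not the actual data of Table 6.4: the "daughter" hierarchy simply swaps every adjacent pair of the "mother" ranks, so the hierarchies agree closely.

```python
import numpy as np

# Hypothetical rank hierarchies of N = 18 terminal values: the daughter's
# hierarchy swaps every adjacent pair of the mother's ranks
mother = np.arange(1, 19)
daughter = np.array([2, 1, 4, 3, 6, 5, 8, 7, 10, 9,
                     12, 11, 14, 13, 16, 15, 18, 17])

# The values are already ranks, so we compute d and apply the formula directly
d = mother - daughter
n = 18
rs = 1 - 6 * np.sum(d ** 2) / (n * (n ** 2 - 1))  # close to +1
```

Because the data arrive as ranks, no preliminary ranking step is needed; each |d| = 1, so Σd² = 18 and r_s ≈ 0.98.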

Example 3 - Correlation between two group hierarchies

Joseph Wolpe, in a book co-written with his son (Wolpe J., Wolpe D., 1981), gives an ordered list of what he calls the most common "useless" fears of modern man - fears that carry no signal value and only interfere with living and acting fully. In a domestic study conducted by M. E. Rakhova (1994), 32 subjects rated on a 10-point scale how relevant each type of fear from Wolpe's list was for them. The surveyed sample consisted of students of the Hydrometeorological and Pedagogical Institutes of St. Petersburg: 15 young men and 17 young women aged 17 to 28, average age 23.

The data obtained on the 10-point scale were averaged over the 32 subjects, and the averages were ranked. Table 6.5 presents the ranking indicators obtained by J. Wolpe and M. E. Rakhova. Do the rank orderings of the 20 types of fear coincide?

Let's formulate hypotheses.

H 0: The correlation between ordered lists of types of fear in the American and domestic samples does not differ from zero.

H 1: The correlation between ordered lists of types of fear in the American and domestic samples is statistically significantly different from zero.

All calculations related to computing and squaring the differences between the ranks of the types of fear in the two samples are presented in Table 6.5.

Table 6.5

Calculation of d² for the Spearman rank correlation coefficient when comparing ordered lists of types of fear in the American and domestic samples

(Table columns: Types of fear; Rank in the American sample; Rank in the domestic sample; d; d². The rank values are not reproduced here; the row labels follow.)

Fear of public speaking

Fear of flying

Fear of making a mistake

Fear of failure

Fear of disapproval

Fear of Rejection

Fear of evil people

Fear of loneliness

Fear of Blood

Fear of open wounds

Dentist fear

Fear of injections

Fear of taking tests

Fear of the police (militia)

Fear of heights

Fear of dogs

Fear of spiders

Fear of crippled people

Fear of hospitals

Fear of the dark

We determine the empirical value of r_s:

Using Table XVI of Appendix 1, we determine the critical values of r_s at N = 20:

Answer: H 0 is accepted. The correlation between ordered lists of types of fear in the American and domestic samples does not reach the level of statistical significance, that is, it does not differ significantly from zero.

Example 4 - correlation between individual and group average profiles

A sample of St. Petersburg residents aged 20 to 78 (31 men, 46 women), balanced by age so that people over 55 made up 50% of it, was asked the question: "What level of development of each of the following qualities is required for a deputy of the City Assembly of St. Petersburg?" (Sidorenko E.V., Dermanova I.B., Anisimova O.M., Vitenberg E.V., Shulga A.P., 1994). The assessment was made on a 10-point scale. In parallel, a sample of deputies and candidates for deputy of the City Assembly of St. Petersburg (n = 14) was examined. Individual diagnostics of the politicians and candidates were carried out with the Oxford express video diagnostics system using the same set of personal qualities that was presented to the sample of voters.

Table 6.6 shows the average values obtained for each quality in the sample of voters (the "reference series") and the individual values of one of the deputies of the City Assembly.

Let's try to determine how closely the individual profile of deputy K-v correlates with the reference profile.

Table 6.6

Averaged reference assessments of voters (n = 77) and individual indicators of deputy K-v on 18 personal qualities of express video diagnostics

(Table columns: Quality name; Average reference voter scores; Individual indicators of deputy K-v. The scores are not reproduced here; the row labels follow.)

1. General level of culture

2. Learning ability

4. The ability to create new things

5. Self-criticism

6. Responsibility

7. Independence

8. Energy, activity

9. Determination

10. Self-control, self-control

11. Persistence

12. Personal maturity

13. Decency

14. Humanism

15. Ability to communicate with people

16. Tolerance for other people's opinions

17. Flexibility of behavior

18. Ability to make a favorable impression

Table 6.7

Calculation of d² for the Spearman rank correlation coefficient between the reference and individual profiles of the deputy's personal qualities

(Table columns: Quality name; Row 1: quality rank in the reference profile; Row 2: quality rank in the individual profile; d; d². The rank values are not reproduced here; the row labels follow.)

1 Responsibility

2 Decency

3 Ability to communicate with people

4 Self-control, self-control

5 General level of culture

6 Energy, activity

8 Self-criticism

9 Independence

10 Personal maturity

11 Determination

12 Learning ability

13 Humanism

14 Tolerance for other people's opinions

15 Fortitude

16 Flexibility of behavior

17 Ability to make a favorable impression

18 Ability to create new things

As can be seen from Table 6.6, the voters' assessments and the deputy's individual indicators vary over different ranges. Indeed, the voters' assessments were obtained on a 10-point scale, while the individual indicators of express video diagnostics are measured on a 20-point scale. Ranking converts both measurement scales to a single scale, in which the unit of measurement is 1 rank and the maximum value is 18 ranks.

Ranking, as we remember, must be done separately for each row of values. In this case, it is advisable to assign a lower rank to a higher value, so that you can immediately see where this or that quality ranks in terms of importance (for voters) or in terms of severity (for a deputy).

The ranking results are presented in Table 6.7. The qualities are listed in the order of the reference profile.

Let's formulate hypotheses.

H 0: The correlation between the individual profile of a K-va deputy and the reference profile constructed according to voters’ assessments does not differ from zero.

H 1: The correlation between the individual profile of deputy K-v and the reference profile constructed from voters' assessments is statistically significantly different from zero.

Since both compared rank series contain groups of identical (tied) ranks, before calculating the rank correlation coefficient a correction for tied ranks, T_a and T_b, must be made:

T_a = Σ(a³ - a) / 12,  T_b = Σ(b³ - b) / 12,

where a is the size of each group of tied ranks in rank series A, and b is the size of each group of tied ranks in rank series B.

In this case, rank series A (the reference profile) contains one group of tied ranks: the qualities "learning ability" and "humanism" share the rank 12.5; hence a = 2.

T_a = (2³ - 2)/12 = 0.50.

In series B (the individual profile) there are two groups of tied ranks, with b1 = 2 and b2 = 2.

T_b = [(2³ - 2) + (2³ - 2)]/12 = 1.00.

To calculate the empirical value of r_s we use the formula:

r_s = 1 - 6 · (Σd² + T_a + T_b) / (N · (N² - 1)).

In this case:

Note that if we had not applied the correction for tied ranks, the value of r_s would have been only 0.0002 higher:

With large numbers of tied ranks, the change in r_s can be much more substantial. The presence of tied ranks means a lower degree of differentiation of the ordered variables and, therefore, less opportunity to assess the degree of connection between them (Sukhodolsky G.V., 1972, p. 76).
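The tie correction T can be sketched as a small Python helper. The rank series below merely reproduce the group sizes of this example (one group of two tied ranks in series A, two such groups in series B); the other rank values are illustrative.

```python
import numpy as np

def tie_correction(ranks):
    # T = sum over each group of tied ranks of (group_size**3 - group_size) / 12
    _, counts = np.unique(np.asarray(ranks), return_counts=True)
    return float(np.sum((counts.astype(float) ** 3 - counts) / 12.0))

# Series A: one group of two tied ranks (two qualities sharing rank 12.5)
Ta = tie_correction([1, 2, 3, 12.5, 12.5])
# Series B: two groups of two tied ranks each
Tb = tie_correction([1, 2, 3.5, 3.5, 7.5, 7.5, 9])
```

This reproduces the values computed above: T_a = (2³ - 2)/12 = 0.50 and T_b = 2 · (2³ - 2)/12 = 1.00.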

Using Table XVI of Appendix 1, we determine the critical values of r_s at N = 18:

Answer: H 0 is rejected. The correlation between the individual profile of deputy K-v and the reference profile reflecting the voters' requirements is statistically significant (p < 0.05) and positive.

From Table 6.7 it is clear that deputy K-v has a lower rank on the "ability to communicate with people" scale and higher ranks on the "determination" and "persistence" scales than the electoral standard prescribes. These discrepancies mainly explain the slight decrease in the obtained r_s.

Let us formulate a general algorithm for calculating r s.
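One possible sketch of such an algorithm in Python (NumPy/SciPy assumed): rank both series, compute r_s with the no-ties formula, and test significance with Student's t at df = n - 2. The function name and the paired measurements are hypothetical.

```python
import numpy as np
from scipy import stats

def spearman_with_test(x, y, alpha=0.05):
    """Rank both series, compute r_s (no-ties formula), test with Student's t."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    d = stats.rankdata(x) - stats.rankdata(y)
    rs = 1 - 6 * np.sum(d ** 2) / (n * (n ** 2 - 1))
    t_emp = rs * np.sqrt((n - 2) / (1 - rs ** 2))
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
    return rs, abs(t_emp) > t_crit

# Hypothetical paired measurements, no repeated values in either series
rs, significant = spearman_with_test(
    [3, 1, 4, 1.5, 9, 2.6, 5.3],
    [2, 0.5, 5, 1, 10, 3, 6],
)
```

For series with tied ranks, the correction from Example 4 (or `scipy.stats.spearmanr`, which computes Pearson's r on the ranks) should be used instead of the simple formula.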

Pearson correlation coefficient

Pearson's coefficient r is used to study the relationship between two metric variables measured on the same sample. There are many situations in which its use is appropriate. Does intelligence affect academic performance in the senior years of university? Is the size of an employee's salary related to his friendliness towards colleagues? Does a student's mood affect the success of solving a complex arithmetic problem? To answer such questions, the researcher must measure the two indicators of interest for each member of the sample.

The value of the correlation coefficient is not affected by the units in which the features are measured. Consequently, any linear transformation of the features (multiplication by a constant, addition of a constant) does not change the value of the correlation coefficient. An exception is multiplication of one of the features by a negative constant: the correlation coefficient then changes sign.

Application of Spearman and Pearson correlation.

Pearson correlation is a measure of the linear relationship between two variables. It allows you to determine how proportional the variability of two variables is. If the variables are proportional to each other, then the relationship between them can be graphically represented as a straight line with a positive (direct proportion) or negative (inverse proportion) slope.

In practice, the relationship between two variables, if there is one, is probabilistic and graphically looks like an ellipsoidal dispersion cloud. This ellipsoid, however, can be represented (approximated) as a straight line, or regression line. A regression line is a straight line constructed using the least squares method: the sum of the squared distances (calculated along the Y axis) from each point on the scatter plot to the straight line is the minimum.
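A least-squares regression line can be sketched with NumPy (the scatter below is hypothetical): `np.polyfit` with degree 1 finds the slope and intercept minimizing the sum of squared vertical distances.

```python
import numpy as np

# Hypothetical scatter: y roughly proportional to x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

# Least-squares line y = a*x + b: minimizes the sum of squared vertical
# (Y-axis) distances from the points to the line
a, b = np.polyfit(x, y, deg=1)
residual_ss = np.sum((y - (a * x + b)) ** 2)  # the minimized quantity
```

Any other line through the cloud would give a larger `residual_ss` than the fitted one.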

Of particular importance for assessing the accuracy of prediction is the variance of estimates of the dependent variable. Essentially, the variance of estimates of a dependent variable Y is that portion of its total variance that is due to the influence of the independent variable X. In other words, the ratio of the variance of estimates of the dependent variable to its true variance is equal to the square of the correlation coefficient.

The square of the correlation coefficient between the dependent and independent variables represents the proportion of variance in the dependent variable that is due to the influence of the independent variable and is called the coefficient of determination. The coefficient of determination thus shows the extent to which the variability of one variable is caused (determined) by the influence of another variable.

The determination coefficient has an important advantage over the correlation coefficient. Correlation is not a linear function of the relationship between two variables. Therefore, the arithmetic mean of the correlation coefficients for several samples does not coincide with the correlation calculated immediately for all subjects from these samples (i.e., the correlation coefficient is not additive). On the contrary, the coefficient of determination reflects the relationship linearly and is therefore additive: it can be averaged over several samples.

Additional information about the strength of the connection is provided by the value of the correlation coefficient squared - the coefficient of determination: this is the part of the variance of one variable that can be explained by the influence of another variable. Unlike the correlation coefficient, the coefficient of determination increases linearly with increasing connection strength.
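The identity between the squared correlation and the share of explained variance can be checked numerically (hypothetical data):

```python
import numpy as np

# Hypothetical data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

r = np.corrcoef(x, y)[0, 1]
r_squared = r ** 2  # coefficient of determination

# The same quantity as the share of y's variance explained by the regression
a, b = np.polyfit(x, y, deg=1)
explained_share = np.var(a * x + b) / np.var(y)
```

The ratio of the variance of the regression estimates to the total variance of y equals r², as stated above.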

Spearman and Kendall τ correlation coefficients (rank correlations)

If both variables between which the relationship is studied are presented on an ordinal scale, or one of them on an ordinal scale and the other on a metric scale, rank correlation coefficients are used: Spearman's or Kendall's τ. Both coefficients require preliminary ranking of both variables.


If the members of a group of size N were ranked first on variable x and then on variable y, the correlation between x and y can be obtained simply by calculating the Pearson coefficient for the two series of ranks. Provided there are no tied ranks (i.e., no repeated ranks) on either variable, the Pearson formula can be greatly simplified computationally and converted into what is known as the Spearman formula.
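This equivalence is easy to verify (hypothetical tie-free data): Pearson's r computed on the ranks reproduces Spearman's r_s exactly.

```python
import numpy as np
from scipy.stats import rankdata, spearmanr

# Hypothetical data without repeated values
x = np.array([12, 7, 30, 15, 22, 9])
y = np.array([3.1, 1.2, 6.0, 2.8, 5.5, 1.9])

# Spearman's r_s is Pearson's r computed on the two series of ranks
r_on_ranks = np.corrcoef(rankdata(x), rankdata(y))[0, 1]
rs, _ = spearmanr(x, y)
```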

The power of the Spearman rank correlation coefficient is somewhat inferior to the power of the parametric correlation coefficient.

It is advisable to use the rank correlation coefficient when there are a small number of observations. This method can be used not only for quantitative data, but also in cases where the recorded values ​​are determined by descriptive features of varying intensity.

With a large number of tied ranks on one or both compared variables, Spearman's rank correlation coefficient gives coarsened values. Ideally, both correlated series should be sequences of non-repeating values.

An alternative to the Spearman rank correlation is Kendall's τ correlation. The correlation proposed by M. Kendall is based on the idea that the direction of the connection can be judged by comparing subjects in pairs: if for a pair of subjects the change in x coincides in direction with the change in y, this indicates a positive connection; if they do not coincide, a negative one.
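The pairwise idea can be sketched directly (hypothetical tie-free data): count concordant and discordant pairs and take their normalized difference, which matches `scipy.stats.kendalltau` when there are no ties.

```python
from itertools import combinations
from scipy.stats import kendalltau

# Hypothetical ranked data without ties
x = [1, 3, 2, 5, 4]
y = [1, 2, 3, 4, 5]

# Compare every pair of subjects: a pair is concordant when x and y change
# in the same direction, discordant when they change in opposite directions
concordant = discordant = 0
for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
    s = (xi - xj) * (yi - yj)
    if s > 0:
        concordant += 1
    elif s < 0:
        discordant += 1

tau_manual = (concordant - discordant) / (concordant + discordant)
tau_scipy, _ = kendalltau(x, y)  # same value when there are no ties
```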

Correlation coefficients were specifically designed to quantify the strength and direction of the relationship between two properties measured on numerical scales (metric or rank). As already mentioned, maximum strength of connection corresponds to correlation values of +1 (a strict direct, or directly proportional, connection) and -1 (a strict inverse, or inversely proportional, connection); absence of connection corresponds to a correlation of zero. Additional information about the strength of the relationship is provided by the coefficient of determination: the portion of the variance of one variable that can be explained by the influence of the other variable.

9. Parametric methods for data comparison

Parametric comparison methods are used if your variables were measured on a metric scale.

Comparison of the variances of two samples using Fisher's F-test.


This method tests the hypothesis that the variances of the two general populations from which the compared samples are drawn differ from each other. Limitation of the method: the distribution of the characteristic in both samples must not differ from normal.
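A minimal sketch of the variance-ratio (F) statistic in Python; the samples are made up for illustration, and in practice the resulting F is compared against a critical value from an F table for the chosen significance level:

```python
# Sketch: Fisher's F statistic for comparing two sample variances.
# Sample data are made up for illustration.
from statistics import variance  # unbiased sample variance (n - 1 denominator)

sample_a = [12.1, 11.8, 12.5, 13.0, 12.2, 11.9]
sample_b = [12.0, 12.2, 12.1, 12.3, 11.9, 12.1]

var_a, var_b = variance(sample_a), variance(sample_b)
# By convention the larger variance goes in the numerator, so F >= 1.
F = max(var_a, var_b) / min(var_a, var_b)
df = (len(sample_a) - 1, len(sample_b) - 1)  # degrees of freedom
print(round(F, 2), df)
```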

An alternative for comparing variances is the Levene test, which does not require testing for normality of the distribution. It can be used to check the assumption of equality (homogeneity) of variances before testing the significance of differences in means with Student's t-test for independent samples of different sizes.
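A from-scratch sketch of the Levene statistic (a one-way ANOVA F statistic applied to absolute deviations from each group's mean); the two groups below are made up for illustration, and in practice a library routine would normally be used instead:

```python
# Sketch: Levene's statistic W for k groups, computed from scratch.
# W is the one-way ANOVA F statistic applied to the absolute deviations
# z_ij = |x_ij - mean_i|. Sample data are made up for illustration.
def levene_w(groups):
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviations of each observation from its group mean
    z = [[abs(x - sum(g) / len(g)) for x in g] for g in groups]
    z_bars = [sum(zi) / len(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zb - z_grand) ** 2 for zi, zb in zip(z, z_bars))
    within = sum(sum((v - zb) ** 2 for v in zi) for zi, zb in zip(z, z_bars))
    return ((n_total - k) / (k - 1)) * between / within

w = levene_w([[8.9, 9.1, 10.0, 9.5], [7.0, 11.0, 6.5, 12.0]])
print(round(w, 2))
```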

Spearman rank correlation (rank correlation). Spearman's rank correlation is the simplest way to determine the degree of relationship between factors. The name of the method indicates that the relationship is determined between ranks, that is, between series of quantitative values ranked in descending or ascending order. Keep in mind that, firstly, rank correlation is not recommended when there are fewer than four or more than twenty pairs of observations; secondly, rank correlation makes it possible to determine the relationship in another case as well, when the values are semi-quantitative, that is, they have no numerical expression but reflect a clear order of occurrence; thirdly, rank correlation is advisable when approximate estimates are sufficient. An example of calculating the rank correlation coefficient: two questionnaires (X and Y), requiring the alternative answers "yes" or "no", measure similar personal qualities of the subjects. The primary results were obtained as the answers of 15 subjects (N = 10) and presented as the sum of affirmative answers, separately for questionnaire X and for questionnaire Y. These results are summarized in Table 5.19.

Table 5.19. Tabulation of primary results for calculating the Spearman rank correlation coefficient (ρ)

Analysis of the summary correlation matrix. Method of correlation galaxies.

Example. Table 6.18 shows the intercorrelations of eleven variables tested using the Wechsler method. The data were obtained from a homogeneous sample aged 18 to 25 years (n = 800).

Before stratification, it is advisable to rank the correlation matrix. To do this, in the original matrix, the average values of the correlation coefficients of each variable with all the others are calculated.
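This ranking step can be sketched as follows; a made-up 4×4 matrix is used here instead of the 11×11 matrix from the example:

```python
# Sketch: rank the variables of a correlation matrix by the mean absolute
# value of their correlations with all other variables.
# The 4x4 matrix below is made up for illustration.
corr = [
    [1.00, 0.64, 0.49, 0.62],
    [0.64, 1.00, 0.81, 0.56],
    [0.49, 0.81, 1.00, 0.35],
    [0.62, 0.56, 0.35, 1.00],
]
n = len(corr)
# Mean correlation of each variable with the others (diagonal excluded)
means = [sum(abs(corr[i][j]) for j in range(n) if j != i) / (n - 1)
         for i in range(n)]
# Rank 1 goes to the variable with the largest mean correlation
order = sorted(range(n), key=lambda i: -means[i])
ranks = [order.index(i) + 1 for i in range(n)]
print([round(m, 3) for m in means], ranks)
```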

Then, according to Table 5.20, the acceptable levels of stratification of the correlation matrix are determined for a given confidence probability of 0.95 and the given number of observations n.

Table 6.20. Ranked correlation matrix

Variable   1      2      3      4      5      6      7      8      9      10     11     M(rij)  Rank
1          1      0,637  0,488  0,623  0,282  0,647  0,371  0,485  0,371  0,365  0,336  0,454    1
2                 1      0,810  0,557  0,291  0,508  0,173  0,486  0,371  0,273  0,273  0,363    4
3                        1      0,346  0,291  0,406  0,360  0,818  0,346  0,291  0,282  0,336    7
4                               1      0,273  0,572  0,318  0,442  0,310  0,318  0,291  0,414    3
5                                      1      0,354  0,254  0,216  0,236  0,207  0,149  0,264   11
6                                             1      0,365  0,405  0,336  0,345  0,282  0,430    2
7                                                    1      0,310  0,388  0,264  0,266  0,310    9
8                                                           1      0,897  0,363  0,388  0,363    5
9                                                                  1      0,388  0,430  0,846    6
10                                                                        1      0,336  0,310    8
11                                                                               1      0,300   10

Designations: 1 - general awareness; 2 - conceptuality; 3 - attentiveness; 4 - capacity for generalization; 5 - direct memorization (of numbers); 6 - level of mastery of the native language; 7 - speed of mastering sensorimotor skills (symbol coding); 8 - observation; 9 - combinatorial abilities (for analysis and synthesis); 10 - ability to organize parts into a meaningful whole; 11 - ability for heuristic synthesis; M(rij) - the average value of the correlation coefficients of the variable with the other variables (in our case n = 800); r(0) - the value of the zero "dissecting" plane, i.e. the minimum significant absolute value of the correlation coefficient (n = 120, r(0) = 0.236; n = 40, r(0) = 0.407); |Δr| - permissible stratification step (n = 40, |Δr| = 0.558); s - permissible number of stratification levels (n = 40, s = 1; n = 120, s = 2); r(1), r(2), ..., r(9) - absolute values of the cutting planes (n = 40, r(1) = 0.965).

For n = 800, we find the value of r(0) and the boundaries r(i), after which we stratify the correlation matrix, highlighting correlation galaxies within the layers, or separate parts of the correlation matrix, drawing associations of correlation galaxies for the overlying layers (Fig. 5.5).

A meaningful analysis of the resulting galaxies goes beyond the limits of mathematical statistics. Note that there are two formal indicators that help in the meaningful interpretation of the galaxies. One significant indicator is the degree of a vertex, that is, the number of edges adjacent to the vertex. The variable with the largest number of edges is the "core" of the galaxy and can be considered an indicator of the remaining variables of that galaxy. The other significant indicator is connection density. A variable may have fewer but closer connections in one galaxy, and more but less close connections in another.

Predictions and estimates. The equation y = b1x + b0 is called the general equation of a straight line. It indicates that pairs of points (x, y) that

Fig. 5.5. Correlation galaxies obtained by stratifying the matrix

lie on a certain straight line are related in such a way that for any value of x, its paired value of y can be found by multiplying x by a certain number b1 and adding a second number, b0, to this product.

The regression coefficient allows you to determine the degree of change in the effect factor when the causal factor changes by one unit. Absolute values characterize the relationship between the variable factors in their absolute terms. The regression coefficient is calculated using the formula:
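The formula itself did not survive the page extraction; the standard least-squares expressions for the slope b1 and intercept b0 can be sketched as follows (the data points are made up for illustration):

```python
# Sketch: least-squares estimates of b1 (regression coefficient) and b0.
# b1 = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
# Data points are made up for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 4.0, 6.2, 7.9, 10.1]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
b1 = (sum((a - mx) * (b - my) for a, b in zip(x, y))
      / sum((a - mx) ** 2 for a in x))
b0 = my - b1 * mx  # the fitted line passes through (mean_x, mean_y)
print(round(b1, 2), round(b0, 2))
```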

Design and analysis of experiments. Design and analysis of experiments is the third important branch of statistical methods developed to find and test causal relationships between variables.

To study multifactorial dependencies, methods of mathematical experimental design have recently been increasingly used.

The ability to simultaneously vary all factors allows you to: a) reduce the number of experiments;

b) reduce experimental error to a minimum;

c) simplify the processing of received data;

d) ensure clarity and ease of comparison of results.

Each factor can acquire a certain corresponding number of different values, which are called levels and denoted -1, 0 and 1. A fixed set of factor levels determines the conditions of one of the possible experiments.

The number of all possible combinations of factor levels is calculated using the formula:

A complete factorial experiment is an experiment in which all possible combinations of factor levels are implemented. Full factorial experiments can have the property of orthogonality. With orthogonal planning, the factors in the experiment are uncorrelated; the regression coefficients that are ultimately calculated are determined independently of each other.
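A sketch of generating such a full factorial plan in coded levels, assuming the usual count of runs N = levels^factors (here 2^3 = 8), with a check that the factor columns are pairwise orthogonal:

```python
# Sketch: full factorial 2^3 design with coded levels -1 and +1,
# plus a check that the factor columns are orthogonal (dot product 0).
from itertools import product

levels = (-1, 1)
plan = list(product(levels, repeat=3))  # 2^3 = 8 runs
for run in plan:
    print(run)

cols = list(zip(*plan))  # columns x1, x2, x3
dots = [sum(a * b for a, b in zip(cols[i], cols[j]))
        for i in range(3) for j in range(i + 1, 3)]
print(len(plan), dots)  # 8 runs; all pairwise dot products are 0
```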

An important advantage of the method of mathematical experimental planning is its versatility and suitability in many areas of research.

Let's consider an example of comparing the influence of some factors on the formation of the level of mental stress in color TV controllers.

The experiment is based on an orthogonal 2³ design (three factors, each varied at two levels).

The experiment was carried out on the full 2³ plan with three repetitions.

Orthogonal planning is based on the construction of a regression equation. For three factors it looks like this:
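The equation itself is missing from the text; a standard reconstruction of the 2³ regression model with interaction terms (not necessarily the authors' exact notation) is:

```latex
y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3
    + b_{12} x_1 x_2 + b_{13} x_1 x_3 + b_{23} x_2 x_3
    + b_{123} x_1 x_2 x_3
```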

Processing of the results in this example includes:

a) construction of the orthogonal 2³ plan and a table for calculation;

b) calculation of regression coefficients;

c) checking their significance;

d) interpretation of the obtained data.

For the regression coefficients of the mentioned equation, N = 2³ = 8 runs were required so that the significance of the coefficients could be assessed; the number of repetitions K was 3.

The matrix for planning the experiment looked like this:

The calculator below calculates the Spearman rank correlation coefficient between two random variables. The theoretical part, in order not to be distracted from the calculator, is traditionally placed under it.


The method for calculating the Spearman rank correlation coefficient is actually described very simply. This is the same Pearson correlation coefficient, only calculated not for the results of measurements of random variables themselves, but for their rank values.

That is,
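The formula that should follow here (lost in the page extraction) is simply Pearson's coefficient applied to the rank values:

```latex
r_s = r_{\text{Pearson}}\bigl(\operatorname{rank}(X),\ \operatorname{rank}(Y)\bigr)
```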

All that remains is to figure out what rank values are and why all this is needed.

If the elements of a variation series are arranged in ascending or descending order, then the rank of an element is its number in this ordered series.

For example, suppose we have the variation series (17, 26, 5, 14, 21). Sort its elements in descending order: (26, 21, 17, 14, 5). Then 26 has rank 1, 21 has rank 2, and so on. The variation series of rank values will look like this: (3, 1, 5, 4, 2).
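This ranking step can be sketched in Python, reproducing the example from the text (a tie-free version; ties are handled below):

```python
# Sketch: rank values of a series, rank 1 = largest element (descending order).
# Assumes no repeated values; ties are treated separately below.
def ranks_desc(series):
    ordered = sorted(series, reverse=True)
    return [ordered.index(v) + 1 for v in series]

print(ranks_desc([17, 26, 5, 14, 21]))  # [3, 1, 5, 4, 2], as in the text
```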

That is, when calculating the Spearman coefficient, the original variation series are transformed into series of rank values, after which the Pearson formula is applied to them.

There is one subtlety: the rank of repeated values is taken as the average of their ranks. That is, for the series (17, 15, 14, 15) the series of rank values will look like (1, 2.5, 4, 2.5), since the first element equal to 15 has rank 2 and the second has rank 3, and (2 + 3)/2 = 2.5.
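The average-rank rule for ties can be sketched as follows:

```python
# Sketch: ranking in descending order with tied values receiving
# the average of the ranks they occupy.
def ranks_desc_ties(series):
    ordered = sorted(series, reverse=True)
    result = []
    for v in series:
        first = ordered.index(v) + 1            # first rank occupied by v
        count = ordered.count(v)                # number of tied copies
        result.append(first + (count - 1) / 2)  # average of occupied ranks
    return result

print(ranks_desc_ties([17, 15, 14, 15]))  # [1.0, 2.5, 4.0, 2.5], as in the text
```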

If there are no repeated values, that is, all rank values are numbers from 1 to n, the Pearson formula can be simplified to
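The simplified formula, lost in the page extraction, is the well-known expression in terms of the rank differences d_i:

```latex
r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)},
\qquad d_i = \operatorname{rank}(x_i) - \operatorname{rank}(y_i)
```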

By the way, it is this simplified formula that is most often given as the formula for calculating the Spearman coefficient.

What is the essence of the transition from the values themselves to their rank values?
The point is that by studying the correlation of rank values, you can determine how well the dependence of two variables is described by a monotonic function.

The sign of the coefficient indicates the direction of the relationship between the variables. If the sign is positive, Y values tend to increase as X values increase; if the sign is negative, Y values tend to decrease as X values increase. If the coefficient is 0, there is no trend. If the coefficient is 1 or -1, the relationship between X and Y has the form of a monotonic function: as X increases, Y also increases, or, conversely, as X increases, Y decreases.

That is, unlike the Pearson correlation coefficient, which can only reveal a linear dependence of one variable on another, the Spearman correlation coefficient can reveal a monotonic dependence where a direct linear relationship is not detected.

Let me explain with an example. Let's assume that we are examining the function y=10/x.
We have the following X and Y measurements
{{1,10}, {5,2}, {10,1}, {20,0.5}, {100,0.1}}
For these data, the Pearson correlation coefficient is -0.4686, that is, the relationship is weak or absent. But the Spearman correlation coefficient is strictly equal to -1, which seems to hint to the researcher that Y has a strict negative monotonic dependence on X.
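These numbers can be checked with plain-Python implementations of both coefficients (no libraries assumed; the rank helper relies on this data set having no ties):

```python
# Sketch: verify the y = 10/x example - Pearson is weak, Spearman is exactly -1.
from math import sqrt

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x)
                      * sum((b - my) ** 2 for b in y))

def spearman(x, y):
    # No ties in this data, so a rank is just the position in sorted order
    rank = lambda s: [sorted(s).index(v) + 1 for v in s]
    return pearson(rank(x), rank(y))

xs = [1, 5, 10, 20, 100]
ys = [10, 2, 1, 0.5, 0.1]
print(round(pearson(xs, ys), 4), spearman(xs, ys))  # -0.4686 -1.0
```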
