Level of statistical significance

The significance level is the probability that we have considered the differences significant when they are in fact random.

When we state that the differences are significant at the 5% significance level, or that p < 0.05, we mean that the probability that they are in fact unreliable is 0.05.

When we state that the differences are significant at the 1% significance level, or that p < 0.01, we mean that the probability that they are in fact unreliable is 0.01.

Translated into more formal language, the significance level is the probability of rejecting the null hypothesis when it is in fact true.

The error of rejecting the null hypothesis when it is actually true is called a Type I error (see Table 1).

Table 1. Null and alternative hypotheses and the possible outcomes of testing them.

The probability of such an error is usually denoted α. Strictly speaking, we should indicate in parentheses not p < 0.05 or p < 0.01, but α < 0.05 or α < 0.01.

If the probability of error is α, then the probability of a correct decision is 1 − α. The smaller α, the greater the probability of a correct decision.

Historically, it has been accepted in psychology that the lowest level of statistical significance is the 5% level (p ≤ 0.05), a sufficient level is the 1% level (p ≤ 0.01), and the highest is the 0.1% level (p ≤ 0.001). Therefore, tables of critical values usually give the criterion values corresponding to the significance levels p ≤ 0.05 and p ≤ 0.01, and sometimes p ≤ 0.001. For some criteria, the tables indicate the exact significance level for different empirical values: for example, for φ* = 1.56, p = 0.06.

However, until the level of statistical significance reaches p = 0.05, we still have no right to reject the null hypothesis.

We will adhere to the following rule for rejecting the hypothesis of no differences (H0) and accepting the hypothesis that the differences are statistically significant (H1).

Rule for rejecting H0 and accepting H1

If the empirical value of the criterion is equal to the critical value corresponding to p ≤ 0.01 or exceeds it, then H0 is rejected and H1 is accepted.

Exceptions: the sign test G, Wilcoxon's T test, and the Mann-Whitney U test. For these criteria, the relationships are inverted.

Fig. 4. Example of a “significance axis” for Rosenbaum's Q criterion.

The critical values of the criterion are designated Q0.05 and Q0.01, and the empirical value of the criterion Qemp; the latter is enclosed in an ellipse.

To the right of the critical value Q0.01 extends the “zone of significance”: it contains empirical values that exceed Q0.01 and are therefore certainly significant.

To the left of the critical value Q0.05 extends the “zone of insignificance”: it contains empirical Q values that are below Q0.05 and are therefore certainly insignificant.

In this example, Q0.05 = 6; Q0.01 = 9; Qemp = 8.

The empirical value of the criterion falls in the region between Q0.05 and Q0.01. This is the “zone of uncertainty”: we can already reject the hypothesis that the differences are unreliable (H0), but we cannot yet accept the hypothesis that they are reliable (H1).

In practice, however, the researcher may consider reliable any differences that do not fall into the zone of insignificance, stating that they are reliable at p < 0.05, or by indicating the exact significance level of the obtained empirical criterion value, for example p = 0.02. Using the standard tables found in all textbooks on mathematical methods, this can be done for the Kruskal-Wallis H test, Friedman's χ²r test, Page's L test, and Fisher's φ* test.
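
This decision rule can be written as a short script; the function below is a hypothetical helper for illustration, with the critical values taken from the example above (it applies to criteria where larger values are more significant, i.e. not to the sign test G, Wilcoxon's T, or the Mann-Whitney U):

```python
# Sketch: classifying an empirical criterion value on the "significance axis".
# The critical values are the ones from the Rosenbaum Q example above.
def significance_zone(q_emp, q_05, q_01):
    """Return the zone into which the empirical value falls."""
    if q_emp >= q_01:
        return "zone of significance: reject H0, accept H1 (p <= 0.01)"
    elif q_emp >= q_05:
        return "zone of uncertainty: H0 rejected at p <= 0.05, H1 not yet firmly accepted"
    else:
        return "zone of insignificance: no grounds to reject H0"

print(significance_zone(q_emp=8, q_05=6, q_01=9))
# -> zone of uncertainty: H0 rejected at p <= 0.05, H1 not yet firmly accepted
```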

The level of statistical significance, or critical test values, is determined differently when testing directional and non-directional statistical hypotheses.

For a directional statistical hypothesis a one-tailed test is used; for a non-directional hypothesis, a two-tailed test. The two-tailed test is more stringent because it tests differences in both directions, so an empirical value of the test that corresponded to the significance level p < 0.05 with a one-tailed test corresponds only to p < 0.10 with a two-tailed one.

We do not, however, have to decide for ourselves each time whether a one-tailed or a two-tailed test is being used. The tables of critical values are constructed so that directional hypotheses correspond to a one-tailed test and non-directional hypotheses to a two-tailed test, and the values given satisfy the requirements of each. The researcher only needs to make sure that his hypotheses coincide in meaning and form with the hypotheses proposed in the description of each criterion.

The p-value is a quantity used in testing statistical hypotheses. In essence, it is the probability of error when rejecting the null hypothesis (a Type I error). Testing hypotheses using the p-value is an alternative to the classical procedure of testing against the critical value of the distribution.

Typically, the p-value equals the probability that a random variable with the given distribution (the distribution of the test statistic under the null hypothesis) takes a value no less than the actual value of the test statistic (Wikipedia).

In other words, the p-value is the smallest significance level (i.e., the probability of rejecting a true hypothesis) at which the computed test statistic leads to rejection of the null hypothesis. Typically, the p-value is compared with the generally accepted standard significance levels of 0.05 or 0.01.

For example, if the test statistic computed from the sample corresponds to p = 0.005, this means that, if the null hypothesis were true, a result at least this extreme would be obtained with a probability of only 0.5%. Thus, the lower the p-value the better, since it strengthens the grounds for rejecting the null hypothesis and increases the expected significance of the result.
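
As a rough numerical sketch of this definition, assuming a test statistic that is standard normal under the null hypothesis, the p-value is simply a tail probability of that distribution:

```python
from scipy import stats

# Assumed example: the test statistic is standard normal under H0
# and the observed value is z_obs = 2.58.
z_obs = 2.58

# One-sided p-value: probability of a value at least as large as z_obs under H0.
p_one_sided = stats.norm.sf(z_obs)           # survival function = 1 - cdf
# Two-sided p-value: probability of a value at least as extreme in either direction.
p_two_sided = 2 * stats.norm.sf(abs(z_obs))

print(round(p_one_sided, 4), round(p_two_sided, 4))   # ~0.0049 and ~0.0099
```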

There is an interesting explanation of this on Habr.

Statistical analysis begins to resemble a black box: the input is data, the output is a table of main results and a p-value.

What does the p-value tell us?

Suppose we want to find out whether there is a relationship between an addiction to violent computer games and aggressiveness in real life. Two groups of schoolchildren of 100 people each were formed at random (group 1: fans of shooter games; group 2: those who do not play computer games). The indicator of aggressiveness is, for example, the number of fights with peers. In our imaginary study it turned out that the group of gamer schoolchildren did indeed conflict with their peers noticeably more often. But how do we find out how statistically significant these differences are? Perhaps we obtained the observed difference purely by chance? To answer these questions, the p-level of significance (p-value) is used: this is the probability of obtaining such or more pronounced differences, provided that there are actually no differences in the general population. In other words, it is the probability of obtaining the same or even stronger differences between our groups, provided that computer games in fact have no effect on aggressiveness. That doesn't sound so difficult. However, this particular statistic is very often misinterpreted.

Examples about p-value

So, we compared the two groups of schoolchildren in terms of their level of aggressiveness using a standard t-test (or the non-parametric chi-square test, which is more appropriate in this situation) and found that the coveted p-level of significance is less than 0.05 (for example, 0.04). But what does this p-value actually tell us? If the p-value is the probability of obtaining such or more pronounced differences when there are actually no differences in the population, which of the following statements do you think is correct?

1. Computer games are the cause of aggressive behavior with a probability of 96%.
2. The probability that aggression and computer games are not related is 0.04.
3. If we received a p-level of significance greater than 0.05, this would mean that aggressiveness and computer games are in no way related to each other.
4. The probability of obtaining such differences by chance is 0.04.
5. All statements are incorrect.

If you chose the fifth option, then you are absolutely right! But, as numerous studies show, even people with significant experience in data analysis often incorrectly interpret the p-value.

Let's look at all the answers in order:

The first statement is an example of the correlation fallacy: the fact that two variables are significantly correlated tells us nothing about cause and effect. Perhaps it is more aggressive people who prefer to spend time playing computer games, and it is not computer games that make people more aggressive.

The second statement is more interesting. The thing is that we initially take it as given that there really are no differences, and compute the p-value with this assumption in mind. Therefore, the correct interpretation is: “If we assume that aggression and computer games are in no way related, then the probability of obtaining such or even more pronounced differences is 0.04.”

The third statement: what if we obtain non-significant differences? Does this mean there is no relationship between the variables under study? No; it only means that differences may exist, but our results did not allow us to detect them.

The fourth statement relates directly to the definition of the p-value itself: 0.04 is the probability of obtaining these or even more extreme differences. Estimating the probability of obtaining exactly the same differences as in our experiment is impossible in principle!

These are the pitfalls that can hide in the interpretation of an indicator such as the p-value. It is therefore very important to understand the mechanisms underlying the methods of analysis and the calculation of basic statistical indicators.
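
To make this concrete, here is a minimal simulation in the spirit of the study described above; the group sizes match the text, but the fight counts and their distribution are invented purely for illustration, and an independent-samples t-test is used:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical data: number of fights per schoolchild in each group of 100.
# The rates below are invented for illustration only.
gamers     = rng.poisson(lam=2.5, size=100)   # group 1: shooter fans
non_gamers = rng.poisson(lam=2.0, size=100)   # group 2: do not play

t_stat, p_value = stats.ttest_ind(gamers, non_gamers)

print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
# p is the probability of obtaining a difference at least this large
# if, in the population, the two groups do not actually differ.
```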

How to find p-value?

1. Determine the expected results of your experiment

Typically, when scientists conduct an experiment, they already have an idea of what results are considered “normal” or “typical.” This may be based on the results of past experiments, on reliable data sets, on data from the scientific literature, or on other sources. For your experiment, determine the expected results and express them as numbers.

Example: Suppose earlier studies have shown that in your country red cars receive speeding tickets more often than blue cars, with national figures showing a 2:1 ratio of red to blue. We want to determine whether the police in your city show the same bias towards car color. To do this, we will analyze the speeding tickets issued there. If we take a random set of 150 speeding tickets issued to either red or blue cars, we would expect 100 to have been issued to red cars and 50 to blue ones, if the police in our city are as biased with respect to car color as is observed across the country.

2. Determine the observed results of your experiment

Now that you have determined the expected results, you need to conduct an experiment and find the actual (or "observed") values. Again, you need to represent these results as numbers. If we create experimental conditions, and the observed results differ from the expected ones, then we have two possibilities - either it happened by chance, or it was caused by our experiment. The purpose of finding a p-value is to determine whether the observed results differ from the expected results so much that the “null hypothesis”—the hypothesis that there is no relationship between the experimental variables and the observed results—can be rejected.

Example: In our city we randomly selected 150 speeding tickets issued to either red or blue cars. We found that 90 tickets were issued to red cars and 60 to blue ones. This differs from the expected results of 100 and 50, respectively. Did our experiment (in this case, changing the data source from the national to the city level) really lead to this change in results, or are our city police biased in exactly the same way as the national average and we are simply seeing random variation? The p-value will help us determine this.

3. Determine the number of degrees of freedom of your experiment

The number of degrees of freedom is the degree of variability in your experiment, which is determined by the number of categories you examine. The equation for the number of degrees of freedom is Number of degrees of freedom = n-1, where “n” is the number of categories or variables that you are analyzing in your experiment.

Example: In our experiment there are two categories of results: one category for red cars, and one for blue cars. Therefore, in our experiment we have 2-1 = 1 degree of freedom. If we were comparing red, blue and green cars, we would have 2 degrees of freedom, and so on.

4. Compare expected and observed results using the chi-square test

Chi-square (written "x2") is a numerical value that measures the difference between the expected and observed values of an experiment. The equation for chi-square is x2 = Σ((o − e)2/e), where "o" is the observed value and "e" is the expected value. Sum the results of this equation over all possible outcomes (see below).

Note that this equation includes the summation operator Σ (sigma). In other words, you need to calculate (o − e)2/e for each possible outcome and add the resulting numbers to get the chi-square value. In our example there are two possible outcomes: the car that received the ticket is either red or blue. Therefore, we must calculate (o − e)2/e twice: once for the red cars and once for the blue cars.

Example: Let's plug our expected and observed values ​​into the equation x2 = Σ((o-e)2/e). Remember that because of the sum operator, we need to calculate ((o-e)2/e) twice - once for the red cars, and once for the blue cars. We will do this job as follows:
x2 = ((90 − 100)2/100) + ((60 − 50)2/50)
x2 = ((−10)2/100) + ((10)2/50)
x2 = (100/100) + (100/50) = 1 + 2 = 3
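
The same arithmetic can be written as a short script; the variable names are illustrative only:

```python
# Observed and expected ticket counts from the example: [red, blue].
observed = [90, 60]
expected = [100, 50]

# Chi-square statistic: sum of (o - e)^2 / e over all categories.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

degrees_of_freedom = len(observed) - 1   # 2 categories -> 1 degree of freedom

print(chi_square, degrees_of_freedom)    # 3.0 and 1
```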

5. Select the significance level

Now that we know the number of degrees of freedom of our experiment and the value of the chi-square statistic, we need to do one more thing before finding our p-value: we need to choose the significance level. Put simply, the significance level indicates how confident we want to be in our results. A low significance value corresponds to a low probability that the experimental results arose by chance, and vice versa. Significance levels are written as decimals (such as 0.01) corresponding to the probability that the experimental results were obtained by chance (in this case, 1%).

By convention, scientists usually set the significance level of their experiments at 0.05, or 5%. This means that experimental results meeting this criterion would occur purely by chance only 5% of the time if there were no real effect. For most experiments, this degree of confidence in the presence of a relationship between two variables is enough to consider them “really” related to each other.

Example: For our example of red and blue cars, let's follow the consensus among scientists and set the significance level to 0.05.

6. Use the chi-square distribution data table to find your p-value.

Scientists and statisticians use large tables to determine the p-values of their experiments. These tables typically have a vertical axis on the left for the number of degrees of freedom and a horizontal axis on top for the p-value. Find the row for your number of degrees of freedom, then scan it from left to right until you find the first value greater than your chi-square value. Look at the corresponding p-value at the top of that column: your p-value lies between this value and the next one (the column to its left).

Tables of the chi-square distribution are available from many sources.

Example: Our chi-square value was 3. Since we know that in our experiment there is only 1 degree of freedom, we select the very first row. We move from left to right along this row until we encounter a value greater than 3, our chi-square value. The first one we find is 3.84. Looking at the top of that column, we see that the corresponding p-value is 0.05. This means our p-value is between 0.05 and 0.1 (the next p-value in the table, in ascending order).
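
Instead of a printed table, the same lookup can be done numerically; as a sketch, scipy's chi-square distribution gives both the critical value and the exact p-value for the numbers in this example:

```python
from scipy import stats

chi_square = 3.0
df = 1

# Critical value for the 0.05 significance level (matches the 3.84 found in the table).
critical_value = stats.chi2.ppf(1 - 0.05, df)

# Exact p-value: probability of a chi-square value at least this large under H0.
p_value = stats.chi2.sf(chi_square, df)

print(round(critical_value, 2), round(p_value, 3))   # 3.84 and ~0.083
```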

7. Decide whether to reject or retain your null hypothesis

Since you have determined the approximate p-value for your experiment, you can decide whether to reject the null hypothesis (recall: this is the hypothesis that the experimental variables you manipulated did not affect the results you observed). If your p-value is lower than your significance level, congratulations: you have shown that a relationship between the manipulated variables and the observed results is very likely. If your p-value is higher than your significance level, you cannot confidently say whether the observed results were due to pure chance or to the manipulation of your variables.

Example: Our p-value is between 0.05 and 0.1. This is clearly not less than 0.05, so unfortunately we cannot reject our null hypothesis: we have not reached the required level of confidence to say that the police in our city issue tickets to red and blue cars in a proportion noticeably different from the national average.

In other words, there is a 5-10% chance that the results we observed are not the effect of the change of location (analyzing a city rather than the whole country) but simply due to chance. Since we required this chance to be below 5%, we cannot confidently say that the police in our city are less biased against red cars: the difference we observed may still be due to chance.

Sample distribution parameters determined from a series of measurements are random variables, so their deviations from the general (population) parameters are also random. The assessment of these deviations is probabilistic in nature: in statistical analysis one can only indicate the probability of a particular error. Let a* be an unbiased estimate, obtained from experiment, of a general parameter a. Let us assign a sufficiently large probability β (such that an event with probability β can be considered practically certain) and find a value ε_β = f(β) for which

P(|a* − a| < ε_β) = β. (4.1)

The range of practically possible values of the error arising when a is replaced by a* will then be ±ε_β. Errors of larger absolute value will occur only with the low probability

p = 1 − β, (4.2)

called the level of significance. In other words, expression (4.1) can be interpreted as the probability that the true value of the parameter a lies within

a* − ε_β < a < a* + ε_β. (4.3)

The probability β is called the confidence probability and characterizes the reliability of the resulting estimate. The interval I_β = a* ± ε_β is called the confidence interval; its boundaries a′ = a* − ε_β and a″ = a* + ε_β are called the confidence limits. The confidence interval at a given confidence probability determines the accuracy of the estimate. The width of the confidence interval depends on the confidence probability with which the parameter a is guaranteed to lie inside it: the larger β, the larger the interval (and the value ε_β). Increasing the number of experiments manifests itself either in a reduction of the confidence interval at a constant confidence probability or in an increase of the confidence probability while the confidence interval is kept the same.

In practice, the value of the confidence probability is usually fixed (0.9, 0.95, or 0.99), and the confidence interval of the result is then determined. When constructing a confidence interval, the problem is solved for the absolute deviation, as in (4.1).

Thus, if the distribution law of the estimate a* were known, the problem of determining the confidence interval would be solved simply. Let us consider constructing a confidence interval for the mathematical expectation of a normally distributed random variable X with a known general standard deviation σ from a sample of size n. The best estimate of the mathematical expectation m is the sample mean x̄, whose standard deviation is

σ_x̄ = σ/√n.

Using the Laplace function, we obtain

β = 2Φ(ε_β √n / σ). (4.5)

Having specified the confidence probability β, we determine from the table of the Laplace function (Appendix 1) the value t_β for which 2Φ(t_β) = β. The confidence interval for the mathematical expectation then takes the form

x̄ − t_β σ/√n < m < x̄ + t_β σ/√n. (4.7)

From (4.7) it is clear that the width of the confidence interval decreases in inverse proportion to the square root of the number of experiments.
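
A small numerical sketch of formula (4.7); the measurements and the known σ below are assumed values for illustration, and the normal quantile plays the role of the Laplace-function table:

```python
import numpy as np
from scipy import stats

# Assumed example: n measurements of a normally distributed quantity
# with a known general standard deviation sigma.
x = np.array([10.2, 9.8, 10.1, 10.4, 9.9, 10.0, 10.3, 9.7])
sigma = 0.3                 # known population standard deviation (assumed)
beta = 0.95                 # confidence probability

n = len(x)
x_mean = x.mean()
# t_beta such that 2*Phi(t_beta) = beta, i.e. the (1 + beta)/2 quantile
# of the standard normal distribution.
t_beta = stats.norm.ppf((1 + beta) / 2)
eps = t_beta * sigma / np.sqrt(n)

print(f"{x_mean - eps:.3f} < m < {x_mean + eps:.3f}")
```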

Knowing the general variance makes it possible to estimate the mathematical expectation even from a single observation. If, as a result of an experiment, the value x₁ was obtained for a normally distributed random variable X, then the confidence interval for the mathematical expectation at the chosen β has the form

x₁ − u_{1−p/2} σ < m < x₁ + u_{1−p/2} σ,

where u_{1−p/2} is the quantile of the standard normal distribution (Appendix 2).

The distribution law of the estimate a* depends on the distribution law of the quantity X and, in particular, on the parameter a itself. To get around this difficulty, two methods are used in mathematical statistics:

1) an approximate one: for n ≥ 50, the unknown parameters in the expression for ε_β are replaced by their estimates, for example σ_x̄ ≈ s/√n;

2) passing from the random variable a* to another random variable Q*, whose distribution law does not depend on the estimated parameter a but only on the sample size n and on the type of distribution law of the quantity X. Quantities of this kind have been studied in greatest detail for normally distributed random variables. Symmetric quantiles Q′ and Q″ are usually used as the confidence limits:

P(Q′ < Q* < Q″) = β, (4.9)

or, taking (4.2) into account,

P(Q_{p/2} < Q* < Q_{1−p/2}) = 1 − p = β. (4.10)

4.2. Testing statistical hypotheses, significance criteria, errors of the first and second kind

Statistical hypotheses are assumptions about the population distributions of a particular random variable. Testing a hypothesis means comparing certain statistical indicators, the test criteria (significance criteria) computed from the sample, with their values determined under the assumption that the hypothesis is true. In hypothesis testing, a hypothesis H0 is usually tested against an alternative hypothesis H1.

To decide whether a hypothesis is accepted or rejected, a significance level p is set. The most commonly used significance levels are 0.10, 0.05 and 0.01. Based on this probability, using the assumed distribution of the estimate Q* (the significance criterion), quantile confidence limits are found, usually the symmetric quantiles Q_{p/2} and Q_{1−p/2}. The numbers Q_{p/2} and Q_{1−p/2} are called the critical values of the hypothesis; the values Q* < Q_{p/2} and Q* > Q_{1−p/2} form the critical region of the hypothesis (the region where the hypothesis is rejected) (Fig. 12).

Fig. 12. Critical region of the hypothesis. Fig. 13. Testing statistical hypotheses.

If the value Q₀ found from the sample falls between Q_{p/2} and Q_{1−p/2}, the hypothesis admits such a value as random, and there is therefore no reason to reject it. If the value Q₀ falls into the critical region, then under this hypothesis it is practically impossible; but since it did occur, the hypothesis itself is rejected.

When testing hypotheses, two kinds of errors can be made. An error of the first kind consists in rejecting a hypothesis that is actually true; the probability of such an error is no greater than the accepted significance level. An error of the second kind consists in accepting a hypothesis that is actually false. The higher the significance level, the lower the probability of an error of the second kind, since more hypotheses are then rejected. If the probability of an error of the second kind is α, then the value (1 − α) is called the power of the criterion.

Fig. 13 shows two probability density curves of the random variable Q, corresponding to the two hypotheses H0 and H1. If the value Q > Q_p is obtained from the experiment, the hypothesis H0 is rejected and the hypothesis H1 is accepted, and vice versa if Q < Q_p.

The area under the probability density curve corresponding to the validity of the hypothesis H0 to the right of the value Q_p is equal to the significance level p, i.e. the probability of an error of the first kind. The area under the probability density curve corresponding to the validity of the hypothesis H1 to the left of Q_p is equal to the probability of an error of the second kind α, and to the right of Q_p to the power of the criterion (1 − α). Thus, the larger p is, the larger (1 − α) is. When testing a hypothesis, one tries to choose, from all possible criteria, the one that has the lowest probability of an error of the second kind at the given significance level.

Usually a significance level of p = 0.05 is used as optimal when testing hypotheses: if the hypothesis under test is accepted at this significance level, it should certainly be considered consistent with the experimental data; at the same time, using this significance level does not lead to hypotheses being rejected without sufficient grounds.

For example, suppose two values of some sample parameter are found, which can be regarded as estimates a₁* and a₂* of the general parameters a₁ and a₂. It is hypothesized that the difference between a₁* and a₂* is random and that the general parameters are equal to each other, i.e. a₁ = a₂. Such a hypothesis is called the null hypothesis. To test it, one needs to find out whether the discrepancy between a₁* and a₂* is significant under the conditions of the null hypothesis. To do this, one usually examines the random variable D = a₁* − a₂* and checks whether its difference from zero is significant. Sometimes it is more convenient to consider the ratio a₁*/a₂*, comparing it with unity.

By rejecting the null hypothesis we thereby accept the alternative, which splits into two: a₁ > a₂ and a₁ < a₂. If one of these inequalities is known in advance to be impossible, the alternative hypothesis is called one-sided, and one-sided significance criteria are used to test it (in contrast to the usual two-sided ones). In this case, only one of the two halves of the critical region needs to be considered (Fig. 12).

For example, p = 0.05 with a two-sided criterion corresponds to the critical values Q_{0.025} and Q_{0.975}, i.e. the values Q* < Q_{0.025} and Q* > Q_{0.975} are considered significant (non-random). With a one-sided criterion, one of these inequalities is obviously impossible (for example, Q* < Q_{0.025}), and only Q* > Q_{0.975} will be significant. The probability of the latter inequality is 0.025, so the significance level is 0.025. Thus, if the same critical values are used for a one-sided significance test as for a two-sided one, they correspond to half the significance level.

Usually the same significance level is taken for a one-sided test as for a two-sided one, since under these conditions both tests provide the same error of the first kind. To achieve this, the one-sided test must be derived from a two-sided test corresponding to twice the accepted significance level. To keep the significance level p = 0.05 for the one-sided test, we take p = 0.10 for the two-sided one, which gives the critical values Q_{0.05} and Q_{0.95}. Of these, one remains for the one-sided test, for example Q_{0.95}; the significance level for it is 0.05. The same significance level for a two-sided test corresponds to the critical value Q_{0.975}. But Q_{0.95} < Q_{0.975}, so with the one-sided test a larger number of hypotheses will be rejected and, therefore, the error of the second kind will be smaller.
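
The relationship between one-sided and two-sided critical values described above can be checked directly; as an assumed example, take a test statistic that is standard normal, so the quantiles Q are ordinary normal quantiles:

```python
from scipy import stats

p = 0.05

# Two-sided test at level p: critical quantiles Q_{p/2} and Q_{1-p/2}.
q_two_lo = stats.norm.ppf(p / 2)        # ~ -1.96
q_two_hi = stats.norm.ppf(1 - p / 2)    # ~ +1.96

# One-sided test at the same level p: single critical quantile Q_{1-p}.
q_one = stats.norm.ppf(1 - p)           # ~ +1.64

print(round(q_two_hi, 2), round(q_one, 2))
# Q_{0.95} < Q_{0.975}: with the one-sided test more hypotheses are rejected,
# so the probability of an error of the second kind is smaller.
```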

The significance level in statistics is an important indicator that reflects the degree of confidence in the accuracy and truth of the obtained (predicted) data. The concept is widely used in various fields: from sociological surveys to the statistical testing of scientific hypotheses.

Definition

The level of statistical significance (or statistically significant result) shows how likely it is that the indicators being studied occur by chance. The overall statistical significance of a phenomenon is expressed by the p-value coefficient (p-level). In any experiment or observation, there is a possibility that the data obtained were due to sampling errors. This is especially true for sociology.

That is, a statistically significant value is one whose probability of arising by chance is extremely small, or which tends to an extreme. An extreme in this context is the degree to which a statistic deviates from the null hypothesis (the hypothesis that is tested for consistency with the sample data obtained). In scientific practice, the significance level is chosen before data collection and is, as a rule, 0.05 (5%). For systems where precise values are critically important, this figure may be 0.01 (1%) or lower.

Background

The concept of significance level was introduced by the British statistician and geneticist Ronald Fisher in 1925, when he was developing a technique for testing statistical hypotheses. When analyzing any process, there is a certain probability of certain phenomena. Difficulties arise when working with small (or not obvious) percentages of probabilities that fall under the concept of “measurement error.”

When working with statistical data that are not specific enough to test, scientists face the problem of the null hypothesis, which “prevents” them from operating with small quantities. For such systems Fisher proposed setting the probability of events at 5% (0.05) as a convenient cut-off that allows the null hypothesis to be rejected in calculations.

Introduction of fixed odds

In 1933, Jerzy Neyman and Egon Pearson recommended in their works that a certain significance level be set in advance (before data collection). The use of these rules is clearly visible during elections. Suppose there are two candidates, one very popular and the other little known. It is obvious that the first candidate will win the election and that the chances of the second tend to zero. They tend to zero, but are not equal to zero: there is always the possibility of force majeure, sensational information, or unexpected decisions that could change the predicted election results.

Neyman and Pearson agreed that Fisher's significance level of 0.05 (denoted by α) was the most appropriate. However, Fisher himself in 1956 opposed fixing this value, believing that the level of α should be set according to the specific circumstances. For example, in particle physics it is 0.01.

p-level value

The term p-value was first used by Brownlee in 1960. The p-level (p-value) is an indicator inversely related to the truth of the results: the higher the p-value, the lower the level of confidence in the relationship between variables found in the sample.

This value reflects the likelihood of errors associated with the interpretation of the results. Let's assume p-level = 0.05 (1/20). It shows a five percent probability that the relationship between variables found in the sample is just a random feature of the sample. That is, if this dependence is absent, then with repeated similar experiments, on average, in every twentieth study, one can expect the same or greater dependence between the variables. The p-level is often seen as a "margin" for the error rate.

Note that the p-value may not reflect the real relationship between the variables, but only shows a certain average value within the assumptions made. In particular, the final analysis of the data will also depend on the chosen value of this coefficient: at a p-level of 0.05 the results will be one thing, and at a coefficient of 0.01 they will be another.

Testing statistical hypotheses

The level of statistical significance is especially important when testing hypotheses. For example, for a two-sided test, the rejection region is divided equally between the two tails of the sampling distribution (relative to the zero coordinate), and the truth of the resulting data is then assessed.

Suppose, when monitoring a certain process (phenomenon), it turns out that new statistical information indicates small changes relative to previous values. At the same time, the discrepancies in the results are small, not obvious, but important for the study. The specialist is faced with a dilemma: are changes really occurring or are these sampling errors (measurement inaccuracy)?

In this case, the null hypothesis is either retained or rejected (everything is attributed to error, or the change in the system is recognized as a fact). The decision is based on the ratio of the overall statistical significance (p-value) to the significance level (α). If the p-value < α, the null hypothesis is rejected; the smaller the p-value, the more significant the test statistic.
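
In code the decision rule is just a comparison; the observed statistic below is an assumed example, treated as standard normal under the null hypothesis:

```python
from scipy import stats

alpha = 0.05          # chosen significance level
z_obs = 2.2           # assumed observed value of the test statistic

# Two-sided p-value for a statistic that is standard normal under H0.
p_value = 2 * stats.norm.sf(abs(z_obs))

if p_value < alpha:
    print(f"p = {p_value:.3f} < {alpha}: reject the null hypothesis")
else:
    print(f"p = {p_value:.3f} >= {alpha}: no grounds to reject the null hypothesis")
```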

Values used

The significance level depends on the material being analyzed. In practice, the following fixed values are used:

  • α = 0.1 (or 10%);
  • α = 0.05 (or 5%);
  • α = 0.01 (or 1%);
  • α = 0.001 (or 0.1%).

The greater the accuracy required of the calculations, the lower the α coefficient used. Naturally, statistical forecasts in physics, chemistry, pharmaceuticals, and genetics require greater accuracy than in political science and sociology.

Significance thresholds in specific areas

In high-precision fields such as particle physics and manufacturing, statistical significance is often expressed in terms of the number of standard deviations (denoted by the sigma coefficient, σ) of a normal probability distribution (Gaussian distribution). σ is a statistical indicator characterizing the dispersion of the values of a quantity around its mathematical expectation, and it is used when plotting the probability of events.

Depending on the field of knowledge, the required σ varies greatly. For example, for confirming the existence of the Higgs boson the parameter σ is equal to five (σ = 5), which corresponds to a p-value of about 1 in 3.5 million. In genome studies the significance level can be 5 × 10⁻⁸, which is not uncommon in this field.
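
The correspondence between “sigmas” and p-values can be reproduced from the normal distribution; the 5σ figure is treated here, as is conventional, as a one-sided tail probability:

```python
from scipy import stats

# One-sided tail probability beyond 5 standard deviations.
p_5_sigma = stats.norm.sf(5)
print(p_5_sigma)            # ~2.87e-07, i.e. roughly 1 in 3.5 million

# Conversely, the number of sigmas corresponding to the genome-wide
# significance threshold 5e-8 (one-sided).
sigmas = stats.norm.isf(5e-8)
print(round(sigmas, 2))     # ~5.33
```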

Efficiency

It must be borne in mind that the coefficients α and the p-value are not exact specifications. Whatever the significance level in the statistics of the phenomenon under study, it is not an unconditional basis for accepting a hypothesis. For example, the smaller the value of α, the greater the chance that the hypothesis being established is significant; however, there is a risk of error, which reduces the statistical power (significance) of the study.

Researchers who focus solely on statistically significant results may reach erroneous conclusions. At the same time, it is difficult to double-check their work, since they apply assumptions (which in fact are the α and p-values). Therefore, it is always recommended, along with calculating statistical significance, to determine another indicator - the magnitude of the statistical effect. Effect size is a quantitative measure of the strength of an effect.