2. Statistical Hypothesis Testing

 


1. Steps of Statistical Hypothesis Testing

    Assume that the population has a certain statistical property (for example, that it has a given parameter value or follows a given distribution), and then test whether this hypothesis is credible. This method is called statistical hypothesis testing (or simply hypothesis testing). The steps are illustrated by the following example.

Example: The mean strength $\mu_0$ (in kilograms) of a certain product is known. After the production method is changed, $n$ parts are selected at random, and their sample mean $\bar{x}$ and sample standard deviation $s$ (both in kilograms) are calculated. Does the change in the production method have a significant effect on the strength?

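Statistical hypothesis testing steps and process analysis

( 1 ) State the hypothesis $H_0$
    $H_0$: $\mu = \mu_0$ ($\mu$ is the population mean after the production method has been changed, $\mu_0$ is the known mean before the change).

( 2 ) Select a statistic and determine its distribution
    $u = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$, where $\bar{x}$, $s$ and $n$ are the sample mean, sample standard deviation and sample size. For a large sample, $u$ approximately follows the standard normal distribution $N(0, 1)$ when $H_0$ holds.

( 3 ) Give the significance level
    $\alpha = 5\%$.

( 4 ) Find the confidence limit
    The limit $K_{\alpha/2}$ satisfies $P(|u| > K_{\alpha/2}) = \alpha$. Checking the normal distribution table gives $K_{0.025} = 1.96$.

( 5 ) Calculate the statistic u
    Substitute the sample values into the formula of step ( 2 ).

( 6 ) Statistical inference
    When $|u| \le K_{\alpha/2}$, accept $H_0$; when $|u| > K_{\alpha/2}$, reject $H_0$.
    In this example $|u| < 1.96$, so $H_0$ is accepted: at the 5 % significance level, the change in the production method is considered to have no significant effect on the strength of the product.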
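The six steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the statistic is taken to be $u = (\bar{x} - \mu_0)/(s/\sqrt{n})$, the function name `u_test` is hypothetical, and the numbers passed to it are invented for the demonstration (they are not the values of the example above).

```python
from math import sqrt
from statistics import NormalDist

def u_test(mu0, xbar, s, n, alpha=0.05):
    """Two-sided large-sample u-test of H0: mu = mu0 (illustrative sketch)."""
    u = (xbar - mu0) / (s / sqrt(n))           # step 5: the statistic
    k = NormalDist().inv_cdf(1 - alpha / 2)    # step 4: confidence limit K_{alpha/2}
    accept = abs(u) <= k                       # step 6: True means "accept H0"
    return u, k, accept

# Hypothetical input values, chosen only to exercise the function.
u, k, accept = u_test(mu0=10.0, xbar=9.9, s=1.5, n=200)
print(f"u = {u:.3f}, K = {k:.2f}, accept H0: {accept}")
```

Computing the limit from the inverse normal distribution function replaces the table lookup; for $\alpha = 5\%$ it reproduces the tabulated 1.96.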

2. Statistical hypothesis test table for normal population parameters

   For a large sample, whatever distribution the population follows, the central limit theorem implies that the sample mean is asymptotically normally distributed. Statistical hypothesis tests of the population parameters can therefore be carried out with the "u-test" described below.

  In the table, $\alpha$ is the given significance level, $\bar{x}$ is the sample mean, and $s$ is the sample standard deviation.

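Each entry gives the condition and purpose of the test, the hypothesis, the statistic with its distribution, and the rejection region. Here $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$, and subscripts 1 and 2 refer to the first and second population.

u-test

  ( a ) The population variance $\sigma^2$ is known; test whether the population mean $\mu$ is equal to (or less than, or greater than) a known constant $\mu_0$.
        Hypothesis: $\mu = \mu_0$ (or $\mu \le \mu_0$, or $\mu \ge \mu_0$).
        Statistic: $u = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \sim N(0, 1)$.

  ( b ) The two population variances are known to be equal ($\sigma_1^2 = \sigma_2^2 = \sigma^2$); compare the two population means $\mu_1$ and $\mu_2$.
        Hypothesis: $\mu_1 = \mu_2$ (or $\mu_1 \le \mu_2$, or $\mu_1 \ge \mu_2$).
        Statistic: $u = \dfrac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{1/n_1 + 1/n_2}} \sim N(0, 1)$.

  ( c ) The two population variances $\sigma_1^2$ and $\sigma_2^2$ are known; compare the two population means $\mu_1$ and $\mu_2$.
        Hypothesis: as in ( b ).
        Statistic: $u = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} \sim N(0, 1)$.

  Rejection region: $|u| > K_{\alpha/2}$ for "=", $u > K_\alpha$ for "$\le$", $u < -K_\alpha$ for "$\ge$", where the confidence limit $K_\alpha$ is read from the normal distribution table and satisfies $P(u > K_\alpha) = \alpha$.

t-test

  ( a ) The population variance is unknown; test whether the population mean $\mu$ is equal to (or less than, or greater than) a known constant $\mu_0$.
        Hypothesis: $\mu = \mu_0$ (or $\mu \le \mu_0$, or $\mu \ge \mu_0$).
        Statistic: $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} \sim t(n - 1)$.

  ( b ) The two populations are known to have the same variance (its value is unknown); compare the two population means $\mu_1$ and $\mu_2$.
        Hypothesis: $\mu_1 = \mu_2$ (or $\mu_1 \le \mu_2$, or $\mu_1 \ge \mu_2$).
        Statistic: $t = \dfrac{\bar{x}_1 - \bar{x}_2}{s_w\sqrt{1/n_1 + 1/n_2}} \sim t(n_1 + n_2 - 2)$, where $s_w^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$.

  Rejection region: $|t| > t_{\alpha/2}$ for "=", $t > t_\alpha$ for "$\le$", $t < -t_\alpha$ for "$\ge$", where $t_\alpha$ is read from the t distribution table and satisfies $P(t > t_\alpha) = \alpha$.

$\chi^2$-test

  ( a ) The population mean $\mu$ is known; test whether the population variance $\sigma^2$ is equal to (or less than, or greater than) a known constant $\sigma_0^2$.
        Hypothesis: $\sigma^2 = \sigma_0^2$ (or $\sigma^2 \le \sigma_0^2$, or $\sigma^2 \ge \sigma_0^2$).
        Statistic: $\chi^2 = \dfrac{1}{\sigma_0^2}\sum_{i=1}^{n}(x_i - \mu)^2 \sim \chi^2(n)$.

  ( b ) The population mean is unknown; test whether the population variance $\sigma^2$ is equal to (or less than, or greater than) a known constant $\sigma_0^2$.
        Hypothesis: as in ( a ).
        Statistic: $\chi^2 = \dfrac{(n - 1)s^2}{\sigma_0^2} \sim \chi^2(n - 1)$.

  Rejection region: $\chi^2 < \chi^2_{1-\alpha/2}$ or $\chi^2 > \chi^2_{\alpha/2}$ for "=", $\chi^2 > \chi^2_\alpha$ for "$\le$", $\chi^2 < \chi^2_{1-\alpha}$ for "$\ge$", where $\chi^2_\alpha$ is read from the $\chi^2$ distribution table and satisfies $P(\chi^2 > \chi^2_\alpha) = \alpha$.

F-test

  The means and variances of the two populations are unknown; compare the two population variances $\sigma_1^2$ and $\sigma_2^2$.
        Hypothesis: $\sigma_1^2 = \sigma_2^2$ (or $\sigma_1^2 \le \sigma_2^2$).
        Statistic: $F = \dfrac{s_1^2}{s_2^2} \sim F(n_1 - 1, n_2 - 1)$.
        Rejection region: $F > F_\alpha$ for "$\le$", where $F_\alpha$ is read from the F distribution table with $(n_1 - 1, n_2 - 1)$ degrees of freedom and satisfies $P(F > F_\alpha) = \alpha$; for "=", $F$ is compared in the same way with the upper and lower $\alpha/2$ points.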
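As an illustration of how the test statistics for normal population parameters are evaluated, the following sketch computes a one-sample t statistic and a variance-ratio F statistic. It assumes the usual definitions $t = (\bar{x} - \mu_0)/(s/\sqrt{n})$ and $F = s_1^2/s_2^2$ with $s^2$ computed using the divisor $n - 1$; the function names and the data are hypothetical. The confidence limits still have to be read from the t and F tables.

```python
from math import sqrt
from statistics import mean, variance  # variance() uses the divisor n - 1

def t_one_sample(sample, mu0):
    """t statistic for H0: mu = mu0 with unknown variance; returns (t, df)."""
    n = len(sample)
    t = (mean(sample) - mu0) / sqrt(variance(sample) / n)
    return t, n - 1

def f_variance_ratio(x, y):
    """F statistic s1^2 / s2^2 for comparing two variances; returns (F, (df1, df2))."""
    return variance(x) / variance(y), (len(x) - 1, len(y) - 1)

# Hypothetical data, for illustration only.
t, df = t_one_sample([2.0, 4.0, 6.0], mu0=3.0)
f, f_df = f_variance_ratio([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(t, df, f, f_df)
```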

 

 

 

 

 

 

 

 

 

 

3. Statistical hypothesis testing of the overall distribution function

   Let $F_0(x; \theta_1, \dots, \theta_l)$ be a distribution function of known type, where $\theta_1, \dots, \theta_l$ are parameters (known or partly known); let $x_1, x_2, \dots, x_n$ be a sample from the population, whose distribution function is $F(x)$; and let $F_0(x)$ be the hypothesized distribution function. The hypothesis test is carried out in two cases.

1° All parameters of $F_0(x)$ are known. Divide the real axis into m disjoint intervals

               $(a_0, a_1], (a_1, a_2], \dots, (a_{m-1}, a_m)$,

where $a_0$ is understood to be $-\infty$ and $a_m$ to be $+\infty$. Let the theoretical frequency be

               $p_i = F_0(a_i) - F_0(a_{i-1})$   (i = 1, 2, ..., m).

If the number of sample values that fall in the i-th interval is $\nu_i$ (the empirical frequency), then when n is large the statistic

               $\chi^2 = \sum_{i=1}^{m} \dfrac{(\nu_i - n p_i)^2}{n p_i}$

approximately follows a $\chi^2$ distribution with m - 1 degrees of freedom. The $\chi^2$-test can therefore be applied to test whether the hypothesis

                      $H_0$: $F(x) = F_0(x)$

is credible.

2° All or some of the parameters of $F_0(x)$ are unknown. If l parameters are unknown, the maximum likelihood method (this section, 1, 3) can be used to estimate them. Taking these estimates as the parameter values, calculate the theoretical frequencies $\hat{p}_i$ as in case 1° and count the empirical frequencies $\nu_i$; then the statistic

               $\chi^2 = \sum_{i=1}^{m} \dfrac{(\nu_i - n \hat{p}_i)^2}{n \hat{p}_i}$

approximately follows, when n is large, a $\chi^2$ distribution with m - l - 1 degrees of freedom. The $\chi^2$-test can again be applied to test whether the hypothesis

                     $H_0$: $F(x) = F_0(x)$

is credible.
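The $\chi^2$ statistic of case 1° can be sketched as follows, assuming the usual form $\chi^2 = \sum_i (\nu_i - n p_i)^2 / (n p_i)$. The function name, the observed counts and the uniform theoretical frequencies are hypothetical; the value 9.488 is the tabulated upper 5 % point of the $\chi^2$ distribution with 4 degrees of freedom.

```python
def chi_square_statistic(observed, probs):
    """Pearson chi-square statistic for empirical counts vs. theoretical frequencies."""
    n = sum(observed)
    return sum((v - n * p) ** 2 / (n * p) for v, p in zip(observed, probs))

# Hypothetical example: 100 observations sorted into m = 5 intervals,
# each interval having theoretical frequency p_i = 0.2 under H0.
observed = [18, 22, 21, 19, 20]
probs = [0.2] * 5

chi2 = chi_square_statistic(observed, probs)
dof = len(observed) - 1      # m - 1; use m - l - 1 if l parameters were estimated
limit = 9.488                # tabulated chi-square limit for 4 d.o.f., alpha = 5 %
print(chi2, dof, "accept H0" if chi2 <= limit else "reject H0")
```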

4. Statistical hypothesis tests of whether two samples come from the same population

   [ Sign test ]   This method is simple and intuitive, and does not require knowledge of the distribution of the quantity being tested. It is often used to test whether two sets of results fluctuate to the same degree, or whether the state of production has changed noticeably.

    The signs "+", "-" and "0" indicate that the datum of A is respectively larger than, smaller than, or equal to the corresponding datum of B, and $n_+$, $n_-$ and $n_0$ denote the numbers of occurrences of "+", "-" and "0". The steps of the test are illustrated by the following example.

Example: A and B each analyze the content of a certain component in the same substance, with the results in the following table.

 No.     A       B     Sign
  1     14.7    14.6     +
  2     15.0    15.1     -
  3     15.2    15.4     -
  4     14.8    14.7     +
  5     15.5    15.2     +
  6     14.6    14.7     -
  7     14.9    14.8     +
  8     14.8    14.6     +
  9     15.1    15.2     -
 10     15.0    15.0     0
 11     14.7    14.6     +
 12     14.8    14.6     +
 13     14.7    14.8     -
 14     15.0    15.3     -
 15     14.9    14.7     +
 16     14.9    14.6     +
 17     15.2    14.8     +
 18     14.7    14.9     -
 19     15.4    15.2     +
 20     15.3    15.0     +

  Is there a significant difference between the results of the two analysts?

Statistical hypothesis testing steps and process analysis

(1) State the hypothesis $H_0$
    Assume that the two sets of analysis results have the same distribution function.

(2) Select the statistic
    $r = \min\{n_+, n_-\}$.

(3) Give the significance level
    $\alpha = 10\%$.

(4) Find the confidence limit
    Check the sign test table (below) with $N = n_+ + n_- = 12 + 7 = 19$ and $\alpha = 10\%$: the limit is $r_{10\%} = 5$, so the rejection region is $r \le 5$.

(5) Calculate the statistic
    $r = \min\{12, 7\} = 7$.

(6) Statistical inference
    When $r > r_\alpha$, accept $H_0$; when $r \le r_\alpha$, reject $H_0$.
    Because $r = 7 > 5 = r_{10\%}$, $H_0$ is accepted: at the 10 % significance level there is no significant difference between the analysis results of A and B.
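The sign counts of this example can be checked with a short script. The data are the 20 pairs of the example and the limit 5 is the sign test table entry for N = 19 at 10 %; the exact two-sided binomial probability computed at the end is an added cross-check, not part of the handbook's procedure, and the variable names are illustrative.

```python
from math import comb

a = [14.7, 15.0, 15.2, 14.8, 15.5, 14.6, 14.9, 14.8, 15.1, 15.0,
     14.7, 14.8, 14.7, 15.0, 14.9, 14.9, 15.2, 14.7, 15.4, 15.3]
b = [14.6, 15.1, 15.4, 14.7, 15.2, 14.7, 14.8, 14.6, 15.2, 15.0,
     14.6, 14.6, 14.8, 15.3, 14.7, 14.6, 14.8, 14.9, 15.2, 15.0]

n_plus = sum(x > y for x, y in zip(a, b))    # number of "+"
n_minus = sum(x < y for x, y in zip(a, b))   # number of "-"
n_zero = sum(x == y for x, y in zip(a, b))   # number of "0" (ties are dropped)

N = n_plus + n_minus
r = min(n_plus, n_minus)

r_limit = 5                  # sign test table, N = 19, alpha = 10 %
accept_h0 = r > r_limit

# Cross-check (an assumption of this sketch): exact two-sided binomial probability
# of a sign count at least as extreme as r when "+" and "-" are equally likely.
p_value = min(1.0, 2 * sum(comb(N, i) for i in range(r + 1)) / 2 ** N)

print(n_plus, n_minus, n_zero, N, r, accept_h0, round(p_value, 3))
```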

 

 

                      Sign test table

          alpha (%)                  alpha (%)                  alpha (%)
  N    1    5   10   25   |   N    1    5   10   25   |   N    1    5   10   25
  1                       |  31    7    9   10   11   |  61   20   22   23   25
  2                       |  32    8    9   10   12   |  62   20   22   24   25
  3                   0   |  33    8   10   11   12   |  63   20   23   24   26
  4                   0   |  34    9   10   11   13   |  64   21   23   24   26
  5              0    0   |  35    9   11   12   13   |  65   21   24   25   27
  6         0    0    1   |  36    9   11   12   14   |  66   22   24   25   27
  7         0    0    1   |  37   10   12   13   14   |  67   22   25   26   28
  8    0    0    1    1   |  38   10   12   13   14   |  68   22   25   26   28
  9    0    1    1    2   |  39   11   12   13   15   |  69   23   25   27   29
 10    0    1    1    2   |  40   11   13   14   15   |  70   23   26   27   29
 11    0    1    2    3   |  41   11   13   14   16   |  71   24   26   28   30
 12    1    2    2    3   |  42   12   14   15   16   |  72   24   27   28   30
 13    1    2    3    3   |  43   12   14   15   17   |  73   25   27   28   31
 14    1    2    3    4   |  44   13   15   16   17   |  74   25   28   29   31
 15    2    3    3    4   |  45   13   15   16   18   |  75   25   28   29   32
 16    2    3    4    5   |  46   13   15   16   18   |  76   26   28   30   32
 17    2    4    4    5   |  47   14   16   17   19   |  77   26   29   30   32
 18    3    4    5    6   |  48   14   16   17   19   |  78   27   29   31   33
 19    3    4    5    6   |  49   15   17   18   19   |  79   27   30   31   33
 20    3    5    5    6   |  50   15   17   18   20   |  80   28   30   32   34
 21    4    5    6    7   |  51   15   18   19   20   |  81   28   31   32   34
 22    4    5    6    7   |  52   16   18   19   21   |  82   28   31   33   35
 23    4    6    7    8   |  53   16   18   20   21   |  83   29   32   33   35
 24    5    6    7    8   |  54   17   19   20   22   |  84   29   32   33   36
 25    5    7    7    9   |  55   17   19   20   22   |  85   30   32   34   36
 26    6    7    8    9   |  56   17   20   21   23   |  86   30   33   34   37
 27    6    7    8   10   |  57   18   20   21   23   |  87   31   33   35   37
 28    6    8    9   10   |  58   18   21   22   24   |  88   31   34   35   38
 29    7    8    9   10   |  59   19   21   22   24   |  89   31   34   36   38
 30    7    9   10   11   |  60   19   21   23   25   |  90   32   35   36   39

     [ Note ] The numbers in the table are the sign limits $r_\alpha$ corresponding to N and the significance level $\alpha$.

[ Rank sum test ]   This method is more accurate than the sign test, makes better use of the information in the data, and does not require the data to be paired. The steps are illustrated by the following example.

    Example: A life test is carried out on a product made of two materials, A and B, with the results

              A   1610 1650 1680 1700 1750 1720 1800
              B   1580 1600 1640 1640 1700

 Is there a significant difference between the effects of the two materials on product life?

     Solution: Arrange the data in increasing order as in the following table:

Rank     1      2      3      4      5      6      7      8      9     10     11     12
A                     1610                 1650   1680   1700          1720   1750   1800
B      1580   1600          1640   1640                         1700

The ranks in the first row are the ordinal positions of the data in increasing order. The value 1700 occurs in both A and B and occupies positions 8 and 9, so each occurrence is given the average rank (8 + 9)/2 = 8.5.

 

Statistical hypothesis testing steps and process analysis

( 1 ) State the hypothesis $H_0$
    Assume that there is no significant difference between the effects of the two materials on product life.

( 2 ) Select the statistic
    T = the sum of the ranks of the group with the smaller number of data.

( 3 ) Give the significance level
    $\alpha = 5\%$.

( 4 ) Find the confidence limits
    Check the rank sum test table (below) with $n_1 = 5$, $n_2 = 7$ ($n_1 \le n_2$ are the sizes of the two samples) to get the lower limit $T_1 = 22$ and the upper limit $T_2 = 43$ of T; the rejection region is $T \le T_1$ or $T \ge T_2$.

( 5 ) Calculate the statistic
    T = 1 + 2 + 4 + 5 + 8.5 = 20.5 (the rank sum of group B).

( 6 ) Statistical inference
    When $T_1 < T < T_2$, accept $H_0$; when $T \le T_1$ or $T \ge T_2$, reject $H_0$.
    Because $T = 20.5 < 22 = T_1$, $H_0$ is rejected: at the 5 % significance level, the effects of the two materials on product life are considered significantly different.
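The rank sum of this example can be reproduced with the sketch below. It uses the life-test data given above and assigns tied values their average rank; the limits 22 and 43 are the rank sum table entries for $n_1 = 5$, $n_2 = 7$, and the helper name `average_ranks` is hypothetical.

```python
def average_ranks(values):
    """Map each distinct value to its rank (1-based), averaging over ties."""
    ordered = sorted(values)
    ranks = {}
    for v in set(ordered):
        first = ordered.index(v) + 1         # first position occupied by v
        count = ordered.count(v)             # number of tied occurrences
        ranks[v] = first + (count - 1) / 2   # mean of the occupied positions
    return ranks

a = [1610, 1650, 1680, 1700, 1750, 1720, 1800]
b = [1580, 1600, 1640, 1640, 1700]

ranks = average_ranks(a + b)
smaller = min(a, b, key=len)                 # group with fewer data (here B)
T = sum(ranks[v] for v in smaller)

T1, T2 = 22, 43                              # table limits for n1 = 5, n2 = 7
reject_h0 = T <= T1 or T >= T2
print(T, reject_h0)
```

Averaging the two tied 1640 values (both in group B) gives 4.5 + 4.5, the same contribution as the ranks 4 + 5 used in the hand calculation.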

      

Rank sum test table

              alpha = 0.025      alpha = 0.05
 n1    n2      T1      T2         T1      T2
  2     4                          3      11
  2     5                          3      13
  2     6       3      15          4      14
  2     7       3      17          4      16
  2     8       3      19          4      18
  2     9       3      21          4      20
  2    10       4      22          5      21
  3     3                          6      15
  3     4       6      18          7      17
  3     5       6      21          7      20
  3     6       7      23          8      22
  3     7       8      25          9      24
  3     8       8      28          9      27
  3     9       9      30         10      29
  3    10       9      33         11      31
  4     4      11      25         12      24
  4     5      12      28         13      27
  4     6      12      32         14      30
  4     7      13      35         15      33
  4     8      14      38         16      36
  4     9      15      41         17      39
  4    10      16      44         18      42
  5     5      18      37         19      36
  5     6      19      41         20      40
  5     7      20      45         22      43
  5     8      21      49         23      47
  5     9      22      53         25      50
  5    10      24      56         26      54
  6     6      26      52         28      50
  6     7      28      56         30      54
  6     8      29      61         32      58
  6     9      31      65         33      63
  6    10      33      69         35      67
  7     7      37      68         39      66
  7     8      39      73         41      71
  7     9      41      78         43      76
  7    10      43      83         46      80
  8     8      49      87         52      84
  8     9      51      93         54      90
  8    10      54      98         57      95
  9     9      63     108         66     105
  9    10      66     114         69     111
 10    10      79     131         83     127

 [ Note ]   $n_1$ and $n_2$ in the header are the numbers of data in the two groups ($n_1 \le n_2$); $T_1$ and $T_2$ are the lower and upper limits of the rank sum, respectively. The limits for $\alpha = 0.025$, printed in bold in the original table, and those for $\alpha = 0.05$, printed in ordinary type, are given here in separate columns.

 

3. Analysis of variance

 

    Analysis of variance is a method of analyzing experimental (or observational) data. The basic problem it solves is to clarify the influence of various factors related to the research object and the interaction between various factors on the object through data analysis. The objects it studies are assumed to follow a normal distribution.

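[ One-way ANOVA ]   Consider the influence of the different levels of a single factor A on the object under investigation. Trials are made at k different levels $A_i$ of A, giving the test data $x_{ij}$ (i = 1, 2, ..., k; j = 1, 2, ..., $n_i$). Assuming that the data at level $A_i$ follow the normal distribution $N(\mu_i, \sigma^2)$ with a common variance $\sigma^2$ (whose value may be unknown), test whether the means of the test results at the levels $A_i$ differ significantly. The steps are as follows:

   ( 1 ) Hypothesis $H_0$: $\mu_1 = \mu_2 = \dots = \mu_k$.

   ( 2 ) Select the statistic and determine its distribution:

                  $F = \dfrac{S_1/(k - 1)}{S_2/(n - k)} \sim F(k - 1, n - k)$,

where

                  $n = \sum_{i=1}^{k} n_i$,  $\bar{x}_i = \dfrac{1}{n_i}\sum_{j=1}^{n_i} x_{ij}$,  $\bar{x} = \dfrac{1}{n}\sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}$,

                  $S_1 = \sum_{i=1}^{k} n_i(\bar{x}_i - \bar{x})^2$,  $S_2 = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \bar{x}_i)^2$.

   ( 3 ) Give the significance level $\alpha$.

   ( 4 ) From the F distribution table with (k - 1, n - k) degrees of freedom, find the confidence limit $F_\alpha$, which satisfies

                  $P(F > F_\alpha) = \alpha$.

   ( 5 ) Tabulate the data and calculate the statistic.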

Level     Test data x_ij                          n_i
A_1       x_11    x_12    ...    x_1n_1           n_1
A_2       x_21    x_22    ...    x_2n_2           n_2
...       ...                                     ...
A_k       x_k1    x_k2    ...    x_kn_k           n_k
Total                                             n = n_1 + n_2 + ... + n_k

  

 

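  ( 6 ) One-way ANOVA table

Source of variation   Sum of squares   Degrees of freedom   Mean square     Statistic                             Confidence limit
Between groups        S_1              k - 1                S_1/(k - 1)     F = [S_1/(k - 1)] / [S_2/(n - k)]     F_alpha
Within groups         S_2              n - k                S_2/(n - k)
Total                 S_1 + S_2        n - 1

Statistical inference: when $F \le F_\alpha$, accept $H_0$; when $F > F_\alpha$, reject $H_0$.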

 

 

 

 

  Explanation: 1° If the values $x_{ij}$ are large, choose a suitable constant c and use $x_{ij} - c$ in place of $x_{ij}$ in the calculation above; the result of the analysis is unchanged. 2° The between-group variation $S_1$ reflects the systematic differences caused by the different levels of factor A, while the within-group variation $S_2$ reflects the differences within groups caused by random factors. If the effects of the different levels $A_i$ are similar, the ratio of the between-group mean square to the within-group mean square is small, and $H_0$ can be considered to hold; if the effects of the different levels $A_i$ differ markedly, this ratio is large, and $H_0$ cannot be considered to hold.
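The one-way calculation can be sketched as below, assuming the standard decomposition into a between-group sum of squares $S_1$ and a within-group sum of squares $S_2$ with $F = [S_1/(k-1)]/[S_2/(n-k)]$. The function name and the three groups of data are hypothetical; 5.14 is the tabulated 5 % limit of the F distribution with (2, 6) degrees of freedom.

```python
def one_way_anova(groups):
    """Return (S1, S2, F, (df1, df2)) for a list of groups of observations."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of the group means around the grand mean.
    s1 = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of the data around their own group mean.
    s2 = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    f = (s1 / (k - 1)) / (s2 / (n - k))
    return s1, s2, f, (k - 1, n - k)

# Hypothetical data for k = 3 levels with 3 trials each.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
s1, s2, f, dof = one_way_anova(groups)

f_limit = 5.14   # tabulated F limit for (2, 6) degrees of freedom, alpha = 5 %
print(s1, s2, f, dof, "reject H0" if f > f_limit else "accept H0")
```

Subtracting a constant from every datum leaves `s1`, `s2` and `f` unchanged, which is the content of explanation 1°.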

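[ Two-way ANOVA ]   Consider the influence of two factors A and B. A is divided into l levels $A_1, A_2, \dots, A_l$ and B into m levels $B_1, B_2, \dots, B_m$. Under each combination $A_i \times B_j$ of the two factors (lm combinations in all), n trials are made, giving lmn data $x_{ijk}$ (i = 1, ..., l; j = 1, ..., m; k = 1, ..., n). Assuming that $x_{ijk}$ follows the normal distribution $N(\mu_{ij}, \sigma^2)$, test separately whether the effect of A, the effect of B, and the interaction $A \times B$ have a significant influence on the test results. The steps are as follows:

  ( 1 ) Hypothesis $H_0$: the corresponding effect (A, B, or $A \times B$) has no significant influence on the test results.

  ( 2 ) Select the statistics and determine their distributions:

       $F_A = \dfrac{S_A/(l - 1)}{S_e/(lm(n - 1))} \sim F(l - 1, lm(n - 1))$,
       $F_B = \dfrac{S_B/(m - 1)}{S_e/(lm(n - 1))} \sim F(m - 1, lm(n - 1))$,
       $F_{A \times B} = \dfrac{S_{A \times B}/((l - 1)(m - 1))}{S_e/(lm(n - 1))} \sim F((l - 1)(m - 1), lm(n - 1))$,

where $F_A$, $F_B$ and $F_{A \times B}$ measure the effect of factor A, the effect of factor B and the interaction of A and B, respectively, and

       $S_A = Q - P$,  $S_B = R - P$,  $S_{A \times B} = T - Q - R + P$,  $S_e = W - T$,

with

       $P = \dfrac{1}{lmn}\Big(\sum_{i}\sum_{j}\sum_{k} x_{ijk}\Big)^2$,  $Q = \dfrac{1}{mn}\sum_{i}\Big(\sum_{j}\sum_{k} x_{ijk}\Big)^2$,  $R = \dfrac{1}{ln}\sum_{j}\Big(\sum_{i}\sum_{k} x_{ijk}\Big)^2$,

       $T = \dfrac{1}{n}\sum_{i}\sum_{j}\Big(\sum_{k} x_{ijk}\Big)^2$,  $W = \sum_{i}\sum_{j}\sum_{k} x_{ijk}^2$.

( 3 ) Give the significance level $\alpha$.

( 4 ) Find the confidence limits $F_\alpha$. When the degrees of freedom are $(f_1, f_2)$, $F_\alpha$ satisfies

                  $P(F > F_\alpha) = \alpha$.

( 5 ) Tabulate the data and calculate the statistics (Table 1 and Table 2).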

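Table 1

A       B       Test results x_ijk (k = 1, 2, ..., n)
A_1     B_1     x_111    x_112    ...    x_11n
        B_2     x_121    x_122    ...    x_12n
        ...     ...
        B_m     x_1m1    x_1m2    ...    x_1mn
...     ...     ...
A_l     B_1     x_l11    x_l12    ...    x_l1n
        B_2     x_l21    x_l22    ...    x_l2n
        ...     ...
        B_m     x_lm1    x_lm2    ...    x_lmn

Table 2

         B_1      B_2      ...      B_m
A_1      x_11     x_12     ...      x_1m
A_2      x_21     x_22     ...      x_2m
...      ...      ...      ...      ...
A_l      x_l1     x_l2     ...      x_lm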

     

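( 6 ) Two-way ANOVA table

Source of variation      Sum of squares               Degrees of freedom    Mean square                     Statistic    Confidence limit
Effect of A              S_A = Q - P                  l - 1                 S_A/(l - 1)                     F_A          F_alpha
Effect of B              S_B = R - P                  m - 1                 S_B/(m - 1)                     F_B          F_alpha
Interaction A x B        S_AxB = T - Q - R + P        (l - 1)(m - 1)        S_AxB/((l - 1)(m - 1))          F_AxB        F_alpha
Random effect (error)    S_e = W - T                  lm(n - 1)             S_e/(lm(n - 1))
Total sum of squares     S = S_A + S_B + S_AxB + S_e  lmn - 1

Statistical inference: when $F \le F_\alpha$, accept $H_0$; when $F > F_\alpha$, reject $H_0$.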

 

 

 

 

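When the interaction of the two factors A and B is not significant, $S_{A \times B}$ and $S_e$ are pooled. In this case, if only one trial is made under each condition $A_i \times B_j$ (i.e. n = 1) and the measured data are $x_{ij}$, write

       $P = \dfrac{1}{lm}\Big(\sum_{i}\sum_{j} x_{ij}\Big)^2$,  $Q = \dfrac{1}{m}\sum_{i}\Big(\sum_{j} x_{ij}\Big)^2$,  $R = \dfrac{1}{l}\sum_{j}\Big(\sum_{i} x_{ij}\Big)^2$,  $W = \sum_{i}\sum_{j} x_{ij}^2$;

then

       $S_A = Q - P$,  $S_B = R - P$,  $S_e = W - Q - R + P$.

The statistics for factor A and factor B and their distributions are then

       $F_A = \dfrac{S_A/(l - 1)}{S_e/((l - 1)(m - 1))} \sim F(l - 1, (l - 1)(m - 1))$,  $F_B = \dfrac{S_B/(m - 1)}{S_e/((l - 1)(m - 1))} \sim F(m - 1, (l - 1)(m - 1))$.

The calculation procedure and the analysis of variance table are the same as before.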
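The two-way calculation with n repeated trials per combination can be sketched as follows. The sketch assumes the standard definitions of the auxiliary sums (P from the grand total, Q from the totals per level of A, R from the totals per level of B, T from the cell totals, W from the squared data), which reproduce the relations $S_A = Q - P$, $S_B = R - P$, $S_{A \times B} = T - Q - R + P$ and $S_e = W - T$ of the ANOVA table. The function name and the data are hypothetical.

```python
def two_way_anova(x):
    """x[i][j] is the list of n results under (A_i, B_j).

    Returns ((S_A, S_B, S_AB, S_e), degrees of freedom, (F_A, F_B, F_AB)).
    """
    l, m, n = len(x), len(x[0]), len(x[0][0])
    cell = [[sum(x[i][j]) for j in range(m)] for i in range(l)]    # cell totals
    row = [sum(cell[i]) for i in range(l)]                         # totals per A_i
    col = [sum(cell[i][j] for i in range(l)) for j in range(m)]    # totals per B_j
    P = sum(row) ** 2 / (l * m * n)
    Q = sum(r * r for r in row) / (m * n)
    R = sum(c * c for c in col) / (l * n)
    T = sum(c * c for r in cell for c in r) / n
    W = sum(v * v for r in x for c in r for v in c)
    ss = (Q - P, R - P, T - Q - R + P, W - T)
    dof = (l - 1, m - 1, (l - 1) * (m - 1), l * m * (n - 1))
    ms_error = ss[3] / dof[3]
    f = tuple((s / d) / ms_error for s, d in zip(ss[:3], dof[:3]))
    return ss, dof, f

# Hypothetical data: l = 2 levels of A, m = 2 levels of B, n = 2 trials per cell.
x = [[[1, 2], [3, 4]],
     [[5, 6], [7, 9]]]
ss, dof, f = two_way_anova(x)
print(ss, dof, f)
```

Each F value is then compared with the tabulated limit $F_\alpha$ for its own pair of degrees of freedom.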

[ Analysis of variance by systematic grouping ]   Surveys often group the data systematically. For example, when a county is surveyed, several communes are selected, several brigades are selected within each commune, and several production teams are selected within each brigade. This approach is called systematic grouping.

   ANOVA for systematic grouping differs from multi-factor ANOVA. In two-way ANOVA, for example, the factors A and B are on an equal footing, but in ANOVA for systematic grouping they are not: the data are first grouped by factor A into $A_1, A_2, \dots, A_l$, and then within each group $A_i$ they are grouped by factor B into $B_{i1}, B_{i2}, \dots, B_{im}$. The method of analysis, however, is similar.

   Suppose n trials are made under each condition of factor $A_i$ and factor $B_{ij}$, and the test data are $x_{ijk}$ (i = 1, ..., l; j = 1, ..., m; k = 1, ..., n). The steps are as follows:

  ( 1 ) Hypothesis $H_0$: the effect of factor A (or B) is not significant.

  ( 2 ) Select the statistics $F_{AB}$ and $F_B$, which measure the significance of the influence of factor A and factor B, respectively. They are formed from the sums of squares $S_A = Q - P$, $S_B = T - Q$ and $S_e = W - T$, where

       $P = \dfrac{1}{lmn}\Big(\sum_{i}\sum_{j}\sum_{k} x_{ijk}\Big)^2$,  $Q = \dfrac{1}{mn}\sum_{i}\Big(\sum_{j}\sum_{k} x_{ijk}\Big)^2$,  $T = \dfrac{1}{n}\sum_{i}\sum_{j}\Big(\sum_{k} x_{ijk}\Big)^2$,  $W = \sum_{i}\sum_{j}\sum_{k} x_{ijk}^2$.

  ( 3 ) Give the significance level $\alpha$.

   ( 4 ) Find the confidence limits $F_\alpha$ from the F distribution table for the corresponding degrees of freedom, so that $P(F > F_\alpha) = \alpha$.

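   ( 5 ) Tabulate the data and calculate the statistics.

A       B        Test results x_ij(k) (k = 1, 2, ..., n)
A_1     B_11     x_11(1)    x_11(2)    ...    x_11(n)
        B_12     x_12(1)    x_12(2)    ...    x_12(n)
        ...      ...
        B_1m     x_1m(1)    x_1m(2)    ...    x_1m(n)
...     ...      ...
A_l     B_l1     x_l1(1)    x_l1(2)    ...    x_l1(n)
        B_l2     x_l2(1)    x_l2(2)    ...    x_l2(n)
        ...      ...
        B_lm     x_lm(1)    x_lm(2)    ...    x_lm(n)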

                                                             

 

( 6 ) ANOVA table for systematic grouping

Source of variation      Sum of squares          Degrees of freedom    Mean square          Statistic    Confidence limit
Effect of A              S_A = Q - P             l - 1                 S_A/(l - 1)          F_AB         F_alpha
Effect of B              S_B = T - Q             l(m - 1)              S_B/(l(m - 1))       F_B          F_alpha
Random effect (error)    S_e = W - T             lm(n - 1)             S_e/(lm(n - 1))
Total sum of squares     S = S_A + S_B + S_e     lmn - 1

Statistical inference: when $F \le F_\alpha$, accept $H_0$ and consider the effect of the corresponding factor not significant; when $F > F_\alpha$, reject $H_0$ and consider the effect of the corresponding factor significant.

 

 

 

 

 

 
