AN EXERCISE IN STATISTICS

By James Mariner


LEARNING OBJECTIVES

  1. Determine the variance and standard deviation for a set of quantitative data

  2. Use the t-test to determine the probability that two sets of data belong to the same population

BACKGROUND

  1. Definitions

    • Mean (x̄) -- arithmetic average

    • Normal distribution -- a population with a bell-shaped distribution about its mean

    • Variance (s²) -- a measure of the degree of variation of a set of values from the mean of those values, found by summing the squares of the deviations from the mean and dividing by one less than the number of values in the sample (n):

      s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}        (1)
    • Standard deviation (s) -- also a measure of the degree of variation of a set of values from the mean of those values; it is associated with the interpretation of the normal curve and is found by taking the square root of the variance (equation 1, above).

      Figure 1 shows that about 68% of the values in a normal population lie within plus or minus one standard deviation of the mean. About 95% of the values fall within plus or minus two standard deviations, and nearly all values (over 99%) fall within plus or minus three standard deviations of the mean.




      Figure 1. A normal distribution falling within three standard deviations of the mean

    • Standard error of the mean (s_x̄) -- the standard deviation of the means of several random samples taken from a population; found by dividing the standard deviation by the square root of the number in the sample:

      s_{\bar{x}} = \frac{s}{\sqrt{n}}        (2)

    • Null hypothesis (H0) -- the assumption that there is no real difference between the populations being compared, so that any differences observed between the samples are the result of chance alone. Accepting the null hypothesis means concluding that chance alone can account for the variations observed. The outcome of the test is stated as a probability (p): the probability of obtaining differences at least as large as those observed if the null hypothesis were true. The null hypothesis is usually rejected if p < 0.05.
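    Equations (1) and (2) can be checked with a short Python sketch. The function names are illustrative and do not belong to any spreadsheet or library. The five lung volumes are made-up values, chosen so the arithmetic is easy to follow by hand.

```python
import math


def mean(values):
    """Arithmetic average of a list of numbers."""
    return sum(values) / len(values)


def sample_variance(values):
    """Equation (1): sum of squared deviations divided by (n - 1)."""
    m = mean(values)
    squared_deviations = [(x - m) ** 2 for x in values]
    return sum(squared_deviations) / (len(values) - 1)


def standard_deviation(values):
    """Square root of the variance from equation (1)."""
    return math.sqrt(sample_variance(values))


def standard_error(values):
    """Equation (2): standard deviation divided by the square root of n."""
    return standard_deviation(values) / math.sqrt(len(values))


# Made-up lung volumes (L), used only to show the arithmetic.
volumes = [4.0, 4.5, 5.0, 5.5, 6.0]
print(mean(volumes))             # 5.0
print(sample_variance(volumes))  # 0.625
print(standard_deviation(volumes))
print(standard_error(volumes))
```

    The squared deviations here are 1, 0.25, 0, 0.25, and 1. Their sum, 2.5, divided by n − 1 = 4 gives the variance of 0.625.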


  2. The t-test
    The t-test is a valid statistical technique for random samples of continuous variables from normally distributed populations. It compares the means of two small samples. It gives the probability of obtaining a difference between the sample means at least as large as the one observed if both samples came from a single population (the null hypothesis). A small probability suggests that the samples represent different populations. We will use the following formula for determining the value of t:
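    t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}        (3)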

    where x̄1 = mean of sample 1, x̄2 = mean of sample 2, n1 = the number in sample 1, n2 = the number in sample 2, s1² = the variance of sample 1, and s2² = the variance of sample 2. If the sample sizes are equal (n1 = n2 = n), formula (3) simplifies to:

    t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2 + s_2^2}{n}}}        (4)

    Note that t can be either positive or negative, depending on which mean is subtracted from which. The sign is not important because the t distribution is symmetrical and we are concerned only with the size of the difference between the means. Thus, we use only the absolute value of t.

    Having determined the value of t, refer to the Distribution of t Probability (Table 1, below). The table lists p values along the top and the degrees of freedom (df) along the left margin. The degrees of freedom is the number of values in a sample that are free to vary once the mean is known. For a single sample, it is one less than the number of values in that sample. For the t-test comparing two samples, it is two less than the total number in both samples (df = n1 + n2 − 2).
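    A minimal Python sketch of the t calculation follows. It assumes that formula (3) is the pooled-variance form, t = (x̄1 − x̄2) / √{[(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2) × (1/n1 + 1/n2)}, which matches the degrees of freedom given above. The function names and summary values are hypothetical. For equal sample sizes, the pooled form gives the same value as the shortcut formula (4), t = (x̄1 − x̄2) / √[(s1² + s2²)/n].

```python
import math


def t_pooled(mean1, var1, n1, mean2, var2, n2):
    """Assumed formula (3): t for two samples, pooling the two variances."""
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))


def t_equal_n(mean1, var1, mean2, var2, n):
    """Formula (4): the same statistic when n1 = n2 = n."""
    return (mean1 - mean2) / math.sqrt((var1 + var2) / n)


def degrees_of_freedom(n1, n2):
    """Two less than the total number of values in both samples."""
    return n1 + n2 - 2


# Hypothetical summary values (means, variances, sample sizes).
t3 = t_pooled(5.0, 0.625, 5, 4.0, 0.625, 5)
t4 = t_equal_n(5.0, 0.625, 4.0, 0.625, 5)
# Only the magnitude of t is compared with the table.
print(abs(t3), abs(t4))
print(degrees_of_freedom(5, 5))  # 8
```

    With unequal sample sizes, only formula (3) applies. The shortcut (4) relies on n1 = n2.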

    Criteria for using the t-test:

    1. Samples must be chosen randomly.

    2. Samples must have the characteristics of a "normal" distribution.

    3. Measurements must be of continuous variables.


    Procedure for t-test

    1. Arrange data in a table with the following headings:

      1. specimen label (number, letter, name, etc.)

      2. measurement of specimen (height, weight, etc.) -- as many as necessary

      3. deviations from the mean for each measurement

      4. squares of deviations from the mean for each measurement

    2. Determine the mean value (x̄) for each measurement heading

    3. State the null hypothesis (H0)

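    4. Sum the deviations from the mean (this sum should be zero, or very nearly zero because of rounding, which makes it a useful check on your arithmetic)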

    5. Sum the squares of the deviations from the mean

    6. Compute the variances (formula #1)

    7. Determine which formula (#3 or #4) for t is applicable and use it to calculate the value of t

    8. Determine the number of degrees of freedom (df)

    9. Refer to Table 1 and determine the probability (p) that a difference as large as the one observed would arise by chance if both samples came from the same population

    10. Accept or reject the null hypothesis on the basis of the probability (p); H0 is usually rejected if p < 0.05

    11. State a conclusion in terms of the results of the experiment.
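    The eleven steps can be condensed into the Python sketch below. It illustrates the steps under stated assumptions and is not a substitute for working through the table by hand:

    • It always uses the pooled form of formula (3), which reduces to formula (4) when n1 = n2.
    • It copies the p = 0.05 column of Table 1 into a dictionary.
    • When the exact df is not listed, it falls back to the next smaller listed df, a common conservative choice.

    The names `T_CRIT_05`, `deviation_table`, `critical_t`, and `t_test` are hypothetical, and the lung volumes are invented.

```python
import math

# Critical |t| values for p = 0.05 (two-tailed), taken from Table 1.
T_CRIT_05 = {
    1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
    6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228,
    11: 2.201, 12: 2.179, 13: 2.160, 14: 2.145, 15: 2.131,
    16: 2.120, 17: 2.110, 18: 2.101, 19: 2.093, 20: 2.086,
    21: 2.080, 22: 2.074, 23: 2.069, 24: 2.064, 25: 2.060,
    26: 2.056, 27: 2.052, 28: 2.048, 29: 2.045, 30: 2.042,
    40: 2.021, 60: 2.000, 120: 1.980,
}


def deviation_table(values):
    """Steps 1, 2, 4 and 5: the mean and each squared deviation."""
    mean = sum(values) / len(values)
    deviations = [x - mean for x in values]
    squares = [d ** 2 for d in deviations]
    # Step 4 check: deviations sum to zero, apart from rounding.
    if not math.isclose(sum(deviations), 0.0, abs_tol=1e-9):
        raise ValueError("deviations do not sum to zero; check the mean")
    return mean, squares


def critical_t(df):
    """Step 9: use the largest tabulated df not exceeding the actual df."""
    return T_CRIT_05[max(d for d in T_CRIT_05 if d <= df)]


def t_test(sample1, sample2):
    """Steps 6-10 for two samples; returns (t, df, whether H0 is rejected)."""
    mean1, sq1 = deviation_table(sample1)
    mean2, sq2 = deviation_table(sample2)
    n1, n2 = len(sample1), len(sample2)
    var1 = sum(sq1) / (n1 - 1)  # step 6, equation (1)
    var2 = sum(sq2) / (n2 - 1)
    df = n1 + n2 - 2  # step 8
    pooled = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    t = (mean1 - mean2) / math.sqrt(pooled * (1 / n1 + 1 / n2))  # step 7
    # Step 10: reject H0 when |t| exceeds the tabulated value.
    return t, df, abs(t) > critical_t(df)


# Invented lung volumes (L), used only to exercise the code.
group_a = [5.1, 4.8, 5.6, 5.0, 5.4]
group_b = [4.2, 4.5, 4.0, 4.6, 4.3]
t, df, reject_h0 = t_test(group_a, group_b)
print(f"t = {t:.2f}, df = {df}, reject H0 at p < 0.05: {reject_h0}")
```

    Step 11, stating a conclusion in terms of the experiment, is left to the investigator.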

  3. Lung Volume Investigation

    1. Prepare the spirometer for collecting lung volume data, using a disposable mouthpiece for each subject tested. Have metersticks and a bathroom scale ready for measuring height and weight as well.

    2. Collect the following data for at least 50 students: gender, age, height, weight, and lung volume. Record the data for each subject in a spreadsheet. DO NOT collect names with these data; they are irrelevant. Make sure each subject is a willing participant by explaining exactly what information will be obtained and that it will be used anonymously. It will be most convenient if age is recorded in months and the metric system is used for height (cm), weight (kg), and lung volume (L). If height is measured in feet and inches, convert it to total inches and then multiply by 2.54 cm/in to obtain centimeters. Convert pounds to kilograms by dividing by 2.2 lb/kg.


    Table 1. Distribution of t Probability (two-tailed)

    df \ p        0.10      0.05      0.01     0.001
    1            6.314    12.706    63.657   636.619
    2            2.920     4.303     9.925    31.598
    3            2.353     3.182     5.841    12.941
    4            2.132     2.776     4.604     8.610
    5            2.015     2.571     4.032     6.859
    6            1.943     2.447     3.707     5.959
    7            1.895     2.365     3.499     5.405
    8            1.860     2.306     3.355     5.041
    9            1.833     2.262     3.250     4.781
    10           1.812     2.228     3.169     4.587
    11           1.796     2.201     3.106     4.437
    12           1.782     2.179     3.055     4.318
    13           1.771     2.160     3.012     4.221
    14           1.761     2.145     2.977     4.140
    15           1.753     2.131     2.947     4.073
    16           1.746     2.120     2.921     4.015
    17           1.740     2.110     2.898     3.965
    18           1.734     2.101     2.878     3.922
    19           1.729     2.093     2.861     3.883
    20           1.725     2.086     2.845     3.850
    21           1.721     2.080     2.831     3.818
    22           1.717     2.074     2.819     3.792
    23           1.714     2.069     2.807     3.767
    24           1.711     2.064     2.797     3.745
    25           1.708     2.060     2.787     3.725
    26           1.706     2.056     2.779     3.707
    27           1.703     2.052     2.771     3.690
    28           1.701     2.048     2.763     3.674
    29           1.699     2.045     2.756     3.659
    30           1.697     2.042     2.750     3.646
    40           1.684     2.021     2.704     3.551
    60           1.671     2.000     2.660     3.460
    120          1.658     1.980     2.617     3.373
    ∞            1.645     1.960     2.576     3.291

    Accept H0 if |t| is smaller than the tabulated value at p = 0.05 for your df; reject H0 if it is larger.



    3. Use the organizing functions in the spreadsheet to compare two populations with respect to their lung volume. These populations could be established on the basis of male v. female students, younger v. older students, shorter v. taller students, or lighter v. heavier students. Where the division between groups is not obvious (as it is for gender), subjects can be grouped according to whether their age, height, or weight falls below or above the mean or median for all the data in that category. Each lab group could also compare lung volumes of two populations defined by a different criterion.

    4. Write a formal report using the prescribed report format regarding the Relationship Between (the selected criterion) and Lung Volume for (the population). Your paper should address the question of whether there is a significant difference in lung volume between the two populations.
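    The unit conversions and the below/above-median grouping described above can be sketched in Python as follows. The sketch assumes each subject is stored as a dictionary with hypothetical keys such as `height_cm` and `lung_volume_l`. The conversion factors are the ones given in the data-collection step, and the subjects are invented. The two lists it returns hold the lung volumes of the two populations, ready for the t-test described in the Background section.

```python
import statistics


def height_to_cm(feet, inches=0.0):
    """Convert feet and inches to total inches, then multiply by 2.54 cm/in."""
    return (feet * 12 + inches) * 2.54


def pounds_to_kg(pounds):
    """Divide by the 2.2 lb/kg factor given in the activity."""
    return pounds / 2.2


def age_in_months(years, months=0):
    """Express an age given in years and months as total months."""
    return years * 12 + months


def split_at_median(subjects, criterion, measure="lung_volume_l"):
    """Return the `measure` values for subjects below and above the median.

    Subjects exactly at the median are left out so the two groups do not
    overlap; this is one possible convention, not one the activity specifies.
    """
    cutoff = statistics.median(s[criterion] for s in subjects)
    below = [s[measure] for s in subjects if s[criterion] < cutoff]
    above = [s[measure] for s in subjects if s[criterion] > cutoff]
    return below, above


# Invented subjects with illustrative values only.
subjects = [
    {"height_cm": 172.7, "lung_volume_l": 5.1},
    {"height_cm": 160.3, "lung_volume_l": 4.8},
    {"height_cm": height_to_cm(5, 4), "lung_volume_l": 4.1},
    {"height_cm": height_to_cm(6, 1), "lung_volume_l": 5.6},
    {"height_cm": 168.0, "lung_volume_l": 4.9},
]
shorter, taller = split_at_median(subjects, "height_cm")
print(shorter, taller)  # [4.8, 4.1] [5.1, 5.6]
```

    Splitting at the mean instead would only change how `cutoff` is computed (for example, with `statistics.mean`).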

    Procedure for entering data in a computer spreadsheet

    1. Launch ClarisWorks and open a new spreadsheet.

    2. In Row 1 of the spreadsheet, label the columns with headings for the data you are collecting for each subject tested (e.g., subject number, gender, age, height, weight, lung volume). Figure 2 shows that column widths can be changed to fit these headings. It also shows that units, listed in Row 2, are easiest to interpret if they are metric. You can also set the number of decimal places used for data and calculations in each cell.

      Figure 2. Sample set-up for data in a spreadsheet with some sample data entered.

           A         B        C       D        E        F
      1    Subject   Gender   Age     Height   Weight   Lung Volume
      2              M/F      (mo)    (cm)     (kg)     (L)
      3    1         M        197     172.7    70.5     5.1
      4    2         F        199     160.3    52.3     4.8
      5    3
      6    4
      7    5
      8    6
      9    7
      10   8
      11   9
      12   10
      13   11
      14   12

    3. Beginning in Row 3, enter the data for each subject. Enter values without units (the units are given in the heading for each column). It is a good idea to save the spreadsheet after entering the data for each subject.

    4. When all of the data are entered, they may be sorted by any of the criteria represented (age, gender, height, weight, etc.). CAUTION: When sorting, you must select all of the data in the spreadsheet. If you select only the column you want to serve as the basis for the sort, only that column will be sorted, and its values will no longer be aligned with the appropriate subjects.

    5. At the bottom of the spreadsheet, you can designate cells to summarize the data for you. For example, you can show the number of data items in each population (n1 and n2), or the sum or average of each column or selected data range. To do so, enter a formula from the Paste Function submenu (Edit menu). The variance and standard deviation of selected ranges can also be calculated using the Paste Function submenu. The values calculated by the spreadsheet can then be used to calculate the value of t for comparing the two populations, as described earlier.

    6. You can also use the spreadsheet to calculate the value of "t" by entering the appropriate formula using cell positions to identify the variables in the formula. Your teacher should be able to help you in this endeavor.
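    As a rough stand-in for the spreadsheet (not a description of ClarisWorks itself), the Python sketch below keeps each subject's row together when sorting. It also computes the summary values mentioned in step 5. It uses the standard `statistics` module, whose `variance` and `stdev` functions divide by n − 1 as equation (1) does. If you use spreadsheet functions instead, check which divisor they use, since many programs offer both an n and an n − 1 version. Subjects 3 and 4 are hypothetical.

```python
import statistics

# Each tuple is one spreadsheet row: subject, gender, age (mo), height (cm),
# weight (kg), lung volume (L). Subjects 1 and 2 copy the Figure 2 sample rows;
# subjects 3 and 4 are hypothetical values added for illustration.
rows = [
    (1, "M", 197, 172.7, 70.5, 5.1),
    (2, "F", 199, 160.3, 52.3, 4.8),
    (3, "F", 190, 165.0, 58.0, 4.2),
    (4, "M", 205, 180.3, 75.2, 5.9),
]

# Sorting whole rows keeps every value attached to its subject.
by_gender = sorted(rows, key=lambda row: row[1])

# Sorting one column on its own loses that link: the sorted volumes
# no longer say which subject each value belongs to.
volumes_only = sorted(row[5] for row in rows)


def summarize(volumes):
    """Summary values like those step 5 places at the bottom of the sheet."""
    return {
        "n": len(volumes),
        "sum": sum(volumes),
        "mean": statistics.mean(volumes),
        "variance": statistics.variance(volumes),  # divides by n - 1, as in (1)
        "stdev": statistics.stdev(volumes),
    }


female = summarize([row[5] for row in by_gender if row[1] == "F"])
male = summarize([row[5] for row in by_gender if row[1] == "M"])
print([row[0] for row in by_gender])  # [2, 3, 1, 4]
print(female["n"], male["n"])  # 2 2
```

    The n, mean, and variance for each group can then be substituted into formula (3) to obtain t. Formula (4) can be used instead when the groups are the same size.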
