Correlation Coefficient
(A Worked Problem:  The procedure is explained in more detail in Chapter 8)


Ten statistics students have taken the first exam.  The table below lists each student's score on that exam and the student's current homework score.  The maximum possible score on the test was 100 points, and the maximum possible score on homework is 50 points.

Compute the correlation coefficient between the test scores and the homework scores.

  Student      Test Score (X)      Homework Score (Y)
  Robert             61                    35
  Thomas             95                    50
  Mark               44                     5
  Wanda              93                    50
  Judy               63                    15
  Haydn              80                    34
  Barbara            62                    16
  Karen              95                    50
  Marilyn            65                     7
  Phil               88                    38


Note:  As you know from reading the book, there are two methods that you can use to compute the correlation coefficient.  One method uses the covariance; the other, the computational method, uses sums that are easily calculated on a scientific calculator.  Because the computational formula is usually preferred and is usually faster, it is the method used here.

Hint:  Computing the correlation coefficient from scratch takes a lot of time.  I suggest that you budget 45 minutes to an hour for each problem.  This should give you enough time to carefully compute and double-check all the sums and formulas.

I.  The computational formula for the Correlation Coefficient is:

    \[ r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{[\,n\sum X^2 - (\sum X)^2\,]\,[\,n\sum Y^2 - (\sum Y)^2\,]}} \]
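
For readers who want to check their calculator work on a computer, the computational formula can be sketched as a small Python function. The function name pearson_r_from_sums and the three-point data set are made up for illustration; they are not from the book.

```python
import math


def pearson_r_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Computational formula for the correlation coefficient r."""
    numerator = n * sum_xy - sum_x * sum_y
    x_part = n * sum_x2 - sum_x ** 2   # n * (sum of X squared) - (sum of X) squared
    y_part = n * sum_y2 - sum_y ** 2   # n * (sum of Y squared) - (sum of Y) squared
    return numerator / math.sqrt(x_part * y_part)


# Tiny made-up check: every Y is exactly 2 * X, so r should come out as 1.
xs = [1, 2, 3]
ys = [2, 4, 6]
r_check = pearson_r_from_sums(
    n=len(xs),
    sum_x=sum(xs),
    sum_y=sum(ys),
    sum_x2=sum(x * x for x in xs),
    sum_y2=sum(y * y for y in ys),
    sum_xy=sum(x * y for x, y in zip(xs, ys)),
)
print(r_check)
```

The function takes the sums rather than the raw scores so that it mirrors the hand calculation: the sums are computed first and then substituted.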

II.  To use this formula, we must first compute several sums

           X       X²      Y       Y²      X·Y
          61     3721     35     1225     2135
          95     9025     50     2500     4750
          44     1936      5       25      220
          93     8649     50     2500     4650
          63     3969     15      225      945
          80     6400     34     1156     2720
          62     3844     16      256      992
          95     9025     50     2500     4750
          65     4225      7       49      455
          88     7744     38     1444     3344
  Sum    746    58538    300    11880    24961

    A.  The sum of the X's:  add all the X's.

                \[ \sum X = 746 \]

    B.  The sum of the Y's:  add all the Y's.

                \[ \sum Y = 300 \]

    C.  The sum of the X²:  square each X, then add all the squared X's.

                \[ \sum X^2 = 58538 \]

Hint:     Remember to square each score first, then add all the squared scores.
            If you square the sum of the X's instead, your answer will be way off.

    D.  The sum of the Y²:  square each Y, then add all the squared Y's.

                \[ \sum Y^2 = 11880 \]

Hint:     Remember to square each score first, then add all the squared scores.
            If you square the sum of the Y's instead, your answer will be way off.

    E.  The sum of X times Y:  multiply each X by its paired Y, then add all the products.

                \[ \sum XY = 24961 \]

Hint:     Remember to keep the pairs together.
            If you break up or rearrange the pairs, your answer will be wrong.

    F.  We also need n, the number of pairs of scores:
               n = 10
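
The sums in steps A through F can be double-checked with a short Python sketch. The variable names are chosen here for illustration; the data are the ten pairs from the table at the top of the page.

```python
# Paired (test score, homework score) data from the problem statement.
pairs = [
    (61, 35), (95, 50), (44, 5), (93, 50), (63, 15),
    (80, 34), (62, 16), (95, 50), (65, 7), (88, 38),
]

n = len(pairs)                           # number of pairs of scores
sum_x = sum(x for x, _ in pairs)         # add all the X's
sum_y = sum(y for _, y in pairs)         # add all the Y's
sum_x2 = sum(x * x for x, _ in pairs)    # square each X, then add
sum_y2 = sum(y * y for _, y in pairs)    # square each Y, then add
sum_xy = sum(x * y for x, y in pairs)    # each X times its own paired Y, then add

print(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy)

# The mistake the hints warn about: squaring the sum gives a different number.
print(sum_x ** 2, "is not the same as", sum_x2)
```

Storing each student's scores as one tuple keeps the pairs together, which is the point of the hint in step E.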

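III.  All that is left is to substitute the sums into the correlation formula and then compute r

    \[ r = \frac{10(24961) - (746)(300)}{\sqrt{[\,10(58538) - (746)^2\,]\,[\,10(11880) - (300)^2\,]}} = \frac{25810}{\sqrt{(28864)(28800)}} \approx \frac{25810}{28832} \approx .895 \]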
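
As a rough check on the substitution, the same arithmetic can be done in Python one piece at a time. This is only a sketch; the names numerator, x_part, and y_part are labels chosen here for the three parts of the formula.

```python
import math

# Sums from Section II.
n, sum_x, sum_y = 10, 746, 300
sum_x2, sum_y2, sum_xy = 58538, 11880, 24961

numerator = n * sum_xy - sum_x * sum_y   # top of the fraction
x_part = n * sum_x2 - sum_x ** 2         # first bracket under the square root
y_part = n * sum_y2 - sum_y ** 2         # second bracket under the square root
r = numerator / math.sqrt(x_part * y_part)

print(numerator, x_part, y_part)
print(round(r, 3))
```

Printing the intermediate pieces makes it easier to find where a hand calculation went wrong if the final r does not match.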

IV.  Next we need to determine the significance of r by finding the Critical Value in Table R

    A.  Compute the degrees of freedom (df) for the problem

            df = n - 2 = 10 - 2 = 8

    B.  Using df = 8 and the .05 column, we can read the Critical Value (CV) from Table R

            CV = .6215

    C.  Determining the significance of r

        a.  If the computed value of r is ≥ CV, then r is significant.

        b.  If the computed value of r is < CV, then r is not significant.

        c.  Because .895 > .6215, the correlation coefficient is significant.

    D.  There is a significant positive relationship between Test Scores and Homework Scores.
            This means that as test scores go up, homework scores tend to go up, and vice versa.
            Therefore, we can use homework scores to predict test scores, or vice versa.

Hint:     This does not mean that homework scores cause test scores. The correlation coefficient
            does not determine causality.  Only an experiment can determine causality.
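
The decision rule in steps A through C of this section can be sketched in Python as follows. The critical value is simply typed in from Table R; nothing here computes it. Taking the absolute value of r is an assumption added here so that a strong negative correlation would also be flagged; the rule as stated above compares r itself with CV.

```python
n = 10
r = 0.895                 # computed in Section III
critical_value = 0.6215   # read from Table R (df = 8, .05 column), not computed

df = n - 2                # degrees of freedom for a correlation

# abs() is an added assumption; for this problem r is positive, so it changes nothing.
is_significant = abs(r) >= critical_value

print(df, is_significant)
```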

V.  The coefficient of determination:  r²

    A.  r² is the proportion of the variance in one variable that can be explained by
         the variance in the other variable.

    B.  r² = (.895)² = .801

    C.  .801 is the proportion of the variability of test scores that can be explained by
          the variability in homework scores.

Hint:    Remember that this does not mean that part of the variability of X is caused by Y.
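
The coefficient of determination is a one-line calculation once r is known. A minimal Python sketch, using the rounded r of .895 from Section III:

```python
r = 0.895                       # rounded r from Section III
r_squared = round(r ** 2, 3)    # coefficient of determination

print(r_squared)
print(f"{r_squared:.1%} of the variability is shared between the two sets of scores")
```

Squaring the unrounded r instead of .895 can change the last digit or two, so decide on a rounding rule and apply it consistently.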

Copyright © 2004 by Mark W. Vernoy