Another service from Omega

Predicting a Percentile Rank from r only


*****

When the scatter plot is football shaped, the 5-number summary of the data (the average and SD of x, the average and SD of y, and the correlation coefficient) contains all the information necessary for a standard regression analysis. But to estimate the percentile rank on one variable from the percentile rank on the other, the correlation coefficient alone is enough. Here is a typical example.


Problem:

In a study of two variables (x and y), the correlation coefficient is -0.6. Estimate the percentile rank for y when x is at the 35th percentile.


Solution:

  1. Transform the percentile rank for x into standard units (sus) by using the normal table.
  2. Multiply x in sus by the correlation coefficient to obtain the predicted y in sus.
  3. Use the normal table again to go from y in sus back to a percentile rank.
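Taken together, the three steps amount to: percentile of y ≈ 100·Φ(r·Φ⁻¹(p/100)). Here p is the percentile rank of x and Φ is the standard normal curve's cumulative area. The sketch below assumes a football-shaped scatter plot, so both variables follow the normal curve. It replaces the table lookups with Python's `statistics.NormalDist` (Python 3.8+). The function name `predict_percentile` is hypothetical and only for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # the normal curve in standard units: mean 0, SD 1


def predict_percentile(x_percentile: float, r: float) -> float:
    """Predicted percentile rank of y from the percentile rank of x and r only.

    Assumes a football-shaped scatter plot (both variables follow the normal curve).
    """
    # Step 1: percentile rank of x -> x in standard units
    x_su = STD_NORMAL.inv_cdf(x_percentile / 100)
    # Step 2: regression in standard units: predicted y = r * x
    y_su = r * x_su
    # Step 3: y in standard units -> percentile rank (area to the left)
    return 100 * STD_NORMAL.cdf(y_su)


print(round(predict_percentile(35, -0.6), 1))  # 59.1
```

Using the exact normal curve instead of a printed table removes the small rounding errors from reading the table by hand.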


Before working out the numbers, it is useful to think about what a reasonable answer should look like. We can think about it this way:
If a number is at the 35th percentile of a list that follows the normal curve, then that number is below the average of the list. When the correlation coefficient is negative (-0.6 in this case), a point whose x-coordinate is below average is expected to have a y-coordinate above average. But because of the *regression towards the mean effect*, its y percentile rank will be closer to the 50th percentile than its x percentile rank is. Thus the y percentile rank is expected to be greater than the 50th but less than the 65th (which is what we would estimate for a point on the SD line).
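The bracketing argument can be written down before any table lookup. The SD line has slope +1 in standard units when r > 0 and slope -1 when r < 0. So a point on it keeps x's percentile rank (r > 0) or gets its mirror image, 100 minus that rank (r < 0). Regression then pulls the prediction from there toward the 50th percentile. The helpers `expected_bounds` and `regression_percentile` below are hypothetical names for this sketch. The exact prediction again assumes the normal curve.

```python
from statistics import NormalDist


def expected_bounds(x_percentile, r):
    """Rough bracket (low, high) for the predicted y percentile rank."""
    if r == 0:
        return (50.0, 50.0)  # no association: predict the average of y
    # SD line: same percentile rank (r > 0) or its mirror image (r < 0)
    sd_line = x_percentile if r > 0 else 100 - x_percentile
    low, high = sorted((50.0, float(sd_line)))
    return (low, high)


def regression_percentile(x_percentile, r):
    """Exact regression prediction, assuming the normal curve."""
    z = NormalDist()
    return 100 * z.cdf(r * z.inv_cdf(x_percentile / 100))


low, high = expected_bounds(35, -0.6)
pred = regression_percentile(35, -0.6)
print(low, high)  # 50.0 65.0
assert low <= pred <= high  # regression to the mean keeps it inside the bracket
```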



Now the numbers:


The 35th percentile is the number such that 35% of the entries in the list are below it. Thus, in standard units this number is estimated by -z, where the area under the normal curve to the left of -z is 35%. By symmetry, 35% of the area also lies to the right of z, so the area between -z and z is:

> area1 := (100. - 2*35);

                                 area1 := 30.
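The same first step can be checked with the exact normal curve instead of the table. This is a sketch using Python's `statistics.NormalDist`. The variable names are illustrative, and `area_between` plays the role of `area1` above.

```python
from statistics import NormalDist

std_normal = NormalDist()               # mean 0, SD 1
p = 35                                  # percentile rank of x
area_between = 100 - 2 * p              # % of area between -z and z (same as area1)
z = std_normal.inv_cdf(1 - p / 100)     # z has 35% of the area to its right
x_su = -z                               # 35th percentile is below the average
print(area_between, round(z, 3))        # 30 0.385

# check: the area between -z and z is indeed about 30%
check = 100 * (std_normal.cdf(z) - std_normal.cdf(-z))
```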

Entering the table (where Area is the percentage of the area between -z and z) with Area = 30, we see:

  z    Height  Area     z    Height  Area     z    Height  Area 
___________________    __________________    ___________________
 0.00  39.89   0.00    1.50  12.95  86.64    3.00  0.443  99.730
 0.05  39.84   3.99    1.55  12.00  87.89    3.05  0.381  99.771
 0.10  39.70   7.97    1.60  11.09  89.04    3.10  0.327  99.806
 0.15  39.45  11.92    1.65  10.23  90.11    3.15  0.279  99.837
 0.20  39.10  15.85    1.70   9.40  91.09    3.20  0.238  99.863

 0.25  38.67  19.74    1.75   8.63  91.99    3.25  0.203  99.885
 0.30  38.14  23.58    1.80   7.90  92.81    3.30  0.172  99.903
 0.35  37.52  27.37    1.85   7.21  93.57    3.35  0.146  99.919
 0.40  36.83  31.08    1.90   6.56  94.26    3.40  0.123  99.933
 0.45  36.05  34.73    1.95   5.96  94.88    3.45  0.104  99.944
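Reading the table for Area = 30 amounts to a linear interpolation between the two rows that bracket it. The sketch below copies the first column group of the table above (z and Area only). The function name `z_for_area` is hypothetical.

```python
# First column group of the normal table above: (z, % of area between -z and z)
TABLE = [(0.00, 0.00), (0.05, 3.99), (0.10, 7.97), (0.15, 11.92), (0.20, 15.85),
         (0.25, 19.74), (0.30, 23.58), (0.35, 27.37), (0.40, 31.08), (0.45, 34.73)]


def z_for_area(area):
    """Linearly interpolate z between the two rows that bracket `area`."""
    for (z0, a0), (z1, a1) in zip(TABLE, TABLE[1:]):
        if a0 <= area <= a1:
            return z0 + (area - a0) / (a1 - a0) * (z1 - z0)
    raise ValueError("area outside the tabulated rows")


print(round(z_for_area(30), 3))  # 0.385
```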
So z is about 0.39 (Area = 30 falls between the rows for z = 0.35 and z = 0.40), and we estimate the 35th percentile to be about -0.39 standard units. Now for the second step:

> y_in_sus := (-0.39)*(-0.60);

                               y_in_sus := 0.23

Notice that y in sus (0.23) is positive and closer to 0 than x in sus (-0.39). The final step is to go back to a percentile rank for y. The answer is the area under the normal curve to the left of 0.23. Interpolating in the table above between z = 0.20 (Area 15.85) and z = 0.25 (Area 19.74), the area between -0.23 and 0.23 is about 18.2%. So the area to the left of 0.23 is 50% plus half of that:

> answer := 50 + 18.2/2.;

                             answer := 59.1 %


When x is at the 35th percentile, y is predicted to be at about the 59th percentile.
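The last lookup runs the table in the other direction, from z to Area. The sketch below interpolates between the two rows that bracket z = 0.23. It then compares the result with the exact normal curve from Python's `statistics.NormalDist`. The name `area_for_z` is hypothetical.

```python
from statistics import NormalDist

# Rows of the normal table above that bracket z = 0.23: (z, % of area between -z and z)
TABLE = [(0.20, 15.85), (0.25, 19.74)]


def area_for_z(z):
    """Linearly interpolate the % of area between -z and z."""
    for (z0, a0), (z1, a1) in zip(TABLE, TABLE[1:]):
        if z0 <= z <= z1:
            return a0 + (z - z0) / (z1 - z0) * (a1 - a0)
    raise ValueError("z outside the tabulated rows")


area = area_for_z(0.23)                       # % between -0.23 and 0.23
table_answer = 50 + area / 2                  # % of area to the left of 0.23
exact_answer = 100 * NormalDist().cdf(0.23)   # same quantity from the exact curve
print(round(area, 1), round(table_answer, 1), round(exact_answer, 1))  # 18.2 59.1 59.1
```

Both routes agree that y is predicted at roughly the 59th percentile. This lies inside the 50th to 65th bracket expected from the regression towards the mean argument.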



Carlos Rodriguez <carlos@math.albany.edu>
Last modified: Tue Mar 16 14:50:43 EST 1999