When the scatter plot is football shaped, the five summary statistics of the data (the average and SD of x, the average and SD of y, and the correlation coefficient r) contain all the information needed for a standard regression analysis. Moreover, if both variables follow the normal curve, the correlation coefficient alone is enough to estimate the percentile rank on one variable given the percentile rank on the other. Here is a typical example.
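To make the first claim concrete, the regression estimate needs only the five summary statistics: convert x to standard units, multiply by r to get the estimate for y in standard units, then convert back. The sketch below is in Python rather than the worksheet's Maple, and the function name `predict_y` and all the numbers are hypothetical, chosen only for illustration.

```python
def predict_y(x, avg_x, sd_x, avg_y, sd_y, r):
    """Regression estimate of y from x, using only the five summary statistics."""
    z_x = (x - avg_x) / sd_x    # x in standard units
    z_y = r * z_x               # regression estimate for y in standard units
    return avg_y + z_y * sd_y   # convert back to the original units of y

# Hypothetical summary statistics, for illustration only.
print(predict_y(x=80, avg_x=70, sd_x=10, avg_y=50, sd_y=5, r=-0.6))  # 47.0
```

Here x is 1 SD above its average, so with r = -0.6 the estimate for y is 0.6 SD below its average.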
Problem: In a study of two variables, x and y, the correlation coefficient is -0.6. Estimate the percentile rank for y when x is at the 35th percentile.
Solution:
Before working out the numbers, it is useful to think about
what a reasonable answer should look like.
If a number is at the 35th percentile of a list that follows the normal curve, then that number is below the average of the list. When the correlation coefficient is negative (-0.6 in this case), a point whose x-coordinate is below average is expected to have a y-coordinate above average. However, because of the *regression towards the mean effect*, its y percentile rank will be closer to the 50th percentile than its x percentile rank is. Thus the y percentile rank should be greater than the 50th but less than the 65th, which is the estimate we would get for a point on the SD line.
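The reasoning above gives a quick sanity check that needs no table at all. A minimal Python sketch, assuming 0 < |r| < 1; the function name `plausible_range` is hypothetical:

```python
def plausible_range(x_pct, r):
    """Bounds for the regression estimate of y's percentile rank, given x's."""
    # On the SD line, y is as many SDs from its average as x is, with the
    # direction set by the sign of r: the percentile is x_pct when r > 0 and
    # the mirror image 100 - x_pct when r < 0.
    sd_line_pct = x_pct if r > 0 else 100 - x_pct
    # Regression toward the mean puts the estimate between the 50th
    # percentile (the average) and the SD-line percentile.
    return (min(50, sd_line_pct), max(50, sd_line_pct))

print(plausible_range(35, -0.6))  # (50, 65)
```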
Now the numbers:
The 35th percentile is the number such that 35% of the entries in the list are below it. In standard units it is therefore about -z, where z is chosen so that the area under the normal curve to the left of -z is 35%. By symmetry, the area to the right of z is also 35%, so the area between -z and z is:
> area1 := (100. - 2*35);
area1 := 30.
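The symmetry step can be checked numerically with the exact normal curve instead of a table. This Python sketch uses the standard library's `statistics.NormalDist` (Python 3.8+); it is a cross-check, not part of the original worksheet.

```python
from statistics import NormalDist

std_normal = NormalDist()          # the standard normal curve: mean 0, SD 1
z = -std_normal.inv_cdf(0.35)      # 35% of the area lies to the left of -z
middle = std_normal.cdf(z) - std_normal.cdf(-z)   # area between -z and z
print(round(z, 3), round(100 * middle, 1))        # 0.385 30.0
```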
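Looking up Area = 30 in the normal table (excerpt below; Height is in percent per standard unit, and Area is the percentage of the area under the normal curve between -z and z):

   z  Height    Area       z  Height    Area       z  Height    Area
____________________    ____________________    ____________________
0.00   39.89    0.00    1.50   12.95   86.64    3.00   0.443  99.730
0.05   39.84    3.99    1.55   12.00   87.89    3.05   0.381  99.771
0.10   39.70    7.97    1.60   11.09   89.04    3.10   0.327  99.806
0.15   39.45   11.92    1.65   10.23   90.11    3.15   0.279  99.837
0.20   39.10   15.85    1.70    9.40   91.09    3.20   0.238  99.863
0.25   38.67   19.74    1.75    8.63   91.99    3.25   0.203  99.885
0.30   38.14   23.58    1.80    7.90   92.81    3.30   0.172  99.903
0.35   37.52   27.37    1.85    7.21   93.57    3.35   0.146  99.919
0.40   36.83   31.08    1.90    6.56   94.26    3.40   0.123  99.933
0.45   36.05   34.73    1.95    5.96   94.88    3.45   0.104  99.944

An area of 30% falls between the entries for z = 0.35 (27.37%) and z = 0.40 (31.08%), about 70% of the way from the first to the second, so z is about 0.39. We therefore estimate the 35th percentile to be about -0.39 standard units.

Now the second step: multiplying by the correlation coefficient gives the regression estimate for y in standard units: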
> y_in_sus := (-0.39)*(-0.60);
y_in_sus := 0.23
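The table readings in this solution are made by eye. A minimal Python sketch of linear interpolation in the first column group of the table excerpt above, assuming straight-line interpolation between neighboring entries is accurate enough at this scale; the helper `interpolate` and the variable names are hypothetical:

```python
# First column group of the table: z and the area (in %) between -z and z.
Z_VALUES = [0.00, 0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45]
AREAS = [0.00, 3.99, 7.97, 11.92, 15.85, 19.74, 23.58, 27.37, 31.08, 34.73]

def interpolate(value, xs, ys):
    """Linearly interpolate ys at `value`, where xs is increasing."""
    for i in range(1, len(xs)):
        if xs[i - 1] <= value <= xs[i]:
            t = (value - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + t * (ys[i] - ys[i - 1])
    raise ValueError("value lies outside this excerpt of the table")

r = -0.6
z_x = interpolate(30, AREAS, Z_VALUES)   # z with 30% of the area between -z and z
z_y = r * (-z_x)                         # x is below average, so it is -z_x in SU
area_y = interpolate(round(z_y, 2), Z_VALUES, AREAS)  # area between -z_y and z_y
print(round(z_x, 3), round(z_y, 2), round(area_y, 1))  # 0.385 0.23 18.2
```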
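Notice that y in standard units (sus) is positive, and its distance from 0 (0.23) is smaller than the distance of x from 0 (0.39): this is the regression effect. The final step is to go back to a percentile rank for y, which is the area under the normal curve to the left of 0.23. In the table above, z = 0.23 lies between z = 0.20 (15.85%) and z = 0.25 (19.74%), so the area between -0.23 and 0.23 is about 18%. The area to the left of 0.23 is the 50% to the left of 0 plus half of this middle area:
> answer := 50 + 18/2.;
answer := 59.

When x is at the 35th percentile, y is predicted to be at about the 59th percentile, which lies between the 50th and 65th percentiles, as expected.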
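The whole procedure (percentile to standard units, multiply by r, standard units back to percentile) fits in one function. A minimal Python sketch using the exact normal curve from the standard library's `statistics.NormalDist` instead of the printed table, assuming both variables follow the normal curve; the name `predict_percentile` is hypothetical:

```python
from statistics import NormalDist

def predict_percentile(x_pct, r):
    """Estimate y's percentile rank from x's percentile rank and r.

    Assumes a football-shaped scatter plot, with both variables
    following the normal curve.
    """
    std_normal = NormalDist()
    z_x = std_normal.inv_cdf(x_pct / 100)  # percentile -> standard units
    z_y = r * z_x                          # regression estimate in standard units
    return 100 * std_normal.cdf(z_y)       # standard units -> percentile

print(round(predict_percentile(35, -0.6), 1))  # 59.1
```

With the exact curve the estimate is about the 59th percentile, so the table-based answer is accurate to within the precision of the table lookup.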