Problem 1: The Residential Energy Consumption Survey found in 1990 that 14.8% of American households had a computer. A market survey organization repeated this study in a certain town with 25,000 households, using a simple random sample of 500 households: 79 of the sample households had computers.
Solution:
The percentage of households in the town with computers is estimated by the observed sample percentage, psample, given by
> psample := (79./500)*100;
psample := 15.80000000
The standard error for this estimate is given by the square-root law as
> SE := (SDbox/sqrt(500))*100;
SE := 2 SDbox 5^(1/2)
where SDbox is the standard deviation of the box of 25,000 tickets marked with zeroes and ones. The exact percentage of ones in the box is not known, but it is estimated by the bootstrap method with the observed psample above. Using the formula for the standard deviation of a list of 0-1 tickets, we estimate SDbox by SDsample, given by
> SDsample := sqrt(.158*(1-.158));
SDsample := .3647410040
Hence the SE is estimated by SEest,
> SEest := (SDsample/sqrt(500.))*100;
SEest := 1.631171358
Thus, the percentage of households in this town with computers is
estimated by 15.8% give or take 1.6% or so.
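As a cross-check outside the worksheet, the same three steps (sample percentage, bootstrap SD, square-root law) can be sketched in Python. This is only an illustration: the function name `percent_with_se` is made up here, and it assumes a 0-1 box with the sample fraction substituted for the unknown box fraction.

```python
import math

def percent_with_se(hits, n):
    """Return (sample percentage, estimated SE in percentage points).

    Hypothetical helper, not part of the worksheet: `hits` is the number
    of ones in the sample and `n` is the sample size.
    """
    frac = hits / n
    # SD of a 0-1 list with this fraction of ones (bootstrap estimate of SDbox)
    sd_sample = math.sqrt(frac * (1 - frac))
    # Square-root law, converted to percentage points
    se = sd_sample / math.sqrt(n) * 100
    return frac * 100, se

psample, se_est = percent_with_se(79, 500)
print(round(psample, 1), round(se_est, 2))  # 15.8 1.63
```

The printed values should agree with psample and SEest above, up to rounding.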
Let us now find the 95%-confidence interval. This is just an interval centered at the observed sample percentage with the property that repeated samples have about a 95% chance of producing an interval covering the true percentage in the town. All we need is to go above and below 15.8% by 2*SE. The actual interval is
> [psample-2*SEest, psample+2*SEest];
[12.53765728, 19.06234272]
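The repeated-sampling property can be checked by simulation. The Python sketch below is an illustration under stated assumptions, not part of the worksheet: it invents a box in which exactly 15% of the 25,000 tickets are ones (the true value is unknown in the problem), draws many simple random samples of 500, and counts how often the interval "sample % plus or minus 2 SE" covers the true percentage. The function name `coverage` and all its default parameters are hypothetical.

```python
import random

def coverage(true_frac=0.15, box_size=25000, n=500, z=2.0, trials=400, seed=1):
    """Fraction of simulated samples whose interval covers the true percentage."""
    rng = random.Random(seed)
    ones = round(true_frac * box_size)      # tickets 0 .. ones-1 are marked 1
    true_pct = 100 * ones / box_size
    covered = 0
    for _ in range(trials):
        # Simple random sample: n tickets drawn without replacement
        draws = rng.sample(range(box_size), n)
        frac = sum(1 for ticket in draws if ticket < ones) / n
        # Bootstrap SE in percentage points, as in the worksheet
        se = 100 * (frac * (1 - frac) / n) ** 0.5
        pct = 100 * frac
        if pct - z * se <= true_pct <= pct + z * se:
            covered += 1
    return covered / trials

print(coverage())
```

The printed fraction should land near 0.95. Note that the chance belongs to the sampling procedure: each simulated interval either covers the fixed true percentage or it does not.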
How about a 92.7% Confidence Interval?
All we need is to enter the Normal Table with AREA=92.7%
and read out the z,
  z    Height   Area        z    Height   Area        z    Height   Area
 _____________________     _____________________     _____________________
 0.00  39.89    0.00       1.50  12.95   86.64       3.00  0.443   99.730
 0.05  39.84    3.99       1.55  12.00   87.89       3.05  0.381   99.771
 ......                    .....
 0.25  38.67   19.74       1.75   8.63   91.99       3.25  0.203   99.885
 0.30  38.14   23.58       1.80   7.90   92.81       3.30  0.172   99.903
 0.35  37.52   27.37       1.85   7.21   93.57       3.35  0.146   99.919

The closest is z=1.80. The interval is then,
> [psample-1.8*SEest,psample+1.8*SEest];
[12.86389156, 18.73610844]
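Instead of reading the table, the z for any confidence level can be computed from the inverse of the normal curve. A minimal Python sketch, assuming Python 3.8 or later for `statistics.NormalDist`; the helper name `z_for_confidence` is made up for this illustration.

```python
from statistics import NormalDist

def z_for_confidence(level_percent):
    """z such that the area between -z and +z under the normal curve
    equals level_percent (the 'Area' column of the normal table)."""
    tail = (1 - level_percent / 100) / 2   # area left in each tail
    return NormalDist().inv_cdf(1 - tail)

z = z_for_confidence(92.7)
print(z)
```

The value comes out slightly below the table's z=1.80, because the table only lists z in steps of 0.05.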
Do you see how the interval shrinks? The more the confidence, the wider the interval, until we reach the interval [0%,100%], which has 100% confidence. But of course we knew that before taking the sample, so that extreme case is useless.
One question that arises in the above computations of SEest is:
How large must the sample size be in order for the Bootstrap
method (of estimating the SD in the box with the SD in the sample)
to work?
Answer:
It can't be done!

Problem 3: Continuing with Problem 1... Suppose now that among the sample households 121 had no car, 172 had one car, and 207 had two or more cars. Find a 92%-confidence interval for the percentage of households in the town with one or more cars.
Answer:
The observed percentage of sample households with one or more cars, which estimates the percentage for the town, is given by,
> p := 100*(172+207.)/500.;
p := 75.80000000
The estimated SE for this percentage is,
> SE := sqrt(p*(100-p))/sqrt(500.);
SE := 1.915390299
Notice that I used a simplified version of the formula for the SE of a percent. It is just algebra, and it always gives the same answer as the formula we used in Problem 1. Look:
> se := 100*sqrt(.758*(1-.758))/sqrt(500.);
se := 1.915390299
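The algebra behind the shortcut can be written out in one line. Here p stands for the sample percentage (on the 0 to 100 scale) and n for the sample size; these symbols are chosen only for this illustration.

```latex
\[
100\cdot\frac{\sqrt{\frac{p}{100}\left(1-\frac{p}{100}\right)}}{\sqrt{n}}
= 100\cdot\frac{\sqrt{\dfrac{p\,(100-p)}{100^{2}}}}{\sqrt{n}}
= \frac{\sqrt{p\,(100-p)}}{\sqrt{n}}
\]
```

With p = 75.8 and n = 500, both ends of the chain give about 1.915, matching the two outputs above.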
So far we know that the estimated percentage is 75.8%, give or take 1.9% or so. For 92% confidence, the normal table gives z=1.75 (area 91.99%), so the interval is,
> [p-1.75*SE,p+1.75*SE];
[72.44806698, 79.15193302]
What does it mean?
What indeed does it mean that the interval above, [72%,79%], is a 92%-confidence interval? The answer to this question is tricky. In fact, IT DOES NOT MEAN:
"There is a 92% chance that the true % is between 72% and 79%."
I know, I know... stats should be able to do better...