The Chi-square test can also be used to test for independence
between rows and columns of a contingency table. Here is
an example.
Problem: In a certain town, there are about one million eligible voters. A simple random sample of 10000 eligible voters was chosen to study the relationship between sex and participation in the last election. The results are summarized in the following 2x2 (read "two by two") contingency table:

                 Men     Women
  _____________________________
  Voted          2792    3591
  Didn't vote    1486    2131

We want to check whether being a man or a woman (columns) is independent of having voted in the last election (rows). In other words: are sex and voting independent?
Solution:
To answer the question we set up a test of hypotheses as usual. We have:
> Null := `Sex is independent of Voting`:
> Alternative := `Sex and Voting are dependent`:
After specifying the Null hypothesis we need to compute the expected
table under the assumption that rows and columns are in fact independent.
To compute the expected table we use the product rule for chances:

    chance of (row_i,col_j) = (chance row_i) * (chance col_j)

Estimating (chance row_i) by (Sum row_i)/N and (chance col_j) by (Sum col_j)/N, we deduce that the expected number of counts in (row_i,col_j) is given by:

    N*(chance row_i)*(chance col_j) = (Sum row_i)*(Sum col_j) / N

For instance, the expected count of men who voted is 6383*4278/10000 = 2730.6, which rounds to 2731.

The observed table with totals included is:

                    OBSERVED TABLE
                 Men     Women  | Total
  ______________________________|______
  Voted          2792    3591   | 6383
  Didn't vote    1486    2131   | 3617
  _____________________________________
  Total          4278    5722   | 10000

The associated expected table under the assumption that sex and voting are independent is given by:

                    EXPECTED TABLE
                 Men     Women  | Total
  ______________________________|______
  Voted          2731    3652   | 6383
  Didn't vote    1547    2070   | 3617
  _____________________________________
  Total          4278    5722   | 10000

We now have the observed table and the expected table under the null hypothesis of independence. Next we need to compute the X2 statistic, which measures how far the observed table is from the expected one. The X2 statistic has as many terms as there are cells in the observed table (4 in our case):
> c11 := (2792-2731)^2/2731.:
> c12 := (3591-3652)^2/3652.:
> c21 := (1486-1547)^2/1547.:
> c22 := (2131-2070)^2/2070.:
The X2 statistic is the sum of the contributions from all the cells:
> X2 := c11+c12+c21+c22;
X2 := 6.584283457
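
The same computation can be sketched outside Maple. The Python snippet below is only an illustration, not part of the worksheet: the function names `expected_table` and `chi_square_statistic` are made up for this example, and the expected counts are rounded to whole numbers on the assumption that this is how the expected table above was obtained.

```python
# Observed 2x2 table: rows = (Voted, Didn't vote), columns = (Men, Women).
observed = [[2792, 3591],
            [1486, 2131]]


def expected_table(table, round_counts=True):
    """Expected counts under independence: (row sum) * (column sum) / N."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    expected = [[r * c / n for c in col_sums] for r in row_sums]
    if round_counts:
        # Assumption: whole-number expected counts, as in the table above.
        expected = [[round(e) for e in row] for row in expected]
    return expected


def chi_square_statistic(obs, exp):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(obs, exp)
               for o, e in zip(obs_row, exp_row))


expected = expected_table(observed)
x2 = chi_square_statistic(observed, expected)
print(expected)
print(x2)
```

Passing `round_counts=False` keeps the fractional expected counts, which changes the statistic slightly.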
The last part is to compute the P-value. This is done by looking up
the Chi-square table with (rows-1)*(cols-1) degrees of freedom.
In the case of a 2x2 table (our case) the number of degrees of
freedom is (2-1)(2-1)=1*1=1. The table gives the following tail areas:

  Degrees of freedom    99%       ...    10%     5%      1%
  _____________________________________________________
  1                     0.00016          2.71    3.84    6.64
  2                     0.020            4.60    5.99    9.21

Since the observed X2 = 6.58 lies between 3.84 and 6.64, the P-value is between 1% and 5%, and thus the null hypothesis of independence is rejected at the 5% level (though not at the 1% level).

Problem 2: Each respondent in the Current Population Survey of March 1993 was classified as employed, unemployed, or outside the labor force. The results for men in California age 35-44 can be cross-tabulated by marital status, as follows:

                                   Widowed, divorced,   Never
                        Married    or separated         married
  ______________________________________________________________
  Employed              679        103                  114
  Unemployed            63         10                   20
  Not in labor force    42         18                   25

Men of different marital status seem to have different distributions of labor force status. Or is this just chance variation? (You may assume the table results from a simple random sample.)
Solution:
We have:
> Obs_table := matrix(3,3,[679,103,114,63,10,20,42,18,25]);
                 [679    103    114]
                 [                 ]
    Obs_table := [ 63     10     20]
                 [                 ]
                 [ 42     18     25]

> R1 := 679+103+114: R2 := 63+10+20: R3 := 42+18+25:
The expected table under independence (rounded to whole counts) and the resulting X2 statistic are:

                 [654    109    133]
                 [                 ]
    Exp_table := [ 68     11     14]
                 [                 ]
                 [ 62     10     13]

    X2 := (679 - 654)^2/654 + ... + (25 - 13)^2/13

> X2 := 30.96:
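
The steps from the row sums to the expected table are not shown in full, so here is a possible reconstruction in Python. It is a sketch under assumptions: the variable names are illustrative, and the expected counts are rounded to whole numbers because that is what reproduces the Exp_table entries shown here.

```python
# Observed 3x3 table: rows = labor force status, columns = marital status.
obs_table = [[679, 103, 114],
             [63, 10, 20],
             [42, 18, 25]]

row_sums = [sum(row) for row in obs_table]        # R1, R2, R3
col_sums = [sum(col) for col in zip(*obs_table)]  # column totals
n = sum(row_sums)                                 # sample size N

# Expected count in cell (i, j): (Sum row_i) * (Sum col_j) / N.
exact_table = [[r * c / n for c in col_sums] for r in row_sums]
# Assumption: round to whole counts, matching the Exp_table entries.
exp_table = [[round(e) for e in row] for row in exact_table]


def x2_statistic(obs, exp):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(obs, exp)
               for o, e in zip(obs_row, exp_row))


x2_rounded = x2_statistic(obs_table, exp_table)  # uses whole-number counts
x2_exact = x2_statistic(obs_table, exact_table)  # uses fractional counts
print(row_sums, col_sums, n)
print(exp_table)
print(x2_rounded, x2_exact)
```

With rounded expected counts the statistic agrees with 30.96 to two decimals. The fractional expected counts give a somewhat different value, which is worth keeping in mind when comparing against other software.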
Looking at the table of the Chi-square distribution with (3-1)(3-1)=2*2=4
degrees of freedom we get:

  Degrees of freedom    99%       ...    10%     5%      1%
  _____________________________________________________
  1                     0.00016          2.71    3.84    6.64
  2                     0.020            4.60    5.99    9.21
  3                     0.12             6.25    7.82    11.34
  4                     0.30             7.78    9.49    13.28
  5                     0.55             9.24    11.07   15.09

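Since 30.96 >> 13.28, we conclude from the table that the P-value is well below 1%: the differences in labor force status across marital status are very unlikely to be just chance variation.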
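
Instead of bracketing the P-value with the table, the tail area can be computed directly. The Python sketch below uses only the standard library and is an illustration, not the worksheet's method. The function `chi2_tail_area` is a hypothetical helper that covers only the cases with a simple closed form (1 degree of freedom, or an even number of degrees of freedom), which is enough for the two problems here.

```python
import math


def chi2_tail_area(x, df):
    """Area to the right of x under the chi-square curve.

    Only df == 1 and even df are handled; these have simple closed forms.
    """
    if x < 0:
        raise ValueError("x must be non-negative")
    if df == 1:
        # A chi-square with 1 df is the square of a standard normal.
        return math.erfc(math.sqrt(x / 2.0))
    if df > 0 and df % 2 == 0:
        half = x / 2.0
        partial_sum = sum(half ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-half) * partial_sum
    raise ValueError("this sketch supports only df == 1 or even df")


p_voting = chi2_tail_area(6.584283457, 1)  # Problem 1: 2x2 table, 1 df
p_labor = chi2_tail_area(30.96, 4)         # Problem 2: 3x3 table, 4 df
print(p_voting, p_labor)
```

As a sanity check, `chi2_tail_area(3.84, 1)` and `chi2_tail_area(9.49, 4)` should both come out close to 0.05, in agreement with the 5% column of the table.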