projection {mining}   R Documentation
Description

Finds a projection matrix that separates data groups.
Usage

projection(x, y, k = 1, given = NULL, type = c("mv", "m", "v"), ...)
Arguments

x: a data frame of variables to project.

y: a factor or numeric vector which defines the groups to separate. If y is not given, it is taken to be the response variable of x (the last column if x is not a model.frame).

k: the number of dimensions to project x down to.

given: a matrix specifying axes to avoid. The projection matrix will be orthogonal to given.

type: see Details.

...: additional parameters depending on type.
Details

This function uses only the second-order statistics of the data (the means and covariances of the groups).

If type = "m", the within-group covariances are assumed equal and the projection tries to separate the projected means. If type = "v", the within-group means are assumed equal and the projection tries to separate the projected covariances. If type = "mv", the projection tries to separate both the projected means and the projected covariances, by maximizing the divergence between the projected classes (treated as Gaussians).
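The References below note that the type = "m" case is Fisher's linear discriminant analysis. For two classes, the direction it finds can be sketched in a few lines of base R; this is a minimal illustration of the underlying formula with made-up data, not the package's implementation:

```r
# Fisher's linear discriminant direction for two classes:
# w is proportional to Sw^{-1} (mu1 - mu2), where Sw is the
# pooled within-class covariance. Toy data, for illustration only.
set.seed(1)
x1 <- cbind(rnorm(50, 3), rnorm(50, 3))   # class 1, mean (3, 3)
x2 <- cbind(rnorm(50, 0), rnorm(50, 0))   # class 2, mean (0, 0)
Sw <- ((nrow(x1) - 1) * cov(x1) + (nrow(x2) - 1) * cov(x2)) /
      (nrow(x1) + nrow(x2) - 2)            # pooled within-class covariance
w  <- solve(Sw, colMeans(x1) - colMeans(x2))
w  <- w / sqrt(sum(w^2))                   # normalize to unit length
```

Projecting the data onto w then maximizes the separation of the two class means relative to the within-class spread.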
If y is a numeric vector, overlapping classes are defined by grouping data points with similar values of y. The optional argument span controls how large the classes are (as a fraction of the dataset), and res controls the amount of overlap. The total number of classes is res/span. The default values are usually acceptable.
The projection is "stabilized" so that small changes in the data do not cause sign flips in the projection.
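One common way to stabilize signs is to fix a per-column sign convention. The rule below (make each column's largest-magnitude entry positive) is an assumed illustration of the idea, not necessarily the rule projection uses:

```r
# Sign stabilization sketch: flip each column of a projection matrix
# so that its largest-magnitude entry is positive. Assumed convention,
# for illustration only.
stabilize_sign <- function(w) {
  apply(w, 2, function(col) col * sign(col[which.max(abs(col))]))
}

w <- cbind(h1 = c(-0.8, 0.6), h2 = c(0.3, -0.95))
stabilize_sign(w)   # h1 flipped, h2 flipped
```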
Value

A matrix suitable for input to project, with rows named after the columns of x and columns named h1, ..., hk. Each column defines a new dimension, obtained as a linear combination of the variables in x.
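Concretely, applying such a matrix is a matrix product of the data with the returned columns; the linear-combination interpretation above can be sketched as follows (the projection matrix here is made up for illustration):

```r
# Each column h1..hk of the returned matrix is one new dimension,
# a linear combination of the original variables. The matrix w below
# is invented for illustration; projection() would compute it.
x <- data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10))
w <- matrix(c(1, 0, 0,
              0, 1, 1), ncol = 2,
            dimnames = list(c("a", "b", "c"), c("h1", "h2")))
px <- as.matrix(x) %*% w   # 10 x 2 projected data: h1 = a, h2 = b + c
dim(px)
```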
Author(s)

Tom Minka
References

m-projection is Fisher's linear discriminant analysis. mv-projection is heteroscedastic discriminant analysis:

N. Kumar and A. G. Andreou. Heteroscedastic discriminant analysis and reduced-rank HMMs for improved speech recognition. Speech Communication 26: 283-297, 1998.

The case when y is numeric is sliced inverse regression:

K.-C. Li. Sliced inverse regression for dimension reduction. Journal of the American Statistical Association 86(414): 316-327, 1991.
Examples

# illustrate the difference between (m, v, mv)
library(MASS)
m1 <- c(6, 6)
v1 <- array(c(2, 1.9, 1.9, 2), c(2, 2))
#v1 <- array(c(1, 0, 0, 1), c(2, 2))
x1 <- mvrnorm(100, m1, v1)
m2 <- c(0, 0)
v2 <- array(c(20, 0, 0, 10), c(2, 2))
x2 <- mvrnorm(300, m2, v2)
x <- as.data.frame(rbind(x1, x2))
y <- factor(c(rep(1, nrow(x1)), rep(2, nrow(x2))))
plot(x[, 1], x[, 2], col = 1, xlab = "", ylab = "", asp = 1)
points(x2[, 1], x2[, 2], col = 2)
w <- projection(x, y, type = "m")
abline(0, w[2]/w[1], col = 3)
w <- projection(x, y, type = "v")
abline(0, w[2]/w[1], col = 4)
w <- projection(x, y, type = "mv")
abline(0, w[2]/w[1], col = 5)
my.legend(1, c("m", "v", "mv"), col = 3:5, lty = 1)

# regression projection
x1 <- 2*runif(200) - 1
x2 <- 2*runif(200) - 1
y <- x1^2/2 + x2^2
x <- data.frame(x1, x2)
color.plot(x[, 1], x[, 2], y)
w <- projection(x, y)
abline(0, w[2]/w[1], col = 4)