projection {mining}                                        R Documentation
Finds a projection matrix that separates data groups.
Usage:

     projection(x, y, k = 1, given = NULL, type = c("mv", "m", "v"), ...)
Arguments:

       x: a data frame of variables to project.

       y: a factor or numeric vector which defines the groups to
          separate. If y is not given, it is taken to be the response
          variable of x (the last column if x is not a model.frame).

       k: the number of dimensions to project x down to.

   given: a matrix specifying axes to avoid. The projection matrix
          will be orthogonal to given.

    type: see below.

     ...: additional parameters depending on type.
Details:

This function only uses the second-order statistics of the data (the means and covariances of the groups).
If type="m", the within-group covariances are assumed equal
and the projection will try to separate the projected means.
If type="v", the within-group means are assumed equal
and the projection will try to separate the projected covariances.
If type="mv", the projection will try to separate the projected
means and covariances, by maximizing the divergence between the
projected classes (as Gaussians).
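As a concrete illustration of the type="m" criterion (which, per the References, is Fisher's linear discriminant analysis), the two-group direction can be sketched in a few lines of base R. This is a simplified sketch only, not the package's implementation, which also handles k > 1, the given constraint, and stabilization:

```r
# Simplified sketch of the type="m" (Fisher LDA) direction for two groups.
# projection() itself differs (stabilization, scaling, k > 1, given=).
fisher_direction <- function(x1, x2) {
  Sw <- cov(x1) + cov(x2)                      # within-group scatter
  w <- solve(Sw, colMeans(x1) - colMeans(x2))  # separates projected means
  w / sqrt(sum(w^2))                           # scale to unit length
}
set.seed(1)
x1 <- matrix(rnorm(100), ncol = 2) + 5  # group 1, mean near (5, 5)
x2 <- matrix(rnorm(100), ncol = 2)      # group 2, mean near (0, 0)
w <- fisher_direction(x1, x2)
```

Projecting each group onto w (e.g. x1 %*% w) then yields well-separated one-dimensional means.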
If y is a numeric vector, overlapping classes are defined by
grouping data points with similar values of y.
The optional argument span controls how big the classes will
be (as a percentage of the dataset), and res controls the
amount of overlap. The total number of classes will be
res/span. The default values are usually acceptable.
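One plausible reading of this slicing scheme (projection() handles it internally; slice_groups below is a hypothetical helper, and the exact windowing may differ) is that each class is a quantile window of width span, with res/span windows in total, overlapping whenever res > 1:

```r
# Hypothetical sketch of slicing a numeric y into overlapping classes;
# the package's internal mechanism may differ in detail.
slice_groups <- function(y, span = 0.2, res = 2) {
  k <- round(res / span)                       # total number of classes
  starts <- seq(0, 1 - span, length.out = k)   # window start quantiles
  lapply(starts, function(s) {
    lo <- quantile(y, s)
    hi <- quantile(y, s + span)
    which(y >= lo & y <= hi)                   # indices in this class
  })
}
set.seed(1)
y <- runif(100)
groups <- slice_groups(y, span = 0.2, res = 2)  # 10 overlapping classes
```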
The projection is "stabilized" so that small changes in the data do not cause sign flips in the projection.
Value:

A matrix suitable for input to project, with rows named after the
columns of x and columns named h1, ..., hk.
Each column denotes a new dimension to be obtained as a linear combination
of the variables in x.
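To show the shape of the return value and how such a matrix is applied, here is a stand-in with the documented dimnames (projection() would compute the entries; the values below are arbitrary):

```r
# Stand-in for the return value: a 3-variable, k = 2 projection matrix
# with rows named after the columns of x and columns h1, h2.
set.seed(1)
x <- data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10))
w <- matrix(c(1, 0, 0, 0, 1, 0), nrow = 3,
            dimnames = list(names(x), c("h1", "h2")))
proj <- as.matrix(x) %*% w  # each column h_j is a linear combination
dim(proj)                   # 10 x 2
```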
Author(s):

Tom Minka
References:

The m-projection is Fisher's linear discriminant analysis. The mv-projection is heteroscedastic discriminant analysis:
N. Kumar and A.G. Andreou. Heteroscedastic discriminant analysis and reduced rank HMMs for improved speech recognition. Speech Communication 26: 283-297, 1998.
The case when y is numeric is sliced inverse
regression:
K.-C. Li. Sliced inverse regression for dimension reduction. Journal of the American Statistical Association 86(414): 316-327, 1991.
Examples:

# illustrate the difference between type = "m", "v", and "mv"
library(MASS)
m1 <- c(6, 6)
v1 <- array(c(2, 1.9, 1.9, 2), c(2, 2))
#v1 <- array(c(1, 0, 0, 1), c(2, 2))
x1 <- mvrnorm(100, m1, v1)
m2 <- c(0, 0)
v2 <- array(c(20, 0, 0, 10), c(2, 2))
x2 <- mvrnorm(300, m2, v2)
x <- as.data.frame(rbind(x1, x2))
y <- factor(c(rep(1, nrow(x1)), rep(2, nrow(x2))))
plot(x[, 1], x[, 2], col = 1, xlab = "", ylab = "", asp = 1)
points(x2[, 1], x2[, 2], col = 2)
w <- projection(x, y, type = "m")
abline(0, w[2] / w[1], col = 3)
w <- projection(x, y, type = "v")
abline(0, w[2] / w[1], col = 4)
w <- projection(x, y, type = "mv")
abline(0, w[2] / w[1], col = 5)
my.legend(1, c("m", "v", "mv"), col = 3:5, lty = 1)
# regression projection (sliced inverse regression)
x1 <- 2 * runif(200) - 1
x2 <- 2 * runif(200) - 1
y <- x1^2 / 2 + x2^2
x <- data.frame(x1, x2)
color.plot(x[, 1], x[, 2], y)
w <- projection(x, y)
abline(0, w[2] / w[1], col = 4)