projection {mining}   R Documentation
Description

Finds a projection matrix that separates data groups.
Usage

projection(x, y, k = 1, given = NULL, type = c("mv", "m", "v"), ...)
Arguments

x: a data frame of variables to project.

y: a factor or numeric vector which defines the groups to separate. If y is not given, it is taken to be the response variable of x (the last column if x is not a model.frame).

k: the number of dimensions to project x down to.

given: a matrix specifying axes to avoid. The projection matrix will be orthogonal to given.

type: see Details.

...: additional parameters depending on type.
Details

This function uses only the second-order statistics of the data (the means and covariances of the groups).

If type = "m", the within-group covariances are assumed equal and the projection tries to separate the projected means. If type = "v", the within-group means are assumed equal and the projection tries to separate the projected covariances. If type = "mv", the projection tries to separate both the projected means and the projected covariances, by maximizing the divergence between the projected classes (treated as Gaussians).
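The References below note that the type = "m" case is Fisher's linear discriminant analysis. For two classes, the direction it finds can be sketched in a few lines of base R; this is a minimal illustration of the underlying formula with made-up data, not the package's implementation:

```r
# Fisher's linear discriminant direction for two classes:
# w is proportional to Sw^{-1} (mu1 - mu2), where Sw is the
# pooled within-class covariance. Toy data, for illustration only.
set.seed(1)
x1 <- cbind(rnorm(50, 3), rnorm(50, 3))   # class 1, mean (3, 3)
x2 <- cbind(rnorm(50, 0), rnorm(50, 0))   # class 2, mean (0, 0)
Sw <- ((nrow(x1) - 1) * cov(x1) + (nrow(x2) - 1) * cov(x2)) /
      (nrow(x1) + nrow(x2) - 2)            # pooled within-class covariance
w  <- solve(Sw, colMeans(x1) - colMeans(x2))
w  <- w / sqrt(sum(w^2))                   # normalize to unit length
```

Projecting the data onto w then maximizes the separation of the two class means relative to the within-class spread.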
If y is a numeric vector, overlapping classes are defined by grouping data points with similar values of y. The optional argument span controls how large the classes are (as a fraction of the dataset), and res controls the amount of overlap. The total number of classes is res/span. The default values are usually acceptable.
The projection is "stabilized" so that small changes in the data do not cause sign flips in the projection.
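One common way to stabilize signs is to fix a per-column sign convention. The rule below (make each column's largest-magnitude entry positive) is an assumed illustration of the idea, not necessarily the rule projection uses:

```r
# Sign stabilization sketch: flip each column of a projection matrix
# so that its largest-magnitude entry is positive. Assumed convention,
# for illustration only.
stabilize_sign <- function(w) {
  apply(w, 2, function(col) col * sign(col[which.max(abs(col))]))
}

w <- cbind(h1 = c(-0.8, 0.6), h2 = c(0.3, -0.95))
stabilize_sign(w)   # h1 flipped, h2 flipped
```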
Value

A matrix suitable for input to project, with rows named after the columns of x and columns named h1, ..., hk. Each column defines a new dimension, obtained as a linear combination of the variables in x.
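Concretely, applying such a matrix is a matrix product of the data with the returned columns; the linear-combination interpretation above can be sketched as follows (the projection matrix here is made up for illustration):

```r
# Each column h1..hk of the returned matrix is one new dimension,
# a linear combination of the original variables. The matrix w below
# is invented for illustration; projection() would compute it.
x <- data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10))
w <- matrix(c(1, 0, 0,
              0, 1, 1), ncol = 2,
            dimnames = list(c("a", "b", "c"), c("h1", "h2")))
px <- as.matrix(x) %*% w   # 10 x 2 projected data: h1 = a, h2 = b + c
dim(px)
```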
Author(s)

Tom Minka
References

m-projection is Fisher's linear discriminant analysis. mv-projection is heteroscedastic discriminant analysis:

N. Kumar and A. G. Andreou. Heteroscedastic discriminant analysis and reduced-rank HMMs for improved speech recognition. Speech Communication 26: 283-297, 1998.

The case when y is numeric is sliced inverse regression:

K.-C. Li. Sliced inverse regression for dimension reduction. Journal of the American Statistical Association 86(414): 316-327, 1991.
Examples

# illustrate the difference between (m, v, mv)
library(MASS)
m1 <- c(6, 6)
v1 <- array(c(2, 1.9, 1.9, 2), c(2, 2))
#v1 <- array(c(1, 0, 0, 1), c(2, 2))
x1 <- mvrnorm(100, m1, v1)
m2 <- c(0, 0)
v2 <- array(c(20, 0, 0, 10), c(2, 2))
x2 <- mvrnorm(300, m2, v2)
x <- as.data.frame(rbind(x1, x2))
y <- factor(c(rep(1, nrow(x1)), rep(2, nrow(x2))))
plot(x[, 1], x[, 2], col = 1, xlab = "", ylab = "", asp = 1)
points(x2[, 1], x2[, 2], col = 2)
w <- projection(x, y, type = "m")
abline(0, w[2]/w[1], col = 3)
w <- projection(x, y, type = "v")
abline(0, w[2]/w[1], col = 4)
w <- projection(x, y, type = "mv")
abline(0, w[2]/w[1], col = 5)
my.legend(1, c("m", "v", "mv"), col = 3:5, lty = 1)

# regression projection
x1 <- 2*runif(200) - 1
x2 <- 2*runif(200) - 1
y <- x1^2/2 + x2^2
x <- data.frame(x1, x2)
color.plot(x[, 1], x[, 2], y)
w <- projection(x, y)
abline(0, w[2]/w[1], col = 4)