% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/bcorsis.R
\name{bcorsis}
\alias{bcorsis}
\title{Ball Correlation Sure Independence Screening}
\usage{
bcorsis(x, y, d = "small", weight = FALSE, method = "standard",
  dst = FALSE, parms = list(d1 = 5, d2 = 5, df = 3), R = 99, seed = 4)
}
\arguments{
\item{x}{a numeric matrix or data.frame with \eqn{n} rows and \eqn{p} columns. 
Each row is an observation vector and each column corresponds to an explanatory variable; generally \eqn{p >> n}.}

\item{y}{a numeric vector, matrix, data.frame or \code{dist} object.}

\item{d}{the hard cutoff rule, which selects \eqn{d} variables. Setting \code{d = "large"} or 
\code{d = "small"} selects \code{n - 1} or \code{floor(n/log(n))} 
variables, respectively. If \code{d} is an integer, 
\code{d} variables are selected. Default: \code{d = "small"}}

\item{weight}{when \code{weight = TRUE}, weighted ball correlation is used instead of ball correlation. Default: \code{weight = FALSE}}

\item{method}{the sure independence screening procedure to use, one of \code{"standard"}, \code{"pvalue"},
\code{"lm"}, \code{"gam"}, \code{"interaction"} and \code{"survival"}.
Setting \code{method = "standard"} or \code{"pvalue"} runs the standard sure independence screening procedure 
based on ball correlation or on the \emph{p}-value of the ball correlation test, respectively, while options
\code{"lm"} and \code{"gam"} carry out the iterative BCor-SIS procedure with ordinary 
linear regression and generalized additive models, respectively.
Options \code{"interaction"} and \code{"survival"} are designed for detecting variables 
with potential linear interaction or variables associated with censored responses, respectively. Default: \code{method = "standard"}}

\item{dst}{if \code{dst = TRUE}, \code{y} is treated as a distance matrix. 
This argument is only available when \code{method = "standard"}, \code{method = "pvalue"} 
or \code{method = "interaction"}. Default: \code{dst = FALSE}}

\item{parms}{a list of parameters, only available when \code{method = "lm"} or \code{"gam"}. 
It contains three parameters: \code{d1}, \code{d2}, and \code{df}. \code{d1} is the
number of initially selected variables, and \code{d2} is the number of variables added in each iteration.
\code{df} is the degrees of freedom of the basis in generalized additive models 
and only plays a role when \code{method = "gam"}. Default: \code{parms = list(d1 = 5, d2 = 5, df = 3)}}

\item{R}{the number of replications. This argument is only available when \code{method = "pvalue"}. Default: \code{R = 99}}

\item{seed}{the random seed. This argument is only available when \code{method = "pvalue"}. Default: \code{seed = 4}}
}
\value{
\item{\code{ix}}{the vector of indices selected by the ball correlation sure independence screening procedure.}
}
\description{
Generic non-parametric sure independence screening procedure based on ball correlation.
Ball correlation is a generic multivariate measure of dependence in Banach space.
}
\details{
\code{bcorsis} implements a model-free generic screening procedure, 
BCor-SIS, with fewer and less restrictive assumptions. 
The sample sizes (number of rows or length of the vector) of the 
two variables \code{x} and \code{y} must agree, 
and samples must not contain missing values. 

The BCor-SIS procedure for censored responses is carried out when \code{method = "survival"}. In that case, 
the matrix or data.frame passed to argument \code{y} must have exactly two columns: the first column is the 
event (failure) time and the second column is the censoring status, a dichotomous variable. 
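
As a minimal sketch of that two-column layout (the object names \code{time}, \code{status} and \code{y_surv} are hypothetical, the simulated values are purely illustrative, and coding the status as 0/1 is an assumption, since only a dichotomous variable is required here), a response for \code{method = "survival"} could be assembled as:

\preformatted{
n <- 150
time <- rexp(n)                 # hypothetical event (failure) times
status <- rbinom(n, 1, 0.7)     # hypothetical dichotomous censoring status (assumed 0/1)
y_surv <- cbind(time, status)   # column 1: time, column 2: status
dim(y_surv)                     # n rows and exactly two columns
# With an n-row predictor matrix x, the call would then be:
# bcorsis(x = x, y = y_surv, method = "survival")
}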

If \code{dst = TRUE}, argument \code{y} is treated as a distance matrix; 
otherwise \code{y} is treated as data.

BCor-SIS is based on a recently developed universal dependence measure: Ball correlation (BCor). 
BCor efficiently measures the dependence between two random vectors. It takes values between 
0 and 1, and equals 0 if and only if the two random vectors are independent, under some mild conditions.
(See the manual page for \code{\link{bcor}}.)

Theory and numerical results indicate that BCor-SIS has the following advantages:

(i) It has a strong screening consistency property without requiring finite sub-exponential moments of the data.
Consequently, even when the dimensionality is of an exponential order of the sample size, BCor-SIS is still 
almost surely able to retain the important variables.

(ii) It is nonparametric and robust.

(iii) It works well for complex responses and/or predictors, such as shape or survival data.

(iv) It can extract important features even when the underlying model is complicated.

See Pan (2017) for the theoretical properties of BCor-SIS, including statistical consistency.
}
\examples{
\dontrun{

############### Quick Start for bcorsis function ###############
set.seed(1)
n <- 150
p <- 3000
x <- matrix(rnorm(n * p), nrow = n)
error <- rnorm(n)
y <- 3*x[, 1] + 5*(x[, 3])^2 + error
res <- bcorsis(y = y, x = x)
head(res[[1]])
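
## A hedged sketch of the hard cutoff sizes described for argument d,
## reusing n, x and y from the Quick Start above. The formulas are taken
## from the argument description; how bcorsis rounds internally is not
## verified here, and the names d_small, d_large, res_top10 are illustrative.
d_small <- floor(n / log(n))  # size implied by d = "small"
d_large <- n - 1              # size implied by d = "large"
c(small = d_small, large = d_large)
## An explicit integer cutoff can be passed instead:
res_top10 <- bcorsis(y = y, x = x, d = 10)
res_top10[["ix"]]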

############### BCor-SIS: Censored Data Example ###############
data("genlung")
result <- bcorsis(x = genlung[["covariate"]], y = genlung[["survival"]], 
                  method = "survival")$ix
top_gene <- colnames(genlung[["covariate"]])[result]
head(top_gene, n = 1)


############### BCor-SIS: Interaction Pursuing ###############
set.seed(1)
n <- 150
p <- 3000
x <- matrix(rnorm(n * p), nrow = n)
error <- rnorm(n)
y <- 3*x[, 1]*x[, 5]*x[, 10] + error
res <- bcorsis(y = y, x = x, method = "interaction")
head(res[[1]])

############### BCor-SIS: Iterative Method ###############
library(mvtnorm)
set.seed(1)
n <- 150
p <- 3000
sigma_mat <- matrix(0.5, nrow = p, ncol = p)
diag(sigma_mat) <- 1
x <- rmvnorm(n = n, sigma = sigma_mat)
error <- rnorm(n)
rm(sigma_mat); gc(reset = TRUE)
y <- 3*(x[, 1])^2 + 5*(x[, 2])^2 + 5*x[, 8] - 8*x[, 16] + error
res <- bcorsis(y = y, x = x, method = "gam", d = 15)
res[[1]]
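
## A hedged variant of the iterative procedure: method = "lm" with the
## parms list spelled out, reusing x and y from above. The values are the
## documented defaults; df is included for completeness but, per the
## argument description, only matters when method = "gam".
## The name res_lm is illustrative.
res_lm <- bcorsis(y = y, x = x, method = "lm", d = 15,
                  parms = list(d1 = 5, d2 = 5, df = 3))
res_lm[["ix"]]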
}
}
\seealso{
\code{\link{bcor}}
}
\author{
WenLiang Pan, WeiNan Xiao, XueQin Wang, HePing Zhang, HongTu Zhu
}
