Updated: 2024-11-21
COASTSS runs a coding-variant allelic series test
starting from standard summary statistics. COASTSS is not
identical to the test provided by COAST, as some components
of the original test could not be calculated from standard summary
statistics. Nonetheless, both methods behave similarly and provide
consistent results in large samples.
The function CalcSumstats can be used to calculate the
required summary statistics. The essential inputs are the annotation
vector anno, the subject by variant genotype matrix
geno, and the phenotype vector pheno. If
covariates covar are not provided, an intercept-only
covariate matrix is adopted by default. If covariates are provided, an
intercept should be included as necessary. For additional details on the
data generating process DGP, see the
data_generation vignette.
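As a rough illustration of what "standard summary statistics" means here, the sketch below regresses the phenotype on each variant in turn, adjusting for covariates, and collects the per-variant effect size, standard error, and p-value. It also forms a MAF vector and an LD matrix. It uses base R on simulated toy data; the helper name per_variant_sumstats is hypothetical, and this is not necessarily how CalcSumstats is implemented.

```r
# Toy data: 500 subjects, 5 variants (all names here are illustrative).
set.seed(1)
n <- 500
geno <- matrix(rbinom(n * 5, size = 2, prob = 0.05), nrow = n)
covar <- cbind(1, rnorm(n))  # Intercept plus one covariate.
pheno <- as.numeric(0.3 * geno[, 1] + covar %*% c(1, 0.5) + rnorm(n))

# Hypothetical helper: marginal OLS fit for each variant.
per_variant_sumstats <- function(geno, covar, pheno) {
  fits <- lapply(seq_len(ncol(geno)), function(j) {
    # covar already contains the intercept, so suppress the default one.
    fit <- stats::lm(pheno ~ 0 + geno[, j] + covar)
    coefs <- summary(fit)$coefficients
    data.frame(
      beta = coefs[1, "Estimate"],
      se = coefs[1, "Std. Error"],
      p = coefs[1, "Pr(>|t|)"]
    )
  })
  do.call(rbind, fits)
}

toy <- per_variant_sumstats(geno, covar, pheno)
toy_maf <- colMeans(geno) / 2  # Frequency of the coded allele per variant.
toy_ld <- stats::cor(geno)     # Genotype correlation (LD) matrix.
```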
withr::local_seed(101)
# Generate data.
n <- 1e4
data <- AllelicSeries::DGP(
n = n,
snps = 300,
beta = c(1, 4, 9) / sqrt(n)
)
# Generate summary statistics.
sumstats <- AllelicSeries::CalcSumstats(
anno = data$anno,
covar = data$covar,
geno = data$geno,
pheno = data$pheno
)

The output sumstats is a list containing:

- anno, the (snps x 1) annotation vector.
- ld, a (snps x snps) LD (genotype correlation) matrix.
- maf, a (snps x 1) minor allele frequency vector.
- sumstats, a (snps x 4) data.frame including the annotations,
  effect sizes beta, standard errors se, and p-values p.

The required inputs to COASTSS are the annotation vector
anno along with the per-variant effect sizes
beta and standard errors se. Ideally, the
in-sample ld matrix is also provided. If the LD matrix is
not provided, an identity matrix is assumed. This approximation is
reasonable when the LD is minimal, as is expected among rare variants.
However, it may break down if variants with a sufficiently high minor
allele count are included in the analysis. If available, we recommend always
providing the in-sample LD matrix. The minor allele frequencies
maf are optionally provided to allow the allelic SKAT test
to up-weight rarer variants.
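To see why the identity approximation matters, consider a generic weighted burden-type statistic built from per-variant z-scores z = beta / se. Under the null, z has covariance equal to the LD matrix R, so the variance of w'z is w'Rw; assuming an identity matrix replaces this with w'w. The sketch below uses made-up numbers and a hypothetical helper burden_pval; it illustrates the general principle rather than the exact computation inside COASTSS.

```r
# Hypothetical helper: 1-df burden-type test from z-scores and LD.
burden_pval <- function(beta, se, w, ld = NULL) {
  z <- beta / se
  if (is.null(ld)) {
    ld <- diag(length(z))  # Identity: variants treated as uncorrelated.
  }
  stat <- sum(w * z)^2 / as.numeric(t(w) %*% ld %*% w)
  stats::pchisq(stat, df = 1, lower.tail = FALSE)
}

# Made-up summary statistics for 3 variants.
beta <- c(0.10, 0.12, 0.08)
se <- c(0.05, 0.05, 0.05)
w <- c(1, 1, 1)

# Positively correlated variants.
ld <- matrix(0.5, nrow = 3, ncol = 3)
diag(ld) <- 1

p_ld <- burden_pval(beta, se, w, ld = ld)
p_identity <- burden_pval(beta, se, w)
```

In this toy case the variants are positively correlated, so ignoring the LD understates the variance of the statistic and p_identity is smaller than p_ld.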
# COAST-SS, with LD and MAF provided.
full <- AllelicSeries::COASTSS(
anno = sumstats$sumstats$anno,
beta = sumstats$sumstats$beta,
se = sumstats$sumstats$se,
maf = sumstats$sumstats$maf,
ld = sumstats$ld
)
show(full)
#> Effect Sizes:
#>   test beta    se
#> 1 base 0.01 0.008
#> 2 base 0.03 0.010
#> 3 base 0.11 0.020
#> 4  sum 0.02 0.003
#>
#>
#> P-values:
#>           test   type     pval
#> 1     baseline burden 2.01e-10
#> 2    sum_count burden 2.99e-10
#> 3 allelic_skat   skat 4.94e-07
#> 4         omni   omni 4.81e-10
# COAST-SS, with LD and MAF omitted.
minimal <- AllelicSeries::COASTSS(
anno = sumstats$sumstats$anno,
beta = sumstats$sumstats$beta,
se = sumstats$sumstats$se
)
#> Warning in CheckInputsSS(anno = anno, beta = beta, se = se, lambda = lambda, :
#> If LD is not provided, an identity matrix is assumed. This may not be accurate
#> in cases where the LD is appreciable.
show(minimal)
#> Effect Sizes:
#>   test beta    se
#> 1 base 0.01 0.008
#> 2 base 0.03 0.010
#> 3 base 0.11 0.020
#> 4  sum 0.02 0.003
#>
#>
#> P-values:
#>           test   type     pval
#> 1     baseline burden 2.62e-10
#> 2    sum_count burden 3.05e-10
#> 3 allelic_skat   skat 1.86e-08
#> 4         omni   omni 5.56e-10

By default, COASTSS, like COAST, uses a
simple linear weighting scheme of weights = c(1, 2, 3).
Here, the data were simulated with effect sizes proportional to
c(1, 4, 9), a quadratic rather than linear progression. By changing the weighting scheme of
COASTSS to match the generative model, we can improve
power.
# COAST-SS, alternate weights.
results <- AllelicSeries::COASTSS(
anno = sumstats$sumstats$anno,
beta = sumstats$sumstats$beta,
se = sumstats$sumstats$se,
maf = sumstats$sumstats$maf,
ld = sumstats$ld,
weights = c(1, 4, 9)
)
show(results)
#> Effect Sizes:
#>   test beta    se
#> 1 base 0.01 0.008
#> 2 base 0.03 0.010
#> 3 base 0.11 0.020
#> 4  sum 0.01 0.002
#>
#>
#> P-values:
#>           test   type     pval
#> 1     baseline burden 2.01e-10
#> 2    sum_count burden 1.03e-11
#> 3 allelic_skat   skat 3.82e-08
#> 4         omni   omni 3.91e-11

COAST and COASTSS were originally designed
to operate on the benign missense variants, damaging missense variants,
and protein truncating variants within a gene. Both have been
generalized to allow for an arbitrary number of discrete annotation
categories. The following example simulates and analyzes data with 4
annotation categories. The main difference when analyzing a different
number of annotation categories is that the weight vector
should be specified, and should have length equal to the number of
possible annotation categories. COASTSS will run,
albeit with a warning, if there are possible annotation categories to
which no variants are assigned (e.g. a gene contains no PTVs).
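The relationship between weights and annotation categories can be sketched as a small input check. The helper below is hypothetical and assumes the categories are coded as integers 1 through K, where K is the length of weights; it only illustrates the kind of check described above and is not the package's own validation code.

```r
# Hypothetical check, assuming annotations are coded 1, ..., K.
check_anno_weights <- function(anno, weights) {
  n_cat <- length(weights)
  if (any(!(anno %in% seq_len(n_cat)))) {
    stop("anno contains categories without a corresponding weight.")
  }
  # Count variants per possible category, including empty ones.
  counts <- table(factor(anno, levels = seq_len(n_cat)))
  if (any(counts == 0)) {
    warning("Some annotation categories contain no variants.")
  }
  counts
}

# Four possible categories, but no variants in category 4.
anno <- c(1, 1, 2, 3, 3, 3)
counts <- withCallingHandlers(
  check_anno_weights(anno, weights = c(1, 2, 3, 4)),
  warning = function(w) {
    # Report the warning as a message and continue.
    message("Note: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
```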
withr::local_seed(102)
# Generate data.
n <- 1e4
data <- AllelicSeries::DGP(
n = n,
snps = 400,
beta = c(1, 2, 3, 4) / sqrt(n),
prop_anno = c(0.4, 0.3, 0.2, 0.1),
weights = c(1, 1, 1, 1)
)
# Generate summary statistics.
sumstats <- AllelicSeries::CalcSumstats(
anno = data$anno,
covar = data$covar,
geno = data$geno,
pheno = data$pheno
)
# Run COAST-SS.
results <- AllelicSeries::COASTSS(
anno = sumstats$sumstats$anno,
beta = sumstats$sumstats$beta,
se = sumstats$sumstats$se,
maf = sumstats$sumstats$maf,
ld = sumstats$ld,
weights = c(1, 2, 3, 4)
)
show(results)
#> Effect Sizes:
#>   test beta    se
#> 1 base 0.00 0.008
#> 2 base 0.02 0.009
#> 3 base 0.02 0.010
#> 4 base 0.06 0.015
#> 5  sum 0.01 0.002
#>
#>
#> P-values:
#>           test   type     pval
#> 1     baseline burden 5.22e-05
#> 2    sum_count burden 4.80e-06
#> 3 allelic_skat   skat 3.77e-04
#> 4         omni   omni 1.72e-05

eps is a regularization term added to the diagonal
of the LD matrix if the provided LD matrix is not positive definite. The
default value is eps = 1. Larger values may be needed, but
smaller values are not recommended.
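A minimal sketch of this kind of regularization, in base R: test positive definiteness via the smallest eigenvalue and, if the test fails, add eps to the diagonal. The helper name regularize_ld and the eigenvalue-based check are assumptions made for illustration; the package may detect and handle non-positive-definite matrices differently.

```r
# Hypothetical helper: add eps to the diagonal only when needed.
regularize_ld <- function(ld, eps = 1) {
  min_eig <- min(eigen(ld, symmetric = TRUE, only.values = TRUE)$values)
  if (min_eig <= sqrt(.Machine$double.eps)) {
    ld <- ld + eps * diag(nrow(ld))
  }
  ld
}

# Two perfectly correlated variants give a singular LD matrix.
ld <- matrix(1, nrow = 2, ncol = 2)
ld_reg <- regularize_ld(ld, eps = 1)
```

After regularization the diagonal entries are 2 and the off-diagonal entries remain 1, so the matrix is positive definite. A matrix that is already positive definite is returned unchanged.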
lambda is an optional 3 x 1 vector of inflation
factors that are applied to the p-values of the baseline,
sum_count, and allelic_skat tests
before the omnibus p-value is calculated. By default,
lambda = c(1, 1, 1), which results in no correction. Larger
values may be needed, particularly if more-common variants are included.
Values less than 1 will be reset to 1.
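One common way to apply an inflation factor to a p-value, in the style of genomic control, is to convert the p-value to a 1-df chi-square statistic, divide by lambda, and convert back. The sketch below assumes this form of correction, which the description above does not spell out, and the helper name apply_lambda is hypothetical. It also mirrors the rule that values below 1 are reset to 1.

```r
# Hypothetical helper: genomic-control-style correction of p-values.
apply_lambda <- function(pvals, lambda = c(1, 1, 1)) {
  lambda <- pmax(lambda, 1)  # Values less than 1 are reset to 1.
  stat <- stats::qchisq(pvals, df = 1, lower.tail = FALSE)
  stats::pchisq(stat / lambda, df = 1, lower.tail = FALSE)
}

# Made-up component p-values.
pvals <- c(baseline = 1e-4, sum_count = 2e-4, allelic_skat = 5e-3)

p_default <- apply_lambda(pvals)  # lambda = c(1, 1, 1): no correction.
p_adjusted <- apply_lambda(pvals, lambda = c(1.1, 1.1, 0.9))
```

Under this assumed form, the first two p-values become larger (less significant), while the third is unchanged because its lambda of 0.9 is reset to 1.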
pval_weights is a 3 x 1 vector specifying the
relative weights of the p-values from the baseline,
sum_count, and allelic_skat tests when
calculating the omnibus p-value. By default,
pval_weights = c(0.25, 0.25, 0.50), which gives the allelic
SKAT test the same total weight as the two burden-type tests combined
(i.e. the baseline and allelic sum tests).
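The role of pval_weights can be illustrated with a weighted Cauchy combination, a common way to form an omnibus p-value from correlated tests. That the omnibus test takes this form is an assumption here, since the description above does not name the combination method, and the helper name cauchy_combine is hypothetical.

```r
# Hypothetical helper: weighted Cauchy combination of p-values.
cauchy_combine <- function(pvals, weights) {
  weights <- weights / sum(weights)  # Normalize to sum to 1.
  stat <- sum(weights * tan((0.5 - pvals) * pi))
  0.5 - atan(stat) / pi
}

# Component p-values from the first COAST-SS example (LD and MAF provided).
pvals <- c(baseline = 2.01e-10, sum_count = 2.99e-10, allelic_skat = 4.94e-07)
p_omni <- cauchy_combine(pvals, weights = c(0.25, 0.25, 0.50))
```

Under this assumption, p_omni comes out at roughly 4.8e-10, in line with the omni p-value reported for that example.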