This document introduces the TaxaNorm R package, a
package for normalizing microbiome taxa data. Here we show
how to install the package and use it to analyze and visualize
microbiome data. TaxaNorm implements a Zero-Inflated Negative
Binomial (ZINB) method to normalize microbiome data.
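Under a ZINB model, each count is treated as either an excess ("structural") zero, with probability pi0, or a draw from a negative binomial distribution with mean mu and dispersion size. The following is a minimal sketch of that generic probability mass function in base R using stats::dnbinom(). It is not TaxaNorm's implementation: the names dzinb, pi0, mu and size are illustrative, and this sketch does not model how TaxaNorm links these parameters to sequencing depth or covariates.

```r
# Generic zero-inflated negative binomial (ZINB) probability mass function.
# Illustrative sketch only, not TaxaNorm's internal code.
# pi0:  probability of an excess (structural) zero
# mu:   mean of the negative binomial component
# size: NB dispersion parameter (variance = mu + mu^2 / size)
dzinb <- function(y, pi0, mu, size) {
  nb <- dnbinom(y, size = size, mu = mu)
  # Zeros come from either the zero-inflation part or the NB part;
  # positive counts can only come from the NB part
  ifelse(y == 0, pi0 + (1 - pi0) * nb, (1 - pi0) * nb)
}

# Example: a taxon with mean count 5, dispersion 2 and 30% excess zeros
y <- 0:10
round(dzinb(y, pi0 = 0.3, mu = 5, size = 2), 4)
```

The zero-inflation term is what lets the model accommodate the many zeros typical of microbiome count tables: P(Y = 0) is always at least pi0, regardless of mu.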
There are three main steps in using this package:
Load and QC Input Data: The package includes an example data set from the phyloseq package that shows the format needed for analysis. These data can be generated using methods …
Running ZINB Normalization Function: The
TaxaNorm_Normalization function is run on the input data
described above. This function implements the ZINB method for
normalization.
Visualization and Quality Control: Finally, the package provides built-in visualization and quality control tools.
TaxaNorm requires the packages phyloseq and
microbiome, which are available from Bioconductor.
For the newest, but potentially unstable, version of the package, direct installation from GitHub is also supported.
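The text names the Bioconductor dependencies but gives no install commands. A minimal sketch, assuming a standard R setup: BiocManager::install() is the usual installer for Bioconductor packages, and remotes::install_github() installs from GitHub. The GitHub repository path is not given in this document, so "<owner>/TaxaNorm" below is a hypothetical placeholder.

```r
# Bioconductor dependencies named in the text
deps <- c("phyloseq", "microbiome")

# Keep only the dependencies that are not already installed
missing_deps <- deps[!vapply(deps, requireNamespace, logical(1), quietly = TRUE)]

if (length(missing_deps) > 0) {
  # BiocManager is the standard installer for Bioconductor packages
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  BiocManager::install(missing_deps)
}

# Development version from GitHub. "<owner>/TaxaNorm" is a hypothetical
# placeholder: replace it with the actual repository path before running.
# remotes::install_github("<owner>/TaxaNorm")
```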
Basic Usage
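library(TaxaNorm)
library(phyloseq) # provides sample_data()
# load the built-in example phyloseq object
data("TaxaNorm_Example_Input", package = "TaxaNorm")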
# run normalization
TaxaNorm_Example_Output <- TaxaNorm_Normalization(data= TaxaNorm_Example_Input,
depth = NULL,
group = sample_data(TaxaNorm_Example_Input)$body_site,
meta.data = NULL,
filter.cell.num = 10,
filter.taxa.count = 0,
random = FALSE,
ncores = 1)
# run diagnosis test
Diagnose_Data <- TaxaNorm_Run_Diagnose(Normalized_Results = TaxaNorm_Example_Output, prev = TRUE, equiv = TRUE, group = sample_data(TaxaNorm_Example_Input)$body_site)

Built-in example data, stored as a phyloseq object, can be loaded with data("TaxaNorm_Example_Input", package = "TaxaNorm"), as shown at the start of the code above.
We provide several QC figures describing characteristics of the input data. These figures give preliminary criteria for pre-filtering rare, low-information taxa before any analysis. Pre-filtering improves the statistical power and computational efficiency of the algorithm. Users who already have cleaned or pre-processed data can skip this step.
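The QC figures themselves are produced by the package; the quantities that drive pre-filtering decisions can also be computed directly. The sketch below is an illustration, not the package's QC functions. It uses a hypothetical toy count matrix (taxa in rows, samples in columns) to compute per-sample sequencing depth and per-taxon prevalence with base R, and draws simple QC plots of both.

```r
# Hypothetical toy count matrix: 5 taxa (rows) x 6 samples (columns)
counts <- matrix(
  c(0, 0, 1, 0, 0, 0,
    5, 8, 0, 12, 3, 7,
    0, 0, 0, 0, 0, 0,
    20, 15, 30, 25, 18, 22,
    1, 0, 2, 0, 0, 1),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("taxon", 1:5), paste0("sample", 1:6))
)

# Per-sample sequencing depth (library size)
depth <- colSums(counts)

# Per-taxon prevalence: number of samples with a non-zero count
prevalence <- rowSums(counts > 0)

# Per-taxon total reads across all samples
total_reads <- rowSums(counts)

# Simple QC plots of the quantities used for pre-filtering
par(mfrow = c(1, 2))
hist(depth, main = "Sequencing depth", xlab = "Reads per sample")
barplot(prevalence, main = "Taxon prevalence",
        ylab = "Samples with count > 0", las = 2)
```

In this toy table, taxon3 is never observed and taxon1 appears in a single sample, so both are natural candidates for removal.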
Here we provide a popular option: keep only taxa with a count of
filter.taxa.count or more in at least
filter.sample.num samples, where
filter.sample.num can be any user-chosen value, such as
the sample size of the smallest group of samples. By default, we use
filter.taxa.count = 1 and filter.sample.num = 10.
This criterion is also incorporated in the main function
TaxaNorm_Normalization().
library(phyloseq)   # prune_taxa()
library(microbiome) # abundances()
filter.sample.num <- 10
filter.taxa.count <- 1
# keep taxa with at least filter.taxa.count reads in at least filter.sample.num samples
taxaIn <- rowSums(abundances(TaxaNorm_Example_Input) >= filter.taxa.count) >= filter.sample.num
TaxaNorm_Example_Input <- prune_taxa(taxaIn, TaxaNorm_Example_Input)
Users can apply their own custom filtering criteria as well. Alternatively, a basic pre-filtering step keeps only taxa (rows) with at least 10 reads in total.
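The two filtering rules keep different taxa, which is easiest to see side by side. This is a minimal base-R sketch on a hypothetical toy count matrix (taxa in rows). The thresholds are shrunk to fit five samples, so they differ from the defaults stated above.

```r
# Hypothetical toy count matrix: 4 taxa (rows) x 5 samples (columns)
counts <- matrix(
  c(0, 0, 3, 0, 0,
    2, 4, 1, 6, 2,
    1, 1, 1, 1, 1,
    0, 12, 0, 0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("taxon", 1:4), paste0("sample", 1:5))
)

# Rule 1 (prevalence): a count of at least filter.taxa.count
# in at least filter.sample.num samples (toy-sized thresholds)
filter.sample.num <- 3
filter.taxa.count <- 1
keep_prevalence <- rowSums(counts >= filter.taxa.count) >= filter.sample.num

# Rule 2 (abundance): at least 10 reads in total across all samples
keep_total <- rowSums(counts) >= 10

counts[keep_prevalence, , drop = FALSE]
counts[keep_total, , drop = FALSE]
```

The prevalence rule keeps taxon3 (low but consistent counts) and drops taxon4 (one large count in a single sample); the total-reads rule does the opposite. For a phyloseq object, the same kind of logical vector can be passed to prune_taxa(), for example prune_taxa(rowSums(abundances(TaxaNorm_Example_Input)) >= 10, TaxaNorm_Example_Input).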
Running the normalization returns a TaxaNorm_Results
object. This object contains the input data, the raw data, the normalized
data (normdata), the empirical CDF (ecdf), the model parameters, and
convergence information.