This vignette provides information about how the package stores and
outputs information about species taxonomy via the taxonomy
object and available options for simulating species taxonomy.
A phylogenetic tree object, in the format used by most software
packages, does not typically contain information about species over
time. A timescaled phylogeny is informative about the minimum number of
co-existing lineages at a given moment but not about the start and end
point of species or the mode of speciation. Three modes of
speciation that have been discussed widely in palaeobiological
literature are shown in Fig. 1. and can be modelled using
FossilSim:
Fig. 1. Three different modes of speciation that can be simulated using FossilSim. Budding and bifurcation are also referred to as asymmetric and symmetric speciation.
The distinction between these processes is important because they have consequences on parameter estimates using different methods.
In molecular phylogenetics, we typically draw trees as fully
bifurcating structures (i.e. each node gives rise to two descendant
branches). This is also how trees are stored in computer memory by most
phylogenetic software packages, including ape. However, it
may not be the case that each internal edge represents a unique species.
Instead, some speciation events may be budding rather than bifurcating
or anagenesis may have occurred at some point along internal edges. To
explore the impact of different speciation modes it is valuable to know
the relationship between species through time (the taxonomy)
and the corresponding tree object.
FossilSim contains options for simulating species
evolution under different speciation modes, described in the section Simulating taxonomy. Information about
the true species taxonomy is stored in the taxonomy object,
which will be associated with the corresponding phylo
object used to simulate species evolution.
The object contains a dataframe detailing species taxonomy, which
contains the following 7 fields for each edge and species in the
corresponding phylo object:
sp the true species identity label. If all species
originated via budding or bifurcation this will always correspond to the
terminal-most edge label (i.e. the youngest node) associated with each
species. This is not the case if the data set also contains anagenetic
species, when multiple species may be associated with a single
edge
edge edge label of the branch in the corresponding
phylo object. Note that some species may be associated with multiple
edges
parent ancestor of species sp. Parent
labels follow the same convention as species. The label assigned to the
parent of the origin or root will be zero
start start time of the corresponding edge and/or
origin time of the species. If the corresponding edge is also the oldest
edge associated with the species this value will equal the species
origination time. If speciation mode is asymmetric or symmetric the
speciation time will match the start time of the corresponding edge. If
speciation mode is anagenetic the speciation time will be younger than
the start time of the corresponding edge
end end time of the corresponding edge and/or end
time of the species. If the corresponding edge is also the youngest edge
associated with the species this value will equal the species end time.
Unless the species end time coincides with an anagenetic speciation
event, the species end time will match the end time of the corresponding
edge. If the species end time coincides with an anagenetic speciation
event, the end time will be older than the end time of the corresponding
edge
mode mode = speciation mode. “o” = origin or “r” =
root (the edge/species that began the process). “b” = asymmetric or
budding speciation. “s” = symmetric or bifurcating speciation. “a” =
anagenetic speciation
A new taxonomy object can be created by passing the
above information to the taxonomy() function.
Cryptic species can also be simulated. Under cryptic speciation the
descendant species will be indistinguishable from the ancestor and the
taxonomy object can incorporate this information using the
additional optional fields:
cryptic TRUE if the speciation event was cryptic. If
this information is not passed to taxonomy(), the function
assumes cryptic = FALSE for all species
cryptic.id cryptic species identity, i.e. ancestral
species that the current species will be identified as. If
cryptic = TRUE, cryptic.id will differ from
the true species identity sp
The relationship between information provided by the
taxonomy object and the corresponding phylo
object will become clear when you start simulating species and exploring
the output.
The taxonomy object can be used as the starting point
for simulating fossils or other downstream analyses that require
information about discrete species units.
Special Note The birth and death rates used as parameters in the birth-death process typically do not correspond to the rates of appearance and extinction of species under the above processes, unless speciation occurs entirely via budding.
Species evolution or taxonomy can be simulated in
FossilSim under a mixed model of speciation that can
incorporate three modes of speciation – budding (or asymmetric),
bifurcating (or symmetric) and anagenetic – in addition to cryptic
speciation (Stadler et al. 2018).
This model of mixed speciation was described previously in (Bapst 2013) and is also available as part of
the paleotree package (Bapst
2012).
In a birth-death process \(\lambda\) is the rate at which branching events occur, while \(\mu\) is the rate at which lineage termination occurs. These are the parameters typically used to simulate birth-death trees (see the code snippet below for an example). The mixed model of speciation includes three additional parameters, \(\beta, \lambda_a\) and \(\kappa\), which are defined as follows:
The relationship between these processes and the underlying tree
structure in the corresponding phylo object is described in
section The taxonomy object.
Simulating species evolution or taxonomy for any user specified tree in
FossilSim is straightforward.
# set the random number generator seed to generate the same results using the same code
set.seed(123)
# simulate a tree using TreeSim conditioned on tip number
lambda = 1
mu = 0.2
tips = 8
t = TreeSim::sim.bd.taxa(n = tips, numbsim = 1, lambda = lambda, mu = mu)[[1]]
# t is an object of class phylo
t
#>
#> Phylogenetic tree with 10 tips and 9 internal nodes.
#>
#> Tip labels:
#> t2, t5, t8, t4, t9, t10, ...
#>
#> Rooted; includes branch lengths.
# use t$edge, t$edge.length, t$root.edge to see the tree attributes
# simulate under complete budding speciation
s = sim.taxonomy(tree = t) # this is equivalent to using the default parameters beta = 0, lambda_a = 0, kappa = 0
# s is an object of class taxonomy
s
#> sp edge parent start end mode cryptic cryptic.id
#> 1 1 1 8 0.38710924 0.00000000 b 0 1
#> 2 2 2 4 0.14461369 0.00000000 b 0 2
#> 3 3 3 6 0.08786013 0.00000000 b 0 3
#> 4 4 14 10 0.84708717 0.52482777 b 0 4
#> 5 4 17 10 0.52482777 0.14461369 b 0 4
#> 6 4 4 10 0.14461369 0.00000000 b 0 4
#> 7 5 5 8 0.10233497 0.00000000 b 0 5
#> 8 6 19 8 0.44503437 0.08786013 b 0 6
#> 9 6 6 8 0.08786013 0.00000000 b 0 6
#> 10 7 7 4 0.52482777 0.00000000 b 0 7
#> 11 8 15 10 1.53730579 0.44503437 b 0 8
#> 12 8 16 10 0.44503437 0.38710924 b 0 8
#> 13 8 18 10 0.38710924 0.10233497 b 0 8
#> 14 8 8 10 0.10233497 0.00000000 b 0 8
#> 15 9 9 10 1.17167856 0.09161934 b 0 9
#> 16 10 11 0 3.34383891 1.53730579 o 0 10
#> 17 10 12 0 1.53730579 1.17167856 o 0 10
#> 18 10 13 0 1.17167856 0.84708717 o 0 10
#> 19 10 10 0 0.84708717 0.34120366 o 0 10
#> Taxonomy representing 10 species across 19 edges.Note that the way sim.taxonomy assigns budding species
is deterministic – at each branching event, the “left” descendant edge
will always be assigned the ancestral species label and the “right”
descendant edge will always be the new species. This means even without
setting the random number seed sim.taxonomy will always
produce the same output when \(\beta =
0\).
FossilSim contains various options for plotting the
package output. The plot.taxonomy function will produce a
plot highlighting species taxonomy along with the corresponding
bifurcating tree object.
Note that R detects what type of object you are trying to plot, in
this case taxonomy, and will automatically call
plot.taxonomy when you apply the function plot
to an object of class taxonomy. Use
?plot.taxonomy to view further options that can be used to
change the appearance of the figure.
# simulate under complete bifurcating speciation
s = sim.taxonomy(tree = t, beta = 1)
plot(s, tree = t, legend.position = "topleft")# simulate under mixed speciation
s = sim.taxonomy(tree = t, beta = 0.5, lambda.a = 1, kappa = 0.1)
plot(s, tree = t, legend.position = "topleft")Note that plot.taxonomy only plots the true species, and
ignores any information about cryptic speciation (i.e. the function uses
the sp labels and ignores cryptic.id).
Given an existing taxonomy object you can also add
anagenetic or cryptic species downstream using the functions
sim.anagenetic.species and
sim.cryptic.species. These functions also return a
taxonomy object.
# simulate taxonomy without anagenetic or cryptic species
s1 = sim.taxonomy(tree = t, beta = 0.5)
# simulate anagenetic species
# note this function also requires the corresponding tree object
s2 = sim.anagenetic.species(tree = t, species = s1, lambda.a = 1)
# simulate cryptic species
s3 = sim.cryptic.species(species = s2, kappa = 0.1)These functions can not be used with taxonomy objects
that already contain anagenetic or cryptic species.
See the paleotree vignette to see how
FossilSim objects can be converted into
paleotree objects.