Coalescent class documentation

Note

Using the coalescent model is optional. When running via the command line this class will only be initialized when the flag

--coalescent <arg>

is used. The argument is either ‘const’ (have TreeTime estimate a constant coalescence rate), or ‘skyline’ (estimate a piece-wise linear merger rate trajectory) or a floating point number giving the time scale of coalescence in units of divergence (Tc). This is also called the effective population size.

class treetime.Coalescent(tree, Tc=0.001, logger=None, date2dist=None, n_branches_posterior=False)[source]

Class for adding the Kingman coalescent model to the tree time inference, this is optional. The coalescent model is based on the idea that certain tree structures are more likely given a specific population structure and this likelihood can be added to divergence time inference. The coalescent depends on the effective population size (\(Tc\)) and the number of lineages at any point in time \(k(t)\).

__init__(tree, Tc=0.001, logger=None, date2dist=None, n_branches_posterior=False)[source]

Initialize \(k(t)\) and \(Tc\) functions

Parameters:

Tc (float) – time scale of coalescence / effective population size
n_branches_posterior (boolean) – True if the uncertainty of the divergence time estimates should be taken into consideration when calculating the number of lineages function \(k(t)\), False if current divergence times should be seen as fixed (default). Using the uncertainty should make \(k(t)\) more smooth.

Parameters of the Kingsman coalescence model

Coalescent.branch_merger_rate(t)[source]: rate at which one particular branch merges with any other branch at time t, in the Kingman model this is: \(\kappa(t) = (k(t)-1)/(2Tc(t))\)

Coalescent.total_merger_rate(t)[source]: rate at which any branch merges with any other branch at time t, in the Kingman model this is: \(\lambda(t) = k(t)(k(t)-1)/(2Tc(t))\)

Coalescent.cost(t_node, branch_length, multiplicity=2.0)[source]

returns the cost associated with a branch starting with divergence time t_node (\(t_n\)) having a branch length \(\tau\). This is equivalent to the probability of there being no merger on that branch and a merger at the end of the branch, calculated in the negative log \(-log(\lambda(t_n+ \tau)^{(m-1)/m}) + \int_{t_n}^{t_n+ \tau} \kappa(t) dt\), where m is the multiplicity

Parameters:

t_node (float) – time of the node
branch_length (float) – branch length, determines when this branch merges with sister
multiplicity (int) – 2 if merger is binary, higher if this is a polytomy

Skyline Methods

Coalescent.optimize_skyline(n_points=20, stiffness=2.0, method='SLSQP', tol=0.03, regularization=10.0, **kwarks)[source]

optimize the trajectory of the clock rate 1./T_c to maximize the coalescent likelihood, this is the product of the cost of coalescence of all nodes

Parameters:

n_points (int) – number of pivots of the Tc interpolation object
stiffness (float) – penalty for rapid changes in log(Tc)
methods (str) – method used to optimize, see documentation of scipy.optimize.minimize for options
tol (float) – optimization tolerance
regularization (float) – cost of moving log(Tc) outsize of the range [-100,0] merger rate is measured in branch length units, no plausible rates should ever be outside this window

Coalescent.skyline_empirical(gen=1.0, n_points=20)[source]

returns the skyline, i.e., an estimate of the inverse rate of coalesence. Here, the skyline is estimated from a sliding window average of the observed mergers, i.e., without reference to the coalescence likelihood.

Parameters:

gen (float) – number of generations per year
n_points (int) –

Coalescent.skyline_inferred(gen=1.0, confidence=False)[source]

return the skyline, i.e., an estimate of the inverse rate of coalesence. This function merely returns the merger rate self.Tc that was set or estimated by other means. If it was determined using self.optimize_skyline, the returned skyline will maximize the coalescent likelihood.

Parameters:

gen (float) – number of generations per year. Unit of time is branch length, hence this needs to be the inverse substitution rate per generation
confidence (boolean, float) – False, or number of standard deviations of confidence intervals