Sequence Utilities

treetime.seq_utils.seq2array(seq, word_length=1, convert_upper=False, fill_overhangs=False, ambiguous='N')

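Convert a raw sequence to a numpy array of characters. If fill_overhangs is True, the “overhanging” gaps at the ends of the sequence, which usually mark positions that were not sequenced, are replaced with the ambiguous character (‘N’ by default).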

Parameters:

seq (Biopython.SeqRecord, str, iterable) – Sequence given as a SeqRecord object, a string, or an iterable
word_length (int, optional) – Number of characters per state: 1 for nucleotides or amino acids, 3 for codons, etc.
convert_upper (bool, optional) – If True, convert the sequence to upper case
fill_overhangs (bool, optional) – If True, substitute the “overhanging” gaps with the ambiguous character
ambiguous (char, optional) – Character for the ambiguous state (default ‘N’, appropriate for nucleotides)

Returns:

sequence – Sequence as 1D numpy array of chars

Return type:

np.array
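
As a rough illustration of the overhang handling described above, the following dependency-free sketch replaces only the leading and trailing gaps and leaves internal gaps untouched. The helper name fill_overhangs_sketch and the gap character ‘-’ are assumptions made for this example; it uses plain Python lists instead of numpy arrays and is not the library's implementation.

```python
def fill_overhangs_sketch(seq, gap="-", ambiguous="N", convert_upper=False):
    """Hypothetical stand-in: replace leading/trailing gaps with `ambiguous`."""
    chars = list(seq.upper() if convert_upper else seq)
    # Indices of positions that hold a real (non-gap) character.
    non_gap = [i for i, c in enumerate(chars) if c != gap]
    if not non_gap:
        # Only gaps present: treat every position as ambiguous.
        return [ambiguous] * len(chars)
    first, last = non_gap[0], non_gap[-1]
    # Gaps between `first` and `last` are kept as they are.
    return (
        [ambiguous] * first
        + chars[first:last + 1]
        + [ambiguous] * (len(chars) - last - 1)
    )


result = fill_overhangs_sketch("--ac-gt---", convert_upper=True)
print("".join(result))  # NNAC-GTNNN
```

The internal gap between ‘C’ and ‘G’ survives because it lies between sequenced positions and may represent a real deletion.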

treetime.seq_utils.seq2prof(seq, profile_map)

Convert the given character sequence into a profile, using profile_map to translate each character of the alphabet into its profile vector.

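Parameters:

seq (numpy.array) – Sequence to be converted to a profile
profile_map (dict) – Mapping from valid sequence characters to their profiles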

Returns:

idx – Profile for each character. Zero array if the character is not found

Return type:

numpy.array
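
The mapping step can be pictured with a small sketch. The helper seq2prof_sketch, the four-letter profile_map, and the treatment of ‘N’ as compatible with every state are all assumptions chosen for illustration. The zero-row fallback follows the description above, and the actual function may handle unknown characters differently.

```python
# Hypothetical character-to-profile mapping for a four-letter alphabet (A, C, G, T).
profile_map = {
    "A": (1.0, 0.0, 0.0, 0.0),
    "C": (0.0, 1.0, 0.0, 0.0),
    "G": (0.0, 0.0, 1.0, 0.0),
    "T": (0.0, 0.0, 0.0, 1.0),
    "N": (1.0, 1.0, 1.0, 1.0),  # ambiguous: compatible with every state
}


def seq2prof_sketch(seq, profile_map):
    """Illustrative only: one profile row per character, zeros if unknown."""
    width = len(next(iter(profile_map.values())))
    zero = (0.0,) * width
    return [list(profile_map.get(c, zero)) for c in seq]


prof = seq2prof_sketch("ACNX", profile_map)
for row in prof:
    print(row)
```

For a sequence of length L and an alphabet of size a, the result has L rows of a values each, which is the (L x a) profile shape used elsewhere in this module.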

treetime.seq_utils.prof2seq(profile, gtr, sample_from_prof=False, normalize=True, rng=None)

Convert profile to sequence and normalize profile across sites.

Parameters:

profile (numpy 2D array) – Profile of shape (L x a), where L is the sequence length and a is the alphabet size.
gtr (gtr.GTR) – Instance of the GTR class to supply the sequence alphabet
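sample_from_prof (bool, optional) – If True, sample each character from the profile instead of choosing the most probable character at each site
normalize (bool, optional) – If True, normalize the profile at each site before choosing characters
rng (optional) – Random number generator used when sampling from the profile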

Returns:

seq (numpy.array) – Sequence as numpy array of length L
prof_values (numpy.array) – Values of the profile for the chosen sequence characters (length L)
idx (numpy.array) – Indices chosen from profile as array of length L
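
To make the three return values concrete, here is a minimal sketch of the non-sampling case: each site is normalized and the most probable character is chosen. The helper prof2seq_sketch is hypothetical, takes a plain alphabet string in place of a GTR instance, uses lists instead of numpy arrays, and leaves out the sample_from_prof option.

```python
def prof2seq_sketch(profile, alphabet, normalize=True):
    """Illustrative only: pick the highest-valued character at every site."""
    seq, prof_values, idx = [], [], []
    for row in profile:
        total = sum(row)
        if normalize and total > 0:
            # Rescale so the values at this site sum to one.
            row = [v / total for v in row]
        # Index of the largest value; ties resolve to the first occurrence.
        best = max(range(len(row)), key=row.__getitem__)
        idx.append(best)
        seq.append(alphabet[best])
        prof_values.append(row[best])
    return seq, prof_values, idx


alphabet = "ACGT"  # assumed alphabet order for this example
profile = [
    [2.0, 1.0, 1.0, 0.0],
    [1.0, 3.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
]
seq, prof_values, idx = prof2seq_sketch(profile, alphabet)
print(seq)          # ['A', 'C', 'G']
print(prof_values)  # [0.5, 0.75, 0.5]
print(idx)          # [0, 1, 2]
```

The last site is a tie between ‘G’ and ‘T’; this sketch keeps the first maximum, whereas sampling from the profile would pick either character with equal probability.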