Sequence Utilities

treetime.seq_utils.seq2array(seq, word_length=1, convert_upper=False, fill_overhangs=False, ambiguous='N')

Convert a raw sequence to a numpy array of characters. If fill_overhangs is True, the “overhanging” gaps at either end of the sequence (positions that were not sequenced) are replaced with the ambiguous character (‘N’ by default).

Parameters
  • seq (Biopython.SeqRecord, str, iterable) – Sequence as an object of SeqRecord, string or iterable

  • word_length (int, optional) – 1 for nucleotide or amino acids, 3 for codons etc.

  • convert_upper (bool, optional) – Convert the sequence to upper case

  • fill_overhangs (bool, optional) – If True, substitute the “overhanging” gaps with the ambiguous character symbol

  • ambiguous (char, optional) – Character used for the ambiguous state (default ‘N’, for nucleotides)

Returns

sequence – Sequence as 1D numpy array of chars

Return type

np.array
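The overhang substitution described above can be sketched as follows. This is a minimal illustration, not TreeTime's implementation: seq_to_array_sketch is a hypothetical name, it handles only word_length=1, and it assumes that "overhanging" means the run of '-' characters before the first and after the last non-gap position.

```python
import numpy as np

def seq_to_array_sketch(seq, convert_upper=False, fill_overhangs=False, ambiguous="N"):
    """Hypothetical stand-in for seq2array (word_length=1 only)."""
    seq_str = str(seq)
    if convert_upper:
        seq_str = seq_str.upper()

    # One array element per character
    arr = np.array(list(seq_str), dtype="U1")

    if fill_overhangs:
        # Positions that hold a real character rather than a gap
        non_gap = np.where(arr != "-")[0]
        if len(non_gap):
            arr[:non_gap[0]] = ambiguous        # leading overhang
            arr[non_gap[-1] + 1:] = ambiguous   # trailing overhang
        else:
            arr[:] = ambiguous                  # sequence consists of gaps only
    return arr

result = seq_to_array_sketch("--acg-t--", convert_upper=True, fill_overhangs=True)
print("".join(result))
```

The internal gap between G and T is left untouched; only the leading and trailing runs are replaced.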

treetime.seq_utils.seq2prof(seq, profile_map)

Convert the given character sequence into a profile, using the supplied mapping from characters to profiles.

Parameters
  • seq (numpy.array) – Sequence to be converted to the profile

  • profile_map (dict) – Mapping of valid characters to profiles

Returns

idx – Profile for each character of the sequence. A zero array is used if a character is not found in profile_map

Return type

numpy.array
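As a rough sketch of the character-to-profile lookup, the example below builds a small nucleotide profile map and stacks one row per sequence character. The map contents (including 'N' mapping to all ones) and the name seq_to_profile_sketch are assumptions made for illustration; the zero-row fallback follows the return description above.

```python
import numpy as np

# Hypothetical profile map over the states (A, C, G, T)
profile_map = {
    "A": np.array([1.0, 0.0, 0.0, 0.0]),
    "C": np.array([0.0, 1.0, 0.0, 0.0]),
    "G": np.array([0.0, 0.0, 1.0, 0.0]),
    "T": np.array([0.0, 0.0, 0.0, 1.0]),
    "N": np.ones(4),  # assumed: ambiguous state is compatible with every base
}

def seq_to_profile_sketch(seq, profile_map):
    """Hypothetical stand-in for seq2prof: one profile row per character."""
    n_states = len(next(iter(profile_map.values())))
    zero = np.zeros(n_states)
    # Characters missing from the map fall back to a zero row
    return np.array([profile_map.get(c, zero) for c in seq])

seq = np.array(list("ACNX"))
prof = seq_to_profile_sketch(seq, profile_map)
print(prof.shape)  # one row per character, one column per state
```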

treetime.seq_utils.prof2seq(profile, gtr, sample_from_prof=False, normalize=True)

Convert profile to sequence and normalize profile across sites.

Parameters
  • profile (numpy 2D array) – Profile. The shape of the profile should be (L x a), where L is the sequence length and a is the alphabet size.

  • gtr (gtr.GTR) – Instance of the GTR class to supply the sequence alphabet

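  • sample_from_prof (bool, optional) – If True, sample the character at each site from the profile instead of taking the most likely state

  • normalize (bool, optional) – Whether to normalize the profile so that the values at each site sum to one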

Returns

  • seq (numpy.array) – Sequence as numpy array of length L

  • prof_values (numpy.array) – Values of the profile for the chosen sequence characters (length L)

  • idx (numpy.array) – Indices chosen from profile as array of length L
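The conversion from profile to sequence can be sketched as below. This is an illustration under stated assumptions rather than the library's code: profile_to_seq_sketch is a hypothetical name, a plain numpy array of characters stands in for the alphabet that a GTR instance would supply, normalize is taken to mean row-wise normalization, and sample_from_prof is taken to mean drawing from the cumulative distribution at each site.

```python
import numpy as np

# Stand-in for the alphabet supplied by a GTR instance (assumption)
alphabet = np.array(["A", "C", "G", "T"])

def profile_to_seq_sketch(profile, alphabet, sample_from_prof=False,
                          normalize=True, rng=None):
    """Hypothetical stand-in for prof2seq."""
    profile = np.asarray(profile, dtype=float)
    if normalize:
        # Make the values at each site (row) sum to one
        profile = profile / profile.sum(axis=1, keepdims=True)

    if sample_from_prof:
        rng = np.random.default_rng() if rng is None else rng
        # Row-wise cumulative distribution, rescaled so the last entry is 1
        cum = profile.cumsum(axis=1)
        cum = cum / cum[:, -1:]
        draws = rng.random(size=(profile.shape[0], 1))
        # First state whose cumulative probability reaches the draw
        idx = (cum >= draws).argmax(axis=1)
    else:
        # Most likely state at each site
        idx = profile.argmax(axis=1)

    seq = alphabet[idx]
    prof_values = profile[np.arange(profile.shape[0]), idx]
    return seq, prof_values, idx

profile = [[0.7, 0.1, 0.1, 0.1],
           [1.0, 1.0, 6.0, 2.0],
           [0.0, 0.0, 0.0, 5.0]]
seq, prof_values, idx = profile_to_seq_sketch(profile, alphabet)
print("".join(seq), idx)
```

With the default arguments each site resolves to its highest-valued state, and prof_values holds the normalized value of that state.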