Sequence Utilities
- treetime.seq_utils.seq2array(seq, word_length=1, convert_upper=False, fill_overhangs=False, ambiguous='N')
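Take the raw sequence and convert it to a NumPy array of characters. If fill_overhangs is True, the “overhanging” gaps at the ends of the sequence are first replaced with the ambiguous character (‘N’ by default), marking them as missequenced.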
- Parameters:
seq (Biopython.SeqRecord, str, iterable) – Sequence as a SeqRecord object, string, or iterable
word_length (int, optional) – 1 for nucleotides or amino acids, 3 for codons, etc.
convert_upper (bool, optional) – Convert the sequence to upper case
fill_overhangs (bool) – If True, substitute the “overhanging” gaps with the ambiguous character symbol
ambiguous (char) – Character used for the ambiguous state (‘N’ by default, for nucleotides)
- Returns:
sequence – Sequence as a 1D NumPy array of characters
- Return type:
np.array
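The overhang handling described above can be illustrated with a small NumPy-only sketch. This is not TreeTime's implementation: the function name `seq_to_array_sketch` is hypothetical, it covers only `word_length=1`, and it assumes that gaps are written as `-` and that "overhanging" means the leading and trailing runs of gaps.

```python
import numpy as np

def seq_to_array_sketch(seq, convert_upper=False, fill_overhangs=False,
                        ambiguous="N", gap="-"):
    """Illustrative stand-in for seq2array (word_length=1 only)."""
    chars = [str(c) for c in seq]
    if convert_upper:
        chars = [c.upper() for c in chars]
    arr = np.array(chars)

    if fill_overhangs and arr.size:
        # positions that hold a real character rather than a gap
        non_gap = np.flatnonzero(arr != gap)
        if non_gap.size == 0:
            # the whole sequence is gaps, so everything is ambiguous
            arr[:] = ambiguous
        else:
            # replace only leading and trailing gaps; internal gaps stay
            arr[:non_gap[0]] = ambiguous
            arr[non_gap[-1] + 1:] = ambiguous
    return arr

result = seq_to_array_sketch("--acg-t---", convert_upper=True, fill_overhangs=True)
print("".join(result))  # NNACG-TNNN
```

Note that the internal gap between `G` and `T` is kept, since only the ends of the sequence are treated as missequenced.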
- treetime.seq_utils.seq2prof(seq, profile_map)
Convert the given character sequence into a profile according to the specified alphabet.
- Parameters:
seq (numpy.array) – Sequence to be converted to the profile
profile_map (dict) – Mapping of valid characters to profiles
- Returns:
idx – Profile for each character of the sequence; a zero array for any character not found in profile_map
- Return type:
numpy.array
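A minimal sketch of the character-to-profile lookup, under stated assumptions: the `profile_map` below is a hypothetical four-state nucleotide map (with ‘N’ mapped to all ones), and `seq_to_profile_sketch` is an illustrative stand-in rather than the library function.

```python
import numpy as np

# Hypothetical profile map for the alphabet A, C, G, T.
# 'N' is ambiguous, so it is compatible with every state.
profile_map = {
    "A": np.array([1.0, 0.0, 0.0, 0.0]),
    "C": np.array([0.0, 1.0, 0.0, 0.0]),
    "G": np.array([0.0, 0.0, 1.0, 0.0]),
    "T": np.array([0.0, 0.0, 0.0, 1.0]),
    "N": np.ones(4),
}

def seq_to_profile_sketch(seq, profile_map):
    """Stack one profile row per character; unknown characters give zeros."""
    n_states = len(next(iter(profile_map.values())))
    zero = np.zeros(n_states)
    return np.array([profile_map.get(str(c), zero) for c in seq])

seq = np.array(list("ACGNX"))
prof = seq_to_profile_sketch(seq, profile_map)
print(prof.shape)  # (5, 4)
```

The last character, `X`, is not in the map, so its row is all zeros.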
- treetime.seq_utils.prof2seq(profile, gtr, sample_from_prof=False, normalize=True, rng=None)
Convert profile to sequence and normalize profile across sites.
- Parameters:
profile (numpy 2D array) – Profile. The shape of the profile should be (L x a), where L is the sequence length and a is the alphabet size.
gtr (gtr.GTR) – Instance of the GTR class to supply the sequence alphabet
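sample_from_prof (bool, optional) – If True, sample the character at each site from the profile instead of taking the most likely one
normalize (bool, optional) – If True, normalize the profile so that the values at each site sum to one
rng (optional) – Random number generator used when sample_from_prof is True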
- Returns:
seq (numpy.array) – Sequence as numpy array of length L
prof_values (numpy.array) – Values of the profile for the chosen sequence characters (length L)
idx (numpy.array) – Indices chosen from profile as array of length L
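The three return values can be illustrated with a NumPy-only sketch. It is an approximation written for this reference, not TreeTime's code: `profile_to_seq_sketch` is a hypothetical name, and it takes the alphabet as a plain array (an assumed stand-in for the alphabet supplied by the GTR instance).

```python
import numpy as np

def profile_to_seq_sketch(profile, alphabet, sample_from_prof=False,
                          normalize=True, rng=None):
    """Illustrative stand-in for prof2seq; `alphabet` replaces the GTR object."""
    profile = np.asarray(profile, dtype=float)
    if normalize:
        # make the values at each site (row) sum to one
        profile = profile / profile.sum(axis=1, keepdims=True)

    if sample_from_prof:
        rng = np.random.default_rng() if rng is None else rng
        # inverse-CDF sampling: first state whose cumulative weight
        # reaches the random draw for that site
        cumulative = profile.cumsum(axis=1)
        draws = rng.random(profile.shape[0])
        idx = (cumulative >= draws[:, None]).argmax(axis=1)
    else:
        # most likely state at each site
        idx = profile.argmax(axis=1)

    seq = alphabet[idx]
    prof_values = profile[np.arange(profile.shape[0]), idx]
    return seq, prof_values, idx

alphabet = np.array(list("ACGT"))  # assumed alphabet, for illustration only
profile = np.array([
    [0.7, 0.1, 0.1, 0.1],
    [1.0, 1.0, 6.0, 2.0],      # unnormalized row
    [0.25, 0.25, 0.25, 0.25],  # tie: argmax picks the first state
])
seq, prof_values, idx = profile_to_seq_sketch(profile, alphabet)
print("".join(seq), idx)
```

With the default settings each site resolves to its highest-weight state, so the example yields the sequence `AGA` with indices `[0, 2, 0]`.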