General Considerations (Sequence Data)

The sequence data must consist of two or more sequences of equal length. All sequences must be aligned; you may use the built-in alignment system for this purpose. Nucleotide and amino acid sequences should be written in IUPAC single-letter codes, in any combination of upper- and lower-case letters. Special symbols for alignment gaps, missing data, and identical sites can also be included in the sequences.
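
The length requirement can be checked before a file is loaded. The following Python sketch is an illustration only and is not part of MEGA; the function name check_alignment and the dictionary layout (sequence name mapped to sequence string) are assumptions made for the example.

```python
def check_alignment(sequences):
    """Return the shared alignment length, or raise ValueError.

    `sequences` maps a sequence name to its aligned sequence string.
    Hypothetical helper for illustration; not a MEGA function.
    """
    if len(sequences) < 2:
        raise ValueError("need two or more sequences")
    lengths = {name: len(seq) for name, seq in sequences.items()}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"sequences differ in length: {lengths}")
    # All lengths are identical, so return any one of them.
    return next(iter(lengths.values()))


# Mixed upper- and lower-case letters do not affect the length check.
aligned = {"seq1": "ATGC-A", "seq2": "atgcca"}
print(check_alignment(aligned))
```

The check counts characters only; it does not verify that the letters are valid IUPAC codes.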

Special Symbols

Blank spaces and tabs are frequently used to format data files, so MEGA simply ignores them. The ASCII characters period (.), dash (-), and question mark (?) are generally used as special symbols to represent identity to the first sequence, alignment gaps, and missing data, respectively.
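
As an illustration of these conventions, the Python sketch below strips blanks and tabs and replaces each identity symbol with the corresponding character from the first sequence. It is not MEGA's own code; the function name normalize is hypothetical, and the sketch assumes that the sequences have equal length once whitespace is removed and that the period is the identity symbol.

```python
def normalize(sequences):
    """Strip blanks/tabs and expand '.' using the first sequence.

    `sequences` is a list of strings; the first entry is the reference.
    Hypothetical helper for illustration; not a MEGA function.
    """
    # Blank spaces and tabs are formatting only, so drop them.
    cleaned = [s.replace(" ", "").replace("\t", "") for s in sequences]
    first = cleaned[0]
    result = [first]
    for seq in cleaned[1:]:
        # '.' means "same as the first sequence at this site";
        # '-' (gap) and '?' (missing data) are kept unchanged.
        expanded = "".join(
            first[i] if ch == "." else ch for i, ch in enumerate(seq)
        )
        result.append(expanded)
    return result


print(normalize(["ATG CCA", "..A -.?"]))  # ['ATGCCA', 'ATA-C?']
```

Gap and missing-data symbols pass through unchanged, so later processing can still tell them apart from ordinary bases.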