Computing Sequence Statistics

The “Drosophila_Adh.meg” data file used in this tutorial can be found in the MEGA/Examples folder. The default location is C:\Program Files\MEGA\Examples for Windows users and $HOME/MEGA/Examples for Mac users, where $HOME is the user’s home directory.


Using Sequence Data Explorer

The Sequence Data Explorer provides tools for visually inspecting sequence data and for calculating compositional statistics. The following examples demonstrate its basic usage.

Example 9.1:

Activate the "Drosophila_Adh.meg" file. If necessary, refer to Example 1.2 in the “MEGA Basics” tutorial.

Select the Data | Explore Active Data (F4) command.

Use the arrow keys on your keyboard or the mouse to move from site to site. An indicator at the bottom left corner of the window displays the current column and the total number of sites; the column number changes as you move through the alignment.


Highlighting

At the bottom of the Sequence Data Explorer window, the Highlighted Sites indicator displays "None" because no site attributes have been highlighted yet.

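You can highlight variable sites in three ways: press the V key, click the button labeled V on the shortcut bar below the main menu, or select the Highlight | Variable Sites menu command.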

Example 9.2:

Use one of these methods to highlight the variable sites in the Drosophila data. All variable sites are now highlighted, and the Highlighted indicator at the bottom of the window is replaced by the Variable indicator, which shows the number of variable sites along with the total number of sites (variable sites/total number of sites). Press the V key again to return the sites to their normal color; the Highlighted indicator again displays "None".
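To make the definition concrete, here is a minimal Python sketch of what is being counted: a column is variable if it contains at least two different nucleotides. It is an illustration, not MEGA's own code, and it assumes that gaps, missing data, and ambiguity codes are simply ignored; MEGA's handling of such characters may differ. The toy alignment is hypothetical, not the Drosophila data.

```python
# Sketch: find variable sites in an aligned set of nucleotide sequences.
# Assumption: only A, C, G and T are compared; gaps ("-"), missing data ("?")
# and ambiguity codes are skipped, which may differ from MEGA's handling.

NUCLEOTIDES = set("ACGT")


def variable_sites(alignment: list[str]) -> list[int]:
    """Return 0-based column indices where at least two different nucleotides occur."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    sites = []
    for col in range(length):
        # Distinct nucleotide states observed in this column.
        states = {seq[col].upper() for seq in alignment} & NUCLEOTIDES
        if len(states) >= 2:
            sites.append(col)
    return sites


# Hypothetical toy alignment.
aln = ["ATGCTA", "ATGTTA", "ACGTTG"]
var = variable_sites(aln)
print(f"Variable sites: {len(var)}/{len(aln[0])}")  # Variable sites: 3/6
```

The printed ratio mirrors the "variable sites/total number of sites" format of the Variable indicator.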

Now highlight the parsimony-informative sites by pressing the P key, clicking the button labeled Pi on the shortcut bar below the main menu, or selecting the Highlight | Parsim-Info sites menu command. The Highlighted indicator changes to the Parsim-info indicator.
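A parsimony-informative site is commonly defined as one with at least two different nucleotides, each present in at least two sequences. The Python sketch below applies that definition column by column; it is not MEGA's implementation, and it assumes that only A, C, G, and T are counted. In the hypothetical alignment, the last column is variable but not informative because G occurs only once.

```python
from collections import Counter

NUCLEOTIDES = set("ACGT")


def parsimony_informative_sites(alignment: list[str]) -> list[int]:
    """Return 0-based columns with >= 2 nucleotide types, each seen in >= 2 sequences."""
    sites = []
    for col in range(len(alignment[0])):
        # Count each nucleotide in the column, ignoring gaps and ambiguity codes.
        counts = Counter(
            seq[col].upper() for seq in alignment if seq[col].upper() in NUCLEOTIDES
        )
        # Number of nucleotide types that occur at least twice.
        shared_types = sum(1 for n in counts.values() if n >= 2)
        if shared_types >= 2:
            sites.append(col)
    return sites


# Hypothetical toy alignment.
aln = ["ATGCTA", "ATGTTA", "ACGTTG", "ACGCTA"]
print(parsimony_informative_sites(aln))  # [1, 3]
```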

To highlight 0-fold, 2-fold, or 4-fold degenerate sites, press the 0, 2, or 4 key, respectively, click the corresponding button on the shortcut bar below the main menu, or select the corresponding command from the Highlight menu. The Highlighted indicator changes to the Zero-fold, Two-fold, or Four-fold indicator, respectively.
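Degeneracy is a property of a codon position: a position is 4-fold degenerate if all four nucleotides there encode the same amino acid, 2-fold if two do, and 0-fold if every change alters the amino acid. The Python sketch below computes this for a single codon under the standard genetic code, treating stop codons as one more symbol. It is only an illustration with hypothetical helper names: MEGA evaluates sites across the whole alignment, and its handling of other genetic codes and of 3-fold positions (such as the third position of isoleucine codons) may differ.

```python
# Sketch: 0/2/4-fold degeneracy of each position in one codon.
# Assumes the standard genetic code; stop codons ("*") are treated as a symbol.

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map every codon to its one-letter amino acid (table listed in TCAG order).
CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def degeneracy(codon: str, position: int) -> int:
    """Return n for an n-fold degenerate position (0 if every change is nonsynonymous)."""
    codon = codon.upper()
    amino_acid = CODE[codon]
    # How many of the four nucleotides at this position keep the same amino acid,
    # counting the original nucleotide itself.
    same = sum(
        CODE[codon[:position] + base + codon[position + 1:]] == amino_acid
        for base in BASES
    )
    return 0 if same == 1 else same


def fold_classes(codon: str) -> list[int]:
    """Degeneracy of codon positions 1, 2 and 3."""
    return [degeneracy(codon, p) for p in range(3)]


print(fold_classes("CTG"))  # [2, 0, 4]
```

For the leucine codon CTG, the third position is 4-fold (CTN always gives leucine), the second is 0-fold, and the first is 2-fold because only TTG also encodes leucine.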


Statistics

The Statistics main menu lets you calculate Nucleotide Composition, Nucleotide Pair Frequencies, and Codon Usage. Before choosing one of these calculations, specify whether to use all selected sites or only the highlighted sites, and choose the format in which the results should be displayed.

Example 9.3:

Select Statistics | Use All Selected Sites. To display the results in the built-in text editor, open the Statistics menu again and select Display Results in Text Editor. Then, to calculate the nucleotide base frequencies, select Statistics | Nucleotide Composition.
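Nucleotide composition is the percentage of each base among the counted sites. The minimal Python sketch below computes it for one sequence, assuming that U is read as T and that gaps and ambiguous characters are excluded from the total; it illustrates the statistic and is not MEGA's code.

```python
from collections import Counter


def nucleotide_composition(seq: str) -> tuple[dict[str, float], int]:
    """Return (percent of T, C, A, G; number of counted sites) for one sequence."""
    bases = seq.upper().replace("U", "T")
    counts = Counter(b for b in bases if b in "TCAG")  # skip gaps and ambiguity codes
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {b: 100.0 * counts[b] / total for b in "TCAG"}, total


# Hypothetical sequence with one gap.
percent, n = nucleotide_composition("ATGC-TAA")
print(n, {b: round(p, 1) for b, p in percent.items()})
```

Because the gap is excluded, the percentages are taken over 7 sites rather than 8.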

To compute codon usage, go back to the Sequence Data Explorer and select the Statistics | Codon Usage menu command. This will calculate the codon usage and display the results of the calculation in a text file using the built-in text editor.
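Codon usage counts how often each codon occurs when a sequence is read in consecutive triplets. The Python sketch below assumes every sequence starts at the first codon position and skips any triplet containing a gap or ambiguity code; it reports raw counts only and is an illustration rather than MEGA's implementation.

```python
from collections import Counter


def codon_usage(sequences: list[str]) -> Counter:
    """Count codons over all sequences, reading each in frame from its first base."""
    usage: Counter = Counter()
    for seq in sequences:
        seq = seq.upper().replace("U", "T")
        # Step through complete triplets only; a trailing partial codon is ignored.
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if set(codon) <= set("ACGT"):
                usage[codon] += 1
    return usage


# Hypothetical sequences; the "---" triplet is skipped.
usage = codon_usage(["ATGAAAAAGTAA", "ATGAAA---TAA"])
print(usage.most_common(2))
```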

To compute nucleotide pair frequencies, select the Statistics | Nucleotide Pair Frequencies | Directional (16 pairs), or the Statistics | Nucleotide Pair Frequencies | Undirectional (10 pairs) main menu option. This will calculate the pair frequencies and display the results of the calculation in a text file using the built-in text editor.
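Pair frequencies compare two aligned sequences site by site. In the directional form, the pair (T in the first sequence, C in the second) is counted separately from (C, T), giving 4 × 4 = 16 categories; in the undirectional form the order is ignored, leaving 4 identical pairs plus 6 unordered different pairs, for 10 in total. The Python sketch below counts both for one pair of sequences, assuming gaps and ambiguity codes are skipped; it is an illustration, not MEGA's code.

```python
from collections import Counter
from itertools import combinations_with_replacement

BASES = "TCAG"
DIRECTIONAL = {a + b for a in BASES for b in BASES}  # 16 ordered pairs
UNDIRECTIONAL = {"".join(p) for p in combinations_with_replacement(BASES, 2)}  # 10 pairs


def pair_frequencies(seq1: str, seq2: str, directional: bool = True) -> Counter:
    """Count nucleotide pairs at aligned sites of two sequences."""
    pairs: Counter = Counter()
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x not in BASES or y not in BASES:
            continue  # skip gaps and ambiguity codes
        if directional:
            key = x + y
        else:
            # Put the pair in TCAG order so that "CT" and "TC" share one label.
            key = "".join(sorted(x + y, key=BASES.index))
        pairs[key] += 1
    return pairs


# Hypothetical pair of aligned sequences.
print(pair_frequencies("ATGC", "GTGT"))
print(pair_frequencies("ATGC", "GTGT", directional=False))
```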

Note: The Amino Acid Composition option on the Statistics menu is disabled (grayed out). This option becomes available only after the sequences have been translated.


Using the Amino Acid Composition Option

Example 9.4:

To translate these protein-coding sequences into amino acid sequences, select the Data | Translate Sequences main menu command from the Sequence Data Explorer window. Selecting the same command again translates them back into nucleotide sequences.
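Translation reads each codon in turn and looks it up in a genetic code table. The minimal Python sketch below assumes the standard genetic code and that the sequence starts at the first codon position. It also assumes that a codon of three gaps becomes a gap and any other unreadable codon becomes "?"; these conventions are illustrative choices, not MEGA's documented behavior.

```python
# Sketch: translate an in-frame nucleotide sequence with the standard genetic code.

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map every codon to its one-letter amino acid (table listed in TCAG order).
CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq: str) -> str:
    """Translate codon by codon; '*' marks a stop codon."""
    seq = seq.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon == "---":
            protein.append("-")  # a fully gapped codon stays a gap
        else:
            protein.append(CODE.get(codon, "?"))  # "?" for partial gaps or ambiguity codes
    return "".join(protein)


print(translate("ATGGCC---TAA"))  # MA-*
```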

Once the sequences are translated, calculate the amino acid composition by selecting the Statistics | Amino Acid Composition main menu command from the Sequence Data Explorer window.
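Amino acid composition is the percentage of each of the 20 standard amino acids in a translated sequence. The Python sketch below assumes that stop codons (*), gaps, and unknown residues are left out of the total; it illustrates the statistic and is not MEGA's implementation.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def amino_acid_composition(protein: str) -> dict[str, float]:
    """Return the percentage of each standard amino acid in one protein sequence."""
    counts = Counter(a for a in protein.upper() if a in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}


# Hypothetical translated sequence ending in a stop codon.
composition = amino_acid_composition("MAAK*")
print({aa: p for aa, p in composition.items() if p > 0})  # {'A': 50.0, 'K': 25.0, 'M': 25.0}
```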

Close the Text File Editor and Format Convertor window without saving your work. Then close the Sequence Data Explorer and click the Close Data icon on the main MEGA window.