Computing Statistical Quantities for Nucleotide Sequences

In this exercise, we illustrate how to use the Data Explorer to compute various statistical quantities for nucleotide sequences. We also explain shortcuts for frequently used commands, ways to access online help, and the distinction between enabled and disabled commands.

Ex 8.0.1: Start MEGA by double-clicking the MEGA desktop icon, or by clicking the MEGA icon in the Programs folder of the Windows Start menu.

We will now examine the contents of the file Drosophila_Adh.meg using the built-in Text Editor.

Ex 8.1.1: Click the File menu item to expand the menu options. To open the text editor, either click File|Text Editor or press the F3 key on your keyboard. In the text editor, use the File|Open command to open the Drosophila_Adh.meg file.

Ex 8.1.2: Examine the Drosophila_Adh.meg file. Take note of the #mega format specifier, title, OTU names, and the interleaved sequence data.
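To make the structure of the file concrete, here is a minimal Python sketch that reads a simple interleaved MEGA file into its title and a dictionary of OTU sequences. It is not MEGA's own parser, and the function name `parse_mega` is hypothetical. It assumes the title sits on a single `!Title ...;` line, skips any other `!` command, and treats every later line that begins with `#` as an OTU name followed by a chunk of sequence; comments and identity symbols defined through format commands are not handled.

```python
def parse_mega(text):
    """Read a simple interleaved MEGA alignment (hypothetical helper, not MEGA's parser).

    Assumes: the first non-blank line is '#mega', the title is a one-line
    '!Title ...;' statement, and data lines look like '#OTU_name ACGT...'.
    Returns (title, {otu_name: sequence}) with OTUs in file order.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or lines[0].lower() != "#mega":
        raise ValueError("missing #mega format specifier")
    title, seqs = None, {}
    for ln in lines[1:]:
        if ln.startswith("!"):
            # Commands such as !Title or !Format; only the title is kept here.
            if ln.lower().startswith("!title"):
                title = ln[len("!title"):].strip().rstrip(";").strip()
            continue
        if ln.startswith("#"):
            parts = ln[1:].split(None, 1)
            if not parts:
                continue
            name = parts[0]
            chunk = "".join(parts[1].split()) if len(parts) > 1 else ""
            # Interleaved blocks: each block appends another chunk to the same OTU.
            seqs[name] = seqs.get(name, "") + chunk
    return title, seqs
```

Because the data are interleaved, the same OTU name appears once per block, and the sketch simply concatenates the chunks in the order they are read.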

Ex 8.1.3: We advise exiting the text editor before proceeding with the data analysis. In the text editor's menu, select the File menu item and click the Exit option in the expanded menu. If the editor asks whether you want to save changes to the file, select No.

To study statistical quantities of the data in the file Drosophila_Adh.meg, we must first activate it.

Ex 8.2.1: You can activate a data file by clicking the link titled “Click me to activate a data file” in the main application window, by selecting File|Open Data from the main menu, or by pressing the F5 key on your keyboard. Each of these methods displays a standard Windows open file dialog box.

Ex 8.2.2: Open the Drosophila_Adh.meg data file in the Examples folder.

Ex 8.2.3: A progress dialog box appears briefly. Once the data file is active, details about it are displayed at the bottom of the main application window, and additional items become available on the main menu.

Examine the main menu. Now that the data file is active, the menu items Data, Distances, Pattern, and Selection have become available.

We will now use the Data Explorer to compute some basic statistics for these data.

Ex 8.3.1: To open the Sequence Data Explorer, select the Data|Data Explorer command or press the F4 key.

Ex 8.3.2: The DNA sequences are displayed in a grid. Use the left and right arrow keys (←→) or the mouse to move from site to site, and note that the site position shown in the bottom-left corner of the window changes. Use the up and down arrow keys (↑↓) or the mouse to move between OTUs. In the bottom-left panel, the Total Sites display shows the sequence length below the current site position, and the Highlighted Sites display reads “None” because no sites have been highlighted yet.

Ex 8.3.3: To highlight variable sites, select the Highlight|Variable Sites command, click the button labeled “V” on the shortcut bar below the menu, or press the V key. All variable sites are highlighted, and the number in the Highlighted Sites display updates accordingly. Pressing V again returns the sites to their normal color, and Highlighted Sites again displays “None.”
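The idea behind the V highlight can be sketched in a few lines of Python. The hypothetical helper below, which is not MEGA code, treats a column of an aligned set of sequences as variable when it contains at least two different nucleotides. Ignoring gaps (`-`) and missing data (`?`) when making that decision is an assumption of the sketch; MEGA's own handling of such symbols may differ.

```python
# Sketch assumption: gaps and missing data do not make a site variable.
IGNORED = set("-?")

def variable_sites(seqs):
    """Return 0-based indices of alignment columns with two or more distinct nucleotides.

    seqs: list of equal-length aligned sequences (strings).
    """
    return [i for i, column in enumerate(zip(*seqs))
            if len({c.upper() for c in column} - IGNORED) > 1]
```

Under these assumptions, the count shown in the Highlighted Sites display would correspond to `len(variable_sites(seqs))`.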

Ex 8.3.4: To highlight parsimony-informative sites, press the P key, click the button labeled “Pi” on the shortcut bar below the menu, or select the Highlight|Parsim-info sites menu command. To highlight 0-, 2-, or 4-fold degenerate sites, press the 0, 2, or 4 key, respectively, click the corresponding button on the shortcut bar, or select the corresponding command from the Highlight menu.
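Both kinds of highlighting follow from simple definitions, sketched below in Python. A site is parsimony-informative when it contains at least two different nucleotides, each present in at least two sequences. For degeneracy, the helper counts how many of the four nucleotides at one codon position keep the same amino acid under the standard genetic code (NCBI table 1): only the existing base means 0-fold, all four means 4-fold. The function names are hypothetical, and MEGA's treatment of 3-fold sites (such as the third position of isoleucine codons), stop codons, and other genetic codes may differ from this sketch.

```python
from collections import Counter

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AA[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

def parsimony_informative_sites(seqs):
    """0-based columns with >= 2 distinct nucleotides, each in >= 2 sequences.

    Sketch assumption: gaps ('-') and missing data ('?') are ignored.
    """
    informative = []
    for i, column in enumerate(zip(*seqs)):
        counts = Counter(c.upper() for c in column if c not in "-?")
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            informative.append(i)
    return informative

def degeneracy(codon, pos):
    """Degeneracy class (0, 2, 3, or 4) of position pos (0-2) in one codon.

    Counts the nucleotides at that position (the existing one included) that
    keep the same amino acid. A change to a stop codon counts as nonsynonymous,
    and 3-fold sites are reported as 3 rather than merged into another class.
    """
    codon = codon.upper()
    aa = CODE[codon]
    same = sum(1 for b in BASES if CODE[codon[:pos] + b + codon[pos + 1:]] == aa)
    return 0 if same == 1 else same
```

For example, the third position of the alanine codon GCT is 4-fold degenerate, while the first position of ATG (methionine) is 0-fold.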

Ex 8.3.5: To compute the nucleotide base frequencies, select the Statistics|Nucleotide Composition menu command. This calculates the composition and displays the results as a text file in the built-in text editor.
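As a rough illustration of what the composition table reports, the hypothetical Python helper below computes the percentage of T, C, A, and G in one sequence. It assumes that U is counted as T and that gaps, missing data, and ambiguity codes are simply excluded from the total; MEGA's exact treatment of such symbols may differ.

```python
from collections import Counter

def nucleotide_composition(seq):
    """Percent T, C, A, G in one sequence (hypothetical helper).

    U is counted as T; gaps, missing data, and ambiguity codes are excluded.
    """
    counts = Counter(b for b in seq.upper().replace("U", "T") if b in "TCAG")
    total = sum(counts.values())
    return {b: (100.0 * counts[b] / total if total else 0.0) for b in "TCAG"}
```

Calling the helper on all sequences joined together, for example `nucleotide_composition("".join(seqs))`, pools the counts into an overall composition.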

Ex 8.3.6: To compute codon usage, select the Statistics|Codon Usage menu command. This calculates the codon usage and displays the results as a text file in the built-in text editor.
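Codon usage amounts to counting codons in the reading frame. The Python sketch below assumes that each sequence starts at the first codon position and uses the standard genetic code (NCBI table 1). It also computes relative synonymous codon usage (RSCU), the observed count of a codon divided by the mean count of all codons encoding the same amino acid, as one common codon-usage statistic; compare it with the columns in MEGA's output rather than assuming they match. The helper names are hypothetical.

```python
from collections import Counter

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AA[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

def codon_counts(seqs):
    """Count codons read in frame from position 0 (hypothetical helper).

    Codons containing gaps, missing data, or ambiguity codes are skipped,
    as is any incomplete codon at the end of a sequence.
    """
    counts = Counter()
    for seq in seqs:
        seq = seq.upper().replace("U", "T")
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in CODE:
                counts[codon] += 1
    return counts

def rscu(counts):
    """Relative synonymous codon usage: count / mean count of its synonymous family.

    Stop codons form their own '*' family in this sketch.
    """
    families = {}
    for codon, aa in CODE.items():
        families.setdefault(aa, []).append(codon)
    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        for c in family:
            # len(family) / total is 1 / mean count; 0.0 if the amino acid is absent.
            values[c] = counts[c] * len(family) / total if total else 0.0
    return values
```

An RSCU value of 1.0 means a codon is used exactly as often as expected if all synonymous codons were used equally.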

Ex 8.3.7: To compute nucleotide pair frequencies, select either the Statistics|Nucleotide Pair Frequencies|Directional or the Statistics|Nucleotide Pair Frequencies|Unidirectional menu command. This calculates the pair frequencies and displays the results as a text file in the built-in text editor.
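Nucleotide pair frequencies count which bases occur together at the same aligned site in two sequences. The hypothetical Python helper below sums these counts over every pair of sequences. It assumes that the Directional option corresponds to ordered pairs (16 classes, so A in the first sequence with G in the second differs from G with A) and that the other option corresponds to unordered pairs (10 classes). Sites with a gap or a non-TCAG symbol in either sequence are skipped; MEGA may average or report the counts differently.

```python
from collections import Counter
from itertools import combinations

def pair_frequencies(seqs, directional=True):
    """Count nucleotide pairs at aligned sites over all sequence pairs (i < j).

    directional=True: ordered pairs such as 'AG' and 'GA' are kept apart (16 classes).
    directional=False: unordered pairs, so 'GA' is merged into 'AG' (10 classes).
    """
    counts = Counter()
    for s1, s2 in combinations(seqs, 2):
        for x, y in zip(s1.upper(), s2.upper()):
            if x in "TCAG" and y in "TCAG":
                pair = x + y if directional else "".join(sorted(x + y))
                counts[pair] += 1
    return counts
```

Identical pairs such as AA lie on the diagonal of the pair table; the off-diagonal pairs are the sites where the two sequences differ.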

Ex 8.3.8: To translate these protein-coding sequences into amino acid sequences, and to switch back to the nucleotide sequences, press the T key or select the Data|Translate/Untranslate menu command from the Data Explorer menu.
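Translation relies on a genetic code table. The hypothetical Python sketch below translates an in-frame coding sequence with the standard code (NCBI table 1), writing `*` for stop codons, `-` for a fully gapped codon, and `?` for any other codon it cannot read. Because several codons encode the same amino acid, the nucleotide sequence cannot be recovered from the protein alone, so switching back requires keeping the original nucleotide data.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AA[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

def translate(seq):
    """Translate an in-frame coding sequence with the standard code (hypothetical helper).

    '*' marks stop codons, '-' a fully gapped codon, and '?' any other codon
    containing gaps, missing data, or ambiguity codes. A trailing partial codon is dropped.
    """
    seq = seq.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        protein.append("-" if codon == "---" else CODE.get(codon, "?"))
    return "".join(protein)
```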

Ex 8.3.9: Once the sequences are translated, calculate the amino acid composition by selecting the Statistics|Amino Acid Composition menu command from the Data Explorer menu.
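Amino acid composition is computed the same way as nucleotide composition, only over the 20 standard amino acids. The hypothetical Python helper below works on one translated sequence and assumes that gaps, stop codons, and unknown residues are left out of the total; MEGA's handling of these symbols may differ.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def amino_acid_composition(protein):
    """Percent of each standard amino acid in one protein sequence (hypothetical helper).

    Gaps ('-'), stop codons ('*'), and unknown residues ('?', 'X') are excluded.
    """
    counts = Counter(a for a in protein.upper() if a in AMINO_ACIDS)
    total = sum(counts.values())
    return {a: (100.0 * counts[a] / total if total else 0.0) for a in AMINO_ACIDS}
```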

Ex 8.3.10: To shut down MEGA, close the data file and then select the File|Exit menu command from the main MEGA application window.