Overview

With its theoretical basis firmly established in molecular evolutionary and population genetics, comparative sequence analysis has become essential for reconstructing the evolutionary histories of species and multi-gene families, estimating rates of molecular evolution, and inferring the nature and extent of the selective forces shaping the evolution of genes and genomes. Its importance is now widely recognized, and it has greatly expanded the scope for applying computational and statistical methods. Because these methods require computers, the need for easy-to-use computer programs is well appreciated. Such programs must implement fast computational algorithms and useful statistical methods. They must also offer an extensive user interface, so that experimentalists working at the forefront of sequence data generation can use them both to discover novel patterns and to explore basic sequence attributes. This dual need motivated the development of the MEGA (Molecular Evolutionary Genetics Analysis) software in the early 1990s. From its inception, the goal of MEGA has been to provide a wide variety of statistical and computational methods for comparative sequence analysis in a user-friendly environment. The first version of MEGA, released in 1993, was distributed to over 2000 scientists. MEGA2, released in 2001, was a complete rewrite of the first version that took advantage of the manifold increase in the computing power of the average desktop computer and of the availability of the Microsoft Windows graphical interface. The user-friendliness and methodological advances of MEGA2, together with the growing scope of molecular evolutionary analysis in the scientific community, led to a tenfold increase in the number of users around the world.
A survey of research papers citing MEGA shows that the software is used in diverse disciplines, including AIDS/HIV research, virology, bacteriology, general disease research, plant biology, conservation biology, systematics, developmental evolution, and population genetics.

The newly released MEGA versions (3 and 4) expand the functionality of MEGA by adding sequence alignment and assembly features, along with other advances. In version 3, sequence data acquisition is integrated with the evolutionary analyses, which makes it much easier to conduct comparative analyses within a single computing environment. Version 4 includes a unique facility that generates captions in figure-legend format, giving natural-language descriptions of the models and methods used in each analysis. This facility aims to promote a better understanding of the assumptions underlying the analyses and of the results they produce. Another new feature is the Maximum Composite Likelihood (MCL) method, which estimates evolutionary distances between all pairs of sequences simultaneously, with or without accounting for rate variation among sites and heterogeneity of substitution patterns among lineages.

MEGA comes with online help that describes the different aspects of its user interface. The statistical and computational methods available in MEGA are presented in detail in the book Molecular Evolution and Phylogenetics (Nei and Kumar, Oxford University Press, 2000). This book explains the various statistical methods for analyzing molecular data and shows how to interpret the results produced by various computer programs. It also includes examples of data analysis, most of which can be conducted in MEGA3. The DNA sequences and other data used in the numerical examples of the Nei and Kumar book are available for research and teaching on the book website: http://lifesciences.asu.edu/mep/.