Tajima Nei Distance (Heterogeneous patterns)

In real data, nucleotide frequencies often deviate substantially from 0.25. In this case, the Tajima-Nei distance (Tajima and Nei 1984) gives a better estimate of the number of nucleotide substitutions than the Jukes-Cantor distance. Note that this method assumes equal substitution rates among sites and between transitional and transversional substitutions. When the nucleotide frequencies differ between the sequences, the modified formula (Tamura and Kumar 2002) relaxes the assumption of substitution pattern homogeneity.

 

The Felsenstein-Tajima-Nei model

 

MEGA provides facilities for computing the following quantities for this method:

d: Transitions + Transversions: Number of nucleotide substitutions per site.

L: No of valid common sites: Number of sites compared.

 

Formulas for computing these quantities are as follows:

Distance

where p is the proportion of sites with different nucleotides and

where x_ij is the relative frequency of the nucleotide pair i and j, and g_i is the frequency of nucleotide i.
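
As a hedged illustration of how p, x_ij, and g_i enter the calculation, the sketch below implements the original Tajima and Nei (1984) estimator, d = -b ln(1 - p/b), with b = (1/2)[1 - sum_i g_i^2 + p^2/c] and c = sum over i<j of x_ij^2 / (2 g_i g_j). It is not the Tamura and Kumar (2002) modification for heterogeneous patterns, and it is not MEGA's own code. The function name tajima_nei_distance, the pooling of nucleotide frequencies over both sequences, and the rule of skipping any site that is not A, C, G, or T in both sequences are assumptions made for this example.

```python
import math
from itertools import combinations

NUCLEOTIDES = "ACGT"


def tajima_nei_distance(seq_x, seq_y):
    """Original Tajima-Nei (1984) distance for two aligned sequences.

    Returns (d, L): substitutions per site and number of sites compared.
    Hypothetical helper for illustration, not part of MEGA.
    """
    # Assumption: a site is valid only if both sequences have A, C, G or T.
    pairs = [(a, b) for a, b in zip(seq_x.upper(), seq_y.upper())
             if a in NUCLEOTIDES and b in NUCLEOTIDES]
    L = len(pairs)
    if L == 0:
        raise ValueError("no valid common sites")

    # p: proportion of sites with different nucleotides.
    p = sum(a != b for a, b in pairs) / L
    if p == 0:
        return 0.0, L

    # g_i: nucleotide frequencies, pooled over both sequences (assumption).
    g = {n: sum((a == n) + (b == n) for a, b in pairs) / (2 * L)
         for n in NUCLEOTIDES}

    # c = sum over i<j of x_ij^2 / (2 g_i g_j), where x_ij is the relative
    # frequency of the unordered nucleotide pair (i, j).
    c = 0.0
    for i, j in combinations(NUCLEOTIDES, 2):
        x_ij = sum((a, b) in ((i, j), (j, i)) for a, b in pairs) / L
        if x_ij > 0:
            c += x_ij ** 2 / (2 * g[i] * g[j])

    b = 0.5 * (1 - sum(f * f for f in g.values()) + p * p / c)
    if p >= b:
        raise ValueError("distance undefined: p is not smaller than b")
    return -b * math.log(1 - p / b), L
```

For two identical sequences the function returns a distance of zero. For diverged sequences, d exceeds p because the logarithmic term corrects for multiple substitutions at the same site.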

Variance can be estimated by the bootstrap method.
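
A minimal sketch of a bootstrap variance estimate follows, assuming the usual scheme of resampling alignment columns with replacement and recomputing the distance for each replicate. The names bootstrap_variance and p_distance, the default of 500 replicates, and the seed parameter are hypothetical choices for this example and do not describe MEGA's implementation. The simple p-distance is only a stand-in so that the block runs on its own; any pairwise distance function that takes two aligned strings can be passed instead.

```python
import random
import statistics


def bootstrap_variance(seq_x, seq_y, distance, replicates=500, seed=None):
    """Estimate the variance of a pairwise distance by resampling sites.

    `distance` is any function taking two aligned strings and returning
    a float. Hypothetical helper for illustration, not MEGA's code.
    """
    if len(seq_x) != len(seq_y):
        raise ValueError("sequences must be aligned to the same length")
    if replicates < 2:
        raise ValueError("at least two replicates are needed")

    rng = random.Random(seed)
    n = len(seq_x)
    estimates = []
    for _ in range(replicates):
        # Draw n column indices with replacement; the same columns are
        # taken from both sequences so that site pairing is preserved.
        cols = [rng.randrange(n) for _ in range(n)]
        resampled_x = "".join(seq_x[k] for k in cols)
        resampled_y = "".join(seq_y[k] for k in cols)
        estimates.append(distance(resampled_x, resampled_y))

    # Sample variance of the replicate distances.
    return statistics.variance(estimates)


def p_distance(seq_x, seq_y):
    """Stand-in distance: proportion of sites that differ."""
    return sum(a != b for a, b in zip(seq_x, seq_y)) / len(seq_x)


if __name__ == "__main__":
    x = "ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT"
    y = "ACGTACGAACGTACGTCCGTACGTACGTACTTACGTACGG"
    print(bootstrap_variance(x, y, p_distance, replicates=200, seed=1))
```

Fixing the seed makes a run reproducible. A distance that can be undefined for some replicates (for example, when the argument of a logarithm is not positive) would need extra handling, which this sketch omits.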