This method is a modification of the Pamilo-Bianchi-Li and Comeron (1995) methods that can handle some problematic degeneracy class assignments (see the detailed description below). It computes the following quantities:
Synonymous distance
This is the number of synonymous substitutions per synonymous site.
Nonsynonymous distance
This is the number of nonsynonymous substitutions per nonsynonymous site.
Substitutions at the 4-fold degenerate sites
This is the number of substitutions per 4-fold degenerate site. It is useful for measuring the rate of neutral evolution.
Substitutions at the 0-fold degenerate sites
This is the number of substitutions per 0-fold degenerate site. It is useful for measuring the rate of amino acid sequence evolution.
Number of 4-fold degenerate sites
This is an estimate of the number of 4-fold degenerate sites, computed by averaging the numbers of 4-fold degenerate sites in the two sequences compared.
Number of 0-fold degenerate sites
This is an estimate of the number of 0-fold degenerate sites, computed by averaging the numbers of 0-fold degenerate sites in the two sequences compared.
Difference between synonymous and nonsynonymous distances
This is the difference between the synonymous and nonsynonymous distances. This statistic is useful for conducting tests of selection.
Kumar's modification of the PBL method:
The treatment of arginine and isoleucine codons in the Li-Wu-Luo and Pamilo-Bianchi-Li methods is arbitrary, which can be a problem because arginine codons occur quite frequently. Comeron (1995) addressed this problem by dividing the 2-fold degenerate sites into two groups: 2S-fold and 2V-fold. A 2S-fold site is one at which the transitional change is synonymous and the two transversional changes are nonsynonymous, whereas a 2V-fold site is one at which the transitional change is nonsynonymous and the two transversional changes are synonymous. Although these definitions help correct some inaccurate classifications of synonymous and nonsynonymous sites (e.g., methionine codons), they do not solve the problem completely.
For example, consider mutations at the first nucleotide position of the arginine codon CGG, which produce TGG (Trp), AGG (Arg), or GGG (Gly). The transitional change (C to T) is nonsynonymous. Of the two transversional changes, one (C to A) is synonymous and the other (C to G) is nonsynonymous. This nucleotide site is therefore neither a 2S-fold nor a 2V-fold site. Under the standard genetic code, the first position of four arginine codons (CGA, CGG, AGA, and AGG) and the third position of two isoleucine codons (ATT and ATC) cannot be assigned to either of the Comeron (1995) categories. For this reason, Comeron (personal communication) used a more complicated classification of codons when he wrote his computer program. For example, the first position of the arginine codon CGG was assigned to a 2V-fold site with a probability of one-third and to a 0-fold site with a probability of two-thirds. Similar assignments are used by W.-H. Li (personal communication) in his computer program.
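The CGG example can be checked mechanically. The following is a minimal sketch, assuming the standard genetic code and treating a change to a stop codon as nonsynonymous; the names `CODON_TABLE`, `is_transition`, and `single_site_changes` are hypothetical helpers introduced here for illustration and are not part of any program mentioned above.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons in T, C, A, G order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

PURINES = {"A", "G"}


def is_transition(a, b):
    """True if a -> b stays within purines or within pyrimidines."""
    return (a in PURINES) == (b in PURINES)


def single_site_changes(codon, pos):
    """List (mutant codon, change type, is_synonymous) for one codon position.

    Assumption: a change to or from a stop codon ('*') counts as nonsynonymous.
    """
    changes = []
    for base in BASES:
        if base == codon[pos]:
            continue
        mutant = codon[:pos] + base + codon[pos + 1:]
        kind = "transition" if is_transition(codon[pos], base) else "transversion"
        changes.append((mutant, kind, CODON_TABLE[mutant] == CODON_TABLE[codon]))
    return changes


for mutant, kind, synonymous in single_site_changes("CGG", 0):
    print(mutant, CODON_TABLE[mutant], kind, "syn" if synonymous else "nonsyn")
```

For the first position of CGG this lists one nonsynonymous transition (TGG), one synonymous transversion (AGG), and one nonsynonymous transversion (GGG), which is the pattern that fits neither the 2S-fold nor the 2V-fold definition.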
Because the nucleotide site assignments discussed above are quite arbitrary and may not apply to all known genetic code tables, Kumar developed a modification that allows the PBL method to be used with any genetic code table. In this version, nucleotide sites are first classified into 0-fold, 2-fold, and 4-fold degenerate sites. The 2-fold degenerate sites are further subdivided into simple 2-fold and complex 2-fold degenerate sites. Simple 2-fold sites are those at which the transitional change results in a synonymous substitution and the two transversional changes result in nonsynonymous substitutions. All other 2-fold sites, including those for the three isoleucine codons, belong to the complex 2-fold site category. If we use this definition, all nucleotide sites can be classified into the five groups shown in the following table.
Table.
Degeneracy ->     | 0-fold | Simple 2-fold | Complex 2-fold |        | 4-fold |
No. of sites ->   | L0     | L2S           | L2C            |        | L4     |
                  |        |               | Syn            | Nonsyn |        |
Transition (s)    | s0     | s2            | s2S            | s2N    | s4     |
Transversion (v)  | v0     | v2            | v2S            | v2N    | v4     |
Here, L0, L2S, L2C, and L4 are the numbers of 0-fold, simple 2-fold, complex 2-fold, and 4-fold degenerate sites, respectively.
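The site classification described above can be sketched in a few lines. This is an illustrative sketch only, assuming the standard genetic code, treating changes to stop codons as nonsynonymous, and taking "2-fold" to mean any site with one or two synonymous changes; `classify_site`, `site_counts`, and `average_site_counts` are hypothetical names, not functions of the program documented here.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons in T, C, A, G order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
PURINES = {"A", "G"}


def classify_site(codon, pos):
    """Classify one codon position as 0-fold, simple 2-fold, complex 2-fold, or 4-fold."""
    ts_syn = 0  # synonymous transitions (at most 1)
    tv_syn = 0  # synonymous transversions (at most 2)
    for base in BASES:
        if base == codon[pos]:
            continue
        mutant = codon[:pos] + base + codon[pos + 1:]
        if CODON_TABLE[mutant] == CODON_TABLE[codon]:
            if (base in PURINES) == (codon[pos] in PURINES):
                ts_syn += 1
            else:
                tv_syn += 1
    if ts_syn + tv_syn == 0:
        return "0-fold"
    if ts_syn + tv_syn == 3:
        return "4-fold"
    # Simple 2-fold: the transition is synonymous, both transversions are not.
    if ts_syn == 1 and tv_syn == 0:
        return "simple 2-fold"
    return "complex 2-fold"


def site_counts(seq):
    """Count sites of each class in a coding sequence (length a multiple of 3)."""
    counts = {}
    for i in range(0, len(seq), 3):
        codon = seq[i:i + 3]
        for pos in range(3):
            label = classify_site(codon, pos)
            counts[label] = counts.get(label, 0) + 1
    return counts


def average_site_counts(seq1, seq2):
    """Average the per-class site counts of the two sequences compared."""
    c1, c2 = site_counts(seq1), site_counts(seq2)
    return {k: (c1.get(k, 0) + c2.get(k, 0)) / 2 for k in set(c1) | set(c2)}


print(classify_site("CGG", 0))  # first position of arginine codon CGG
print(classify_site("ATT", 2))  # third position of isoleucine codon ATT
print(average_site_counts("ATGTTT", "ATGGGG"))
```

Because the classification is driven entirely by the codon table, swapping in a different table is enough to apply the same rules to another genetic code, which is the motivation given for this modification.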
Once this table is filled using the observed counts for a given pair of sequences, we compute the proportions of transitional (Pi) and transversional (Qi) differences for the i-fold degenerate site in the following way:
[Equations defining Pi and Qi for i = 0, 2, and 4.]
From these quantities, we compute Ai and Bi as in the PBL method. Then, using L2 = L2C + L2S, we apply the formulas for the PBL method.
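As an illustration of this final step, the sketch below computes Ai and Bi from given Pi and Qi and then combines them into synonymous and nonsynonymous distances. It assumes the PBL formulas as they are commonly stated in the literature (Ai and Bi are the Kimura two-parameter transitional and transversional distances); the page itself does not reproduce them, so treat this as a sketch. The functions `ts_tv_distances` and `pbl_distances` and all input values are hypothetical.

```python
import math


def ts_tv_distances(P, Q):
    """Transitional (A) and transversional (B) distances from proportions P and Q.

    Assumed form: a = 1/(1 - 2P - Q), b = 1/(1 - 2Q),
    A = (1/2) ln a - (1/4) ln b, B = (1/2) ln b.
    """
    a = 1.0 / (1.0 - 2.0 * P - Q)
    b = 1.0 / (1.0 - 2.0 * Q)
    A = 0.5 * math.log(a) - 0.25 * math.log(b)
    B = 0.5 * math.log(b)
    return A, B


def pbl_distances(L, P, Q):
    """Synonymous (dS) and nonsynonymous (dN) distances.

    L, P, and Q are dicts keyed by degeneracy class 0, 2, 4,
    where L[2] = L2S + L2C as described in the text.
    """
    A, B = {}, {}
    for i in (0, 2, 4):
        A[i], B[i] = ts_tv_distances(P[i], Q[i])
    dS = (L[2] * A[2] + L[4] * A[4]) / (L[2] + L[4]) + B[4]
    dN = A[0] + (L[0] * B[0] + L[2] * B[2]) / (L[0] + L[2])
    return dS, dN


# Made-up inputs for illustration only.
L = {0: 200.0, 2: 60.0, 4: 40.0}
P = {0: 0.01, 2: 0.05, 4: 0.10}
Q = {0: 0.01, 2: 0.02, 4: 0.05}
dS, dN = pbl_distances(L, P, Q)
print(f"dS = {dS:.4f}, dN = {dN:.4f}, dS - dN = {dS - dN:.4f}")
```

The logarithms are undefined when 2Pi + Qi or 2Qi reaches 1, so a real implementation would need to handle saturated comparisons explicitly.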
See also Nei and Kumar (2000), page 64.