Converting XML Format

These files consist of XML tags (elements) and their attribute values. A DOCTYPE header may or may not be present. The MEGA input converter for XML files does not implement a full XML parser; it only scans for a few specific tags. For example, an XML file might contain the following data:

<Bioseq-set>
<Bioseq>
<name>G019uabh</name>
<seq-data>ATACATCATAACACTACTTCCTACCCATAAGCTCCTTTTAACTTGTTAAAGTCTTGCTTGAATT
AAAGACTTGTTTAAACACAAAAATTTAGAGTTTTACTCAACAAAAGTGATTGATTGATTGATTGATTGATTGATGGTT
TACAGTAGGACTTCATTCTAGTCATTATAGCTGCTGGCAGTATAACTGGCCAGCCTTTAATACATTGCTGCTTAGAGT
CAAAGCATGTACTTAGAGTT</seq-data>
</Bioseq>
</Bioseq-set>
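Because the converter scans for specific tags instead of parsing the whole document, the idea can be illustrated with a short Python sketch. This is not MEGA's own code: the function name `extract_records`, the regular expressions, and the rule of pairing each `<seq-data>` block with the nearest preceding `<name>` are assumptions made for illustration.

```python
import re

# Illustrative patterns (assumption): plain <name> and <seq-data> tags without
# attributes. DOTALL lets the sequence text span several lines.
_NAME_RE = re.compile(r"<name>(.*?)</name>", re.DOTALL)
_SEQ_RE = re.compile(r"<seq-data>(.*?)</seq-data>", re.DOTALL)


def extract_records(text):
    """Return a list of (name, sequence) pairs found in raw XML text.

    Hypothetical helper: no XML tree is built, so a missing DOCTYPE or
    unbalanced surrounding tags do not stop the scan. Each <seq-data> block
    is paired with the nearest <name> that starts before it (assumption).
    """
    names = [(m.start(), m.group(1).strip()) for m in _NAME_RE.finditer(text)]
    records = []
    for m in _SEQ_RE.finditer(text):
        # Collect every <name> that appears before this <seq-data> block.
        preceding = [name for pos, name in names if pos < m.start()]
        if not preceding:
            continue  # no name to pair with; skip this block
        # Drop line breaks and other whitespace inside the sequence.
        sequence = "".join(m.group(1).split())
        records.append((preceding[-1], sequence))
    return records
```

Because nothing is validated, the trade-off is that attributes, nesting, and character entities such as `&amp;` are ignored; only the text between the two tag pairs is used.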

The MEGA format converter looks for the following two tags:

<name>G019uabh</name>
<seq-data>ATACATCATAACACTAC. . .</seq-data>

If it finds these tags, it uses the text between the <name>. . .</name> tags as the sequence name and the text between the <seq-data>. . .</seq-data> tags as the corresponding sequence data. Converting the XML block above into MEGA format produces the following:

#Mega
Title: filename.xml

#G019uabh
ATACATCATAACACTACTTCCTACCCATAAGCTCCTTTTAACTTGTTAAAGTCTTGCTTGAATT
AAAGACTTGTTTAAACACAAAAATTTAGAGTTTTACTCAACAAAAGTGATTGATTGATTGATTGATTGATTGATGGTT
TACAGTAGGACTTCATTCTAGTCATTATAGCTGCTGGCAGTATAACTGGCCAGCCTTTAATACATTGCTGCTTAGAGT