Converting NEXUS Format

The NEXUS file format has a header containing lines that identify the name of each sequence in the file, followed by data lines that each begin with a sequence name and continue with a segment of that sequence's data. An example of part of an input file is:

 

#NEXUS

BEGIN DATA;

DIMENSIONS NTAX=17 NCHAR=428;

FORMAT DATATYPE=PROTEIN INTERLEAVE MISSING=-;

[Name: Chloroflex Len: 428 Check: 0]

[Name: Rcapsulatu Len: 428 Check: 0]

MATRIX

Chloroflex MSKEHVQTIATDDVSKNGHT PPTNASTPPYPFVAIVGQAE

Rcapsulatu ---------MTTAVARLQPS ASGAKTRPVFPFSAIVGQED

 

Chloroflex DQCRALEQQSGKTKKPAVIN IPVPVVDLPLGATEDRVCGT

Rcapsulatu DWATVLS-----TN---VIR KPTPVVDLPLGVSEDRVVGA

 

The MEGA conversion function looks for all lines starting with the “[Name:” flag and takes the word that follows as a sequence name. It then scans the data for all lines starting with each of the identified names and writes them to the output, grouped by sequence. For the input above, the output appears as follows:

 

#mega

Title: infile.nexus

#Chloroflex

MSKEHVQTIATDDVSKNGHT PPTNASTPPYPFVAIVGQAE

DQCRALEQQSGKTKKPAVIN IPVPVVDLPLGATEDRVCGT

 

#Rcapsulatu

---------MTTAVARLQPS ASGAKTRPVFPFSAIVGQED

DWATVLS-----TN---VIR KPTPVVDLPLGVSEDRVVGA
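
The two-pass procedure described above can be sketched in Python. This is only an illustration of the idea, not MEGA's actual code: the function name nexus_to_mega, the title parameter, and the sample variable are hypothetical, and the sketch assumes that every sequence has a "[Name:" line and that each data line starts with the name followed by whitespace.

```python
def nexus_to_mega(nexus_text, title="infile.nexus"):
    """Illustrative sketch of the conversion described above (not MEGA's code)."""
    lines = nexus_text.splitlines()

    # Pass 1: collect sequence names from the "[Name:" header lines.
    names = []
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("[Name:"):
            parts = stripped[len("[Name:"):].split()
            if parts:
                name = parts[0].rstrip("]")
                if name not in names:
                    names.append(name)

    # Pass 2: gather every data line whose first word is a known name.
    rows = {name: [] for name in names}
    for line in lines:
        parts = line.strip().split(None, 1)
        if len(parts) == 2 and parts[0] in rows:
            rows[parts[0]].append(parts[1].strip())

    # Write the output, one "#name" block per sequence.
    out = ["#mega", "Title: " + title, ""]
    for name in names:
        out.append("#" + name)
        out.extend(rows[name])
        out.append("")
    return "\n".join(out)


# Usage with the example input from this section.
sample = """#NEXUS
BEGIN DATA;
DIMENSIONS NTAX=17 NCHAR=428;
FORMAT DATATYPE=PROTEIN INTERLEAVE MISSING=-;
[Name: Chloroflex Len: 428 Check: 0]
[Name: Rcapsulatu Len: 428 Check: 0]
MATRIX
Chloroflex MSKEHVQTIATDDVSKNGHT PPTNASTPPYPFVAIVGQAE
Rcapsulatu ---------MTTAVARLQPS ASGAKTRPVFPFSAIVGQED

Chloroflex DQCRALEQQSGKTKKPAVIN IPVPVVDLPLGATEDRVCGT
Rcapsulatu DWATVLS-----TN---VIR KPTPVVDLPLGVSEDRVVGA
"""

result = nexus_to_mega(sample)
print(result)
```

Because the names are collected first, the interleaved blocks of each sequence end up together under a single "#name" heading, in the order in which the names appear in the header.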