MantisBT - MEGA
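ID: 0001158
Project: MEGA
Category: Alignment Explorer
View Status: public
Date Submitted: 2019-03-19 18:26
Last Update: 2019-03-21 13:11
Reporter: guest
Assigned To: gstecher
Priority: normal
Severity: minor
Reproducibility: have not tried
Status: resolved
Resolution: won't fix
Platform: PC
OS: Linux
OS Version: other
Product Version: MEGA-CC 11 (command line version)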
 
Name: Tom Caldwell
Email: tcaldwel@uci.edu
0001158: error in TClustalThread.SetSeqArray: Out of memory
I am trying to run a multiple sequence alignment on very large data sets (100,000 to 1,000,000 sequences) using the megacc program on my school's high-performance computing cluster, which runs Linux. My alignments are aborted about 5 minutes after starting, with the error message "error in TClustalThread.SetSeqArray: Out of memory". I suspect the issue is internal to the module itself rather than a problem with the node's memory usage (see the image "cpu memory usage.jpg").
For my data set (P02c.fas, which contains ~1.6 million sequences and is too large to upload to this report), I am using ClustalW alignment (clustal_align_NC.mao) and setting the output to P02_align. I run the following command:

megacc -a clustal_align_NC.mao -d P02c.fas -o P02_align

After I submit this job to the high-throughput scheduler, it waits in the qw (queued) state for a minute or two before the run begins.

The run lasts about 5 minutes before the error occurs.

This worked in the past with a different data set, but that one contained only ~5,000 sequences. I don't know whether this is a problem with the size of the input or a software bug, but it is definitely not a problem with the HPCC.
See "error_report.txt" for the error message I received when running this command.

The input file is ~527 MB.
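A rough estimate helps show why input size alone can exhaust memory here. ClustalW's first stage compares every pair of sequences to build a guide tree, so the work and storage grow with the square of the sequence count. The sketch below is a back-of-envelope calculation, assuming (hypothetically; this is not taken from MEGA's source) that the aligner keeps a dense N x N matrix of 8-byte distances in memory.

```python
"""Back-of-envelope memory estimate for an all-pairs distance matrix.

Assumption (hypothetical, not taken from MEGA's source): the aligner keeps a
dense N x N matrix of 8-byte floating-point distances in memory.
"""

BYTES_PER_DISTANCE = 8  # assumed: one IEEE-754 double per pairwise distance


def pairwise_comparisons(n_seqs: int) -> int:
    """Number of distinct sequence pairs, n(n-1)/2."""
    return n_seqs * (n_seqs - 1) // 2


def dense_matrix_bytes(n_seqs: int) -> int:
    """Bytes needed for a full n x n matrix of doubles."""
    return n_seqs * n_seqs * BYTES_PER_DISTANCE


if __name__ == "__main__":
    # Compare the earlier working run (~5,000) with the sizes in this report.
    for n in (5_000, 100_000, 1_000_000, 1_600_000):
        gib = dense_matrix_bytes(n) / 2**30
        print(f"{n:>9,} sequences: {pairwise_comparisons(n):>17,} pairs, "
              f"~{gib:,.1f} GiB for the matrix")
```

Under that assumption, 5,000 sequences need about 0.2 GiB, while 1.6 million sequences need roughly 19,000 GiB (about 18.6 TiB) and about 1.28 trillion pairwise comparisons, which exceeds the RAM of typical cluster nodes by more than an order of magnitude.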
Attachment: mega bug report.zip (198,878 bytes), added 2019-03-19 18:26
https://megasoftware.net/mantis_bt/
Issue History
Date Modified     Username   Field        Change
2019-03-19 18:26  guest      New Issue
2019-03-19 18:26  guest      File Added: mega bug report.zip
2019-03-21 13:11  gstecher   Note Added: 0004211
2019-03-21 13:11  gstecher   Status       new => resolved
2019-03-21 13:11  gstecher   Resolution   open => won't fix
2019-03-21 13:11  gstecher   Assigned To  => gstecher

Notes
(0004211) gstecher, 2019-03-21 13:11
Hi Tom,

I am writing in response to the bug report you recently submitted regarding the megacc software. The ClustalW implementation in megacc is not an appropriate tool for aligning such a large data set; it was written over 20 years ago, before data sets of this size were available. I am also not sure that aligning that many sequences makes sense (see https://www.drive5.com/muscle/manual/bigalignments.html). If you need software that might work with that data set, you could try MAFFT (https://mafft.cbrc.jp/alignment/software/).

--
Best regards,
Glen Stecher
Institute for Genomics and Evolutionary Medicine
igem.temple.edu