MantisBT

View Issue Details
ID: 0001158
Project: MEGA
Category: Alignment Explorer
View Status: public
Date Submitted: 2019-03-19 18:26
Last Update: 2019-03-21 13:11
Reporter: guest
Assigned To: gstecher
Priority: normal
Severity: minor
Reproducibility: have not tried
Status: resolved
Resolution: won't fix
Platform: PC
OS: Linux
Product Version: MEGA-CC 11 (command line version)
Target Version:
Fixed in Version:
Summary: 0001158: error in TClustalThread.SetSeqArray: Out of memory
Description: I am trying to run multiple sequence alignments on very large data sets containing 100,000 to 1,000,000 sequences using the megacc program on a high-performance computing cluster associated with my school. The cluster runs Linux. My alignments are aborted about 5 minutes after starting, with an error message that reads: "error in TClustalThread.SetSeqArray: Out of memory". The issue may be internal to the module itself rather than a problem with memory usage on the cluster (see image "cpu memory usage.jpg").
Steps To Reproduce: For my data set (P02c.fas, too large to be uploaded with this error report; it contains ~1.6 million sequences), I am using ClustalW alignment (clustal_align_NC.mao). I set my output to P02_align. I run the job using the following command:

megacc -a clustal_align_NC.mao -d P02c.fas -o P02_align

After I submit this job on the high-throughput scheduler, the job is in qw (waiting) for a minute or two before the run begins.

The run lasts about 5 minutes before the error occurs.

This worked in the past with a different dataset, but it only contained ~5,000 sequences. I don't know if this is a problem with the size of the input or a software bug, but it definitely isn't a problem with the HPCC.
Additional Information: See "error_report.txt" for the error message I received when running this script.

Size of input file is ~527 MB
Tags: No tags attached.
First Name: Tom
Last Name: Caldwell
Email: tcaldwel@uci.edu
Attached Files: mega bug report.zip (198,878 bytes) 2019-03-19 18:26

- Relationships

-  Notes
(0004211)
gstecher (administrator)
2019-03-21 13:11

Hi Tom,

I am writing in response to the bug report you recently submitted regarding the megacc software. The ClustalW implementation in megacc is not an appropriate tool for aligning such a large data set. It was written over 20 years ago, before data sets of this size were available. I am not sure aligning that many sequences makes sense (see https://www.drive5.com/muscle/manual/bigalignments.html). In any case, if you need software that might work with that data set, you might try MAFFT (https://mafft.cbrc.jp/alignment/software/).

--
Best regards,
Glen Stecher
Institute for Genomics and Evolutionary Medicine
igem.temple.edu
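
For a sense of scale, the following is a rough back-of-the-envelope sketch, not a description of megacc internals. It assumes that a ClustalW-style progressive aligner needs one distance value for every pair of sequences when it builds its guide tree, and that each value is stored as an 8-byte float. Both the function names and the 8-byte figure are illustrative assumptions.

```python
def pairwise_count(n):
    """Number of unordered sequence pairs among n sequences."""
    return n * (n - 1) // 2


def distance_matrix_bytes(n, bytes_per_entry=8):
    """Bytes needed to hold one distance per pair.

    bytes_per_entry=8 is an assumption (a double-precision float);
    the real storage layout in megacc is not known from this report.
    """
    return pairwise_count(n) * bytes_per_entry


# Data set sizes mentioned in this report: the earlier run that worked,
# the lower bound quoted in the description, and the P02c.fas estimate.
for n in (5_000, 100_000, 1_600_000):
    gib = distance_matrix_bytes(n) / 2**30
    print(f"{n:>9,} sequences: {pairwise_count(n):>19,} pairs, ~{gib:,.1f} GiB")
```

Under these assumptions, ~1.6 million sequences give roughly 1.28 trillion pairs, or about 10 TB for the distances alone, while ~5,000 sequences need under 100 MB. That quadratic growth would be consistent with the earlier data set working and this one failing, regardless of how much memory the cluster node has.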

- Issue History
Date Modified     Username  Field                            Change
2019-03-19 18:26  guest     New Issue
2019-03-19 18:26  guest     File Added: mega bug report.zip
2019-03-21 13:11  gstecher  Note Added: 0004211
2019-03-21 13:11  gstecher  Status                           new => resolved
2019-03-21 13:11  gstecher  Resolution                       open => won't fix
2019-03-21 13:11  gstecher  Assigned To                      => gstecher

