0001158: error in TClustalThread.SetSeqArray: Out of memory

ID: 0001158
Project: MEGA
Category: Alignment Explorer
View Status: public
Date Submitted: 2019-03-19 18:26
Last Update: 2019-03-21 13:11
Reporter: guest
Assigned To: gstecher
Priority: normal
Severity: minor
Reproducibility: have not tried
Status: resolved
Resolution: won't fix
Platform: Linux
Product Version:
Target Version:
Fixed in Version:

Summary

0001158: error in TClustalThread.SetSeqArray: Out of memory

Description

I am trying to run multiple sequence alignments on very large data sets (100,000 to 1,000,000 sequences) using the megacc program on a high-performance computing cluster at my school. The cluster runs Linux. My alignments abort about 5 minutes after starting, with an error message that reads: "error in TClustalThread.SetSeqArray: Out of memory". The issue may be internal to the module itself rather than a shortage of memory on the cluster (see image "cpu memory usage.jpg").

Steps To Reproduce

For my data set (P02c.fas, which is too large to upload with this report; it contains ~1.6 million sequences), I am using ClustalW alignment (clustal_align_NC.mao). I set my output to P02_align. I run the following command from a script:

megacc -a clustal_align_NC.mao -d P02c.fas -o P02_align

After I submit this job on the high-throughput scheduler, the job is in qw (waiting) for a minute or two before the run begins.

The run lasts about 5 minutes before the error occurs.

This worked in the past with a different data set, but that one contained only ~5,000 sequences. I don't know whether this is a problem with the size of the input or a software bug, but it definitely isn't a problem with the HPCC.

Additional Information

See "error_report.txt" for the error message I received when running this script.

The input file is ~527 MB.
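
The report quotes both a sequence count and a file size for the input. A minimal Python sketch for checking such figures before submitting a job is shown here. The function name count_fasta_records is hypothetical and not part of megacc; it only assumes standard FASTA formatting, where each record starts with a ">" header line.

```python
from typing import Iterable, Tuple


def count_fasta_records(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (number of sequences, total residues) for FASTA-formatted lines."""
    n_records = 0
    n_residues = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            n_records += 1  # every header line starts a new record
        else:
            n_residues += len(line)  # sequence data may span several lines
    return n_records, n_residues
```

Because the function accepts any iterable of lines, an open file handle can be passed directly, for example: with open("P02c.fas") as handle: print(count_fasta_records(handle)). The file is read line by line, so a file of several hundred MB does not need to fit in memory.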

Tags

No tags attached.

First Name: Tom
Last Name: Caldwell
Email: tcaldwel@uci.edu

Attached Files

mega bug report.zip (198,878 bytes) 2019-03-19 18:26

Relationships

Notes
(0004211) gstecher (administrator) 2019-03-21 13:11
Hi Tom,

I am writing in response to the bug report you recently submitted regarding the megacc software. The ClustalW implementation in megacc is not an appropriate tool for aligning such a large data set: it was written over 20 years ago, before data sets of this size were available. I am not sure aligning that many sequences makes sense (see https://www.drive5.com/muscle/manual/bigalignments.html). In any case, if you need software that might work with that data set, you might try MAFFT (https://mafft.cbrc.jp/alignment/software/).

--
Best regards,
Glen Stecher
Institute for Genomics and Evolutionary Medicine
igem.temple.edu
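
As a rough illustration of why an all-against-all method struggles at this scale: ClustalW builds its guide tree from distances between every pair of sequences, so the number of pairs grows quadratically with the number of sequences. The sketch below estimates the memory needed to hold one value per pair. It is a back-of-the-envelope calculation, not a description of how megacc allocates memory: the 8 bytes per entry is an assumption, the function name is hypothetical, and the reported error came from TClustalThread.SetSeqArray, so the allocation that actually failed may be a different one.

```python
def pairwise_matrix_bytes(n_sequences: int, bytes_per_entry: int = 8) -> int:
    """Bytes needed to store one value per unordered pair of sequences.

    bytes_per_entry=8 (one double per distance) is an assumption made for
    illustration, not a documented property of megacc.
    """
    n_pairs = n_sequences * (n_sequences - 1) // 2
    return n_pairs * bytes_per_entry


# Sequence counts mentioned in the report: ~5,000 worked, ~1.6 million failed.
for n in (5_000, 100_000, 1_600_000):
    gib = pairwise_matrix_bytes(n) / 2**30
    print(f"{n:>9,} sequences -> {gib:,.1f} GiB")
```

Under these assumptions the ~5,000-sequence data set needs about 0.1 GiB for such a matrix, while ~1.6 million sequences would need on the order of 10 TB, which is consistent with the job working on the small input and running out of memory on the large one.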

Issue History
Date Modified	Username	Field	Change
2019-03-19 18:26	guest	New Issue
2019-03-19 18:26	guest	File Added: mega bug report.zip
2019-03-21 13:11	gstecher	Note Added: 0004211
2019-03-21 13:11	gstecher	Status	new => resolved
2019-03-21 13:11	gstecher	Resolution	open => won't fix
2019-03-21 13:11	gstecher	Assigned To	=> gstecher