MantisBT - MEGA
View Issue Details
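ID: 0000329
Project: MEGA
Category: Main Form
View Status: public
Date Submitted: 2016-12-05 09:55
Last Update: 2017-01-20 13:54
Reporter: guest
Assigned To: gstecher
Priority: normal
Severity: minor
Reproducibility: have not tried
Status: assigned
Resolution: open
Platform: PC
OS: Linux
OS Version: RedHat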
MEGA-CC 11 (command line version) 
 
Max
Sanderford
tuf79348@temple.edu
tuf79348@temple.edu
0000329: Failed attempt to lock file on networked file system hangs indefinitely, instead of timing out/returning a descriptive error mes
Failed attempt to lock file on networked file system hangs indefinitely, instead of timing out/returning a descriptive error message.
Try to run an analysis with input/output on a networked filesystem.
The full command line was:

/home/tue39618/EP/Programs/megacc -a /home/tue39618/EP/Programs/Living_seq_Poisson_G5.mao -d /home/tue39618/EP/Test_data/NM_002773/seqs/gene_012.fas -t /home/tue39618/EP/Test_data/NM_002773/trees/gene_012.nwk -o /home/tue39618/EP/Test_data/NM_002773/AS/gene_012.results

This runs perfectly fine on the head node (i.e. it starts processing things and spitting out updates). It fails on all of the compute nodes. There, it simply prints the header

MEGA-CC 7.0.18 Molecular Evolutionary Genetics Analysis
Build#: 7160617-x86_64

and then hangs, chewing CPU.

So, I ran it in strace as root on a compute node, and the problem was quickly apparent.



strace /home/tue39618/EP/Programs/megacc -a /home/tue39618/EP/Programs/Living_seq_Poisson_G5.mao -d /home/tue39618/EP/Test_data/NM_002773/seqs/gene_012.fas -t /home/tue39618/EP/Test_data/NM_002773/trees/gene_012.nwk -o /home/tue39618/EP/Test_data/NM_002773/AS/gene_012.results
execve("/home/tue39618/EP/Programs/megacc", ["/home/tue39618/EP/Programs/megac"..., "-a", "/home/tue39618/EP/Programs/Livin"..., "-d", "/home/tue39618/EP/Test_data/NM_0"..., "-t", "/home/tue39618/EP/Test_data/NM_0"..., "-o", "/home/tue39618/EP/Test_data/NM_0"...], [/* 49 vars */]) = 0
brk(0) = 0x21ce000

<SNIP>
A bunch of stuff loading the shared libraries, registering interrupt handlers, mapping memory, and other initialization things
</SNIP>

stat("/home/tue39618/EP/Test_data/NM_002773/AS/gene_012.results", 0x7ffe682a7488) = -1 ENOENT (No such file or directory)
stat("/home/tue39618/EP/Test_data/NM_002773/AS/", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
stat("/home/tue39618/EP/Test_data/NM_002773/AS/", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
access("/home/tue39618/EP/Programs/Living_seq_Poisson_G5.mao", F_OK) = 0
write(1, "MEGA-CC 7.0.18 Molecular Evoluti"..., 56MEGA-CC 7.0.18 Molecular Evolutionary Genetics Analysis
) = 56
write(1, "Build#: 7160617-x86_64\n", 23Build#: 7160617-x86_64
) = 23
access("/home/tue39618/EP/Programs/Living_seq_Poisson_G5.mao", F_OK) = 0
access("/home/tue39618/EP/Programs/Living_seq_Poisson_G5.mao", F_OK) = 0
open("/home/tue39618/EP/Programs/Living_seq_Poisson_G5.mao", O_RDONLY|O_LARGEFILE) = 3
flock(3, LOCK_SH|LOCK_NB

So, the output file doesn't exist (not an error). It writes the header, then goes on to parse the .mao file. This file exists, and is opened with file descriptor 3. Then, the attempt to lock the file hangs.

But flock isn't reliable on NFS mounts (in this setup it doesn't work), and user data is mounted over NFS on the compute nodes. So this attempt to lock just hangs. It never times out (in my impatient waiting at least) and never returns.

So, this is an issue with this version of megacc accessing files over NFS.
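
For illustration, here is a minimal sketch of the behavior the report asks for: a bounded, non-blocking lock attempt that ends in a descriptive error instead of an indefinite hang. This is not MEGA's code. It is written in Python only to show the pattern, and the function name `lock_shared_with_timeout` and its `timeout_s` and `poll_s` parameters are made up for the example. It also assumes each `flock` call actually returns; a call that blocks inside the kernel would need a separate watchdog.

```python
import fcntl
import os
import time


def lock_shared_with_timeout(fd, timeout_s=10.0, poll_s=0.1):
    """Take a shared flock on fd, or raise after timeout_s seconds.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # Same flags as in the strace output: shared, non-blocking.
            fcntl.flock(fd, fcntl.LOCK_SH | fcntl.LOCK_NB)
            return
        except BlockingIOError:
            # The lock is currently held elsewhere: retry until the deadline.
            if time.monotonic() >= deadline:
                raise TimeoutError(
                    f"could not lock fd {fd} within {timeout_s} s"
                ) from None
            time.sleep(poll_s)
        except OSError as exc:
            # Any other failure (e.g. ENOLCK) is reported at once.
            raise OSError(
                exc.errno,
                f"flock failed on fd {fd}: {os.strerror(exc.errno)} "
                "(is the file on a network filesystem without lock support?)",
            ) from exc
```

A caller would open the .mao file, pass the descriptor to this helper, and print the exception text and exit with a non-zero status if it raises.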
No tags attached.
Issue History
2016-12-05 09:55guestNew Issue
2016-12-05 11:38gstecherFile Deleted: MEGA3 Error Report.txt
2017-01-20 13:54gstecherAssigned To => gstecher
2017-01-20 13:54gstecherStatusnew => assigned

Notes
(0000255)
gstecher   
1969-12-31 17:33   
I'm not sure if this is a bug. Probably trying to open M3 mts files with old versions.