Noelia, welcome aboard. I've been looking for your WUs and finally got some. Unfortunately, I just had 11 Noelia WUs crash after about 13 seconds each with a computation error. Entries from my event log for one WU are below:
7/25/2012 9:03:49 PM | GPUGRID | Starting task run2_replica29-NOELIA_sh2fragment_run-0-4-RND8749_2 using acemdlong version 616 (cuda42) in slot 6
7/25/2012 9:04:04 PM | GPUGRID | Computation for task run2_replica29-NOELIA_sh2fragment_run-0-4-RND8749_2 finished
7/25/2012 9:04:04 PM | GPUGRID | Output file run2_replica29-NOELIA_sh2fragment_run-0-4-RND8749_2_1 for task run2_replica29-NOELIA_sh2fragment_run-0-4-RND8749_2 absent
7/25/2012 9:04:04 PM | GPUGRID | Output file run2_replica29-NOELIA_sh2fragment_run-0-4-RND8749_2_2 for task run2_replica29-NOELIA_sh2fragment_run-0-4-RND8749_2 absent
7/25/2012 9:04:04 PM | GPUGRID | Output file run2_replica29-NOELIA_sh2fragment_run-0-4-RND8749_2_3 for task run2_replica29-NOELIA_sh2fragment_run-0-4-RND8749_2 absent
The WUs are of this variety:
run9_replica7-NOELIA_sh2fragment_run-0-4-RND8072_2
Workunit 3598139
Stderr output
<core_client_version>7.0.28</core_client_version>
<![CDATA[
<message>
- exit code 98 (0x62)
</message>
<stderr_txt>
MDIO: cannot open file "restart.coor"
ERROR: file deven.cpp line 1106: # Energies have become nan
called boinc_finish
</stderr_txt>
]]>
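The "Energies have become nan" line looks like the application's own sanity check rather than a driver or hardware fault. I don't have the ACEMD source, so the sketch below is only my guess at the usual pattern (the constant name and the choice of 98 as the NaN exit status are my assumptions); boinc_finish() is the real BOINC API call that reports the exit status back to the client:

#include <cmath>
#include <cstdio>
#include "boinc_api.h"   // BOINC API header; declares boinc_finish()

// Assumed mapping: exit code 98 (0x62) is what the stderr above reports.
static const int EXIT_NAN_ENERGIES = 98;

// Illustrative sketch, not the actual ACEMD code. After a step the application
// checks that the total energy is still finite; if not, it prints the error
// seen above and ends the task with a nonzero status, which the BOINC client
// then logs as a computation error.
void check_energies(double total_energy)
{
    if (!std::isfinite(total_energy)) {
        std::fprintf(stderr, "ERROR: # Energies have become nan\n");
        boinc_finish(EXIT_NAN_ENERGIES);
    }
}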
Thought you'd want to know - let me know if I should have forwarded other details.
Edit: Win7 64, 2x 560 Ti, AMD FX-6100 6-core @ 3.3 GHz, 850 W PSU
neilp62
Hmm, I've experienced the same error with two back-to-back NOELIA WUs. My PC finished a PAOLA WU just before the NOELIA WUs with no error. For now, I'll suspend the GPUGRID project until something is posted about this...
Hi Noelia,
Same error (twice) for me too, in these two results:
http://www.gpugrid.net/result.php?resultid=5664840
http://www.gpugrid.net/result.php?resultid=5664472
<core_client_version>7.0.28</core_client_version>
<![CDATA[
<message>
- exit code 98 (0x62)
</message>
<stderr_txt>
MDIO: cannot open file "restart.coor"
ERROR: file deven.cpp line 1106: # Energies have become nan
called boinc_finish
</stderr_txt>
]]>
I'll stop or cancel your WUs until something is posted about this error.
k.
____________
Dreams do not always come true. But not because they are too big or impossible. Why did we stop believing.
(Martin Luther King)
5pot
Ditto.
All run2 WUs, if I'm not mistaken, and they're failing on every other host they were sent to.
Definitely a problem with these.
Yep, some errors here too. Also, GPU utilization is pretty low, currently at 79% on my GTX 470 for
task rundig8_run5-NOELIA_smd2-1-5-RND4856_0 using acemdlong version 616 (cuda42)
noelia
Hi guys,
I apologize for this inconvenience. It is the first time I have run the system after doing the equilibration phase in acemdbeta (the first step described in this other thread: http://www.gpugrid.org/forum_thread.php?id=3088 ), and it works quite differently from when we run it locally, which is why all the simulations were crashing within a few seconds. Now the procedure is automated, so this should not be a problem in the future when running this way. Thank you for your time :)
Shouldn't these workunits be processed by the 6.47 beta client?
5pot
Just had another one recently. It was sent about 5 or so hours ago. I hope these are out of the system now, because I've produced 53 errors on these tasks.
Just seems like a lot of wasted bandwidth on my end and on yours.
I understand things happen, but please take these out of the hopper if you have not done so already.
Cheers
Started processing one replacement Noelia WU: run10_replica37-NOELIA_sh2fragment_fixed-0-4-RND1582_0
It is only 10% complete after about 90 minutes. At start-up the WU projected 9:36 to complete, but it is now on track for a bit over 14 hours, and probably significantly more. http://www.gpugrid.net/workunit.php?wuid=3601656
This WU will never qualify for the maximum bonus because there just isn't enough time to process and return it within 24 hours; a quick sketch of that arithmetic is below. Same problem the Nathan WUs had back in Feb/March, if I remember correctly.
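For what it's worth, here's the straight-line estimate I'm using, just elapsed time divided by fraction done, with the numbers above (a naive estimate; the real remaining time will drift as the run goes on):

#include <cstdio>

int main()
{
    // Figures from this WU so far.
    const double elapsed_hours  = 1.5;    // about 90 minutes of crunching
    const double fraction_done  = 0.10;   // 10% complete
    const double bonus_window_h = 24.0;   // must be returned within 24 hours for the maximum bonus

    // Naive projection: total time = elapsed / fraction done.
    const double projected_total = elapsed_hours / fraction_done;   // 15 hours

    std::printf("Projected run time: %.1f h\n", projected_total);
    std::printf("Margin left inside the bonus window: %.1f h\n",
                bonus_window_h - projected_total);
    return 0;
}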
I'll let this wu run another couple of hours, see how it is tracking, then update this post.
In the meantime, may I suggest you talk with Nathan about processing time, as he has lived through this before and was able to adjust the WUs so processing returned to "8-12 hours on the fastest cards." On my 560 Tis his WUs typically take about 8 hours to crunch and another 10-12 minutes to upload.
Thank you!
I'll let this wu run another couple of hours, see how it is tracking, then update this post.
Edit: after 4.5 hours, still on track to finish in a bit over 14 hours
I'll let this wu run another couple of hours, see how it is tracking, then update this post.
Edit: after 4.5 hours, still on track to finish in a bit over 14 hours
After the release of the GTX 6xx series, I wouldn't consider a GTX 560 Ti one of "the fastest cards". Unlike the GTX 560 Ti 448, the standard 560 Ti has only 256 shaders usable by the GPUGrid client, because it is a CC2.1 card (while the Ti 448 'limited edition' is a CC2.0 card, so all of its shaders can be used by the GPUGrid client); a rough sketch of where that 256 figure comes from follows the card list below.
At the moment the fastest cards are:
GTX 690, 680, 670, 590, 580, 570, 480, 470, 560 Ti 448, 465
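For anyone wondering where the 256 figure comes from, here is the rough arithmetic as I understand it. The 32-of-48 "cores actually kept busy per SM" number is my reading of the CC2.1 design (the extra 16 per SM need instruction-level parallelism the client apparently can't exploit), not an official GPUGrid statement:

#include <cstdio>

int main()
{
    // GTX 560 Ti (GF114, compute capability 2.1), figures as I understand them.
    const int sms          = 8;    // streaming multiprocessors
    const int cores_per_sm = 48;   // physical CUDA cores per SM on CC2.1
    const int busy_per_sm  = 32;   // cores kept busy per SM without extra ILP (assumption)

    std::printf("Physical shaders: %d\n", sms * cores_per_sm);   // 384
    std::printf("Usable shaders:   %d\n", sms * busy_per_sm);    // 256
    return 0;
}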
No argument on which cards are currently the fastest out there. Nevertheless, the 560 Ti can comfortably finish all existing tasks in 8-12 hours, with the exception of this latest group from Noelia.
That's all I meant.