In the last 6 days I had two work units crash after 50,000 and 73,000 seconds, respectively.
http://www.gpugrid.net/workunit.php?wuid=865833
http://www.gpugrid.net/workunit.php?wuid=869267
Both were running on my NVIDIA GeForce 9800 GT (1024 MB, driver 19062) using BOINC 6.6.36.
Apparently I wasn't the first to have these units crash (also on 9800 GT/GTX cards with BOINC 6.6.36); however, I was the last, since both units completed fine on a GTX 260 afterwards.
The error message in all cases was: `Cuda error: Kernel [pme_fill_charges_accumulate] failed in file 'fillcharges.cu' in line 73 : unknown error.`
Any suggestions on what to do? Apart from these two work units, all other units ran without problems. It would just be nice to avoid losing more than 100k seconds of GPU time in one week... Thanks.
|
|
|
And another one just bit the dust... that makes 3 failures out of the last 13.
Any comment on this issue, or should I look for another CUDA project?
|
|
zpm
Joined: 2 Mar 09 Posts: 159 Credit: 13,639,818 RAC: 0
There will be some bad apples every once in a while, unless someone really screws up the compiler. Just the other day I had my first WU error since July... 4 months without a single error is lucky.
|
|
|
Sure, errors may occur once in a while, but this one seems to be systematic, since the 9800 GT keeps crashing while the GTX 260 has no problem with the same WU. Apart from the wasted GPU hours, I would really like to help by reporting things like this, but it would certainly be nice to get some feedback too.
This is also the first time since July that I have run into errors, but 3 out of 13 is a lot of bad apples at once...
|
|
|
It seems that all of a sudden the G92-based (and older) cards are computing bullshit.
I can report a similar problem here: my GT200b machine does the job just fine, but my G92-based machine has recently been producing a 100% error rate.
I'm thinking of switching that one to another project, but I have no idea which one yet.
Collatz is out of order most of the time, MilkyWay does not support my G92, there is no Linux CUDA client available for SETI, and Aqua has stepped back from GPU processing entirely. It seems I am stuck with this project.
Does anyone know an alternative project we could join, instead of going crazy trying to figure out what's the matter here (once again)?
To be honest, I am really getting sick of this crap every now and then.
|
|
skgiven (Volunteer moderator, Volunteer tester)
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0
I'm experiencing similar problems with my G92 cards: about 40% success over the last 2 weeks, and even that has dropped sharply since 28 Nov 09. I expect this might be work-unit related, but there seems to be a move away from supporting the G92 core, which is causing more failures. Perhaps the very long run time is to blame; it has been creeping up over the last 6 months, and the longer a task runs, the greater the chance of an error.
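The point that longer run times raise the chance of an error can be made concrete with a simple model. The sketch below assumes, purely for illustration, that errors strike at a constant average rate per second of GPU time (a Poisson process), so the chance of a work unit finishing cleanly falls off exponentially with its run time. The function name and the failure rate are hypothetical; nothing here was measured on any card.

```python
import math


def completion_probability(runtime_s: float, failures_per_s: float) -> float:
    """Chance that a work unit finishes without error.

    Assumes (hypothetically) that errors arrive as a Poisson process with a
    constant rate, so P(no error in t seconds) = exp(-rate * t).
    """
    return math.exp(-failures_per_s * runtime_s)


# Hypothetical rate: on average one error per 500,000 s of GPU time.
RATE = 1 / 500_000

if __name__ == "__main__":
    # Compare short and long work units under the same per-second error rate.
    for hours in (6, 12, 24):
        p = completion_probability(hours * 3600, RATE)
        print(f"{hours:>2} h run: {p:.1%} chance of finishing cleanly")
```

Under this model, doubling a work unit's run time squares its success probability, which is one way to read the remark above. It says nothing about why G92 cards in particular would have a higher error rate in the first place.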
You might want to look into hooking your G92 cards up to Folding@home.
It is not a BOINC project, so it has its own client, but there is always work available and it seems more stable than the other projects you mentioned.
On several occasions while running Aqua I lost huge amounts of work when it failed. The last time I lost over 450 hours, that was 19 days of non-stop folding lost at about 85% completion! I'm not surprised they gave up.
You could also try Einstein, which does use the BOINC client. It has only recently begun to use CUDA, so don't expect too much from it.
I'm going to give my two G92 cards a few more days before I pull them away from here, and in the meantime I will try to make them both as stable as possible.
I have my doubts about the push towards G200 and above here.
|
|
Zydor
Joined: 8 Feb 09 Posts: 252 Credit: 1,309,451 RAC: 0
I was having lots of problems with my 9800 GTX, and a while back had to stop running GPUGrid WUs as they were crashing too often. It had crunched very well in the past, and others certainly had 9800 GTXs running with no issues, so it was all a little strange, but I couldn't take the blue screens anymore. I suspected the driver but could not say definitively. I am paranoid about cleaning out old driver bits, so it was not that.
I kept up with new drivers as they came out, with little change so far. Then I changed to 195.62. It was a whole new world... the blue screens disappeared and reliability returned, along with my sanity.
I've run it for a couple of weeks now with no problems: rock solid, with just one blue screen, and that was my fault. I will probably get back to crunching here, as I miss doing these WUs for many reasons.
The new drivers are worth a shot. They have worked for me so far; I have yet to do the acid test with a GPUGrid WU, but it looks good.
Regards
Zy
|
|
skgiven (Volunteer moderator, Volunteer tester)
Some of the work units were not compatible with my G92 cards:
TONI-HERG is bad on G92 cards.
|
|
|