MarkJ (Volunteer moderator, Volunteer tester)
Joined: 24 Dec 08 Posts: 738 Credit: 200,909,904 RAC: 0
I've been getting an awful lot of these since the switch to the 6.05 app, and on 2 different machines too. They are mainly 6.05 tasks, but 6.72 tasks have been failing as well. All machines have 196.21 drivers under Win7 x64. Interestingly the GTX275 seems immune to these; only the GTX295s seem to be getting them.
Both errors suggest a bug with the code/driver/hardware combination.
From the 6.05 tasks I get the following errors:
The system cannot find the path specified. (0x3) - exit code 3 (0x3)
</message>
<stderr_txt>
# Using device 1
# There are 2 devices supporting CUDA
# Device 0: "GeForce GTX 295"
# Clock rate: 1.24 GHz
# Total amount of global memory: 939524096 bytes
# Number of multiprocessors: 30
# Number of cores: 240
# Device 1: "GeForce GTX 295"
# Clock rate: 1.24 GHz
# Total amount of global memory: 939524096 bytes
# Number of multiprocessors: 30
# Number of cores: 240
MDIO ERROR: cannot open file "restart.coor"
SWAN : FATAL : Failure executing kernel [mshake_position_kernel_1] [2] [10,1,1][64,1,1]
Assertion failed: 0, file swanlib_nv.cpp, line 281
For the 6.72 tasks I tend to get the following, but sometimes I also get the same errors as for the 6.05 tasks:
- exit code 98 (0x62)
</message>
<stderr_txt>
# Using device 1
# There are 2 devices supporting CUDA
# Device 0: "GeForce GTX 295"
# Clock rate: 1.24 GHz
# Total amount of global memory: 939524096 bytes
# Number of multiprocessors: 30
# Number of cores: 240
# Device 1: "GeForce GTX 295"
# Clock rate: 1.24 GHz
# Total amount of global memory: 939524096 bytes
# Number of multiprocessors: 30
# Number of cores: 240
ERROR: file ntnbrlist.cpp line 63: Insufficent memory available for pairlists. Set pairlistdist to match the cutoff.
____________
skgiven (Volunteer moderator, Volunteer tester)
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0
This error appears many times on your failing cards, on mine and on everyone else's (when running 6.72 tasks):
ERROR: file ntnbrlist.cpp line 63: Insufficent memory available for pairlists. Set pairlistdist to match the cutoff.
called boinc_finish
For 6.05 failures you are getting this error:
Assertion failed: 0, file swanlib_nv.cpp, line 281
This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
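The signatures quoted above can be told apart mechanically when sorting through many failed tasks. What follows is only a minimal Python sketch, not part of the GPUGrid application or of BOINC: the function name `classify_stderr` and the labels are made up for illustration, and the search strings are copied from the stderr output quoted in this thread (including the application's own spelling "Insufficent").

```python
# Hypothetical helper: label a task's stderr text by the failure
# signatures quoted in this thread. Names and labels are illustrative only.

SIGNATURES = [
    # (label, substring copied verbatim from the quoted stderr output)
    ("pairlist_memory", "Insufficent memory available for pairlists"),
    ("kernel_failure", "SWAN : FATAL : Failure executing kernel"),
    ("swan_assertion", "Assertion failed: 0, file swanlib_nv.cpp"),
    ("missing_restart_file", 'MDIO ERROR: cannot open file "restart.coor"'),
]


def classify_stderr(text):
    """Return the labels of all known signatures found in text, in table order."""
    return [label for label, needle in SIGNATURES if needle in text]


if __name__ == "__main__":
    sample = (
        "# Using device 1\n"
        "ERROR: file ntnbrlist.cpp line 63: Insufficent memory available "
        "for pairlists. Set pairlistdist to match the cutoff.\n"
    )
    print(classify_stderr(sample))
```

A task log can match more than one label, as in the 6.05 output quoted earlier in the thread, where the kernel failure and the assertion appear together.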
I was getting the same 6.72 errors on my GTX260 under Win7, so I pulled the card from the project. It had been fine for 6.03 tasks, but I could not find a driver that worked for any 6.72 or 6.05 tasks.
Some of the 6.72 errors seem to be variants of the earlier RAM errors that appeared for 6.72 tasks and were corrected by, for example, driver 196.21 on Vista. So some drivers currently work for some of these tasks but not for others.
I would add the operating system to your code/driver/hardware combination of possible causes.
XP seems to be more stable than Vista or Win7.
For people with one GPU this is just a matter of trying different drivers until they find one that works for all tasks, but when you have several GPUs on several platforms it is more difficult (and very slow).
Mark, at least you are using the same operating system across your systems.
By the way, your GTX275 is using a different driver (197.13), but I don’t think that means it is a good driver for all cards, just for that exact card from that exact manufacturer.
I would suggest you try going back as far as a 195.xx driver for GT240 cards under Win7.
____________
MarkJ (Volunteer moderator, Volunteer tester)
Joined: 24 Dec 08 Posts: 738 Credit: 200,909,904 RAC: 0
Well, today they all seem to work. I haven't changed anything, so maybe it was just a bad batch of WUs.
I was holding off on a driver upgrade, waiting for CUDA 3.1 to be released. Then it might be worthwhile updating for some stability improvements, if nothing else.
[edit]Spoke too soon. A couple failed after posting this.[/edit]
____________
MarkJ (Volunteer moderator, Volunteer tester)
Joined: 24 Dec 08 Posts: 738 Credit: 200,909,904 RAC: 0
Well, I upgraded one of the GTX295s to the 197.45 drivers. It's more stable, but still gets errors. The GTX295 still running the 196.21 drivers just had 4 fail in a row (complete with popups). Nope, make that 6.
Has anyone tried the 257 drivers? They are beta ones and seem to be where NVIDIA is going after the 197 series. Apparently they support CUDA 3.1 as well.
____________
skgiven (Volunteer moderator, Volunteer tester)
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0
Anyone tried the 257 drivers? They are beta ones and seem to be where nvidia is going after the 197 series. Apparently they support cuda 3.1 as well.
Yes.
I tried it on my GTX260 (Win 7 x64, BOINC 6.10.56). No change on that card; it still fails tasks after about 9 seconds. It was failing all 6.05 and all 6.72 tasks. Up to a week or so ago it was doing well on the 6.03 WUs.
I also tried it on a GTX470 under Win XP x86 SP3.
It works fine on that card.
____________
MarkJ (Volunteer moderator, Volunteer tester)
Joined: 24 Dec 08 Posts: 738 Credit: 200,909,904 RAC: 0
Anyone tried the 257 drivers? They are beta ones and seem to be where nvidia is going after the 197 series. Apparently they support cuda 3.1 as well.
Yes.
Tried it on my GTX260 (Win 7 x64, Boinc 6.10.56). No change on that card; still fails tasks after about 9sec. It was failing all 6.05 and all 6.72 tasks. Up to a week or so ago it was doing well on the 6.03 WU's.
Also tried it on a GTX470 Win XP x86 SP3.
Works fine on that card.
It seems (in my opinion) to be a bug in the app rather than in the hardware or driver. Hopefully the guys can track it down and fix it. It gets rather annoying having all the tasks fail, with popups every time.
____________
It seems (in my opinion) to be bugs with the app rather than the hardware or driver. Hopefully the guys can track it down and fix it, Gets rather annoying having all the tasks fail and the popups every time.
It is why I stopped running GPU Grid on the GTX295 card ... running all MW on it now ... only run GPU Grid on the GTX260 which still seems to run the tasks just fine ...
____________
The "Never Ending Story" continues.
My GTX295 does well for days (weeks), then all of a sudden a series of faults starts.
I analyse my "Everest" logs: no temperature problems.
I do restarts, change the clocks, change drivers, change the BM version, change from the XP32 host to the Vista64 host (and back), and the series of faults continues.
OK, I say to myself, it's a faulty series of WUs.
But one look at the long series of valid WUs crunched by my two highly overclocked GTX260s, and I'm back at square one.
I wearily fall asleep, and the next day the GTX295 does well again for days (weeks).
It's the GPUGrid cruncher's KARMA...
____________
MarkJ (Volunteer moderator, Volunteer tester)
Joined: 24 Dec 08 Posts: 738 Credit: 200,909,904 RAC: 0
I noticed that the GTX295s would always fail the 2nd WU. Turning off SLI (or, as they now call it under the 197.45 drivers, Multi-GPU mode) fixes it. That accounts for why my GTX275 seemed immune.
This all seems to have started with the 6.05/6.72 apps. Prior to that everything was fine. Upgrading drivers (I was using 196.21, now on 197.45) doesn't make any difference.
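One way to check whether failures cluster on one of the two GPUs of a GTX295 is to tally failed tasks by the device named on the `# Using device N` line of each stderr log (both failed logs quoted earlier in this thread report device 1). This is only a minimal Python sketch under stated assumptions: the function name `failures_by_device` and the input format (pairs of stderr text and a failed flag that you have collected yourself) are hypothetical and are not something BOINC or GPUGrid provides.

```python
import re
from collections import Counter

# Matches the "# Using device N" line seen in the quoted stderr output.
DEVICE_LINE = re.compile(r"^# Using device (\d+)", re.MULTILINE)


def failures_by_device(results):
    """Count failed tasks per device index.

    results: iterable of (stderr_text, failed) pairs (assumed input format).
    Logs without a "# Using device" line are skipped.
    """
    counts = Counter()
    for text, failed in results:
        match = DEVICE_LINE.search(text)
        if failed and match:
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    logs = [
        ("# Using device 1\nAssertion failed: 0, file swanlib_nv.cpp, line 281", True),
        ("# Using device 0\n", False),
    ]
    print(dict(failures_by_device(logs)))
```

If nearly all failures land on one device index while SLI is enabled and the tally evens out after turning it off, that would be consistent with the observation above, although it would not prove the cause.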
____________
Mad Matt
Joined: 29 Aug 09 Posts: 28 Credit: 101,584,171 RAC: 0
I noticed that the GTX295's would always fail the 2nd wu. Turning off SLI (or as they now call it under 197.45 drivers - Multi-GPU mode) fixes it. That accounts for why my GTX275 seemed immune.
Cheers for the hint, that did the trick for me as well. Additionally, it seems to me that WUs are running slightly faster now. Could anyone confirm or deny this observation? Last but not least: could you please add this to an FAQ? I was just happy to stumble upon the information here.
Host: http://www.gpugrid.net/results.php?hostid=73368
____________
ftpd
Joined: 6 Jun 08 Posts: 152 Credit: 328,250,382 RAC: 0
How can I turn it off with the new 257.21 driver for the GTX295?
The choices are automatic, GTX295 A, GTX295 B or CPU!
Multi-GPU mode can be set to on or off, but after changing it, it is always on again.
With the old drivers it also works better for seti@home!
____________
Ton (ftpd) Netherlands
skgiven (Volunteer moderator, Volunteer tester)
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0
Ton, I see you upgraded all your cards to the latest driver.
Hopefully we will soon see an improvement in performance for the Fermi cards (CUDA 3010), but I am not sure you will see any performance gain on the non-Fermi cards. If you cannot configure the GTX295 to work in non-SLI mode, you might want to roll back the driver, but perhaps you can disable SLI in the NVIDIA Control Panel.
New driver, new problems!
Good luck,
____________
Mad Matt
Joined: 29 Aug 09 Posts: 28 Credit: 101,584,171 RAC: 0
How can i put it down with the new driver 257.21 for the gtx295?
The choice is automatical or gtx295 A or B or cpu!
Multi-gpu mode is on or off, but after changing it is allways on.
With the old drivers it works also better for seti@home!
I am using 197.45 here on XP-64. Everything is running perfectly. But I guess you need some other drivers because you have a Fermi card installed?