Message boards : Graphics cards (GPUs) : Failures since upgrading to 190.38

MarkJ
Volunteer moderator
Volunteer tester
Joined: 24 Dec 08
Posts: 738
Credit: 200,909,904
RAC: 0
Message 11332 - Posted: 26 Jul 2009 | 10:55:37 UTC

Had 6 WUs fail today since the machine was upgraded to 190.38. Interestingly, it's the only machine of the 5 running GPUgrid that seems to be having the problem. The machine is a Win XP box with dual GTX 260s in it. Links to the WUs:

ERROR: c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu, line 104:
Cuda error: Kernel [reduce4_kernel] failed in file 'reduction.cu' in line 171 :
ERROR: c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu, line 11:
ERROR: c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu, line 104:
Cuda error: Kernel [fft_data_swizzle_in] failed in file 'c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu'
ERROR: c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu, line 11:

In the meantime I've set it to NNW (no new work) in the hope that the download issues will go away and the CUDA 2.2 app will fix things.
____________
BOINC blog

STE\/E
Joined: 18 Sep 08
Posts: 368
Credit: 2,640,096,048
RAC: 48,737,442
Message 11337 - Posted: 26 Jul 2009 | 13:55:08 UTC
Last modified: 26 Jul 2009 | 13:57:10 UTC

I checked both the Box's I mentioned in the other Thread and they both were running at 1/2 speed which I know from past experience leads to continual errors until fixed.

I Uninstalled the Drivers on both Box's and reinstalled them, after rebooting both Box's were running @ full speed. They may not stay running like that though because I have 1 Card now (GTX 260) being RMA'ed for the same reason. Apparently there is a fix or work around for that problem and if the 2 Box's continue to drop back to half speed I'll have to try it rather than have to try & RMA 2 more Cards.

All 4 Cards in the 2 Box's are GTX 260's FYI ... All the Wu's on both Box's got Trashed too doing the Reinstalling of the Drivers even though I suspend GPUGrid & the Wu's, Didn't matter, they were gone after BOINC Started back up & had to download fresh ones.

Mark Henderson
Joined: 21 Dec 08
Posts: 51
Credit: 26,320,167
RAC: 0
Message 11338 - Posted: 26 Jul 2009 | 14:18:33 UTC - in response to Message 11337.

Poorboy, read my post "Link to prevent Nvidia 200 Downclocking". I believe that could be the problem. The 200 goes into power saving mode by design.

Mark Henderson
Message 11339 - Posted: 26 Jul 2009 | 14:19:25 UTC - in response to Message 11337.
Last modified: 26 Jul 2009 | 14:25:34 UTC

Poorboy, read my post "Link to prevent Nvidia 200 Downclocking". I believe that could be the problem. The 200 goes into power saving mode by design. I thought my 1st 200 series card was bad also, but it was not. 3D performance mode has to be forced via software; RivaTuner is what I use.
I have been trying to get the word out, as a LOT of people have been posting BOINC-wide, thinking their cards are bad.
Maybe this is not your problem, but it sounds just like my experience.



Bymark
Joined: 23 Feb 09
Posts: 30
Credit: 5,897,921
RAC: 0
Message 11342 - Posted: 26 Jul 2009 | 16:15:23 UTC - in response to Message 11339.
Last modified: 26 Jul 2009 | 16:56:18 UTC

Joining this thread:
wuid=653971
I think it's a server problem?


On one of my XP 32-bit machines with a GTX 260:

<core_client_version>6.4.7</core_client_version>
<![CDATA[
<message>
- exit code 98 (0x62)
</message>
<stderr_txt>
# Using CUDA device 0
# Device 0: "GeForce GTX 260"
# Clock rate: 1242000 kilohertz
# Total amount of global memory: 939196416 bytes
# Number of multiprocessors: 27
# Number of cores: 216
# Amber: readparm : Reading parm file parameters
# PARM file in AMBER 7 format
# Encounter 10-12 H-bond term
WARNING: parameters.cu, line 568: Found zero 10-12 H-bond term.
WARNING: parameters.cu, line 568: Found zero 10-12 H-bond term.
MDIO ERROR: cannot open file "restart.coor"
ERROR: c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu, line 104: cufftExecC2R (gridcalc3)
called boinc_finish

</stderr_txt>
]]>

and

CPU time 280.4531
stderr out
<core_client_version>6.4.7</core_client_version>
<![CDATA[
<message>
- exit code 98 (0x62)
</message>
<stderr_txt>
# Using CUDA device 0
# Device 0: "GeForce GTX 260"
# Clock rate: 1242000 kilohertz
# Total amount of global memory: 939196416 bytes
# Number of multiprocessors: 27
# Number of cores: 216
MDIO ERROR: cannot open file "restart.coor"
ERROR: c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu, line 104: cufftExecC2R (gridcalc3)
called boinc_finish

</stderr_txt>
]]>
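
For anyone comparing a lot of these stderr dumps, a minimal Python sketch like the one below can pull out the exit code, the reported clock rate, and the error lines. The function name `summarize_stderr` and the regular expressions are assumptions based only on the output pasted above; they are not part of BOINC or of the GPUGRID app.

```python
import re

# Patterns inferred from the stderr dumps quoted in this thread (an assumption,
# not an official log format).
EXIT_RE = re.compile(r"exit code (\d+) \(0x[0-9a-fA-F]+\)")
CLOCK_RE = re.compile(r"# Clock rate: (\d+) kilohertz")
ERROR_PREFIXES = ("ERROR:", "MDIO ERROR:", "Cuda error")


def summarize_stderr(text):
    """Return the exit code, the clock in MHz, and the error lines of one dump."""
    exit_match = EXIT_RE.search(text)
    clock_match = CLOCK_RE.search(text)
    errors = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(ERROR_PREFIXES):
            errors.append(stripped)
    return {
        "exit_code": int(exit_match.group(1)) if exit_match else None,
        # The app reports kilohertz; divide by 1000 to compare with GPU-Z values.
        "clock_mhz": int(clock_match.group(1)) / 1000 if clock_match else None,
        "errors": errors,
    }


# Shortened copy of the dump above, used only as sample input.
sample = r"""- exit code 98 (0x62)
# Clock rate: 1242000 kilohertz
MDIO ERROR: cannot open file "restart.coor"
ERROR: c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu, line 104: cufftExecC2R (gridcalc3)
called boinc_finish
"""

print(summarize_stderr(sample))
```

The 1242000 kilohertz in the dump works out to 1242 MHz, which lines up with the shader clock that GPU-Z shows for a stock GTX 260 Core 216.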
____________
"Silakka"
Hello from Turku > Åbo.

STE\/E
Message 11349 - Posted: 26 Jul 2009 | 19:26:14 UTC - in response to Message 11338.
Last modified: 26 Jul 2009 | 20:23:49 UTC



Yes I was just about to see if I could get that to work, somebody sent me the Link a few days ago because I was having the same problem on another 200 Series Card. I didn't try it then because the Card was already sent out for RMA'ing & I should have a different Card in the next 2 days.

I'll Post if that works for me or not later today or tomorrow if I can get it set up right. Thanks

PS: So far this doesn't seem to be working. I think I'm doing everything okay, but upon Reboot the Settings don't hold. 2 Cores will just read 0 Speed & the 3rd Core on the Box I'm trying it on just defaults back to Stock Speeds.

EVGA Precision Tune & GPU-Z show the same 0 Speed for 2 Cores & Stock Speed for 1 Core, so about all I've managed to do so far is lose 3 more Cores. If it's going to take all this jumping thru hoops to run the New Cuda App's I'm afraid the Grid Project will lose a lot of Participants, especially with the new CUDA Projects starting up.

I've lost 7 Cores already with the Upgrade to the new Drivers to supposedly be able to run the new Cuda App's and don't feel I can afford to lose any more.

Bymark
Message 11350 - Posted: 26 Jul 2009 | 20:34:22 UTC
Last modified: 26 Jul 2009 | 20:42:55 UTC

hostid=35303


Try installing the drivers again. It seems OK now with my trouble host, and all 4 of my other GPU computers are working fine with 190.38. It seems like with gigs of drivers something can go wrong sometimes?

STE\/E
Message 11351 - Posted: 26 Jul 2009 | 21:21:58 UTC - in response to Message 11350.



Already tried that and within a few hours both Box's had Trashed 4 more Wu's each.

naja002
Joined: 25 Sep 08
Posts: 111
Credit: 10,352,599
RAC: 0
Message 11353 - Posted: 27 Jul 2009 | 0:11:48 UTC

I've installed 190.38 on 2 dual gpu rigs.

The Q6600--no problems.

The i7 920 was nothing but problems. I upgraded to 6.6.37 and that seemed to fix the issue. I reinstalled the driver. I've switched back to RT, forced the driver and forced 3D performance. It has been running for a couple of days now error free.

6.6.37

Force Driver in RT--Post #9

Force 3D Performance



STE\/E
Message 11355 - Posted: 27 Jul 2009 | 0:19:07 UTC - in response to Message 11353.






I've tried Mark's Fix on 4 Box's just a few hours ago so I won't really know if it worked or not until in the morning probably. If the Speeds of the Cards don't drop back by then at least it will be longer than they have been holding the Speeds. Usually within a few hr's the Wu's will error because of the Speed Drop.

I didn't do the Fix on 3 other Box's because so far I haven't been having any problems with them & it seems any time I do a Settings change that requires a Reboot the Wu's gets Trashed when BOINC restarts again after the Reboot so I didn't want to Trash any more Wu's today than I already had.

Bymark
Message 11357 - Posted: 27 Jul 2009 | 6:26:20 UTC - in response to Message 11351.



Nope, didn't help... I get errors in 2-3 hours.

<core_client_version>6.4.7</core_client_version>
<![CDATA[
<message>
The system cannot find the path specified. (0x3) - exit code 3 (0x3)
</message>
<stderr_txt>
# Using CUDA device 0
# Device 0: "GeForce GTX 260"
# Clock rate: 1242000 kilohertz
# Total amount of global memory: 939196416 bytes
# Number of multiprocessors: 27
# Number of cores: 216
MDIO ERROR: cannot open file "restart.coor"
# Using CUDA device 0
# Device 0: "GeForce GTX 260"
# Clock rate: 1242000 kilohertz
# Total amount of global memory: 939196416 bytes
# Number of multiprocessors: 27
# Number of cores: 216
Cuda error in file '..\cuda/cutil.h' in line 968 : unspecified launch failure.
Memory usage: host: bytes device: bytes
Assertion failed: 0, file ..\cuda/cutil.h, line 968

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

</stderr_txt>
]]>



Bymark
Message 11408 - Posted: 27 Jul 2009 | 22:07:14 UTC - in response to Message 11357.
Last modified: 27 Jul 2009 | 22:17:39 UTC

hostid=35303

Almost 10 years ago I started with SETI, on a Pentium MMX 166 MHz, and it seems like my BOINC career will end with SETI: this 260 won't work with GPUGrid anymore after the update to 190.38.

"Haven't formatted the hard drive yet; everything else has been tried."

STE\/E
Message 11409 - Posted: 27 Jul 2009 | 22:49:32 UTC - in response to Message 11408.



I don't think I'll finish My BOINC Career with SETI but the way my Cards keep dropping like flies it probably won't be with GPUGrid either. Had 6 Cards down already for errors & found 2 more this afternoon that for all practical purposes they may as well be down.

It's a Dual GTX 275 Setup that hasn't turned in but 3 Wu's in the last 50 Hr's, it's not turning in errors but it's not really turning in anything because it's slowed to a crawl I guess. BFG's not going to say oh sure send us the 8 Cards you can't crunch with anymore and we'll send you 8 shiny new ones so I'm pretty much stuck with them I figure for better or worse.

Mark Henderson
Message 11410 - Posted: 27 Jul 2009 | 22:58:11 UTC - in response to Message 11409.
Last modified: 27 Jul 2009 | 23:01:57 UTC

Are you all uninstalling the old Nvidia driver first from Add/Remove Programs, then uninstalling PhysX, and then running "Driver Sweeper" in safe mode after a reboot to remove all the old remnants before updating the Nvidia drivers?
I would suggest this if it hasn't already been tried.
I take this long route and seldom have problems.

Don't uninstall PhysX before the Nvidia drivers; I messed up doing that. Nvidia first and then PhysX.

GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Message 11422 - Posted: 28 Jul 2009 | 8:39:07 UTC - in response to Message 11410.

Roll back to 185.xx; there seem to be problems with 190.xx on some hardware.
185.xx will be fine for gpugrid for quite a while.

gdf

Bymark
Message 11431 - Posted: 28 Jul 2009 | 13:49:41 UTC - in response to Message 11422.
Last modified: 28 Jul 2009 | 14:33:23 UTC

Tried that yesterday, same result. Hostid=35303 is now on NNW.
SETI is running fine on that host (one error in 24h).
Something is strange with that computer: now it's running one SETI GPU task and 2 mc on a dual-core AMD 5600. Nice :), nothing is overclocked.
http://setiathome.berkeley.edu/results.php?hostid=4914727
and
Hostid=35303 cpuz.txt



http://personal.inet.fi/surf/tbymark/boinc/cpuz.txt
Tried that too: uninstalling the old Nvidia driver first, then PhysX, and then running Driver Sweeper in safe mode before updating the drivers.


STE\/E
Message 11436 - Posted: 28 Jul 2009 | 15:09:44 UTC - in response to Message 11422.



I tried that on 4 Cards & still got the Errors and Down-clocking with them, re-installed the 190.38 Drivers & am Processing the Collatz Wu's just fine with no errors or Down-clocking, even re-overclocked them again and they still ran fine.

I'll run that for a while and keep an eye on the Forum here for a real fix with the 190.38's, or try a new Driver Version or Client as they come out & see if that fixes the Cards that went South on the Grid Project.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 11482 - Posted: 29 Jul 2009 | 19:03:27 UTC

Am I right that there's not a single reported failure with G9x cards, only G200 are affected? But some of them still run fine with 190.xx?

MrS
____________
Scanning for our furry friends since Jan 2002

Zydor
Joined: 8 Feb 09
Posts: 252
Credit: 1,309,451
RAC: 0
Message 11483 - Posted: 29 Jul 2009 | 19:25:00 UTC - in response to Message 11482.

My 9800GTX+ has been ok with 190.38 - no failures or blips of any kind.

Regards
Zy

Steve Dodd
Joined: 26 Dec 08
Posts: 17
Credit: 3,249,337,729
RAC: 17,097,359
Message 11484 - Posted: 29 Jul 2009 | 20:39:34 UTC

Question, possibly for PoorBoy. The GTX 260 cards - are they the Core 216 version? I'm having the same problems as everyone else getting GPUGRID wu to run on this card (XP Home 32-bit, Q6600, stock everything). I've tried to roll back to previous versions of the driver (currently running 185.XX) with no positive results. I'm not showing a downclocking problem using GPU-Z.

STE\/E
Message 11487 - Posted: 29 Jul 2009 | 22:24:35 UTC - in response to Message 11484.
Last modified: 29 Jul 2009 | 22:25:36 UTC



Yes, the 260's I have are all 216 Shader Versions, once they started Down-clocking I couldn't stop them from doing that no matter what I did. I tried the Down-clocking Fix, going back to the 185.18's, Re-installing the 190.38's & different BOINC Clients. I even set them all back to their Default running Speeds but nothing worked.

I'd reset them to their Default running speeds and within as little as a few minutes some of them would drop to half speed and start giving errors on the Wu's after that. I've been running the Collatz Project for almost 2 days now with the same BOINC Client & NVIDIA Drivers (6.6.36 & 190.38) without 1 single error, and not 1 NVIDIA Card has Dropped its Speed even after re-Overclocking them again to run the Collatz Wu's.

So all I can assume is some how or way the Grid Wu's run must have something to do with them Down-clocking as fast as I could reset them to their original speeds again.

STE\/E
Message 11488 - Posted: 29 Jul 2009 | 22:28:34 UTC - in response to Message 11482.



Just my GTX 260 216 Shader Versions were affected by the Down-clocking bug, but all my cards 260's, 275's, 280's & 295's gave errors. The 260's seemed to give more than the rest though.

Steve Dodd
Message 11490 - Posted: 30 Jul 2009 | 5:14:35 UTC - in response to Message 11487.

PoorBoy,
My GTX 260 Core 216 shows clock rates of 576MHz GPU clock, 999MHz Memory clock, and 1242 MHz Shader clock (GPU-Z values). These are also shown as the default clock values. Am I right in my interpretation of your problem that one or all of these clock values are 1/2 of my clock values, or am I running at 1/2 speed and don't know it :)

STE\/E
Message 11501 - Posted: 30 Jul 2009 | 12:04:08 UTC - in response to Message 11490.
Last modified: 30 Jul 2009 | 12:07:03 UTC



Your Clock values look like the Default Values for most 260's, unless they're the OC Type right from the Factory, in which case they would be a little higher.

What my cards would do (not all but some of them), after I made sure with GPU-Z that they were indeed running at the Default Values, is drop to 300 Core & 400 Memory after some running time. They weren't at idle either, because the Grid Wu's would be running & showing Progression. Once they dropped to 1/2 speed the errors would follow soon after. The only way I found to get the Speed back to Defaults is to Re-Boot the Computer affected by the 1/2 Speed GPU.

Sometime it would only take a few minutes before they would drop their speed and other times they would take an hour or two before dropping their speed. Like I said I've been running Collatz with no problems and run all my cards at 650-Core 1475-Shaders 1100-Memory. Some guys run them higher than that but those are the speeds I've found to be the most stable for me with no Hang up's or error's so that's what I run them at.
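
The half-speed state described here (roughly 300 Core / 400 Memory against stock values of 576 / 999) is easy to flag automatically once the clock readings are available. The sketch below is only an illustration in Python: the function name `is_downclocked` and the 75% threshold are assumptions, and how the readings get collected (GPU-Z log, RivaTuner, or something else) is left open.

```python
# Stock GTX 260 Core 216 clocks in MHz, as quoted from GPU-Z in this thread.
DEFAULT_CLOCKS = {"core": 576, "memory": 999}


def is_downclocked(observed, defaults=DEFAULT_CLOCKS, threshold=0.75):
    """Return True if any observed clock is below threshold * its default.

    The 0.75 threshold is an arbitrary choice for this sketch: it catches the
    half-speed state while ignoring small deviations from stock.
    """
    return any(observed[name] < threshold * defaults[name] for name in defaults)


# The dropped state reported above.
print(is_downclocked({"core": 300, "memory": 400}))
# The overclocked settings used for Collatz.
print(is_downclocked({"core": 650, "memory": 1100}))
```

Comparing against the card's own defaults rather than a fixed number keeps the check usable for factory-overclocked 260s as well, as long as their defaults are passed in.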

Collatz is down now & I have no work from them, but I'm very reluctant to re-start the Grid Project back up again and have to go thru all the headaches I went thru for 3 days earlier, so I haven't. Been just letting the NVIDIA cards sit for now & hoping the Collatz Project comes back up soon.

[FVG] bax
Joined: 18 Jun 08
Posts: 29
Credit: 17,772,874
RAC: 0
Message 11508 - Posted: 30 Jul 2009 | 16:49:06 UTC

GTX 260 - 216 SP - Xp SP3

185.xx driver: 10 WUs, 10 error while computing

190.38 driver: 10 WUs, 9 error while computing
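
Put as failure rates, these two counts look like this. This is a small Python sketch; the `reports` dictionary just restates the numbers above and the helper name `failure_rate` is made up for the example.

```python
def failure_rate(errors, total):
    """Fraction of work units that ended in a computation error."""
    return errors / total


# (errors, total WUs) per driver, copied from the counts in this post.
reports = {"185.xx": (10, 10), "190.38": (9, 10)}

for driver, (errors, total) in reports.items():
    print(f"{driver}: {failure_rate(errors, total):.0%} failed")
```

With only ten work units per driver, the two rates are too close to say that one driver is better than the other on this card.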


sorry, bye bye

Bymark
Message 11510 - Posted: 30 Jul 2009 | 17:50:49 UTC - in response to Message 11508.
Last modified: 30 Jul 2009 | 18:07:00 UTC

Same here. My GPUs went to Collatz, with no errors yet. I'm waiting for a new Nvidia driver, and if it works then maybe I'll come back!

Rabinovitch
Joined: 25 Aug 08
Posts: 143
Credit: 64,937,578
RAC: 0
Message 11623 - Posted: 3 Aug 2009 | 6:22:45 UTC

100% of WUs exit with an error after 1 or 2 hours of processing. Win7 Ultimate x64, BOINC Manager 6.6.36-6.6.38 (I haven't tried 6.6.37 yet, and I doubt it will help), and of course the 190.38 nVidia driver. GTX 260 with 192 shaders.

p.s. SETI gpu Wus are being processed well (with only few errors).
____________
From Siberia with love!

MarkJ
Message 11719 - Posted: 8 Aug 2009 | 14:21:52 UTC

My GTX260's still seem to have a 50% to 100% failure rate for GPUgrid. Both cards are 216sp versions and are now running the GPUgrid 6.67 app with 190.38 drivers under XP. My other machines which have GTS250's seem quite happy running the 6.67 app with the 190.38 driver.

It seems to be G200 chip cards with the problem when used in conjunction with the 190.38 drivers. It also seems to be specific to GPUgrid as other projects with cuda apps appear to work.

I recall GDF mentioned they had optimised their FFT code, so perhaps that is an area for investigation. Maybe they could look at using the CUDA-supplied FFT libraries (unless there isn't an equivalent function) instead of the optimised code? I'm willing to run a few tests if that will help. In the meantime, like other G200 card owners, I will have to keep them occupied by running other CUDA work.

Rabinovitch
Message 11728 - Posted: 8 Aug 2009 | 18:16:26 UTC

I hope they will fix all the problems at last. I really like this project and I want to participate in it, but for now I can only crunch SETI and Collatz WUs, because there are almost no errors on these projects...

GDF
Message 11731 - Posted: 8 Aug 2009 | 19:21:47 UTC - in response to Message 11728.

We don't have any error on 190.xx. However, we cannot test on 260 because all our cards are 280, 275 or 8800GT.

gdf

MarkJ
Message 11733 - Posted: 9 Aug 2009 | 1:10:17 UTC - in response to Message 11731.



It's starting to sound like it's limited to the GTX 260 cards then, if other G200-based cards seem to work. Have there been any reports of other cards having similar failure rates?

rebirther
Joined: 7 Jul 07
Posts: 53
Credit: 3,048,781
RAC: 0
Message 11737 - Posted: 9 Aug 2009 | 7:07:44 UTC
Last modified: 9 Aug 2009 | 7:14:44 UTC

It looks like the failure rate on cards >= GTX 260 is increasing. I have no problems with Collatz WUs or with the older 182.50 driver. All WUs are crashing with exit code 1, not at the start but after 8h :(. I will test some SETI WUs. I cannot run this project anymore until this issue is solved. Too much wasted time...

poppageek
Joined: 4 Jul 09
Posts: 76
Credit: 114,610,402
RAC: 0
Message 11742 - Posted: 9 Aug 2009 | 8:28:24 UTC

I have a GTX 260 (192) that worked fine under 182.50 but errors 100% on any driver higher. This one now runs F@H.
I have a GTX 260 (216) that works perfectly with 190.38 on GPUGrid.

Bymark
Message 11754 - Posted: 9 Aug 2009 | 12:14:22 UTC - in response to Message 11742.



One of my 3 GTX 260 (216) cards won't work with 190.38 on GPUGrid; it worked fine under 182.50....

STE\/E
Message 11767 - Posted: 9 Aug 2009 | 20:37:02 UTC - in response to Message 11733.



I have 1 or Possibly 2 GTX 295's that do the same thing, re-set themselves to 300/Core & 400/Memory plus 4 GTX 260's that I know of for sure & possibly 1 or 2 more that do it too.

Some of them don't just do it at this Project either, I've had a few of them do it @ the Collatz Project running their CUDA Wu's. So it leads me to believe it's the Drivers because all the Cards I have that are acting up now ran without error until I Upgraded them to the 190.38 Drivers.

As stated by me and others, going back to the 186.xx or even 185.xx Drivers doesn't fix the Problem either once the Cards are infected with the Re-Setting bug ...

I've pulled all the NVIDIA Cards that I know Re-Set themselves & I'm re-testing them in a Box that had a GTX 280 & GTX 295 running without Problems, I'm going to run each Card in that Box to eliminate the PSU as the Cause of the Re-Setting. I figure if the PSU can run a GTX 280 & 295 without problems it should have the power to run a lone GTX 260 without PSU Problems.

ExtraTerrestrial Apes
Message 11769 - Posted: 9 Aug 2009 | 21:08:40 UTC - in response to Message 11767.

You're assuming that the errors you are seeing on your systems upon clock changes are caused by the clock changes, which may very well be the case. But you're also assuming that all 190.38 problems are related to this same cause, which I'm not so sure about. GDF said the errors (most?) with 190.38 happen in the FFT part, which doesn't mix well with assuming downclocking as the reason.

MrS

STE\/E
Message 11770 - Posted: 9 Aug 2009 | 21:18:03 UTC - in response to Message 11769.



It's a case of which came first, the Chicken or the egg; in this case it's the Error or the Re-set. In other words, did the Card Re-set itself & then the error occurred, or did the error occur & then the card re-set itself ???

I know several times I've seen a Grid Wu hung or not progressing; I'd check the clock settings with GPU-Z & they would be where they're supposed to be. But upon Stopping & Restarting BOINC to kick start the Wu again, within minutes if not seconds the Computation Error would occur & I'd check the clock settings again and they would be at 300/Core 400/Memory ...

frankhagen
Joined: 18 Sep 08
Posts: 65
Credit: 3,037,414
RAC: 0
Message 11776 - Posted: 10 Aug 2009 | 15:06:22 UTC - in response to Message 11770.

It's a case of which came first, the Chicken or the Egg; in this case it's the Error or the Reset. In other words, did the Card reset itself & then the error occurred, or did the error occur & then the Card reset itself?


now my old GTX260 got the flu too - over a year of crunching with very few errors, never seen it throttle down before. :((

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Message 11784 - Posted: 10 Aug 2009 | 19:58:24 UTC - in response to Message 11770.

It's a case of which came first, the Chicken or the Egg; in this case it's the Error or the Reset. In other words, did the Card reset itself & then the error occurred, or did the error occur & then the Card reset itself?


That's part of what I was thinking.

Could we say that, since after an error the card stays clocked down and subsequent WUs fail (if I remember correctly), the downclocking causes the error? I don't think so: it could also be that the driver detects no GPU activity (since the WUs fail) and therefore keeps it clocked down. For some reason this downclocking could be forced in newer drivers.

What if you change clocks manually during computation? Does that work? I know it did on my 9800GTX+ when I tried last time.

Can you set 2D clocks manually, maybe in the power profile? And see if you get an error. If not I'd say the downclocking is really "just" a symptom and not the cause.

MrS
____________
Scanning for our furry friends since Jan 2002

STE\/E
Joined: 18 Sep 08
Posts: 368
Credit: 2,640,096,048
RAC: 48,737,442
Level
Phe
Message 11789 - Posted: 10 Aug 2009 | 20:45:38 UTC - in response to Message 11784.
Last modified: 10 Aug 2009 | 20:57:04 UTC

Could we say that, since after an error the card stays clocked down and subsequent WUs fail (if I remember correctly), the downclocking causes the error? I don't think so: it could also be that the driver detects no GPU activity (since the WUs fail) and therefore keeps it clocked down. For some reason this downclocking could be forced in newer drivers.


Yes, once the Card/Cards clock down all subsequent Wu's will fail if not corrected, I have caught them clocked down though & rebooted & restarted BOINC & have had the Wu's finish successfully ...

Hmmmmmmmm, think I just answered my own question of which came first. Now that I think of it, I have found Cards clocked down still running the same Wu's they had been running for up to 20 Hours already when I found them. So in those cases anyway the clock down came first, but it didn't make the Wu error, it just made it run slower than Krap ...

But if I remember correctly, as soon as I rebooted those computers where I found the Card clocked down but the Wu still running, the Wu did error within a few minutes of restarting BOINC after the reboot, even though the Card was running at normal speed again.

What if you change clocks manually during computation? Does that work? I know it did on my 9800GTX+ when I tried last time.

Can you set 2D clocks manually, maybe in the power profile? And see if you get an error. If not I'd say the downclocking is really "just" a symptom and not the cause.

MrS


Yes I can reset the Clock to a Higher or lower speed while running the Wu's but so far that hasn't produced an error on a running Wu.


PS: LOL, Tech Support, you got to Love Um. Sapphire's response to my RMA Request after 24 hours: Of course my response will be addressed in the order it was received, so I assume there will be another 24 hour wait before I get another e-mail asking if I had the Big Black Cord from the Wall plugged into the back of the Computer in order for the Card to work ... ;)

Did you connect the card's 6- and 8-pin power connectors? The card requires those connectors to be connected in order for the card to function properly.


Now how would I have been able to use the card for the last month or so which I explained to them in my RMA Request if I hadn't hooked the 6 & 8 Pin Connectors to the Card ...

I had a EVGA RMA# in less than 30 Minutes this morning for 1 of the GTX 260 that clocks down, no questions asked either. I just told them the problem and 5 minutes later I had the RMA# ...

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Message 11791 - Posted: 10 Aug 2009 | 20:58:55 UTC - in response to Message 11789.

Maybe the poor guy wanted to make sure that by the term "using" you didn't refer to having replaced your old paper weight with some new shiny and very green thing ;)

MrS
____________
Scanning for our furry friends since Jan 2002

STE\/E
Joined: 18 Sep 08
Posts: 368
Credit: 2,640,096,048
RAC: 48,737,442
Level
Phe
Message 11792 - Posted: 10 Aug 2009 | 21:40:19 UTC - in response to Message 11791.

Maybe the poor guy wanted to make sure that by the term "using" you didn't refer to having replaced your old paper weight with some new shiny and very green thing ;)

MrS


I told him I knew where he lived and would be visiting him if he didn't give me a RMA#, he asked me to provide a copy of the purchase receipt ... :P

Profile Kokomiko
Joined: 18 Jul 08
Posts: 190
Credit: 24,093,690
RAC: 0
Level
Pro
Message 11794 - Posted: 10 Aug 2009 | 22:35:46 UTC - in response to Message 11792.


I told him I knew where he lived and would be visiting him if he didn't give me a RMA#, he asked me to provide a copy of the purchase receipt ... :P


Ahh, you have the same problem ... :(. The people from Galaxytech also asked me for a purchase receipt. I have a 3 year warranty and the card has been on the market for less than one year. Why do they now need proof in the form of a purchase receipt?

____________

STE\/E
Joined: 18 Sep 08
Posts: 368
Credit: 2,640,096,048
RAC: 48,737,442
Level
Phe
Message 11795 - Posted: 10 Aug 2009 | 22:54:07 UTC - in response to Message 11794.
Last modified: 10 Aug 2009 | 23:04:09 UTC


I told him I knew where he lived and would be visiting him if he didn't give me a RMA#, he asked me to provide a copy of the purchase receipt ... :P


Ahh, you have the same problem ... :(. The people from Galaxytech also asked me for a purchase receipt. I have a 3 year warranty and the card has been on the market for less than one year. Why do they now need proof in the form of a purchase receipt?


Usually the Warranty only applies to the Original Buyer of the Video Card/Cards, so they are just ensuring you're the Original Buyer by asking for a purchase receipt.

Some Companies like BFG have a Lifetime Warranty, but only for the Original Buyer; they don't want to be paying a 2nd or 3rd owner of the Card for Warranty Service ... ;)

PS: Sapphire has sent me a RMA Form which I've filled out & sent back to another place along with another proof of purchase receipt on the 4850 X2 ... Now I have to go thru the same thing on the RMA Request I turned in this morning on a 4870 Card that quit working too. Still haven't heard anything on that yet.

I'm sending the EVGA GTX 260 out tomorrow morning too on an RMA. Still testing 5 BFG Cards in another Box to make sure it wasn't just something wrong with the Box they came out of. So far all have given errors in the Test Box too. Probably ship them out too on an RMA later this week.

Profile Bymark
Joined: 23 Feb 09
Posts: 30
Credit: 5,897,921
RAC: 0
Level
Ser
Message 12723 - Posted: 24 Sep 2009 | 17:33:14 UTC

Do we have a solution yet to this issue with the 260 cards?
What has happened, if anything?

____________
"Silakka"
Hello from Turku > Åbo.

Profile Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Level
Val
Message 12731 - Posted: 25 Sep 2009 | 1:40:34 UTC - in response to Message 12723.

Do we have a solution yet to this issue with the 260 cards?
What has happened, if anything?

It seems to be card dependent because I have not had this issue with either of my two cards (yet) ...

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Level
Gly
Message 12734 - Posted: 25 Sep 2009 | 7:37:05 UTC - in response to Message 12731.

We contacted Nvidia AGAIN yesterday.

gdf
