Message boards : Graphics cards (GPUs) : Failures since upgrading to 190.38
Author | Message |
---|---|
Had 6 WUs fail today since the machine was upgraded to 190.38. Interestingly, it's the only machine of the 5 running GPUgrid that seems to be having the problem. The machine is a Win XP box with dual GTX260's in it. Links to the wu: | |
ID: 11332 | Rating: 0 | rate: / Reply Quote | |
I checked both the Boxes I mentioned in the other Thread and they were both running at 1/2 speed, which I know from past experience leads to continual errors until fixed. | |
ID: 11337 | Rating: 0 | rate: / Reply Quote | |
Poorboy, read my post. "Link to prevent Nvidia 200 Downclocking" I believe that could be the problem. The 200 goes into power saving mode by design. | |
ID: 11338 | Rating: 0 | rate: / Reply Quote | |
Poorboy, read my post. "Link to prevent Nvidia 200 Downclocking" I believe that could be the problem. The 200 goes into power saving mode by design. I thought my 1st 200 series card was bad also, but it was not. 3D performance mode has to be forced via software; RivaTuner is what I use. I checked both the Boxes I mentioned in the other Thread and they were both running at 1/2 speed, which I know from past experience leads to continual errors until fixed. | |
ID: 11339 | Rating: 0 | rate: / Reply Quote | |
Joining this thread: | |
ID: 11342 | Rating: 0 | rate: / Reply Quote | |
Poorboy, read my post. "Link to prevent Nvidia 200 Downclocking" I believe that could be the problem. The 200 goes into power saving mode by design. Yes, I was just about to see if I could get that to work. Somebody sent me the Link a few days ago because I was having the same problem on another 200 Series Card. I didn't try it then because the Card was already sent out for RMA'ing & I should have a different Card in the next 2 days. I'll Post later today or tomorrow whether that works for me, if I can get it set up right. Thanks. PS: So far this doesn't seem to be working. I think I'm doing everything okay, but upon Reboot the Settings don't hold. 2 Cores will just read 0 Speed & the 3rd Core on the Box I'm trying it on just defaults back to stock Speeds. EVGA Precision Tune & GPU-Z show the same 0 Speed for 2 Cores & Stock Speed for 1 Core, so about all I've managed to do so far is lose 3 more Cores. If it's going to take all this jumping through hoops to run the new CUDA Apps, I'm afraid the Grid Project will lose a lot of Participants, especially with the new CUDA Projects starting up. I've lost 7 Cores already with the Upgrade to the new Drivers, supposedly to be able to run the new CUDA Apps, and don't feel I can afford to lose any more. | |
ID: 11349 | Rating: 0 | rate: / Reply Quote | |
hostid=35303 | |
ID: 11350 | Rating: 0 | rate: / Reply Quote | |
hostid=35303 Already tried that and within a few hours both Boxes had Trashed 4 more WUs each. | |
ID: 11351 | Rating: 0 | rate: / Reply Quote | |
I've installed 190.38 on 2 dual gpu rigs. | |
ID: 11353 | Rating: 0 | rate: / Reply Quote | |
I've switched back to RT, forced the driver and forced 3D performance. It has been running for a couple of days now error free. I tried Mark's Fix on 4 Boxes just a few hours ago, so I probably won't know if it worked until the morning. If the Speeds of the Cards don't drop back by then, at least they will have held longer than they have been. Usually within a few hours the WUs will error because of the Speed Drop. I didn't do the Fix on 3 other Boxes because so far I haven't been having any problems with them & it seems any time I make a Settings change that requires a Reboot the WUs get Trashed when BOINC restarts after the Reboot, so I didn't want to Trash any more WUs today than I already had. | |
ID: 11355 | Rating: 0 | rate: / Reply Quote | |
hostid=35303 Nope, didn't help... I get errors in 2-3 hours. <core_client_version>6.4.7</core_client_version> <![CDATA[ <message> The system cannot find the path specified. (0x3) - exit code 3 (0x3) </message> <stderr_txt> # Using CUDA device 0 # Device 0: "GeForce GTX 260" # Clock rate: 1242000 kilohertz # Total amount of global memory: 939196416 bytes # Number of multiprocessors: 27 # Number of cores: 216 MDIO ERROR: cannot open file "restart.coor" # Using CUDA device 0 # Device 0: "GeForce GTX 260" # Clock rate: 1242000 kilohertz # Total amount of global memory: 939196416 bytes # Number of multiprocessors: 27 # Number of cores: 216 Cuda error in file '..\cuda/cutil.h' in line 968 : unspecified launch failure. Memory usage: host: bytes device: bytes Assertion failed: 0, file ..\cuda/cutil.h, line 968 This application has requested the Runtime to terminate it in an unusual way. Please contact the application's support team for more information. </stderr_txt> ]]> ____________ "Silakka" Hello from Turku > Åbo. | |
ID: 11357 | Rating: 0 | rate: / Reply Quote | |
hostid=35303 | |
ID: 11408 | Rating: 0 | rate: / Reply Quote | |
hostid=35303 I don't think I'll finish My BOINC Career with SETI, but the way my Cards keep dropping like flies it probably won't be with GPUGrid either. Had 6 Cards down already for errors & found 2 more this afternoon that for all practical purposes may as well be down. It's a Dual GTX 275 Setup that has turned in only 3 WUs in the last 50 Hours; it's not turning in errors, but it's not really turning in anything because it's slowed to a crawl, I guess. BFG's not going to say oh sure, send us the 8 Cards you can't crunch with anymore and we'll send you 8 shiny new ones, so I figure I'm pretty much stuck with them for better or worse. | |
ID: 11409 | Rating: 0 | rate: / Reply Quote | |
Are you all uninstalling the old Nvidia driver first from Add/Remove Programs, then uninstalling PhysX, and then running "Driver Sweeper" in safe mode after reboot to remove all the old remnants before updating the Nvidia drivers? | |
ID: 11410 | Rating: 0 | rate: / Reply Quote | |
Roll back to 185.xx, there seem to be problems with 190.xx over some hardware. | |
ID: 11422 | Rating: 0 | rate: / Reply Quote | |
Tried that yesterday, same result. Hostid=35303 is now on NNW. Roll back to 185.xx, there seem to be problems with 190.xx over some hardware. ____________ "Silakka" Hello from Turku > Åbo. | |
ID: 11431 | Rating: 0 | rate: / Reply Quote | |
Roll back to 185.xx, there seem to be problems with 190.xx over some hardware. I tried that on 4 Cards & still got the Errors and Down-clocking with them, so I re-installed the 190.38 Drivers & am Processing the Collatz WUs just fine with no errors or Down-clocking; I even re-overclocked them and they still ran fine. I'll run that for a while and keep an eye on the Forum here for a real fix with the 190.38's, or try a new Driver Version or Client as they come out & see if that fixes the Cards that went South on the Grid Project. | |
ID: 11436 | Rating: 0 | rate: / Reply Quote | |
Am I right that there's not a single reported failure with G9x cards, only G200 are affected? But some of them still run fine with 190.xx? | |
ID: 11482 | Rating: 0 | rate: / Reply Quote | |
My 9800GTX+ has been ok with 190.38 - no failures or blips of any kind. | |
ID: 11483 | Rating: 0 | rate: / Reply Quote | |
Question, possibly for PoorBoy. The GTX 260 cards - are they the Core 216 version? I'm having the same problems as everyone else getting GPUGRID wu to run on this card (XP Home 32-bit, Q6600, stock everything). I've tried to roll back to previous versions of the driver (currently running 185.XX) with no positive results. I'm not showing a downclocking problem using GPU-Z. | |
ID: 11484 | Rating: 0 | rate: / Reply Quote | |
Question, possibly for PoorBoy. The GTX 260 cards - are they the Core 216 version? I'm having the same problems as everyone else getting GPUGRID wu to run on this card (XP Home 32-bit, Q6600, stock everything). I've tried to roll back to previous versions of the driver (currently running 185.XX) with no positive results. I'm not showing a downclocking problem using GPU-Z. Yes, the 260's I have are all 216 Shader Versions. Once they started Down-clocking I couldn't stop them from doing that no matter what I did. I tried the Down-clocking Fix, going back to the 185.18's, Re-installing the 190.38's & different BOINC Clients. I even set them all back to their Default running Speeds but nothing worked. I'd reset them to their Default running speeds and within as little as a few minutes some of them would drop to half speed and start giving errors on the WUs after that. I've been running the Collatz Project for almost 2 days now with the same BOINC Client & NVIDIA Drivers (6.6.36 & 190.38) without a single error, and not 1 NVIDIA Card has Dropped its Speed, even after re-Overclocking them to run the Collatz WUs. So all I can assume is that something about the way the Grid WUs run makes the cards Down-clock as fast as I can reset them to their original speeds. | |
ID: 11487 | Rating: 0 | rate: / Reply Quote | |
Am I right that there's not a single reported failure with G9x cards, only G200 are affected? But some of them still run fine with 190.xx? Just my GTX 260 216 Shader Versions were affected by the Down-clocking bug, but all my cards (260's, 275's, 280's & 295's) gave errors. The 260's seemed to give more than the rest though. | |
ID: 11488 | Rating: 0 | rate: / Reply Quote | |
PoorBoy, | |
ID: 11490 | Rating: 0 | rate: / Reply Quote | |
PoorBoy, Your Clock values look like the Default Values for most 260's, unless they're the OC Type right from the Factory, in which case they would be a little higher. What my cards would do (not all, but some of them), after I had made sure with GPU-Z that they were indeed running at the Default Values, is drop to 300 Core & 400 Memory after some running time. They weren't at idle either, because the Grid WUs would be running & showing Progression. Once they dropped to 1/2 speed the errors would follow soon after. The only way I found to get the Speed back to Defaults is to Re-Boot the Computer affected by the 1/2 Speed GPU. Sometimes it would only take a few minutes before they would drop their speed and other times it would take an hour or two. Like I said, I've been running Collatz with no problems and run all my cards at 650-Core 1475-Shaders 1100-Memory. Some guys run them higher than that, but those are the speeds I've found to be the most stable for me with no Hang-ups or errors, so that's what I run them at. Collatz is down now & I have no work from them, but I'm very reluctant to re-start the Grid Project and go through all the headaches I went through for 3 days earlier, so I haven't. I've just been letting the NVIDIA cards sit for now & hoping the Collatz Project comes back up soon. | |
ID: 11501 | Rating: 0 | rate: / Reply Quote | |
GTX 260 - 216 SP - Xp SP3 | |
ID: 11508 | Rating: 0 | rate: / Reply Quote | |
Same here, my GPUs went to Collatz with no errors yet. I'm waiting for a new Nvidia driver, and if it works then maybe I'm coming back! | |
ID: 11510 | Rating: 0 | rate: / Reply Quote | |
100% of WUs exit with an error after 1 or 2 hours of processing. Win7 Ultimate x64, BM 6.6.36-6.6.38 (I have not tried 6.6.37 yet, and I have doubts it will help), and the 190.38 nVidia driver, of course. GTX260 with 192 shader blocks. | |
ID: 11623 | Rating: 0 | rate: / Reply Quote | |
My GTX260's still seem to have a 50% to 100% failure rate for GPUgrid. Both cards are 216sp versions and are now running the GPUgrid 6.67 app with 190.38 drivers under XP. My other machines which have GTS250's seem quite happy running the 6.67 app with the 190.38 driver. | |
ID: 11719 | Rating: 0 | rate: / Reply Quote | |
I hope they will fix all the problems at last. I really like this project and I want to participate in it, but for now I can only crunch WUs for SETI and Collatz, because there are almost no errors on those projects... | |
ID: 11728 | Rating: 0 | rate: / Reply Quote | |
We don't have any error on 190.xx. However, we cannot test on 260 because all our cards are 280, 275 or 8800GT. | |
ID: 11731 | Rating: 0 | rate: / Reply Quote | |
We don't have any error on 190.xx. However, we cannot test on 260 because all our cards are 280, 275 or 8800GT. It's starting to sound like it's limited to the GTX260 cards then, if other G200 based cards seem to work. Have there been any reports of other cards having similar failure rates? ____________ BOINC blog | |
ID: 11733 | Rating: 0 | rate: / Reply Quote | |
It looks like the failure rate on >=GTX260 is increasing. I have no problems with Collatz WUs or with the older 182.50 driver. All WUs crash with exit code 1, not at the start but after 8h :(. I will test some SETI WUs. I cannot run this project anymore until this issue is solved. Too much wasted time... | |
ID: 11737 | Rating: 0 | rate: / Reply Quote | |
I have a GTX 260 (192) that worked fine under 182.50 but errors 100% on any driver higher. This one now runs F@H. | |
ID: 11742 | Rating: 0 | rate: / Reply Quote | |
I have a GTX 260 (192) that worked fine under 182.50 but errors 100% on any driver higher. This one now runs F@H. One of my 3 GTX 260s (216) won't work with 190.38 on GPUGrid; it worked fine under 182.50.... ____________ "Silakka" Hello from Turku > Åbo. | |
ID: 11754 | Rating: 0 | rate: / Reply Quote | |
We don't have any error on 190.xx. However, we cannot test on 260 because all our cards are 280, 275 or 8800GT. I have 1 or possibly 2 GTX 295's that do the same thing, re-setting themselves to 300/Core & 400/Memory, plus 4 GTX 260's that I know of for sure & possibly 1 or 2 more that do it too. Some of them don't just do it at this Project either; I've had a few of them do it @ the Collatz Project running their CUDA WUs. So it leads me to believe it's the Drivers, because all the Cards I have that are acting up now ran without error until I Upgraded them to the 190.38 Drivers. As stated by me and others, going back to the 186.xx or even 185.xx Drivers doesn't fix the Problem once the Cards are infected with the Re-Setting bug ... I've pulled all the NVIDIA Cards that I know Re-Set themselves & I'm re-testing them in a Box that had a GTX 280 & GTX 295 running without Problems. I'm going to run each Card in that Box to eliminate the PSU as the Cause of the Re-Setting. I figure if the PSU can run a GTX 280 & 295 without problems, it should have the power to run a lone GTX 260. | |
ID: 11767 | Rating: 0 | rate: / Reply Quote | |
You're assuming that the errors you are seeing on your systems upon clock changes are caused by the clock changes, which may very well be the case. But you're also assuming that all 190.38 problems have this same cause, which I'm not so sure about. GDF said the errors (most?) with 190.38 happen in the FFT part, which doesn't fit well with downclocking being the reason. | |
ID: 11769 | Rating: 0 | rate: / Reply Quote | |
You're assuming that the errors you are seeing on your systems upon clock changes are caused by the clock changes, which may very well be the case. But you're also assuming that all 190.38 problems have this same cause, which I'm not so sure about. GDF said the errors (most?) with 190.38 happen in the FFT part, which doesn't fit well with downclocking being the reason. It's a case of which came first, the Chicken or the egg; in this case it's the Error or the Re-set. In other words, did the Card Re-set itself & then the error occurred, or did the error occur & then the card re-set itself ??? I know several times I've seen a Grid WU hung or not progressing; I'd check the clock settings with GPU-Z & they would be where they're supposed to be. But upon Stopping & Restarting BOINC to kick-start the WU again, within minutes if not seconds the Computation Error would occur & I'd check the clock settings again and they would be at 300/Core 400/Memory ... | |
ID: 11770 | Rating: 0 | rate: / Reply Quote | |
It's a case of which came first, the Chicken or the egg; in this case it's the Error or the Re-set. In other words, did the Card Re-set itself & then the error occurred, or did the error occur & then the card re-set itself ??? Now my old GTX260 has got the flu too - over a year of crunching with very few errors, and I've never seen it throttle down before. :(( | |
ID: 11776 | Rating: 0 | rate: / Reply Quote | |
It's a case of which came first, the Chicken or the egg; in this case it's the Error or the Re-set. In other words, did the Card Re-set itself & then the error occurred, or did the error occur & then the card re-set itself ??? That's part of what I was thinking. Could we say that, since after an error the card stays clocked down and subsequent WUs fail (if I remember correctly), the downclocking causes the error? I don't think so: it could also be that the driver detects no GPU activity (since the WUs fail) and therefore keeps it clocked down. For some reason this downclocking could be forced in newer drivers. What if you change clocks manually during computation? Does that work? I know it did on my 9800GTX+ when I tried last time. Can you set 2D clocks manually, maybe in the power profile, and see if you get an error? If not, I'd say the downclocking is really "just" a symptom and not the cause. MrS ____________ Scanning for our furry friends since Jan 2002 | |
ID: 11784 | Rating: 0 | rate: / Reply Quote | |
Could we say that, since after an error the card stays clocked down and subsequent WUs fail (if I remember correctly), the downclocking causes the error? I don't think so: it could also be that the driver detects no GPU activity (since the WUs fail) and therefore keeps it clocked down. Due to some reason this downclocking could be forced in newer drivers. Yes, once the Card/Cards clock down all subsequent WUs will fail if not corrected. I have caught them clocked down, though, & rebooted & restarted BOINC & have had the WUs finish successfully ... Hmmmmmmmm, think I just answered my own question of which came first: now that I think of it, I have found Cards clocked down still running the same WUs they had been running for up to 20 Hours already when I found them. So in those cases anyway the clock down came first but didn't make the WU error, just run slower than Krap ... But if I remember correctly, as soon as I rebooted those computers where I found the card clocked down but the WU still running, the WU did error within a few minutes of restarting BOINC after the reboot, even though the Card was running at normal speed again. What if you change clocks manually during computation? Does that work? I know it did on my 9800GTX+ when I tried last time. Yes, I can reset the Clock to a Higher or lower speed while running the WUs, but so far that hasn't produced an error on a running WU. PS: LOL, Tech Support, you got to Love Um. Sapphire's response to my RMA Request after 24 hours: "Did you connect the card's 6 and 8 pin power connectors? The card requires those connectors to be connected in order for the card to function properly." Now how would I have been able to use the card for the last month or so, which I explained to them in my RMA Request, if I hadn't hooked the 6 & 8 Pin Connectors to the Card ... Of course my response will be addressed in the order it was received, so I assume there will be another 24 hour wait before I get another e-mail asking if I had the Big Black Cord from the Wall plugged into the back of the Computer in order for the Card to work ... ;) I had an EVGA RMA# in less than 30 Minutes this morning for 1 of the GTX 260's that clocks down, no questions asked either. I just told them the problem and 5 minutes later I had the RMA# ... | |
ID: 11789 | Rating: 0 | rate: / Reply Quote | |
Maybe the poor guy wanted to make sure that by the term "using" you didn't refer to having replaced your old paper weight with some new shiny and very green thing ;) | |
ID: 11791 | Rating: 0 | rate: / Reply Quote | |
Maybe the poor guy wanted to make sure that by the term "using" you didn't refer to having replaced your old paper weight with some new shiny and very green thing ;) I told him I knew where he lived and would be visiting him if he didn't give me a RMA#, he asked me to provide a copy of the purchase receipt ... :P | |
ID: 11792 | Rating: 0 | rate: / Reply Quote | |
Ahh, you have the same problem ... :(. The people from Galaxytech also asked me for a purchase receipt. I have a 3 year warranty and the card has been on the market for less than one year. Why do they need proof of purchase now? | |
ID: 11794 | Rating: 0 | rate: / Reply Quote | |
Usually the Warranty only applies to the Original Buyer of the Video Card/Cards, so they are just ensuring you're the Original Buyer by asking for a purchase receipt. Some Companies like BFG have a Lifetime Warranty, but only for the Original Buyer; they don't want to be paying a 2nd or 3rd owner of the Card for Warranty Service ... ;) PS: Sapphire has sent me an RMA Form, which I've filled out & sent back to another place along with another proof of purchase receipt on the 4850 X2 ... Now I have to go through the same thing on the RMA Request I turned in this morning on a 4870 Card that quit working too. Still haven't heard anything on that yet. I'm Sending the EVGA GTX 260 out tomorrow morning too on an RMA. Still testing 5 BFG Cards in another Box to make sure it wasn't just something wrong with the Box they came out of. So far all have given errors in the Test Box too. I'll probably ship them out on an RMA too later this week. | |
ID: 11795 | Rating: 0 | rate: / Reply Quote | |
Do we have some solution yet to this 260 cards issue? | |
ID: 12723 | Rating: 0 | rate: / Reply Quote | |
Do we have some solution yet to this 260 cards issue? It seems to be card dependent because I have not had this issue with either of my two cards (yet) ... | |
ID: 12731 | Rating: 0 | rate: / Reply Quote | |
We contacted Nvidia AGAIN yesterday. | |
ID: 12734 | Rating: 0 | rate: / Reply Quote | |
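
For anyone comparing failed tasks across hosts, the stderr output posted earlier in this thread (device name, "Clock rate" line, "Cuda error" line) can be pulled apart with a short script. This is only a rough sketch: the function names `summarize_stderr` and `looks_downclocked` are made up for illustration, the 90% tolerance is an arbitrary assumption, and whether the clock rate the application prints actually changes when a card drops to its low-power clocks is not confirmed anywhere in this thread.

```python
import re

# Sample lines copied from the stderr_txt block posted in this thread.
SAMPLE = r'''# Using CUDA device 0
# Device 0: "GeForce GTX 260"
# Clock rate: 1242000 kilohertz
# Number of cores: 216
Cuda error in file '..\cuda/cutil.h' in line 968 : unspecified launch failure.
'''

def summarize_stderr(text):
    """Extract device name, reported clock (MHz) and CUDA errors from stderr text."""
    device = re.search(r'# Device \d+: "([^"]+)"', text)
    clock = re.search(r'# Clock rate: (\d+) kilohertz', text)
    # Each error becomes a (file, line, message) tuple.
    errors = [
        (path, int(line), msg.strip())
        for path, line, msg in re.findall(
            r"Cuda error in file '([^']+)' in line (\d+) : ([^.]+)\.", text
        )
    ]
    return {
        "device": device.group(1) if device else None,
        "clock_mhz": int(clock.group(1)) / 1000 if clock else None,
        "errors": errors,
    }

def looks_downclocked(clock_mhz, expected_mhz, tolerance=0.9):
    """Hypothetical check: flag a clock below `tolerance` times the expected value.

    `expected_mhz` must come from the user (e.g. what GPU-Z shows at full speed).
    """
    return clock_mhz < tolerance * expected_mhz

if __name__ == "__main__":
    info = summarize_stderr(SAMPLE)
    print(info)
    print(looks_downclocked(info["clock_mhz"], expected_mhz=1242.0))
```

Run against the stderr text of several failed tasks, this would at least show whether the errors line up with a lower reported clock or occur at full speed, which is the chicken-and-egg question discussed above.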