Message boards : Graphics cards (GPUs) : Unspecified kernel launch failures
Author | Message |
---|---|
I've now had two work units error out in as many days with some variation on the above error message, one from GPUGRID and one from PrimeGrid, both a significant proportion of the way through the WU. | |
ID: 15217 | |
May I suggest you stick to one GPU project at a time. You may also want to upgrade that driver and then restart your system. Finish whatever tasks you are running first and select no more tasks. | |
ID: 15222 | |
190.53 is the latest stable driver. Both crashes were while I wasn't using the computer. | |
ID: 15223 | |
There is a fair chance your problems were caused by overclocking. Even if a system has run stably for ages, a new application or task can stress a card in different ways, and a change of a couple of degrees in room temperature can tip your configuration into instability (someone shut the room door, or it just got slightly warmer outside and inside). | |
ID: 15225 | |
Just had another PrimeGrid unit error out, after clocking down to stock settings. I've also disabled new GPUGRID units until this gets sorted one way or another (an error in an hour-long unit loses less credit than one in a day-long unit). | |
ID: 15228 | |
Down-clock the GPU (and any other overclocked part of the system) to reference speeds. Let us know if the problem persists thereafter. | |
ID: 15231 | |
The GPU is now running at reference settings (950MHz), yes, and nothing else has been overclocked. I've had one computation failure so far in about 12 hours since I did that, which is the same sort of rate I was experiencing before downclocking. | |
ID: 15233 | |
I am more convinced that the problems are related to running 2 projects. | |
ID: 15235 | |
As I said, I'm not accepting any new GPUGRID units, only PrimeGrid's shorter units. I haven't run a GPUGRID unit since the one that crashed. (I can, of course, turn off PrimeGrid's and turn on GPUGRID's, but since both are giving the same error and PrimeGrid's short units lose me less credit when they crash, I'd prefer to keep it this way around.) | |
ID: 15236 | |
You were correct to disable GPUGrid tasks and stick to the shorter tasks, at least to see what happens. However, you are still getting the odd error. So you have established that the error is not limited to when two different projects are running, but you have not found the cause of the errors. I know it made sense to run shorter tasks (because any error will have less impact on your contribution and points). However, if you were running both projects at the same time and got GPUGrid errors, those errors could have been the result of a bug in the PrimeGrid project (for example when switching between tasks, though that is speculation of course). | |
ID: 15239 | |
The GPUGRID task was the only GPU WU on the system when it crashed; according to the message log, PrimeGrid's tasks were only fetched after that one errored out. I'm pretty sure we can rule out project-specific bugs. (I'm asking here rather than at PrimeGrid because, given the error isn't limited to either project, it makes sense to ask the heavily GPU-centric project rather than the one with a single GPU app.) | |
ID: 15240 | |
Try to lower your shader clock to below 1600 and see if it works. | |
ID: 15263 | |
I've had no errors since switching to the beta drivers, and I've completed both PrimeGrid and GPUGRID units. Given a rate of about 1 error every 12 hours before the driver switch, the chance of having no errors in the 48 hours or so I've been running GPU units since switching drivers is about 2%, so I'm pretty confident this is sorted. Thanks for the help, folks. | |
ID: 15294 | |
Glad to hear your problem seems to be solved! | |
ID: 15295 | |
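
The "about 2%" figure quoted in the thread can be checked with a short calculation. This is only a sketch, and it rests on an assumption the thread does not state: that errors arrive independently at a constant average rate (a Poisson process). Under that assumption, the chance of seeing zero errors is exp(-expected errors). The function name `prob_no_errors` is made up for this illustration.

```python
import math


def prob_no_errors(mean_hours_between_errors: float, hours_observed: float) -> float:
    """Probability of zero errors in the observation window.

    Assumes errors follow a Poisson process (independent, constant rate),
    which is an assumption of this sketch, not something the thread states.
    """
    expected_errors = hours_observed / mean_hours_between_errors
    return math.exp(-expected_errors)


# Figures from the thread: roughly 1 error per 12 hours, 48 error-free hours.
p = prob_no_errors(12.0, 48.0)
print(f"{p:.1%}")  # prints 1.8%
```

With 48 hours at one error per 12 hours, four errors would be expected, and exp(-4) is roughly 0.018, which matches the poster's "about 2%".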