Message boards : Number crunching : failing work units?
Author | Message |
---|---|
I have 2 1700x systems running the same version of Mint 19, and the same NVIDIA driver 390.77. Both are current in terms of updates. | |
ID: 51412 | Rating: 0 | |
The error the 960 is reporting is generally associated with overclocking. Have you tried reducing the clocks on the GPU and its memory? | |
ID: 51413 | Rating: 0 | |
Thanks for responding. | |
ID: 51414 | Rating: 0 | |
Does the B450 have the latest BIOS? I don't have a B450 board so can't comment on any issues with it. It is a relatively new chipset (kudos to AMD for the new Zen platform), but I have seen big improvements on the B350 with new BIOS releases. | |
ID: 51415 | Rating: 0 | |
It is probably the factory overclock. I have seen it on an EVGA GTX 970, though not as severely as this one. | |
ID: 51416 | Rating: 0 | |
I tried a GTX-950 stock and got the same error, this card has done many work units in Windows 7 and Linux without issue in other systems. | |
ID: 51428 | Rating: 0 | |
"I tried a GTX-950 stock and got the same error, this card has done many work units in Windows 7 and Linux without issue in other systems." It has been shown many times that the GPUGrid app under Linux or Windows XP (especially with SWAN_SYNC applied) stresses the GPU more than any other project's app. That means you should use GPUGrid under Linux or Windows XP to validate the stability of your system, not other projects' apps under other OSes (Windows Vista and up). "Swapped in a newer more powerful power supply." That's nice. "Re-installed Mint 19.1, fully updated." That's very precautionary. "Updated the BIOS to the latest." That's OK (which BIOS, by the way? The card's? The MB's? Both?). "Still fails 100% of the time." Oh, that's because you forgot to heed the advice you've been given before: reduce the clock speed of your GPU and/or increase its fan speed (to reduce its temperature). It would be nice to know the error message of your tasks. The previous ones (on your GTX 960) failed with "# The simulation has become unstable. Terminating to avoid lock-up (1)", which is a clear sign that the GPU clocks are too high for the given temperature. The other cause of such errors is faulty memory on the card (warped card, broken soldering of the chips), but some workunits can go very long before they fail, which suggests the first explanation. "Did some Einstein and Milkyway with 0 issues." If you read my post carefully, you should know by now that it doesn't matter. | |
ID: 51429 | Rating: 0 | |
GPUGrid is actually not that hard on cards compared to other projects. Just watching the auto boost of cards shows this. | |
ID: 51430 | Rating: 0 | |
"GPUGrid is actually not that hard on cards compared to other projects. Just watching the auto boost of cards shows this." I can assure you that Zoltan is correct. I run (or have run) all the GPU projects, and know the difference. | |
ID: 51431 | Rating: 0 | |
It would be interesting to know if the 960 will run at any power output. | |
ID: 51432 | Rating: 0 | |
OK I have it running at 80W now, we will see what happens. Just looking occasionally in nvidia-smi I have not seen it over 71W. Temperature is steady at 69C. | |
ID: 51433 | Rating: 0 | |
It ran 3 small work units successfully at 80W, I pushed it up to 100W and will see what happens. | |
ID: 51434 | Rating: 0 | |
This test demonstrates that the clock on the 960 is too high for GPUgrid tasks. | |
ID: 51435 | Rating: 0 | |
Just noticed you have more failures, presumably when you upped the power to 100W. | |
ID: 51436 | Rating: 0 | |
"Not really sure why only your B450 motherboard is exhibiting this behaviour. Does anyone else reading this post have any experience with AMD B450 (AM4 socket) motherboards and GPUgrid tasks? I know that the B350 boards are fine." This is undoubtedly related to the GPU, not the MB. I'm running 13x 1060 and 2x 1050Ti GPUs on 5x Ryzen 7 systems (2x X370 and 3x X470 MBs) with nary a hiccup (3 GPUs per system). This still leaves 13 threads per machine for running CPU projects at max speed. | |
ID: 51439 | Rating: 0 | |
"Not really sure why only your B450 motherboard is exhibiting this behaviour. Does anyone else reading this post have any experience with AMD B450 (AM4 socket) motherboards and GPUgrid tasks? I know that the B350 boards are fine." Maybe, but as discussed above I can put this GPU in another system and it will work, it even works in this system in Windows. And I moved a GTX-950 that was working fine in another system and experienced the same errors here. I am not sure we really understand what is happening here. But it finished a Long work unit with no error at 90W. | |
ID: 51443 | Rating: 0 | |
The runtime for the long task was good. According to the chart on the Performance tab, your 960 is only 24 minutes slower than the fastest time for this card. | |
ID: 51444 | Rating: 0 | |
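
For anyone wanting to repeat the power-cap test from this thread (80W, then 90W and 100W, watching power and temperature in nvidia-smi), here is a rough Python sketch. It assumes `nvidia-smi` is on the PATH, that the card is GPU index 0, and that the card reports power draw as a number (some cards print `[N/A]` instead, which this sketch does not handle). The function names are made up for illustration; only the `nvidia-smi` options `-i`, `-pl`, `--query-gpu` and `--format` are real. Setting the limit normally needs root.

```python
import subprocess

# Fields requested from nvidia-smi: power draw (W), GPU temperature (C), SM clock (MHz).
QUERY_FIELDS = "power.draw,temperature.gpu,clocks.sm"


def parse_gpu_sample(csv_line):
    """Parse one line of `--format=csv,noheader,nounits` output for QUERY_FIELDS."""
    power_w, temp_c, sm_mhz = (field.strip() for field in csv_line.split(","))
    return {"power_w": float(power_w), "temp_c": int(temp_c), "sm_mhz": int(sm_mhz)}


def read_gpu_sample(gpu_index=0):
    """Ask nvidia-smi for the current power, temperature and SM clock of one GPU."""
    result = subprocess.run(
        ["nvidia-smi", "-i", str(gpu_index),
         f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_sample(result.stdout.strip().splitlines()[0])


def set_power_limit(watts, gpu_index=0):
    """Cap the board power, like `nvidia-smi -i 0 -pl 80` (run as root)."""
    subprocess.run(["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)], check=True)
```

Calling `set_power_limit(80)` once and then `read_gpu_sample()` every few seconds while a work unit runs gives a log of power, temperature and clock that can be compared between the runs that finish and the runs that fail.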