Message boards : Number crunching : Unit crash after 0 second
Author | Message |
---|---|
Unit crash after 0 second, | |
ID: 51629 | Rating: 0 | |
Both of your cards are pretty old; they may not be capable of working with these WUs. Have you completed any work units? | |
ID: 51630 | Rating: 0 | |
Unit crash after 0 second,
The error message at the end of the stderr.txt is:
SWAN : FATAL Unable to load module .nonbonded.cu. (300)
It means that the Quadro K5000 is too old for this project, as it is only Compute Capability 3.0. Your Titan should work fine, but it gets very hot (83°C): (Task 20673822)
<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -52 (0xffffffcc)</message>
<stderr_txt>
# GPU [GeForce GTX TITAN] Platform [Windows] Rev [3212] VERSION [80]
# SWAN Device 0 :
# Name : GeForce GTX TITAN
# ECC : Disabled
# Global mem : 6144MB
# Capability : 3.5
# PCI ID : 0000:28:00.0
# Device clock : 928MHz
# Memory clock : 3004MHz
# Memory width : 384bit
# Driver version : r419_29 : 41935
# GPU 0 : 81C
# GPU 1 : 70C
# GPU 0 : 82C
# GPU 0 : 83C
# GPU [Quadro K5000] Platform [Windows] Rev [3212] VERSION [80]
# SWAN Device 1 :
# Name : Quadro K5000
# ECC : Disabled
# Global mem : 4096MB
# Capability : 3.0
# PCI ID : 0000:0F:00.0
# Device clock : 705MHz
# Memory clock : 2700MHz
# Memory width : 256bit
# Driver version : r419_29 : 41935
SWAN : FATAL Unable to load module .nonbonded.cu. (300)
</stderr_txt>
]]>
I don't see any other task assigned to your Titan, so you've probably excluded it from getting GPUGrid work by mistake (you should exclude the Quadro K5000 instead). | |
ID: 51631 | Rating: 0 | |
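For anyone who wants to check a stderr.txt like the one quoted above without reading it line by line, here is a minimal Python sketch. It assumes the `# SWAN Device N :`, `# Name :` and `# Capability :` lines always look the way they do in this log. The function names and the minimum capability of 3.5 are my own assumptions for illustration (the thread only says that 3.0 is too old and that the 3.5 Titan should work), not values published by the project.

```python
import re

# Sample lines copied from the stderr output quoted in this thread.
STDERR = """\
# SWAN Device 0 :
# Name : GeForce GTX TITAN
# Capability : 3.5
# GPU 0 : 83C
# SWAN Device 1 :
# Name : Quadro K5000
# Capability : 3.0
SWAN : FATAL Unable to load module .nonbonded.cu. (300)
"""

# Assumption for illustration only: treat anything below 3.5 as too old.
MIN_CAPABILITY = (3, 5)


def parse_devices(stderr_text):
    """Return {device_number: {"name": str, "capability": (major, minor)}}."""
    devices = {}
    current = None
    for line in stderr_text.splitlines():
        m = re.match(r"#\s*SWAN Device (\d+)\s*:", line)
        if m:
            current = int(m.group(1))
            devices[current] = {}
            continue
        if current is None:
            continue
        m = re.match(r"#\s*Name\s*:\s*(.+)", line)
        if m:
            devices[current]["name"] = m.group(1).strip()
        m = re.match(r"#\s*Capability\s*:\s*(\d+)\.(\d+)", line)
        if m:
            devices[current]["capability"] = (int(m.group(1)), int(m.group(2)))
    return devices


def too_old(devices, min_capability):
    """Return the sorted device numbers whose capability is below the minimum."""
    return sorted(
        num for num, info in devices.items()
        if info.get("capability", (0, 0)) < min_capability
    )


devices = parse_devices(STDERR)
for num in too_old(devices, MIN_CAPABILITY):
    print(f"Device {num} ({devices[num]['name']}) is below the assumed minimum")
```

Capabilities are compared as (major, minor) tuples rather than floats, so a version such as 3.10 would not be mistaken for 3.1.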
How do I exclude a card in BOINC? | |
ID: 51632 | Rating: 0 | |
My setup is now too old for your project, but I can still run all the others. | |
ID: 51633 | Rating: 0 | |
How do I exclude a card in BOINC?
The format in cc_config.xml is:
<exclude_gpu>
   <url>project_URL</url>
   <device_num>N</device_num>
   <type>NVIDIA|ATI|intel_gpu</type>
   <app>appname</app>
</exclude_gpu>
The type element is needed if you have GPUs from more than one manufacturer, for example an Intel iGPU plus an NVIDIA card.
https://boinc.berkeley.edu/wiki/Client_configuration | |
ID: 51678 | Rating: 0 | |
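If you would rather generate the file than type it by hand, the following Python sketch builds a complete cc_config.xml with one `exclude_gpu` entry. According to the wiki page linked above, `exclude_gpu` sits inside `<options>` within `<cc_config>`. The function name `build_cc_config`, the project URL and device number 1 are assumptions for illustration: use the URL exactly as your BOINC Manager shows it, and take the device number from the BOINC event log at startup, since it may not match the SWAN numbering in stderr.txt.

```python
import xml.etree.ElementTree as ET


def build_cc_config(project_url, device_num, gpu_type=None, app=None):
    """Return cc_config.xml text with a single <exclude_gpu> entry.

    gpu_type and app are left out of the output when they are None.
    """
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    exclude = ET.SubElement(options, "exclude_gpu")
    ET.SubElement(exclude, "url").text = project_url
    ET.SubElement(exclude, "device_num").text = str(device_num)
    if gpu_type is not None:
        ET.SubElement(exclude, "type").text = gpu_type
    if app is not None:
        ET.SubElement(exclude, "app").text = app
    if hasattr(ET, "indent"):  # pretty-printing needs Python 3.9 or later
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# Hypothetical values: adjust the URL and device number to your own setup.
print(build_cc_config("https://www.gpugrid.net/", 1))
```

Save the output as cc_config.xml in the BOINC data directory, then restart the client or have it re-read its config files so the exclusion takes effect.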
Since May 13th, all newly downloaded tasks end with an error after 0 seconds of compute time. The log is the same for all of them:
<core_client_version>7.9.3</core_client_version>
<![CDATA[
<message>
process exited with code 212 (0xd4, -44)</message>
<stderr_txt>
</stderr_txt>
]]>
The same machine (a GTX 750 Ti under Ubuntu 18.04 LTS Linux) has completed hundreds of tasks over the past few years. Any idea what exactly the error means? I have just reset GPUGRID on that machine, hoping it will resume computation properly. Meanwhile I am testing Einstein to see whether the card is still OK.
Michael.
____________
President of Rechenkraft.net - Germany's first and largest distributed computing organization. | |
ID: 51794 | Rating: 0 | |
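A side note on the numbers in these logs: each exit code is printed in more than one form, such as `-52 (0xffffffcc)` and `212 (0xd4, -44)`. These are the same value read as a signed or unsigned integer of a given width. The short Python sketch below reproduces the conversions; the helper names are my own, and it says nothing about what the codes mean to the application.

```python
def as_unsigned(value, bits):
    """Keep only the low `bits` bits of value, read as an unsigned integer."""
    return value & ((1 << bits) - 1)


def as_signed(value, bits):
    """Read the low `bits` bits of value as a two's-complement signed integer."""
    value = as_unsigned(value, bits)
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


# Windows log in this thread: exit code -52 shown as a 32-bit hex value.
print(hex(as_unsigned(-52, 32)))      # 0xffffffcc

# Linux log in this thread: exit code 212 shown as hex and as a signed byte.
print(hex(212), as_signed(212, 8))    # 0xd4 -44
```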
There seem to be a lot of work units with high failure rates. I noticed that my multi-card machine had no work, so I checked: all of its work units had errored out. I thought it was a problem with my machine, but then I checked the work units and saw that not one of them has been completed by any of the numerous other machines that tried. It looks like all of these work units are on their way to 9 failed attempts. I checked Server Status, and the number of different work unit types that are failing is climbing. | |
ID: 51809 | Rating: 0 | |
My notebook was unusually quiet, so I realized there was a problem. I checked BOINC, and the GPUGrid tasks were failing with error code 212. One task failed because BOINC couldn't resume it. Later, all tasks ended with error code 212 without ever starting. | |
ID: 51813 | Rating: 0 | |
The problem is discussed here: | |
ID: 51831 | Rating: 0 | |