Unit crash after 0 second

Author	Message
Zarck Joined: 16 Aug 08 Posts: 145 Credit: 328,473,995 RAC: 0	Message 51629 - Posted: 14 Mar 2019 | 11:13:07 UTC Last modified: 14 Mar 2019 | 11:54:29 UTC
	Units crash after 0 seconds. No problems with Asteroids GPU, Einstein GPU, Folding@home GPU, Milkyway GPU, or SETI GPU.
	Nvidia Quadro K5000 + GeForce Titan.

flashawk Joined: 18 Jun 12 Posts: 297 Credit: 3,572,627,986 RAC: 0	Message 51630 - Posted: 14 Mar 2019 | 20:49:14 UTC - in response to Message 51629.
	Both your cards are pretty old; they may not be capable of working with these WUs. Have you completed any work units?

Retvari Zoltan Joined: 20 Jan 09 Posts: 2343 Credit: 16,201,255,749 RAC: 0	Message 51631 - Posted: 14 Mar 2019 | 20:51:51 UTC - in response to Message 51629.
	Zarck wrote: "Unit crash after 0 second, No problem with Asteroids Gpu, Einstein Gpu, Milkyway Gpu, Seti Gpu. Nvidia Quadro K5000 + Geforce Titan."
	The error message at the end of the stderr.txt is:
	SWAN : FATAL Unable to load module .nonbonded.cu. (300)
	It means that the Quadro K5000 is too old for this project, as it is only Compute Capability 3.0. Your Titan should work fine, but it gets very hot (83°C): (Task 20673822)
	<core_client_version>7.14.2</core_client_version>
	<![CDATA[
	<message>
	(unknown error) - exit code -52 (0xffffffcc)</message>
	<stderr_txt>
	# GPU [GeForce GTX TITAN] Platform [Windows] Rev [3212] VERSION [80]
	# SWAN Device 0 :
	# Name : GeForce GTX TITAN
	# ECC : Disabled
	# Global mem : 6144MB
	# Capability : 3.5
	# PCI ID : 0000:28:00.0
	# Device clock : 928MHz
	# Memory clock : 3004MHz
	# Memory width : 384bit
	# Driver version : r419_29 : 41935
	# GPU 0 : 81C
	# GPU 1 : 70C
	# GPU 0 : 82C
	# GPU 0 : 83C
	# GPU [Quadro K5000] Platform [Windows] Rev [3212] VERSION [80]
	# SWAN Device 1 :
	# Name : Quadro K5000
	# ECC : Disabled
	# Global mem : 4096MB
	# Capability : 3.0
	# PCI ID : 0000:0F:00.0
	# Device clock : 705MHz
	# Memory clock : 2700MHz
	# Memory width : 256bit
	# Driver version : r419_29 : 41935
	SWAN : FATAL Unable to load module .nonbonded.cu. (300)
	</stderr_txt>
	]]>
	I don't see any other task assigned to your Titan, so you've probably excluded it from getting GPUGrid work by mistake (you should exclude the Quadro K5000 instead).
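A minimal sketch of the check described in this post: read the device blocks from a stderr log and list the cards whose compute capability is too low. The threshold of 3.5 is an assumption inferred from this thread only (3.0 fails, the 3.5 Titan "should work fine"); the project's actual minimum is not stated here. The function names and the shortened sample log are hypothetical.

```python
import re

# Assumption: the thread only shows that 3.0 is rejected and 3.5 is expected to work.
ASSUMED_MIN_CAPABILITY = (3, 5)

# Shortened sample in the same layout as the stderr output quoted above.
STDERR_SAMPLE = """\
# SWAN Device 0 :
#    Name : GeForce GTX TITAN
#    Capability : 3.5
# SWAN Device 1 :
#    Name : Quadro K5000
#    Capability : 3.0
"""


def parse_devices(stderr_text):
    """Return a list of (name, (major, minor)) pairs found in the log."""
    devices = []
    name = None
    for line in stderr_text.splitlines():
        m = re.match(r"#\s*Name\s*:\s*(.+?)\s*$", line)
        if m:
            name = m.group(1)
            continue
        m = re.match(r"#\s*Capability\s*:\s*(\d+)\.(\d+)", line)
        if m and name is not None:
            devices.append((name, (int(m.group(1)), int(m.group(2)))))
            name = None
    return devices


def too_old(devices, minimum=ASSUMED_MIN_CAPABILITY):
    """Names of devices whose capability is below the assumed minimum."""
    return [name for name, cap in devices if cap < minimum]


if __name__ == "__main__":
    for name in too_old(parse_devices(STDERR_SAMPLE)):
        print(f"{name}: below the assumed minimum compute capability")
```

Comparing capabilities as (major, minor) tuples avoids floating-point comparisons.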

Zarck Joined: 16 Aug 08 Posts: 145 Credit: 328,473,995 RAC: 0	Message 51632 - Posted: 14 Mar 2019 | 23:22:26 UTC Last modified: 15 Mar 2019 | 0:05:45 UTC
	How do I exclude a card in BOINC?

Zarck Joined: 16 Aug 08 Posts: 145 Credit: 328,473,995 RAC: 0	Message 51633 - Posted: 15 Mar 2019 | 9:36:29 UTC
	My configuration is now too old for your project, but I can still run all the others.

mmonnin Joined: 2 Jul 16 Posts: 337 Credit: 7,527,351,065 RAC: 9,754,454	Message 51678 - Posted: 31 Mar 2019 | 0:33:48 UTC - in response to Message 51632. Last modified: 31 Mar 2019 | 0:34:22 UTC
	Zarck wrote: "How to exclude a card in boinc ?"
	The format in cc_config.xml:
	<exclude_gpu>
	<url>project_URL</url>
	<device_num>N</device_num>
	<type>NVIDIA|ATI|intel_gpu</type>
	<app>appname</app>
	</exclude_gpu>
	Type is needed if you have GPUs from more than one manufacturer, for example an Intel iGPU plus an NVIDIA card.
	https://boinc.berkeley.edu/wiki/Client_configuration
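As a sketch of how that fragment might be filled in for the Quadro K5000 from earlier in the thread, the script below builds a complete cc_config.xml with the exclusion placed inside the `<options>` section. Several values are assumptions: the project URL must match exactly what the BOINC client shows for GPUGrid, and device number 1 is taken from the SWAN numbering in the quoted log, which may differ from BOINC's own numbering (the client's startup messages list its device numbers). The helper name `build_cc_config` is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_cc_config(project_url, device_num, gpu_type=None, app=None):
    """Return cc_config.xml text containing one <exclude_gpu> entry."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    exclude = ET.SubElement(options, "exclude_gpu")
    ET.SubElement(exclude, "url").text = project_url
    ET.SubElement(exclude, "device_num").text = str(device_num)
    # <type> and <app> are only written when a value is given.
    if gpu_type is not None:
        ET.SubElement(exclude, "type").text = gpu_type
    if app is not None:
        ET.SubElement(exclude, "app").text = app
    if hasattr(ET, "indent"):  # pretty-printing is available from Python 3.9
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Assumed values; verify both against your own BOINC client before use.
    print(build_cc_config("http://www.gpugrid.net/", 1, gpu_type="NVIDIA"))
```

The client has to re-read its configuration files (or be restarted) before the exclusion takes effect.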

Michael H.W. Weber Joined: 9 Feb 16 Posts: 71 Credit: 607,916,391 RAC: 0	Message 51794 - Posted: 14 May 2019 | 5:58:35 UTC Last modified: 14 May 2019 | 6:00:04 UTC
	Since May 13th, all newly loaded tasks end with an error after 0 seconds of compute time. Log for all of these:
	<core_client_version>7.9.3</core_client_version>
	<![CDATA[
	<message>
	process exited with code 212 (0xd4, -44)</message>
	<stderr_txt>
	</stderr_txt>
	]]>
	The same machine (a GTX 750 Ti under Ubuntu 18.04 LTS Linux) has completed hundreds of tasks over the past few years. Any idea what exactly the error means?
	I just reset GPUGRID on that machine, hoping it will resume computation properly. Meanwhile, I am testing Einstein to see whether the card is still OK.
	Michael.
	President of Rechenkraft.net - Germany's first and largest distributed computing organization.
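The three numbers in that message (212, 0xd4, -44) are the same exit status written three ways, and the same holds for -52 and 0xffffffcc in the earlier Windows log. The sketch below only shows the arithmetic; it does not say what the codes mean, which this thread does not establish. The helper names are hypothetical.

```python
def as_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


def as_unsigned(value, bits):
    """Interpret value as an unsigned integer of the given width."""
    return value & ((1 << bits) - 1)


if __name__ == "__main__":
    # Linux log: "process exited with code 212 (0xd4, -44)"
    print(hex(212), as_signed(212, 8))   # 0xd4 -44
    # Windows log: "exit code -52 (0xffffffcc)"
    print(hex(as_unsigned(-52, 32)))     # 0xffffffcc
```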

Zalster Joined: 26 Feb 14 Posts: 211 Credit: 4,496,324,562 RAC: 0	Message 51809 - Posted: 14 May 2019 | 14:01:08 UTC - in response to Message 51794.
	There seem to be a lot of work units with high failure rates. I noticed that my multi-card machine had no work and checked: all work units had errored out. I thought it was a problem with my machine, but then I checked the work units and saw that not one of them had been completed by any of the numerous other machines that tried. It looks like all the work units are on their way to 9 failed attempts. I checked Server Status and can see that the number of different work unit types that are failing is climbing.

sis651 Joined: 25 Nov 13 Posts: 66 Credit: 193,925,538 RAC: 0	Message 51813 - Posted: 14 May 2019 | 17:22:56 UTC
	My notebook was unusually quiet, so I realized there was a problem. I checked BOINC, and GPUGrid tasks were failing with error code 212. First I had one failure where BOINC couldn't resume the job. Later, all jobs ended without starting, with error code 212.

Michael H.W. Weber Joined: 9 Feb 16 Posts: 71 Credit: 607,916,391 RAC: 0	Message 51831 - Posted: 16 May 2019 | 10:37:17 UTC Last modified: 16 May 2019 | 10:37:34 UTC
	The problem is discussed here: http://www.gpugrid.net/forum_thread.php?id=4924&nowrap=true#51786
	Michael.
	President of Rechenkraft.net - Germany's first and largest distributed computing organization.
