Alez
Joined: 17 Nov 12 | Posts: 10 | Credit: 185,958,753 | RAC: 0
My GTX 660 Ti and GTX 650 just suddenly started erroring out on every task, all with the same error as far as I can tell.
Name: 2x11_8-NOELIA_hfXA_long-0-2-RND7200_1
Workunit: 3977330
Created: 1 Jan 2013, 5:44:59 UTC
Sent: 1 Jan 2013, 10:14:59 UTC
Received: 1 Jan 2013, 10:23:43 UTC
Server state: Over
Outcome: Computation error
Client state: Compute error
Exit status: 1 (0x1)
Computer ID: 138949
<core_client_version>7.0.33</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
]]>
Everything was working fine until last night. NVIDIA driver 306.97.
Any ideas what's wrong?
Reply:
Quote from Alez: "My GTX 660 Ti and GTX 650 just suddenly started erroring out on every task... Everything was working fine until last night. NVIDIA driver 306.97. Any ideas what's wrong?"
Sometimes the card (driver, or the OS) gets stuck, and only a restart can resolve it.
Have you tried a system restart?
Alez
Joined: 17 Nov 12 | Posts: 10 | Credit: 185,958,753 | RAC: 0
Just reset GPUGrid and am about to restart the system. I was wondering if there was a known error, as I've already trashed 32 units and didn't want to keep trashing more.
skgiven (volunteer moderator, volunteer tester)
Joined: 23 Apr 09 | Posts: 3968 | Credit: 1,995,359,260 | RAC: 0
I appear to have had a similar problem. It started last night, just after midnight CET.
http://www.gpugrid.net/results.php?hostid=139265&offset=0&show_names=0&state=5&appid=
Long tasks just started failing, one after the other. Most failed after ~200 sec. They might have been failing only on my GTX660Ti and not my GTX470s; a task was running on one of the 470s. After I restarted, the same task started to run on my GTX660Ti and now seems to be progressing normally...
GPUGrid stopped sending me work, so I will have to run some jobs from other projects and wait for my rating to improve before getting new tasks (only ~4 h if the one task I have completes and reports successfully).
As well as the possibility that this was caused by bad tasks, it could have been caused by a CPU BOINC project, by BOINC itself, or be down to the driver (306.97 in my case). W7 x64.
Of the failed WUs, two tasks also failed on other systems:
http://www.gpugrid.net/workunit.php?wuid=3977079
http://www.gpugrid.net/workunit.php?wuid=3977023
However, some resends ran successfully, suggesting it's not an issue with GPUGrid itself.
____________
FAQs
HOW TO:
- Opt out of Beta Tests
- Ask for Help
Alez
Joined: 17 Nov 12 | Posts: 10 | Credit: 185,958,753 | RAC: 0
Reset the project, did a clean NVIDIA driver update to 310.70, and rebooted. So far I've got one task, and it seems to be running to completion. 15% more to go and we will see...
skgiven (volunteer moderator, volunteer tester)
Joined: 23 Apr 09 | Posts: 3968 | Credit: 1,995,359,260 | RAC: 0
The task I had failed!
http://www.gpugrid.net/result.php?resultid=6235285
Name: 2x12_4-NOELIA_hfXA_long-0-2-RND6878_0
Workunit: 3977346
Created: 19 Dec 2012, 20:37:02 UTC
Sent: 1 Jan 2013, 5:58:06 UTC
Received: 1 Jan 2013, 17:01:28 UTC
Server state: Over
Outcome: Computation error
Client state: Compute error
Exit status: 98 (0x62)
Computer ID: 139265
Report deadline: 6 Jan 2013, 5:58:06 UTC
Run time: 35,731.98
CPU time: 30,914.86
Validate state: Invalid
Credit: 0.00
Application version: Long runs (8-12 hours on fastest card) v6.16 (cuda42)
ERROR: file deven.cpp line 1106: # Energies have become nan
Perhaps it was one of the earlier tasks that failed on completion?
It wasn't resent.
Reply:
Quote from skgiven: "The task I had failed! ... ERROR: file deven.cpp line 1106: # Energies have become nan ... Perhaps it was one of the earlier tasks that failed on completion? It wasn't resent."
Since then it has been resent to another host, so we will see.
We have 17,880 unsent workunits (and as few as 2,174 in progress) at the moment, so a resend takes more time than usual.
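For context, the "Energies have become nan" abort is the application's own sanity check firing when the simulation has become numerically unstable (commonly triggered by hardware trouble such as overheating or an unstable overclock, rather than by the science itself). A minimal sketch of what such a guard looks like, with hypothetical names; this is not the actual ACEMD/GPUGRID code:

```python
import math

def check_energies(energies):
    """Raise if any energy term is NaN or infinite.

    `energies` maps a term name to its value for one simulation step.
    Illustrates the kind of check behind the "Energies have become nan"
    error; the real application aborts the task (exit status 98) instead.
    """
    for name, value in energies.items():
        if math.isnan(value) or math.isinf(value):
            raise RuntimeError(
                f"ERROR: # Energies have become nan ({name}={value})")

# A healthy step passes silently; a corrupted one aborts.
check_energies({"bond": 120.5, "angle": 310.2, "potential": -55432.1})
try:
    check_energies({"bond": float("nan"), "potential": -55432.1})
except RuntimeError as e:
    print("caught:", e)
```

Once a value goes NaN it contaminates every later step, which is why the application stops immediately rather than finishing and failing validation.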
skgiven (volunteer moderator, volunteer tester)
Joined: 23 Apr 09 | Posts: 3968 | Credit: 1,995,359,260 | RAC: 0
I have identified the root of the problem I was encountering: the GTX660Ti's fan was stuck at 40%. I had it on a profile so the fan speed would increase with temperature, but after updating MSI Afterburner a couple of days back the profile was no longer applied to the GTX660Ti; it was only applied to the GTX470.
That's what I get for 'upgrading' software without any real need.
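A stuck fan like this can be caught early by polling the driver: `nvidia-smi --query-gpu=fan.speed,temperature.gpu --format=csv,noheader` prints one line per GPU such as `40 %, 62`. A small parser plus an illustrative heuristic for spotting a fan profile that is not being applied (the 40%/80 °C thresholds are my assumption, not a documented rule):

```python
def parse_fan_temp(csv_line):
    """Parse one GPU line of `nvidia-smi --query-gpu=fan.speed,temperature.gpu
    --format=csv,noheader`, e.g. "40 %, 62" -> (40, 62)."""
    fan_field, temp_field = [f.strip() for f in csv_line.split(",")]
    fan_pct = int(fan_field.rstrip("%").strip())
    temp_c = int(temp_field)
    return fan_pct, temp_c

def fan_looks_stuck(fan_pct, temp_c, max_safe_temp=80):
    # Heuristic: a hot card whose fan is still loafing suggests the
    # fan profile is not being applied (thresholds are illustrative).
    return temp_c >= max_safe_temp and fan_pct <= 40

print(parse_fan_temp("40 %, 62"))  # -> (40, 62)
print(fan_looks_stuck(40, 85))     # -> True: hot card, idle fan
```

Running a check like this on a schedule would have flagged the GTX660Ti's stuck fan before tasks started erroring out.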
skgiven (volunteer moderator, volunteer tester)
Joined: 23 Apr 09 | Posts: 3968 | Credit: 1,995,359,260 | RAC: 0
I've had another error on that system (GTX660Ti now at 62°C):
6286250 4012377 2 Jan 2013 | 13:14:43 UTC 2 Jan 2013 | 20:26:00 UTC Error while computing 18,859.67 1,537.28 --- Long runs (8-12 hours on fastest card) v6.16 (cuda42)
Stderr output
<core_client_version>7.0.42</core_client_version>
<![CDATA[
<message>
- exit code 98 (0x62)
</message>
<stderr_txt>
MDIO: cannot open file "restart.coor"
ERROR: file deven.cpp line 1106: # Energies have become nan
called boinc_finish
</stderr_txt>
]]>
It also failed on another system using the 3.1 app.
6285719 79738 2 Jan 2013 | 8:55:50 UTC 2 Jan 2013 | 10:36:53 UTC Error while computing 9.51 0.05 --- Long runs (8-12 hours on fastest card) v6.16 (cuda31)
6286250 139265 2 Jan 2013 | 13:14:43 UTC 2 Jan 2013 | 20:26:00 UTC Error while computing 18,859.67 1,537.28 --- Long runs (8-12 hours on fastest card) v6.16 (cuda42)
6287663 142106 2 Jan 2013 | 23:35:09 UTC 7 Jan 2013 | 23:35:09 UTC In progress --- --- --- Long runs (8-12 hours on fastest card) v6.16 (cuda42)
I went through earlier WU failures, and while most WUs eventually succeeded, most of the resends failed on at least one other system, some failing numerous times. The issue seems to be the same for Long and Short WUs:
http://www.gpugrid.net/results.php?hostid=139265&offset=0&show_names=0&state=5&appid=
While the errors are mostly early in the runs, some occur late into the run. It's also an issue for both apps (3.1 and 4.2), and there seem to be quite a few 'error while downloading' failures.
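Incidentally, the `MDIO: cannot open file "restart.coor"` line in that stderr output is normally harmless: it just means no checkpoint exists yet, so the task starts from the beginning rather than resuming. The usual checkpoint-or-fresh-start pattern, sketched with a hypothetical file name and format (ACEMD's actual `restart.coor` is a binary coordinate file, not this):

```python
import os
import pickle

CHECKPOINT = "restart.state.pkl"  # hypothetical; stands in for "restart.coor"

def load_state():
    """Resume from a checkpoint if one exists, otherwise start fresh."""
    try:
        with open(CHECKPOINT, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        # Same situation as the "MDIO: cannot open file" message:
        # not an error, just the first run of this task on this host.
        return {"step": 0}

def save_state(state):
    with open(CHECKPOINT, "wb") as f:
        pickle.dump(state, f)

state = load_state()
state["step"] += 100  # simulate some work, then checkpoint it
save_state(state)
```

So that message alone doesn't explain the failures; it is the `Energies have become nan` line that follows which actually kills the task.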
skgiven (volunteer moderator, volunteer tester)
Joined: 23 Apr 09 | Posts: 3968 | Credit: 1,995,359,260 | RAC: 0
These probably belong in the 'Energies have become nan' thread, but:
6294105 4017217 139265 4 Jan 2013 | 23:14:14 UTC 5 Jan 2013 | 12:31:59 UTC Error while computing 42,955.90 3,395.41 --- Long runs (8-12 hours on fastest card) v6.17 (cuda42)
6293240 112581 4 Jan 2013 | 18:40:13 UTC 4 Jan 2013 | 18:49:20 UTC Error while computing 2.16 2.09 --- Long runs (8-12 hours on fastest card) v6.17 (cuda42)
6294105 139265 4 Jan 2013 | 23:14:14 UTC 5 Jan 2013 | 12:31:59 UTC Error while computing 42,955.90 3,395.41 --- Long runs (8-12 hours on fastest card) v6.17 (cuda42)
6296657 --- --- --- Unsent --- --- ---