Message boards : Graphics cards (GPUs) : Compute error (access violation)
Stefan Ledwina
Joined: 16 Jul 07
Posts: 464
Credit: 155,111,881
RAC: 1,677,369
Message 3931 - Posted: 22 Nov 2008 | 21:08:14 UTC

After a long time without errors my Vista host just had a computation error.
The task was 130123, and here's the output of stderr.out:

<core_client_version>6.4.0</core_client_version>
<![CDATA[
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
# Using CUDA device 0
# Device 0: "GeForce GTX 260"
# Clock rate: 1242000 kilohertz
# Number of multiprocessors: 24
# Number of cores: 192
MDIO ERROR: cannot open file "restart.coor"


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x001FC37F read attempt to address 0x0000000C

Engaging BOINC Windows Runtime Debugger...



********************


BOINC Windows Runtime Debugger Version 6.3.10


Dump Timestamp : 11/22/08 21:59:46
Install Directory :
Data Directory : F:\ProgramData\BOINC
Project Symstore :
LoadLibraryA( F:\ProgramData\BOINC\dbghelp.dll ): GetLastError = 126
Loaded Library : dbghelp.dll
LoadLibraryA( F:\ProgramData\BOINC\symsrv.dll ): GetLastError = 126
LoadLibraryA( symsrv.dll ): GetLastError = 126
LoadLibraryA( F:\ProgramData\BOINC\srcsrv.dll ): GetLastError = 126
LoadLibraryA( srcsrv.dll ): GetLastError = 126
LoadLibraryA( F:\ProgramData\BOINC\version.dll ): GetLastError = 126
Loaded Library : version.dll
Debugger Engine : 4.0.5.0
Symbol Search Path: F:\ProgramData\BOINC\slots\2;F:\ProgramData\BOINC\projects\www.gpugrid.net;srv*C:\Users\Stefan\AppData\Local\Temp\symbols*http://msdl.microsoft.com/download/symbols;srv*C:\Users\Stefan\AppData\Local\Temp\symbols*http://boinc.berkeley.edu/symstore




*** Dump of the Process Statistics: ***

- I/O Operations Counters -
Read: 2564, Write: 0, Other 447

- I/O Transfers Counters -
Read: 0, Write: 640, Other 0

- Paged Pool Usage -
QuotaPagedPoolUsage: 160888, QuotaPeakPagedPoolUsage: 160888
QuotaNonPagedPoolUsage: 9840, QuotaPeakNonPagedPoolUsage: 9840

- Virtual Memory Usage -
VirtualSize: 119664640, PeakVirtualSize: 119926784

- Pagefile Usage -
PagefileUsage: 40275968, PeakPagefileUsage: 47501312

- Working Set Size -
WorkingSetSize: 37261312, PeakWorkingSetSize: 44654592, PageFaultCount: 13140

*** Dump of thread ID 4876 (state: Ready): ***

- Information -
Status: Base Priority: Above Normal, Priority: Above Normal, , Kernel Time: 1092007.000000, User Time: 5772037.000000, Wait Time: 6260024.000000

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x001FC37F read attempt to address 0x0000000C

- Registers -
eax=00000000 ebx=03d87990 ecx=0021e750 edx=000003cf esi=03d87990 edi=03d84498
eip=001fc37f esp=0017b3c0 ebp=03d46bd0
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010246

- Callstack -
ChildEBP RetAddr Args to Child
03d46bd0 03d84498 00000002 00000000 03d46ae0 00000000 tcl85!Tcl_GetString+0x0
03d46bd4 00000000 00000000 03d46ae0 00000000 00000003 tcl85!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '03d84498'

*** Dump of thread ID 5864 (state: Waiting): ***

- Information -
Status: Wait Reason: ExecutionDelay, , Kernel Time: 0.000000, User Time: 0.000000, Wait Time: 6260021.000000

- Registers -
eax=00000000 ebx=00000000 ecx=00000000 edx=00000000 esi=0286ff48 edi=00000000
eip=7752081d esp=0286ff08 ebp=0286ff6c
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000202

- Callstack -
ChildEBP RetAddr Args to Child
0286ff6c 76a50c88 00000064 00000000 0286ff94 00457abb ntdll!NtDelayExecution+0x0
0286ff7c 00457abb 00000064 00000000 76ace3f3 00000000 kernel32!Sleep+0x0
0286ff94 7757cfed 00000000 700ae735 00000000 00000000 acemd_6.52_windows_intelx86__cu!+0x0
0286ffd4 7757d1ff 00457ab0 00000000 00000000 00000000 ntdll!RtlCreateUserProcess+0x0
0286ffec 00000000 00457ab0 00000000 00000000 00000000 ntdll!RtlCreateProcessParameters+0x0


*** Debug Message Dump ****


*** Foreground Window Data ***
Window Name :
Window Class :
Window Process ID: 0
Window Thread ID : 0

Exiting...

</stderr_txt>
]]>
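
A note for anyone decoding the dump above: the exit code -1073741819 and the hexadecimal 0xc0000005 are the same 32-bit value, read once as signed and once as unsigned, and 0xC0000005 is the Windows status code for an access violation. A read from address 0x0000000C while eax is 00000000 is the pattern usually produced by reading a field at a small offset from a null pointer, although the dump alone cannot confirm that. The following is a minimal Python sketch of the conversion and of extracting the addresses from the "Reason:" line. The function names and the regular expression are illustrative assumptions, not part of BOINC, and the "write" alternative in the pattern is a guess at how a failed write would be reported.

```python
import re


def to_unsigned32(exit_code: int) -> int:
    """Reinterpret a signed 32-bit exit code as an unsigned value."""
    return exit_code & 0xFFFFFFFF


# Hypothetical pattern for the "Reason:" line format seen in the dump above.
REASON_RE = re.compile(
    r"Reason: (?P<name>.+?) \((?P<code>0x[0-9a-fA-F]+)\) "
    r"at address (?P<pc>0x[0-9a-fA-F]+) "
    r"(?P<op>read|write) attempt to address (?P<target>0x[0-9a-fA-F]+)"
)


def parse_reason(line: str) -> dict:
    """Split an exception record line into its name, code and addresses."""
    m = REASON_RE.search(line)
    if m is None:
        raise ValueError("not a recognised exception record line")
    return {
        "name": m["name"],
        "code": int(m["code"], 16),
        "pc": int(m["pc"], 16),        # address of the faulting instruction
        "op": m["op"],                 # "read" or "write"
        "target": int(m["target"], 16),  # address the instruction tried to access
    }


line = ("Reason: Access Violation (0xc0000005) at address 0x001FC37F "
        "read attempt to address 0x0000000C")
info = parse_reason(line)
print(hex(to_unsigned32(-1073741819)))  # 0xc0000005
print(info["op"], hex(info["target"]))  # read 0xc
```

A target address this close to zero points at the application or its input rather than at a specific line of code; the call stack (tcl85!Tcl_GetString) is the more useful hint for the developers.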

____________

pixelicious.at - my little photoblog

K1atOdessa
Joined: 25 Feb 08
Posts: 249
Credit: 370,320,941
RAC: 0
Message 3932 - Posted: 22 Nov 2008 | 21:59:05 UTC - in response to Message 3931.

I had a similar error a few days ago; I haven't had one like it before or since.

http://www.gpugrid.net/result.php?resultid=124412

Running Windows XP, not Vista.

It looks like a problem with task 124412 itself, since it has many other errors. Your task is different and has not yet been sent to anyone else.

Maybe it's just a case of a few bad WUs?

dataman
Joined: 18 Sep 08
Posts: 36
Credit: 100,352,867
RAC: 0
Message 3953 - Posted: 23 Nov 2008 | 19:27:59 UTC

All of the WUs on my 9800GTX are now failing with the same message:

<core_client_version>6.3.21</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# Using CUDA device 0
# Device 0: "GeForce 9800 GTX/9800 GTX+"
# Clock rate: 1944000 kilohertz
# Number of multiprocessors: 16
# Number of cores: 128
MDIO ERROR: cannot open file "restart.coor"

</stderr_txt>
]]>

Windows Vista 64bit

My 8800GT runs them fine with the same configuration. My drivers match the current version on the CUDA site.

Any help would be appreciated.
____________

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 3959 - Posted: 23 Nov 2008 | 21:54:47 UTC

Reboot the PC? Scale your OC back?

MrS
____________
Scanning for our furry friends since Jan 2002

dataman
Joined: 18 Sep 08
Posts: 36
Credit: 100,352,867
RAC: 0
Message 3961 - Posted: 23 Nov 2008 | 22:06:54 UTC - in response to Message 3959.

Thanks, but I have rebooted several times and I am not OC'd. I just had another one fail.
____________

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 3968 - Posted: 23 Nov 2008 | 22:57:48 UTC

The standard clock for the 9800GTX+ is 1.83 GHz; for the 9800GTX it's 1.69 GHz. Your card reports 1944000 kHz, i.e. about 1.94 GHz, and probably ~750 MHz core, so it's way above spec, which may cause problems.
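
The comparison above can be checked with a few lines of Python. This is only a sketch: the stock clocks are the figures quoted in this post, not independently verified, and the function name is made up for illustration.

```python
def overclock_percent(reported_khz: int, stock_ghz: float) -> float:
    """Percentage by which the reported clock exceeds the stock clock."""
    reported_ghz = reported_khz / 1_000_000  # kHz -> GHz
    return (reported_ghz / stock_ghz - 1.0) * 100.0


# "Clock rate" line from the 9800 GTX log earlier in the thread.
reported_khz = 1_944_000

# Stock clocks as quoted in this post (assumed, not verified here).
for name, stock_ghz in [("9800GTX+", 1.83), ("9800GTX", 1.69)]:
    pct = overclock_percent(reported_khz, stock_ghz)
    print(f"{name}: {pct:.1f}% above {stock_ghz} GHz")
```

With these figures the card comes out roughly 6% above the 9800GTX+ clock and roughly 15% above the 9800GTX clock.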

MrS
____________
Scanning for our furry friends since Jan 2002

dataman
Joined: 18 Sep 08
Posts: 36
Credit: 100,352,867
RAC: 0
Message 3974 - Posted: 23 Nov 2008 | 23:25:37 UTC

Thank you for the information, ETA. I'm not sure what happened, as the card is just as it was out of the box and I've not OC'ed the PC. This gives me a place to look. Thanks again.
____________

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 3978 - Posted: 23 Nov 2008 | 23:45:17 UTC

Factory-overclocked cards are quite common with nVidia chips, and some manufacturers are quite aggressive with their choices. I suppose you have one of those; maybe it's called "super clocked" or "AMP edition" or something similar. When I said "your overclock" I also had a factory OC in mind and didn't want to imply that it had to be you who set that speed.

Regards,
MrS
____________
Scanning for our furry friends since Jan 2002

Stefan Ledwina
Joined: 16 Jul 07
Posts: 464
Credit: 155,111,881
RAC: 1,677,369
Message 3987 - Posted: 24 Nov 2008 | 3:32:49 UTC

And what exactly did the incorrect function (exit code 1) question have to do with the actual thread about the access violation? ;-)
____________

pixelicious.at - my little photoblog

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 4007 - Posted: 24 Nov 2008 | 20:56:39 UTC

Good catch... I didn't bother to read the first part of the thread again ;)

MrS
____________
Scanning for our furry friends since Jan 2002

Stefan Ledwina
Joined: 16 Jul 07
Posts: 464
Credit: 155,111,881
RAC: 1,677,369
Message 4031 - Posted: 25 Nov 2008 | 13:56:35 UTC - in response to Message 4007.

Bad, bad boy! ;-)
____________

pixelicious.at - my little photoblog

JAMC
Joined: 16 Nov 08
Posts: 28
Credit: 12,688,454
RAC: 0
Message 4091 - Posted: 30 Nov 2008 | 0:25:07 UTC

I just had these two error out: 139978 after 0 s of run time and 138301 after 28,812 s.

http://www.gpugrid.net/result.php?resultid=139978
http://www.gpugrid.net/result.php?resultid=138301

These are both on the same machine and seemed to happen at the same time. Anything to worry about?

GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Message 4092 - Posted: 30 Nov 2008 | 11:21:21 UTC - in response to Message 3931.
Last modified: 30 Nov 2008 | 11:22:32 UTC

The first one is a new workunit that we are submitting with a different molecular model.
These work on our systems but seem to have more problems on GPUGRID. Maybe they are just more sensitive to overclocks. There are only a few of them for now.

gdf

STE\/E
Joined: 18 Sep 08
Posts: 368
Credit: 2,643,546,048
RAC: 48,686,993
Message 4093 - Posted: 30 Nov 2008 | 12:36:18 UTC

I've had a few here and there too. Usually they error out within a few minutes, but some took 2-3 hours before they errored out ...
