After a long time without errors, my Vista host just had a computation error.
The task was 130123, and here's the output of stderr.out:
<core_client_version>6.4.0</core_client_version>
<![CDATA[
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
# Using CUDA device 0
# Device 0: "GeForce GTX 260"
# Clock rate: 1242000 kilohertz
# Number of multiprocessors: 24
# Number of cores: 192
MDIO ERROR: cannot open file "restart.coor"
Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x001FC37F read attempt to address 0x0000000C
Engaging BOINC Windows Runtime Debugger...
********************
BOINC Windows Runtime Debugger Version 6.3.10
Dump Timestamp : 11/22/08 21:59:46
Install Directory :
Data Directory : F:\ProgramData\BOINC
Project Symstore :
LoadLibraryA( F:\ProgramData\BOINC\dbghelp.dll ): GetLastError = 126
Loaded Library : dbghelp.dll
LoadLibraryA( F:\ProgramData\BOINC\symsrv.dll ): GetLastError = 126
LoadLibraryA( symsrv.dll ): GetLastError = 126
LoadLibraryA( F:\ProgramData\BOINC\srcsrv.dll ): GetLastError = 126
LoadLibraryA( srcsrv.dll ): GetLastError = 126
LoadLibraryA( F:\ProgramData\BOINC\version.dll ): GetLastError = 126
Loaded Library : version.dll
Debugger Engine : 4.0.5.0
Symbol Search Path: F:\ProgramData\BOINC\slots\2;F:\ProgramData\BOINC\projects\www.gpugrid.net;srv*C:\Users\Stefan\AppData\Local\Temp\symbols*http://msdl.microsoft.com/download/symbols;srv*C:\Users\Stefan\AppData\Local\Temp\symbols*http://boinc.berkeley.edu/symstore
*** Dump of the Process Statistics: ***
- I/O Operations Counters -
Read: 2564, Write: 0, Other 447
- I/O Transfers Counters -
Read: 0, Write: 640, Other 0
- Paged Pool Usage -
QuotaPagedPoolUsage: 160888, QuotaPeakPagedPoolUsage: 160888
QuotaNonPagedPoolUsage: 9840, QuotaPeakNonPagedPoolUsage: 9840
- Virtual Memory Usage -
VirtualSize: 119664640, PeakVirtualSize: 119926784
- Pagefile Usage -
PagefileUsage: 40275968, PeakPagefileUsage: 47501312
- Working Set Size -
WorkingSetSize: 37261312, PeakWorkingSetSize: 44654592, PageFaultCount: 13140
*** Dump of thread ID 4876 (state: Ready): ***
- Information -
Status: Base Priority: Above Normal, Priority: Above Normal, , Kernel Time: 1092007.000000, User Time: 5772037.000000, Wait Time: 6260024.000000
- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x001FC37F read attempt to address 0x0000000C
- Registers -
eax=00000000 ebx=03d87990 ecx=0021e750 edx=000003cf esi=03d87990 edi=03d84498
eip=001fc37f esp=0017b3c0 ebp=03d46bd0
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010246
- Callstack -
ChildEBP RetAddr Args to Child
03d46bd0 03d84498 00000002 00000000 03d46ae0 00000000 tcl85!Tcl_GetString+0x0
03d46bd4 00000000 00000000 03d46ae0 00000000 00000003 tcl85!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '03d84498'
*** Dump of thread ID 5864 (state: Waiting): ***
- Information -
Status: Wait Reason: ExecutionDelay, , Kernel Time: 0.000000, User Time: 0.000000, Wait Time: 6260021.000000
- Registers -
eax=00000000 ebx=00000000 ecx=00000000 edx=00000000 esi=0286ff48 edi=00000000
eip=7752081d esp=0286ff08 ebp=0286ff6c
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000202
- Callstack -
ChildEBP RetAddr Args to Child
0286ff6c 76a50c88 00000064 00000000 0286ff94 00457abb ntdll!NtDelayExecution+0x0
0286ff7c 00457abb 00000064 00000000 76ace3f3 00000000 kernel32!Sleep+0x0
0286ff94 7757cfed 00000000 700ae735 00000000 00000000 acemd_6.52_windows_intelx86__cu!+0x0
0286ffd4 7757d1ff 00457ab0 00000000 00000000 00000000 ntdll!RtlCreateUserProcess+0x0
0286ffec 00000000 00457ab0 00000000 00000000 00000000 ntdll!RtlCreateProcessParameters+0x0
*** Debug Message Dump ****
*** Foreground Window Data ***
Window Name :
Window Class :
Window Process ID: 0
Window Thread ID : 0
Exiting...
</stderr_txt>
]]>
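For anyone wondering how the two numbers in the `<message>` line relate: the negative exit code is just the 32-bit status value 0xc0000005 read as a signed integer. A minimal sketch of the conversion follows; the function names and the small lookup table are my own, for illustration only, and are not part of BOINC.

```python
# Reinterpret a signed 32-bit process exit code as the unsigned status value.
# The lookup table is illustrative and contains only the code from this thread.
KNOWN_STATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
}

def to_unsigned32(exit_code: int) -> int:
    """Map a signed 32-bit exit code onto the range 0..0xFFFFFFFF."""
    return exit_code & 0xFFFFFFFF

def describe(exit_code: int) -> str:
    """Return the hex form of the exit code plus a name if we know one."""
    status = to_unsigned32(exit_code)
    name = KNOWN_STATUS.get(status, "unknown status")
    return f"0x{status:08x} ({name})"

print(describe(-1073741819))  # 0xc0000005 (STATUS_ACCESS_VIOLATION)
```

This matches the "Access Violation (0xc0000005)" reason reported further down in the dump.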
____________
pixelicious.at - my little photoblog |
|
|
|
I had a similar error a few days ago, and haven't had one like it before or since.
http://www.gpugrid.net/result.php?resultid=124412
Running Windows XP, not Vista.
That looks like a problem with task 124412 itself, since it produced many other errors as well. Your task is a different one and has not yet been sent to anyone else.
Maybe just a case of a few bad WUs? |
|
|
dataman
Joined: 18 Sep 08 Posts: 36 Credit: 100,352,867 RAC: 0
|
All of the WUs on my 9800GTX are now failing with the same message:
<core_client_version>6.3.21</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# Using CUDA device 0
# Device 0: "GeForce 9800 GTX/9800 GTX+"
# Clock rate: 1944000 kilohertz
# Number of multiprocessors: 16
# Number of cores: 128
MDIO ERROR: cannot open file "restart.coor"
</stderr_txt>
]]>
Windows Vista 64bit
My 8800GT runs them fine with the same configuration. My drivers are current with the CUDA site.
Any help would be appreciated.
____________
|
|
|
|
Reboot the PC? Scale your OC back?
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
dataman
|
Thanks but I have rebooted several times and I am not OC'd. I just had another one fail.
____________
|
|
|
|
The standard shader clock for the 9800GTX+ is 1.83 GHz; for the 9800GTX it's 1.69 GHz. Your card runs at 1.94 GHz, and probably ~750 MHz core, so it's well above spec, which may cause problems.
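To put a number on that, here is a rough sketch that pulls the clock rate out of the stderr text and compares it with the two reference clocks mentioned above. The reference values are simply the ones quoted in this post, and the helper names are made up for the example.

```python
import re

# Reference clocks in GHz as quoted in this thread (not read from a datasheet).
REFERENCE_GHZ = {"9800 GTX": 1.69, "9800 GTX+": 1.83}

def reported_clock_ghz(stderr_text: str) -> float:
    """Extract the '# Clock rate: N kilohertz' value and convert it to GHz."""
    match = re.search(r"# Clock rate:\s*(\d+) kilohertz", stderr_text)
    if match is None:
        raise ValueError("no clock rate line found")
    return int(match.group(1)) / 1_000_000

def percent_over(reported_ghz: float, reference_ghz: float) -> float:
    """How far the reported clock sits above the reference, in percent."""
    return (reported_ghz / reference_ghz - 1.0) * 100.0

log = (
    '# Device 0: "GeForce 9800 GTX/9800 GTX+"\n'
    "# Clock rate: 1944000 kilohertz\n"
)

clock = reported_clock_ghz(log)
for name, ref in REFERENCE_GHZ.items():
    print(f"{name}: {clock:.3f} GHz is {percent_over(clock, ref):.1f}% above {ref} GHz")
```

With the 1944000 kilohertz from the log this works out to roughly 6% over the GTX+ figure and about 15% over the plain GTX figure.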
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
dataman
|
Thank you for the information, ETA. I'm not sure what happened, as the card is just as it was out of the box and I've not OC'ed the PC. That gives me a place to look. Thanks again.
____________
|
|
|
|
Factory-overclocked cards are quite common with nVidia chips, and some manufacturers are quite aggressive with their choices. I suppose you have one of those; maybe it's called "super clocked" or "AMP edition" or something similar. When I said "your overclock" I also had the factory OC in mind and didn't want to imply that it had to be you who set that speed.
Regards,
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
|
And what exactly did the incorrect function (exit code 1) question have to do with the actual thread about the access violation? ;-)
____________
pixelicious.at - my little photoblog |
|
|
|
Good catch... I didn't bother to read the first part of the thread again ;)
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
|
Bad, bad boy! ;-)
____________
pixelicious.at - my little photoblog |
|
|
JAMC
Joined: 16 Nov 08 Posts: 28 Credit: 12,688,454 RAC: 0
|
I just had these two error out: 139978 after 0 s of run time and 138301 after 28,812 s.
http://www.gpugrid.net/result.php?resultid=139978
http://www.gpugrid.net/result.php?resultid=138301
These are both on the same machine and seemed to happen at the same time... Anything to worry about? |
|
|
GDF (Volunteer moderator, Project administrator, Project developer, Project tester, Volunteer developer, Volunteer tester, Project scientist)
Joined: 14 Mar 07 Posts: 1957 Credit: 629,356 RAC: 0
|
The first one is a new workunit which we are submitting with a different molecular model.
They work on our systems, but seem to have more problems on GPUGRID. Maybe they are just more sensitive to overclocks. There are only a few of them for now.
gdf |
|
|
STE\/E
Joined: 18 Sep 08 Posts: 368 Credit: 2,643,546,048 RAC: 48,686,993
|
I've had a few here and there too. Usually they error out within a few minutes, but some took 2-3 hours before they errored out... |
|
|