Message boards : Number crunching : ATMML work units erroring out with "Illegal instruction"
Author | Message |
---|---|
I have an older computer crunching full-time for WCG and GPUGRID. ATMML work units on it are failing with this error: `+ python bin/rbfe_explicit_sync.py QB_A24_A36_asyncre.cntl run.sh: line 24: 12561 Illegal instruction (core dumped) python bin/rbfe_explicit_sync.py $CONFIG_FILE 2024-09-25 15:43:12 (12505): bin/bash exited; CPU time 19.896937 2024-09-25 15:43:12 (12505): app exit status: 0x84 2024-09-25 15:43:12 (12505): called boinc_finish(195)` The full output of an example failed WU is here: https://www.gpugrid.net/result.php?resultid=36014912 Looking around online, this appears to be a common error with Python ML frameworks that ship pre-compiled binaries built for newer CPU targets with SSE4, AVX, etc., when the user runs them on a processor that doesn't support those instruction set extensions. According to the GPUGRID apps list (https://www.gpugrid.net/apps.php), the only CPU requirement is a 64-bit version of Windows or Linux running on an x86-64 processor. Even if that listing is machine-generated and doesn't show the actual CPU requirements, GPUGRID's "Join Us" page (https://www.gpugrid.net/join.php) likewise lists the CPU requirement only as "64-bit" with at least one core, which literally any x86-64 CPU meets. If support for instruction set extensions beyond the base x86-64 spec is a requirement of ATMML, then fair enough: this computer's processor is admittedly quite old (although the GPU is much newer and AFAIK supports all the GPUGRID apps currently running). However, if the CPU is only being used to drive a CUDA app that does the actual heavy lifting, then ATMML's CPU requirements could be needlessly limiting which users can run these jobs. For the time being, I've opted out of ATMML work units in my preferences. | |
ID: 61830 | Rating: 0 | |
I found this related entry in the Linux kernel log, if it's helpful: `traps: python[12561] trap invalid opcode ip:73ddfba85876 sp:7ffc7f5a2a90 error:0 in libOpenMMPME.so[73ddfba85000+3d000]` | |
ID: 61831 | Rating: 0 | |
My hunch is that you're right about the CPU being the issue. It lacks SSE4.1, SSE4.2, and AVX/AVX2 and up. I wouldn't be surprised at all if SSE4 or AVX were used, since those features are ubiquitous in basically every x86_64 CPU from the last decade. | |
ID: 61832 | Rating: 0 | |
At least your 6 GB GTX 1060 can successfully crunch quantum chemistry tasks with the CPU you have. | |
ID: 61833 | Rating: 0 | |
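
For anyone wanting to confirm the same diagnosis on their own Linux host, one quick check is which of the extensions named in this thread the kernel reports in `/proc/cpuinfo`. This is only a minimal sketch, not anything GPUGRID provides: it assumes Linux, and the list of flags it checks (`sse4_1`, `sse4_2`, `avx`, `avx2`) simply mirrors the extensions discussed above, not a confirmed list of what ATMML actually requires.

```python
"""Report which SIMD extensions discussed in this thread the CPU advertises.

Linux-only sketch: it parses the "flags" line of /proc/cpuinfo. The WANTED
list is an assumption based on this thread, not ATMML's documented needs.
"""

# Flag names as the Linux kernel spells them in /proc/cpuinfo.
WANTED = ["sse4_1", "sse4_2", "avx", "avx2"]


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags from the first "flags" line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no "flags" line found


def missing_extensions(flags: set[str]) -> list[str]:
    """List the WANTED extensions that are absent from the flag set."""
    return [name for name in WANTED if name not in flags]


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            flags = cpu_flags(f.read())
    except FileNotFoundError:
        print("/proc/cpuinfo not found; this sketch only works on Linux")
    else:
        missing = missing_extensions(flags)
        if missing:
            print("Missing:", ", ".join(missing))
        else:
            print("All of", ", ".join(WANTED), "are supported")
```

Every logical CPU gets its own block in `/proc/cpuinfo`, but the flags are normally identical across cores, so reading only the first `flags` line is enough for this purpose.
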
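
The kernel log line posted above also says where the fault happened: the bracketed `[73ddfba85000+3d000]` is the start and size of the memory mapping that contained the faulting instruction pointer, so `ip` minus that start gives the offset of the illegal instruction within the `libOpenMMPME.so` mapping (0x73ddfba85876 − 0x73ddfba85000 = 0x876). The sketch below does that arithmetic; the regular expression assumes exactly the log format shown in this thread, and the function name `fault_offset` is hypothetical.

```python
"""Extract the faulting library and offset from a kernel "trap" log line.

Sketch assuming the format seen in this thread:
  ... ip:<hex> sp:<hex> error:<n> in <lib>[<start>+<size>]
"""
import re

TRAP_RE = re.compile(
    r"ip:(?P<ip>[0-9a-f]+) .*? in (?P<lib>\S+?)"
    r"\[(?P<start>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def fault_offset(line: str) -> tuple[str, int]:
    """Return (library, offset of ip from the start of its mapping)."""
    m = TRAP_RE.search(line)
    if m is None:
        raise ValueError("not a recognised trap line")
    ip = int(m["ip"], 16)
    start = int(m["start"], 16)
    size = int(m["size"], 16)
    # Sanity check: the reported mapping should actually contain ip.
    if not start <= ip < start + size:
        raise ValueError("ip lies outside the reported mapping")
    return m["lib"], ip - start
```

Note that the mapping start is usually the library's executable segment rather than the beginning of the file, so turning 0x876 into an address you can look up in a disassembler also requires that segment's file offset and virtual address, e.g. from the LOAD program headers printed by `readelf -l libOpenMMPME.so`.
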