Message boards : Graphics cards (GPUs) : Units failing
Author | Message |
---|---|
Can someone have a quick look and let me know the problem here? A few computed fine, but most errored out. | |
ID: 58929 | Rating: 0 | rate: / Reply Quote | |
"Can someone have a quick look and let me know the problem here, a few computed fine but most errored out." Likely failing because your GT 1030 doesn't have enough GPU memory. These Python tasks use a lot of VRAM, and the GT 1030 is probably too weak to run these kinds of tasks, unfortunately. And yes, it's normal to see that behavior with your RTX 3090: the app has intermittent GPU use. | |
ID: 58930 | Rating: 0 | rate: / Reply Quote | |
Thanks. Some other odd behaviour I see on the 3090 machine: it seems to start the WU at 2%, and if I pause BOINC and restart later, the unit's elapsed time resets to 0 and the percentage goes back to 2%? | |
ID: 58931 | Rating: 0 | rate: / Reply Quote | |
The program goes through several stages. The first and second 1% stages are unpacking files from an archive, and don't need to be repeated - progress will move to 2% instantly. | |
ID: 58932 | Rating: 0 | rate: / Reply Quote | |
I have the same problem. I have 109 units errored out (zero completed) between 2% and 4% completed. What is going on? | |
ID: 59412 | Rating: 0 | rate: / Reply Quote | |
A GT 1030 with 2047 MB of video RAM will be below the minimum specification to run these tasks. Sorry about that. | |
ID: 59413 | Rating: 0 | rate: / Reply Quote | |
There are so many GPUs out there with ONLY 2GB memory - it is inconceivable you are unable to harness this energy source. | |
ID: 59433 | Rating: 0 | rate: / Reply Quote | |
"There are so many GPUs out there with ONLY 2GB memory" Alas, at the moment this is true. If you are interested in helping projects in the field of medicine, then you should pay attention to Folding@home. While this project is outside the BOINC ecosystem, it is undoubtedly worthy of attention. Its hardware requirements are quite modest and there are always tasks to crunch. | |
ID: 59434 | Rating: 0 | rate: / Reply Quote | |
Just had a unit fail on my other machine, W11 with the following: | |
ID: 59616 | Rating: 0 | rate: / Reply Quote | |
Outcome Computation error | |
ID: 59617 | Rating: 0 | rate: / Reply Quote | |
This simply means the task failed. Some GPUgrid tasks will fail. It is somewhat inherent in the type of computation they are doing. Errors will occur with some jobs for other reasons. | |
ID: 59618 | Rating: 0 | rate: / Reply Quote | |
"Just had a unit fail on my other machine, W11 with the following:" Are you running these on your CPU? | |
ID: 59619 | Rating: 0 | rate: / Reply Quote | |
The Python on GPU tasks ALWAYS run on the CPU. | |
ID: 59620 | Rating: 0 | rate: / Reply Quote | |
"Just had a unit fail on my other machine, W11 with the following:" Not just on one CPU but on all your cores plus the GPU. You need to set your swap file to at least 50GB; it is memory hungry. On the message boards, click on News. Only the latest two or three threads concern Python, and everything is being discussed in those threads. The rest are ACEMD threads. Mikey, enjoy. | |
ID: 59621 | Rating: 0 | rate: / Reply Quote | |
Keith noted: "The Python on GPU tasks ALWAYS run on the cpu." I see that happening with the Windows version too. My last task completed in 14:45:35, but it shows 304,952.2 seconds (84.7 hrs), as well as the same CPU time. That is a bit confusing to me, but the fun is in the challenges. As for the 37 errors that preceded my 3 successful runs, they were caused by too small a page file, according to the STDERR output: [WinError 1455] The paging file is too small for this operation to complete. Error loading "C:\ProgramData\BOINC\slots\1\lib\site-packages\torch\lib\shm.dll" or one of its dependencies. That host only has a 256GB M.2 drive in it, and I had to free up 30GB of space so it could expand the virtual memory enough for the unpacked files to reside completely. My combined commit charge is usually around 53GB while running these Python apps. I let windoze create a second swap file on a SATA HDD in this host, and now I have ample (Windows-managed) 44GB swap files on both drives. I haven't noticed any drop in performance, but I haven't benchmarked it to know for sure. I would speculate that anyone running multiple GPUs will need tons of swap file space. I'm going to try to run these on other hosts, but I'm having problems joining those hosts to GPUgrid. ____________ "Together we crunch To check out a hunch And wish all our credit Could just buy us lunch" Piasa Tribe - Illini Nation | |
ID: 59622 | Rating: 0 | rate: / Reply Quote | |
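A minimal sketch of how the page-file failure described above could be spotted automatically in a saved stderr dump. The function name `find_pagefile_errors` is hypothetical, and the only signature it looks for is the `[WinError 1455]` text quoted in the post; other failure modes are not covered.

```python
import re

# Error signature copied from the stderr excerpt quoted in the post above.
PAGEFILE_ERROR = re.compile(r"\[WinError 1455\]")


def find_pagefile_errors(stderr_text):
    """Return every stderr line that carries the WinError 1455 signature."""
    return [
        line
        for line in stderr_text.splitlines()
        if PAGEFILE_ERROR.search(line)
    ]
```

If the returned list is non-empty, the page file (swap) was probably too small for that task, which matches the fix reported above: freeing disk space or letting Windows manage a larger page file.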
"Outcome Computation error" That only tells you that the WU failed and exited. To find out what actually caused the error, you need to scroll down to the stderr output section. Look for events immediately above the line which reads "called BOINC finish". Those are usually the fatal errors. From current experience, these WUs require a 4GB graphics card as a minimum, although I will try to run one on a GTX 1060 3GB if I can. From observation, they use about 2.8 GB of graphics memory. Be sure to give BOINC access to a large percentage of virtual memory, too. Python apps appear to me to run almost completely in virtual memory, as my 16GB of RAM is only half used. The CPU appears to use the GPU as a slave and claim the co-processing as its own CPU time. It appears to run a worker scenario where the GPU is intermittently called on to provide the math required for the scenario laid out by the wrapper program. "Define rollouts storage" (from the stderr output of a successful WU). Looks like machine learning research. Cool. Someone please tell me if I'm assuming something wrong. | |
ID: 59623 | Rating: 0 | rate: / Reply Quote | |
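The advice above (look just above the "called BOINC finish" line) can be sketched in a few lines of Python. This is only an illustration: `lines_before_finish` and its `context` parameter are hypothetical names, and the marker string is taken from the post as written.

```python
from typing import List


def lines_before_finish(
    stderr_text: str,
    context: int = 5,
    marker: str = "called BOINC finish",
) -> List[str]:
    """Return up to `context` lines that precede the first marker line.

    Returns an empty list when the marker does not appear at all.
    """
    lines = stderr_text.splitlines()
    for index, line in enumerate(lines):
        if marker in line:
            return lines[max(0, index - context):index]
    return []
```

Paste the stderr output of a failed task into a string and pass it in; the returned lines are the ones most likely to name the fatal error.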
You can use tail -F from mingw64 to read the wrapper_run.out file. | |
ID: 59625 | Rating: 0 | rate: / Reply Quote | |
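If mingw64 is not installed, a rough stand-in for reading the end of the file can be written with the Python standard library. This sketch only prints the last lines once; it does not follow the file the way `tail -F` does. The function name `tail_lines` is hypothetical, and the location of `wrapper_run.out` (presumably the task's BOINC slot directory) is an assumption you would need to check on your own host.

```python
from collections import deque
from typing import List


def tail_lines(path: str, n: int = 20) -> List[str]:
    """Return the last `n` lines of a text file, without trailing newlines."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        # A bounded deque keeps only the newest `n` lines while reading.
        return [line.rstrip("\n") for line in deque(handle, maxlen=n)]
```

Calling `tail_lines(path_to_wrapper_run_out, 30)` and printing the result gives a quick look at what the wrapper was doing most recently.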
https://www.gpugrid.net/forum_thread.php?id=5233 | |
ID: 59626 | Rating: 0 | rate: / Reply Quote | |
I've started to see ACEMD 3 tasks failing for me while Python GPU tasks run properly. | |
ID: 59641 | Rating: 0 | rate: / Reply Quote | |
Your driver supports CUDA 12, but the application is CUDA 11.3.1. | |
ID: 59642 | Rating: 0 | rate: / Reply Quote | |
"Just had a unit fail on my other machine, W11 with the following:" Thank you very much. I think I will try these after the New Year. | |
ID: 59644 | Rating: 0 | rate: / Reply Quote | |
"There are so many GPUs out there with ONLY 2GB memory - it is inconceivable you are unable to harness this energy source." There are so many projects out there which will run fine on the GeForce GT 1030. :) | |
ID: 59663 | Rating: 0 | rate: / Reply Quote | |
Thx for the reply. Does this mean I should downgrade my driver to one that supports CUDA 11 to make ACEMD 3 App compatible? Python App runs properly with the current driver so this is confusing me a bit. | |
ID: 59664 | Rating: 0 | rate: / Reply Quote | |
No, you don't need to do anything. CUDA is backwards compatible. | |
ID: 59665 | Rating: 0 | rate: / Reply Quote | |
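The rule of thumb in the reply above (a driver that supports a newer CUDA version can still run an app built for an older one) can be written as a simple version comparison. This is a simplified sketch: `parse_version` and `driver_can_run` are hypothetical helpers, and real compatibility has more nuances than comparing numbers.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '11.3.1' into (11, 3, 1), padding short versions with zeros."""
    parts = [int(piece) for piece in text.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def driver_can_run(driver_cuda: str, app_cuda: str) -> bool:
    """Assume an app runs when the driver's CUDA version is not older."""
    return parse_version(driver_cuda) >= parse_version(app_cuda)
```

With the versions from this thread, `driver_can_run("12", "11.3.1")` evaluates to `True`, so the driver does not need to be downgraded for that reason.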