Message boards : Number crunching : 2 GPUs - but only 1 ATMML downloads - why?

Erich56
Joined: 1 Jan 15
Posts: 1131
Credit: 9,960,232,676
RAC: 31,919,313
Message 61881 - Posted: 13 Oct 2024 | 7:42:34 UTC

In one of my boxes, which I bought 3 years ago, I've been crunching with 2 RTX 3070s. For the past few months it has mainly been running ATMML tasks from GPUGRID, 1 task on each GPU.
Since last week, though, only 1 task runs at a time and no second one is downloaded. Only when the running task finishes does the next one download. This behaviour is in complete contrast to how it has always worked before.
To make sure that the second GPU is not defective, I tried a few other GPU projects - they all downloaded 2 (or more) tasks and crunched 2 tasks in parallel (1 task per GPU, as usual).
Any idea what problem I'm having with GPUGRID?

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,572,812,024
RAC: 20,230,570
Message 61882 - Posted: 13 Oct 2024 | 9:21:42 UTC - in response to Message 61881.

First, you can check that you have set fraction_done_exact for the ATMML app on that host.
This is how I have it configured in the app_config.xml file:

<app_config>
  <app>
    <name>ATMML</name>
    <gpu_versions>
      <cpu_usage>0.99</cpu_usage>
      <gpu_usage>1</gpu_usage>
    </gpu_versions>
    <fraction_done_exact/>
  </app>
</app_config>

Once edited, you have to go to "Options" - "Read config files" in BOINC Manager for it to take effect.
This affects the way the estimated completion time is calculated for the running ATMML tasks.
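To illustrate the effect: as far as I understand the BOINC documentation, with fraction_done_exact set the client bases its remaining-time estimate solely on the fraction done reported by the app. Here is a minimal Python sketch of that idea. The function name and the sample numbers are hypothetical, and the real client logic may differ in detail.

```python
def remaining_time_exact(elapsed_s: float, fraction_done: float) -> float:
    """Estimate remaining seconds purely from the reported fraction done.

    Hypothetical helper for illustration only; this is not BOINC's code.
    """
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    # Projected total runtime if progress continues at the same rate.
    projected_total = elapsed_s / fraction_done
    return projected_total - elapsed_s


if __name__ == "__main__":
    # Made-up example: 2 hours elapsed at 25% done.
    hours_left = remaining_time_exact(2 * 3600, 0.25) / 3600
    print(f"estimated remaining: {hours_left:.1f} h")
```

An estimate that is far too long can make the client believe it already has enough work queued, which is why this setting is worth checking when fewer tasks than expected are fetched.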
I'm running Linux. On Windows, I think the path to this file is "C:\ProgramData\BOINC\projects\www.gpugrid.net\app_config.xml"
If it does not exist, you can create it with Notepad.
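If you are unsure whether a hand-edited file is well-formed, a quick check like the following may help. It is only a sketch: the helper name has_fraction_done_exact and the SAMPLE string are made up for illustration, and it checks the XML structure only, not whether BOINC actually accepted the file.

```python
import xml.etree.ElementTree as ET


def has_fraction_done_exact(xml_text: str, app_name: str) -> bool:
    """Return True if the <app> named app_name contains <fraction_done_exact/>.

    Hypothetical helper; raises xml.etree.ElementTree.ParseError if the
    text is not well-formed XML.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "app_config":
        return False
    for app in root.findall("app"):
        name = app.findtext("name", default="").strip()
        if name == app_name:
            return app.find("fraction_done_exact") is not None
    return False


# Same content as the configuration shown in this post.
SAMPLE = """<app_config>
  <app>
    <name>ATMML</name>
    <gpu_versions>
      <cpu_usage>0.99</cpu_usage>
      <gpu_usage>1</gpu_usage>
    </gpu_versions>
    <fraction_done_exact/>
  </app>
</app_config>"""

if __name__ == "__main__":
    print(has_fraction_done_exact(SAMPLE, "ATMML"))
```

To check a real file, read its text (for example with pathlib.Path(...).read_text()) and pass it in. The actual location depends on your BOINC data directory.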

makracz
Joined: 9 May 24
Posts: 3
Credit: 1,145,605,000
RAC: 38,825,997
Message 61883 - Posted: 13 Oct 2024 | 11:42:01 UTC
Last modified: 13 Oct 2024 | 11:45:33 UTC

I think the answer might be simpler. At the moment, there simply aren't enough Windows tasks for the number of hosts. The number of unsent (ATMML) tasks is 0 most of the time. New tasks trickle in one at a time rather than appearing in larger batches. So it's down to luck whether your computer hits the moment when a task is available for download. And they disappear very quickly, within seconds really.
The situation has improved over the weekend, but during the week my host barely managed to get one task per day.

Erich56
Message 61884 - Posted: 13 Oct 2024 | 11:55:27 UTC - in response to Message 61883.

I think the answer might be simpler. At the moment, there simply aren't enough Windows tasks for the number of hosts. The number of unsent (ATMML) tasks is 0 most of the time. New tasks trickle in one at a time rather than appearing in larger batches. So it's down to luck whether your computer hits the moment when a task is available for download. And they disappear very quickly, within seconds really.
The situation has improved over the weekend, but during the week my host barely managed to get one task per day.

I am aware that the number of unsent tasks has been zero for about 2 weeks. But as long as the number of "tasks in process" is about 500 or higher (currently 700+), it has always been the case that a new task could be downloaded within a few hours. That has been my experience since I joined GPUGRID.
Also, I can see that once a task has finished and uploaded, either on this host or on one of my other 3 hosts, it normally doesn't take longer than 1-2 hours until a new one comes in.
So I am sure the problem is a different one, unfortunately :-(

Erich56
Message 61885 - Posted: 13 Oct 2024 | 11:59:08 UTC - in response to Message 61882.

ServicEnginIC wrote:

First, you can check that you have set fraction_done_exact for the ATMML app on that host.
This is how I have it configured in the app_config.xml file:
...

Thanks for the hint.
I added an app_config.xml according to your suggestion. However, this did not solve the problem :-(

Erich56
Message 61886 - Posted: 13 Oct 2024 | 15:33:23 UTC - in response to Message 61885.

ServicEnginIC wrote:

First, you can check that you have set fraction_done_exact for the ATMML app on that host.
This is how I have it configured in the app_config.xml file:
...

Thanks for the hint.
I added an app_config.xml according to your suggestion. However, this did not solve the problem :-(

Shortly after I had added the above-mentioned app_config.xml, the one running task finished and uploaded, but no other task was downloaded for hours :-(
So I decided to remove the app_config.xml (maybe it does not work on Windows), and surprise: within a few minutes, not just 1 new task was downloaded, but 2 - one for each GPU.
So the problem seems to be solved, although it's still not clear how it got solved.

Profile ServicEnginIC
Message 61891 - Posted: 14 Oct 2024 | 10:29:00 UTC - in response to Message 61886.

The effect of app_config.xml will persist until you re-read the config files or restart BOINC (or the computer)...
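In other words, deleting or editing the file changes nothing until the client re-reads its configuration. Besides the BOINC Manager menu, the re-read can presumably be triggered from a script with boinccmd's --read_cc_config option, which to my knowledge also re-reads app_config.xml files. The sketch below assumes boinccmd is on the PATH and can authenticate to the local client (it may need to be run from the BOINC data directory or given a password). The function names are hypothetical.

```python
import shutil
import subprocess


def build_reread_command(boinccmd: str = "boinccmd") -> list:
    """Build the command line that asks the client to re-read its config."""
    return [boinccmd, "--read_cc_config"]


def reread_config() -> bool:
    """Run boinccmd if it can be found; return True on a zero exit code.

    Hypothetical helper: returns False when boinccmd is not on the PATH.
    """
    exe = shutil.which("boinccmd")
    if exe is None:
        return False
    result = subprocess.run(
        build_reread_command(exe), capture_output=True, text=True
    )
    return result.returncode == 0


if __name__ == "__main__":
    print("config re-read requested:", reread_config())
```

A zero exit code only means the request was sent. Check the client's event log to confirm that the configuration was actually re-read.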
