Message boards : Graphics cards (GPUs) : No GPUGRID jobs for over a week
Author | Message |
---|---|
I haven't received any GPUGRID jobs for the last ~12 days, though other GPU BOINC projects run jobs and there seem to be enough jobs in the queue: Tue 18 Jul 2017 05:45:25 PM EDT | GPUGRID | update requested by user Local cc_config and client version calculus:~ # cat /var/lib/boinc/cc_config.xml Another odd thing is that my project stats show one and only one task (from Jan 2016) and nothing credited since, which isn't correct. http://www.gpugrid.net/results.php?userid=63993 Any hints would be appreciated. | |
ID: 47632 | Rating: 0 | |
Exit BOINC Manager, choosing to stop the running science applications as well, and then update your NVIDIA driver. | |
ID: 47633 | Rating: 0 | |
The installed driver is latest Nvidia long lived (375.66) calculus:~ # nvidia-smi | |
ID: 47634 | Rating: 0 | |
Updated the proprietary Nvidia driver to the latest short lived branch (381.22) but still not getting jobs. calculus:/home/paracelsus # nvidia-smi Are there other appropriate cc_config.xml debug flags that can be set to help determine why? I'm not seeing the reason in the log so far. Requests for new tasks always result in the following, even when >2000 tasks are available: Wed 19 Jul 2017 07:25:33 AM EDT | GPUGRID | update requested by user Thanks for any tips, I'd like to keep contributing to the project. | |
ID: 47635 | Rating: 0 | |
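On the question of debug flags: a minimal sketch, in Python, of generating a cc_config.xml with logging flags switched on. The flag names `work_fetch_debug`, `sched_op_debug` and `http_debug` are assumed here to correspond to the `[work_fetch]`, `[sched_op]` and `[http]` tags that appear in the logs later in this thread; only `work_fetch_debug` is named explicitly by a poster, so check the others against the BOINC client configuration documentation before relying on them. The helper name `build_cc_config` is made up for illustration.

```python
import xml.etree.ElementTree as ET


def build_cc_config(flags):
    """Return cc_config.xml text with each named log flag set to 1.

    ASSUMPTION: the flags live under <cc_config><log_flags>, and the
    names passed in are valid BOINC log flags.
    """
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for name in flags:
        ET.SubElement(log_flags, name).text = "1"
    # encoding="unicode" returns a str instead of bytes
    return ET.tostring(root, encoding="unicode")


# Flag names assumed from the log tags seen in this thread.
config_xml = build_cc_config(["work_fetch_debug", "sched_op_debug", "http_debug"])
print(config_xml)
```

The output would be written to the cc_config.xml path shown in the first post, after which the client has to re-read its config files or be restarted for the flags to take effect.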
Same problem I had....kept deferring communication..no tasks even though available. | |
ID: 47637 | Rating: 0 | |
I've enable other debug flags but am unable to determine why no work units received since early July. Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [work_fetch] set_request() for CPU: ninst 5 nused_total 0.00 nidle_now 5.00 fetch share 1.00 req_inst 5.00 req_secs 216900.00 Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [work_fetch] set_request() for NVIDIA GPU: ninst 1 nused_total 0.00 nidle_now 1.00 fetch share 1.00 req_inst 1.00 req_secs 43380.00 Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [sched_op] Starting scheduler request Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [work_fetch] request: CPU (216900.00 sec, 5.00 inst) NVIDIA GPU (43380.00 sec, 1.00 inst) Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | Sending scheduler request: To fetch work. Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | Requesting new tasks for CPU and NVIDIA GPU Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [sched_op] CPU work request: 216900.00 seconds; 5.00 devices Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [sched_op] NVIDIA GPU work request: 43380.00 seconds; 1.00 devices Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [http] HTTP_OP::init_post(): http://www.ps3grid.net/PS3GRID_cgi/cgi Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [http] [ID#1] Info: Connection 4 seems to be dead! Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [http] [ID#1] Info: Closing connection 4 Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [http] [ID#1] Info: Connection 5 seems to be dead! Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [http] [ID#1] Info: Closing connection 5 Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [http] [ID#1] Info: Connection 7 seems to be dead! Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [http] [ID#1] Info: Closing connection 7 Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [http] [ID#1] Info: Connection 8 seems to be dead! 
Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [http] [ID#1] Info: Closing connection 8 Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [http] [ID#1] Info: Hostname www.ps3grid.net was found in DNS cache Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [http] [ID#1] Info: Trying 84.89.134.145... Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [http] [ID#1] Info: TCP_NODELAY set Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [http] [ID#1] Info: Connected to www.ps3grid.net (84.89.134.145) port 80 (#9) Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [http] [ID#1] Sent header to server: POST /PS3GRID_cgi/cgi HTTP/1.1 Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [http] [ID#1] Sent header to server: Host: www.ps3grid.net Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [http] [ID#1] Sent header to server: User-Agent: BOINC client (x86_64-pc-linux-gnu 7.6.33) Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [http] [ID#1] Sent header to server: Accept: */* Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [http] [ID#1] Sent header to server: Accept-Encoding: deflate, gzip Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [http] [ID#1] Sent header to server: Content-Type: application/x-www-form-urlencoded Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [http] [ID#1] Sent header to server: Accept-Language: en_US Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [http] [ID#1] Sent header to server: Content-Length: 10407 Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [http] [ID#1] Sent header to server: Expect: 100-continue Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [http] [ID#1] Sent header to server: Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [http] [ID#1] Sent header to server: urlencoded Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [http] [ID#1] Sent header to server: Accept-Language: en_US Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [http] [ID#1] Sent header to server: a Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: HTTP/1.1 100 Continue Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] 
[ID#1] Info: We are completely uploaded and fine Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: HTTP/1.1 200 OK Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: Date: Sat, 22 Jul 2017 23:07:32 GMT Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: Server: Apache/2.4.6 (CentOS) OpenSSL/1.0.1e-fips mod_auth_gssapi/1.3.1 mod_auth_kerb/5.4 mod_fcgid/2.3.9 PHP/5.4.16 mod_wsgi/3.4 Python/2.7.5 Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: Transfer-Encoding: chunked Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: Content-Type: text/xml Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: fe8 Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <scheduler_reply> Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <scheduler_version>613</scheduler_version> Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <master_url>http://www.gpugrid.net/</master_url> Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <request_delay>31.000000</request_delay> Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <message priority="low">No tasks sent</message> Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <project_name>GPUGRID</project_name> Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <next_rpc_delay>3600.000000</next_rpc_delay> Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <userid>63993</userid> Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: 
<user_name>Paracelsus</user_name> Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <user_total_credit>21424870.080154</user_total_credit> Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <user_expavg_credit>7855.265631</user_expavg_credit> Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <user_create_time>1281536516</user_create_time> Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <email_hash>5aab033e6a675cbde84a2d225a74a6a8</email_hash> Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <cross_project_id>9145123c6a8f6eb97a39746d118e87f2</cross_project_id> Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <host_total_credit>20486475.000000</host_total_credit> Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <host_expavg_credit>7833.474464</host_expavg_credit> Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <host_venue></host_venue> Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <host_create_time>1444494370</host_create_time> Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <team_name></team_name> Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <no_cpu_apps>0</no_cpu_apps> Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <no_cuda_apps>0</no_cuda_apps> Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <no_ati_apps>1</no_ati_apps> Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <gui_urls> Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <gui_url> Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] 
Received header from server: <name>Your account</name> Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <description>View your account information and credit totals</description> Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <url>http://www.gpugrid.net/show_user.php?userid=63993</url> Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: </gui_url> Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <gui_url> Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <name>Your results</name> Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <description>Your recently completed tasks</description> Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <url>http://www.gpugrid.net/results.php?userid=63993</url> Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: </gui_url> Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <gui_url> Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <name>Server state</name> Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <description>Status of GPUGRID's server</description> Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <url>http://www.gpugrid.net/server_status.php</url> Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: </gui_url> Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <gui_url> Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <name>Science</name> Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] 
Received header from server: <description>Small contributions, great causes.</description> Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <url>http://www.gpugrid.net/science.php</url> Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: </gui_url> Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <gui_url> Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <name>Donate</name> Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <description>Thank you for considering a donation to GPUGRID</description> Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <url>http://www.gpugrid.net/gpugrid_donations.php</url> Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: </gui_url> Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <gui_url> Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <name>Forum / Help</name> Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <description>Questions, support and discussions</description> Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Info: Connection #9 to host www.ps3grid.net left intact Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | Scheduler request completed: got 0 new tasks Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [sched_op] Server version 613 Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | No tasks sent Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | Project requested delay of 31 seconds Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [work_fetch] backing off CPU 870 sec Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [work_fetch] backing off NVIDIA GPU 301 sec Sat 22 Jul 2017 07:07:32 PM EDT | 
GPUGRID | [sched_op] Deferring communication for 00:00:31 Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [sched_op] Reason: requested by project Sat 22 Jul 2017 07:07:32 PM EDT | | [work_fetch] Request work fetch: RPC complete Sat 22 Jul 2017 07:07:37 PM EDT | | [work_fetch] ------- start work fetch state ------- Sat 22 Jul 2017 07:07:37 PM EDT | | [work_fetch] target work buffer: 180.00 + 43200.00 sec Sat 22 Jul 2017 07:07:37 PM EDT | | [work_fetch] --- project states --- Sat 22 Jul 2017 07:07:37 PM EDT | climateprediction.net | [work_fetch] REC 744.244 prio -0.768 can't request work: suspended via Manager Sat 22 Jul 2017 07:07:37 PM EDT | GPUGRID | [work_fetch] REC 5838.797 prio 0.000 can't request work: scheduler RPC backoff (25.92 sec) | |
ID: 47647 | Rating: 0 | |
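The log above is long, but the part that matters is the scheduler reply. A rough Python sketch for pulling the reply fields out of such a log, assuming every entry has the `timestamp | project | message` shape seen above and that the reply elements arrive one per "Received header from server:" entry. The function names `split_entry` and `reply_fields` are hypothetical, and the sample entries are copied from the post.

```python
import re

# Matches <tag>value</tag>, also when the opening tag carries attributes.
TAG = re.compile(r"<(\w+)(?:\s[^>]*)?>([^<]*)</\1>")


def split_entry(line):
    """Split one log entry into (timestamp, project, message)."""
    timestamp, project, message = (part.strip() for part in line.split("|", 2))
    return timestamp, project, message


def reply_fields(lines):
    """Collect simple <tag>value</tag> pairs from the scheduler reply."""
    fields = {}
    for line in lines:
        _, _, message = split_entry(line)
        if "Received header from server:" in message:
            for name, value in TAG.findall(message):
                fields[name] = value
    return fields


sample = [
    "Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <request_delay>31.000000</request_delay>",
    'Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <message priority="low">No tasks sent</message>',
    "Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <no_cuda_apps>0</no_cuda_apps>",
    "Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | Scheduler request completed: got 0 new tasks",
]

fields = reply_fields(sample)
print(fields)
```

Run over the full log, this reduces the reply to a handful of values: the server answered with HTTP 200, a 31 second request delay, and "No tasks sent", without stating a reason.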
Have you tried the 384.47 Linux driver? | |
ID: 47648 | Rating: 0 | |
Hi Jacob, | |
ID: 47649 | Rating: 0 | |
It seems to be an issue with the server software. Might be impossible to troubleshoot further, without a GPUGRID dev/admin to look into it. Sorry. | |
ID: 47650 | Rating: 0 | |
Perhaps the server has put your host on a permanent blacklist. To fix this, you should try to force the BOINC manager to request a new host ID for your host. You can do it by stopping the BOINC manager, editing client_state.xml, searching for <hostid>260678</hostid>, replacing the number with the ID of a previous host of yours (or a random number, if you don't have an older host), saving client_state.xml, and restarting the BOINC manager. It may not work the first time, so you might have to try this a couple of times. | |
ID: 47653 | Rating: 0 | |
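The host ID edit described above can be sketched in Python. This is only an illustration of the text substitution, operating on a string rather than on the real file; the function name `replace_hostid` and the example new ID are made up. As the post says, the BOINC client has to be fully stopped first, and keeping a backup copy of client_state.xml before touching it is a sensible precaution.

```python
def replace_hostid(state_xml, old_id, new_id):
    """Swap <hostid>old_id</hostid> for <hostid>new_id</hostid>.

    Raises ValueError if the old element is not present, so a typo in
    the ID does not silently leave the file unchanged.
    """
    old_tag = f"<hostid>{old_id}</hostid>"
    if old_tag not in state_xml:
        raise ValueError(f"{old_tag} not found")
    return state_xml.replace(old_tag, f"<hostid>{new_id}</hostid>")


# Tiny stand-in for the relevant fragment of client_state.xml.
before = "<project><hostid>260678</hostid></project>"
after = replace_hostid(before, 260678, 12345)  # 12345 is a placeholder ID
print(after)
```

Matching on the full old element, rather than on any `<hostid>`, is deliberate: it limits the change to the entry carrying that ID and leaves other entries in the file alone.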
Thanks for the suggestion on changing hostid in client_state.xml - and for the tip on stopping boinc manager (and verifying with ps) as otherwise the value reverts. Sun 23 Jul 2017 11:47:59 AM EDT | GPUGRID | [prio] recent est credit: 0.00G in 60.23 sec, 5564.327652 + -0.268874 ->5564.058778 Sun 23 Jul 2017 11:48:15 AM EDT | GPUGRID | [prio] -1.000000 rsf 1.000000 rt 5564.058778 rs 5564.058778 Sun 23 Jul 2017 11:48:15 AM EDT | | [work_fetch] ------- start work fetch state ------- Sun 23 Jul 2017 11:48:15 AM EDT | | [work_fetch] target work buffer: 17280.00 + 25920.00 sec Sun 23 Jul 2017 11:48:15 AM EDT | | [work_fetch] --- project states --- Sun 23 Jul 2017 11:48:15 AM EDT | GPUGRID | [work_fetch] REC 5564.059 prio -1.000 can request work Sun 23 Jul 2017 11:48:15 AM EDT | | [work_fetch] --- state for CPU --- Sun 23 Jul 2017 11:48:15 AM EDT | | [work_fetch] shortfall 345600.00 nidle 8.00 saturated 0.00 busy 0.00 Sun 23 Jul 2017 11:48:15 AM EDT | GPUGRID | [work_fetch] share 0.000 project is backed off (resource backoff: 114.76, inc 600.00) Sun 23 Jul 2017 11:48:15 AM EDT | | [work_fetch] --- state for NVIDIA GPU --- Sun 23 Jul 2017 11:48:15 AM EDT | | [work_fetch] shortfall 43200.00 nidle 1.00 saturated 0.00 busy 0.00 Sun 23 Jul 2017 11:48:15 AM EDT | GPUGRID | [work_fetch] share 0.000 project is backed off (resource backoff: 261.47, inc 1200.00) Sun 23 Jul 2017 11:48:15 AM EDT | | [work_fetch] ------- end work fetch state ------- Sun 23 Jul 2017 11:48:15 AM EDT | | [work_fetch] No project chosen for work fetch | |
ID: 47668 | Rating: 0 | |
If you're asking how to read the work_fetch_debug, here goes. Pay attention. Sun 23 Jul 2017 11:48:15 AM EDT | | [work_fetch] ------- start work fetch state ------- A "work fetch iteration" has happened. This usually happens every few seconds, but can also happen when the user has changed something like Suspend/Resume, No-New-Work, Update-click, etc. Sun 23 Jul 2017 11:48:15 AM EDT | | [work_fetch] target work buffer: 17280.00 + 25920.00 sec Your current buffer settings are: Maintain at least 17280 seconds (0.2 days) of work for all resources, and when asking for work optionally ask for an additional 25920 seconds (0.3 days). Sun 23 Jul 2017 11:48:15 AM EDT | | [work_fetch] --- project states --- - "can request work" means that you are not actively setting suspend or no-new-tasks. - In this case, there is not a "project backoff" (which a project could request after you contact it). Sun 23 Jul 2017 11:48:15 AM EDT | | [work_fetch] --- state for CPU --- - "shortfall" is 345600 seconds. This is (17280 + 25920) * 8 CPUs. Basically, none of your CPUs has any work. In fact, "nidle" (number idle) is 8, meaning all 8 CPU resources are currently idle. WE NEED CPU WORK! Sun 23 Jul 2017 11:48:15 AM EDT | | [work_fetch] --- state for NVIDIA GPU --- - "shortfall" is 43200 seconds. This is (17280 + 25920) * 1 NVIDIA GPU. "nidle" is 1. WE NEED GPU WORK! - GPUGRID says "project is backed off (resource backoff: 261.47, inc 1200.00)" ... This is a RESOURCE BACKOFF. It means that, since you didn't get work for this resource type (NVIDIA GPU) last time you asked, your BOINC Client backs off (stops asking) this project for work for this resource type... for a time interval (261.47 seconds remaining) that can exponentially increment (1200 on next increment) up to 24 hours. - Note: I believe clicking "Update" will clear any project backoffs or resource backoffs. Sun 23 Jul 2017 11:48:15 AM EDT | | [work_fetch] ------- end work fetch state ------- No "request for work" for you. 
:) work_fetch_debug correctly decided: Do not ask GPUGrid for work. Sorry this doesn't help solve your problem. But now you know a bit about reading work_fetch_debug. | |
ID: 47670 | Rating: 0 | |
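The shortfall figures in that walkthrough are the sum of the two buffer settings multiplied by the number of idle instances. A short Python sketch of that arithmetic, using the numbers from the log. The `next_increment` function is an assumption: the post only says the backoff grows exponentially up to 24 hours, so plain doubling with a 24 hour cap is used here as a stand-in and is not taken from the BOINC source.

```python
SECONDS_PER_DAY = 86400


def shortfall(min_buffer_sec, extra_buffer_sec, idle_instances):
    """Seconds of work missing when every instance of a resource is idle."""
    return (min_buffer_sec + extra_buffer_sec) * idle_instances


def next_increment(current_inc_sec, cap_sec=SECONDS_PER_DAY):
    """ASSUMPTION: the backoff increment doubles each time, capped at 24 h."""
    return min(current_inc_sec * 2, cap_sec)


cpu_shortfall = shortfall(17280, 25920, 8)  # 8 idle CPUs
gpu_shortfall = shortfall(17280, 25920, 1)  # 1 idle NVIDIA GPU
min_buffer_days = 17280 / SECONDS_PER_DAY
extra_buffer_days = 25920 / SECONDS_PER_DAY

print(cpu_shortfall, gpu_shortfall)
print(min_buffer_days, extra_buffer_days)
```

The two shortfall values come out at 345600 and 43200 seconds, matching the "shortfall" entries in the log, and the buffer settings convert to 0.2 and 0.3 days.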
Thanks Jacob! | |
ID: 47671 | Rating: 0 | |
Clicking "Update", while project GPUGrid is highlighted, should clear its backoffs. | |
ID: 47672 | Rating: 0 | |
I have also been having trouble getting tasks for my Win7 machine -- keep getting "no tasks available" when server status shows that plenty are available. This has been going on for several weeks. Now and then I will actually get a task -- but usually not. | |
ID: 47678 | Rating: 0 | |
So long GPUGrid... | |
ID: 47687 | Rating: 0 | |
Thanks as well for crunching for us. Sorry we can't help you right now but we are aware of the problems with the current implementation. Hopefully in the near future we will find time to improve on it. | |
ID: 47755 | Rating: 0 | |
I just wish F@H were in the BOINC environment. | |
ID: 47764 | Rating: 0 | |
Hi, | |
ID: 47772 | Rating: 0 | |