Message boards : Graphics cards (GPUs) : Linux to cuda2.2
Author | Message |
---|---|
Next week we will update the Linux application to CUDA 2.2, which should require driver version 185.xx. | |
ID: 12455 | Rating: 0 | |
Thanks for this info, GDF. | |
ID: 12457 | Rating: 0 | |
I thought you guys switched back in August to CUDA 2.2.... | |
ID: 12469 | Rating: 0 | |
Only for Windows. | |
ID: 12471 | Rating: 0 | |
I'm using the 190.18 beta driver and CUDA 2.3; works fine for me. | |
ID: 12474 | Rating: 0 | |
After running two applications in CUDA 2.2 on Linux, the main difference seems to be that the CUDA 2.2 application uses less CPU time. The overall speed of work unit completion seems about the same, as does the credit granted. I don't know if the new application is any more stable or has any other advantages. | |
ID: 12521 | Rating: 0 | |
I'm curious. The Linux version of the client has been bumped to require CUDA 2.2, but still downloads v2.1 of the libs? Is that to be expected?
    21-Sep-2009 15:18:43 [http://www.gpugrid.net/] Master file download succeeded
    21-Sep-2009 15:18:48 [http://www.gpugrid.net/] Sending scheduler request: Project initialization.
    21-Sep-2009 15:18:48 [http://www.gpugrid.net/] Requesting new tasks for CPU and GPU
    21-Sep-2009 15:18:53 [GPUGRID] Scheduler request completed: got 1 new tasks
    21-Sep-2009 15:18:55 [GPUGRID] Started download of acemd_6.66_x86_64-pc-linux-gnu__cuda
    21-Sep-2009 15:18:55 [GPUGRID] Started download of libcufft.so.2.1
    21-Sep-2009 15:18:58 [GPUGRID] Finished download of libcufft.so.2.1
    21-Sep-2009 15:18:58 [GPUGRID] Started download of libcudart.so.2.1
    21-Sep-2009 15:18:59 [GPUGRID] Finished download of libcudart.so.2.1 | |
ID: 12584 | Rating: 0 | |
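A quick way to see which CUDA runtime libraries a client actually fetched is to scan the BOINC messages for the "Finished download of lib….so.X.Y" lines, as in the log above. The following is only a minimal sketch in Python: the function name `downloaded_lib_versions` and the regular expression are illustrative assumptions based on the log format shown in this thread, not part of BOINC or GPUGRID.

```python
import re
from typing import Dict

# Assumed line shape, taken from the log in this thread:
#   "21-Sep-2009 15:18:58 [GPUGRID] Finished download of libcufft.so.2.1"
LIB_RE = re.compile(r"Finished download of (lib\w+)\.so\.(\d+(?:\.\d+)*)")


def downloaded_lib_versions(log_text: str) -> Dict[str, str]:
    """Map each downloaded shared library to the version suffix in its file name."""
    return {name: version for name, version in LIB_RE.findall(log_text)}


SAMPLE_LOG = """\
21-Sep-2009 15:18:55 [GPUGRID] Started download of libcufft.so.2.1
21-Sep-2009 15:18:58 [GPUGRID] Finished download of libcufft.so.2.1
21-Sep-2009 15:18:58 [GPUGRID] Started download of libcudart.so.2.1
21-Sep-2009 15:18:59 [GPUGRID] Finished download of libcudart.so.2.1
"""

# Both libraries in this excerpt carry the 2.1 suffix.
print(downloaded_lib_versions(SAMPLE_LOG))
```

If every entry reports 2.1 while the application is supposed to be a CUDA 2.2 build, the client is most likely still being sent the older application version.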
We have just uploaded the new Linux app version 2.2. | |
ID: 12588 | Rating: 0 | |
Erm.... You might want to re-think that... ;)
    21-Sep-2009 16:20:21 [GPUGRID] Scheduler request completed: got 2 new tasks
    21-Sep-2009 16:20:23 [GPUGRID] Started download of acemd_6.68_x86_64-pc-linux-gnu__cuda
    21-Sep-2009 16:20:23 [GPUGRID] Started download of libcufft.so.2.2
    21-Sep-2009 16:20:27 [GPUGRID] Finished download of libcufft.so.2.2
    21-Sep-2009 16:20:27 [GPUGRID] Started download of libcudart.so.2.2
    21-Sep-2009 16:20:28 [GPUGRID] Finished download of libcudart.so.2.2
    ...
    21-Sep-2009 16:20:47 [GPUGRID] Starting task p740000-IBUCH_2_pYEEI_2109-0-20-RND0599_0 using acemd version 668
    21-Sep-2009 16:21:01 [GPUGRID] Computation for task p740000-IBUCH_2_pYEEI_2109-0-20-RND0599_0 finished
    <core_client_version>6.10.6</core_client_version>
    <![CDATA[
    <message>
    process exited with code 1 (0x1, -255)
    </message>
    <stderr_txt>
    # Using CUDA device 0
    # Device 0: "GeForce 8800 GS"
    # Clock rate: 1.45 GHz
    # Total amount of global memory: 401932288 bytes
    # Number of multiprocessors: 12
    # Number of cores: 96
    # Driver version 2020
    # Runtime version 2020
    MDIO ERROR: cannot open file "restart.coor"
    Cuda error: Kernel [mshake_position] failed in file 'mshake.cu' in line 99 : out of memory.
    </stderr_txt>
    ]]> | |
ID: 12589 | Rating: 0 | |
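The "Driver version 2020" and "Runtime version 2020" lines in the stderr output above appear to use CUDA's integer version encoding, 1000 × major + 10 × minor, so 2020 would correspond to CUDA 2.2. Here is a small sketch of that decoding in Python. The helper name `decode_cuda_version` is hypothetical, and the assumption is that the application prints the values unchanged from the CUDA version query calls.

```python
from typing import Tuple


def decode_cuda_version(encoded: int) -> Tuple[int, int]:
    """Split a CUDA integer version (1000*major + 10*minor) into (major, minor)."""
    major, remainder = divmod(encoded, 1000)
    return major, remainder // 10


# Values as they appear in the stderr output quoted in this thread.
for label, value in (("driver", 2020), ("runtime", 2020)):
    major, minor = decode_cuda_version(value)
    print(f"{label}: CUDA {major}.{minor}")
```

Read this way, a driver value lower than the runtime value would point to a driver that is too old for the application, while a higher driver value (for example 2030 with a 2020 runtime) should be fine.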
Is this supposed to be an 'in-place' runtime upgrade, or do we need to detach and re-attach to the project to avoid issues? | |
ID: 12591 | Rating: 0 | |
You should not need to do anything. | |
ID: 12592 | Rating: 0 | |
"You should not need to do anything."
This is ASSUMING you can get the 185.xx or later driver to load, right? Four of my machines are crashing all GPUgrid WUs (using the 180.44 driver) and the others are just a WU or two away. I've been trying to get 185.xx built but so far no luck. The NVIDIA...pkg#2.run install goes OK, says it can't find a kernel and builds one. But after a reboot the first thing loaded is still 180.44 and then things go bad. Obviously I'm missing something... The machine is dual boot... Do I need to do something to update grub so it points to the 'rebuilt' kernel from the .run installation? Anybody have 185.xx or better installed on Ubuntu Jaunty (v9.04) 64b using the 2.6.28 kernel?
    Sep 22 17:15:48 c17-desktop avahi-daemon[3321]: Server startup complete. Host name is c17-desktop.local. Local service cookie is 2274955198.
    Sep 22 17:15:48 c17-desktop kernel: [ 17.607019] nvidia 0000:01:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
    Sep 22 17:15:48 c17-desktop kernel: [ 17.607026] nvidia 0000:01:00.0: setting latency timer to 64
    Sep 22 17:15:48 c17-desktop kernel: [ 17.607169] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 180.44 Tue Mar 24 05:46:32 PST 2009
    Sep 22 17:15:48 c17-desktop anacron[3448]: Normal exit (0 jobs run)
    Sep 22 17:15:48 c17-desktop /usr/sbin/cron[3492]: (CRON) INFO (pidfile fd = 3)
    Sep 22 17:15:48 c17-desktop /usr/sbin/cron[3493]: (CRON) STARTUP (fork ok)
    Sep 22 17:15:48 c17-desktop kernel: [ 17.695658] NVRM: API mismatch: the client has the version 185.18.36, but
    Sep 22 17:15:48 c17-desktop kernel: [ 17.695659] NVRM: this kernel module has the version 180.44. Please
    Sep 22 17:15:48 c17-desktop kernel: [ 17.695659] NVRM: make sure that this kernel module and all NVIDIA driver
    Sep 22 17:15:48 c17-desktop kernel: [ 17.695660] NVRM: components have the same version.
    Sep 22 17:15:48 c17-desktop /usr/sbin/cron[3493]: (CRON) INFO (Running @reboot jobs)
    Sep 22 17:15:49 c17-desktop kernel: [ 18.315167] NVRM: API mismatch: the client has the version 185.18.36, but
    Sep 22 17:15:49 c17-desktop kernel: [ 18.315168] NVRM: this kernel module has the version 180.44. Please
    Sep 22 17:15:49 c17-desktop kernel: [ 18.315168] NVRM: make sure that this kernel module and all NVIDIA driver
    Sep 22 17:15:49 c17-desktop kernel: [ 18.315169] NVRM: components have the same version.
    Sep 22 17:15:51 c17-desktop acpid: client 3182[0:0] has disconnected
    Sep 22 17:15:51 c17-desktop acpid: client connected from 3570[0:0]
    Sep 22 17:15:51 c17-desktop kernel: [ 20.786084] NVRM: API mismatch: the client has the version 185.18.36, but
    Sep 22 17:15:51 c17-desktop kernel: [ 20.786085] NVRM: this kernel module has the version 180.44. Please
    Sep 22 17:15:51 c17-desktop kernel: [ 20.786086] NVRM: make sure that this kernel module and all NVIDIA driver
    Sep 22 17:15:51 c17-desktop kernel: [ 20.786086] NVRM: components have the same version.
    Sep 22 17:15:51 c17-desktop ntpdate[3259]: adjust time server 206.212.242.132 offset -0.213516 sec
    Sep 22 17:15:51 c17-desktop ntpd[3609]: ntpd [email protected] Wed May 13 21:10:45 UTC 2009 (1)
    Sep 22 17:15:51 c17-desktop ntpd[3610]: precision = 1.000 usec
    Sep 22 17:15:51 c17-desktop ntpd[3610]: Listening on interface #0 wildcard, 0.0.0.0#123 Disabled
    Sep 22 17:15:51 c17-desktop ntpd[3610]: Listening on interface #1 wildcard, ::#123 Disabled
    Sep 22 17:15:51 c17-desktop ntpd[3610]: Listening on interface #2 lo, ::1#123 Enabled
    Sep 22 17:15:51 c17-desktop ntpd[3610]: Listening on interface #3 eth1, fe80::21f:d0ff:fed4:8037#123 Enabled
    Sep 22 17:15:51 c17-desktop ntpd[3610]: Listening on interface #4 lo, 127.0.0.1#123 Enabled
    Sep 22 17:15:51 c17-desktop ntpd[3610]: Listening on interface #5 eth1, 192.168.218.17#123 Enabled
    Sep 22 17:15:51 c17-desktop ntpd[3610]: kernel time sync status 0040
    Sep 22 17:15:51 c17-desktop ntpd[3610]: frequency initialized -27.498 PPM from /var/lib/ntp/ntp.drift
    Sep 22 17:15:53 c17-desktop kernel: [ 22.977509] eth1: no IPv6 routers present
    Sep 22 17:15:54 c17-desktop acpid: client 3570[0:0] has disconnected
    Sep 22 17:15:54 c17-desktop acpid: client connected from 3614[0:0]
    Sep 22 17:15:54 c17-desktop kernel: [ 23.879093] NVRM: API mismatch: the client has the version 185.18.36, but
    Sep 22 17:15:54 c17-desktop kernel: [ 23.879094] NVRM: this kernel module has the version 180.44. Please
    Sep 22 17:15:54 c17-desktop kernel: [ 23.879094] NVRM: make sure that this kernel module and all NVIDIA driver
    Sep 22 17:15:54 c17-desktop kernel: [ 23.879095] NVRM: components have the same version.
    Sep 22 17:15:54 c17-desktop gdm[3174]: CRITICAL: gdm_config_value_get_bool: assertion `value->type == GDM_CONFIG_VALUE_BOOL' failed
    Sep 22 17:15:57 c17-desktop acpid: client 3614[0:0] has disconnected
    Sep 22 17:15:57 c17-desktop acpid: client connected from 3670[0:0]
    Sep 22 17:16:03 c17-desktop kernel: [ 32.713652] mtrr: base(0xe5000000) is not aligned on a size(0xe00000) boundary
    Sep 22 17:16:57 c17-desktop console-kit-daemon[2947]: WARNING: Couldn't read /proc/2946/environ: Failed to open file '/proc/2946/environ': No such file or directory
    Sep 22 17:17:01 c17-desktop /USR/SBIN/CRON[3900]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
    Sep 22 17:17:33 c17-desktop init: tty4 main process (2547) killed by TERM signal
    Sep 22 17:17:33 c17-desktop init: tty5 main process (2548) killed by TERM signal
    Sep 22 17:17:33 c17-desktop init: tty1 main process (3557) killed by TERM signal
    Sep 22 17:17:33 c17-desktop init: tty2 main process (2551) killed by TERM signal
    Sep 22 17:17:33 c17-desktop init: tty3 main process (2553) killed by TERM signal
    Sep 22 17:17:33 c17-desktop init: tty6 main process (2555) killed by TERM signal
    Sep 22 17:17:33 c17-desktop console-kit-daemon[2947]: WARNING: Unable to activate console: No such device or address
    Sep 22 17:17:52 c17-desktop exiting on signal 15
____________
- da shu @ HeliOS, "A child's exposure to technology should never be predicated on an ability to afford it." | |
ID: 12636 | Rating: 0 | |
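The repeated "NVRM: API mismatch" entries in the syslog above say that the user-space driver components are at 185.18.36 while the kernel module being loaded is still 180.44, which suggests the old module was never replaced. The sketch below pulls both version numbers out of such a log in Python. The function name `find_api_mismatch` and the regular expressions are assumptions built around the exact wording in this excerpt; other driver releases may phrase the message differently.

```python
import re
from typing import Optional, Tuple

# Patterns follow the wording of the NVRM messages quoted in this thread.
CLIENT_RE = re.compile(r"NVRM: API mismatch: the client has the version (\d+(?:\.\d+)*)")
MODULE_RE = re.compile(r"NVRM: this kernel module has the version (\d+(?:\.\d+)*)")


def find_api_mismatch(log_text: str) -> Optional[Tuple[str, str]]:
    """Return (client_version, kernel_module_version) from the first mismatch, or None."""
    client = CLIENT_RE.search(log_text)
    module = MODULE_RE.search(log_text)
    if client and module:
        return client.group(1), module.group(1)
    return None


SAMPLE_SYSLOG = """\
Sep 22 17:15:48 c17-desktop kernel: [ 17.695658] NVRM: API mismatch: the client has the version 185.18.36, but
Sep 22 17:15:48 c17-desktop kernel: [ 17.695659] NVRM: this kernel module has the version 180.44. Please
Sep 22 17:15:48 c17-desktop kernel: [ 17.695659] NVRM: make sure that this kernel module and all NVIDIA driver
Sep 22 17:15:48 c17-desktop kernel: [ 17.695660] NVRM: components have the same version.
"""

mismatch = find_api_mismatch(SAMPLE_SYSLOG)
if mismatch:
    client_version, module_version = mismatch
    print(f"user-space libraries: {client_version}, kernel module: {module_version}")
```

Running something like this against the system log after a driver install gives a quick yes/no answer on whether the old kernel module is still the one being loaded at boot.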
For me, the Linux 2.2 applications are not only running well on my GTX 260 but they are running faster and producing more credit per day. For my system, this has been a real bonus. My system is Ubuntu 9.04 64bit running the Nvidia 190.32 drivers. | |
ID: 12648 | Rating: 0 | |
"For me, the Linux 2.2 applications are not only running well on my GTX 260 but they are running faster and producing more credit per day. For my system, this has been a real bonus. My system is Ubuntu 9.04 64bit running the Nvidia 190.32 drivers."
Turns out the problem was that after "disabling" the 180.44 I needed to reboot back to the desktop w/o that driver BEFORE I started the terminal install of the Nvidia drivers. This pretty much got me thru your step #3. 190.32 is now installed and at least I got my desktop back. I un-suspended the not-yet-started 6.68 WUs and the 1st one started up. Just checked back on it and the first got errors about 30 minutes into it. A 669-GIANNI... is running now. We'll see how it does. I've got a couple of other machines with GTS-250s in them I need to go do. I think I'll try the 185.xx in the next one.
    <core_client_version>6.4.5</core_client_version>
    <![CDATA[
    <message>
    process exited with code 1 (0x1, -255)
    </message>
    <stderr_txt>
    # Using CUDA device 0
    # Device 0: "GeForce GTX 260"
    # Clock rate: 1.24 GHz
    # Total amount of global memory: 938803200 bytes
    # Number of multiprocessors: 27
    # Number of cores: 216
    # Driver version 2030
    # Runtime version 2020
    MDIO ERROR: cannot open file "restart.coor"
    Cuda error: Kernel [PmeRealSpace_compute_forces] failed in file 'PmeRealSpace.cu' in line 172 : the launch timed out and was terminated.
    </stderr_txt>
    ]]>
____________
- da shu @ HeliOS, "A child's exposure to technology should never be predicated on an ability to afford it." | |
ID: 12652 | Rating: 0 | |
If I were you, I would skip over the 185 drivers in the other machines and go straight to the 190.32 drivers. There seems to be no downside in using them and they will handle cuda 2.3 when the project gradually moves to that application. | |
ID: 12655 | Rating: 0 | |
"If I were you, I would skip over the 185 drivers in the other machines and go straight to the 190.32 drivers. There seems to be no downside in using them and they will handle cuda 2.3 when the project gradually moves to that application."
One thing I noticed on 190.32... nvclock -s reports the 'current' GPU clock at the 2D setting. This is on the GTX-260. I don't think it's really running at 300MHz as it seems to be clicking along too fast for that, but not sure. On the 2nd machine (GTS-250 with 185.18.36) nvclock -s reports the proper stuff... not sure where the discrepancy is yet. But now I can't get work on this machine :-(
    Wed 23 Sep 2009 12:28:22 AM CDT|GPUGRID|Sending scheduler request: Requested by user. Requesting 259203 seconds of work, reporting 0 completed tasks
    Wed 23 Sep 2009 12:28:27 AM CDT|GPUGRID|Scheduler request completed: got 0 new tasks
    Wed 23 Sep 2009 12:28:27 AM CDT|GPUGRID|Message from server: No work sent
    Wed 23 Sep 2009 12:28:27 AM CDT|GPUGRID|Message from server: Full-atom molecular dynamics on Cell processor is not available for your type of computer.
Any thoughts on how to make it get work, other than coming up off of BOINC 6.4.5? I've already detached and re-attached.
____________
- da shu @ HeliOS, "A child's exposure to technology should never be predicated on an ability to afford it." | |
ID: 12662 | Rating: 0 | |
Which version of nvclock are you running? Version 0.8 beta 4 should report correctly. You may also like to use nvidia-settings to look at temperature and overclocking. This is available in the Ubuntu repositories. nvclock-gtk is also available and should give you a more graphical look at your settings. | |
ID: 12676 | Rating: 0 | |
"Which version of nvclock are you running? Version 0.8 beta 4 should report correctly."
That's what I'm using, from the Ubuntu repositories. I have "nvclock -n 650 -m 1050 -f" embedded in the start function of the /etc/init.d/boinc-client script. This seems to work fine, as nvclock -s reports the "Coolbits" 2D and 3D settings as 650 and 1050. However, "Current" is right now showing 324 & 1053. This occurs ONLY on my main desktop with the GTX-260. Some of the other machines are also Ubuntu Jaunty 64b with the same boinc version and same 190.32 driver but with GTS-250 or 8800GT or 9600GSO cards. There is also a machine with the 185.18 driver and a 9800GT. And two machines with GTS-250s & boinc 6.4.5 from getdeb. None of the other machines show this anomaly. For now I'm assuming it's really running at 650MHz while crunching the WU. It's this WU that's running now. I think this is the last app version 6.68 WU I have. What sort of reported run time in seconds should I expect, assuming the card was running at the default 3D clocks? Although I'm not sure I trust that number, because the last valid WU I can see that was done by two machines says the other guy's GTS-250 took 1/16th the time my GTX-260 did!?? But my GTX-260 has a higher average (FreeDC) and RAC than any of my GTS-250s (by about 50%, as it should). Maybe I'll just reinstall nvclock and see if that does anything.
"You may also like to use nvidia-settings to look at temperature and overclocking. This is available in the Ubuntu repositories."
Installed and using it also.
"nvclock-gtk is also available and should give you a more graphical look at your settings."
Have never figured out how to invoke it; have it installed. Any tips? I guess I expected it to show up in a menu. It's been a while.
"The 3D clock speeds are higher and this is what your card is using."
Sorry, but to this one I gotta say "no duh". ;-)
"As for getting work, check the following"
Local Prefs do not have "use GPU" checked but they have been cleared.
I changed from using CPDN to GPUgrid for my preferences so I could make sure "use GPU" was on. It is, and the machine (C17) with the funny nvclock is crunching the WU at what seems a normal rate.
"2) You may prefer to use the latest version of BOINC manager 6.10.4 which sometimes does a better job of requesting work."
Also upgraded boinc to 6.10.6 today on a few machines. Did the 190.32 driver last night.
"3) Have you had any work units fail and with repeated requests are you over 9 work units requested and failed in the last 24 hours on the total number of machines on your account? If so, you may have to wait a day for more work to be sent. If this is the case, the problem will fix itself."
It appears this was what caused the other two machines to not get work last night. They both got work sometime earlier today.
Anyway, thanks for your continued support.
____________
- da shu @ HeliOS, "A child's exposure to technology should never be predicated on an ability to afford it." | |
ID: 12702 | Rating: 0 | |
BTW, nvclock-gtk that I'd tried some months ago... if you type it right... nvclock_gtk in a terminal... amazing the difference... LOL. | |
ID: 12707 | Rating: 0 | |
There can be a lot of difference from machine to machine in work unit times. For comparison, here are some typical run times on recent work units on my GTX 260 with Ubuntu 9.04 64bit. | |
ID: 12716 | Rating: 0 | |