Message boards : Wish list : SWAN_SYNC in Linux client
Author | Message |
---|---|
I'm convinced that the lack of SWAN_SYNC in the Linux client is hindering this project. Here are the two hosts I compared:

| | #1 | #2 |
|---|---|---|
| CPU | i7-4790K @ 4.4GHz | i3-4160 @ 3.6GHz |
| M/B | Gigabyte GA-Z87X-OC | ASUS B85M-G |
| RAM | 2x4GB DDR3 1866MHz | 2x4GB DDR3 1333MHz |
| GPU | Gigabyte GTX 980Ti WF3OC | Zotac GTX 980Ti AMP |
| CLK | 1315MHz, driver v384.13 | 1354MHz, driver v359.06 |
| OS | Linux (Ubuntu 14.04 LTS) | Windows XP x64, SWAN_SYNC ON |

No CPU tasks were running on these hosts during the test. Task lists: #1: Linux and #2: WinXPx64 with SWAN_SYNC ON. Note that until 29 July the Linux client had a GTX 1080Ti, which was about as fast as a GTX 980Ti in my Windows XP x64 host. Here are the test results so far (run times in seconds):
| Workunit batch | Credits | #atoms | Run time #1 (s) | Run time #2 (s) |
|---|---|---|---|---|
| ADRIA_FOLDT1019s2_v2_predicted_pred_ss_contacts_50_T1019s2 | 63,300 | 11,269 | 10,117 | 8,236 |
| ADRIA_FOLDT1015_v2_predicted_pred_ss_contacts_50_T1015s1 | 63,300 | 11,184 | 10,200 | 8,210 |
| ADRIA_FOLDUCB_NTL9_50_crystal_crystal_ucb_50_ntl9 | 63,750 | 11,340 | 11,000 | 8,410 |
| PABLO_2IDP_P10275_1_ALAP23W_IDP | 110,400 | 24,609 | 12,747 | ----- |
| PABLO_2IDP_P01106_1_ALAP26W_IDP | 110,400 | 24,591 | ----- | 9,441 |
| PABLO_2IDP_P01106_1_ASNP3P_IDP | 110,400 | 24,469 | 12,692 | ----- |
| PABLO_2IDP_P10275_1_ARGP27P_IDP | 110,400 | 24,577 | ----- | 9,426 |
| PABLO_2IDP_P10275_1_ALAP23Y_IDP | 110,400 | 24,639 | 11,970 | ----- |
| PABLO_2IDP_P01106_2_ARGP7P_IDP | 110,400 | 24,593 | ----- | 9,433 |
| PABLO_2IDP_P01106_1_ALAP26P_IDP | 110,400 | 24,596 | 12,008 | ----- |
| PABLO_2IDP_P61244_0_ARGP38P_IDP | 110,400 | 24,481 | ----- | 9,457 |
The Windows XP x64 client with SWAN_SYNC ON is 19-31% faster than the Linux client, which is too much to ignore. (I've normalized the percentage by the GPU clock ratio of the two hosts: the GPU in the Windows XP x64 host is clocked 3% higher.) It is a myth that Linux does not need SWAN_SYNC. So I urge the GPUGrid staff to put the SWAN_SYNC option back in the Linux client and publish how to make it work. It should be easy, as the Linux client puts the following line in its output: "# CUDA Synchronisation mode: BLOCKING". If it's more convenient, there could be two Linux apps: the present one, and another with SWAN_SYNC ON. Users could select the preferred one in their profile. The default should be the present one (without SWAN_SYNC). | |
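A minimal Python sketch that reproduces the 19-31% range from the run times listed above. It assumes "faster by X%" means the ratio of the Linux run time to the Windows run time, divided by the GPU clock ratio (1354/1315 MHz). Since no PABLO workunit ran on both hosts, pairing the fastest and slowest PABLO run times across hosts is an assumption made only for this illustration.

```python
# Clock-normalized advantage of host #2 (WinXP x64, SWAN_SYNC ON) over host #1 (Linux).
# Run times in seconds are copied from the results table.
CLK_LINUX_MHZ = 1315  # host #1 GPU clock
CLK_WIN_MHZ = 1354    # host #2 GPU clock

# ADRIA batches: (Linux run time, Windows run time) for the same batch.
adria = {
    "T1019s2": (10_117, 8_236),
    "T1015s1": (10_200, 8_210),
    "NTL9": (11_000, 8_410),
}
# PABLO workunits ran on one host or the other, so only batch-level pairings are possible.
pablo_linux = [12_747, 12_692, 11_970, 12_008]
pablo_win = [9_441, 9_426, 9_433, 9_457]


def normalized_advantage(t_linux: float, t_win: float) -> float:
    """Percent by which host #2 is faster once the ~3% GPU clock difference is removed."""
    raw_speedup = t_linux / t_win
    clock_ratio = CLK_WIN_MHZ / CLK_LINUX_MHZ
    return (raw_speedup / clock_ratio - 1.0) * 100.0


results = [normalized_advantage(lin, win) for lin, win in adria.values()]
# Assumed pairings: least and most favourable combination within the PABLO batch.
results.append(normalized_advantage(min(pablo_linux), max(pablo_win)))
results.append(normalized_advantage(max(pablo_linux), min(pablo_win)))

low, high = min(results), max(results)
print(f"SWAN_SYNC host is {low:.0f}-{high:.0f}% faster")  # prints "SWAN_SYNC host is 19-31% faster"
```

The lower bound comes from the T1019s2 batch and the upper bound from the most favourable PABLO pairing.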
ID: 50180 | |
That is a very nice comparison, and it certainly would be nice to give SWAN_SYNC in Linux a try. Note that until 29 July the Linux client had a GTX 1080Ti, which was about as fast as a GTX 980Ti in my Windows XP x64 host. This seems a little strange. The 1080Ti in Linux should easily beat the 980Ti in WinXP, unless it is somehow being limited. Maybe there is a memory or bus limitation of some sort? | |
ID: 50181 | |
That is a very nice comparison, and it certainly would be nice to give SWAN_SYNC in Linux a try.
There's no such limitation:
1. The iGPU is off.
2. The NVidia GPU runs at PCIe3.0x16.
3. The (CPU) memory is running in dual channel mode.
4. No CPU tasks are running.
The only thing missing is SWAN_SYNC. I can swap the OS between the two hosts, but there's no point in it. | |
ID: 50182 | |
It is probably the memory bus on the cards themselves that is the limitation. The GTX 980 Ti has a 384-bit memory bus, while the GTX 1080 Ti has only a 352-bit one. While the memory on the 1080 Ti is faster, there is probably not much difference in bandwidth, which is probably what is limiting the performance of the 1080 Ti. | |
ID: 50183 | |
It is probably the memory bus on the cards themselves that is the limitation. The GTX 980 Ti has 384 bits, while the GTX 1080 Ti has only 352 bits. While the memory on the 1080 Ti is faster, there is probably not much difference in bandwidth, which is probably what is limiting the performance of the 1080 Ti.
It would make no sense to pair a faster GPU with lower-bandwidth memory. The NVidia homepage is a little inconsistent (I'd even call it erratic) in how it states Memory Bandwidth, which is simply the memory clock (the per-pin data rate in Gbps) multiplied by the bus width in bits, divided by 8 to convert it to bytes.
GTX 980 Ti Memory Specs:
Memory Clock: 7.01 Gbps
Standard Memory Config: 6 GB
Memory Interface: GDDR5
Memory Interface Width: 384-bit
Memory Bandwidth (GB/sec): 336.5 (7.01 * 384 / 8 = 336.48 GB/s)
GTX 1080 Ti Memory Specs:
Standard Memory Config: 11 GB GDDR5X
Memory Interface Width: 352-bit
Memory Bandwidth (GB/sec): 11 Gbps (a data rate, not a bandwidth; 11 * 352 / 8 = 484 GB/s)
484 / 336.48 = 1.4384, so in reality the GTX 1080 Ti has 43.84% more memory bandwidth than the GTX 980 Ti. | |
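The bandwidth formula in this post (data rate * bus width / 8) takes only a few lines of Python to check. The helper name is illustrative; the inputs are the NVIDIA figures quoted above.

```python
def memory_bandwidth_gb_s(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Peak memory bandwidth in GB/s: per-pin data rate (Gbps) * bus width (bits) / 8 bits per byte."""
    return data_rate_gbps * bus_width_bits / 8


gtx_980_ti = memory_bandwidth_gb_s(7.01, 384)  # GDDR5, 384-bit bus
gtx_1080_ti = memory_bandwidth_gb_s(11, 352)   # GDDR5X, 352-bit bus
gain_percent = (gtx_1080_ti / gtx_980_ti - 1) * 100

print(f"GTX 980 Ti:  {gtx_980_ti:.2f} GB/s")            # 336.48 GB/s
print(f"GTX 1080 Ti: {gtx_1080_ti:.2f} GB/s")           # 484.00 GB/s
print(f"GTX 1080 Ti advantage: {gain_percent:.2f}%")    # 43.84%
```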
ID: 50184 | |
That may explain the bandwidth, but it just confirms that the GTX 1080 Ti should be faster. I will be interested to see the numbers if they can enable SWAN_SYNC on Linux. Good luck in your effort to jog them on that. | |
ID: 50186 | |
Correct me if I'm wrong, but I think the main bottleneck for this application on operating systems without SWAN_SYNC is that when data comes over the PCIe bus from the GPU, it waits in system memory until the CPU does what it needs to do (I assume double-precision computation) and sends it back to the GPU for GPGPU computation. | |
ID: 50187 | |
With SWAN_SYNC, one thread is left spinning, so there is no delay in the data stream, which makes the process much more efficient. As Zoltan's data shows, you can save up to 2000 seconds, which on a WU is over 20% of the time AND power. The thing about processors is that even when they have no data to compute, they still use almost the same amount of power at max clock speed (which our GPUs run at). The only thing that SWAN_SYNC can do is provide more processor support to the GPU. But all the other cores were free in Zoltan's tests anyway, so it is not clear how SWAN_SYNC can improve that. (Maybe locking a given CPU core to the GPU makes it faster?) | |
ID: 50188 | |
Locking one thread will provide the least latency. Because the computation relies so much on the latency of the CPU, factors such as CPU clock speed, PCIe bandwidth and memory speed all matter, and optimizations like SWAN_SYNC lower the latency far below what it is without SWAN_SYNC. | |
ID: 50189 | |
Maybe. But that says that allowing the OS to choose the right core is worse than locking the GPU to a given core. That might be true when you have multiple CPU projects running. I am not sure it will show up in tests with all the cores free. So I think you will have to test it under varying conditions to see the real effect in practice. | |
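One way to run the kind of test suggested here on Linux is to restrict a process to a single core and compare run times. A minimal sketch using Python's standard `os.sched_getaffinity` / `os.sched_setaffinity` (Linux only); `pin_to_core` is a hypothetical helper, and whether pinning the GPUGrid app's process actually helps is exactly what such a test would have to show. BOINC runs its apps as the boinc user, so changing their affinity likely needs root. From a shell, `taskset -cp <core> <pid>` (util-linux) does the same for a running process.

```python
from __future__ import annotations

import os


def pin_to_core(pid: int, core: int) -> set[int]:
    """Restrict process `pid` (0 = this process) to one CPU core; return its previous affinity."""
    previous = os.sched_getaffinity(pid)
    if core not in previous:
        raise ValueError(f"core {core} is not in the allowed set {sorted(previous)}")
    os.sched_setaffinity(pid, {core})
    return previous


if __name__ == "__main__":
    # Demo on the current process; for a real test you would pass the GPU app's PID instead.
    original = pin_to_core(0, min(os.sched_getaffinity(0)))
    print("pinned to:", os.sched_getaffinity(0))
    os.sched_setaffinity(0, original)  # restore the original CPU set
    print("restored to:", os.sched_getaffinity(0))
```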
ID: 50190 | |
Correct me if I'm wrong, but I think the main bottleneck for this application for operating systems without SWAN_SYNC is when the data comes over the PCIe BUS from the GPU, it is waiting in system memory
I think the data is waiting in the GPU's memory.
... for the CPU to do what it needs to do (I assume double precision compute)
As far as we know, that's the case.
and send it back to the GPU for GPGPU compute.
Actually, without SWAN_SYNC the app gives control back to the OS. When the GPU finishes a bit of calculation, it raises an interrupt to the OS (signaling that it needs attention). Processing this interrupt takes thousands of CPU cycles, as the state of the interrupted process has to be saved (even if it's the "system idle" process), and the state of the GPUGrid app has to be restored so it can continue from where it gave control back to the OS. With SWAN_SYNC ON this step is omitted, so there's much less latency in each step, resulting in much faster processing. The faster the GPU, the larger the performance loss. See my old post comparing the two ways of processing. | |
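The two waiting strategies described above can be sketched with Python threads: `wait_blocking` sleeps until the OS wakes it (comparable to the "BLOCKING" synchronisation mode the Linux client reports), while `wait_spinning` keeps polling without sleeping (the idea behind SWAN_SYNC). This only illustrates the control flow: `fake_gpu_step` is a hypothetical stand-in for a GPU kernel, and because of Python's global interpreter lock the timings will not reproduce the latency advantage of a real spin-wait. In the CUDA runtime the equivalent choice is made with `cudaSetDeviceFlags` (`cudaDeviceScheduleSpin` versus `cudaDeviceScheduleBlockingSync`); whether the GPUGrid app uses these flags is not stated in this thread.

```python
from __future__ import annotations

import threading
import time


def fake_gpu_step(done: threading.Event, out: list[str], work_s: float) -> None:
    """Hypothetical stand-in for one GPU kernel: works for work_s seconds, then signals completion."""
    time.sleep(work_s)
    out.append("step finished")
    done.set()


def wait_blocking(done: threading.Event) -> None:
    # Blocking sync: the waiting thread sleeps; the OS has to wake it up when the event fires.
    done.wait()


def wait_spinning(done: threading.Event) -> None:
    # Spin sync (the SWAN_SYNC idea): keep checking without sleeping, occupying a CPU core.
    while not done.is_set():
        pass


def run_step(wait, work_s: float = 0.01) -> tuple[str, float]:
    """Launch one fake GPU step, wait for it with the given strategy, return (result, seconds)."""
    done = threading.Event()
    out: list[str] = []
    start = time.perf_counter()
    worker = threading.Thread(target=fake_gpu_step, args=(done, out, work_s))
    worker.start()
    wait(done)
    elapsed = time.perf_counter() - start
    worker.join()
    return out[0], elapsed


for strategy in (wait_blocking, wait_spinning):
    result, seconds = run_step(strategy)
    print(f"{strategy.__name__}: {result} after {seconds * 1000:.1f} ms")
```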
ID: 50191 | |
I don't have very fast GPUs on at the moment, only a couple of GTX 750 Ti's. But they show 95 to 96% utilization under Linux. I will try out a GTX 1070 if they get SWAN_SYNC enabled, to see more of a difference. It looks promising, but it has been ages since I used it last under Windows. | |
ID: 50192 | |
I've compared the run times of my GTX 1080Tis under different OS. | |
ID: 50213 | |
Very interesting thread. I don't have any answers, but I do have an observation. At first I thought the difference was in the CPUs. The Linux rig has an i7-4790K with 16 lanes, 4 GHz & 8 MB cache. The WinXP rig has an i3-4160 with 16 lanes, 3.6 GHz & 4 MB cache. I've seen MIP WUs at WCG choke for lack of L3 cache, but here the computer with the smaller cache is faster. | |
ID: 50471 | |
I've compared the run times of my GTX 1080Tis under different OS.
This may be the wrong place to ask a Windows 7 64-bit question, but my GTX 970 takes the same time (7 1/2 hours) on a PABLO_2IDP whether I have swan_sync enabled or not. I am using the 373.06 drivers (CUDA 8), which used to be the best ones, but maybe no longer are. What is the current CUDA version I should be using? | |
ID: 50524 | |
Well it made a small difference. | |
ID: 50526 | |
... I assume nothing was plugged into either of the 1x slots.
True. I build single-GPU hosts to maximize the performance of the GPUGrid app.
The Asus manual has no block diagram, so maybe they use all 16 CPU lanes to run the single 16x slot and do all other bus functions with their B85 chip.
That's the case. Nothing is plugged into the other PCIe slots.
The GA-Z87X may be the issue. A much better manual with a block diagram shows that only one of the four 16x slots is capable of 3.0 x16 speed, and only if nothing else is plugged in.
The GPU is plugged into the slot closest to the CPU (the x16 one). Nothing else is plugged into the other PCIe slots. I bought this MB when I thought I should build dual-GPU hosts, but it turned out that two GPUs hinder each other's performance, so I gave up putting a second GPU on the same MB.
The Z87 handles the 1x slots and other bus functions. Is there a way with Linux to monitor actual bus speed???
I don't know.
Also, the BIOS usually has some version of an option for the PCIe speed: Auto, Gen 1, Gen 2 or Gen 3. I set all mine to Gen 3 from the default Auto.
This host had Windows XP before, and the card ran at PCIe3.0x16. I didn't change the hardware; I just installed Linux on that host to compare its performance. | |
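On Linux, one possible answer to the bus-speed question is the PCIe link attributes the kernel exposes in sysfs (`current_link_speed`, `current_link_width`, `max_link_speed` and `max_link_width` under `/sys/bus/pci/devices/<address>/`), assuming a kernel recent enough to provide them. The Python sketch below lists them for NVIDIA display devices (PCI vendor ID 0x10de, class 0x03xxxx); the function names are hypothetical. 8 GT/s corresponds to PCIe 3.0, 5 GT/s to 2.0 and 2.5 GT/s to 1.0. The exact string format differs between kernel versions, and the GPU may lower its link speed when idle, so read it while a task is running.

```python
from __future__ import annotations

from pathlib import Path

SYSFS_PCI = Path("/sys/bus/pci/devices")
LINK_ATTRS = ("current_link_speed", "current_link_width", "max_link_speed", "max_link_width")
NVIDIA_VENDOR_ID = "0x10de"


def _read(path: Path) -> str | None:
    """Return the stripped contents of a sysfs attribute, or None if it is missing or unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def nvidia_pcie_links(root: Path = SYSFS_PCI) -> dict[str, dict[str, str]]:
    """Map each NVIDIA display device's PCI address to its PCIe link attributes."""
    links: dict[str, dict[str, str]] = {}
    for dev in sorted(root.iterdir()):
        is_nvidia = _read(dev / "vendor") == NVIDIA_VENDOR_ID
        is_display = (_read(dev / "class") or "").startswith("0x03")  # 0x03xxxx = display controller
        if not (is_nvidia and is_display):
            continue
        attrs: dict[str, str] = {}
        for attr in LINK_ATTRS:
            value = _read(dev / attr)
            if value is not None:
                attrs[attr] = value
        links[dev.name] = attrs
    return links


if __name__ == "__main__":
    if SYSFS_PCI.is_dir():
        for address, attrs in nvidia_pcie_links().items():
            print(address, attrs)
```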
ID: 50533 | |
What is the current CUDA version I should be using?
CUDA 8 is fine at the moment. But you should know that NVidia also changes (hopefully improves) the CUDA driver, so a later driver that supports only CUDA 8 could perform better than the latest drivers. (I haven't tested it.) The latest CUDA version is 9.2.217. | |
ID: 50534 | |
Well it made a small difference.
You can't really test it on an OS with WDDM, as WDDM itself causes a lot of latency.
I would rather have the CPU.
I don't want to force SWAN_SYNC on everyone; I would just like to have the choice to use it. Now the lack of this option forces me not to use it.
And I don't think drivers would make a difference.
They did make a difference back in the CUDA 6.5 days under Windows XP, as the fastest driver was 359.06. The later drivers had CUDA 8, and that somehow made them a bit slower (3-6% as far as I can recall) with the CUDA 6.5 GPUGrid app. My Windows XP hosts are still using this driver. I have only one Windows XP host online at the moment (with a GTX TITAN X GPU, which is basically a GTX 980 Ti with all the CUDA cores enabled), but it can still beat hosts with a GTX 1080 Ti. Check the "Performance" page, section "Top performers per batch", batch PABLO_2IDP_P01106_2_GLUP20P_ID. My Windows XP host is in 7th place, and a host with a GTX TITAN Xp is in 23rd place. There are 8 hosts with a GTX 1080 Ti ranked lower than my GTX TITAN X under Windows XP with SWAN_SYNC ON. | |
ID: 50536 | |
Yes, I am sure it is useful. It would just take a faster card than I use at the moment. | |
ID: 50537 | |
To answer my own question: the file to change is /lib/systemd/system/boinc-client.service, and it should be edited as root. In the [Service] section of that file there should be a line containing:
Environment="SWAN_SYNC=1"
It should be exactly like the above (SWAN_SYNC should be capitalised, and the quotation marks should be there). The host should be rebooted for this change to take effect. Thanks to Rod4x4 for this solution! | |
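If you prefer to script the edit instead of using a text editor, here is a minimal Python sketch that inserts the line directly after the [Service] header and leaves the file unchanged if the line is already there. `add_swan_sync` and the script name in the usage comment are hypothetical; reading and writing /lib/systemd/system/boinc-client.service still has to be done as root. Instead of rebooting, running `systemctl daemon-reload` followed by `systemctl restart boinc-client` (as root) should also apply the change.

```python
import sys

ENV_LINE = 'Environment="SWAN_SYNC=1"'


def add_swan_sync(unit_text: str) -> str:
    """Return unit-file text with ENV_LINE directly after the [Service] header (idempotent)."""
    lines = unit_text.splitlines()
    if any(line.strip() == ENV_LINE for line in lines):
        return unit_text  # already configured, nothing to do
    for i, line in enumerate(lines):
        if line.strip() == "[Service]":
            lines.insert(i + 1, ENV_LINE)
            return "\n".join(lines) + "\n"
    raise ValueError("no [Service] section found in unit file")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (as root): python3 add_swan_sync.py /lib/systemd/system/boinc-client.service
    path = sys.argv[1]
    with open(path, encoding="utf-8") as f:
        original = f.read()
    updated = add_swan_sync(original)
    if updated != original:
        with open(path, "w", encoding="utf-8") as f:
            f.write(updated)
```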
ID: 50807 | |
Hey Zoltan, what do you mean by edited as root? Does this mean you change User=boinc to User=root? | |
ID: 50812 | |
Hey Zoltan, what do you mean by edited as root? Does this mean you change User=boinc to User=root?
Try running the text editor as root; for example: sudo gedit /lib/systemd/system/boinc-client.service | |
ID: 50813 | |
Good to hear that Zoltan and Rod4X4 worked out how to get SWAN_SYNC=1 working on repository versions of BOINC. My posts in the Linux thread seem to have provided the necessary dialog and testing for the solution to be exposed. | |
ID: 50814 | |
Hey Zoltan, what do you mean by edited as root? Does this mean you change User=boinc to User=root?
Try running the text editor as root; for example:
That's what I meant. | |
ID: 50815 | |
NOTE:
1. Ensure the Environment= line is before the ExecStart= line. The Environment= line is best placed as the first line after the [Service] heading.
2. Updating the boinc-client package may require this line to be re-added afterwards (I have not tested this). | |
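Because a package update may overwrite the unit file (note 2), it can be worth re-checking it after each update. A minimal Python sketch of such a check, following the placement advice in note 1; `swan_sync_configured` is a hypothetical helper. An alternative that should survive package updates is a drop-in override created with `systemctl edit boinc-client`, which systemd stores under /etc/systemd/system/boinc-client.service.d/; that approach was not tested in this thread.

```python
def swan_sync_configured(unit_text: str) -> bool:
    """True if the [Service] section sets SWAN_SYNC=1 on an Environment= line before ExecStart=."""
    in_service = False
    env_seen = False
    for raw in unit_text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            in_service = line == "[Service]"  # entering a new section
            continue
        if not in_service:
            continue
        if line.startswith("Environment=") and "SWAN_SYNC=1" in line:
            env_seen = True
        elif line.startswith("ExecStart="):
            return env_seen  # only lines above ExecStart= count, per note 1
    return False  # no ExecStart= found in [Service]


if __name__ == "__main__":
    import sys

    unit_path = sys.argv[1] if len(sys.argv) > 1 else "/lib/systemd/system/boinc-client.service"
    try:
        with open(unit_path, encoding="utf-8") as f:
            ok = swan_sync_configured(f.read())
    except OSError:
        print(f"cannot read {unit_path}")
    else:
        print("SWAN_SYNC OK" if ok else "SWAN_SYNC missing or after ExecStart=; re-add it")
```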
ID: 50816 | |
First of all, thank you very much to all kind contributors to bringing this to work, in this and other threads. | |
ID: 50824 | |
Thank you all! | |
ID: 50869 | |