Message boards : Graphics cards (GPUs) : Pascal Settings and Performance
nanoprobe wrote:
Nanoprobe, you are using your card under Linux, which doesn't have WDDM, while JoergF is using his card under Windows 7, which has WDDM.
ID: 45114
Running one at a time (Win8.1), ADRIA_1JWP_dist tasks on a GTX 1070 and a GTX 1060 (3GB) show higher GPU usage than SDOERR CASP (a3d) or GERALD CXCL12 tasks.
ID: 45115
Thanks,
ID: 45116
Thanks,

You're welcome! The proof-of-concept CASP runs have fewer atoms, so GPU utilization is generally lower. It's probably the case that the bigger the GPU, the lower the utilization for such work units. This might even be exacerbated with the Pascals compared to Maxwell GPUs (or not)... [..] The runs that include atomic contacts take longer because they involve calculating contact interactions too.

What I reported in the CASP thread: CASP runtimes (atom and step counts) vary, so this is just a general reference. I haven't noticed any difference in GPU usage with or without contacts, though CPU usage is 5% higher, at ~15% per WU, when the contacts are included.

For reference I'm seeing ~81% GPU utilization on a GTX970 on Linux (Ubuntu x64 16.04 LTS) crunching a CASP22SnoS_crystal_ss_contacts task. PCIe bandwidth utilization is ~17% (PCIe2 x16). It's a low-spec system but I'm not using the CPU for anything else; when I do, the GPU's performance drops off. When I run a certain MT CPU app on another system (W10), the GPU utilization drops to ~15%.

I have >10% CPU usage on each Pascal WU, mostly around 25% average (4C/4T Haswell S series) crunching (2) WUs. When shooting for the most efficient runtimes it helps to have a CPU clock speed above 3GHz (preferably >3.5GHz). Every GTX 1070 host faster than mine has an overclocked 'K' series CPU, even though my Pascals are at 2.1GHz (1.5GHz on Maxwell).

WDDM performance degradation versus Linux or XP is similar to PCIe width affecting runtimes. PCIe3.0 x4 runtimes can be ~10% slower if PCIe 3.0 x8 is not an option. (Maybe the AMD Zen platform will have more than the 16/28/40 CPU PCIe3.0 lanes Intel currently offers.) PCIe 2.0 has an overhead of 20% (8b/10b line-code encoding) while PCIe 3.0 uses 128b/130b. In reality PCIe 2.0 has a maximum available bandwidth of 80%, while PCIe 3.0 provides ~98.5%. Intel's DMI link on Haswell and Ivy Bridge, with (4) bi-directional lanes at 1GB/s per lane, is supposed to be faster than AMD's 500MB/s per lane? Skylake doubled the bandwidth with (4) lanes at 2GB/s per lane. Maybe during MT CPU compute the DMI link became a bottleneck, causing the dramatic GPU utilization loss? Or PCIe flooded out completely.
ID: 45118
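The encoding figures above can be checked with a few lines of Python. This is a sketch assuming the nominal spec transfer rates (5 GT/s per lane for PCIe 2.0, 8 GT/s for PCIe 3.0) and ignoring packet-level protocol overhead, so real-world throughput will be somewhat lower than what it prints.

```python
# Effective PCIe bandwidth per lane, per direction, after line-code overhead.
# Transfer rates are the nominal spec values: Gen2 = 5 GT/s, Gen3 = 8 GT/s.
# Packet/protocol overhead (TLP headers, flow control) is ignored here.

PCIE_GENERATIONS = {
    # name: (transfer rate in GT/s, payload bits, encoded bits)
    "PCIe 2.0": (5.0, 8, 10),      # 8b/10b encoding
    "PCIe 3.0": (8.0, 128, 130),   # 128b/130b encoding
}

def encoding_efficiency(payload_bits: int, encoded_bits: int) -> float:
    """Fraction of raw line bits that carry data."""
    return payload_bits / encoded_bits

def lane_bandwidth_mb_s(gen: str) -> float:
    """Usable bandwidth of one lane in one direction, in MB/s (1 MB = 10**6 bytes)."""
    rate_gt_s, payload, encoded = PCIE_GENERATIONS[gen]
    usable_gbit_s = rate_gt_s * encoding_efficiency(payload, encoded)
    return usable_gbit_s * 1000 / 8  # Gbit/s -> MB/s

if __name__ == "__main__":
    for gen, (_, payload, encoded) in PCIE_GENERATIONS.items():
        eff = encoding_efficiency(payload, encoded)
        print(f"{gen}: {eff:.2%} efficient, ~{lane_bandwidth_mb_s(gen):.0f} MB/s per lane")
```

The 8b/10b scheme spends 2 of every 10 bits on encoding, which is where the 20% overhead comes from; 128b/130b spends only 2 of every 130.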
Since the 9.14 app is CUDA 8.0, and there are a couple of CUDA 8.0 drivers for Windows XP, I installed my GTX 1080 under Windows XP x64 with the latest XP driver available for the GTX 960 (368.81), but the 9.14 app did not work with this setup. It said:
Task blablabla exited with zero status but no 'finished' file
If this happens repeatedly you may need to reset the project.
But the task did not run into an error, so these two lines repeated endlessly until I suspended the task. When I booted this host into Windows 10, the task resumed normally. So now I really have to learn how to set up a Linux-based BOINC host.
ID: 45121
Is there anyone who is using a GTX 1080 or TITAN X (Pascal) under Linux with SWAN_SYNC on?
ID: 45128
WDDM performance degradation versus Linux or XP is similar to PCIe width affecting runtimes. PCIe3.0 x4 runtimes can be ~10% slower if PCIe 3.0 x8 is not an option. (Maybe the AMD Zen platform will have more than the 16/28/40 CPU PCIe3.0 lanes Intel currently offers.)

I admit I haven't looked into the matter yet. Is there any way to bypass the WDDM degradation? It is somewhat frustrating to see a 1080 performing worse than a 1070 or 980 Ti just because of low utilization. Currently I don't get more than 75% load on my 1080.
____________
I would love to see HCF1 protein folding and interaction simulations to help my little boy... someday.
ID: 45138
WDDM performance degradation versus Linux or XP is similar to PCIe width affecting runtimes. PCIe3.0 x4 runtimes can be ~10% slower if PCIe 3.0 x8 is not an option. (Maybe the AMD Zen platform will have more than the 16/28/40 CPU PCIe3.0 lanes Intel currently offers.)

Yes, by moving to Linux. There are a few remedies on WDDM OSes that help gain GPU utilization: enable SWAN_SYNC (I don't use this); have a CPU above 3.5GHz with a GPU + CPU PCIe3.0 x16 connection (single-GPU setups seem to be faster than systems with 2 or 3 of the same GPU on the same CPU); or compute 2 tasks at a time, each with a 30 to 50% longer runtime than one at a time. PCIe3.0 x8 is the bare minimum for overclock scaling. A GTX 1060 and above on PCIe3.0 x4 will see a 4~12% performance drop-off compared with x8, depending on the type of WU.
ID: 45139
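As a rough illustration of why two tasks at a time can still pay off despite the longer runtimes: if each of two concurrent tasks takes 1.3x to 1.5x as long as a single task, the card still completes more work per hour. A minimal Python sketch, assuming the GPU is the only bottleneck and ignoring per-task loading time; the function name is illustrative only.

```python
# Throughput of running N tasks concurrently on one GPU, relative to
# running them one at a time. If each concurrent task takes `slowdown`
# times as long as a solo task, the GPU finishes N tasks in that time.

def relative_throughput(tasks_at_once: int, slowdown: float) -> float:
    """Tasks completed per unit time, normalised to single-task crunching."""
    if tasks_at_once < 1 or slowdown <= 0:
        raise ValueError("tasks_at_once must be >= 1 and slowdown > 0")
    return tasks_at_once / slowdown

if __name__ == "__main__":
    # The 30-50% longer runtimes quoted in the post for 2 tasks at a time.
    for slowdown in (1.3, 1.4, 1.5):
        gain = relative_throughput(2, slowdown) - 1
        print(f"2 tasks, each {slowdown:.1f}x slower: {gain:+.0%} throughput")
```

Under these assumptions the gain ranges from roughly +33% (at 1.5x) to roughly +54% (at 1.3x); once the slowdown reaches 2x, running two tasks gains nothing.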
Thanks... to my mind the config cannot be the bottleneck. I run 2 tasks with 1 virtual CPU core per task [1 CPU/0.5 GPU] on an i7-3770S, which should be fast enough. But now that you mention it, I still have an old socket 1155 board in my primary PC that is PCIe2.0 only, so the GTX 1080 is linked by PCIe2.0 x16, which is equal to PCIe3.0 x8 in terms of throughput. Could this be the reason?
ID: 45154
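The PCIe2.0 x16 = PCIe3.0 x8 equivalence holds up on paper. A small Python sketch, assuming nominal transfer rates and counting only line-code overhead (real transfers lose a little more to packet overhead), comparing the slot configurations mentioned in this thread:

```python
# Approximate usable link bandwidth (one direction) for the slot
# configurations mentioned in this thread, using nominal transfer rates
# and line-code efficiency only (protocol overhead ignored).

LANE_GBIT_S = {
    2: 5.0 * 8 / 10,      # PCIe 2.0: 5 GT/s with 8b/10b encoding
    3: 8.0 * 128 / 130,   # PCIe 3.0: 8 GT/s with 128b/130b encoding
}

def link_gb_s(gen: int, lanes: int) -> float:
    """Usable bandwidth in GB/s (10**9 bytes) for a PCIe generation and width."""
    return LANE_GBIT_S[gen] * lanes / 8

if __name__ == "__main__":
    for gen, lanes in [(2, 16), (3, 16), (3, 8), (3, 4)]:
        print(f"PCIe {gen}.0 x{lanes}: ~{link_gb_s(gen, lanes):.2f} GB/s")
```

PCIe 2.0 x16 comes out at 8.00 GB/s and PCIe 3.0 x8 at about 7.88 GB/s, so on paper the two are within a couple of percent of each other.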
Thanks... to my mind the config cannot be the bottleneck. I run 2 tasks with 1 virtual CPU core per task [1 CPU/0.5 GPU] on an i7-3770S, which should be fast enough. But now that you mention it, I still have an old socket 1155 board in my primary PC that is PCIe2.0 only, so the GTX 1080 is linked by PCIe2.0 x16, which is equal to PCIe3.0 x8 in terms of throughput. Could this be the reason?

The i7-3770S is PCIe3 x16 capable, but I guess there could be some LGA 1155 motherboards that are PCIe2 only. Which motherboard model do you have? If it is PCIe2, that could be the issue or one of the main issues. You could probably get a replacement PCIe3-capable motherboard if that's the case. If you crunch using your integrated HD Graphics 4000 GPU, that would impact the GTX1080's performance, as would crunching lots of CPU projects. Basically, for optimal performance on GPUGrid (especially with such a high-end GPU) you want to be crunching as few CPU projects as possible. MT apps are a no-no, and running apps in a VM can bog the system down. CPU speed and RAM speed also have an impact, but while there are faster processors, that CPU isn't bad and it does have a PCIe3 controller on board (the board just probably isn't using it). HT off and SWAN_SYNC might help a little too.
____________
FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
ID: 45156
Thanks... the mainboard is an ASUS P8P67-M (socket 1155) and definitely PCIe2.0 only. Yes, the CPU does support 3.0 but the board doesn't, so it could be part of the issue, as you wrote.
ID: 45158
If I were you I would swap the 1070 with the 1080, or try to pick up a second-hand PCIe3 socket 1155 motherboard for now.
ID: 45164
If I were you I would swap the 1070 with the 1080, or try to pick up a second-hand PCIe3 socket 1155 motherboard for now.

I have already considered that, but sometimes changing the motherboard also means different SATA controllers and drivers, and then Windows will no longer boot. So it is certainly possible, but not that easy.

Noticed all the 1080's and 1070's are reporting 4GB graphics memory. IIRC it's not an issue.

Yes, I have noticed that as well. Does this affect performance in any way?
ID: 45165
Noticed all the 1080's and 1070's are reporting 4GB graphics memory. IIRC it's not an issue.

I think it's BOINC that reports this, while the app reads the details directly from the hardware/system itself. So it wouldn't impact performance. Most tasks tend to use less than 1GB of GDDR, and the most I can recall is around 1.7GB.
ID: 45167
A good upgrade time will be when the GTX1080Ti and Zen arrive and are available in sufficient quantities, with competition making prices reasonable. That might be in 6 to 9 months' time, but possibly more depending on the competition.

Yes... and not to forget the AMD RX490. If this one performs well, which I hope, it will have a positive influence on Nvidia pricing in general. Which means: prices DOWN ;-)
ID: 45168
Pulled my GTX970 from my Ubuntu x64 16.04 LTS system and replaced it with a GTX1060-3GB.
ID: 45169
Comparing two tasks, one which ran on the 970 and one which ran on the 1060-3GB, I can say that on my setup the 1060-3GB is ~3% faster than the 970 and uses ~9% more CPU.

Great... which means the comparison by spec-sheet SP GFLOPS works for this kind of job, more or less. What is the average GPU usage of the 1060?
ID: 45170
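For reference, the spec-sheet SP GFLOPS comparison works out like this. A Python sketch, assuming NVIDIA's reference clocks (not the overclocks used in this thread) and 2 FLOPs per CUDA core per clock from fused multiply-add:

```python
# Theoretical single-precision throughput: each CUDA core can retire one
# fused multiply-add (2 FLOPs) per clock, so GFLOPS = 2 * cores * clock(GHz).
# Core counts and clocks are NVIDIA reference specs; factory-overclocked
# cards (and the 2.1GHz / 1.5GHz clocks quoted in this thread) will differ.

REFERENCE_SPECS = {
    # card: (CUDA cores, base clock GHz, boost clock GHz)
    "GTX 970": (1664, 1.050, 1.178),
    "GTX 1060 3GB": (1152, 1.506, 1.708),
}

def sp_gflops(cores: int, clock_ghz: float) -> float:
    """Peak single-precision GFLOPS for a given core count and clock."""
    return 2 * cores * clock_ghz

if __name__ == "__main__":
    for card, (cores, base, boost) in REFERENCE_SPECS.items():
        print(f"{card}: {sp_gflops(cores, base):.0f} (base) / "
              f"{sp_gflops(cores, boost):.0f} (boost) SP GFLOPS")
    ratio = sp_gflops(1152, 1.708) / sp_gflops(1664, 1.178)
    print(f"1060-3GB / 970 at reference boost: {ratio:.3f}")
```

At reference boost clocks the two cards are within about half a percent of each other on paper. That fits the small ~3% gap measured above, though peak GFLOPS ignores memory bandwidth, PCIe and CPU effects.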
The GPU usage is ~78%, similar, but the utilization is spiky and might vary during the run. When I had the 970 in, I used NVIDIA X Server Settings to observe the GPU utilization. However, I've just observed that keeping the NVIDIA X Server Settings window open increases the apparent GPU utilization:
ID: 45175
Pulled my GTX970 from my Ubuntu x64 16.04 LTS system and replaced it with a GTX1060-3GB.

The difference is more noticeable with the 20ns tasks (run time s / CPU time s / credit):
e35s7_e32s8p0f82-SDOERR_CASP10_crystal_ss_20ns_ntl9_2-0-1-RND9465_0: 5,365.27 / 3,215.99 / 12,750.00 v9.14 (cuda80)
e14s4_e9s6p0f34-SDOERR_CASP22S_crystal_ss_20ns_ntl9_0-0-1-RND4064_0: 5,946.69 / 2,911.63 / 12,750.00 (cuda65)
e15s3_e14s5p0f90-SDOERR_CASP22S_crystal_contacts_20ns_ntl9_2-0-1-RND8066_0: 5,966.27 / 2,947.58 / 12,750.00 v8.48 (cuda65)

In this case the 1060-3GB is ~10% faster than the GTX970, and the CPU usage is also around 10% greater. As most tasks at GPUGrid tend to be longer, 10% might reflect the differences between the cards more accurately than the short tasks do, since those spend as much time loading but less time running. So ~10% faster for ~45% less energy: ~60% better in terms of performance/Watt.
ID: 45177
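The "~60% better performance/Watt" figure depends on how "~45% less energy" is read. A small Python sketch of the arithmetic, using the run times listed above; the two power readings are interpretations, not measurements. If the 970 draws ~1.45x the 1060's power, 1.10 x 1.45 gives roughly the quoted ~60%; if the 1060 draws only 55% of the 970's power, the gain would be closer to 2x.

```python
# Compare two cards from BOINC run times on similar tasks, then fold in a
# power ratio to get relative performance per watt.

def speedup(slow_runtime_s: float, fast_runtime_s: float) -> float:
    """How many times more work per hour the faster card does."""
    return slow_runtime_s / fast_runtime_s

def perf_per_watt_gain(speed_ratio: float, power_ratio_old_to_new: float) -> float:
    """Relative perf/W of the new card; power ratio = old-card watts / new-card watts."""
    return speed_ratio * power_ratio_old_to_new

if __name__ == "__main__":
    gtx1060_run = 5365.27                  # v9.14 (cuda80) task above
    gtx970_runs = [5946.69, 5966.27]       # cuda65 tasks above
    for run in gtx970_runs:
        print(f"1060-3GB vs 970: {speedup(run, gtx1060_run):.3f}x")
    # Reading A (assumption): the 970 draws ~1.45x the 1060's power.
    print(f"perf/W, reading A: {perf_per_watt_gain(1.10, 1.45):.2f}x")
    # Reading B (assumption): the 1060 draws 55% of the 970's power.
    print(f"perf/W, reading B: {perf_per_watt_gain(1.10, 1 / 0.55):.2f}x")
```

Note that the three tasks are not identical (CASP10 vs CASP22S, ss vs contacts), so the speed ratio itself carries some noise.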
Thanks. You have saved me the trouble. That is a nice improvement for the GTX 1060, but not enough to buy a new card. I will leave my GTX 970 on GPUGrid, and my 1060 on Folding, where it gets as much improvement, if not a little more, due to the Quick Return Bonus.
ID: 45178
In this case the 1060-3GB is ~10% faster than the GTX970 and the CPU usage is also around 10% greater. As most tasks at GPUGrid tend to be longer, 10% might more accurately reflect the differences between the cards than the short tasks, which spend as much time loading but less time running. So ~10% faster for ~45% less energy: ~60% better in terms of performance/Watt.

I think on WDDM systems the 3GB GTX 1060 is ~20% faster than the GTX 970; at least this is what I've observed with a GTX 970 at 1.5GHz on PCIe 3.0 x4 and a GTX 1060 at 2.1GHz. The 1152-CUDA-core card is a great cruncher here, and the GTX 1070 even more so from a purely performance/watt point of view. As you mentioned in another thread, the GTX 1060 (3GB) is hands down the cost/performance king. IMO both the GTX 1060 and GTX 1070 are going to be the most efficient GPUGRID GPUs until a Pascal refresh or Volta, with ACEMD scaling a major factor (maybe someday the app will make a GTX 1080 work at 95% on WDDM). My GTX 1070 hasn't risen past 110W (80% GPU usage) and stays mostly under 100W. My (2) GTX 970s would hit 170W on some GERALD tasks at 86% GPU usage.

When I start to crunch on the GPU the system uses ~125W, so the GPU is using ~75W.

A true Mini-Fit Jr 6-pin connector (not the kind missing a 12V pin, like a 4-pin Molex to 6-pin adapter) can provide more than 75W. Check the PSU wire gauge to determine its amperage limit and you'll find out what the (3) 12V wires of the PCIe 6-pin are capable of. The Tom's Hardware website has detailed power consumption tests showing how each card draws its power. Some AIB vBIOSes (Zotac / MSI / Gigabyte / some EVGA) draw almost all of their power from the PSU, with <25W from the PCIe slot, which controls up to 3 phases (though mostly 1 or 2) on the GPU board. If you have a laser thermometer, or use the simple old-fashioned touch method, check the PCIe capacitors: if the PCIe slot is providing most of the power (66W) they'll be hot; if they're barely warm, the 6-pin is the main provider.

My 4+1 phase Gigabyte Windforce OC GTX 1060 (3GB) gets the majority of its power from the PSU through a 6-pin at a 116% power limit (140W); the PrimeGrid Genefer program and the SiSoftware scientific benchmark max out power. Quoted from the xdev.com Pascal OC guide (link in the x80 Pascal thread): 13A per contact (16AWG wire in small connector) to 8.5A/contact (18AWG wire). This means that using common 18AWG cable, the 6-pin connector is specified for 17A of current (3 contacts for +12V power, 2 contacts for GND return, one contact for detect). The 8-pin has a 25.5A current specification (3 contacts for +12V power, 3 contacts for GND return and 2 contacts for detection). That is 204W at +12.0V for the 6-pin, or 306W for the 8-pin accordingly.
ID: 45182
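The 204W / 306W figures follow directly from the contact counts in the quoted guide. A short Python sketch, assuming (as the guide implies for the 6-pin) that a connector's current is limited by whichever of its +12V or ground contact groups is smaller. These are connector/wire limits; the PCIe specification formally allots only 75W to a 6-pin and 150W to an 8-pin plug, and 5.5A (66W) on the slot's +12V rail.

```python
# Power budget of GPU power connectors from per-contact current ratings.
# Per-contact figures (8.5A for 18AWG, 13A for 16AWG) are from the guide
# quoted above; the limiting side is whichever of +12V or ground has fewer
# contacts.

RAIL_VOLTS = 12.0

CONNECTORS = {
    # name: (+12V contacts, ground contacts)
    "6-pin": (3, 2),
    "8-pin": (3, 3),
}

def connector_watts(name: str, amps_per_contact: float) -> float:
    """Maximum power the connector's wiring can carry at +12V."""
    live, ground = CONNECTORS[name]
    current_limit = min(live, ground) * amps_per_contact
    return current_limit * RAIL_VOLTS

if __name__ == "__main__":
    for name in CONNECTORS:
        print(f"{name} @ 18AWG: {connector_watts(name, 8.5):.0f} W")
    # The x16 slot's +12V supply is specified at 5.5A.
    print(f"PCIe slot +12V: {5.5 * RAIL_VOLTS:.0f} W")
```

With 16AWG wire (13A per contact) the same formula gives 312W for a 6-pin and 468W for an 8-pin, which is why wire gauge matters more than the plug's nominal rating.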
After completing a 50ns-long SDOERR_CASP22SnoS task, it's looking more like the GTX1060-3GB can do a Long task in 73% of the time a 970 can (though I'm not certain my settings were identical back on the 30th Oct when using the 970; I might have been running a CPU task then). If the setup was identical, that would make the 1060-3GB 36% faster at long runs, but others would have to demonstrate that too before I'd accept it. I've got a Long PABLO SH2 now and should be able to compare that tomorrow to 3 similar tasks I ran a few days ago when I definitely had the same setup. Still the same +10% CPU usage.
ID: 45184
The PABLO_SH2TRIPEP took 3% longer on the 1060-3GB than it did on a 970, so there is a lot of performance variation. CPU usage was also 11% less when using the 1060:
ID: 45189
(run time s / CPU time s / credit)
e9s8_e8s1p0f0-SDOERR_CASP22SnoS_crystal_contacts_50ns_ntl9_0-0-1-RND0969_0: 14,398.54 / 7,695.54 / 63,750.00 v9.14 (cuda80)

-- GTX 1060 3GB @ 2.1GHz / 67% GPU usage / 51% bus / 74W:
e10s5_e8s4p0f261-SDOERR_CASP22SnoS_crystal_ss_50ns_ntl9_1-0-1-RND6842_0: 15,021.71 / 6,281.00 / 63,750.00 (cuda80)
-- GTX 1070 @ 2.1GHz / 59% GPU usage / 37% bus / 78W:
e5s9_e2s1p0f88-SDOERR_CASP22SnoS_crystal_ss_50ns_ntl9_1-0-1-RND2882_0: 12,249.42 / 6,445.78 / 63,750.00 (cuda80)

Your single GTX 1060 system is 4.21% faster than my GTX 1060 3GB. The higher PCIe bandwidth usage on my system is probably due to having 4 GPUs. My GTX 1070 on PCIe3 x8 is 19% faster than my GTX 1060 on PCIe3 x4.

The PABLO_SH2TRIPEP took 3% longer on the 1060-3GB than it did on a 970, so there is a lot of performance variation. CPU usage was also 11% less when using the 1060:

-- GTX 1070 @ 2.1GHz / 69% GPU / 51% bus / 96W:
e13s5_e5s7p0f442-PABLO_SH2TRIPEP_W_TRI_2-0-1-RND9211_0 11929451: 16,843.90 / 6,162.14 / 145,800.00 (cuda80)
-- GTX 1060 (3GB) @ 2.1GHz / 74% GPU / 60% bus / 85W:
e15s20_e14s21p0f117-PABLO_SH2TRIPEP_S_TRI_1-0-1-RND6936_0: 23,441.73 / 6,269.66 / 145,800.00 Long runs (cuda80)

My GTX 1070 was 28.1% faster than my GTX 1060 on PABLO_SH2TRIPEP. Surprisingly, I haven't received any 'unstable simulation' messages on WUs completed while overclocked at 2.1GHz.
ID: 45199
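"X% faster" can mean two different things when computed from run times, which explains why the same pair of tasks can yield quite different percentages. A Python sketch using the run times listed above; the function names are illustrative. The 28.1% (PABLO) and ~19% (CASP) figures in the post match the first measure, time saved relative to the slower card, while the throughput ratio gives larger numbers.

```python
# Two common ways of expressing "X% faster" from BOINC run times.

def time_saved(slow_s: float, fast_s: float) -> float:
    """Fraction of the slower card's run time saved by the faster card."""
    return (slow_s - fast_s) / slow_s

def throughput_gain(slow_s: float, fast_s: float) -> float:
    """Extra tasks per hour the faster card completes, as a fraction."""
    return slow_s / fast_s - 1

if __name__ == "__main__":
    pairs = {
        # label: (GTX 1060 3GB run time, GTX 1070 run time), from the post above
        "CASP22SnoS 50ns": (15021.71, 12249.42),
        "PABLO_SH2TRIPEP": (23441.73, 16843.90),
    }
    for label, (slow, fast) in pairs.items():
        print(f"{label}: {time_saved(slow, fast):.1%} less time, "
              f"{throughput_gain(slow, fast):.1%} more throughput")
```

For the PABLO pair this gives about 28.1% less time but about 39.2% more throughput. Keep in mind that the two PABLO tasks are different variants (W_TRI_2 vs S_TRI_1).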
Thanks for posting your performance figures.
ID: 45209
GTX 1060 6GB:
ID: 45215
I had 88-92% utilization yesterday, now it's only 65%. I changed nothing in the system.

Besides the system, it depends on the workunit. Yesterday you had an ADRIA_1JWP_dist, which uses the CPU less than your recent SDOERR_CASP22S20M_crystal_ss_contacts_50ns_ntl9 workunit.

Power consumption is 72W average at 65%.

It's because you didn't set the SWAN_SYNC environment variable; without it the GPUGrid app doesn't load a CPU thread heavily enough to make your CPU boost.
ID: 45216
Thanks, SWAN_SYNC seems to help; now utilization is at 72% even when only GPUGRID is running.
ID: 45222
Would getting a faster CPU help? (i5 6600k/i7 6700k)

No, it won't.
ID: 45223
If you want to maximize GPU usage on an operating system which has WDDM (Windows 7, 8, 8.1, 10), you should limit BOINC's "Use at most X % of the CPUs" setting to 100 / (number of CPU cores) * (1 + number of GPU tasks), rounded up:
8 CPU cores + 2 GPU tasks: 100/8*(1+2) = 37.5 [38%]
12 CPU cores + 3 GPU tasks: 100/12*(1+3) = 33.333 [34%]
4 CPU cores + 2 GPU tasks: 100/4*(1+2) = 75 [75%]
Theoretically this calculation can result in more than 100% (2 CPU cores + 2 GPUs: 100/2*(1+2) = 150); in this case you should type 100%, and not crunch CPU projects at all.

Another method is to set the number of CPUs in the cc_config.xml file. The number should be set to the number of GPU tasks + 1. Do not set it higher than the number of your CPU's threads. For example, for 2 GPU tasks you should replace the 2 with 3 in the example below.
Copy the following to the clipboard:
notepad c:\ProgramData\BOINC\cc_config.xml
Press Windows key + R, then paste and press Enter. If you see an empty file, copy and paste the following:
<cc_config>
<options>
<ncpus>2</ncpus>
</options>
</cc_config>
If your cc_config.xml already has an <options> section and there is no <ncpus> tag in it, insert the line <ncpus>2</ncpus> right after the <options> tag. Click File -> Save, then click [Save]. If your BOINC Manager is running, click Options -> Read config files.

How not to crunch on the iGPU (the Intel GPU integrated into recent Intel CPUs):
1. Do not attach to projects with Intel (OpenCL) clients, or disable this application in the project's computing preferences (it is practical to use a different venue for these hosts).
2. Disable the iGPU in the cc_config.xml file. Copy the following to the clipboard:
notepad c:\ProgramData\BOINC\cc_config.xml
Press Windows key + R, then paste and press Enter. If you see an empty file, copy and paste the following text:
<cc_config>
<options>
<ignore_intel_dev>0</ignore_intel_dev>
</options>
</cc_config>
If your cc_config.xml already has an <options> section and there is no <ignore_intel_dev> tag in it, insert the line <ignore_intel_dev>0</ignore_intel_dev> right after the <options> tag. Click File -> Save, then click [Save]. If your BOINC Manager is running, you can click Options -> Read config files.

To apply the SWAN_SYNC environment variable:
Click Start, copy & paste systempropertiesadvanced and press Enter.
Click on [Environment Variables].
In the lower section called "System variables", click the [New] button below the list.
Type SWAN_SYNC in the Variable name field.
Type 1 in the Variable value field.
Click [OK] 3 times.
Exit BOINC Manager, choosing to stop the running science applications.
Start BOINC Manager.

To run two GPUGrid tasks on a single GPU:
The app_config.xml file should be placed in the project's directory (by default it's c:\ProgramData\BOINC\projects\www.gpugrid.net\).
Copy the following to the clipboard:
notepad c:\ProgramData\BOINC\projects\www.gpugrid.net\app_config.xml
Press Windows key + R, then paste and press Enter. Copy & paste the following text:
<app_config>
<app>
<name>acemdlong</name>
<gpu_versions>
<gpu_usage>0.5</gpu_usage>
<cpu_usage>1.0</cpu_usage>
</gpu_versions>
</app>
<app>
<name>acemdshort</name>
<gpu_versions>
<gpu_usage>0.5</gpu_usage>
<cpu_usage>1.0</cpu_usage>
</gpu_versions>
</app>
</app_config>
Click File -> Save, then click [Save]. Exit BOINC Manager, choosing to stop the running science applications, then start BOINC Manager again. (If your BOINC Manager is running, you can click Options -> Read config files instead.)
ID: 45224
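The percentage rule and the <ncpus> rule in the post above can be expressed as two small helpers. A Python sketch; the function names are hypothetical, and the code simply encodes the rule of thumb from the post (one CPU thread per GPU task plus one more), rounding up as in the worked examples and capping at 100% / the thread count.

```python
# BOINC CPU-limit helpers for a WDDM host running GPUGrid, following the
# rule of thumb above: reserve one thread per GPU task plus one extra.
# Rounding up matches the worked examples (37.5 -> 38, 33.333 -> 34).

def cpu_percentage(cpu_threads: int, gpu_tasks: int) -> int:
    """Value for 'Use at most X % of the CPUs', capped at 100."""
    if cpu_threads < 1 or gpu_tasks < 0:
        raise ValueError("need at least one CPU thread and a non-negative task count")
    percent = -(-100 * (1 + gpu_tasks) // cpu_threads)  # integer ceiling division
    return min(percent, 100)

def ncpus_setting(cpu_threads: int, gpu_tasks: int) -> int:
    """Value for <ncpus> in cc_config.xml: GPU tasks + 1, never above the thread count."""
    return min(gpu_tasks + 1, cpu_threads)

if __name__ == "__main__":
    for threads, tasks in [(8, 2), (12, 3), (4, 2), (2, 2)]:
        print(f"{threads} threads, {tasks} GPU tasks: "
              f"{cpu_percentage(threads, tasks)}% or "
              f"<ncpus>{ncpus_setting(threads, tasks)}</ncpus>")
```

Integer ceiling division avoids floating-point surprises such as 100/12*4 landing a hair above or below 33.333. The 2-thread case is capped at 100%, which corresponds to the advice above not to crunch CPU projects on such a host.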
Thanks, I'll try these solutions!
ID: 45225
Thanks Retvari Zoltan, now my card works at 96-98% utilisation on Windows 10 (driver 375.70) with 2 tasks.
ID: 45228
I've successfully installed Ubuntu 16.04 LTS on one of my hosts.
ID: 45236
Is this environment variable handled by the new (9.14) Linux app?

To answer my own question: I think the new (9.14) Linux app doesn't support SWAN_SYNC=1, as I started BOINC from the terminal with
sudo /usr/bin/boinc --dir /var/lib/boinc-client
and the CPU usage remained 7-8% (it should be 25%). I had checked beforehand that SWAN_SYNC=1 is listed by
sudo printenv
This feature should be added.
ID: 45237
1fdq-SDOERR_OPMcharmm6-0-1-RND3215_1 is the longest WU I've encountered to date on my GTX 1070.
ID: 45269
1fdq-SDOERR_OPMcharmm6-0-1-RND3215_1 is the longest WU I've encountered to date on my GTX 1070.

GPU clocks?
ID: 45273
1fdq-SDOERR_OPMcharmm6-0-1-RND3215_1 is the longest WU I've encountered to date on my GTX 1070.

2.1GHz core and 3.8GHz (7.6GHz effective) memory; the out-of-the-box boost is 2012MHz. My Pascal throttles in 12.5MHz increments every 8°C starting at 32°C, so I set a +110MHz offset to keep a constant 2.1GHz.
ID: 45275
Does anyone know how to make GPUGRID work with two different cards: a Pascal (cuda80) and a GTX 670 (cuda65)? When I put the new card into the computer, my old card stopped working with GPUGRID. Do you know how to solve this problem?
ID: 45322
Does anyone know how to make GPUGRID work with two different cards: a Pascal (cuda80) and a GTX 670 (cuda65)? When I put the new card into the computer, my old card stopped working with GPUGRID. Do you know how to solve this problem?

Basically, no. It would take either the app sorting that out, or two different queues so that you could manipulate your BOINC config files to do what you want. At present the cuda80 app is exclusively for Pascals and the cuda65 app doesn't work for Pascals. The cuda80 app has also populated all queues, which is fine for most people's setups. If possible, move one of the GPUs to another system. In theory you could have two instances of BOINC with different drive locations and exclude one GPU for each instance, but in practice running two instances of BOINC just doesn't work.
ID: 45323