Message boards : Graphics cards (GPUs) : Gigabyte GTX 780 Ti OC (Windforce 3x) problems
Author | Message |
---|---|
I have serious problems with my new card, and after trying to make it work with GPUGrid for a whole day - quite frankly - I've run out of ideas. | |
ID: 34426 | Rating: 0 | rate: / Reply Quote | |
ok already done on Point 4 sry ^^ | |
ID: 34429 | Rating: 0 | rate: / Reply Quote | |
I also doubt the card is defective but if it can't run GPUgrid tasks with the numerous tweaks you've tried then something is wrong somewhere. | |
ID: 34430 | Rating: 0 | rate: / Reply Quote | |
I am sorry to hear you have problems with your new card, Zoltan. I can't help you: if you don't know the answer yourself, then I certainly don't. The only thing I can think of is that these cards are too powerful, but that can't be true? | |
ID: 34435 | Rating: 0 | rate: / Reply Quote | |
I also doubt the card is defective but if it can't run GPUgrid tasks with the numerous tweaks you've tried then something is wrong somewhere. Errrr..... I've Googled terms like "GV-N78TOC-3GD problem", "problem GV-N78TOC-3GD", "fubar GV-N78TOC-3GD", "GV-N78TOC-3GD fail", etc. and haven't found any reports or reviews speaking of problems. That's pioneer's fate. Do any of the diagnostic programs you've run test with CUDA or in CUDA mode? I ask because if you can demonstrate that it consistently fails on CUDA but not OpenGL or alternatively fails OpenGL and CUDA but nothing else then you have something definite that you can present to Gigabyte and/or NVIDIA if you want to motivate them to investigate and perhaps issue a driver update or BIOS update. PrimeGrid's PPS Sieve and GeneFer are CUDA (I'm not sure about the version though). As I tested further with PrimeGrid, the GeneFer CUDA client got stuck once, and another task failed; I'm not sure why. FurMark is OpenGL. Heaven's Benchmark is DX11 & tessellation is OpenGL4.0. The point is you've more or less eliminated clock and voltage settings, mobo and drivers as the problem. The only other possible things I can think of to experiment with are: The card first failed under WinXPx64; I've switched to Win7x64 and Win8.1x64 only for further testing of the card without interrupting the crunching. 2) CUDA vs. OpenGL vs. "game mode" or whatever the correct term is My guess is that GPU Boost 2.0 makes mistakes, or that the algorithm in the GPUGrid client which detects when the simulation becomes unstable does, or that I've got a very tricky error in my card. 3) localized over-heating which I explain below I got that. As you say, it's very unlikely that this is the source of my problems. However, I'll check the heatsink if I can remove it without voiding the warranty. If you have good vision you can spot badly curved/warped surfaces easily with what I call "the straight edge and light test" which is a very common test. 
You probably are familiar with how it works but don't rely on it unless you know your vision is good. A much better way of measuring flatness is to use a dial gauge on a pivot as it will easily show defects as small as .005 inches if used properly. A top quality dial gauge and pivot are expensive but there are less expensive models that are accurate enough for the job we're talking about. Or you can take the GPU and heatsink to a machine shop and pay them to check the flatness. From my experience, a much more common error is that one or two corners of the heatsink are not fastened well, so it's touching the chip only on one (or two) edge(s), not on its entire surface. | |
ID: 34450 | Rating: 0 | rate: / Reply Quote | |
I have two questions. I'm still crunching under WinXPx64 on that host. I've installed Win7 only to have DX11 and some other fancy stuff for the graphical tests. 2. I downloaded and installed Kepler BIOS Tweaker as you suggested in the thread I started, but all I see are empty fields and clicking in it does not work, I can not put any values in. What did I do wrong there? The previous (1.25) version can extract / flash the BIOS from the card if nvflash.exe is located in its folder. The latest one (1.26) can't, it can manipulate the firmware image in a file, so you have to extract / flash it manually with nvflash.exe (GPU-Z can extract the BIOS from the card through GUI, but it also uses a built-in copy of nvflash.exe) | |
ID: 34451 | Rating: 0 | rate: / Reply Quote | |
I've flashed the working card's BIOS to the OC card (they have different vendor and PCI subsystem IDs, so I was a little concerned about doing it). The card was working ok, but the GPUGrid client still fails. I'm afraid that I have to sell this beautiful card to a gamer... | |
ID: 34452 | Rating: 0 | rate: / Reply Quote | |
From my experience it's much more common error that one or two corners of the heatsink is not fastened well, so it's touching the chip only on one (or two) edge(s), not on its entire surface. Yep, that happens frequently and bit me once a few years ago on a CPU. The temperatures were a little high but still reasonable. It took me a long time to figure it out. ____________ BOINC <<--- credit whores, pedants, alien hunters | |
ID: 34453 | Rating: 0 | rate: / Reply Quote | |
Zoltan, you have already tried a lot, and as you have tried the GPU on different rigs and with different operating systems, my guess is that the card is a dud, most likely GDDR/some capacitor. Alternatively the GPUGrid app/s simply don't work with it due to some obscure oddity in the app/bespoke card (slim chance). | |
ID: 34456 | Rating: 0 | rate: / Reply Quote | |
Having just completed a new Haswell build, I have learned that a motherboard that is quite stable with its own internal graphics can go bananas once I insert a GTX 660 and try to run BOINC/GPUGrid. As soon as BOINC starts (barely having time to reach the desktop, if that), I get BSODs. That itself is not so unusual, but like you I had already underclocked/overvolted the card sufficiently that it should have worked fine. | |
ID: 34457 | Rating: 0 | rate: / Reply Quote | |
1) a curved heat spreader on the GPU Yesterday I had the nerve to dismount the heatsink from the failing card. Two things surprised me: 1. There are only 7 screws fixing the whole heatsink assembly to the card (on the original one there are 4 bigger and 4 smaller screws for the GPU alone). 2. The thermal grease and the surface of the heatsink over the GPU were OK, and the thermal pads for the RAM chips were also fine, but the one long thermal pad for the 8 chips of the GPU's power supply was too short, so it had been stretched to reach the 8th chip and had become too thin between the 7th and 8th. I cut some strips from the unnecessary parts of the long side of the thermal pad and put them on the 8th chip. Unfortunately, this didn't help. I haven't been able to disassemble the card since then, but I'll do it on the 27th when I get home. | |
ID: 34461 | Rating: 0 | rate: / Reply Quote | |
Zoltan, you have already tried a lot, and as you have tried the GPU on different rigs and with different operating systems, my guess is that the card is a dud, most likely GDDR/some capacitor. Alternatively the GPUGrid app/s simply don't work with it due to some obscure oddity in the app/bespoke card (slim chance). Yesterday I checked both sides of the card (I had checked the back side before). There are a lot of components on the front side of the board, so it's impossible to check every solder joint (not to mention the BGA packaging of the GPU and the RAM chips, which covers the actual soldering). But I didn't notice any sloppy soldering, or loose or missing components (there are unused component spaces on the board, but that's normal). The PCIe power connectors (2x8 pin) are a little different on this card than usual: the latch(?) (I don't know what it's called even in my native language) into which the clamp of the PCIe connector clicks is much shallower than usual (about 1/5th of the normal), so it's much easier to remove the PCIe power connectors from the card. The most interesting part is that the card is running PrimeGrid tasks just fine. (I know that they're not comparable to GPUGrid tasks.) I could make the card consume more power with FurMark than when crunching GPUGrid tasks, and it ran for about an hour without errors. I let Heaven Benchmark run for a whole night, and it didn't produce any artifacts. Some loose/desperate suggestions (possibly already covered): I hadn't tried short runs before you asked, but the short run also failed. What was installed with the NVidia drivers; all the 3D and sound crap or just the drivers? On the WinXPx64 host only the graphics drivers are installed; on the same hardware I've installed everything under Win7, and on the other MB with Win8.1 I've also installed everything. Have you tried lowering the power target (MSI Afterburner...)? Yep. Lowering, increasing. What about System Power settings? 
Never go to sleep, never turn off monitor (never give up, never surrender :)) NVidia control panel settings? Prefer max performance. That's why I've installed everything under Win7. For a moment I thought it helped, but after 10 secs the WU crashed. PhysX pointing where - specific GPU or CPU (not sure that even matters though)? It points to the GPU, but the WinXPx64 doesn't have PhysX, so that's irrelevant I guess. Have you tried to drop the GPU memory to 3000MHz? I've tried 3400MHz and 3300MHz. I just did something new: flashed the BIOS of the graphics card in a RealVNC session :). It's now down to 3000MHz. Motherboard Bios upgrade (long shot). Both MBs have had the latest BIOS installed since before it all began: GA-Z87X-OC: F6, DH87RL: 0323 CPU drivers from Intel (might be messing with the bus)? Do such drivers exist? Could you please give me a link? Chipset update? The latest chipset drivers are installed (9.4.0.1027) Different versions of Boinc (perhaps even completely uninstalling Boinc and then reinstalling). I don't like series 7 of the BOINC manager, but after the first couple of failures I upgraded to 7.2.33 from 6.10.60. The only difference I've noticed is that now I can see status messages such as "Trying to restart unstable simulation" instead of "waiting for GPU memory" (the latter made me upgrade to the latest BOINC manager). But it didn't help. Tried Linux? I'm kind of a Windows guy. :) I even hate PowerShell (namely the concept that there are a lot of things you can't do through the GUI). I don't know Linux. I think it's not a good idea to try something unknown to fix a tricky error. Besides, I don't believe that the source is the OS, because another GTX 780 Ti is working fine on my system. | |
ID: 34465 | Rating: 0 | rate: / Reply Quote | |
Have you tried to drop the GPU memory to 3000MHz? Wow! A short run has finished at a 3000MHz memory clock. There were two restarts, so I'm lowering the RAM frequency to 2900MHz and trying a long run. | |
ID: 34467 | Rating: 0 | rate: / Reply Quote | |
I see you have had 2 more successful runs and one failure, | |
ID: 34476 | Rating: 0 | rate: / Reply Quote | |
I see you have had 2 more successful runs and one failure, Yep, the error is still there, even though the RAM now runs at 2800MHz. Sometimes lowering the frequency makes things worse, and I don't think that lowering the RAM frequency even further will completely solve the problem of my card. Despite the memory drop that's still around 28% faster than my GTX770. This card runs under Win8.1, but I'll put it in my WinXPx64 host (if I can fix it), and it will be even faster :). For each task, the logs show the GPU temps going up to 64C and then the card sometimes stops working for a while. This suggests to me that there is something not right with the cooling. While the GPU is fine, I suspect the GDDR5 is not (or something related to the GDDR). Perhaps a bad module that might not get used by other types of work. I'll dismount the cooler assembly once again when I get home, and check the thermal pads again, but I think this is either a memory power line failure or a RAM chip failure. I guess that some of the capacitors have insufficient capacity, or are sloppily soldered (or missing). First I'll check it with my naked eye (through my reading glasses), but if I don't find anything suspicious I'll take a couple of macro photos of different parts of the card, and check the photos. Now that I know which part of the card malfunctions, it's not a mysterious error anymore, and I have ideas about finding and fixing the fault. If I can't fix it, I can still RMA the card, as now I'm confident that this card is bad, and I can prove it to the RMA guys. | |
ID: 34477 | Rating: 0 | rate: / Reply Quote | |
I see you have had 2 more successful runs and one failure, Well, fortunately I wasn't right about that: I've put this card in my WinXPx64 host's PCIe 2.0 x4 slot, and it had a couple of errors, so I've lowered the RAM frequency to 2700MHz, and now it's running smoothly. There is a 2000 sec loss compared to the (standard and oc-ed) card in the PCIe 3.0 x16 slot. I'll try the OC card in the PCIe 3.0 x8 slot (which will make the standard card run at x8 as well) to see how much of the loss is caused by the lowered RAM frequency. | |
ID: 34514 | Rating: 0 | rate: / Reply Quote | |
I just got my new 780ti OC windforce 3x and have the same problems. My other card is fine but this card fails almost instantly with computation error. I tried on both Ubuntu (319.76 and 331.20 drivers) and Windows with no joy so far.. I haven't tried lowering the clocks yet I am still troubleshooting it :( | |
ID: 34553 | Rating: 0 | rate: / Reply Quote | |
Yes, this card is working fine since I've lowered its memory clock to 2700MHz. | |
ID: 34558 | Rating: 0 | rate: / Reply Quote | |
Yes, this card is working fine since I've lowered its memory clock to 2700MHz. Yes, it would be great if someone with the OC version from another brand could report how that goes. I had planned to buy an EVGA 780Ti OC, but as my "normal" 780Ti heavily underperforms yours under Win7, I decided to wait for Maxwell. But I will try Linux first in the coming days. ____________ Greetings from TJ | |
ID: 34562 | Rating: 0 | rate: / Reply Quote | |
You've answered my unasked question: is this problem by design, or just my card is faulty? From your perspective it is the design of the card and I agree with your perspective. If you ask the manufacturer and present all the evidence you have uncovered, their response might be like "It's a problem with the application, that card is for gaming applications where a few errors won't be noticed. It's not for data crunching that requires high precision and reliability." Do you plan to RMA it? IIUC, you have had to downclock the memory below the frequency used on the standard model (not OC), yes? ____________ BOINC <<--- credit whores, pedants, alien hunters | |
ID: 34565 | Rating: 0 | rate: / Reply Quote | |
From your perspective it is the design of the card and I agree with your perspective. If you ask the manufacturer and present all the evidence you have uncovered, their response might be like "It's a problem with the application, that card is for gaming applications where a few errors won't be noticed. It's not for data crunching that requires high precision and reliability." I'm aware of (and accept) the manufacturer's perspective. However, none of my previous OC cards showed such a flaw, including theirs. It's very strange that the memory clock is the one that had to be reduced to 77% of its original frequency to fix this problem. However, it is much harder to make the RMA guys at the shop where I bought this card accept this error condition, since I'm sure that they test graphics cards only with games and 3D accelerator tests (which show no problem at all). Do you plan to RMA it? No, as it is working now, and the replacement card would probably have the same flaw. It is a better option to sell this card to a gamer, and buy a different OC card (from a different manufacturer, or the new version of this card). IIUC, you have had to downclock the memory below the frequency used on the standard model (not OC), yes? Yes, since the OC and the non-OC card originally have the same memory frequency (3500MHz). | |
ID: 34569 | Rating: 0 | rate: / Reply Quote | |
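As a side note to the 77% figure in the post above, the arithmetic can be sketched in a few lines of Python. The 384-bit bus width and the assumption that the reported memory clock (3500MHz) corresponds to twice that many transfers per second are not stated in the thread; they are taken from the card's published specifications, so treat the bandwidth numbers as rough estimates rather than measurements.

```python
def clock_ratio(reduced_mhz, stock_mhz):
    """Fraction of the stock memory clock that remains after downclocking."""
    return reduced_mhz / stock_mhz


def gddr5_bandwidth_gbs(reported_mhz, bus_width_bits=384):
    """Theoretical peak bandwidth in GB/s.

    Assumption: the effective data rate is twice the reported clock
    (3500MHz reported -> 7000 MT/s), and the bus is 384 bits wide.
    """
    transfers_per_second = reported_mhz * 1e6 * 2
    bytes_per_transfer = bus_width_bits / 8
    return transfers_per_second * bytes_per_transfer / 1e9


if __name__ == "__main__":
    stock, reduced = 3500, 2700  # MHz, both values come from the thread
    print(f"ratio: {clock_ratio(reduced, stock):.1%}")
    print(f"stock bandwidth:   {gddr5_bandwidth_gbs(stock):.1f} GB/s")
    print(f"reduced bandwidth: {gddr5_bandwidth_gbs(reduced):.1f} GB/s")
```

Under these assumptions the downclock costs roughly a quarter of the theoretical memory bandwidth, which is consistent with the run-time loss reported earlier in the thread being noticeable but not dramatic.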
hello Retvari Zoltan* thanks for the problem solution. Can run mine at 3100MHz | |
ID: 34805 | Rating: 0 | rate: / Reply Quote | |
hello Retvari Zoltan* thanks for the problem solution. You're welcome! The thanks goes to skgiven as well. Can run mine at 3100MHz You've had more luck with your card than I've had with mine, but it could be because you run it under Win8.1. If you ran it under WinXP, you would probably have to reduce the memory clock frequency a little further. If I have some time and guts, I'll try to replace the power buffering capacitors around the RAM chips on my card with higher-capacity ones. | |
ID: 34810 | Rating: 0 | rate: / Reply Quote | |
I finally got around to flashing my card down to 2700 memory and now it works fine under Linux. It's a little slower than yours (about 1500 seconds), but I am also running worldcommunitygrid on every other thread, and the ambient has been about 25°C here recently. | |
ID: 34811 | Rating: 0 | rate: / Reply Quote | |
When you had your GPU stripped, did you notice what type of GDDR5 your Gigabyte Windforce 3X card was using? | |
ID: 34830 | Rating: 0 | rate: / Reply Quote | |
When you had your GPU stripped, did you notice what type of GDDR5 your Gigabyte Windforce 3X card was using? It's using 12 pieces of Hynix H5GQ2H24AFA R2C. There are 6 groups of 2 (adjacent) RAM chips. Each group has a FET along with 5 resistors and 2 capacitors between the RAM chips it belongs to. One of the capacitors is bigger. I suspect that either this bigger capacitor is not big enough for 2 RAM chips, or the capacitors around the whole memory array are not big enough for the array. It would be nice to have the schematic of this board, or at least a recommended circuit diagram from Hynix. | |
ID: 34833 | Rating: 0 | rate: / Reply Quote | |
Have you checked Gigabyte's website for an errata sheet and/or correction sheet dealing with this issue? I suspect many hundreds of their customers are having the same issue for exactly the same reason (wrong component or failed component such as a cap or resistor) and I would think by now Gigabyte is aware of the problem. If not then the more people who report it and the workaround (downclocking the memory) the sooner they will become aware they have a big problem and a potential big blow to their reputation and move on the issue. | |
ID: 34834 | Rating: 0 | rate: / Reply Quote | |
H5GQ2H24AFA R2C or, | |
ID: 34849 | Rating: 0 | rate: / Reply Quote | |
How is it possible to have a crunching time like yours, Zoltan (about 16.000 sec.), on the last WU? I have 4 GPUs like you, GTX780ti, and my crunching time is 24.000 sec. Why? | |
ID: 34909 | Rating: 0 | rate: / Reply Quote | |
Your GTX780Ti temperatures look very cool - too cool. If you are not using water cooling, your GPUs may be downclocking. | |
ID: 34911 | Rating: 0 | rate: / Reply Quote | |
How is it possible to have a crunching time like yours, Zoltan (about 16.000 sec.), on the last WU? I have 4 GPUs like you, GTX780ti, and my crunching time is 24.000 sec. Why? You have the same as I have. To get Zoltan's time you need XP or Linux. There is a thread about it: http://www.gpugrid.net/forum_thread.php?id=3580 ____________ Greetings from TJ | |
ID: 34916 | Rating: 0 | rate: / Reply Quote | |
48 CPU threads vs 8 threads - less resource conflict, HT doesn't scale really well for some CPU/GPU project combinations. I gasped when I saw 48 too but it turns out his CPU has 12 real cores (24 virtual) and I suspect he might have 2 CPUs. Also, that's a socket 2011 CPU with 40 PCIe lanes so I doubt there is congestion on the PCIe bus unless his mobo is lane restricted. BTW, I saw ads for that CPU... $2,500 US!!! Gattorantolo... get with the penguin :-) ____________ BOINC <<--- credit whores, pedants, alien hunters | |
ID: 34918 | Rating: 0 | rate: / Reply Quote | |
48 CPU threads vs 8 threads - less resource conflict, HT doesn't scale really well for some CPU/GPU project combinations. The memory bandwidth could limit the performance of the GPU tasks, as these Xeon E5-2695v2 and E5-2697v2 processors are basically two i7-4960X processors within a single package: they have a lowered clock frequency (to stay within the 130W TDP) and some (safety) features turned on, but they have only the same 4-channel memory interface as the i7-4960X, capable of 59.7GB/s. So the two CPU chips inside the physical CPU share its memory interface (and its bandwidth). If the two physical processors share this physical memory interface, this could reduce the bandwidth further. But I think that the two physical processors have separate physical memory, yet they can access each other's physical memory through QPI or somehow, and this method can also reduce the throughput of the memory interface. If CPU tasks are running on all cores (virtual+real) of both physical CPUs, this impact could be big enough to reduce the performance of the GPU tasks by 10-20%. These hosts have Windows 8(.1) on them, which is not a server OS, so I'm sure it's not aware of aligning the memory allocation of an application to the physical memory of the CPU it's running on. The application can even be switched over to the other physical CPU, which is a time-consuming process, and will force the CPU to handle the application's data transfer through (and with the help of) the other physical CPU. I think that only the Datacenter versions of the MS server OSes can handle this complex task. I don't know Linux, so there may be such an edition of that OS as well. | |
ID: 34923 | Rating: 0 | rate: / Reply Quote | |
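The 59.7GB/s figure quoted in the post above can be reproduced with a short calculation. This is only a sketch: the DDR3-1866 memory speed and the 8-byte (64-bit) channel width are assumptions that do not appear in the thread, and the per-thread split is a naive even division that ignores caches and NUMA effects.

```python
def peak_bandwidth_gbs(channels, transfers_per_second, bytes_per_transfer=8):
    """Theoretical peak memory bandwidth in GB/s for a multi-channel interface."""
    return channels * transfers_per_second * bytes_per_transfer / 1e9


def naive_share_per_thread(total_gbs, threads):
    """Even split of the peak bandwidth across busy threads (a rough upper bound)."""
    return total_gbs / threads


if __name__ == "__main__":
    ddr3_1866 = 1866e6  # assumed memory speed in transfers per second
    total = peak_bandwidth_gbs(channels=4, transfers_per_second=ddr3_1866)
    print(f"peak per socket: {total:.1f} GB/s")
    # 24 threads per CPU (12 cores with HT), as mentioned earlier in the thread
    print(f"per thread: {naive_share_per_thread(total, 24):.2f} GB/s")
```

With every thread busy on CPU tasks, the share left for feeding each GPU task is small under this model, which is the gist of the argument in the post; real contention depends on the workloads.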
Your GTX780Ti temperatures look very cool - too cool. If you are not using water cooling, your GPUs may be downclocking. Water cooling of course :-) Thank you for your help GPUGRID cruncher ;-), now I know the "Problem"! ____________ Member of Boinc Italy. | |
ID: 34926 | Rating: 0 | rate: / Reply Quote | |
Gattorantolo, what are your Boinc processor usage settings, GPU clocks, and the RAM frequency? | |
ID: 34927 | Rating: 0 | rate: / Reply Quote | |
As an aside, Linux scales VERY well, which is why it's used in data centers, including Microsoft's. Oh quit pulling my leg. ____________ BOINC <<--- credit whores, pedants, alien hunters | |
ID: 34929 | Rating: 0 | rate: / Reply Quote | |
So I was able to complete this one task-gluilex2x33-NOELIA_DIPEPT1-0-2-RND9057_2 | |
ID: 34932 | Rating: 0 | rate: / Reply Quote | |
The successful task was a NOELIA_DIPEPT Work Unit. Typically these WU's utilize the GPU to a lesser extent. The task is quite different than the SANTI_MAR tasks. | |
ID: 34944 | Rating: 0 | rate: / Reply Quote | |
Just for information, I went to the Gigabyte site and they are now selling a card called GV-N78TOC-3GD (Rev. 1.0). Is it the same as the one you described having problems? Other question, does this problem occur also with the similar GV-N78TGHZ-3GD? | |
ID: 35031 | Rating: 0 | rate: / Reply Quote | |
GV-N78TOC-3GD (Rev. 1.0) is the same card as I got in Thailand which has the problems. It works ok now with the memory downclocked but runs at the same speed as my 780 oc'd which is disappointing (which is also the same speed as my titan which is sitting unused :/) | |
ID: 35032 | Rating: 0 | rate: / Reply Quote | |
GV-N78TOC-3GD (Rev. 1.0) is the same card as I got in Thailand which has the problems. It works ok now with the memory downclocked but runs at the same speed as my 780 oc'd which is disappointing (which is also the same speed as my titan which is sitting unused :/) Oh dear, such a waste. Just send me that unused Titan, and I'll put it in one of my hosts. :) | |
ID: 35033 | Rating: 0 | rate: / Reply Quote | |
Just for information, I went to the Gigabyte site and they are now selling a card called GV-N78TOC-3GD (Rev. 1.0). Is it the same as the one you described having problems? Yes, it's the same, my card is rev 1.0 Other question, does this problem occur also with the similar GV-N78TGHZ-3GD? Good question, I hope someone will answer that, as I don't plan to buy one just to find out. | |
ID: 35034 | Rating: 0 | rate: / Reply Quote | |
I still think their memory voltages are wrong. | |
ID: 35035 | Rating: 0 | rate: / Reply Quote | |
Gigabyte released a new firmware (ver. F3) for this card in April. | |
ID: 37058 | Rating: 0 | rate: / Reply Quote | |
No luck at 3500MHz - The task I referred to in my previous post failed after 5168 sec. | |
ID: 37061 | Rating: 0 | rate: / Reply Quote | |
Thanks for the update! Oh well :/ | |
ID: 37062 | Rating: 0 | rate: / Reply Quote | |
Forgive me, as I'm late to this thread, but... have you tried setting the GPU fans manually, via Precision-X or MSI Afterburner, to the maximum fan % allowed for that GPU, just to see if keeping the GPU cooler will have an effect? Set it to maximum for 2 days, to test, maybe? | |
ID: 37063 | Rating: 0 | rate: / Reply Quote | |
I sympathize with you, Retvari; having problems with a new, shiny piece of kit is a frustrating experience... I'm in a similar situation, with a 750Ti acting in a psychotic manner. | |
ID: 37064 | Rating: 0 | rate: / Reply Quote | |
Forgive me, as I'm late to this thread, but... have you tried setting the GPU fans manually, via Precision-X or MSI Afterburner, to the maximum fan % allowed for that GPU, just to see if keeping the GPU cooler will have an effect? Set it to maximum for 2 days, to test, maybe? I've set a manual fan 'curve' in MSI Afterburner before I got this card: 20°C:40% -> 80°C:100% I've tried every trick in the book on this card, none of them helped except reducing the RAM clock to 2700MHz. This card rarely goes above 70°C: This workunit was processed at 33°C ambient temperature, and GPU 1 max temp was 70°C. | |
ID: 37065 | Rating: 0 | rate: / Reply Quote | |
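The fan 'curve' described in the post above (20°C:40% -> 80°C:100%) can be written out as a small function. This is a sketch that assumes plain linear interpolation between the two points and clamping outside them; how MSI Afterburner actually evaluates its curve is not described in the thread.

```python
def fan_percent(temp_c, low=(20.0, 40.0), high=(80.0, 100.0)):
    """Fan duty in percent for a two-point curve given as (temperature, percent).

    Assumption: linear interpolation between the points, clamped at both ends.
    """
    low_t, low_pct = low
    high_t, high_pct = high
    if temp_c <= low_t:
        return low_pct
    if temp_c >= high_t:
        return high_pct
    return low_pct + (temp_c - low_t) * (high_pct - low_pct) / (high_t - low_t)


if __name__ == "__main__":
    for temp in (20, 33, 65, 70, 80):
        print(f"{temp}°C -> {fan_percent(temp):.0f}% fan")
```

With this curve the fans would already be at about 90% at the 70°C maximum mentioned in the post, so there is little headroom left to gain from forcing them to 100%.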
There was a successful I770-SANTI_p53final at 3300MHz, but an e2s948_e1s373f83-SANTI_marsalWTbound2 and a 2x118-NOELIA_TRPS1S4 have failed. | |
ID: 37066 | Rating: 0 | rate: / Reply Quote | |
My Gigabyte GTX 780Ti OC is crunching fine (i.e. without "Simulation unstable" messages) at 3100MHz RAM clock for more than 1 day now. | |
ID: 37071 | Rating: 0 | rate: / Reply Quote | |
Your 3.1GHz ties in well with what I thought the situation might be. Assuming H5GQ2H24AFR R2C, these require 1.6V to support 3.5GHz. My solution would be to stick with it at 3.1GHz, if it proves to be stable, or sell the card and get an equivalent second hand card that does run at 3.5GHz. 288larsson who posted in this thread also has a Gigabyte GTX 780 Ti OC (Windforce 3x) GPU. Alas I don't know how to change the GDDR5 voltage. ____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help | |
ID: 37120 | Rating: 0 | rate: / Reply Quote | |
I wrote to Gigabyte support, they replied that I should try my card the way I did back in December. | |
ID: 37125 | Rating: 0 | rate: / Reply Quote | |
H5GQ2H24AFA R2C or, You were right: my card is built on 8 pieces of H5GQ2H24AFR R2C. The only excuse for my mistake is that the font they use makes it very hard to tell apart the "A" from the "R", especially when the oil from the thermal pads is covering the chip's package. I've got two more failures (NOELIA_THROMBIN units) on this card, so now my card is down at 3.0GHz. I've found a capacitor around the RAM chips (on the other side of the PCB), on which the voltage is measured as 1.58~1.633 volts. I think I should check it using an oscilloscope, to find out if the capacitor is undersized. But first I have to know the right spot. I didn't find any info on this board's wiring. | |
ID: 37138 | Rating: 0 | rate: / Reply Quote | |
There's a new BIOS version (F4) for this card on Gigabyte's website. | |
ID: 37967 | Rating: 0 | rate: / Reply Quote | |
The task running on this card got a couple of "Simulation became unstable" messages in the stderr.txt file, so I took down the GDDR5 clock by another 100MHz (now it runs at 3.3GHz). | |
ID: 37968 | Rating: 0 | rate: / Reply Quote | |
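Checking a task's stderr.txt for the message mentioned in the post above can be automated. The sketch below only counts lines that contain the phrase quoted in the thread; the sample text is made up for illustration and is not real GPUGrid output.

```python
def count_unstable(stderr_text, marker="Simulation became unstable"):
    """Count the lines of a stderr dump that contain the instability message."""
    return sum(1 for line in stderr_text.splitlines() if marker in line)


if __name__ == "__main__":
    # Hypothetical excerpt; only the quoted phrase itself comes from the thread.
    sample = (
        "# Time per step: 1.0 ms\n"
        "# Simulation became unstable. Terminating to avoid lock-up\n"
        "# Restarting from checkpoint\n"
        "# Simulation became unstable. Terminating to avoid lock-up\n"
    )
    print(count_unstable(sample), "unstable restarts found")
```

A count above zero on a task that eventually finished is the early warning used in this thread to decide on lowering the memory clock before a workunit fails outright.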
There were "Simulation became unstable" messages, and a failed WU at 3.3GHz. | |
ID: 37971 | Rating: 0 | rate: / Reply Quote | |
Have you considered running Heaven, to determine how far you may need to downclock? If you can get Heaven to run at max settings overnight with no issues, then I'd consider it stable. | |
ID: 37973 | Rating: 0 | rate: / Reply Quote | |
Have you considered running Heaven, to determine how far you may need to downclock? If you can get Heaven to run at max settings overnight with no issues, then I'd consider it stable. When I first tested the card, I did. The only application that failed is GPUGrid's. See the first post of this thread. BTW the card seems to be stable at 3.2GHz, but different workunit batches could use different parts of the GPU. I suspect that something is messed up with the GDDR5 voltage, or with the PSU of the memory subsystem on this card series. | |
ID: 37977 | Rating: 0 | rate: / Reply Quote | |
I apologize - although I did read most of the thread, I did miss the part where you said you tested with Heaven. | |
ID: 37980 | Rating: 0 | rate: / Reply Quote | |
How can one read these dmp files Jacob? | |
ID: 37987 | Rating: 0 | rate: / Reply Quote | |
I don't know. I think, if you have the Windows SDK and development tools installed, and have Windows symbols available, you might be able to step through them. But that's all beyond my ability. | |
ID: 37988 | Rating: 0 | rate: / Reply Quote | |
How can one read these dmp files Jacob? Having Visual Studio 2013 helps with reading certain files, but most can be read with Notepad if text is involved (the watchdog files are mostly text). If you have a .dmp file in your C:\Windows\LiveKernelReports\WATCHDOG directory, you can read it with Notepad. Run it as admin; in non-admin mode you'll see a "user doesn't have access" prompt. For stress testing, use a game that is hard on the GPU (BF4, Metro 2033/Last Light); if you don't have any games on your disk, Heaven is a great stress tool, and the 3DMark Vantage benchmark has looping TMU, ROP and memory tests that strain cards to their limits. The Fire Strike Extreme benchmark loops too, and will fail on a card that is overclocked too far. If you have the Nvidia CUDA samples, the n-body test can be looped; a card will also fail the random number samples if it is overclocked too high. This is how I found my cards' best temps and voltage. A custom BIOS with Nvidia Inspector .bat files, as Jacob has shown for setting "Max boost", works wonders once you know the card's limits for core/memory speeds and voltage. The new GM204 can run at 1.000V at 1.2 GHz, and at 1.025V as well. Overclocking records have passed 2GHz with LN2 (GM204 is the first card ever to break 2GHz). Many cards have reached 1.5 GHz with air cooling and stock voltage. GM204 is truly an engineering feat. The number of features added, along with the new filtering tech, really raises Molecular Dynamics function for single precision. | |
ID: 38007 | Rating: 0 | rate: / Reply Quote | |
Have you considered running Heaven, to determine how far you may need to downclock? If you can get Heaven to run at max settings overnight with no issues, then I'd consider it stable. I see this card runs ~80C. Do you know the temperatures of the voltage regulation on the card? The VRM runs over 100C on some GTX780ti cards (it is rated for 110C on your card). Gigabyte has been the worst offender, judging from 780ti owner boards. The VRMs on eVGA and Asus 780ti's are rated at 120-125C. Do you know the temps of the DDR memory? These run really hot on certain GTX780ti's. Zotac's new GTX970/980, along with eVGA's 900 series, have the highest rated core/boost speeds. | |
ID: 38008 | Rating: 0 | rate: / Reply Quote | |
Thanks for your help eXaPower, but I have tried notepad, wordpad but no normal reading is possible. | |
ID: 38010 | Rating: 0 | rate: / Reply Quote | |
Thanks for your help eXaPower, but I have tried notepad, wordpad but no normal reading is possible. I neglected to mention that Win8.1 Notepad will open these types of files only if Visual Studio has been installed beforehand, whether the license is expired or current. While your host can be tweaked, try fiddling with the Windows System32 program list to see if one of the programs will open it, or...... You can gain access to the newest Visual Studio version (to create/test custom-made programs or custom files) with the new CUDA 6.5.19 toolkit; if you want the full Visual VC++ redist, Microsoft has a developer account (you can block all info being sent to them; just do a custom install). The trial period is 60-90 days. Once the CUDA toolkit and Visual Studio are linked together, a world of learning DIY programming opens up. Nvidia has a debugging program with their Registered Developer program. Intel has a great AVX/FMA3 DIY programming tool. AMD is an HSA member. Linux is intertwined with NVidia HSA. A lot of options are available. Freedom of choice. | |
ID: 38014 | Rating: 0 | rate: / Reply Quote | |
I see this card runs ~80C. No. This card (GPU1) runs at 65-70°C. The other card (GPU0), which is a standard NVidia design, runs fine at 3.5GHz and 80°C. Do you know the voltage control temps on the card? I don't know the voltage control temps, but I think they should be lower than the other card's, as this card has more phases on its VRM and better cooling. Do you know the temps for the DDR memory? These run really hot on certain GTX 780 Ti cards. I don't know that either, but the same reasoning applies to the RAM chips as to the VRM chips. | |
ID: 38015 | Rating: 0 | rate: / Reply Quote | |
I see this card runs ~80C. If time permits, you can read the temps manually with the proper equipment: thin-gauge wires with the correct metal probes to attach (do you have tools for your Gigabyte Ti?), or a kit for electrical or temperature readouts if you already have one. | |
ID: 38021 | Rating: 0 | rate: / Reply Quote | |
How can one read these dmp files Jacob? Hi TJ, you can read those .dmp files with BlueScreenView: http://www.nirsoft.net/utils/blue_screen_view.html#DownloadLinks Just download the app, unzip it, and run the resulting file called BlueScreenView.exe. Then go to the Options menu and click on "Advanced Options", click the radio button that says "load a single minidump file", point it to the folder that was mentioned, C:\Windows\LiveKernelReports\WATCHDOG, and pick the .dmp file you want. I hope the results give you what you're looking for. | |
ID: 38063 | Rating: 0 | rate: / Reply Quote | |
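The BlueScreenView steps above need the path of one .dmp file. As a minimal sketch (not part of the original posts), the Python snippet below lists the dump files in the WATCHDOG folder mentioned in this thread, newest first, so the most recent one is easy to pick. The function name `list_dump_files` is hypothetical, and the script only lists files; it does not parse them. Reading that folder may require an elevated (admin) prompt, as noted earlier in the thread.

```python
from pathlib import Path

# Folder named in the thread; adjust if your dumps live elsewhere.
WATCHDOG_DIR = Path(r"C:\Windows\LiveKernelReports\WATCHDOG")


def list_dump_files(directory):
    """Return the .dmp files in `directory`, newest first (hypothetical helper)."""
    directory = Path(directory)
    if not directory.is_dir():
        # Folder missing (or not a Windows host): nothing to list.
        return []
    dumps = [p for p in directory.iterdir()
             if p.is_file() and p.suffix.lower() == ".dmp"]
    # Sort by last-modified time so the latest crash dump comes first.
    return sorted(dumps, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    for dump in list_dump_files(WATCHDOG_DIR):
        print(dump)
```

The first path printed is the most recent dump, which is the one to select in BlueScreenView's "load a single minidump file" option.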
I had another failed workunit on this card, so I took another 100MHz off; it's now running at a 3.1GHz GDDR5 clock. | |
ID: 38072 | Rating: 0 | rate: / Reply Quote | |
Thanks JugNut I will try it over the weekend. | |
ID: 38078 | Rating: 0 | rate: / Reply Quote | |
Hello, | |
ID: 38105 | Rating: 0 | rate: / Reply Quote | |
Your 3.1GHz ties in well with what I thought the situation might be, Hello! How come these 780 Ti cards are doing OK on all other projects but GPUGRID? If it were a hardware issue, one should have problems on all projects, I guess? Thank you | |
ID: 38107 | Rating: 0 | rate: / Reply Quote | |
Hello ! Hello Philippe, It's because the GPUGrid app is the most advanced one. It's compiled with the latest CUDA version, so it can utilize the card like no other project's app can. The "GPU usage" measurement is misleading. Could you please specify all details of your GTX780Ti (Manufacturer, model, clocks), and your PSU (Manufacturer, model, wattage, efficiency)? | |
ID: 38114 | Rating: 0 | rate: / Reply Quote | |
Hello Zoltan, | |
ID: 38119 | Rating: 0 | rate: / Reply Quote | |
All WU's crashing / errors => | |
ID: 38120 | Rating: 0 | rate: / Reply Quote | |
I see this card runs ~80C. In this review of your card (right one?), they measured temps using thermal imaging and found the VRM is running quite hot (87C @ load) http://www.guru3d.com/articles_pages/gigabyte_geforce_gtx_780_ti_windforce_3x_review,9.html | |
ID: 38127 | Rating: 0 | rate: / Reply Quote | |
Hello !
They also run OK on PPS Sieve (PrimeGrid) ... I will probably build a 100% water-cooled crunchbox, but not with the 780 Ti; I will wait until the 980s are accepted by GPUGRID. In the meantime, any idea what I can do in order to be able to crunch on GPUGRID? I can use EVGA Precision X to decrease power or temp ... Thank you, Philippe | |
ID: 38129 | Rating: 0 | rate: / Reply Quote | |
Interesting, I remember having to cool the back of a GPU to keep it stable (might have been a ref GTX660 or 650TiBoost). I just used a case fan angled up at the bottom of the card. | |
ID: 38130 | Rating: 0 | rate: / Reply Quote | |
The GPU temp is monitored using EVGA Precision X, and the fan speed (in %) = Temp + 10, so the 3 "Windforce" fans are already running very fast, and the 3 external fans are helping with heat dispersal ... | |
ID: 38131 | Rating: 0 | rate: / Reply Quote | |
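The fan rule quoted above (fan speed in % = temperature + 10) can be written out as a short sketch. The function name `fan_speed_percent` and the clamping to the 0-100% range are assumptions for illustration; the thread does not say how EVGA Precision X handles values outside that range.

```python
def fan_speed_percent(temp_c, offset=10.0):
    """Fan duty in percent for a GPU temperature in degrees C (hypothetical helper).

    Implements the rule from the post: fan % = temperature + 10.
    Clamping to 0-100 is an assumption, not something the thread states.
    """
    return max(0.0, min(100.0, temp_c + offset))


if __name__ == "__main__":
    for temp in (65, 70, 80, 95):
        print(f"{temp} C -> {fan_speed_percent(temp):.0f} %")
```

Under this rule a GPU at 70°C already drives the fans at 80%, which fits the remark that the fans are running very fast.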
My point is that the GPU fans cool the top of the card, but not the back. | |
ID: 38132 | Rating: 0 | rate: / Reply Quote | |
Thank you for your message ! | |
ID: 38133 | Rating: 0 | rate: / Reply Quote | |
Would GPU WaterCooling be the ideal solution ? | |
ID: 38143 | Rating: 0 | rate: / Reply Quote | |
EDIT : It seems that today not only 780Ti have problems :/ | |
ID: 38161 | Rating: 0 | rate: / Reply Quote | |
I do not think that water cooling is the answer as all tasks are failing on your card. | |
ID: 38174 | Rating: 0 | rate: / Reply Quote | |
Thank you for your message. | |
ID: 38244 | Rating: 0 | rate: / Reply Quote | |