GDF
http://en.wikipedia.org/wiki/Comparison_of_Nvidia_graphics_processing_units#GeForce_600_Series
I don't think that table is correct; the FLOPS figures are too high.
gdf
skgiven
Far too high; unless the 256 CUDA cores of a GTX 650 at $179 really will outperform the 1024 CUDA cores of a year-old GTX 590 ($669). No chance; that would kill their existing market, and you know how NVidia likes to use the letter S.
My calculated guess is that a GTX 680 will have a peak of around 3000 to 3200 GFLOPS - just over twice that of a GTX 580, assuming most of the rest of the info is reasonably accurate.
When it comes to crunching, a doubling of the 500-generation performance would be a reasonable expectation, but 4.8 times seems too high.
I don't see how XDR2 would in itself double performance, and I doubt that architectural enhancements will squeeze out massive performance gains given that the die has dropped from 520 to 334 mm²; transistor count will apparently remain about the same.
Perhaps for some enhanced application that fully uses the performance of XDR2 you might see such silly numbers, but for crunching I wouldn't expect anything more than a 2.0 to 2.5 times increase in performance (generation on generation).
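As a back-of-the-envelope check of that guess (a minimal sketch: it assumes the usual 2 single-precision FLOP per shader per clock from fused multiply-add, and the second configuration is purely hypothetical, not a leaked spec):

#include <stdio.h>

/* Theoretical peak single-precision GFLOPS = shaders * shader clock (GHz) * 2 (one FMA = 2 FLOP).
   The GTX 580 line uses the published figures; the second line only shows what a
   "roughly twice GTX 580" part would need in shaders x clock. */
static double peak_gflops(int shaders, double shader_clock_ghz) {
    return shaders * shader_clock_ghz * 2.0;
}

int main(void) {
    printf("GTX 580 (512 shaders @ 1.544 GHz): %4.0f GFLOPS\n", peak_gflops(512, 1.544));   /* ~1581 */
    printf("Hypothetical 2x part (1024 @ 1.544): %4.0f GFLOPS\n", peak_gflops(1024, 1.544)); /* ~3162 */
    return 0;
}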
|
Wow, that looks totally stupid! Looks like a boy's Christmas wish list. Well, before every new GPU generation you'll find a rumor for practically every possible (and impossible) configuration floating around.
- XDR2 memory: it seems the rumor guys have fallen in love with it. It's not on GCN, though. And if I were nVidia I'd consider it too risky to transition the entire new product lineup at once. There'd need to be serious production capacity ready by now.
- Traditionally nVidia goes with wider memory buses rather than higher clocks, which matches their huge chips well. I don't see any reason for this to change.
- The core clocks are much higher than even AMD's HD 7970 (925 MHz on pre-release slides). Traditionally nVidia's core clocks have been lower and I see no reason why this should change now.
- The shader clocks are totally through the roof. They hit 2.1 GHz on heavily OC'ed G92s, but the stock clocks have been hovering around 1.5 GHz for a long time. Going any higher hurts power efficiency.
- They introduced a fixed factor of 2 between base and shader clock with Fermi. Why would they change it again with Kepler? I'd expect this to stay, for some time at least.
- 3.0 billion transistors for the flagship would actually be lower than GF100 and GF110 at ~3.2 billion. At the same time the shader count is said to increase to 640, and the shaders support more advanced features (i.e. they must become bigger). Unless Fermi was a totally inefficient chip (I'm talking about the design, not the GF100 chip!), I don't expect this to be possible.
- Just 190 W TDP for their flagship? They've been designing power-constrained monster chips for some time now. If these specs were true, rest assured that Kepler would have gotten a lot more shaders.
- The proposed die size of 334 mm² actually looks reasonable for a 3.0 billion transistor chip at 28 nm.
- The astronomical FLOPS are a direct result of the insane clock speeds. Not going to happen.
Overall the proposed data looks more like a traditional ATI "mean & lean" design than an nVidia design.
They may be able to push clock speeds way higher if they used even more hand-crafted logic rather than synthesized logic (like in a CPU). Count me in for a pleasant surprise if they actually pull that off (it requires tons and megatons of work).
MrS
|
Rumors say there's a serious bug in the PCIe 3 part of the Kepler A1 stepping, which means the introduction will have to wait for another stepping -> maybe April.
MrS
|
Now there are rumours that they will be released this month.
Tom's Hardware: "Rumor: Nvidia Prepping to Launch Kepler in February"
|
|
|
I don't expect anything from nVidia until April.
In any case, I hope AMD and nVidia continue to compete vigorously. |
MarkJ
Another article with a table of cards/chip types:
here
It's a bit blurry. I can't claim credit for finding this one; it was posted by one of the guys over at Seti. Interesting spec sheets though.
|
That's the same as what's being posted here. Looks credible to me, for sure. A soft evolution of the current design, no more CC 2.1-style superscalar shaders (all 32 shaders per SM). Even the expected performance compared to AMD fits.
However, in the comments people seem very sure that there's no "hot shader clock" in Kepler. That's strange and would represent a decisive redesign. I'd go as far as to say nVidia needs the "2 x performance per shader" from the hot clock. If they removed it, they'd either have to increase the whole chip clock (unlikely) or perform a serious redesign of the shaders: make them more power efficient (easy at lower clocks) and either greatly improve their performance (not easy) or make them much smaller (which was not done here, according to these specs).
So overall.. let's wait for April then :D
MrS
Zydor
Charlie's always good for a read on this stuff - he seems to have mellowed in his old age just lately :)
GK104:
http://semiaccurate.com/2012/02/01/physics-hardware-makes-keplergk104-fast/
GK110:
http://semiaccurate.com/2012/02/07/gk110-tapes-out-at-last/
I hope the increasing rumours on performance are true - whether it's real raw power or sleight of hand aimed at gamers, either way it's a win for consumers, as prices will trend down with competition, an aspect that's been sorely lacking in the last 3 years.
2012 shaping up to be a fun year :)
Regards
Zy |
|
|
|
GK110 release in Q3 2012.. painful for nVidia, but quite possible given they don't want to repeat Fermi and it's a huge chip, which needs another stepping before final tests can be made (per some other news 1 or 2 weeks ago).
And the other article: very interesting read. If Charlie is right (and he has been right in the past) Kepler is indeed a dramatic departure from the current designs.
MrS
skgiven
No more CC 2.1-like issues would mean choosing a GF600 NVidia GPU to contribute towards GPUGrid will be easier; basically down to what you can afford.
|
They are probably going to be CC 3.0. Whatever that will mean ;)
MrS
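(As an aside, once a card is in hand, the compute capability and SM count the driver reports can be read with a few lines of CUDA runtime code; a minimal sketch, nothing GPUGrid-specific, error checking omitted:)

#include <stdio.h>
#include <cuda_runtime.h>

/* Print compute capability, multiprocessor count and global memory per device.
   Note totalGlobalMem is a size_t, so large memory sizes don't wrap negative. */
int main(void) {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("Device %d: %s, CC %d.%d, %d multiprocessors, %zu MB global memory\n",
               i, prop.name, prop.major, prop.minor,
               prop.multiProcessorCount, prop.totalGlobalMem / (1024 * 1024));
    }
    return 0;
}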
skgiven
I'm concerned about the 256-bit memory path and comments such as "The net result is that shader utilization is likely to fall dramatically". Suggestions are that unless your app uses physics tuned 'for Kepler' performance will be poor, but if it does, performance will be good. Of course only games sponsored by Nvidia will be physically enhanced 'for Kepler', not research apps.
With NVidia (and AMD) going out of their way to have patches coded for games that tend to be used for benchmarking, the Internet's dubious information on technology will be even more salty and pinched. So wait for a Kepler app and then see the performance before buying.
GDF
If the added speed is due to some new instructions, then we may or may not be able to take advantage of them; we have no idea yet. Memory bandwidth should not be a big problem.
gdf |
|
|
|
GK104 is supposed to be a small chip. With a 256-bit memory bus you can easily get HD 6970 performance in games without running into limitations. Try to push performance considerably higher and your shaders will run dry. That's what Charlie suggests.
This is totally unrelated to GP-GPU performance: just take a look at how little bandwidth MW requires. It "depends on the code".. as always ;)
And, as GDF said, if nVidia made the shaders more flexible (they probably did) and more efficient for game physics, this could easily benefit real physics (the equations will be different, the general scheme rather similar).
MrS
GDF
Some interesting info:
http://wccftech.com/alleged-nvidia-kepler-gk104-specs-exposed-gpu-feature-1536-cuda-cores-hotclocks-variants/ |
GDF
http://www.brightsideofnews.com/news/2012/2/10/real-nvidia-kepler2c-gk1042c-geforce-gtx-670680-specs-leak-out.aspx
If this is real, it seems that Kepler multiprocessors are doubled GF104 MPs. I hope they work better than GF104 for compute.
gdf |
skgiven
If you can only use 32 of the 48 CUDA cores on GF104, then you could be looking at 32 from 96 with Kepler, which would make them no better than existing hardware. Obviously they might have made changes that allow for easier access, so we don't know that will be the case, but the ~2.4 times performance over GF104 should be read as 'maximum' performance, as in 'up to'. My impression is that Kepler will generally be OK cards with some exceptional performances here and there, where physics can be used to enhance performance. I think you will have some development to do before you get much out of the card, but hey, that's what you do!
GDF
No, it should be at least 64/96, but I still hope they have improved the scheduling.
Anyway, with such changes there will be time for optimizations.
gdf |
|
|
|
32 from 96 would mean going 3-way superscalar. They may be green, but they're not mad ;)
As GDF said, 64 of 96 would retain the current 1.5-way superscalar ratio. And seeing how this did OK, but not terribly well, I'd also say they'd rather increase the number of warps in flight than this ratio. I wouldn't be surprised if they process each of the 32 threads/pixels/whatever in a warp in one clock, rather than 2 times 16 in 2 clocks.
And don't forget that shader clock speeds are down, so don't expect a linear speed increase with shader number. Anyway, it's getting interesting!
MrS
GDF
I wouldn't be surprised if they process each of the 32 threads/pixels/whatever in a warp in one clock, rather than 2 times 16 in 2 clocks.
MrS
That's what it seems from the diagram; they have 32 load/store units now.
|
|
|
They've got the basic parameters of the HD 7970 totally wrong, even though it was officially introduced 2 months ago. Performance is also wrong: it should be ~30% faster than the HD 6970 in games, but they're saying 10%. They could argue that their benchmark is not what you'd typically get in games.. but then what else is it?
I'm not going to trust their data on unreleased hardware ;)
MrS
GDF
It seems that we are close
http://semiaccurate.com/2012/03/05/nvidia-will-launch-gk104keplergtx680-in-a-week/
gdf |
|
|
|
More news.
March 8: Nvidia briefs the press.
March 12: Nvidia will paper-launch the cards.
March 23-26: cards go on sale.
http://semiaccurate.com/2012/03/08/the-semiaccurate-guide-to-nvidia-keplergk104gtx680-launch-activities/
Zydor
More rumours ... Guru3D article:
http://www.guru3d.com/news/nvidia-geforce-gtx-680-up-to-4gb-gddr5/
Regards
Zy |
Zydor
Alleged pics of a 680...
http://www.guru3d.com/news/new-nvidia-geforce-gtx-680-pictures-surface/
Regards
Zy |
GDF
Better pictures, benchmarks and specifications.
http://www.tomshardware.com/news/Nvidia-Kepler-GeForce-GTX680-gpu,15012.html
It should be out on 23rd March, but by the time it gets to Barcelona it's going to be May or June.
If somebody can give one to the project, we can start porting the code earlier. This seems to be an even bigger change than the Fermi cards were.
|
|
|
|
I still have a bad feeling about the 1536 CUDA cores.... |
|
|
|
What sort of "bad feeling"? |
skgiven
I had the same sort of "bad feeling" - these CUDA cores are not what they used to be, and the route to using them is different. Some things could be much faster if PhysX can be used, but if not, who knows.
http://www.tomshardware.com/news/hpc-tesla-nvidia-GPU-compute,15001.html Might be worth a look.
|
I wouldn't worry about it. I'm pretty sure the 6xx cards will be great. If they're not, you can always buy more 5xx cards at plummeting prices. There's really no losing here I think. |
GDF
Well, at the very least they seem to be like the Fermi GPUs with 48 cores per multiprocessor, which we know have comparatively poor performance.
I hope they have figured it out; otherwise, without code changes, it might well just be on par with a GTX 580.
|
|
|
|
There's a price to be paid for increasing the shader count by a factor of 3 while even lowering TDP. 28 nm alone is by far not enough for this.
It seems Kepler is more in line with AMD's vision: provide plenty of raw horsepower and make it "OK to use" - not as bad as with VLIW, but not as easy as previously. Could be the two teams are converging towards rather similar architectures with Kepler and GCN. The devil's just in the details and software.
(I haven't seen anything but rumors on Kepler, though)
MrS
skgiven
Suggested price is $549, and the suggested 'paper' launch date is 22nd March.
With the 1536 shaders being thinner than before, similar to AMD's approach, getting more work from the GPU and reaching the shaders might be the challenge this time.
The proposed ~195 W TDP sits nicely between an HD 7950 and 7970, and noticeably lower than the 244 W of the GTX 580 (which is 25% higher), so even if it can just match a GTX 580 the energy savings are not insignificant. The price however is a bit daunting, and until a working app is developed (which might take some time) we will have no idea of performance compared to the GTX 500s.
|
What sort of "bad feeling"?
I have two things on my mind:
1. The GTX 680 looks to me more like an improved GTX 560 than an improved GTX 580. If the GTX 560's bottleneck is present in the GTX 680, then GPUGrid could utilize only 2/3 of its shaders (i.e. 1024 of 1536).
2. It could mean that the Tesla and Quadro series will be improved GTX 580s, and we won't have an improved GTX 580 in the GeForce product line.
|
|
|
Hi. What I find most strange is the following relationship:
GTX 580 = 3,000 million transistors, 512 cores (GF110)
GTX 680 = 3,540 million transistors, 1536 cores (GK104)
I do not understand how, with so few extra transistors, they can triple the core count.
GTX 285 = 1,400 million transistors, 240 cores, 470 mm² die
GTX 580 = 3,000 million transistors, 512 cores, 520 mm² die
GTX 680 = 3,540 million transistors, 1536 cores, 294 mm² die
These numbers do not add up for me; the relationship between these values for the GTX 200 and GTX 500 does not fit the GTX 600 evolution.
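The puzzle is easier to see as transistors per core (a rough sketch using only the figures quoted above; uncore logic such as memory controllers and caches is lumped in, so the ratios overstate the per-core cost):

#include <stdio.h>

/* Transistors per CUDA core, from the numbers in the post above. */
int main(void) {
    const struct { const char *name; double mtrans; int cores; } gpus[] = {
        { "GTX 285 (GT200b)", 1400.0,  240 },
        { "GTX 580 (GF110)",  3000.0,  512 },
        { "GTX 680 (GK104)",  3540.0, 1536 },
    };
    for (int i = 0; i < 3; ++i)
        printf("%-18s %5.2f M transistors per core\n",
               gpus[i].name, gpus[i].mtrans / gpus[i].cores);
    /* ~5.8, ~5.9 and ~2.3 M per core: the Kepler "cores" are much simpler
       (no hot clock, simpler scheduling), which is part of the answer. */
    return 0;
}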
|
|
|
That's because Kepler fundamentally changes the shader design. How is not exactly clear yet.
@Retvari: and that's why comparisons to GTX560 are not relevant here. I'm saying it's going to be great, just that it'll be very different.
BTW: in the past nVidia chips got rather close to their TDP under "typical" loads, i.e. games. There an HD 7970 hovers around the 200 W mark; 250 W is just the PowerTune limit.
Edit: further information for the brave.. the original is Chinese.
MrS
|
That's because Kepler fundamentally changes the shader design. How is not exactly clear yet.
@Retvari: and that's why comparisons to GTX560 are not relevant here. I'm saying it's going to be great, just that it'll be very different.
I know, but none of the rumors comfort me. I remember how much was expected of the GTX 460-560 line, and they are actually great for games, but not so good at GPUGrid. I'm afraid that nVidia want to separate their gaming product line from the professional product line even more than before.
I'd like to upgrade my GTX 590 because it's too noisy, but I'm not sure it will be worth it.
We'll see it in a few months. |
GDF
They cannot afford to separate gaming and computing. The chips will still need to be the same for economies of scale, and there is more and more interest in computing within games.
Changes are good; after all there are more shaders, we just have to learn how to use them. As it is the flagship product, we are prepared to invest a lot in it.
gdf |
|
|
|
Well.. even if they perform "only" like a Fermi CC 2.0 with 1024 or even 768 Shaders: that would still be great, considering they accomplish it with just 3.5 billion transistors instead of 3.2 billion for 512 CC 2.0 shaders. That's significant progress anyway.
MrS
|
Well.. even if they perform "only" like a Fermi CC 2.0 with 1024 or even 768 shaders: that would still be great, considering they accomplish it with just 3.5 billion transistors instead of 3.2 billion for 512 CC 2.0 shaders. That's significant progress anyway.
Agreed! Don't forget power consumption too. I want a chip, not a stove!
The industry will never make a huge jump. They have to get a return on their research investment. Two small steps are always more profitable than one big one.
|
Otherwise people will be disappointed the next time you "only" make a medium step.. ;)
MrS
Zydor
Pre-Order Site in Holland - 500 Euros
http://www.guru3d.com/news.html#15424
3DMark 11 benchmark, which if verified is interesting. I am being cautious about gaming claims until I know about any embedded PhysX code. The 3DMark 11 bench is however more interesting. If that translates into the compute side as well as it indicates.... could be interesting. Still, let's await reality, but I hope it is as good as the 3DMark 11 result; competition is sorely needed out there.
http://www.guru3d.com/news/new-gtx-680-benchmarks-surface/
Regards
Zy |
|
|
|
I was thinking about what could be the architectural bottleneck which results in the under-utilization of the CUDA cores in the CC 2.1 product line.
The ratio of the other units to the CUDA cores in a shader multiprocessor is increased compared to the CC 2.0 architecture, except for the load/store units.
While CC 2.0 has 16 LD/ST units for 32 CUDA cores, CC 2.1 has 16 LD/ST units for 48 CUDA cores.
And what do I see in the latest picture of the GK104 architecture?
There are 32 LD/ST units for 192 CUDA cores. (there were 64 LD/ST units in the previous 'leaked' picture)
If these can feed only 64 CUDA cores here at GPUGrid, then only 512 of the 1536 shaders could be utilized here.
Now that's what I call a bad feeling.
But I'm not a GPGPU expert, and these pictures could be misleading.
Please, prove me that I'm wrong. |
skgiven
I wouldn't expect things to work straight out of the box this time. I concur with Zoltan on the potential accessibility issue, or the worsening of it. I'm also concerned about the potential loss of CUDA core function; what did NVidia strip out of the shaders? Then there is the much-speculated reliance on PhysX and the potential movement of some functionality onto the GPU. So it looks like app development might keep Gianni away from mischief for some time :)
The memory bandwidth has not increased from the GTX 580, leaving space for a GTX 700 perhaps, and there is no mention of OpenCL 1.2 or DirectX 11.1 that I can see. In many respects NVidia and AMD have either swapped positions or reached parity this time (TDP, die size, transistor count). Perhaps NVidia will revert to type in a future incarnation.
|
The problem with CC 2.1 cards should have been the superscalar arrangement. It was nicely written down by Anandtech here. In short: one SM in CC 2.0 cards works on 2 warps in parallel. Each of these can issue one instruction per cycle for 16 "threads"/pixels/values. With CC 2.1 the design changed: there are still 2 warps with 16 threads each, but both can issue 2 instructions per clock if the next instruction is not dependent on the result of the current one.
Load/Store units could also be an issue, but I think this is much more severe.
MrS
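Purely to illustrate what "superscalar" means here, a minimal CUDA sketch (hypothetical toy kernels, nothing to do with the actual GPUGrid app): the first kernel is a single dependent chain and gives a CC 2.1 dispatcher nothing to dual-issue, while the second offers independent instructions.

#include <cuda_runtime.h>

// One long dependent chain: each FMA needs the previous result, so a
// superscalar (CC 2.1) SM cannot dual-issue and part of its cores sit idle.
__global__ void dependent_chain(float *out, int iters) {
    float a = 1.0f;
    for (int i = 0; i < iters; ++i)
        a = a * 1.000001f + 0.5f;          // serial dependency
    out[threadIdx.x + blockIdx.x * blockDim.x] = a;
}

// Two independent chains: consecutive instructions don't depend on each
// other, so the scheduler can issue them in parallel (instruction-level
// parallelism), which is what the 48-core SMs need to stay busy.
__global__ void independent_chains(float *out, int iters) {
    float a = 1.0f, b = 2.0f;
    for (int i = 0; i < iters; ++i) {
        a = a * 1.000001f + 0.5f;          // chain 1
        b = b * 0.999999f + 0.25f;         // chain 2, independent of chain 1
    }
    out[threadIdx.x + blockIdx.x * blockDim.x] = a + b;
}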
Zydor
680 SLI 3DMark 11 benchmarks (Guru3D via a VrZone benching session)
http://www.guru3d.com/news/geforce-gtx-680-sli-performance-uncovered/
Regards
Zy |
|
|
|
The problem with CC 2.1 cards should have been the superscalar arrangement. It was nicely written down by Anandtech here. In short: one SM in CC 2.0 cards works on 2 warps in parallel. Each of these can issue one instruction per cycle for 16 "threads"/pixels/values. With CC 2.1 the design changed: there are still 2 warps with 16 threads each, but both can issue 2 instructions per clock if the next instruction is not dependent on the result of the current one.
Load/Store units could also be an issue, but I think this is much more severe.
MrS
The Anandtech article you linked was quite enlightening.
I failed to compare the number of warp schedulers in my previous post.
Since then I've found a much better figure of the two architectures.
Comparison of the CC 2.1 and CC 2.0 architectures:
Based on that Anandtech article and the picture of the GTX 680's SMX, I've concluded that it will be superscalar as well. There are twice as many dispatch units as warp schedulers, while in the CC 2.0 architecture their number is equal.
There are 4 warp schedulers for 192 CUDA cores in the GTX 680's SMX, so at the moment I think GPUGrid could utilize only 2/3 of its shaders (1024 of 1536), just like on the CC 2.1 cards (which have 2 warp schedulers for 48 CUDA cores), unless nVidia has built some miraculous component into the warp schedulers.
In addition, based on the transistor count I think the GTX 680's FP64 capabilities (which are irrelevant at GPUGrid) will be reduced or perhaps omitted.
|
|
|
They cannot afford to separate gaming and computing. The chips will still need to be the same for economy of scale and there is a higher and higher interest in computing within games.
Changes are good, after all there are more shaders, we just have to learn how to use them. As it is the flagship product we are prepared to invest a lot on it.
gdf
I remember the events before the release of the Fermi architecture: nVidia showed different double precision simulations running much faster in real time on Fermi than on GT200b. I haven't seen anything like that this time. Furthermore there is no mention of ECC at all in the rumors of GTX 680.
It looks to me that this time nVidia is going to release their flagship gaming product before the professional one. I don't think they simplified the professional line that much.
What if they release a slightly modified GF110 made on 28nm lithography as their professional product line? (efficiency is much more important in the professional product line than peak chip performance - of course it would be faster than the GF110 based Teslas) |
|
|
|
Glad to hear it was the right information for you :)
I think there's more going on. Note that in CC 2.1 they had 3 blocks of 16 shaders, which are arranged in 6 columns with 8 shaders each in the diagram. In the GK104 diagram, however, there are columns of 16 shaders. If these were still blocks of 16 shaders, there would be 12 of the blocks, which in turn would require 12 dispatch units - much more than available.
This wouldn't make sense. What I suppose they did instead is to arrange the shaders in blocks of 32, so that all threads within a warp can be scheduled at once (instead of taking 2 consecutive clocks). In this case there'd be "only" 6 of these blocks to distribute among 4 warps with 8 dispatch units.
Worst case we should still see only 2/3 of the shaders utilized. However, there are 4 warps instead of 2 now. Still (as in CC 2.1) every 2nd warp needs to provide some instruction suitable for parallel execution, but load balancing should be improved.
And there's still the chance they increased the "out of order window", which is the number of instructions that the hardware can look ahead to find instructions suitable for superscalar execution. As far as I understand, this had only been the next instruction in CC 2.1.
I too suppose it's not going to be a DP monster - and it doesn't have to be, as a mainly consumer / graphics oriented card. Leave that for GK100/GK110 (whatever the flagship will be).
MrS
SMTB1963
Looks like some guys over at XS managed to catch tom's hardware with their pants down. Apparently, tom's briefly exposed some 680 performance graphs on their site and XS member Olivon was able to scrape them before access was removed. Quote from Olivon:
An old habit from Tom's Hardware. The important thing is to be quick
LOL!
Anyways, the graphs that stand out:
Other relevant (for our purposes) graphs:
skgiven
Release date is supposed to be today!
I expect Europe has to wait for the US to wake up before the official reviews start. Until then TweakTown's unofficial review might be worth a look, but there's no CUDA testing, just games.
There is an NVidia Video here.
The card introduces GPU Boost (Dynamic Clock Speed), and 'fur' fans will be pleased!
LegitReviews posted suggested GK110 details, including release date.
|
2304 is another fancy number, regarding the powers of 2.
Probably the next generation will contain 7919 CUDA cores. :) |
5pot
Interesting and tantalizing numbers. Can't wait to see how they perform. |
GDF
It appears that they are actually available for real, at least in the UK.
So it is not a paper launch.
gdf |
skgiven
nvidia-geforce-gtx-680-review by Ryan Smith of AnandTech.
Compute performance certainly isn't great and FP64 is terrible (1/24)!
They can be purchased online from around £400 to £440 in the UK, though the only ones I can see in stock are £439.99! Some are 'on order'. So yeah, a real launch, but somewhat limited and expensive stock. Also, they are the same price as an HD 7970. While AMD launched both the HD 7970 and HD 7950, NVidia had but one, as yet... This is different from the GTX 480/GTX 470 and GTX 580/GTX 570 launches.
We will have to wait and see how they perform when GPUGrid gets hold of one, but my expectations are not high.
Other Reviews:
Tom’s Hardware
Guru 3D
TechSpot
HardOCP
Hardware Heaven
Hardware Canucks
TechPowerUp
Legit Reviews
LAN OC
Xbit Labs
TweakTown
Phoronix
Tbreak
Hot Hardware
Link Ref, http://news.techeye.net/hardware/nvidia-gtx-680-retakes-performance-crown-barely#ixzz1psRj9zuD
|
Here in Hungary I can see in stock only the Asus GTX680-2GD5 for 165,100 HUF; that's €562.5, or £468.3 (including 27% VAT in Hungary).
I can see a PNY version for €485.5 (£404), and a Gigabyte for €498 (£414.5), but these are not in stock, so these prices might be inaccurate.
|
|
|
So its compute power has actually decreased significantly from the GTX 580?! The Bulldozer fiasco continues. What a disappointing year for computer hardware. |
|
|
|
It's built for gaming, and that's what it does best. We'll have to wait a few more months for their new compute monster (GK110).
MrS
GDF
So far we have no idea how the performance will be here.
I don't expect anything super at the start (GTX 580-like performance), but we are willing to spend time optimizing for it.
gdf |
|
|
|
$499.99 in USA :(
Amazon |
|
|
|
Summarizing the reviews: gaming performance is as we expected; computing performance is still not known, since Folding isn't working on the GTX 680 yet, and the GPUGrid client probably won't work either without some optimization.
|
|
|
http://www.tomshardware.com/reviews/geforce-gtx-680-review-benchmark,3161-14.html
Moreover, Nvidia limits 64-bit double-precision math to 1/24 of single-precision, protecting its more compute-oriented cards from being displaced by purpose-built gamer boards. The result is that GeForce GTX 680 underperforms GeForce GTX 590, 580 and to a much direr degree, the three competing boards from AMD.
Does GPUGRID use 64-bit double-precision math?
GDF
If somebody can run here on a gtx680, let us know.
thanks,
gdf |
GDF
Almost nothing, this should not matter.
gdf
http://www.tomshardware.com/reviews/geforce-gtx-680-review-benchmark,3161-14.html
Moreover, Nvidia limits 64-bit double-precision math to 1/24 of single-precision, protecting its more compute-oriented cards from being displaced by purpose-built gamer boards. The result is that GeForce GTX 680 underperforms GeForce GTX 590, 580 and to a much direr degree, the three competing boards from AMD.
Does GPUGRID use 64-bit double-precision math?
|
|
|
|
If somebody can run here on a gtx680, let us know.
thanks,
gdf
Should be getting an EVGA version tomorrow morning here in UK - cost me £405.
Already been asked to do some other Boinc tests first though. |
|
|
|
It would be nice if you also reported here what you find for other projects - thanks!
MrS
|
|
I wonder how this tweaked architecture will perform with these BOINC projects.
So far compute doesn't seem like Kepler's strong point.
Also, a little off topic...
But is there any progress being made on the AMD side of things?
I haven't heard a single peep about it for over a month.
If the developers still don't have a 7970, fine.
Please at least confirm as much...
Thanks. |
GDF
We have a small one, good enough for testing. The code works on Windows with some bugs. We are assessing the performance.
gdf |
skgiven
It would seem NVidia have stopped support for XP; there are no XP drivers for the GTX 680!
http://www.geforce.com/drivers/results/42929 I think I posted about this a few months ago.
Suggestions are that the 301.10 driver is needed (probably the Windows one).
http://www.geforce.com/drivers/beta-legacy
A Linux 295.33 driver was also released on the 22nd, and NVidia's driver support for Linux is far better than AMD's.
The card's fan profile is such that the fans don't make much noise, so it might get hot. This isn't a good situation. If we can't use WinXP, then we are looking at W7 (and presumably an 11% or more hit in performance)? If we use Linux we could be faced with cooling issues.
The 301.10 driver might work on a 2008 R2 server, but probably not on earlier servers.
Good luck,
|
OK have now got my 680 and started by running some standard gpu tasks for Seti.
On the 580 it would take (on average) 3m 40s to do one task. On the 680 (at normal settings) it takes around 3m 10s.
The card I have is an EVGA so can be overclocked using their Precision X tool.
The first overclock was too aggressive and was clearly causing the GPU tasks to error out; however, lowering the overclock resulted in GPU tasks now taking around 2m 50s each.
Going to try and get a GPUGRID task shortly to see how that goes. |
|
|
|
Tried to download and run 2 GPUGRID tasks, but both crashed out before completing the download, saying acemd.win2382 had stopped responding.
So I'm not sure what the problem is?
|
|
|
Just reset the graphics card back to "normal", i.e. no overclock, and it still errors out. This time it did finish downloading but crashed as soon as it started on the work unit, so it looks like this project does not yet work on the 680?
|
|
|
stderr_txt:
# Using device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce GTX 680"
# Clock rate: 0.71 GHz
# Total amount of global memory: -2147483648 bytes
# Number of multiprocessors: 8
# Number of cores: 64
SWAN : Module load result [.fastfill.cu.] [200]
SWAN: FATAL : Module load failed
Assertion failed: 0, file swanlib_nv.c, line 390
We couldn't be having a 31-bit overflow on that memory size, could we? |
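For reference, -2147483648 is exactly what a 2 GiB value looks like after being pushed through a signed 32-bit integer; a minimal C sketch of that effect (illustrative only, not the actual acemd/SWAN code):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    size_t mem_bytes = 2ULL * 1024 * 1024 * 1024;   /* 2 GiB, as on a GTX 680 */

    /* Converting this to a signed 32-bit int is implementation-defined, but on
       common compilers it wraps to INT32_MIN, the -2147483648 seen above. */
    int32_t wrapped = (int32_t)mem_bytes;
    printf("as size_t : %zu bytes\n", mem_bytes);   /* 2147483648 */
    printf("as int32_t: %d bytes\n", wrapped);      /* -2147483648 */
    return 0;
}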
|
|
|
stderr_txt:
# Using device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce GTX 680"
# Clock rate: 0.71 GHz
# Total amount of global memory: -2147483648 bytes
# Number of multiprocessors: 8
# Number of cores: 64
SWAN : Module load result [.fastfill.cu.] [200]
SWAN: FATAL : Module load failed
Assertion failed: 0, file swanlib_nv.c, line 390
We couldn't be having a 31-bit overflow on that memory size, could we?
In English please?
|
skgiven
The GPUGRID application doesn't support the GTX680 yet. We'll have test units soon and - if there are no problems - we'll update over the weekend or early next week.
MJH
In English: GPUGrid's applications don't yet support the GTX 680. MJH is working on an app and might get one ready soon; over the weekend or early next week.
PS. Your SETI runs show the card has some promise: ~16% faster at stock than a GTX 580 (244 W TDP), or ~30% faster overclocked. Not sure that will be possible here, but you'll know fairly soon. Even if the GTX 680 (195 W TDP) just matches the GTX 580, the performance/power gain might be noteworthy: ~125% of the GTX 580's performance per Watt, or a ~45% gain at 116% of its performance.
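(Worked through with the TDP figures above: 244 W / 195 W ≈ 1.25, so merely matching the GTX 580 already gives ~125% of its performance per Watt, and 1.16 × 1.25 ≈ 1.45, i.e. the ~45% gain at 116% of its performance.)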
|
Thanks for that simple to understand reply :)
I will suspend GPUGRID on that machine until the project does support the 680. |
skgiven
Good idea; no point returning lots of failed tasks!
I expect you will see an announcement when there is a working/beta app.
Thanks,
|
HPCwire - NVIDIA Launches first Kepler GPUs at gamers; HPC version waiting in the wings.
http://www.hpcwire.com/hpcwire/2012-03-22/nvidia_launches_first_kepler_gpus_at_gamers_hpc_version_waiting_in_the_wings.html?featured=top |
matlock
Why would there be cooling issues in Linux? I keep my GTX 560 Ti 448-core very cool by manually setting the fan speed in the nvidia-settings application, after setting "Coolbits" to "5" in xorg.conf.
It would seem NVidia have stopped support for XP; there are no XP drivers for the GTX 680!
http://www.geforce.com/drivers/results/42929 I think I posted about this a few months ago.
Suggestions are that the 301.1 driver is needed (probably Win).
http://www.geforce.com/drivers/beta-legacy
A Linux 295.33 driver was also released on the 22dn, and NVidia's driver support for Linux is >>better than AMD's.
The cards fan profile is such that the fans don't make much noise; so it might get hot. This isn't a good situation. If we can't use with WinXP, then we are looking at W7 (and presumably an 11% or more hit in performance)? If we use Linux we could be faced with cooling issues.
The 301.1 driver might work on a 2008R2 server, but probably not on earlier servers.
Good luck,
skgiven
Well, if we make the rather speculative presumption that a GTX 680 would work with Coolbits straight out of the box, then yes, we can cool a card on Linux, but AFAIK it only works for one GPU and not for overclocking/downclocking. I think Coolbits was more useful in the distant past, but perhaps it will still work for GF600s.
Anyway, when the manufacturer variants appear, with better default cooling profiles, GPU temps won't be something to worry about on any OS.
Cheers for the tip/recap; it's been ~1 year since I put it in an FAQ.
matlock
It appears there may be another usage of the term "Coolbits" (unfortunately) for some old software. The one I was referring to is part of the nvidia Linux driver, and is set within the Device section of the xorg.conf.
http://en.gentoo-wiki.com/wiki/Nvidia#Manual_Fan_Control_for_nVIDIA_Settings
It has worked for all of my nvidia GPUs so far. |
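For anyone who wants to try it, the gist of that page is roughly the following (a sketch only, assuming a single-GPU setup; the exact Coolbits bits and nvidia-settings attribute names vary by driver version, so check the driver README and "nvidia-settings -q all" on your system):

Section "Device"
    Identifier "Device0"
    Driver     "nvidia"
    # Bit 2 (value 4) is the documented bit for manual fan control; matlock uses "5" here.
    Option     "Coolbits" "5"
EndSection

# Then, after restarting X, something like:
nvidia-settings -a "[gpu:0]/GPUFanControlState=1" -a "[fan:0]/GPUCurrentFanSpeed=70"
# (newer drivers renamed the second attribute to GPUTargetFanSpeed)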
skgiven
Thanks Mowskwoz, we have taken this thread a bit off target, so I might move our fan-control-on-Linux posts to a Linux thread later. I will look into NVidia Coolbits again.
I see Zotac intend to release a GTX 680 clocked at 2 GHz!
An EVGA card has already been OC'ed to 1.8 GHz, so the market should see some sweet bespoke GTX 680s in the future.
So much for PCIE 3.0
I see NVidia are listing a GT 620 in their drivers section...
|
Wow, this 680 monster seems to run with the handbrake on: poor performance in OpenCL, worse than the 580 and of course the HD 79x0.
nVidia wants to protect their Quadro/Tesla cards.
Or am I getting it wrong?
http://www.tomshardware.com/reviews/geforce-gtx-680-review-benchmark,3161-15.html
and
http://www.tomshardware.com/reviews/geforce-gtx-680-review-benchmark,3161-14.html
|
|
No, that seems to be the case. OpenCL performance is poor at best, although in the single non-OpenCL bench I saw it performed decently. Not great, but at least better than the 580. Double precision performance is abysmal; it looks like ATI will be holding onto that crown for the foreseeable future. I will be curious to see exactly what these projects can get out of the card, but so far it's not all that inspiring on the compute end of things.
|
|
|
For the 1.8 GHz run, LN2 was necessary. That's extreme and usually yields clock speeds ~25% higher than achievable with water cooling. Reportedly the voltage was only 1.2 V, which sounds unbelievable.
2 GHz is a far stretch from this. I doubt it's possible even with triple-stage phase-change cooling (by far not as cold as LN2, but sustainable). And the article says "probably only for the Chinese market". Hello? If you go to all the trouble of producing such a monster you'll want to sell them on eBay, worldwide. You'd earn thousands of bucks apiece.
And something like "poor OpenCL performance" cannot be stated that generally. It all depends on the software you're running. And mind you, Kepler offloads some scheduling work to the compiler rather than doing it in hardware. This will take some time to mature.
Anyway, as others have said, double precision performance is downright ugly. Don't buy these for Milkyway.
MrS
|
We have a small one, good enough for testing. The code works on Windows with some bugs. We are assessing the performance.
gdf
That's pretty good news.
I'm glad that AMD managed to put out three different cores that are GCN based.
The cheaper cards still have most if not all of the compute capabilities of the HD 7970.
Hopefully there will be a testing app soon and I'll be one of the first in line. ;) |
|
|
|
Okay so this thread has been all over the place.. can someone sum up?
Is the 680 good or bad? |
5pot
They're testing today.
|
|
|
|
Hello: My summary of the several analyses I've read on the GTX 680's compute performance is as follows:
Single precision............. +50% to +80%
Double precision............. -30% to -73%
" Because it’s based around double precision math the GTX 680 does rather poorly here, but the surprising bit is that it did so to a larger degree than we’d expect. The GTX 680’s FP64 performance is 1/24th its FP32 performance, compared to 1/8th on GTX 580 and 1/12th on GTX 560 Ti. Still, our expectation would be that performance would at least hold constant relative to the GTX 560 Ti, given that the GTX 680 has more than double the compute performance to offset the larger FP64 gap " |
|
|
|
Hey where's that from? Is there more of the good stuff? Did Anandtech update their launch article?
MrS
skgiven
Yes, looks like Ryan added some more info to the article. He tends to do this - it's good reporting, makes their reviews worth revisiting.
http://www.anandtech.com/show/5699/nvidia-geforce-gtx-680-review/17
Any app requiring doubles is likely to struggle, as seen with PG's.
Gianni said that the GTX 680 is as fast as a GTX580 on a CUDA 4.2 app here.
When released, the new CUDA 4.2 app is also supposed to be 15% faster for Fermi cards, which is more important at this stage.
The app is still designed for Fermi, but can't be redesigned for the GTX680 until the dev tools are less buggy.
In the long run it's likely that there will be several app improvement steps for the GF600.
|
Why does nVidia cap its 6xx series in this way? If they think it would kill their own Tesla series cards, why do they still sell those when they perform that badly in comparison to the modern desktop cards? It would be much cheaper for us, and nVidia would sell many more of their desktop cards for grid computing... or they could offer, for example, 8 uncrippled GTX 680 chips on one Tesla card for the price a Tesla costs.
|
Why does nVidia cap its 6xx series in this way? If they think it would kill their own Tesla series cards...
Plain and simple:
they wanted gaming performance and sacrificed computing capabilities which are not needed there.
why do they still sell those when they perform that badly in comparison to the modern desktop cards? It would be much cheaper for us, and nVidia would sell many more of their desktop cards for grid computing... or they could offer, for example, 8 uncrippled GTX 680 chips on one Tesla card for the price a Tesla costs.
GK104 is not crippled!
It's quite simply a mostly pure 32-bit design.
I bet they will come up with something completely different for the Tesla Kepler cards.
skgiven
Some of us expected this divergence in the GeForce line.
GK104 is a gaming card, and we will see a compute card (GK110 or whatever) probably towards the end of the year (maybe August, but more likely December).
Although it's not what some wanted, it's still a good card; matches a GTX580 but uses less power (making it about 25% more efficient). GPUGrid does not rely on OpenCL or FP64, so these weaknesses are not an issue here. Stripping down FP64 and OpenCL functionality helps efficiency on games and probably CUDA to some extent.
With app development, performance will likely increase. Even a readily achievable 10% improvement would mean a theoretical 37% performance per Watt improvement over the GTX580. If the performance can be improved by 20% over the GTX580 then the GTX680 would be 50% more efficient here. There is a good chance this will be attained, but when is down to dev tools.
|
OK, I read both your answers and understood. I had only read somewhere that it was cut down in performance so as not to match their Tesla cards. Seems that was a wrong article then ^^ (don't ask where I read it, I don't know anymore). So I believe the GTX 680 is still a good card then ;)
|
So I believe the GTX 680 is still a good card then ;)
Well, it is - if you know what you're getting.
Taken from the CUDA C guide in the CUDA 4.2.6 beta:
Operations per clock cycle per multiprocessor, CC 2.0 vs CC 3.0:
32-bit floating point: 32 vs 192
64-bit floating point: 16 vs 8
32-bit integer add: 32 vs 168
32-bit integer shift, compare: 16 vs 8
logical operations: 32 vs 136
32-bit integer multiply: 16 vs 32
...
Plus, the optimal warp size seems to have moved up from 32 to 64 now!
It's totally different, and the apps need to be optimized to take advantage of it.
|
|
|
Another bit to add regarding FP64 performance: apparently GK104 uses 8 dedicated hardware units for this per SMX, in addition to the regular 192 shaders. So they actually spent extra transistors to provide a little FP64 capability (for development or sparse usage) - and 8 FP64 units per 192 FP32 shaders is exactly the 1/24 rate quoted in the reviews.
MrS