The New GPU

Westmere marked a change in the way Intel approached integrated graphics. The GPU was moved onto the CPU package and used an n-1 manufacturing process (45nm when the CPU was 32nm). Performance improved but it still wasn't exactly what we'd call acceptable.

Sandy Bridge brought a completely redesigned GPU core onto the processor die itself. As a co-resident of the CPU, the GPU was treated as somewhat of an equal - both processors were built on the same 32nm process.

With Ivy Bridge the GPU remains on die, but it grows more than the CPU does this generation. Intel isn't disclosing the die split, but there are more execution units this round (16, up from 12 in SNB), so it would appear that the GPU occupies a greater percentage of the die than it did last generation. It's not near a 50/50 split yet, but it's a continued indication that Intel is taking GPU performance seriously.

The Ivy Bridge GPU adds support for OpenCL 1.1, DirectX 11 and OpenGL 3.1. This will finally bring Intel's GPU feature set on par with AMD's. Ivy also supports three display outputs (up from two in Sandy Bridge). Finally, Ivy Bridge improves anisotropic filtering quality. As Intel Fellow Tom Piazza put it, "we now draw circles instead of flower petals", referring to image output from the famous AF tester.

Intel made the Ivy Bridge GPU more modular than before. In SNB there were two GPU configurations: GT1 and GT2. Sandy Bridge's GT1 had 6 EUs (shaders/cores/execution units) while GT2 had 12 EUs; both configurations had one texture sampler. Ivy Bridge was designed to scale up and down more easily. GT2 has 16 EUs and 2 texture samplers, while GT1 has an unknown number of EUs (I'd assume 8) and 1 texture sampler.

I mentioned that Ivy Bridge was designed to scale up. Unfortunately, that upward scaling won't be happening in IVB - GT2 will be the fastest configuration available. The implication is that Intel had plans for IVB with a beefier GPU but it didn't make the cut. Perhaps we will see that change in Haswell.

As we've already mentioned, Intel is increasing the number of EUs in Ivy Bridge; however, these EUs are also much better performers than their predecessors. Sandy Bridge's EUs could co-issue MADs and transcendental operations, while Ivy Bridge's can do twice as many MADs per clock. As a result, a single Ivy Bridge EU gets close to twice the IPC of a Sandy Bridge EU - in other words, you're looking at nearly 2x the GFLOPS per EU in shader bound operations compared to Sandy Bridge. Combine that with more EUs in Ivy Bridge and this is where the bulk of the up-to-60% increase in GPU performance comes from.
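As a rough back-of-the-envelope check on those figures, the sketch below multiplies the EU count ratio by the per-EU throughput ratio. The 2.0 per-EU factor is an assumption standing in for "close to twice", the function name is made up for illustration, and the result is a theoretical shader-bound ceiling at equal clocks rather than a measured number.

```python
def relative_shader_throughput(eus_new, eus_old, per_eu_ratio, clock_ratio=1.0):
    """Theoretical peak shader throughput of a new GPU relative to an old one.

    Simply the product of the EU count ratio, the per-EU throughput ratio
    and the clock ratio. It ignores texture samplers, caches and memory
    bandwidth, so treat it as an upper bound only.
    """
    return (eus_new / eus_old) * per_eu_ratio * clock_ratio


# GT2 figures from the text: 16 EUs in Ivy Bridge vs 12 in Sandy Bridge.
# per_eu_ratio=2.0 is an assumed upper bound for "close to twice the IPC".
ceiling = relative_shader_throughput(16, 12, 2.0)
print(f"{ceiling:.2f}x")  # prints 2.67x
```

The ceiling sits well above the quoted up-to-60% gain, which would be consistent with real workloads being limited by more than raw shader throughput.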

Intel also added a graphics-specific L3 cache within Ivy Bridge. Although the GPU can still share the CPU's L3 cache, a smaller cache located within the graphics core allows frequently accessed data to be fetched without firing up the ring bus.

There are other performance enhancements within the shader core. Scatter & gather operations now execute 32x faster than on Sandy Bridge, which has implications for both GPU compute and general 3D gaming performance.

Despite the focus on performance, Intel actually reduced the GPU clock in Ivy Bridge. It now runs at up to 95% of the SNB GPU clock, at a lower voltage, while offering much higher performance. Thanks primarily to Intel's 22nm process (the aforementioned architectural improvements help as well), GPU performance per watt nearly doubles over Sandy Bridge. In our Llano review we found that AMD delivered much longer battery life in games (nearly 2x SNB) - Ivy Bridge should be able to help address this.
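To see what those claims imply for power draw, here is a minimal sketch based on the identity performance = (performance per watt) x power. The inputs are assumptions: 1.6x takes the "up to 60%" performance figure at face value and 2.0x stands in for "nearly doubles". Intel did not state that both figures apply to the same workload, so the result is only indicative. The function name is made up for illustration.

```python
def implied_power_ratio(perf_ratio, perf_per_watt_ratio):
    """Power ratio implied by perf = perf_per_watt * power."""
    return perf_ratio / perf_per_watt_ratio


# Assumed inputs: 1.6x performance and 2.0x performance per watt vs Sandy Bridge.
print(f"{implied_power_ratio(1.6, 2.0):.2f}x")  # prints 0.80x
```

Under those assumptions the Ivy Bridge GPU would draw roughly 80% of the power of the Sandy Bridge GPU while delivering the higher frame rate, which is the kind of shift that would help battery life in games.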

Quick Sync Performance Improved

With Sandy Bridge, Intel introduced an extremely high performing hardware video transcode engine called Quick Sync. The solution ended up delivering the best combination of image quality and performance of any available hardware-accelerated transcoding option from AMD, Intel and NVIDIA. Quick Sync leverages a combination of fixed function hardware, IVB's video decode engine and the EU array.

The increase in EUs and the improvements to their throughput both contribute to increases in Quick Sync transcoding performance. Presumably Intel has also done some work on the decode side, which is actually one of the reasons Sandy Bridge was so fast at transcoding video. The combination of all of this results in up to 2x the video transcoding performance of Sandy Bridge. There's also the option of taking less of a performance increase in exchange for better image quality.

I've complained in the past about the lack of free transcoding applications (e.g. Handbrake, x264) that support Quick Sync. I suspect things will be better upon Ivy Bridge's arrival.


97 Comments


  • AstroGuardian - Monday, September 19, 2011 - link

    "Intel implied that upward scalability was a key goal of the Ivy Bridge GPU design, perhaps we will see that happen in 2013."

    No we won't. The world ends in 2012, remember?
  • JonnyDough - Monday, September 19, 2011 - link

    It ended in the year 2000. Hello! Y2K ring any bells? Come on, keep up with current events would ya?
  • TheRyuu - Monday, September 19, 2011 - link

    "I've complained in the past about the lack of free transcoding applications (e.g. Handbrake, x264) that support Quick Sync. I suspect things will be better upon Ivy Bridge's arrival."

    As long as Intel doesn't expose the Quick Sync API there is no way for such applications to make use of it, not to mention the technical limitations.

    There are hints on doom9 that they know a bit about the lower level details but that it's all NDA'ed. Even with that knowledge he says that it's probably not possible or probable to do so.

    You can find various rambling/rage here:
    http://forum.doom9.org/showthread.php?t=156761 (Dark_Shikari and pengvado are the x264 devs).

    tl;dr: http://forum.doom9.org/showthread.php?p=1511469#po... (to the end of the thread)
  • fic2 - Monday, September 19, 2011 - link

    I would also wonder who (software wise) would be willing to put a lot of resources into supporting something that isn't really available on most SB platforms - or at least not available without jumping through hoops (correct mb, correct chip, 3rd party software, etc).
  • fic2 - Monday, September 19, 2011 - link

    "By the time Ivy Bridge arrives however, AMD will have already taken another step forward with Trinity."

    I wonder how realistic this is considering that AMD can't even get Bulldozer out the door.

    My money is on Ivy Bridge showing up before Trinity.
  • Beenthere - Monday, September 19, 2011 - link

    Considering Trinity was shown at IDF up and running, and the fact that Trinity and other AMD next-gen products were developed concurrently with Zambezi and Opteron Bulldozer chips - which have been shipping by the tens of thousands already - I'd say Trinity will be here in Q1 '12.
  • fic2 - Monday, September 19, 2011 - link

    "Opteron Bulldozer chips - which have been shipping by the tens of thousands already"

    And, yet, nobody can benchmark them.

    I hope that I am wrong, but given AMD's continual delays shipping the desktop BD I am not holding my breath.

    Whichever comes first gets my money - assuming that BD is actually competitive with SB performance.
  • thebeastie - Tuesday, September 20, 2011 - link

    You talk about support for Handbrake but, to put it harshly, your mind is stuck in the past-gen device era.
    I simply grab a full DVD and run makemkv on it to just store it unmodified in a single file and copy it to my iPad2 directly.
    Plays perfectly fine under avplayerhd.

    I consider that you would have to be insane, as in you think you're an onion, to bother Handbraking your videos if you've got a device like the iPad 2 that can just play them straight.

    If you're the hoarder type that insists on watching Rambo 4 etc. every week and needs to pack 100+ full movies on a single device at the same time, you're a freak, so pipe your niche lifestyle comments to /dev/null.
    I would not understand why you have time to bother shrinking/ converting your movies all the time over just getting sick of some of them and putting new stuff on from time to time.
  • TheRyuu - Tuesday, September 20, 2011 - link

    8.5GB for a movie seems a bit impractical for an ipad.
  • thebeastie - Wednesday, September 21, 2011 - link

    A full 8GB is big, but they still copy amazingly quickly over to a 64GB iPad 2, and a lot of DVDs don't reach that full size anyway.
    If you bought a Honeycomb tablet and put SD slot storage in it, I am sure it would be an extremely painful, slow copying experience if you used SD over built-in flash; maybe this is why Apple avoided SD slots in the first place. Built-in flash is lightning fast and draws less battery.

    With the full copy on a PC, just copying it over in 2 mins vs bothering to convert, I know I'd choose the full copy every time.
    Once I have watched it, it takes at least a year before I consider watching the same thing again.
