ET3D: The other question is, how do you predict that? Was there anything particular that made you think that would be the case? You're talking about understanding the architecture, but what about it could help predict this?
Qualified guesswork, or something like that :p It generally doesn't matter now, though. You essentially get to something playable by turning on or off some extra filters.
And the architecture differences aren't really as arcane as they sound, either. Or extremely different between nvidia and amd. They used to be, before everyone moved to general purpose "shader units" instead of dedicated vertex and pixel shaders.
But basically what happens is that you have a number of simple processors (simd-units) that run simple math operations over data in the graphics card's ram. So what you're looking for is: 1. bus transfer rate and bus width to the pci bus, 2. internal graphics card memory controller speed, 3. number of shader units, 4. instruction queue depth, and 5. instruction rate (i.e., clock speeds).
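To make that a bit more concrete, here's the rough back-of-envelope math those spec numbers boil down to. Every figure in it is a made-up example (not any particular card), and queue depth doesn't reduce to a single number like this, but the relationships are the point:

```c
/* Back-of-envelope throughput estimates from spec-sheet numbers.
 * All card figures are hypothetical examples, not a real product. */
#include <stdio.h>

int main(void) {
    /* 3. shader units and 5. clock rate -> peak arithmetic rate */
    double shader_units   = 2048;   /* hypothetical */
    double core_clock_ghz = 1.1;    /* hypothetical */
    double ops_per_clock  = 2;      /* one fused multiply-add per unit per clock */
    double peak_gflops    = shader_units * core_clock_ghz * ops_per_clock;

    /* 2. memory controller: bus width x effective memory clock -> card bandwidth */
    double mem_bus_bits = 256;      /* hypothetical */
    double mem_eff_gtps = 7.0;      /* effective transfers per second (GT/s), hypothetical */
    double mem_bw_gbs   = mem_bus_bits / 8.0 * mem_eff_gtps;

    /* 1. bus to the card: what it costs to move data over pci-e at all
     * (assuming a 16-lane PCIe 3.0 slot, ~0.985 GB/s payload per lane) */
    double pcie_bw_gbs = 16 * 0.985;

    printf("peak arithmetic: ~%.0f GFLOPS\n", peak_gflops);
    printf("card memory bandwidth: ~%.0f GB/s\n", mem_bw_gbs);
    printf("pci-e bandwidth to the card: ~%.1f GB/s\n", pcie_bw_gbs);
    return 0;
}
```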
The problem is that these shader units are difficult to compare "objectively" from specs (Fermi shaders vs. Kepler, Maxwell and Pascal, vs. AMD's cores.. complete jumble). And that some post-filters, culling operations, and so on can be more or less efficient on cards that have comparable benchmark results.
But on desktops now you at least don't need to worry about something being cpu- or gpu-bound any more, since pretty much every cpu is fast enough. Still, it's possible to crash the framerate in certain games by selecting particular filters that are inefficient, and it's possible to stall a cpu-bound routine in other ways. So one thing that would help a lot when analyzing benchmarks would be to look at what sort of effects and filters cause slowdowns, and compare that. This is something Anandtech won't do, for example, because they genuinely treat whatever they get from the publisher or their partners as the objective truth.
To complicate things more, nvidia are very good at tweaking their drivers for specific games, basically collapsing some routines into fewer instructions, while AMD won't do that. So you often used to see low-end nvidia cards score well in benchmarks in spite of having "objectively" slower hardware - but with an overall average framerate that suddenly hit some very low points, for example. It's not as bad now, but selected benchmarks helped sell a lot of geforce cards that weren't very good for the price. Their new g-sync thing is the same - it's really just an admission that they're not going to make cards that avoid these periodic slowdowns in complex scenes. It's just not the priority.
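To show what I mean about averages hiding the low points, here's a toy illustration with invented frame times - two runs with the same average fps, where one of them stalls badly on a complex frame:

```c
/* Same average fps, very different experience. Frame times (ms) are invented
 * for illustration only. */
#include <stdio.h>

static double avg_fps(const double *frame_ms, int n) {
    double total = 0.0;
    for (int i = 0; i < n; i++) total += frame_ms[i];
    return 1000.0 * n / total;              /* frames per second over the whole run */
}

static double worst_frame_fps(const double *frame_ms, int n) {
    double worst = 0.0;
    for (int i = 0; i < n; i++) if (frame_ms[i] > worst) worst = frame_ms[i];
    return 1000.0 / worst;                  /* instantaneous fps of the slowest frame */
}

int main(void) {
    double steady[8] = {17, 17, 17, 17, 17, 17, 17, 17};  /* smooth delivery */
    double spiky[8]  = {12, 12, 12, 12, 12, 12, 12, 52};  /* fast, then a stall */

    printf("steady: avg %.1f fps, worst frame %.1f fps\n",
           avg_fps(steady, 8), worst_frame_fps(steady, 8));
    printf("spiky:  avg %.1f fps, worst frame %.1f fps\n",
           avg_fps(spiky, 8), worst_frame_fps(spiky, 8));
    return 0;
}
```

Both runs print the same average (~58.8 fps), but the second one dips to under 20 fps on the stalled frame - which is exactly the kind of thing an average-only benchmark chart won't show you.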
Anyway. So what is interesting more recently is that we are (very) slowly getting more and more interesting effects put into games that rely on computing in-scene positions. To avoid slowdowns for effects like this, developers usually just optimize them through middleware libraries that .. nvidia makes. Like physx. And then you know beforehand that if you can force the effect through those libraries, the nvidia cards will run it well enough. And that's generally the baseline. Moving beyond that usually means adding a second toolchain. Like the hairworks stuff vs. tressfx, for example. Either it's a baseline target on one card and partial support on the other one, or else that component is completely different between the two architectures.
And this is essentially decided without any marginal performance differences in mind. It's about ease of use of libraries and tools instead. And because nvidia have been very good at marketing their stuff lately, there's a draw towards using their tools. And that stops people from moving to more general purpose compute based graphics contexts.
This has lasted for a very long time now - much longer than people less cynical than me thought - to the point where Nvidia actually makes a chip that's super-focused on simple effects and short return times for those types of effects, like Pascal (it's really primarily a mobile chip - they've reduced the power draw massively without losing performance, basically like Maxwell - super for a laptop in a 40-60w package). It's basically the same card and featureset as before, just more efficient.
Because that's what games run on. Compute effects and location-based particle effects tend to be some kind of hobby project for programmers anyway, so if those types of effects show up, you know it's because some unknown and underpaid programmer managed to smuggle something into the code that runs super-efficiently on the simple logic the smx/simd units run on. If you've played bf3, a lot of the lighting and shading effects in that game are general shader code written specifically for the game engine with that in mind. Which... isn't typically how you do it when you rely on 3rd party tools.
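Just to illustrate the kind of math I mean (this is plain C standing in for shader code, not anything from bf3 or its engine): a branch-free, per-pixel Lambert-style lighting pass is a handful of multiply-adds and a max, the same short sequence for every pixel, which is exactly what those simd lanes chew through efficiently:

```c
/* Sketch of simd-friendly shading math: no branches, no lookups, just the
 * same few arithmetic ops applied independently to every pixel. Hypothetical
 * example, not engine code. */
#include <math.h>
#include <stddef.h>

void shade_lambert(const float *nx, const float *ny, const float *nz, /* per-pixel normals */
                   float lx, float ly, float lz,                       /* light direction */
                   const float *albedo, float *out, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        float ndotl = nx[i] * lx + ny[i] * ly + nz[i] * lz;  /* dot(normal, light) */
        out[i] = albedo[i] * fmaxf(ndotl, 0.0f);             /* clamp without branching */
    }
}
```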
So when looking at architecture now, it's really the same problem we had several years ago: do you choose something that you know will work perfectly fine right now, and will have support for as long as that card is relevant? Or do you accept some extra tweaking and some inefficient drivers, in return for potentially getting some extra features that programmers may eventually make use of?
But until some next-generation GCN shows up on amd cards, modern cards are literally identical. What you're really looking at when you see an 8% difference in benchmark scores is that you may or may not get 54 and a half frames per second with 16x fsaa, so you need to turn something down to keep 60fps either way.
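If you want that 8% in frame-time terms, the arithmetic looks like this (example numbers, using the 54.5 fps case above):

```c
/* Frame-budget arithmetic for the benchmark-difference point above.
 * Example figures only. */
#include <stdio.h>

int main(void) {
    double target_fps = 60.0;
    double slower_fps = 54.5;               /* the 16x fsaa case above */
    double faster_fps = slower_fps * 1.08;  /* the card that benchmarks 8% higher */

    printf("frame budget at %.0f fps: %.2f ms\n", target_fps, 1000.0 / target_fps);
    printf("slower card: %.2f ms per frame (%.1f fps)\n", 1000.0 / slower_fps, slower_fps);
    printf("faster card: %.2f ms per frame (%.1f fps)\n", 1000.0 / faster_fps, faster_fps);
    /* Both miss the 16.67 ms budget, so both end up turning a setting down anyway. */
    return 0;
}
```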
And the only really interesting thing that's happened lately is the apu designs. But even then the potential of that design is fairly low and limited to specialized applications - you would very likely prefer a traditional cpu/gpu design with just more compute units available on the card instead.
So yeah.. it's basically the same. Nvidia have some really good itx-cabinet cards now that run very smoothly on very few watts (though they don't advertise those as much as the titan cards). And AMD have a few solid value cards, even if their top of the line fails to impress in the benchmarks.
Eventually we're going to see something else, of course. And it will be compute-based, with predictable processing pipelines, without any doubt. So that's where the improvement has to happen: compute performance through some high-level language (read: OpenCL).
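And by "compute performance through a high-level language" I mean roughly this sort of thing - a minimal OpenCL C kernel (the kernel language is a C dialect), just a multiply-add spread across however many shader units the card happens to have. Host-side setup is omitted; this is a sketch of the model, nothing more:

```c
/* Minimal OpenCL C kernel: y = a*x + y over a flat buffer. Each work-item
 * handles one element, and the driver spreads the work-items across the
 * card's shader units. */
__kernel void saxpy(const float a,
                    __global const float *x,
                    __global float *y)
{
    size_t i = get_global_id(0);  /* this work-item's index in the global range */
    y[i] = a * x[i] + y[i];
}
```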
But the kind of differences I listed at the top of the post - the kind that would have been the difference between the Witcher 2 running at 60 or 30 fps with a specific set of effects turned on a while back, or that might be interesting to laptop gamers deciding between a 40w and a 120w package - is completely insignificant on desktop cards now, because it's the difference between a lower bound of 80 vs. 70 fps with everything including ray-bans turned on. In reality you're talking about buying a super-charged monster card that draws twice as many watts to run 16x instead of 8x full-screen anti-aliasing (which is basically supersampling the same inaccurate pixel in the texture 90 times - a complete waste of processing power. And if you bought a water-cooling unit that cost as much as the card to keep it cool - the colour in the tubes had better be really good).
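The supersampling complaint in numbers, for what it's worth (resolution and sample counts are just example values, and brute-force supersampling is the worst case, but it shows why the wattage roughly doubles for a marginal gain in image quality):

```c
/* Rough cost of brute-force full-screen anti-aliasing: shading work scales
 * about linearly with the sample count. Example resolution and sample counts. */
#include <stdio.h>

int main(void) {
    long long pixels = 1920LL * 1080LL;   /* example 1080p frame */
    int samples_8x = 8, samples_16x = 16;

    long long work_8x  = pixels * samples_8x;
    long long work_16x = pixels * samples_16x;

    printf("8x:  %lld shaded samples per frame\n", work_8x);
    printf("16x: %lld shaded samples per frame (%.1fx the work)\n",
           work_16x, (double)work_16x / (double)work_8x);
    return 0;
}
```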
So when you shop for cards now, decide on the target you're actually aiming for, then buy the cheapest card that achieves it. And then keep it for the next 10 years.