kohlrak: CPU instructions on x86 now include encryption, which is also great for a built-in random number generator. For legal reasons (gambling) and speed reasons (it saves on the code for running a huge PRNG), I could see why someone would use that, but I'm not sure if it's available in ring 3.
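For context on the ring 3 question: the random number instruction is RDRAND. It is a separate extension from the AES-NI encryption instructions, although Intel's generator uses AES internally. It is unprivileged, so ordinary user-mode code can call it. Below is a minimal sketch, assuming GCC or Clang on x86/x86-64. The helper names rdrand_supported and hardware_random are hypothetical. The CPUID check matters because executing RDRAND on a CPU that lacks it faults.

```cpp
// Sketch: reading the x86 RDRAND hardware RNG from user-mode (ring 3) code.
// Assumes GCC or Clang on x86/x86-64; helper names are hypothetical.
#include <cstdint>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#include <immintrin.h>

// RDRAND support is reported in CPUID leaf 1, ECX bit 30 (bit_RDRND).
bool rdrand_supported() {
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) return false;
    return (ecx & bit_RDRND) != 0;
}

// Enable RDRAND for this one function only, so the rest of the file
// does not need -mrdrnd. Only call this after rdrand_supported() is true.
__attribute__((target("rdrnd")))
bool hardware_random(std::uint32_t& out) {
    // RDRAND can transiently fail (carry flag clear), so retry a few times.
    for (int attempt = 0; attempt < 10; ++attempt) {
        unsigned int value;
        if (_rdrand32_step(&value)) {
            out = value;
            return true;
        }
    }
    return false;
}
#else
// Non-x86 or non-GCC/Clang builds: report "not available".
bool rdrand_supported() { return false; }
bool hardware_random(std::uint32_t&) { return false; }
#endif
```

The bounded retry loop follows Intel's guidance: a failure usually means the generator's buffer was momentarily empty, not that the hardware is broken.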
SSE is a perfect example. You can work on huge chunks of data using only a few instructions, which means you can cut down a lot on clock requirements. You can more than double the number of calculations per second by using SSE instead of the x87 FPU (Floating Point Unit), which is the default for C++ compilers targeting 32-bit x86 (x86-64 compilers already use SSE2 for scalar floating point). And this is ignoring that, a while ago, x86 did floating-point arithmetic without a floating-point unit (so strings and/or BCD numbers, which are really, really slow). And I'm not even going to get into scenarios where certain ARM and Atmel CPUs don't even have "div" instructions.
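To make the SSE point concrete, here is a minimal sketch that adds two float arrays four lanes at a time with the standard SSE intrinsics from <xmmintrin.h>. It falls back to a scalar loop for the leftover elements and for non-x86 targets. The function name add_arrays is hypothetical. Actual gains depend on memory bandwidth, and modern compilers will often auto-vectorize the plain loop at higher optimization levels anyway.

```cpp
// Sketch: element-wise float addition, 4 lanes per SSE instruction.
// add_arrays is a hypothetical name used only for illustration.
#include <cstddef>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#endif

void add_arrays(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
#if defined(__SSE__) || defined(_M_X64)
    // One ADDPS handles 4 floats instead of 4 separate scalar adds.
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);  // unaligned load of a[i..i+3]
        __m128 vb = _mm_loadu_ps(b + i);  // unaligned load of b[i..i+3]
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    // Scalar tail: the remaining 0-3 elements (or everything without SSE).
    for (; i < n; ++i) out[i] = a[i] + b[i];
}
```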
I'll defer to your obvious CPU expertise on those matters. I'm a bit disappointed, though: I was really hoping to put together a decent secondary gaming rig from used hardware for $500-$700, for when my friends come over.
kohlrak: And also, the truth is, instruction requirements are quite common, but we don't really see them, because most of the time a given gamer already has an up-to-date CPU anyway, and new instructions aren't all that common (though they're a huge reason to upgrade). What kills me the most, though, is when a company writes cross-platform code even when it has absolutely no intention of going cross-platform. Moreover, most target machines use the same processors anyway, so there's no reason not to mix C++ with assembly (which you can do easily: inline, with separate sources linked together at the linker stage, or with intrinsics [the most common method]).
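Of the three mixing methods listed, inline assembly is the least familiar, so here is a minimal sketch of it. It assumes GCC/Clang extended-asm syntax on x86, and byte_swap is a hypothetical helper. The function reverses the byte order of a 32-bit value with the BSWAP instruction and falls back to portable shifts on other CPUs or compilers.

```cpp
// Sketch: mixing C++ with inline assembly, with a portable fallback.
// Assumes GCC/Clang extended asm; byte_swap is a hypothetical helper.
#include <cstdint>

std::uint32_t byte_swap(std::uint32_t x) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    // BSWAP reverses the 4 bytes of a 32-bit register in place.
    // "+r" tells the compiler x is both read and written, in a register.
    __asm__("bswap %0" : "+r"(x));
    return x;
#else
    // Portable C++ equivalent for non-x86 targets or other compilers.
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
#endif
}
```

In practice GCC and Clang also provide __builtin_bswap32, which usually compiles to the same instruction with no inline assembly at all. That kind of builtin or intrinsic is why the intrinsics route is the most common.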
Well, I'll put a damper on this one, because now we're touching on my own expertise (full-stack web developer here, specializing in scalable cloud applications).
More and more clients are running outside the desktop (hello smartphones, Raspberry Pis, consoles, and smart or embedded devices of all kinds).
And let's not even talk about the backend running in the cloud (on some abstracted Intel architecture, and in some cases ARM as well nowadays).
Add to this the fact that the bottleneck for 95%+ of apps out there is not the CPU but I/O, and the case for code that is CPU-optimized to the point of being hardware-restrictive is pretty thin.
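The I/O argument can be quantified with Amdahl's law. The original post does not mention it, but it is a standard way to reason about this. If only a fraction p of run time is CPU-bound and that part is made s times faster, the overall speedup is 1 / ((1 - p) + p / s). Here is a minimal sketch; p and s are illustrative inputs, not measurements.

```cpp
// Sketch: Amdahl's law.
// p = fraction of total run time that is CPU-bound (0..1)
// s = speedup factor applied to that CPU-bound part (> 0)
// Returns the speedup of the whole program.
double overall_speedup(double p, double s) {
    return 1.0 / ((1.0 - p) + p / s);
}
```

For example, with p = 0.05 and a CPU path made 10x faster, overall_speedup(0.05, 10.0) is about 1.047. Even an infinitely fast CPU path caps out at 1 / 0.95, about 1.053.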
kohlrak: EDIT: Another thing is that "cross-CPU assembly" actually is a thing, too. Assembly has a bad rep not only for its "complexity" (it's the simplest programming language, if you take the time to think about it) but also because it doesn't hold your hand (type checking and all those other annoying compiler errors). If you screw up, it's your own fault.
I think people are not writing assembly for the same reason most people are not writing C/C++ outside of gaming and system-level programming.
I'll toot my own horn a bit here and say I'm one of the few developers who can actually write flawless C++ without memory leaks, segfaults, and the weird undefined behavior (often caused by a race condition in a multi-threaded application) that pops up once in a while. But the reality is that it takes me three times as long to do it in C++ as it would to write it in Python, and most developers out there simply are not meticulous enough to manage it.
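Much of that meticulousness in modern C++ is delegated to RAII. Here is a minimal sketch of one common technique, not necessarily the author's approach; Tracked and process are hypothetical names. std::unique_ptr releases its object when the scope exits, including when an exception is thrown, a path on which a raw new/delete pair would leak. It does nothing for race conditions, which need separate tools such as std::mutex or std::atomic.

```cpp
// Sketch: RAII ownership via std::unique_ptr.
// Tracked and process are hypothetical, for illustration only.
#include <memory>
#include <stdexcept>

struct Tracked {
    static int live;          // number of Tracked objects currently alive
    Tracked() { ++live; }
    ~Tracked() { --live; }
};
int Tracked::live = 0;

void process(bool fail) {
    // With a raw `new Tracked()`, the throw below would skip the delete
    // and leak. unique_ptr's destructor runs during stack unwinding.
    auto t = std::make_unique<Tracked>();
    if (fail) throw std::runtime_error("processing failed");
}
```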
The CPU is not the bottleneck for most things, and time to market is a big factor: when you can build something three times as fast with a more abstract language, it's a worthwhile tradeoff most of the time.
I'm not saying there isn't some <5% of your app that you'll want to write in C++ to squeeze all the CPU performance you can out of that part, but that's exactly it: less than 5% of your app.
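One common way to handle that hot <5% in C++ while keeping the rest in Python is to expose a plain C ABI function and load it from Python. Here is a minimal sketch, assuming a hypothetical dot_product hot loop. The extern "C" declaration turns off C++ name mangling so the symbol can be looked up by name.

```cpp
// Sketch: a hot loop exported with a C ABI so a higher-level language
// can call it. dot_product is a hypothetical name.
#include <cstddef>

extern "C" double dot_product(const double* a, const double* b, std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += a[i] * b[i];  // tight loop: the part worth optimizing
    }
    return sum;
}
```

You can build it as a shared library, for example with g++ -O2 -shared -fPIC hot.cpp -o libhot.so on Linux, and load it with Python's ctypes.CDLL. Before calling, set restype to ctypes.c_double and argtypes to two ctypes.POINTER(ctypes.c_double) plus ctypes.c_size_t. Binding libraries such as pybind11 are a heavier-weight alternative.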