cjrgreen: The rap against the newer AMD CPUs is that they do not reach their full potential under Windows 7. They still run fine, just not so fast as they could under a smarter OS. The problem will be corrected in Windows 8.
Psyringe: Hmm, could you elaborate on that? (Feel free to talk technical.) I do know that one of the current problems of the Bulldozer architecture is that Windows assigns threads suboptimally for these CPUs: it will assign a new thread to an idle core in a half-occupied module even if totally idle modules are available. This means that both threads need to share the same floating point unit, which is an unnecessary bottleneck; the thread would have had its own FPU if it had been assigned to the idle module.
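To make the placement problem concrete, here is a toy model in Python. It is only a sketch under stated assumptions: 4 modules with 2 integer cores each, cores 2m and 2m+1 belonging to module m, and a "naive" policy that simply takes the lowest-numbered idle core. None of these names or policies come from Windows itself; they just illustrate why module-blind placement forces FPU sharing.

```python
# Toy model: 4 Bulldozer-style modules, 2 integer cores each, one shared FPU
# per module. The core numbering (cores 2m and 2m+1 form module m) is an
# assumption made for illustration only.
CORES_PER_MODULE = 2
NUM_MODULES = 4
NUM_CORES = CORES_PER_MODULE * NUM_MODULES


def module_of(core):
    """Return the module index a core belongs to."""
    return core // CORES_PER_MODULE


def naive_pick(busy):
    """Pick the lowest-numbered idle core, ignoring the module layout."""
    for core in range(NUM_CORES):
        if core not in busy:
            return core
    raise RuntimeError("no idle core")


def module_aware_pick(busy):
    """Prefer an idle core in a completely idle module; otherwise fall back."""
    busy_modules = {module_of(c) for c in busy}
    for core in range(NUM_CORES):
        if core not in busy and module_of(core) not in busy_modules:
            return core
    return naive_pick(busy)


def place_threads(pick, n_threads):
    """Place n_threads one at a time using the given policy."""
    busy = set()
    for _ in range(n_threads):
        busy.add(pick(busy))
    return sorted(busy)


def shared_fpu_modules(cores):
    """Count modules in which two threads have to share the FPU."""
    counts = {}
    for c in cores:
        counts[module_of(c)] = counts.get(module_of(c), 0) + 1
    return sum(1 for n in counts.values() if n > 1)


naive = place_threads(naive_pick, 4)          # [0, 1, 2, 3]
aware = place_threads(module_aware_pick, 4)   # [0, 2, 4, 6]
print(naive, shared_fpu_modules(naive))       # 2 modules share an FPU
print(aware, shared_fpu_modules(aware))       # 0 modules share an FPU
```

With four threads, the naive policy packs two modules full and leaves two idle, while the module-aware policy gives every thread its own FPU. Sharing only becomes unavoidable from the fifth thread onwards.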
I also know that a Windows update was announced that supposedly optimizes the way Windows assigns threads to cores in the Bulldozer architecture.
However, the performance gain is estimated to be only 10%, because apparently this isn't the only problem of these CPUs. Also, I've read reviews where people had disabled one integer core in each module (to _force_ Windows to assign each thread to its own module), and the results were still disappointing.
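A software approximation of that experiment would be to restrict a process to one core per module with an affinity mask. The sketch below only computes such a mask; the function name is hypothetical, and it assumes that cores are numbered so that each module's cores are adjacent (which may not hold on every system).

```python
def one_core_per_module_mask(num_modules, cores_per_module=2):
    """Build a bitmask selecting the first core of each module.

    Assumes logical cores are numbered module by module, so module m owns
    cores m * cores_per_module .. (m + 1) * cores_per_module - 1.
    """
    mask = 0
    for module in range(num_modules):
        first_core = module * cores_per_module
        mask |= 1 << first_core
    return mask


mask = one_core_per_module_mask(4)
print(f"{mask:#x} = {mask:08b}")  # 0x55 = 01010101
```

A hexadecimal mask of this kind is what the `/affinity` switch of the `start` command in cmd accepts. It is not the same as disabling the cores in the BIOS, but it keeps a given process's threads from landing on two cores of one module.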
Hence, I'm interested in the way Windows 8 is supposed to improve the situation. Please don't take this post as an attack on your statement (I guess it might come across as such because my previous post probably reads rather rant-y). I'm really genuinely interested in the technical background of your statement, because in my own upgrade plan, I'm left with the situation that neither of the currently available CPUs fits the purpose of the machine. Hence, I'm interested in everything that sheds new light on that situation.
cjrgreen: Yes, your understanding is correct. The Windows scheduler does not "know" how to optimize core affinity for the new AMD architecture; thus it makes some poor decisions that impact performance.
Intel CPUs have damned big shared L2 (in Core 2) or L3 (in Nehalem and Sandy Bridge) caches. Core affinity doesn't matter much, because any core can get to any cache line without a significant performance penalty.
AMD CPUs have separate L2 caches. Exactly how the L2 cache is arranged differs among their various architectures. A core that needs a cache line that is not in its own cache has to fetch it from the core that has it, over the on-die interconnect (or over HyperTransport if that core sits in another socket). So on AMD CPUs, core affinity matters a lot.
There are two specific limitations (or blunders, call them what you want) in the Windows scheduler that particularly affect AMD FX CPUs. The first is that the scheduler pays little attention to core affinity even when the number of threads that need scheduling is small. You end up with, say, a 4-thread load with each of the threads being bounced between cores 1, 2, 3, and 4. The second is that the scheduler doesn't pay attention to what the core parking logic is doing. Windows is aggressive about parking cores (putting them in a low-power state), but when the scheduler blindly assigns a thread to a parked core, the core has to be powered up again, and this is slow.
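The parking problem can be illustrated with a small simulation. This is a sketch under simplifying assumptions, not a model of the real Windows logic: time is cut into slices, a core counts as parked once it has been idle for `park_after` consecutive slices, all cores start parked, and every assignment to a parked core costs one wake-up. The function name and parameters are made up for this example.

```python
def count_wakeups(schedule, park_after=1):
    """Count how often a thread is assigned to a parked core.

    schedule[t] is the set of core ids running a thread during slice t.
    A core is treated as parked once it has been idle for `park_after`
    consecutive slices; all cores start out parked (an assumption).
    """
    cores = set().union(*schedule)
    idle = {c: park_after for c in cores}  # every core starts parked
    wakeups = 0
    for running in schedule:
        for c in cores:
            if c in running:
                if idle[c] >= park_after:
                    wakeups += 1  # core had to be powered up again
                idle[c] = 0
            else:
                idle[c] += 1
    return wakeups


# One thread, 8 time slices, 4 cores.
bouncing = [{t % 4} for t in range(8)]  # thread hops to a new core each slice
sticky = [{0} for _ in range(8)]        # thread stays on core 0

print(count_wakeups(bouncing))  # 8: every slice lands on a parked core
print(count_wakeups(sticky))    # 1: only the initial wake-up
```

In this model the same amount of work costs eight wake-ups when the thread is bounced around and one when it stays put, which is the kind of penalty that keeping threads on already-awake cores (or selectively disabling parking) avoids.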
Microsoft tried to fix this in Windows 7. The original fix was withdrawn, but it looks like they recently (January 12) got it right. It is now up again as KB2645594. It also requires KB2646060 (selectively disable core parking).
These fixes were already made in Windows 8. Microsoft had to backport them to Windows 7. The performance improvement is said to be 10% on loads that are "lightly" threaded (that is, up to 4 CPU-bound threads). This performance improvement can be seen in current builds of Windows 8, so it's definitely there.
Bulldozer is still a disappointment, if not such a waste of good sand as the original Phenoms were ;)