When we test hardware, we have to make the tests as fair as possible. This means that we can’t test top-end graphics cards at the same settings as low-end cards, or vice versa, as that would render the results pointless.
We also thought that it would be better if you knew which tests we ran on which hardware.
What we have devised is a set of categories for the hardware that we test.
Here is a list of the cards and what section they would come under:
Tests are run on the medium settings with no AA/AF using DirectX 9/10
Mainstream
Tests are run on the highest settings with 2x AA and 8x AF using DirectX 10/11
Extreme
Tests are run on the highest settings with full AA/AF using DirectX 10/11
All tests are run at 3 resolutions (1280×1024, 1680×1050, 1920×1200)
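To make the scope of a full run concrete, the tiers and resolutions above can be expanded into a test matrix. This is only a sketch: the name of the first tier is not given in the text, so `"unnamed tier"` is a placeholder, and the dictionary keys and the `test_matrix` helper are illustrative names rather than anything we actually use.

```python
from itertools import product

# The three resolutions every test is run at.
RESOLUTIONS = [(1280, 1024), (1680, 1050), (1920, 1200)]

# Settings per tier, assuming each settings line belongs to the heading above it.
# "unnamed tier" is a placeholder: the text does not name the first tier.
TIERS = {
    "unnamed tier": {"detail": "medium", "aa": "none", "af": "none", "directx": "9/10"},
    "Mainstream": {"detail": "highest", "aa": "2x", "af": "8x", "directx": "10/11"},
    "Extreme": {"detail": "highest", "aa": "full", "af": "full", "directx": "10/11"},
}


def test_matrix():
    """Return one entry per (tier, resolution) combination."""
    runs = []
    for (tier, settings), (width, height) in product(TIERS.items(), RESOLUTIONS):
        run = {"tier": tier, "width": width, "height": height,
               "pixels": width * height}
        run.update(settings)
        runs.append(run)
    return runs


if __name__ == "__main__":
    for run in test_matrix():
        print(run["tier"], f'{run["width"]}x{run["height"]}', run["pixels"], "pixels")
```

The pixel count shows why resolution matters: 1920×1200 asks the card to render roughly 1.76 times as many pixels per frame as 1280×1024.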
Tools used to benchmark:
The following benchmark tests are run using Lavalys Everest:
The Memory Read benchmark measures the maximum achievable memory read bandwidth. The code behind this benchmark method is written in Assembly and is heavily optimized for every popular AMD and Intel processor core variant, using the appropriate x86, MMX, 3DNow!, SSE, SSE2 or SSE4.1 instruction set extension. The benchmark reads a 16 MB, 1 MB aligned data buffer from system memory into the CPU. Memory is read in the forward direction, continuously without breaks.
In order to avoid concurrent threads competing over system memory bandwidth, the Memory Read benchmark utilizes only one processor core and one thread.
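The arithmetic behind a bandwidth figure is simply bytes moved divided by elapsed time. The sketch below illustrates that with a single-threaded forward scan of a 16 MB buffer in Python. It is not the Everest code: the real benchmark is hand-tuned Assembly, Python cannot control buffer alignment, and `read_bandwidth_mb_s` is a hypothetical helper name, so the number it prints will be far from a true hardware figure.

```python
import time

BUFFER_BYTES = 16 * 1024 * 1024  # 16 MB buffer, as in the description above


def read_bandwidth_mb_s(buffer_bytes=BUFFER_BYTES, repeats=5):
    """Return the best observed forward-read rate in MB/s (illustrative only)."""
    buf = bytearray(buffer_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        buf.count(1)  # scans the buffer from start to end in one C-level loop
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    best = max(best, 1e-9)  # guard against a zero reading from a coarse timer
    return (buffer_bytes / (1024 * 1024)) / best


if __name__ == "__main__":
    print(f"{read_bandwidth_mb_s():.0f} MB/s")
```

Taking the best of several runs, rather than the average, is a common way to approximate a "maximum achievable" figure because it discards runs disturbed by other activity on the system.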
The Memory Write benchmark measures the maximum achievable memory write bandwidth. The code behind this benchmark method is written in Assembly and is heavily optimized for every popular AMD and Intel processor core variant, using the appropriate x86, MMX, 3DNow!, SSE or SSE2 instruction set extension. The benchmark writes a 16 MB, 1 MB aligned data buffer from the CPU into system memory. Memory is written in the forward direction, continuously without breaks.
In order to avoid concurrent threads competing over system memory bandwidth, the Memory Write benchmark utilizes only one processor core and one thread.
The Memory Copy benchmark measures the maximum achievable memory copy speed. The code behind this benchmark method is written in Assembly and is heavily optimized for every popular AMD and Intel processor core variant, using the appropriate x86, MMX, 3DNow!, SSE, SSE2 or SSE4.1 instruction set extension. The benchmark copies an 8 MB, 1 MB aligned data buffer into another 8 MB, 1 MB aligned data buffer through the CPU. Memory is copied in the forward direction, continuously without breaks.
In order to avoid concurrent threads competing over system memory bandwidth, the Memory Copy benchmark utilizes only one processor core and one thread.
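A buffer-to-buffer copy can be timed in the same spirit. The sketch below copies one 8 MB buffer into another in a single thread and reports MB copied per second. Again, this is an illustration under assumptions, not the Everest method: `copy_speed_mb_s` is a hypothetical helper, Python gives no control over alignment, and counting 8 MB per pass (rather than 8 MB read plus 8 MB written) is our assumption about how the figure is defined.

```python
import time

COPY_BYTES = 8 * 1024 * 1024  # 8 MB source and 8 MB destination


def copy_speed_mb_s(copy_bytes=COPY_BYTES, repeats=5):
    """Return the best observed copy rate in MB/s (illustrative only)."""
    src = bytearray(b"\xa5" * copy_bytes)
    dst = bytearray(copy_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src  # same-length slice assignment: one forward bulk copy
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    if dst != src:
        raise RuntimeError("copy did not reproduce the source buffer")
    best = max(best, 1e-9)  # guard against a zero reading from a coarse timer
    return (copy_bytes / (1024 * 1024)) / best


if __name__ == "__main__":
    print(f"{copy_speed_mb_s():.0f} MB/s")
```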
The Memory Latency benchmark measures the typical delay when the CPU reads data from system memory. Memory latency is the penalty measured from the issuing of the read command until the data arrives in the integer registers of the CPU. The code behind this benchmark method is written in Assembly, and uses 1 MB alignment and a 1024-byte stride. Memory is accessed in the forward direction.
The Memory Latency benchmark uses only the basic x86 instructions and utilizes only one processor core and one thread.
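Latency tests are commonly built as a chain of dependent reads, where each value read tells the CPU where to read next, so no read can start before the previous one finishes. The sketch below shows that access pattern with a forward 1024-byte stride. It is an assumption that Everest works this way, the helper names are ours, and in Python the interpreter overhead dwarfs the real memory latency, so treat the timing as a demonstration of the method only.

```python
import time
from array import array

STRIDE_BYTES = 1024  # stride from the description above
SLOT_BYTES = 8       # each slot holds one signed 64-bit index


def build_chain(buffer_bytes, stride_bytes=STRIDE_BYTES):
    """Build a forward chain: every stride-th slot stores the index of the next.

    buffer_bytes should be a multiple of stride_bytes so the chain wraps to 0.
    """
    slots = buffer_bytes // SLOT_BYTES
    step = stride_bytes // SLOT_BYTES
    chain = array("q", [0]) * slots
    for i in range(0, slots, step):
        chain[i] = (i + step) % slots  # last link wraps back to the start
    return chain


def ns_per_access(chain, hops):
    """Follow the chain for `hops` dependent reads; return (final index, ns per read)."""
    idx = 0
    start = time.perf_counter()
    for _ in range(hops):
        idx = chain[idx]  # the next address depends on the value just read
    elapsed = time.perf_counter() - start
    return idx, elapsed * 1e9 / hops


if __name__ == "__main__":
    chain = build_chain(16 * 1024 * 1024)
    _, ns = ns_per_access(chain, 1_000_000)
    print(f"{ns:.1f} ns per dependent read (includes interpreter overhead)")
```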
This simple integer benchmark, CPU Queen, focuses on the branch prediction capabilities and the misprediction penalties of the CPU. It finds the solutions for the classic “Queens problem” on a 10 by 10 chessboard (http://mathworld.wolfram.com/QueensProblem.html).
At the same clock speed, the processor with the shorter pipeline and smaller misprediction penalties will theoretically attain higher benchmark scores. For example, with HyperThreading disabled, the Intel Northwood core processors get higher scores than the Intel Prescott core based ones because of Northwood’s 20-stage pipeline versus Prescott’s 31-stage pipeline. With HyperThreading enabled, however, the picture is less clear-cut, because architectural bottlenecks cause the Northwood core to run out of internal resources and slow down. Similarly, at the same clock speed AMD K8 class processors will be faster than AMD K7 ones due to the improved branch prediction capabilities of the K8 architecture.
The CPU Queen test uses integer MMX, SSE2 and SSSE3 optimizations. It consumes less than 1 MB of system memory and it is HyperThreading, multi-processor (SMP) and multi-core (CMP) aware.
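To show why this kind of workload leans on branch prediction, here is a minimal backtracking solver for the non-attacking queens puzzle. We are assuming that is the variant the test solves; the actual Everest code is not public to us, and `count_queens` is our own illustrative function. The loop and the recursion both branch on data that changes with every placement, which is hard for a predictor to guess.

```python
def count_queens(n):
    """Count the ways to place n non-attacking queens on an n-by-n board."""
    full = (1 << n) - 1  # bitmask with one bit per column

    def place(cols, diag_l, diag_r):
        if cols == full:  # every row has a queen
            return 1
        total = 0
        free = full & ~(cols | diag_l | diag_r)  # squares not under attack
        while free:  # data-dependent loop: trip count varies per row
            bit = free & -free  # lowest free square
            free ^= bit
            total += place(cols | bit,
                           ((diag_l | bit) << 1) & full,
                           (diag_r | bit) >> 1)
        return total

    return place(0, 0, 0)


if __name__ == "__main__":
    print(count_queens(10))  # 724 solutions on a 10 by 10 board
```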
This integer benchmark, CPU PhotoWorxx, performs a range of common tasks used during digital photo processing.
It performs the following tasks on a very large RGB image:
· Fill
· Flip
· Rotate90R (rotate 90 degrees CW)
· Rotate90L (rotate 90 degrees CCW)
· Random (fill the image with random coloured pixels)
· RGB2BW (colour to black & white conversion)
· Difference
· Crop [EVEREST Version 2.10 and later]
This benchmark stresses the integer arithmetic and multiplication execution units of the CPU as well as the memory subsystem. Because this test generates heavy memory read/write traffic, it cannot scale effectively when more than 2 processing threads are used. For example, on an 8-way Pentium III Xeon system the 8 processing threads will be “fighting” over the memory, creating a serious bottleneck that leads to scores as low as those of a 2-way or 4-way system based on a similar processor.
The CPU PhotoWorxx test uses only the basic x86 instructions, and it is HyperThreading, multi-processor (SMP) and multi-core (CMP) aware.
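For readers unfamiliar with the operations listed above, here is a tiny pure-integer sketch of most of them on an image stored as rows of `(r, g, b)` tuples. These are our own illustrative functions, not Everest’s: the flip direction, the black & white weights (the common 299/587/114 luma weights) and the absolute per-channel difference are all assumptions, and the Random task is left out to keep the example deterministic.

```python
def fill(width, height, colour):
    """Create an image where every pixel has the same (r, g, b) colour."""
    return [[colour] * width for _ in range(height)]


def flip(img):
    """Flip top to bottom (the direction is an assumption)."""
    return img[::-1]


def rotate90r(img):
    """Rotate 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def rotate90l(img):
    """Rotate 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*img)][::-1]


def rgb2bw(img):
    """Convert to grey levels with integer luma weights (assumed weights)."""
    return [[(r * 299 + g * 587 + b * 114) // 1000 for (r, g, b) in row]
            for row in img]


def difference(a, b):
    """Absolute per-channel difference of two equally sized images."""
    return [[tuple(abs(x - y) for x, y in zip(p, q)) for p, q in zip(ra, rb)]
            for ra, rb in zip(a, b)]


def crop(img, left, top, width, height):
    """Cut out a width-by-height region starting at (left, top)."""
    return [row[left:left + width] for row in img[top:top + height]]


if __name__ == "__main__":
    white = fill(4, 3, (255, 255, 255))
    print(rgb2bw(crop(rotate90r(white), 0, 0, 2, 2)))
```

Every one of these touches each pixel once with very little arithmetic per pixel, which is why the real test is limited by memory traffic rather than by the execution units once several threads are running.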
This integer benchmark, CPU ZLib, measures combined CPU and memory subsystem performance through the public ZLib compression library Version 1.2.3 (http://www.zlib.net).
The CPU ZLib test uses only the basic x86 instructions, and it is HyperThreading, multi-processor (SMP) and multi-core (CMP) aware.
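A rough equivalent can be tried with Python’s built-in `zlib` module, which wraps the same library. This is a sketch only: `zlib_throughput` is our own helper, the sample data and compression level are assumptions, and Python links against whichever ZLib version is installed on the system (check `zlib.ZLIB_VERSION`), not necessarily 1.2.3.

```python
import time
import zlib


def zlib_throughput(data, level=6):
    """Compress `data` once; return (MB/s of input consumed, compression ratio)."""
    start = time.perf_counter()
    packed = zlib.compress(data, level)
    elapsed = max(time.perf_counter() - start, 1e-9)
    if zlib.decompress(packed) != data:  # sanity check: round trip must match
        raise RuntimeError("round trip failed")
    mb_per_s = len(data) / (1024 * 1024) / elapsed
    return mb_per_s, len(packed) / len(data)


if __name__ == "__main__":
    sample = bytes(range(256)) * 4096  # 1 MB of repetitive sample data
    rate, ratio = zlib_throughput(sample)
    print(f"zlib {zlib.ZLIB_VERSION}: {rate:.0f} MB/s, ratio {ratio:.4f}")
```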
This integer benchmark, CPU AES, measures CPU performance using AES (a.k.a. Rijndael) data encryption. It utilizes Vincent Rijmen, Antoon Bosselaers and Paulo Barreto’s public domain C code in ECB mode (http://www.esat.kuleuven.ac.be/~rijmen/rijndael/rijndael-fst-3.0.zip).
The CPU AES test uses only the basic x86 instructions, and it is hardware accelerated on VIA C3 and VIA C7 processors that have the VIA PadLock Security Engine. The test consumes 48 MB of memory, and it is HyperThreading, multi-processor (SMP) and multi-core (CMP) aware.
The FPU Julia benchmark measures single precision (also known as 32-bit) floating-point performance through the computation of several frames of the popular “Julia” fractal. The code behind this benchmark method is written in Assembly, and it is heavily optimized for every popular AMD and Intel processor core variant, using the appropriate x87, 3DNow!, 3DNow!+ or SSE instruction set extension.
The FPU Julia test consumes less than 1 MB of system memory, and it is HyperThreading, multi-processor (SMP) and multi-core (CMP) aware.
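For reference, a Julia fractal frame is produced by repeating z → z² + c for every pixel, with c fixed and the pixel supplying the starting z, and counting how long the value stays bounded. The sketch below shows that loop. It is not the Everest code: the constant c, the viewing window and the iteration limit are arbitrary choices of ours, and Python floats are 64-bit, so this shows the arithmetic pattern rather than true single-precision behaviour.

```python
def julia_escape(zr, zi, cr, ci, max_iter=100):
    """Iterate z -> z*z + c; return how many steps z stays within radius 2."""
    for i in range(max_iter):
        if zr * zr + zi * zi > 4.0:
            return i
        zr, zi = zr * zr - zi * zi + cr, 2.0 * zr * zi + ci
    return max_iter


def julia_frame(width, height, cr=-0.8, ci=0.156, max_iter=100):
    """Compute one frame over the window [-1.5, 1.5] x [-1.5, 1.5]."""
    frame = []
    for row in range(height):
        zi = -1.5 + 3.0 * row / max(height - 1, 1)
        line = []
        for col in range(width):
            zr = -1.5 + 3.0 * col / max(width - 1, 1)
            line.append(julia_escape(zr, zi, cr, ci, max_iter))
        frame.append(line)
    return frame


if __name__ == "__main__":
    for line in julia_frame(40, 20, max_iter=30):
        print("".join("#" if n == 30 else "." for n in line))
```

Each pixel is independent of every other pixel, which is why this kind of test fits in very little memory and scales well across cores.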
The FPU Mandel benchmark measures double precision (also known as 64-bit) floating-point performance through the computation of several frames of the popular “Mandelbrot” fractal. The code behind this benchmark method is written in Assembly, and it is heavily optimized for every popular AMD and Intel processor core variant, using the appropriate x87 or SSE2 instruction set extension.
The FPU Mandel test consumes less than 1 MB of system memory, and it is HyperThreading, multi-processor (SMP) and multi-core (CMP) aware.
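The Mandelbrot set uses the same z → z² + c step as a Julia fractal, but z always starts at zero and the pixel supplies c. Python’s built-in `complex` type uses 64-bit floats for each component, so the sketch below does run in double precision, although it is still only an illustration with our own function names, window and iteration limit, not the Everest code.

```python
def mandel_escape(c, max_iter=100):
    """Iterate z -> z*z + c from z = 0; return how many steps |z| stays <= 2."""
    z = 0j
    for i in range(max_iter):
        if abs(z) > 2.0:
            return i
        z = z * z + c
    return max_iter


def mandel_frame(width, height, max_iter=100):
    """Compute one frame over the window [-2, 1] x [-1.25, 1.25]."""
    frame = []
    for row in range(height):
        im = -1.25 + 2.5 * row / max(height - 1, 1)
        frame.append([
            mandel_escape(complex(-2.0 + 3.0 * col / max(width - 1, 1), im), max_iter)
            for col in range(width)
        ])
    return frame


if __name__ == "__main__":
    for line in mandel_frame(60, 24, max_iter=30):
        print("".join("#" if n == 30 else "." for n in line))
```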
The FPU SinJulia benchmark measures extended precision (also known as 80-bit) floating-point performance through the computation of a single frame of a modified “Julia” fractal. The code behind this benchmark method is written in Assembly, and it is heavily optimized for every popular AMD and Intel processor core variant, using trigonometric and exponential x87 instructions.
The FPU SinJulia test consumes less than 1 MB of system memory, and it is HyperThreading, multi-processor (SMP) and multi-core (CMP) aware.