Sharky Extreme : October 11, 2008






So far, we have looked at timings primarily from the viewpoint of the memory itself; things look quite different from the system's perspective. The real performance issue is preventing the CPU from 'stalling', that is, waiting for data. For the system to run at peak efficiency, the processor must always be busy. As will soon be seen, this is not as easy as it sounds.

The maximum memory throughput in a system is determined by multiplying the bus speed by the bus width. A system with a 64-bit bus running at 100MHz would therefore have a theoretical maximum throughput of 800MB/sec: 64 bits divided by 8 bits per byte is 8 bytes, and 8 bytes multiplied by 100 million cycles per second is 800MB/sec. In practice, however, it is virtually impossible to achieve this kind of throughput.
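The arithmetic above can be written out as a small Python sketch. The function name `peak_throughput_mb_s` is made up for illustration, and "MB" is taken to mean one million bytes, as in the text.

```python
def peak_throughput_mb_s(bus_width_bits: int, bus_clock_mhz: float) -> float:
    """Theoretical peak throughput in MB/sec (1 MB = 1,000,000 bytes)."""
    bytes_per_cycle = bus_width_bits / 8      # 64 bits -> 8 bytes per transfer
    return bytes_per_cycle * bus_clock_mhz    # bytes/cycle * million cycles/sec


if __name__ == "__main__":
    # 64-bit bus at 100MHz, the example used in the text
    print(peak_throughput_mb_s(64, 100))  # 800.0
```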

One reason the theoretical maximum cannot be reached is the initial latency of the memory. As an example, consider an SDRAM module with 5-1-1-1 timings on a 64-bit wide, 100MHz bus. The four transfers take 8 cycles in total (5+1+1+1), and each cycle lasts 10ns, so, assuming no other factors slow down the data transfer, exactly 32 bytes of data are transferred in 80ns. The actual throughput in this case is only 400MB/sec (one burst every 80ns is 12.5 million bursts per second, times 32 bytes per burst). In other words, due to the limitations of the SDRAM itself, only half of the bandwidth can be utilized.
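A minimal sketch of the same calculation follows, assuming each number in the timing string is the cycle count for one bus-wide transfer and that nothing else delays the burst. The helper name `burst_throughput_mb_s` is hypothetical.

```python
def burst_throughput_mb_s(timings, bus_width_bits=64, bus_clock_mhz=100.0):
    """Effective throughput in MB/sec for one burst with the given timings.

    timings: cycles per transfer, e.g. (5, 1, 1, 1) for 5-1-1-1.
    """
    cycle_ns = 1000.0 / bus_clock_mhz                 # 100MHz -> 10ns per cycle
    total_ns = sum(timings) * cycle_ns                # 8 cycles -> 80ns
    bytes_moved = len(timings) * bus_width_bits // 8  # 4 transfers * 8 bytes
    # bytes per ns equals GB/sec; multiply by 1000 for MB/sec
    return bytes_moved * 1000.0 / total_ns


if __name__ == "__main__":
    print(burst_throughput_mb_s((5, 1, 1, 1)))  # 400.0
```

With 5-1-1-1 timings the result is half of the 800MB/sec peak, because 8 cycles are spent delivering what an ideal bus would deliver in 4.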

Of course, the burst length may be longer than 4 cycles, which would provide better utilization of the bandwidth, but the example also did not take into account certain propagation delays. The chipset itself needs to perform a few operations, which also takes up a few cycles. As a result, typical SDRAM timings from the system perspective are more likely 7-1-1-1 or even 8-1-1-1, which drops the effective throughput to 320MB/sec or even as low as roughly 290MB/sec.
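To see how the first-access cycle count and the burst length pull in opposite directions, here is a rough sketch. It assumes every transfer after the first takes exactly one cycle and ignores all other overhead. The name `effective_mb_s` and its parameters are illustrative only.

```python
def effective_mb_s(first_access_cycles, burst_length=4,
                   bus_width_bits=64, bus_clock_mhz=100.0):
    """Effective MB/sec for an x-1-1-1... burst (1 MB = 1,000,000 bytes)."""
    # first transfer costs first_access_cycles, each later one costs 1 cycle
    total_cycles = first_access_cycles + (burst_length - 1)
    total_ns = total_cycles * 1000.0 / bus_clock_mhz
    bytes_moved = burst_length * bus_width_bits // 8
    return bytes_moved * 1000.0 / total_ns


if __name__ == "__main__":
    for lead in (5, 7, 8):
        # 5-1-1-1, 7-1-1-1, 8-1-1-1 with a burst of 4
        print(f"{lead}-1-1-1: {effective_mb_s(lead):.1f} MB/sec")
    # a longer burst spreads the initial latency over more data
    print(f"5-1-... burst of 8: {effective_mb_s(5, burst_length=8):.1f} MB/sec")
```

Under these assumptions the three timings give 400, 320 and about 291 MB/sec, while doubling the burst length at 5-1-1-1 raises the figure to about 533 MB/sec.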

The first real-world example of this was the difference in SDRAM timings that could be achieved with the i430TX chipset versus the i430VX chipset. Apparently, the TX chipset reduced the overhead by one cycle, resulting in about a 10% performance improvement. Since cache sizes were relatively small at the time (16K L1 and 256K L2), this resulted in measurable overall differences in system performance.

There is another type of latency, known as 'turn around latency', which occurs when the memory has to transition from reading to writing, or vice versa. While this is not as big a factor as initial latency, it does lower the effective bandwidth. In fact, any cycle that is not spent transferring data can be thought of as a wasted cycle.



Copyright © 2001 INT Media Group, Incorporated. All Rights Reserved.