Like the GeForce2 GTS before it, the GeForce3 architecture uses four pixel pipelines, each with two texture units. The two chips differ, however, in how these pixel pipelines can be controlled. On the GeForce2 GTS, for instance, a quad-textured pixel has to be processed twice (once for the first two textures and again for the remaining two). The GeForce3, on the other hand, can dedicate two pixel pipelines to a quad-textured pixel. Granted, this means only two quad-textured pixels can be processed in a single clock cycle, but it gives developers more flexibility to use such effects without as large a performance hit.
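The difference in how the two chips handle multitexturing can be expressed with a little arithmetic. The Python sketch below is a simplified model of the behavior described above, not NVIDIA's actual scheduling logic. It assumes each pipeline applies at most two textures per pass, that the GeForce2 GTS falls back to additional passes, and that the GeForce3 dedicates as many pipelines to a pixel as its textures require. It only covers one to four textures, and the function names are hypothetical.

```python
import math

PIXEL_PIPELINES = 4
TEXTURE_UNITS_PER_PIPELINE = 2
MAX_TEXTURES = 4  # the quad-texturing case discussed in the text


def _check(textures: int) -> None:
    """Reject texture counts outside the range this simplified model covers."""
    if not 1 <= textures <= MAX_TEXTURES:
        raise ValueError(f"model only covers 1..{MAX_TEXTURES} textures")


def geforce2_passes(textures: int) -> int:
    """GeForce2 GTS model: a pipeline applies at most two textures per pass,
    so a quad-textured pixel needs a second pass."""
    _check(textures)
    return math.ceil(textures / TEXTURE_UNITS_PER_PIPELINE)


def geforce3_pixels_per_clock(textures: int) -> int:
    """GeForce3 model: enough pipelines are dedicated to one pixel to apply
    all of its textures in a single pass (two pipelines for quad-texturing)."""
    _check(textures)
    pipelines_per_pixel = math.ceil(textures / TEXTURE_UNITS_PER_PIPELINE)
    return PIXEL_PIPELINES // pipelines_per_pixel


if __name__ == "__main__":
    for n in (2, 4):
        print(f"{n} textures: {geforce2_passes(n)} GeForce2 pass(es), "
              f"{geforce3_pixels_per_clock(n)} GeForce3 pixels/clock in one pass")
```

For a quad-textured pixel the model gives two passes on the GeForce2 GTS versus two pixels per clock in a single pass on the GeForce3, in line with the description above.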
The core itself will be clocked at 200MHz; remember, its 57 million transistors are more than twice as many as the GeForce2 GTS has, and heat is undoubtedly a major consideration. A quick calculation reminds us that the GeForce3 will have the same theoretical pixel and texel fillrates as the GeForce2 GTS:
200MHz * 4 pixel pipelines = 800Mpixels/s
200MHz * 4 pixel pipelines * 2 texture units per pipeline = 1,600Mtexels/s
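Both fillrate figures follow directly from the core clock multiplied by the number of units working in parallel. The minimal Python sketch below reproduces that arithmetic using only the figures quoted in this article; like the numbers above, these are theoretical peaks that ignore memory bandwidth limits.

```python
CORE_CLOCK_MHZ = 200
PIXEL_PIPELINES = 4
TEXTURE_UNITS_PER_PIPELINE = 2


def pixel_fillrate(clock_mhz: int, pipelines: int) -> int:
    """Peak pixel fillrate in Mpixels/s: one pixel per pipeline per clock."""
    return clock_mhz * pipelines


def texel_fillrate(clock_mhz: int, pipelines: int, texture_units: int) -> int:
    """Peak texel fillrate in Mtexels/s: one texel per texture unit per clock."""
    return clock_mhz * pipelines * texture_units


if __name__ == "__main__":
    print(f"{pixel_fillrate(CORE_CLOCK_MHZ, PIXEL_PIPELINES):,} Mpixels/s")  # 800 Mpixels/s
    print(f"{texel_fillrate(CORE_CLOCK_MHZ, PIXEL_PIPELINES, TEXTURE_UNITS_PER_PIPELINE):,} Mtexels/s")  # 1,600 Mtexels/s
```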
NVIDIA was quick to point out that the "raw power" philosophy they had followed in the past is no longer feasible, as memory bandwidth can easily choke the performance of even the fastest accelerators. Instead, they are shooting for a more refined product, like ATI did with the RADEON GPU. Because of this, we are not expecting the GeForce3 to significantly outperform a GeForce2 Ultra board in today's applications.
All of the retail GeForce3 boards should be shipping with 64MB of DDR SDRAM, so wave goodbye to those paltry 32MB performance boards of last year. Clocked at 230MHz (effectively 460MHz), the memory will give the GeForce3 a theoretical bandwidth of 7.36GB/s, which breaks down as follows:
460MHz * (128-bit bus / 8 bits per byte = 16 bytes) = 7,360MB/s
You'll notice that this figure does not take into account latencies and efficiency losses in the DDR memory. Once these factors are taken into consideration, the number will be somewhat lower; however, we are hoping that NVIDIA's new Lightspeed Memory Architecture will be able to compensate.
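The bandwidth calculation can be written out the same way. This Python sketch uses the 230MHz clock and 128-bit bus from the text, assumes two transfers per clock for DDR, and uses decimal units (1GB = 1,000MB), matching the 7.36GB/s figure. It computes only the theoretical peak and makes no attempt to model the latency and efficiency losses mentioned above.

```python
MEMORY_CLOCK_MHZ = 230
TRANSFERS_PER_CLOCK = 2  # DDR: data moves on both the rising and falling clock edges
BUS_WIDTH_BITS = 128


def peak_bandwidth_mb_s(clock_mhz: int, transfers_per_clock: int, bus_width_bits: int) -> int:
    """Theoretical peak bandwidth in MB/s (1MB = 10**6 bytes), with no
    allowance for latency or other efficiency losses."""
    effective_mhz = clock_mhz * transfers_per_clock  # 230MHz DDR -> 460MHz effective
    bytes_per_transfer = bus_width_bits // 8  # 128-bit bus -> 16 bytes
    return effective_mhz * bytes_per_transfer


if __name__ == "__main__":
    mb_s = peak_bandwidth_mb_s(MEMORY_CLOCK_MHZ, TRANSFERS_PER_CLOCK, BUS_WIDTH_BITS)
    print(f"{mb_s:,} MB/s = {mb_s / 1000} GB/s")  # 7,360 MB/s = 7.36 GB/s
```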
It is no secret that today's fastest DRAM modules are not fast enough to let the latest generation of graphics products reach their full potential at high resolutions. For this reason, ATI developed its HyperZ technology, which eliminated much of the bandwidth that was wasted fetching data or clearing the Z buffer. ATI also added a lossless compression algorithm to further reduce the footprint of this data without sacrificing precision.
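The text does not describe how ATI's compression actually works, so the following Python sketch is only a generic illustration of lossless compression applied to depth data, not HyperZ's algorithm. It assumes depth values are stored as 24-bit integers (3 bytes each) in small non-empty tiles, and it uses simple delta encoding: the first value is stored in full and each following value as a one-byte difference, falling back to raw storage whenever a difference does not fit. Because the decoder rebuilds the original values exactly, no precision is lost. All names here are hypothetical.

```python
from typing import List, Tuple

DEPTH_BYTES = 3  # assumption: 24-bit integer depth values


def compress_tile(depths: List[int]) -> Tuple[str, List[int]]:
    """Delta-encode a non-empty tile of depth values, or keep it raw if any
    delta does not fit in a signed byte. Both forms decode exactly."""
    deltas = [b - a for a, b in zip(depths, depths[1:])]
    if all(-128 <= d <= 127 for d in deltas):
        return "delta", [depths[0]] + deltas
    return "raw", list(depths)


def decompress_tile(kind: str, payload: List[int]) -> List[int]:
    """Rebuild the original depth values from either encoding."""
    if kind == "raw":
        return list(payload)
    values = [payload[0]]
    for delta in payload[1:]:
        values.append(values[-1] + delta)
    return values


def encoded_size(kind: str, payload: List[int]) -> int:
    """Bytes needed: one full base value plus 1-byte deltas, or every value raw."""
    if kind == "raw":
        return len(payload) * DEPTH_BYTES
    return DEPTH_BYTES + (len(payload) - 1)


if __name__ == "__main__":
    # A 4x4 tile lying on a gently sloped surface, flattened row by row.
    tile = [50_000 + 3 * x + 40 * y for y in range(4) for x in range(4)]
    kind, payload = compress_tile(tile)
    assert decompress_tile(kind, payload) == tile  # lossless round trip
    print(kind, encoded_size(kind, payload), "bytes vs", len(tile) * DEPTH_BYTES, "bytes raw")
```

Smooth surfaces, where neighboring depth values differ only slightly, compress well under this scheme; tiles that straddle a sharp depth discontinuity are simply stored uncompressed, so correctness never depends on the data being compressible.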
NVIDIA developed a similar set of bandwidth-maximizing techniques to get the most out of the 230MHz DDR modules used on the GeForce3. The first such technique involves what is called a "crossbar memory controller." Most modern memory controllers access data in 256-bit chunks (a 128-bit data path multiplied by two, since DDR transfers data on both the rising and falling edges of each clock cycle). A problem arises when the data for small triangles needs to be transferred. If the data amounts to only 64 bits, a traditional memory controller still has to transfer a full 256-bit chunk to relay it, making that transfer only 25% efficient.
The GeForce3 addresses this by splitting the memory controller into four smaller memory controllers, each capable of communicating with the others and with the GPU. Each controller can access 64 bits of data on its own, or all four can work together to access 256 bits as a whole. We see this as a definite benefit for future content, where larger numbers of small triangles will be used for added realism. In these situations, the data describing the positions and attributes of these triangles will be transferred more efficiently.
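The efficiency argument can be checked with a small model. The Python sketch below is an idealized illustration, not a description of NVIDIA's hardware: it treats a conventional controller as moving data only in whole 256-bit accesses, and assumes the four 64-bit crossbar controllers can always be kept busy with independent requests. A real design would also depend on where each piece of data sits in memory, so requests would not always spread this evenly. The function names are hypothetical.

```python
import math
from typing import List

ACCESS_BITS = 256  # 128-bit DDR data path x 2 transfers per clock
CONTROLLERS = 4
CONTROLLER_BITS = ACCESS_BITS // CONTROLLERS  # 64 bits per controller


def monolithic_accesses(requests: List[int]) -> int:
    """Accesses needed when every request occupies whole 256-bit chunks."""
    return sum(math.ceil(bits / ACCESS_BITS) for bits in requests)


def crossbar_accesses(requests: List[int]) -> int:
    """Accesses needed when requests are split into 64-bit pieces and each of
    the four controllers services one piece per access (idealized: pieces
    from different requests are assumed to spread perfectly)."""
    pieces = sum(math.ceil(bits / CONTROLLER_BITS) for bits in requests)
    return math.ceil(pieces / CONTROLLERS)


def efficiency(requests: List[int], accesses: int) -> float:
    """Fraction of the transferred bits that were actually requested."""
    return sum(requests) / (accesses * ACCESS_BITS)


if __name__ == "__main__":
    small_triangles = [64, 64, 64, 64]  # four independent 64-bit requests
    for name, count in (("monolithic", monolithic_accesses(small_triangles)),
                        ("crossbar", crossbar_accesses(small_triangles))):
        print(f"{name}: {count} access(es), {efficiency(small_triangles, count):.0%} efficient")
```

A single 64-bit request through the monolithic model reproduces the 25% figure from the text; the gain in this model comes from servicing four such small requests in the time a conventional controller would spend on one.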