Nvidia’s announcement of their new Ampere-based RTX 3000 series of cards couldn’t have gone better for them. They ramped up the hype for a month, and even with countless leaks they still managed to surprise everyone with both their performance estimates and their pricing. The combination almost instantly tanked the used card market as well, given estimates that put the RTX 3070 right with or above the RTX 2080 Ti, which has been the top dog for gaming for two years now. Then you have the RTX 3080 and the RTX 3090, which show a big performance jump over the previous generation, and for once pricing didn’t go way up. The RTX 3080 is coming out first and launches tomorrow, but before store availability we get to take a look at the RTX 3080 Founders Edition, see why it is such a special card, and put it through our test suite to see how it performs.
Product Name: Nvidia RTX 3080 Founders Edition
Review Sample Provided by: Nvidia
Written by: Wes Compton
Amazon Affiliate Link: HERE
Ampere
With Nvidia’s announcement, a LOT of information on Ampere has already been made available, so there isn’t a huge need to deep-dive things at this point. But for those of you who haven’t been paying attention, I did want to do a short rundown on what Ampere is, a few of the software features Nvidia has announced, and where the RTX 3080 fits into the lineup as well as how it compares to past cards.
To start things off, Ampere is the name of the Nvidia architecture used in the new 3000 series of cards, and it is the successor to Turing, which was used in the 2000 series. This isn’t just a die shrink, although it is built on a custom Samsung 8 nm process (for the GA102 at least) where Turing was built by TSMC at 12 nm. This is the second generation of RTX and the third generation of the Tensor core (the first appeared in Volta, which didn’t have a gaming launch). With the new architecture, Nvidia has the new shaders rated at 30 TFLOPS versus the 11 shader TFLOPS of the first generation of RTX cards. The ray-tracing cores see a similar increase, jumping up to 58 ray tracing TFLOPS from 34 TFLOPS before. Then there are the Tensor cores, which have an even bigger performance jump, going from 89 TFLOPS to 238 TFLOPS.
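As a sanity check on those shader numbers, FP32 TFLOPS can be estimated from CUDA core count and boost clock, assuming each core does one fused multiply-add (two FLOPs) per clock. This is a back-of-the-envelope sketch, not Nvidia’s official formula, and the function name is just for illustration:

```python
def fp32_tflops(cuda_cores, boost_clock_mhz):
    # One FMA = 2 floating-point operations per core per clock.
    return cuda_cores * 2 * boost_clock_mhz * 1e6 / 1e12

print(fp32_tflops(8704, 1710))  # RTX 3080: ~29.8, marketed as "30 TFLOPS"
print(fp32_tflops(3072, 1815))  # RTX 2080 SUPER: ~11.2, in line with "11 TFLOPS"
```

The RTX 3080 figure only lands near 30 because of the doubled FP32 datapaths described below; on Turing the same math with 4352 cores and a 1545 MHz boost gives the 2080 Ti about 13.4 TFLOPS.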
With these numbers, Nvidia was excited to push just how big of a jump the new cards are over previous generations in both performance and pricing, which their presentation showed with a nice price-to-performance graph using launch prices. The RTX 3080 that I will be looking at is priced right with the RTX 2080 SUPER and the GTX 1080 Ti but sits significantly higher in performance than anything from past generations, including the RTX 2080 Ti.
So the diagram below is a breakdown of a single Ampere SM, of which each of the new Ampere cards has many. You can see each SM has a new 2nd-gen ray tracing core that takes up a lot of space on the die. Everything else is split into four partitions, or processing blocks. Each of these now contains one new 3rd-gen Tensor core, one dedicated FP32 datapath, and one combined FP32/INT32 datapath. This is a big change from Turing, which had one datapath for floating point and another for integers. Now FP32, which is what most 3D rendering uses, can be handled at double the rate, and this is where they get that big jump in floating-point TFLOPS. Ray tracing performance also sees a big improvement, with each SM now able to process ray tracing and graphics/compute workloads at the same time. The second picture is the full diagram of the RTX 3080, which has six GPCs with 10 or 12 SMs inside each.
So for specifications, I have included the RTX 3080 alongside a few different Nvidia cards for comparison. I have the RTX 2080 Ti and the RTX 2080 SUPER from the last generation, and I have also included the GTX 1080 Ti because I believe a lot of die-hard GTX 1080 Ti owners will be looking closely at the RTX 3080 to see how they compare. Running through them, you can see they all have the same six-GPC layout but the SM counts differ. The 2080 Ti is similar to the 3080 with its 68 SMs, but just look at the jump in overall CUDA cores, which has the RTX 3080 at 8704, double the RTX 2080 Ti’s 4352. The number of Tensor cores is down significantly from the RTX 2080 Ti and even the RTX 2080 SUPER, so it will be interesting to see how the new, improved Tensor cores perform. Clock speeds for the RTX 3080 are higher than the 2080 Ti, with the boost clock running at 1710 MHz, but the RTX 2080 SUPER was higher still at 1815 MHz. The smaller manufacturing process also has the overall transistor count WAY up, with the RTX 3080 at 28.3 billion versus 18.6 billion for the RTX 2080 Ti and 13.6 billion for the RTX 2080 SUPER, yet the overall die is smaller than the RTX 2080 Ti’s.
On the memory side of things, the big change is the move from the GDDR6 on the Turing cards to GDDR6X, which uses less power per bit and runs at a higher data rate. Nvidia has the 3080 running at 19 Gbps compared to the 15.5 Gbps of the 2080 SUPER and the 14 Gbps of the 2080 Ti. The 1080 Ti looks especially slow with its GDDR5X running at 11 Gbps. The 320-bit memory interface is a little narrower than both of the Tis and wider than the RTX 2080 SUPER’s. The memory capacity isn’t a huge jump either, with the 2080 Ti and the 1080 Ti both still having more, though 10 GB is more than the 2080 SUPER had. Even without going up to a 352-bit memory interface, the faster memory puts bandwidth up at 760 GB/sec.
| Specifications | GeForce GTX 1080 Ti Founders Edition | GeForce RTX 2080 Super Founders Edition | GeForce RTX 2080 Ti Founders Edition | GeForce RTX 3080 10 GB Founders Edition |
| --- | --- | --- | --- | --- |
| GPU Codename | GP102 | TU104 | TU102 | GA102 |
| GPU Architecture | Pascal | Turing | Turing | Ampere |
| GPCs | 6 | 6 | 6 | 6 |
| TPCs | 28 | 24 | 36 | 34 |
| SMs | 28 | 48 | 68 | 68 |
| CUDA Cores / SM | 128 | 64 | 64 | 128 |
| CUDA Cores / GPU | 3584 | 3072 | 4352 | 8704 |
| Tensor Cores / SM | N/A | 8 (2nd Gen) | 8 (2nd Gen) | 4 (3rd Gen) |
| Tensor Cores / GPU | N/A | 384 (2nd Gen) | 544 (2nd Gen) | 272 (3rd Gen) |
| RT Cores | N/A | 48 (1st Gen) | 68 (1st Gen) | 68 (2nd Gen) |
| GPU Boost Clock (MHz) | 1582 | 1815 | 1545 | 1710 |
| Frame Buffer Memory Size and Type | 11264 MB GDDR5X | 8192 MB GDDR6 | 11264 MB GDDR6 | 10240 MB GDDR6X |
| Memory Interface | 352-bit | 256-bit | 352-bit | 320-bit |
| Memory Clock (Data Rate) | 11 Gbps | 15.5 Gbps | 14 Gbps | 19 Gbps |
| Memory Bandwidth | 484 GB/sec | 496 GB/sec | 616 GB/sec | 760 GB/sec |
| ROPs | 88 | 64 | 88 | 96 |
| Texture Units | 224 | 192 | 272 | 272 |
| L2 Cache Size | 2816 KB | 4096 KB | 5632 KB | 5120 KB |
| Register File Size | 7168 KB | 12288 KB | 17408 KB | 17408 KB |
| TGP (Total Graphics Power) | 250 W | 250 W | 260 W | 320 W |
| Transistor Count | 12 Billion | 13.6 Billion | 18.6 Billion | 28.3 Billion |
| Die Size | 471 mm² | 545 mm² | 754 mm² | 628.4 mm² |
| Manufacturing Process | TSMC 16 nm | TSMC 12 nm FFN (FinFET NVIDIA) | TSMC 12 nm FFN (FinFET NVIDIA) | Samsung 8 nm 8N NVIDIA Custom Process |
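The bandwidth figures in the table above follow directly from the bus width and data rate, so they can be cross-checked with a quick sketch (the function name here is just for illustration):

```python
def bandwidth_gbs(bus_width_bits, data_rate_gbps):
    # Bus width in bytes times the effective per-pin data rate.
    return bus_width_bits / 8 * data_rate_gbps

print(bandwidth_gbs(320, 19))    # RTX 3080, GDDR6X -> 760 GB/sec
print(bandwidth_gbs(352, 14))    # RTX 2080 Ti -> 616 GB/sec
print(bandwidth_gbs(256, 15.5))  # RTX 2080 SUPER -> 496 GB/sec
print(bandwidth_gbs(352, 11))    # GTX 1080 Ti -> 484 GB/sec
```

This is why the 3080 can beat the 2080 Ti on bandwidth despite the narrower 320-bit interface: the jump from 14 to 19 Gbps more than makes up for the missing 32 bits of bus width.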
Beyond the architecture changes and the new cards, Nvidia also paired the announcement with a few software and driver-level changes they were excited to talk about. The first is what they call Nvidia Reflex, though Nvidia isn’t the first to focus on this area. Basically, in esports titles especially, latency is important, which is part of the reason for the big push for high refresh rate monitors. Nvidia has combined their G-Sync technology with high refresh rate monitors and a few other software tweaks to focus on cutting down latency in these titles, and they showed the difference by comparing average system latency. They have also opened up an API for games to measure rendering latency and lower the render queue. The end result is a big improvement in the specific games where latency matters most.
The next addition was Nvidia Broadcast, and this is one that plays a role in my current setup. I have been using Nvidia’s RTX Voice to cut out background noise when using my desktop microphone, and it does an amazing job even though it has a few big bugs. Well, Nvidia has expanded on that idea to tie AI-powered broadcasting help all into one program. The microphone and speaker noise removal is still there, but they have now added video support as well. A lot of streamers use software to cut out backgrounds, but the idea here is to use the Tensor cores built into RTX video cards to take the load off your CPU, and because it is AI-based they can do more detailed predictive effects. It can also auto-frame you when you move around. Nvidia is trying to help streamers, and now with COVID it can also help people working or learning from home.
RTX IO is one of those announcements that real geeks were foaming at the mouth for, but the average gamer or user isn’t going to know what is so exciting about it. Ironically, it is similar to one of the big features of the upcoming PS5. RTX IO basically leverages the power of your GPU to speed up file transfers and loading. This again offloads work from your CPU, and it also allows a more direct path for data to go straight into GPU memory rather than through the CPU and back out to the GPU. Maps can load faster, and the lossless decompression used can also lower game sizes in general. Right now RTX IO is capable of decompressing even faster than the limits of Gen4 SSDs. This isn’t coming just yet, though; Microsoft will be bringing DirectStorage out next year, and RTX IO will be available at that time.
Lastly, before diving into the RTX 3080 itself, I wanted to include a copy of our GPU-Z screenshot, which shows which firmware we are running and confirms that nothing weird is going on with the card’s spec. This is especially important when it comes to pre-launch samples like this one. You can also see which driver I tested on, which was the pre-launch press driver.