BF16 GPU TFLOPS Viewer
Toggle categories to hide/show entries. The 'usable' TFLOPS of a card are its non-sparse (dense) TFLOPS, so those are the figures listed here. The data below covers the most commonly used CUDA cards and their BF16 with FP32 accumulate values (what we use in PyTorch).
RTX 6000 Blackwell Max-Q | 503.8 |
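
A dense peak figure like the one above is typically used as the denominator when estimating how much of a card's throughput a workload actually reaches. The sketch below is one way to do that arithmetic. The helper names (`matmul_flops`, `achieved_tflops`, `utilization`), the matrix size, and the 4 ms timing are all hypothetical and chosen only for illustration. The peak value is the 503.8 from the table, which is assumed here to be dense BF16 TFLOPS.

```python
# Hypothetical helpers for illustration; they are not part of the viewer.

def matmul_flops(m: int, n: int, k: int) -> int:
    """FLOPs for an (m x k) @ (k x n) matmul.

    Each of the m * n outputs needs k multiplies and k adds,
    which is the usual 2 * m * n * k convention.
    """
    return 2 * m * n * k


def achieved_tflops(flops: float, seconds: float) -> float:
    """Convert a FLOP count and a wall-clock time into TFLOPS."""
    return flops / seconds / 1e12


def utilization(achieved: float, peak: float) -> float:
    """Fraction of the dense (non-sparse) peak that was reached."""
    return achieved / peak


# Peak taken from the table row above (assumed to be dense BF16 TFLOPS).
PEAK_DENSE_BF16_TFLOPS = 503.8

# Assumed example: an 8192 x 8192 x 8192 matmul that took 4 ms.
# The timing is made up; substitute a real measurement.
flops = matmul_flops(8192, 8192, 8192)
achieved = achieved_tflops(flops, seconds=0.004)
fraction = utilization(achieved, PEAK_DENSE_BF16_TFLOPS)

print(f"achieved: {achieved:.1f} TFLOPS, utilization: {fraction:.1%}")
```

Dividing by the sparse peak instead would roughly halve the reported utilization on cards that advertise 2:4 structured sparsity, which is why the dense figure is the more useful baseline for ordinary PyTorch workloads.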