Fast memory-efficient GPU attention kernels
FlashAttention is a fast, memory-efficient exact attention implementation that reduces GPU memory usage from quadratic to linear in sequence length. Developed by Tri Dao and collaborators, it achieves 2-4x speedups over baseline attention implementations through IO-aware tiling that minimizes reads and writes to high-bandwidth memory (HBM). Later versions include FlashAttention-2, with improved parallelism and work partitioning; FlashAttention-3, optimized for NVIDIA Hopper H100 GPUs; and FlashAttention-4, targeting the Blackwell architecture.
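The key idea behind the tiling is the online softmax: attention can be computed over blocks of keys and values while carrying only a running row maximum and running denominator, so the full quadratic score matrix is never materialized. The following NumPy sketch illustrates that recurrence; it is an illustrative reference in plain Python, not the actual CUDA kernel (which also tiles queries and keeps the working set in on-chip SRAM), and the function names and block size are chosen here for demonstration.

```python
import numpy as np

def naive_attention(q, k, v):
    # Materializes the full (seq, seq) score matrix: O(n^2) memory.
    s = q @ k.T / np.sqrt(q.shape[-1])
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    return p @ v

def tiled_attention(q, k, v, block=32):
    # Online softmax over key/value blocks: only a (n, block) slice of the
    # score matrix exists at any time. The running max m and denominator l
    # are rescaled as each new block arrives, so the result is exact.
    n, d = q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(v, dtype=np.float64)
    m = np.full(n, -np.inf)          # running row maximum
    l = np.zeros(n)                  # running softmax denominator
    for j in range(0, k.shape[0], block):
        kj, vj = k[j:j + block], v[j:j + block]
        s = q @ kj.T * scale                      # scores for this block only
        m_new = np.maximum(m, s.max(axis=-1))
        alpha = np.exp(m - m_new)                 # rescale old accumulators
        p = np.exp(s - m_new[:, None])
        l = l * alpha + p.sum(axis=-1)
        out = out * alpha[:, None] + p @ vj
        m = m_new
    return out / l[:, None]
```

Despite processing keys in blocks, `tiled_attention` matches `naive_attention` to floating-point precision; the real kernels win by keeping these block computations in fast on-chip memory instead of HBM.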