Paper Review 7: FlashAttention - Fast and Memory-Efficient Exact Attention with IO-Awareness
An IO-aware exact attention algorithm that uses tiling to reduce the number of memory reads and writes between GPU high-bandwidth memory (HBM) and on-chip SRAM
By Beksultan Sagyndyk
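
The tiling idea in the summary above can be illustrated with a minimal pure-Python sketch. It is not the paper's CUDA kernel and does not model GPU memory at all; it only shows the arithmetic that lets attention be computed one key/value block at a time while still giving the exact softmax result. The function names, the `block_size` parameter, and the choice to tile only over keys and values (one query row at a time) are assumptions made for illustration.

```python
import math


def naive_attention(Q, K, V):
    """Reference version: builds the full row of scores before the softmax."""
    scale = 1.0 / math.sqrt(len(Q[0]))
    out = []
    for q in Q:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in K]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) / total
                    for j in range(len(V[0]))])
    return out


def tiled_attention(Q, K, V, block_size=2):
    """Illustrative tiled version: visits K and V in blocks and never holds
    a full row of scores. Per query row it keeps only a running max, a
    running softmax denominator, and an unnormalized output accumulator."""
    scale = 1.0 / math.sqrt(len(Q[0]))
    dv = len(V[0])
    out = []
    for q in Q:
        m = float("-inf")   # running max of the scores seen so far
        denom = 0.0         # running sum of exp(score - m)
        acc = [0.0] * dv    # running sum of exp(score - m) * value
        for start in range(0, len(K), block_size):
            k_blk = K[start:start + block_size]
            v_blk = V[start:start + block_size]
            scores = [scale * sum(a * b for a, b in zip(q, k)) for k in k_blk]
            m_new = max(m, max(scores))
            # Rescale earlier partial results to the new max.
            # On the first block this is exp(-inf) = 0.0.
            correction = math.exp(m - m_new)
            weights = [math.exp(s - m_new) for s in scores]
            denom = denom * correction + sum(weights)
            acc = [a * correction + sum(w * v[j] for w, v in zip(weights, v_blk))
                   for j, a in enumerate(acc)]
            m = m_new
        out.append([a / denom for a in acc])
    return out
```

Calling `tiled_attention(Q, K, V, block_size=b)` on small nested lists should agree with `naive_attention(Q, K, V)` up to floating-point rounding for any positive `b`, which is the sense in which the tiled computation is exact rather than approximate.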