https://stevengong.co/notes/Streaming-Multiprocessor A Streaming Multiprocessor (SM) is a fundamental component of NVIDIA GPUs, consisting of multiple Stream Processors (CUDA Core) responsible for executing instructions in parallel. From Stephen Jones, I learned that each SM can managed 64 warps, so a total of 2048 threads. However, it really processes 4 warps at a time, which