6.5080 Multicore Programming ★

Recognizing that locks have fundamental limits (blocking, priority inversion, and convoying), 6.5080 introduces non-blocking synchronization. Students implement a lock-free stack using compare-and-swap (CAS) operations. They learn the ABA problem (a pointer changes from A to B and back to A, fooling the CAS) and solve it with tagged pointers or double-word CAS.

No essay on 6.5080 would be complete without confronting Amdahl's Law: S = 1 / ((1 - P) + P/N), where P is the parallel fraction and N the core count. Students run experiments on a 64-core machine, observing that doubling cores rarely doubles speed. The culprits are serial bottlenecks, synchronization overhead, and memory bandwidth saturation. The course teaches profiling tools: perf for cache misses, Intel VTune for lock contention, and TSan (ThreadSanitizer) for data races. The ultimate lesson is humbling: the best parallel program may be one that minimizes sharing, not one that maximizes thread count.

Before writing a single parallel loop, 6.5080 insists on understanding the hardware. Multicore processors do not provide a “perfectly simultaneous” view of memory. Instead, each core possesses private L1 and L2 caches, a shared L3 cache, and the main DRAM. This hierarchy introduces the problem of cache coherence. The course covers the MESI (Modified, Exclusive, Shared, Invalid) protocol extensively. A student learns why two threads incrementing the same shared variable from different cores can miss each other’s updates, leading to lost counts.
