Threads and Concurrency
Lightweight Execution Units
A thread is an independent execution path within a process. Multiple threads share the same address space, but each has its own stack and CPU registers. Threads are how modern programs do several things at once.
Real-world analogy: think of a restaurant kitchen.
- The restaurant is the process (owns all the equipment, ingredients, space)
- Each cook is a thread (an independent worker sharing the same kitchen)
- They share the stove, refrigerator, and workspace (shared memory)
- Each cook has their own hands and mind (own stack and registers)
- They must coordinate to avoid collisions (synchronization)
One cook (single-threaded): efficient but slow for complex orders. Multiple cooks (multi-threaded): faster, but they must communicate to avoid two people using the same pot at the same time.
Part 1 — Threads vs Processes
Process Thread
────────────────────────────────────────────────────
Memory Separate Shared (within process)
Creation cost High (~1ms) Low (~10-50µs)
Switching cost High (change page Low (same page tables)
tables, TLB flush)
Communication IPC (pipes, sockets) Direct (shared memory)
Isolation Strong Weak (one thread crash
can kill all)
Resource usage High (own page Low (share everything)
table, file table)
What Threads Share vs Own
SHARED (all threads in process):
✓ Virtual address space (code, heap, globals)
✓ Open file descriptors
✓ Network sockets
✓ Signal handlers (disposition)
✓ Process ID (PID)
✓ Current working directory
✓ User and group IDs
PRIVATE (each thread has its own):
✗ Stack (local variables, function call frames)
✗ Registers (PC, SP, general purpose)
✗ Thread ID (TID)
✗ errno variable
✗ Signal mask (which signals blocked)
✗ Thread-local storage (TLS)
Part 2 — POSIX Threads (pthreads)
The standard threading API on Linux/Unix.
Creating and Joining Threads
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void* threadFunction(void* arg) {
int id = *(int*)arg;
printf("Thread %d running\n", id);
// Do work
long sum = 0;
for (int i = 0; i < 1000000; i++) sum += i;
// Return a value (as void*)
long* result = malloc(sizeof(long));
*result = sum;
return result;
}
int main() {
pthread_t threads[4];
int ids[4] = {0, 1, 2, 3};
// Create 4 threads
for (int i = 0; i < 4; i++) {
int rc = pthread_create(&threads[i], NULL, threadFunction, &ids[i]);
if (rc != 0) { fprintf(stderr, "pthread_create: %s\n", strerror(rc)); exit(1); }
// Note: pthread_create returns the error code directly (it does not set
// errno), so strerror(rc) is used rather than perror.
}
// Wait for all to finish, collect return values
for (int i = 0; i < 4; i++) {
long* result;
pthread_join(threads[i], (void**)&result);
printf("Thread %d sum = %ld\n", i, *result);
free(result);
}
return 0;
}
Thread Stack
Each thread has its own stack:
Thread 1 stack: Thread 2 stack: Thread 3 stack:
┌────────────┐ ┌────────────┐ ┌────────────┐
│ main() │ │ worker() │ │ worker() │
│ local vars│ │ local vars │ │ local vars │
│ saved regs│ │ saved regs │ │ saved regs │
└────────────┘ └────────────┘ └────────────┘
Shared heap:
┌──────────────────────────────────────────────────────┐
│ malloc'd objects accessible by all threads │
└──────────────────────────────────────────────────────┘
Default thread stack size: 8 MB on Linux (see ulimit -s)
Can configure with pthread_attr_setstacksize()
Stack overflow hits a guard page → SIGSEGV, which kills the entire process
Part 3 — The Race Condition Problem
When multiple threads access shared data and at least one modifies it, results depend on execution order — race condition.
// Seemingly innocent code: two threads increment a counter
int counter = 0;
void* increment(void* arg) {
for (int i = 0; i < 100000; i++) {
counter++; // NOT atomic — this is 3 operations!
}
return NULL;
}
int main() {
pthread_t t1, t2;
pthread_create(&t1, NULL, increment, NULL);
pthread_create(&t2, NULL, increment, NULL);
pthread_join(t1, NULL);
pthread_join(t2, NULL);
printf("counter = %d\n", counter); // Expected: 200000
// Actual: a smaller, run-dependent value (increments get lost!)
}
Why counter++ is not atomic:
counter++ compiles to 3 instructions:
LOAD: R1 = counter (read from memory)
ADD: R1 = R1 + 1 (add in register)
STORE: counter = R1 (write back)
Race condition:
Thread 1: Thread 2: counter
LOAD R1 = 5 ... 5
ADD R1 = 6 ... 5
... LOAD R2 = 5 5
... ADD R2 = 6 5
STORE counter=6 ... 6
... STORE counter=6 6 ← LOST INCREMENT!
Both threads loaded the same value (5), both incremented to 6.
Net result: only 1 increment happened despite 2 operations.
Part 4 — Mutex (Mutual Exclusion)
A mutex ensures only one thread executes a critical section at a time.
Real-world analogy: A single-stall bathroom. There’s one lock on the door. When you enter, you lock it. Others must wait outside. When you leave, you unlock it and the next person enters. Mutual exclusion prevents two people from using it simultaneously.
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
int counter = 0;
void* increment(void* arg) {
for (int i = 0; i < 100000; i++) {
pthread_mutex_lock(&lock);
counter++; // critical section
pthread_mutex_unlock(&lock);
}
return NULL;
}
// Result: exactly 200000 every time ✓
Mutex Internals
typedef struct {
int locked; // 0 = free, 1 = locked
thread_t owner; // which thread holds it
queue_t waiters; // threads waiting
} mutex_t;
lock():
if (mutex.locked == 0):
mutex.locked = 1
mutex.owner = current_thread
return // lock acquired
else:
add current_thread to mutex.waiters
block current_thread (put to sleep)
// When woken up: retry
unlock():
assert(mutex.owner == current_thread)
if (mutex.waiters is not empty):
wake up one waiter
transfer ownership to them
else:
mutex.locked = 0
Critical: lock() and unlock() must check-and-set atomically!
Hardware provides: LOCK XCHG, CMPXCHG, LL/SC instructions
Deadlock
Real-world analogy: Two people each holding one half of a door key. Person A has the top half and waits for Person B to give the bottom half. Person B has the bottom half and waits for Person A to give the top half. Neither moves. Forever.
// Classic deadlock: two threads, two locks, wrong order
pthread_mutex_t lock1 = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t lock2 = PTHREAD_MUTEX_INITIALIZER;
void* thread1(void* arg) {
pthread_mutex_lock(&lock1);
sleep(1); // pause (forces interleaving)
pthread_mutex_lock(&lock2); // WAIT: thread2 holds lock2
// ... do work ...
pthread_mutex_unlock(&lock2);
pthread_mutex_unlock(&lock1);
return NULL;
}
void* thread2(void* arg) {
pthread_mutex_lock(&lock2);
sleep(1);
pthread_mutex_lock(&lock1); // WAIT: thread1 holds lock1
// ... do work ...
pthread_mutex_unlock(&lock1);
pthread_mutex_unlock(&lock2);
return NULL;
}
// Result: both threads wait forever = DEADLOCK!
Deadlock conditions (all four must hold):
1. Mutual exclusion: resources can't be shared
2. Hold and wait: thread holds one resource while waiting for another
3. No preemption: resources can't be forcibly taken
4. Circular wait: A waits for B waits for C waits for A
Break ANY one condition → no deadlock.
Solutions:
1. Lock ordering: always acquire locks in same global order
Thread 1 and 2 both: lock1 first, then lock2
→ no circular wait
2. trylock with backoff:
if (pthread_mutex_trylock(&lock2) != 0) { // nonzero = could not acquire
pthread_mutex_unlock(&lock1); // release what we have
backoff(); // sleep briefly
goto retry; // try again
}
3. Single lock for multiple resources (coarser granularity)
4. Banker's algorithm: check if allocation is safe before granting
(used in databases, resource managers)
Part 5 — Condition Variables
Mutex ensures mutual exclusion. Condition variables let threads wait for a condition to become true.
Real-world analogy: A package notification system. You ordered something (need a condition — package arrives). Instead of repeatedly calling the delivery company (busy-waiting), you register your number and they call you when it arrives (condition variable). You can do other things in the meantime.
pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
int data_ready = 0;
int data = 0;
// Producer thread: produces data
void* producer(void* arg) {
sleep(1); // simulate work
pthread_mutex_lock(&mtx);
data = 42;
data_ready = 1;
pthread_cond_signal(&cond); // wake ONE waiting thread
// pthread_cond_broadcast(&cond); // wake ALL waiting threads
pthread_mutex_unlock(&mtx);
return NULL;
}
// Consumer thread: waits for data
void* consumer(void* arg) {
pthread_mutex_lock(&mtx);
// ALWAYS use while, not if! (spurious wakeups)
while (!data_ready) {
pthread_cond_wait(&cond, &mtx);
// cond_wait ATOMICALLY:
// 1. Releases mtx
// 2. Puts thread to sleep
// 3. On wakeup: reacquires mtx before returning
}
printf("Got data: %d\n", data);
pthread_mutex_unlock(&mtx);
return NULL;
}
Why while not if:
Spurious wakeups: pthread_cond_wait can return even if no signal sent
(OS implementation artifact, POSIX allows it)
Multiple waiters: if broadcast wakes 3 threads but only 1 item available,
the other 2 must go back to sleep
Always pattern:
while (!condition) {
pthread_cond_wait(&cond, &mutex);
}
Producer-Consumer with Bounded Buffer
#define BUFFER_SIZE 10
int buffer[BUFFER_SIZE];
int in = 0, out = 0, count = 0;
pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t not_full = PTHREAD_COND_INITIALIZER;
pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;
void produce(int item) {
pthread_mutex_lock(&mtx);
while (count == BUFFER_SIZE) // buffer full?
pthread_cond_wait(&not_full, &mtx); // wait for space
buffer[in] = item;
in = (in + 1) % BUFFER_SIZE;
count++;
pthread_cond_signal(&not_empty); // wake a consumer
pthread_mutex_unlock(&mtx);
}
int consume() {
pthread_mutex_lock(&mtx);
while (count == 0) // buffer empty?
pthread_cond_wait(&not_empty, &mtx); // wait for item
int item = buffer[out];
out = (out + 1) % BUFFER_SIZE;
count--;
pthread_cond_signal(&not_full); // wake a producer
pthread_mutex_unlock(&mtx);
return item;
}
Part 6 — Semaphores
A semaphore is an integer counter with two atomic operations: - wait (P, down): decrement; if < 0, block - signal (V, up): increment; if threads waiting, wake one
Real-world analogy: Parking lot with a counter. Counter shows spaces available. When you enter, counter decreases. If 0, you wait. When someone leaves, counter increases and waiting car enters.
#include <semaphore.h>
sem_t semaphore;
sem_init(&semaphore, 0, 1); // initial value = 1 (binary semaphore, used like a mutex)
// Thread safe increment using semaphore as mutex:
sem_wait(&semaphore); // P: decrement, block if 0
counter++; // critical section
sem_post(&semaphore); // V: increment, wake waiter
Counting semaphore — limit concurrent access:
sem_t db_connections;
sem_init(&db_connections, 0, 5); // max 5 concurrent connections
void queryDatabase() {
sem_wait(&db_connections); // acquire connection slot
// ... use database ...
sem_post(&db_connections); // release slot
}
// At most 5 threads can be in queryDatabase() simultaneously
Semaphore vs Mutex:
Mutex:
Binary (locked/unlocked)
Owner-based: only locking thread can unlock
Used for: mutual exclusion
Semaphore:
Integer counter
No owner: any thread can signal
Used for: resource counting, signaling between threads
Signaling pattern (one-shot):
sem_t ready;
sem_init(&ready, 0, 0); // initially 0 (locked)
// Thread 1: waits for Thread 2 to signal
sem_wait(&ready);
use_data();
// Thread 2: signals when data is ready
prepare_data();
sem_post(&ready); // Thread 1 can now proceed
Part 7 — Reader-Writer Problem
Multiple readers allowed simultaneously, but writers need exclusive access.
// Reader-Writer lock using pthreads
int shared_data = 0;
pthread_rwlock_t rwlock = PTHREAD_RWLOCK_INITIALIZER;
void* reader(void* arg) {
pthread_rwlock_rdlock(&rwlock); // multiple readers allowed
// read shared data...
printf("Reading: %d\n", shared_data);
pthread_rwlock_unlock(&rwlock);
return NULL;
}
void* writer(void* arg) {
pthread_rwlock_wrlock(&rwlock); // exclusive access
// modify shared data...
shared_data++;
pthread_rwlock_unlock(&rwlock);
return NULL;
}
Implementation (manual):
// Variables:
int readers = 0; // count of active readers
pthread_mutex_t rmutex; // protect readers count
pthread_mutex_t write_lock; // writers and readers-writing conflict
// Reader entry:
pthread_mutex_lock(&rmutex);
readers++;
if (readers == 1)
pthread_mutex_lock(&write_lock); // first reader blocks writers
pthread_mutex_unlock(&rmutex);
// ... read ...
// Reader exit:
pthread_mutex_lock(&rmutex);
readers--;
if (readers == 0)
pthread_mutex_unlock(&write_lock); // last reader unblocks writers
pthread_mutex_unlock(&rmutex);
// Writer: just acquires write_lock
pthread_mutex_lock(&write_lock);
// ... write ...
pthread_mutex_unlock(&write_lock);
Note: this is the readers-preference solution; a continuous stream of readers can starve writers indefinitely.
Part 8 — C++ Threads (C++11 and later)
Modern C++ wraps native threads (pthreads on Linux) in a cleaner, RAII-based interface.
#include <thread>
#include <mutex>
#include <condition_variable>
// Basic thread
std::thread t([]() {
std::cout << "Running in thread\n";
});
t.join();
// Mutex
std::mutex mtx;
int counter = 0;
auto increment = [&]() {
for (int i = 0; i < 100000; i++) {
std::lock_guard<std::mutex> lock(mtx); // RAII: auto-unlock
counter++;
}
};
std::thread t1(increment), t2(increment);
t1.join(); t2.join();
// counter == 200000 ✓
// Unique lock (can unlock manually)
std::unique_lock<std::mutex> lock(mtx);
// ... do something ...
lock.unlock();
// ... do something without lock ...
lock.lock();
// ...
// Condition variable
std::condition_variable cv;
bool ready = false;
// Waiter:
std::unique_lock<std::mutex> lock(mtx);
cv.wait(lock, [&]{ return ready; }); // wait until ready == true
// Notifier:
{
std::lock_guard<std::mutex> lock(mtx);
ready = true;
}
cv.notify_one();
// Atomic variables (no mutex needed for simple ops)
#include <atomic>
std::atomic<int> atomic_counter(0);
atomic_counter++; // atomic increment, no race condition
atomic_counter.fetch_add(1); // same
Practice Problems
1. Two threads increment a shared counter 100,000 times each. Without synchronization, what range of values can the final counter take?
2. Identify the bug:
pthread_mutex_t m;
void* thread1(void* arg) { pthread_mutex_lock(&m); /* work */ return NULL; }
void* thread2(void* arg) { pthread_mutex_lock(&m); /* work */ pthread_mutex_unlock(&m); return NULL; }
3. When should you use a condition variable instead of a loop that keeps checking a flag?
4. Show a minimal deadlock scenario and fix it.
5. What is a spurious wakeup? Why must condition variable waits use while instead of if?
Answers
Problem 1:
Maximum: 200,000
No lost updates happen to occur (unlikely but possible)
Naive minimum: 100,001
Nearly all of one thread's increments are overwritten by stale stores,
and a single surviving increment lands on top of the other thread's work
True minimum: 2, via a pathological interleaving:
Thread 1 loads 0, then is preempted
Thread 2 completes 99,999 full increments (counter = 99,999),
then loads counter for its final increment just after thread 1
stores 1, wiping that work
Thread 1 runs its remaining 99,999 increments (counter = 100,000)
Thread 2 stores 2, wiping everything again
In practice: somewhere between these extremes
The exact value depends on thread interleaving = non-deterministic
Problem 2:
Thread 1 acquires the mutex but NEVER releases it (no unlock call).
Thread 2 will wait forever trying to acquire the mutex.
This is a permanent hang caused by a leaked lock (not a circular deadlock).
Fix: always pair lock with unlock, use RAII in C++:
void* thread1(void*) {
pthread_mutex_lock(&m);
/* work */
pthread_mutex_unlock(&m); // ADD THIS
return NULL;
}
Problem 3:
Use condition variable when:
- Condition might not become true for a long time
- Busy-waiting (spinning) would waste CPU
Example: consumer waiting for items in a queue.
Bad (busy-wait):
while (queue.empty()) {} // spins, burns 100% CPU
Good (condition variable):
while (queue.empty()) cond_wait(&cond, &mutex); // sleeps
Use busy-wait only when:
- Expected wait is very short (< 1µs)
- Running on dedicated core (real-time systems)
- In kernel spinlocks (can't sleep in interrupt context)
Problem 4:
// Deadlock: two locks acquired in different order
mutex A, B;
Thread 1: lock(A), lock(B) // A first
Thread 2: lock(B), lock(A) // B first → circular wait!
// Fix: establish global lock ordering
Thread 1: lock(A), lock(B)
Thread 2: lock(A), lock(B) // same order → no circular wait
// Or: use trylock with backoff
Thread 2:
lock(A);
if (!trylock(B)) {
unlock(A); // release, try again later
sleep(random_backoff);
goto retry;
}
Problem 5:
Spurious wakeup: condition variable wakes up a waiting thread
even though no signal/broadcast was sent.
Allowed by POSIX specification (implementation artifact).
Can happen due to:
- Signal delivery to the process
- OS implementation choices
- Multi-processor synchronization overhead
Why while instead of if:
if (!ready) { cond_wait(...); }
// After wakeup: goes straight to use_data()
// But might have been spurious wakeup! ready is still false!
while (!ready) { cond_wait(...); }
// After spurious wakeup: checks condition again
// Not ready → goes back to sleep
// Only proceeds when ready is actually true ✓
References
- Arpaci-Dusseau — OSTEP — Chapters 26-32 (Concurrency)
- Butenhof — Programming with POSIX Threads
- Herlihy & Shavit — The Art of Multiprocessor Programming
- cppreference.com — Thread support library
- man pages:
man 3 pthread_create, man 3 pthread_mutex_lock