CPU Scheduling
CPU Scheduling
The Art of Deciding Who Runs Next
When multiple processes are ready to run and there’s only one CPU, the OS must decide: who runs next? For how long? This is CPU scheduling — one of the most studied problems in OS design.
Real-world analogy: An emergency room with multiple patients waiting and limited doctors. The triage nurse (scheduler) decides who gets treated first: - Life-threatening cases first (priority scheduling) - First come, first served for similar cases (FIFO) - Short procedures done quickly to keep the queue moving (SJF) - No patient waits forever no matter how minor (aging/fairness)
The optimal strategy depends on the goals: minimize wait time? maximize throughput? ensure fairness? These goals often conflict.
Part 1 — Scheduling Concepts
What the Scheduler Controls
Ready Queue: list of processes waiting to use CPU
FIFO order: [P1] [P2] [P3] [P4] [P5]
head tail
Scheduler picks from ready queue based on:
- Priority
- Time waited
- Estimated remaining execution time
- Or complex combinations of all of the above
Then dispatcher:
1. Context switch to selected process
2. Switch to user mode
3. Jump to correct location in user program
Preemptive vs Non-Preemptive
Non-preemptive (cooperative):
Once a process gets the CPU, it runs until:
- It voluntarily gives up CPU (yield, sleep, I/O wait)
- It terminates
Pros: simple, no race conditions from scheduler
Cons: one bad process can freeze the system
Examples: early Mac OS, Windows 3.1, some embedded RTOS
Preemptive:
OS can forcibly remove CPU from a process at any time
Typically via timer interrupt
Pros: fair, responsive, handles infinite loops
Cons: need synchronization (process can be interrupted anytime)
Examples: Linux, Windows NT+, macOS, all modern systems
Scheduling Metrics
Turnaround Time:
Time from job submission to job completion
Turnaround = Completion Time - Arrival Time
Goal: MINIMIZE
Response Time:
Time from job submission to first response
Response = First Run Time - Arrival Time
Critical for interactive systems (keyboard feels snappy)
Throughput:
Jobs completed per unit time
Goal: MAXIMIZE
CPU Utilization:
Fraction of time CPU is doing useful work (not idle)
Goal: MAXIMIZE (typically 40-90% in practice)
Fairness:
Each process gets its fair share of CPU
Goal: no starvation (process never gets CPU)
Waiting Time:
Total time spent in ready queue
Goal: MINIMIZE
Part 2 — Scheduling Algorithms
FIFO (First In, First Out) / FCFS
The first process to arrive gets the CPU first.
Example: 3 processes arrive at t=0
Process Burst Time
P1 24
P2 3
P3 3
Timeline: [P1 (24)][P2 (3)][P3 (3)]
0 24 27 30
Waiting times: P1=0, P2=24, P3=27
Average waiting: (0+24+27)/3 = 17
Convoy effect: short jobs stuck behind long job
Change order:
If P2, P3 arrive at t=0 but P1 at t=1 (SJF solves this):
[P2(3)][P3(3)][P1(24)]
0 3 6 30
Waiting times: P2=0, P3=3, P1=6
Average waiting: (0+3+6)/3 = 3 (5.7× better!)
SJF (Shortest Job First)
Run the process with shortest burst time next.
Provably optimal for minimizing average turnaround time.
(When all jobs arrive simultaneously)
Process Arrival Burst
P1 0 8
P2 0 4
P3 0 9
P4 0 5
SJF order: P2(4), P4(5), P1(8), P3(9)
Timeline: [P2][P4 ][P1 ][P3 ]
0 4 9 17 26
Waiting: P2=0, P4=4, P1=9, P3=17
Average: (0+4+9+17)/4 = 7.5
FIFO order (P1,P2,P3,P4):
Waiting: P1=0, P2=8, P3=12, P4=21
Average: (0+8+12+21)/4 = 10.25
SJF wins!
Problem: How do you know the burst time in advance?
Usually you DON'T. Prediction using history:
estimated_burst(n+1) = α × actual_burst(n) + (1-α) × estimated_burst(n)
Exponential average (α = 0.5 typical)
SRTF (Shortest Remaining Time First) — Preemptive SJF
When a new job arrives, if its burst is shorter than remaining time of current job, preempt.
Process Arrival Burst
P1 0 8
P2 1 4
P3 2 9
P4 3 5
t=0: P1 starts (only one)
t=1: P2 arrives with burst=4 < P1 remaining=7 → preempt!
t=5: P2 done. P4 arrives at t=3 with burst=5 = P1 remaining=7
P4 is shorter → P4 runs
t=10: P4 done. P1 remaining=7, P3 burst=9 → P1 runs
t=17: P1 done. P3 runs
t=26: P3 done.
Timeline:
[P1][P2 ][P4 ][P1 ][P3 ]
0 1 5 10 17 26
Average turnaround:
P1: 17-0=17, P2: 5-1=4, P3: 26-2=24, P4: 10-3=7
Average: (17+4+24+7)/4 = 13
Better than SJF for mixed arrivals!
Round Robin (RR)
Each process gets a small fixed time slice (quantum). After quantum expires, process is preempted and goes to back of queue.
Real-world analogy: A round table discussion where each person gets exactly 2 minutes to speak, then the turn moves to the next person. Everyone gets heard, no one monopolizes.
Quantum = 4
Process Burst
P1 24
P2 3
P3 3
Timeline (RR with q=4):
[P1(4)][P2(3)][P3(3)][P1(4)][P1(4)][P1(4)][P1(4)][P1(4)]
0 4 7 10 14 18 22 26 30
Waiting times:
P1: 0 + (10-4) + (14-8) + ... = complicated
P2: 4 - 0 = 4 (wait from t=0 to first run at t=4)
P3: 7 - 0 = 7
Average response time for RR much better than FCFS!
(P2 responds at t=4, not t=24)
Choosing quantum size:
Too small (q=1):
Lots of context switches → overhead dominates
If context switch costs 1ms and q=1ms → 50% CPU wasted on overhead!
Too large (q=∞):
Degenerates to FIFO
Long response time for short jobs
Sweet spot: 10-100ms in practice
10-20ms for interactive workloads
Larger for batch systems
Rule of thumb: quantum should be > 80% of actual CPU bursts
Priority Scheduling
Each process has a priority. Highest priority runs first.
Process Priority Burst
P1 3 10
P2 1 1 ← highest priority (1 = highest)
P3 4 2
P4 5 1
P5 2 5
Execution order: P2, P5, P1, P3, P4
Common priority assignments:
Static: set at creation, never changes
Dynamic: changes based on behavior, time waited
Starvation problem:
Low priority process might never run if high priority keep arriving
Solution: Aging
Gradually increase priority of waiting processes
After waiting T seconds: priority += 1
Eventually even low priority process runs
Multilevel Queue Scheduling
Different queues for different types of processes. Each queue has its own scheduling algorithm.
Priority 1 (highest): ┌───────────────────┐
System processes │ Real-time FIFO │
└───────────────────┘
Priority 2: ┌───────────────────┐
Interactive processes│ Round Robin q=8ms │
└───────────────────┘
Priority 3: ┌───────────────────┐
Batch processes │ Round Robin q=16ms│
└───────────────────┘
Priority 4 (lowest): ┌───────────────────┐
Background/idle │ FCFS │
└───────────────────┘
Higher queues have absolute priority over lower.
Process in queue 3 only runs if queues 1 and 2 empty.
MLFQ (Multilevel Feedback Queue)
The best general-purpose scheduler. Processes can move between queues based on behavior.
Real-world analogy: Airline check-in. You start in the regular queue. If you’re frequently traveling, you earn status and move to the priority queue. If a “first class” passenger is causing delays, they can be deprioritized. The system learns and adapts.
Rules:
1. If Priority(A) > Priority(B): A runs
2. If Priority(A) == Priority(B): A and B run in RR
3. New job: enters highest priority queue (Q0)
4. If job uses full quantum: demote to next queue
5. If job gives up CPU before quantum: stay in same queue
6. After time period S: boost ALL jobs to highest queue (prevent starvation)
Queue Quantum Interpretation
Q0 8ms Interactive (likely short bursts)
Q1 16ms Medium jobs
Q2 FIFO Long batch jobs
How a job's behavior determines placement:
Short interactive (< 8ms burst): stays in Q0 forever
Medium job (8-16ms bursts): demotes to Q1
Long CPU-bound job: demotes to Q2
Why this works:
Short jobs (interactive) naturally stay at high priority
Long jobs naturally drift to lower queues
No need to know burst time in advance!
Gaming: some jobs artificially yield just before quantum expires
Solution: rule 4 tracks total CPU time at each level, not just one quantum
Part 3 — Linux CFS (Completely Fair Scheduler)
The actual scheduler used in Linux since 2007.
Goal: every process gets equal CPU time
Core concept: virtual runtime (vruntime)
vruntime = actual CPU time spent × (1/weight)
Higher priority → smaller weight → vruntime increases more slowly
→ high priority process always has smallest vruntime
→ gets selected first
Data structure: Red-black tree ordered by vruntime
Leftmost node = process with smallest vruntime = runs next
O(log n) operations for insert/delete
How it works:
All processes in RB-tree sorted by vruntime
Scheduler always picks leftmost (smallest vruntime)
Runs it for a time slice
Updates vruntime, reinserts in tree
Process with least CPU time always runs next
→ perfectly fair over time
Time slice calculation:
target_latency = 20ms (try to run each process once in 20ms)
min_granularity = 4ms (minimum time slice, prevents too many switches)
if n processes: time_slice = max(target/n, min_granularity)
With 4 processes: 20/4 = 5ms each
With 100 processes: 20/100 = 0.2ms → clamped to 4ms each
Nice values (priority in Linux):
nice value: -20 (highest priority) to +19 (lowest)
Default: 0
Higher nice = nicer to others = lower priority
Lower nice = less nice = higher priority
Shell command:
nice -n 10 ./long_job # lower priority
nice -n -5 ./important # higher priority (need root for negative)
renice 5 -p 1234 # change running process's nice value
Part 4 — Real-Time Scheduling
For systems with timing deadlines: robotics, audio processing, flight control.
Real-time task properties:
Period (T): task repeats every T milliseconds
Execution time (C): CPU time needed per period
Deadline (D): must complete within D ms of release
Utilization: U = C/T
Sum of all utilizations must be ≤ 1.0 (100% CPU)
Rate Monotonic (RM):
Static priority: shorter period → higher priority
Provably optimal for static priority assignment
If sum of C/T ≤ 0.693 (ln 2): schedulable guaranteed
Example:
Task 1: C=1, T=4 (runs every 4ms, takes 1ms)
Task 2: C=2, T=6 (runs every 6ms, takes 2ms)
U = 1/4 + 2/6 = 0.25 + 0.33 = 0.58 < 0.693 → schedulable!
Earliest Deadline First (EDF):
Dynamic priority: closest deadline → highest priority
Optimal: can schedule any feasible task set
If sum of C/T ≤ 1.0: schedulable guaranteed
Practice Problems
Given processes: P1 (burst=10, arrival=0), P2 (burst=1, arrival=1), P3 (burst=2, arrival=2). Calculate average turnaround time for FCFS, SJF (non-preemptive), and RR (q=4).
Why does RR have better response time but worse turnaround time than FCFS?
A MLFQ has queues Q0 (q=8ms) and Q1 (q=16ms). A process has bursts of: 5ms, 9ms, 3ms, 20ms. Trace its path through the queues.
What is the starvation problem with priority scheduling? How does aging solve it?
Answers
Problem 1:
FCFS (arrival order: P1, P2, P3):
Timeline: [P1: 0-10][P2: 10-11][P3: 11-13]
Turnaround: P1=10-0=10, P2=11-1=10, P3=13-2=11
Average: (10+10+11)/3 = 10.33
SJF non-preemptive:
At t=0: only P1 ready → run P1 (t=0-10)
At t=10: P2(1), P3(2) both ready → run P2 (shorter)
Timeline: [P1: 0-10][P2: 10-11][P3: 11-13]
Same as FCFS in this case (P1 can't be preempted)
Average: 10.33
SRTF (preemptive SJF):
t=0: P1 runs
t=1: P2 arrives (burst=1) < P1 remaining(9) → preempt!
t=2: P2 done. P3 arrives (burst=2) < P1 remaining(9) → P3 runs
t=4: P3 done. P1 remaining=9 → P1 runs
t=13: P1 done.
Timeline: [P1:0-1][P2:1-2][P3:2-4][P1:4-13]
Turnaround: P1=13-0=13, P2=2-1=1, P3=4-2=2
Average: (13+1+2)/3 = 5.33 (much better!)
RR q=4:
Timeline: [P1:0-4][P2:4-5][P3:5-7][P1:7-11][P1:11-13*]
Wait, P2 only needs 1ms even though quantum is 4:
[P1:0-4][P2:4-5][P3:5-7][P1:7-11][P1:11-13]
Actually P2 arrives at t=1, P3 at t=2, so order at t=4:
ready queue = [P2, P3, P1_remaining]
[P1:0-4][P2:4-5][P3:5-7][P1:7-11][P1:11-13]
Turnaround: P1=13, P2=5-1=4, P3=7-2=5
Average: (13+4+5)/3 = 7.33
Problem 2:
RR response time is better because:
Every process runs within q × n_processes time
With q=4, 5 processes: max wait = 20ms before first run
FCFS: last process waits for all others to complete
RR turnaround is worse because:
Short jobs that would finish quickly in FCFS
are interrupted and must wait for round-robin turns
Example: P1=24ms, P2=3ms, P3=3ms
FCFS turnaround: 24, 27, 30 (P2 done quickly at 27)
RR q=4: P2 first runs at t=4, finishes at t=7 (much later!)
Tradeoff: RR optimizes response time, hurts turnaround
Interactive systems prefer response time (feels faster)
Batch systems prefer turnaround (total throughput)
Problem 3:
MLFQ trace:
Burst 1: 5ms
Enters Q0, runs 5ms, doesn't use full 8ms quantum
→ stays in Q0
Burst 2: 9ms
Runs 8ms in Q0, still needs 1ms
Used full quantum → demoted to Q1
Runs remaining 1ms in Q1, gives up CPU
Burst 3: 3ms
If stayed in Q1: runs 3ms (less than 16ms quantum) → stays Q1
With "priority boost" (after time period S): back to Q0
Assuming no boost: runs in Q1, finishes in 3ms
Burst 4: 20ms
In Q1, runs 16ms (full quantum) → demoted to Q2 (FIFO)
Runs remaining 4ms in Q2
Without boost: process drifts to lowest queue
With boost: periodically returns to Q0 (prevents starvation)
Problem 4:
Starvation:
Low priority process P (priority=10) is ready
High priority processes (priority=1,2,3) keep arriving
P never gets CPU — "starved"
Real scenario: background backup job never runs during busy day
Process waits hours (even days) for CPU — unacceptable
Aging solution:
Every T seconds a process waits without running: priority += 1
Initially: P has priority 10 (low)
After waiting 5 minutes: priority = 9
After 10 minutes: priority = 8
After 50 minutes: priority = 0 (highest!)
Eventually, P's priority rises above even new high-priority arrivals
Guarantees every process eventually runs
After running: priority reset to original value
References
- Arpaci-Dusseau — OSTEP — Chapters 7-10 (Scheduling)
- Silberschatz — OS Concepts — Chapter 5
- Linux CFS — kernel.org: CFS scheduler
- Real-time scheduling — Liu & Layland (1973) paper on RM scheduling