

Naïvely locking on this and fire-hosing work onto the thread pool can cripple an application under load. Using lock(this) anywhere in your code is dangerous: any other caller can lock the same publicly visible object, causing hidden contention or deadlocks. Best practice is to lock on a dedicated private object, not this. Likewise, spinning up an unbounded Task.Run per request will eventually saturate the thread pool. When all ThreadPool threads are busy, new work items queue up and wait, a situation known as ThreadPool starvation; in practice it shows up as huge tail latencies and collapsing throughput under heavy load. If you launch a thread-pool work item per request and the workload ramps up faster than the pool can grow, starvation is almost inevitable. Even though .NET has added smarter thread-pool heuristics to soften the worst cases, these patterns still manifest as unpredictable latency spikes and context-switch storms on any modern runtime when back-pressure is missing.
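To make the contrast concrete, here is a minimal sketch (the request type, handler delegate, and concurrency limit of 8 are all hypothetical) that gates per-request work with a SemaphoreSlim instead of firing an unbounded Task.Run for every incoming item:
// Sketch: bounding per-request Task.Run work with a SemaphoreSlim (assumed limit of 8)
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledDispatcher {
    // Allow at most 8 requests to run on the pool at once (hypothetical limit)
    private static readonly SemaphoreSlim _gate = new SemaphoreSlim(8, 8);

    // Anti-pattern: one unthrottled work item per request
    public static void HandleUnbounded(IEnumerable<string> requests, Func<string, Task> handler) {
        foreach (var request in requests)
            _ = Task.Run(() => handler(request)); // floods the ThreadPool under load
    }

    // Bounded version: callers await the gate before scheduling work
    public static async Task HandleBoundedAsync(IEnumerable<string> requests, Func<string, Task> handler) {
        var tasks = new List<Task>();
        foreach (var request in requests) {
            await _gate.WaitAsync();              // back-pressure: wait for a free slot
            tasks.Add(Task.Run(async () => {
                try { await handler(request); }
                finally { _gate.Release(); }      // free the slot when the request completes
            }));
        }
        await Task.WhenAll(tasks);
    }
}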
Let's consider a shared counter incremented concurrently by many threads using lock(this). At any moment only one thread can enter the critical section; all others block. Under high contention, almost every thread spends most of its time waiting. The timeline below sketches four threads contending for one lock:
Time | 0 1 2 3 4 5
-----+---------------------------------------
T1 | Run -> Lock -> CSec -> Exit -> Idle
T2 | Run -> Req -> Wait -> CSec -> Exit
T3 | Run -> Req -> Wait -> Wait -> Wait
T4 | Run -> Req -> Wait -> Wait -> Wait
Here “Run” means the thread is running outside the lock, “Lock”/“Req” means requesting the lock, “CSec” is in the critical section, and “Wait” is blocked waiting. You can see only one thread (T1) enters the critical section at a time, while T2–T4 are stalled. Threads repeatedly context-switch in and out as they jockey for the lock. In real workloads this leads to thread starvation (most threads idle waiting) and expensive context switches.
Now swap in a SemaphoreSlim(1,1) or a bounded Channel<T>. With SemaphoreSlim, awaiting WaitAsync() lets the caller yield instead of blocking a thread. Callers that don't immediately get the permit free their thread back to the pool and resume in turn when the permit becomes available. With a bounded Channel<T>, producers calling WriteAsync simply await when the channel is full, naturally throttling the rate of work. In both cases the scheduling changes: only the active tasks consume CPU, and waiting tasks do not tie up threads. In effect, a bounded Channel<T> provides built-in back-pressure: when the writer produces faster than the reader can consume, the writer awaits until space frees up. In short, replacing a raw lock with SemaphoreSlim (async waits) or a bounded channel stops threads from piling up behind a contended resource, avoiding the backlog and long queues seen above.
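As a rough sketch (the item type, capacity of 100, and per-item delay are made up for illustration), a bounded channel wires this up in a few lines: the producer awaits WriteAsync when the buffer is full, and the consumer drains items at its own pace:
// Sketch: producer-consumer with a bounded Channel<int> (assumed capacity of 100)
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

public static class BoundedPipeline {
    public static async Task RunAsync() {
        // Capacity 100: writers await once 100 items are queued (back-pressure)
        var channel = Channel.CreateBounded<int>(new BoundedChannelOptions(100) {
            FullMode = BoundedChannelFullMode.Wait
        });

        var producer = Task.Run(async () => {
            for (int i = 0; i < 1_000; i++) {
                await channel.Writer.WriteAsync(i); // awaits while the channel is full
            }
            channel.Writer.Complete();
        });

        var consumer = Task.Run(async () => {
            await foreach (var item in channel.Reader.ReadAllAsync()) {
                await Task.Delay(1); // simulate slower consumption
            }
        });

        await Task.WhenAll(producer, consumer);
    }
}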
Monitor/lock – Use for very short, CPU-bound critical sections. Ideal when you only need mutual exclusion on simple in-process data and the locked work is minimal. Always lock on a private object (never on this or a Type) to prevent external interference.
SemaphoreSlim – Use to throttle concurrency or guard a limited resource pool. The lightweight semaphore lets N callers through (e.g. a max count) and makes excess callers wait. It supports async waits (WaitAsync) and cancellation, so it's perfect for bounding parallel asynchronous work.
Channel<T> – Use a channel for producer–consumer pipelines with back-pressure. A bounded channel (via Channel.CreateBounded) makes writers asynchronously wait on WriteAsync when it's full. This enforces an upper limit on queued work and automatically slows producers to match consumer speed.
Concurrent Collections / Partitioner – Use ConcurrentQueue<T> and ConcurrentDictionary<TKey,TValue> when you need thread-safe data structures without hand-rolled locking. For example, a ConcurrentQueue is ideal for FIFO scenarios with multiple producers/consumers, and a ConcurrentDictionary fits high-concurrency key-value updates. Use Partitioner<T> (often via Partitioner.Create in Parallel.ForEach) when you have many small work items: partitioners let you chunk or balance work manually, which can greatly improve throughput in data-parallel loops, as sketched after this list.
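As a quick illustration of the partitioner point (the array size and the work inside the loop are arbitrary), chunking a large range with Partitioner.Create avoids paying per-element delegate overhead in Parallel.ForEach:
// Sketch: range partitioning for cheap per-item work (arbitrary workload)
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class PartitionerDemo {
    public static void Run() {
        var data = new double[1_000_000];

        // Naive alternative: Parallel.For(0, data.Length, i => data[i] = Math.Sqrt(i));
        // invokes one delegate per element, which is mostly scheduling overhead here.

        // Chunked: each worker grabs a contiguous range and loops over it locally
        var ranges = Partitioner.Create(0, data.Length);
        Parallel.ForEach(ranges, range => {
            for (int i = range.Item1; i < range.Item2; i++) {
                data[i] = Math.Sqrt(i); // cheap per-item work benefits most from chunking
            }
        });

        Console.WriteLine(data[data.Length - 1]);
    }
}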
// Problematic: a counter protected by lock(this) under heavy load
using System;
using System.Threading;
using System.Threading.Tasks;

public class Counter {
    private int _count;

    public void Increment() {
        lock (this) {            // locks a publicly visible object: any caller can contend on it
            Thread.Sleep(1);     // simulate some work held inside the lock
            _count++;
        }
    }

    public int Value => _count;
}

public class Program {
    public static void Main() {
        var counter = new Counter();
        // Spawn many parallel increments; they all serialize on the same lock
        Parallel.For(0, 10000, i => {
            counter.Increment();
        });
        Console.WriteLine(counter.Value);
    }
}
// Refactored: using SemaphoreSlim with async waits and cancellation
using System;
using System.Threading;
using System.Threading.Tasks;

public class AsyncCounter {
    private int _count;
    private readonly SemaphoreSlim _sem = new SemaphoreSlim(1, 1);

    public async Task IncrementAsync(CancellationToken token) {
        await _sem.WaitAsync(token);    // waiters yield their thread instead of blocking
        try {
            await Task.Delay(1, token); // simulate async work
            _count++;
        }
        finally {
            _sem.Release();
        }
    }

    public int Value => _count;
}

public class Program {
    public static async Task Main() {
        var counter = new AsyncCounter();
        var cts = new CancellationTokenSource();
        var tasks = new Task[10000];
        for (int i = 0; i < tasks.Length; i++) {
            tasks[i] = counter.IncrementAsync(cts.Token);
        }
        await Task.WhenAll(tasks);
        Console.WriteLine(counter.Value);
    }
}
Use SemaphoreSlim to throttle async work. When you need to limit concurrent operations (e.g. to respect API rate limits), SemaphoreSlim with WaitAsync is ideal: it produces back-pressure instead of unbounded parallelism.
Reserve lock/Monitor for very short, CPU-bound sections. Only use locking when you absolutely must protect a small critical section. Always lock on a dedicated private object, not this. If you catch yourself doing I/O or blocking inside a lock, refactor to an async-friendly primitive.
Use channels for producer–consumer queues with back-pressure. A bounded Channel<T> automatically forces writers to wait when full, keeping tail latency predictable under load.
Prefer thread-safe collections for data sharing. For shared queues or maps, use ConcurrentQueue<T> or ConcurrentDictionary<TKey,TValue>, which handle synchronization internally. For bulk data parallelism, use a Partitioner (or PLINQ) to break work into chunks. These minimize manual locking and spread work evenly; a short example follows below.
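For instance, here is a minimal sketch (the word-counting scenario is invented) of sharing a map across parallel workers with ConcurrentDictionary instead of a lock:
// Sketch: concurrent updates to a shared map without manual locking (invented scenario)
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class WordCountDemo {
    public static void Run() {
        var words = new[] { "alpha", "beta", "alpha", "gamma", "beta", "alpha" };
        var counts = new ConcurrentDictionary<string, int>();

        // Many threads update the same dictionary; AddOrUpdate handles the races internally
        Parallel.ForEach(words, word => {
            counts.AddOrUpdate(word, 1, (_, current) => current + 1);
        });

        foreach (var pair in counts) {
            Console.WriteLine($"{pair.Key}: {pair.Value}");
        }
    }
}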
By following these principles (lock only minimally, throttle concurrency explicitly with something like SemaphoreSlim, and use async-friendly primitives such as channels and concurrent collections) you'll avoid subtle deadlocks and tail-latency spikes. Each primitive has a niche: choose the one that matches your workload, and always think about back-pressure when scaling to heavy load.