The Global Interpreter Lock (GIL) in CPython has long been a pragmatic compromise. It protects memory management internals from concurrent access at the cost of serializing all Python bytecode execution. This single lock simplifies reference counting and garbage collection, but it has made true multi-core CPU utilization within one process impossible.
Even in C extensions, while `Py_BEGIN_ALLOW_THREADS` enables the GIL to be dropped, any interaction with Python objects (reference counts, attribute access, even `isinstance` checks) requires reacquiring it. This fragments performance and forces programmers into a mental model where concurrent control and Python data manipulation are mutually exclusive.
The GIL also has non-obvious side effects beyond serializing bytecode execution.
The result is that GIL-based Python gives the appearance of concurrency while hiding deep systemic serialization. Most high-performance developers route around it—via multiprocessing, C++, or GPU offload—each with its own overhead and disconnect from Python ergonomics.
Free-Threaded Python (FTP), introduced experimentally in CPython 3.13 and maturing in 3.14, removes the GIL and introduces per-object locking semantics to enable real concurrency.
Rather than reintroduce coarse locks, the FTP model shifts toward critical sections, atomic operations, and lock elision strategies. This opens the door for performance models much closer to what C++ and Rust developers expect.
`ft_utils` provides infrastructure for working with FTP in production contexts. It includes:

- `AtomicInt64`
- `AtomicReference`
- `AtomicFlag`
These are not abstractions for abstraction’s sake, but tools that allow scalable development with exact, fine-grained, and easy-to-use control over thread-based parallel execution architectures.
`AtomicInt64` provides lock-free atomic manipulation of integer state:

```python
from ft_utils import AtomicInt64

counter = AtomicInt64(0)

# Multiple threads can safely increment:
counter.fetch_add(1)

# Or using arithmetic operators
counter += 1
```

In the absence of the GIL, this matters: a naive `+= 1` on an `int` is now unsafe without explicit synchronization.
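To make the semantics concrete, here is a minimal sketch of what an atomic counter guarantees, modelled with an ordinary lock rather than the hardware atomics `ft_utils` actually uses. `AtomicInt64Sketch`, its `get` method, and the fetch-and-add return convention are illustrative assumptions for this article, not the library’s API.

```python
import threading


class AtomicInt64Sketch:
    """Illustrative stand-in: models an atomic counter with a lock.
    The real AtomicInt64 uses hardware atomics instead."""

    def __init__(self, value: int = 0) -> None:
        self._lock = threading.Lock()
        self._value = value

    def fetch_add(self, delta: int) -> int:
        # Classic fetch-and-add: returns the value held before the addition.
        with self._lock:
            old = self._value
            self._value = old + delta
            return old

    def get(self) -> int:
        with self._lock:
            return self._value


def worker(counter: AtomicInt64Sketch) -> None:
    for _ in range(10_000):
        counter.fetch_add(1)


counter = AtomicInt64Sketch(0)
threads = [threading.Thread(target=worker, args=(counter,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.get())  # 40000, regardless of thread interleaving
```

The point of the real primitive is that it delivers exactly this guarantee without any lock acquisition on the fast path.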
`AtomicReference` generalizes this to object references. It enables low-level constructs like lock-free queues, hazard pointers, or generational GC barriers, depending on your architecture.
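As a sketch of why an atomic reference enables such constructs, the following lock-based stand-in models compare-and-exchange semantics and uses it for a Treiber-stack-style push. `AtomicReferenceSketch` and its method names are illustrative assumptions, not the actual `ft_utils` API.

```python
import threading


class AtomicReferenceSketch:
    """Illustrative stand-in: models atomic reference semantics with a lock."""

    def __init__(self, obj=None) -> None:
        self._lock = threading.Lock()
        self._obj = obj

    def get(self):
        with self._lock:
            return self._obj

    def compare_exchange(self, expected, new) -> bool:
        # Swap in `new` only if the current reference is still `expected`;
        # this is the primitive that lock-free stacks and queues build on.
        with self._lock:
            if self._obj is expected:
                self._obj = new
                return True
            return False


head = AtomicReferenceSketch(None)


def push(value) -> None:
    # Retry until our compare-and-exchange wins against concurrent pushers.
    while True:
        old = head.get()
        if head.compare_exchange(old, (value, old)):
            return


push("a")
push("b")
print(head.get()[0])  # "b" is now on top of the stack
```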
`AtomicFlag` is a boolean abstraction over `AtomicInt64`.
These atomic ops are implemented in C using platform-native intrinsics, giving predictable, memory-fenced semantics consistent with modern concurrent programming.
Access to many native atomic operations (for example `_Py_atomic_add_uint64_t`) has been added to the CPython API. `ft_utils` provides `ft_compat.h`, which backports these to previous versions of CPython to make cross-version coding easier.
Critical sections in FTP are explicit per-object locks that allow serial access to shared state. In Cython:

```python
with cython.critical_section(myobj):
    ...  # safe access to myobj's state
```
Under the hood, each Python object now carries an optional lock. This unlocks fine-grained synchronization models—reader-writer patterns, lock striping, and even lock-free algorithms with fallback pessimism.
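As one illustration of what this fine-grained model enables, here is a small lock-striping sketch: several locks guard disjoint key partitions so threads touching different stripes never contend. `StripedCounterMap` is a hypothetical structure invented for this article, not part of `ft_utils`.

```python
import threading


class StripedCounterMap:
    """Lock-striping sketch: N locks guard disjoint key partitions, so
    threads operating on different stripes never contend with each other."""

    def __init__(self, stripes: int = 8) -> None:
        self._locks = [threading.Lock() for _ in range(stripes)]
        self._maps = [{} for _ in range(stripes)]

    def _stripe(self, key) -> int:
        # Deterministically map a key to one of the stripes.
        return hash(key) % len(self._locks)

    def incr(self, key) -> int:
        i = self._stripe(key)
        with self._locks[i]:
            m = self._maps[i]
            m[key] = m.get(key, 0) + 1
            return m[key]


m = StripedCounterMap()
print(m.incr("a"), m.incr("a"), m.incr("b"))  # 1 2 1
```

Under the GIL, striping buys nothing because execution is serialized anyway; with free threading, it lets unrelated updates proceed in parallel.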
Unlike `threading.Lock`, critical sections are tied to the object they protect.
Critical sections and mutexes both provide mutual exclusion to protect shared resources from concurrent access. However, they differ in scope, granularity, performance, and implementation strategy, particularly in the context of Free-Threaded Python (FTP).
A mutex (mutual exclusion object) is a free-standing lock; a critical section in FTP is bound to a specific Python object. `with cython.critical_section(obj)` acquires `obj`’s critical section lock:

```python
with cython.critical_section(my_object):
    my_object.value += 1  # safe from race conditions
```
The code below is from the batch executor source in `ft_utils`, where a critical section protects the refilling of the buffer. Note how the critical section protects execution on a per-object basis, compared to a mutex, which would operate on a per-code-block basis.
```c
Py_BEGIN_CRITICAL_SECTION(self);
index = _Py_atomic_load_ssize(&(self->index));
if (index < size) {
  err = 0;
} else {
  err = BatchExecutorObject_fill_buffer(self);
}
Py_END_CRITICAL_SECTION();
```
For comparison, the same protection with a free-standing mutex:

```python
lock = threading.Lock()

with lock:
    my_object.value += 1  # safe, but scope and ownership are not enforced
```
| Feature | Mutex | Critical Section (FTP) |
|---|---|---|
| Scope | Arbitrary | Tied to Python objects |
| Performance | OS/kernel-level (slower) | Fast user-space, object-specific |
| Python Awareness | No | Yes |
| Deadlock Risk | Higher | Lower (if per-object) |
| Use Case | Manual general locking | Fine-grained Python object protection |
| Default in FTP | No | Yes |
The Global Interpreter Lock (GIL) is a mechanism used in CPython, the standard implementation of the Python programming language, to synchronize access to Python objects, preventing multiple native threads from executing Python bytecodes at once. This lock is necessary primarily because CPython’s memory management is not thread-safe.
Thread safety refers to the ability of a program or a piece of code to behave correctly when accessed by multiple threads. Achieving thread safety is crucial in multithreaded environments where threads share the same memory space and resources. The challenges in ensuring thread safety include preventing race conditions, deadlocks, and other concurrency-related issues.
The GIL impacts the execution of threads in Python by allowing only one thread to execute Python bytecodes at a time. This means that for CPU-bound threads (those that spend most of their time performing computations), the GIL can significantly limit the benefits of multithreading because it effectively serializes the execution of these threads. However, for I/O-bound threads (those that spend most of their time waiting on I/O operations like reading from a file or network), the GIL is released during the I/O operation, allowing other threads to run.
Despite its role in simplifying certain aspects of Python’s threading implementation, the GIL does not make Python code thread-safe. The GIL is released during certain operations like I/O, and even when it is held, operations that appear atomic can still be interrupted. For example, incrementing a counter (`x += 1`) is not atomic; it involves reading the current value, incrementing it, and writing it back. Properties may cause additional code to run during these steps, and all of this can change as the code evolves. If multiple threads are doing this concurrently, the GIL might be released between these steps, or the thread might be interrupted, leading to a race condition.
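One way to see those multiple steps is to disassemble an increment. The exact opcodes vary across CPython versions, but there is always more than one instruction, and a thread switch can occur between any pair of them.

```python
import dis


def bump(counter):
    counter.counted += 1


# The single `+= 1` statement expands into several bytecode
# instructions: loads, an add, and a store back to the attribute.
ops = [ins.opname for ins in dis.get_instructions(bump)]
print(ops)
assert len(ops) > 3  # never a single atomic instruction
```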
Here’s an example that demonstrates how the GIL does not prevent race conditions:
```python
import threading


class Counter:
    """Contains a reference to a counted value"""

    def __init__(self) -> None:
        self._counted = 0

    @property
    def counted(self) -> int:
        return self._counted

    @counted.setter
    def counted(self, value: int) -> None:
        self._counted = value


def increment_counter(counter: Counter, num_times: int, barrier: threading.Barrier) -> None:
    """Increments the counter 'num_times' times after waiting on the barrier."""
    barrier.wait()
    for _ in range(num_times):
        counter.counted += 1


def main() -> None:
    """Runs multiple threads to increment a counter and checks for correctness."""
    num_threads = 10
    num_increments = 50000
    iterations = 0
    while True:
        counter = Counter()
        barrier = threading.Barrier(num_threads)
        threads = []
        for _ in range(num_threads):
            thread = threading.Thread(target=increment_counter, args=(counter, num_increments, barrier))
            threads.append(thread)
            thread.start()
        for thread in threads:
            thread.join()
        expected = num_threads * num_increments
        print(f"{iterations}-> Expected: {expected} Actual: {counter.counted}", flush=True)
        if counter.counted != expected:
            return
        iterations += 1


if __name__ == "__main__":
    main()
```
Running this code, you’ll likely find that the actual count is less than the expected count due to the race condition in incrementing the counter. Different values for `num_increments` may or may not trigger this behaviour. Similarly, running the code on different machines may affect results. So code might appear to be thread safe with the GIL, but in reality there is no guarantee; code that works today might suddenly break tomorrow.
To achieve thread safety in Python, developers must use synchronization primitives like locks (`threading.Lock`), queues (`queue.Queue`), or other concurrency control mechanisms. For example, using a lock to protect the counter increment operation:
```python
import threading


class Counter:
    """Contains a reference to a counted value"""

    def __init__(self) -> None:
        self._counted = 0

    @property
    def counted(self) -> int:
        return self._counted

    @counted.setter
    def counted(self, value: int) -> None:
        self._counted = value


def increment_counter(counter: Counter, num_times: int, barrier: threading.Barrier, lock: threading.Lock) -> None:
    """Increments the counter 'num_times' times after waiting on the barrier."""
    barrier.wait()
    for _ in range(num_times):
        # Putting the lock around the entire loop is more efficient.
        # Putting it here is a clearer demonstration of the concept.
        with lock:
            counter.counted += 1


def main() -> None:
    """Runs multiple threads to increment a counter and checks for correctness."""
    num_threads = 10
    num_increments = 50000
    iterations = 0
    lock = threading.Lock()
    while True:
        counter = Counter()
        barrier = threading.Barrier(num_threads)
        threads = []
        for _ in range(num_threads):
            thread = threading.Thread(target=increment_counter, args=(counter, num_increments, barrier, lock))
            threads.append(thread)
            thread.start()
        for thread in threads:
            thread.join()
        expected = num_threads * num_increments
        print(f"{iterations}-> Expected: {expected} Actual: {counter.counted}", flush=True)
        if counter.counted != expected:
            return
        iterations += 1


if __name__ == "__main__":
    main()
```
This version of the code ensures that the counter is incremented correctly, even with multiple threads.
So, while the GIL simplifies certain aspects of Python’s threading by preventing multiple threads from executing Python bytecodes simultaneously, it does not make Python code inherently thread-safe. Developers must still use proper synchronisation techniques to protect shared resources and prevent concurrency-related issues.
Logically, none. Any Python code which is actually thread safe under GIL-based Python will still be thread safe under FTPython. However, there may be code which appears to work (because you don’t notice the race conditions) but is not actually thread safe, and which will fail more often with free threading. In this sense one could argue the GIL is a risk: it hides race conditions which can then bite developers when they least expect it.
In CPython prior to Python 3.13, the Global Interpreter Lock (GIL) is the central mechanism ensuring only one thread runs Python bytecode at a time. Thread scheduling is delegated to the OS, but the GIL adds an interpreter-level override: only one thread may execute Python bytecode at any time.
The GIL is released periodically based on:

- an internal switch request (`gil_drop_request`), usually every 5 ms
- blocking calls such as `time.sleep()` and other I/O

The next thread to acquire the GIL is essentially chosen by the OS, not by any Python-level priority.
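The switch interval itself is visible and tunable from Python, though only globally, never per-thread:

```python
import sys

# The GIL switch interval: a thread holding the GIL is asked to
# drop it roughly this often (5 ms by default).
print(sys.getswitchinterval())

# It can be lowered for more responsive switching (at a context-switch
# cost), but there is no way to say *which* thread runs next.
sys.setswitchinterval(0.001)
print(sys.getswitchinterval())
```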
A high-priority thread (e.g., real-time audio or control loop) can be blocked by lower-priority threads that happen to acquire the GIL. Worse, if those lower-priority threads are preempted or starved by the OS scheduler, they may hold the GIL but not make progress, delaying everyone.
This is classical priority inversion: a low-priority thread prevents a high-priority one from proceeding due to locking mechanics.
Python does not allow user-level control over GIL scheduling: there is no thread priority (`threading.Thread` has no priority API), and there are no hooks to influence which thread gets the GIL next, beyond blocking in native code or using `time.sleep()` as a crude yield.
In GIL-locked Python, these constraints cannot be worked around.
A practical illustration: imagine a ‘golden’ thread feeding tensors to the GPU. It’s on the critical path for inference latency. Meanwhile, a logger thread is periodically flushing buffered output.
With the GIL, the logger’s bytecode execution contends directly with the golden thread for the interpreter, adding jitter to the critical path. In FTP, the two threads run truly in parallel and the golden thread proceeds undisturbed.
This is not just an academic benefit. It’s how you make Python viable in systems with mixed criticality.
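A rough way to observe the GIL-side problem on stock Python is to measure how late a periodic "golden" thread wakes up while a CPU-bound thread competes for the interpreter. This is an illustrative measurement only; the periods and workload sizes are arbitrary choices, and actual numbers depend entirely on your machine and Python build.

```python
import threading
import time


def golden(latencies: list, stop: threading.Event) -> None:
    # Latency-sensitive thread: wake every 10 ms and record how
    # late each wake-up actually was.
    while not stop.is_set():
        target = time.monotonic() + 0.01
        time.sleep(0.01)
        latencies.append(time.monotonic() - target)


def busy(stop: threading.Event) -> None:
    # CPU-bound thread holding the GIL in long bytecode bursts.
    while not stop.is_set():
        sum(range(100_000))


stop = threading.Event()
latencies: list = []
workers = [
    threading.Thread(target=golden, args=(latencies, stop)),
    threading.Thread(target=busy, args=(stop,)),
]
for w in workers:
    w.start()
time.sleep(0.5)
stop.set()
for w in workers:
    w.join()
print(f"worst wake-up lateness: {max(latencies) * 1000:.2f} ms")
```

On a free-threaded build the busy loop no longer contends for an interpreter lock, so the lateness should come down to ordinary OS scheduling noise.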
In Free-Threaded Python (FTP):

- thread scheduling is governed by the OS alone, so native facilities (`pthread` priorities, `sched_setaffinity()`, etc.) take effect
- with `ft_utils` or similar, you can implement lock hierarchies or priority inheritance mechanisms yourself

This makes FTP viable for latency-sensitive and mixed-criticality systems.
Python’s `multiprocessing` module sidesteps the GIL via process isolation: each worker runs its own interpreter in its own address space. While good for CPU-bound parallelism, it is unsuitable for fine-grained control, because all shared state must cross a process boundary. If you want to build concurrent in-memory data structures, `multiprocessing` doesn’t help.
Sharing state between processes requires `multiprocessing.Manager`, `Queue`, `Pipe`, or `SharedMemory`. `SharedMemory` (3.8+) allows faster shared access for numpy arrays or raw bytes, but requires manual memory layout and synchronization.

| Tool Type | Granularity | Notes |
|---|---|---|
| Lock, Semaphore | Coarse | OS-based; useful for shared counters or critical sections |
| Queue, Pipe | Coarse | Good for message passing; serialization adds latency |
| Manager.Value/List | Very coarse | Slower, proxied objects using a background server thread |
| shared_memory | Byte-level | Requires manual synchronization, useful for arrays |
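To illustrate the byte-level nature of `shared_memory`, here is a minimal sketch that treats a shared block as a single 64-bit counter. The layout (one little-endian `q` at offset 0) is an arbitrary choice for this example, and any cross-process synchronization would still be entirely up to you.

```python
import struct
from multiprocessing import shared_memory

# Create an 8-byte block; another process could attach to it by name.
shm = shared_memory.SharedMemory(create=True, size=8)
try:
    struct.pack_into("<q", shm.buf, 0, 42)           # manual layout: write
    value = struct.unpack_from("<q", shm.buf, 0)[0]  # manual layout: read
    print(value)  # 42
finally:
    shm.close()
    shm.unlink()
```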
`multiprocessing` offers concurrency tools, but not the fine-grained concurrency you’d find in threading or other utilities. It excels at task-level parallelism across CPUs. It lacks low-latency, lock-free primitives and is not designed for concurrent manipulation of shared Python objects.
If you must have shared state with fine-grained control, you need a different approach.
Native C/C++ extensions can release the GIL using `Py_BEGIN_ALLOW_THREADS` / `Py_END_ALLOW_THREADS`. This allows true parallel execution, but only for code that does not touch Python objects. The moment a native thread needs to interact with Python state (touch reference counts, access attributes, call back into the interpreter), it must re-acquire the GIL. This serializes the execution.
| Aspect | Limitation |
|---|---|
| Granularity | Cannot safely manipulate Python data structures without taking the GIL, so most real-world work is GIL-bound. |
| Interleaving | No interleaving of native + Python logic per-thread without GIL churn. |
| Scalability | Parallel work is only scalable if it’s fully outside Python (e.g., pure math, IO, or C++ workloads). |
| Memory Access | Python’s memory model is not thread-safe without the GIL. You can’t update Python containers from two native threads safely. |
| Design Overhead | You need to segment your application logic into “GIL-free” vs “GIL-held” regions. That’s brittle and complex. |
While native threads in legacy Python allow some parallelism, they are not a general-purpose, fine-grained model for concurrency; they are more of a bolt-on escape hatch for specific use cases (e.g., I/O libraries, compute-heavy extensions).
Cython supports `nogil` blocks, which are often suggested as a parallelism workaround:

```cython
cdef void do_work() nogil:
    # C code only
    pass
```
However, no `dict`, `list`, `set`, or other native Python object manipulation is possible inside them. This makes `nogil` useful for compute kernels, not general Python concurrency.
Most concurrency libraries try to protect you. FTPython with `ft_utils` does something different: it gives you the control you need to design for correctness, rather than depending on global serialization as a crutch. Not only that, it makes key things easy to get right and provides library support for scalability and inter-thread communication. For example, lower-level languages like C will crash if something is not thread safe; FTPython will not crash. It might give an unexpected result, but it keeps on trucking. When you hit issues, `ft_utils` provides more sophisticated synchronisation such as reader/writer locks, atomics, and `ConcurrentDict` to tidy up thread correctness without a big performance hit.
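To make the reader/writer idea concrete without guessing at the exact `ft_utils` API, here is the classic minimal readers-writer lock sketch: many concurrent readers, exclusive writers. It is deliberately simple and allows writer starvation, which real implementations avoid.

```python
import threading


class RWLockSketch:
    """Minimal readers-writer lock: many readers OR one writer.
    Writer starvation is possible; production implementations do better."""

    def __init__(self) -> None:
        self._readers = 0
        self._mutex = threading.Lock()   # guards the reader count
        self._writer = threading.Lock()  # held while readers or a writer are in

    def acquire_read(self) -> None:
        with self._mutex:
            self._readers += 1
            if self._readers == 1:
                self._writer.acquire()   # first reader locks writers out

    def release_read(self) -> None:
        with self._mutex:
            self._readers -= 1
            if self._readers == 0:
                self._writer.release()   # last reader lets writers in

    def acquire_write(self) -> None:
        self._writer.acquire()

    def release_write(self) -> None:
        self._writer.release()


lock = RWLockSketch()
lock.acquire_read()
lock.acquire_read()      # a second concurrent reader is fine
lock.release_read()
lock.release_read()
lock.acquire_write()     # exclusive once all readers are gone
lock.release_write()
```

The payoff under free threading is that read-mostly workloads stop serializing on a single mutex.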
Until now, Python has never been suitable for finely tuned concurrent systems. With FTP and `ft_utils`, that changes. You get the primitives; now you decide how to build with them.
If you’re working on systems where performance, determinism, or mixed-criticality scheduling matter, this is finally a Python that respects your intent.