marks an important point in the evolution of the world’s most popular programming language. While Python has long been acknowledged for its readability and large ecosystem, its execution speed has often been the “elephant in the room.”
With the arrival of 3.14, the CPython core development team has delivered not one, but two of the most anticipated features in recent times.
The end of the GIL
I’ve written about this before. True multi-threaded parallelism is now available in Python if you want it, via the optional free-threaded build. If you want more details on GIL-free Python, I’ll leave a link to my article about it at the end.
The Just-In-Time (JIT) compiler
This experimental feature is now bundled directly in official installers, and it’s what we’ll focus on here. It’s the result of years of architectural preparation done by the Python core team and others, aimed at making Python “faster by default” without breaking the C-extension ecosystem that powers everything from data science to web backends.
In this article, we’ll lift the hood of the new JIT, explore how it differentiates itself from previous optimisation efforts, and walk through some benchmarking methodology to help you decide if it’s time to try out the JIT on your workloads.
What is Python’s New Just-In-Time (JIT) compiler?
To understand the 3.14 JIT, we first need to look at how Python traditionally runs. Standard Python (CPython) is an interpreted implementation. When you run a script, your code is compiled into bytecode, which is a set of instructions that the CPython virtual machine executes.
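To make the bytecode step concrete, here is a minimal sketch using the standard-library dis module. The function add_one is a made-up example, and the exact opcode names you see will vary between Python versions.

```python
import dis


def add_one(x):
    # Hypothetical example function, used only to have something to disassemble.
    return x + 1


# Collect the opcode names CPython generated for this function.
# The exact names differ between Python versions.
opnames = [ins.opname for ins in dis.get_instructions(add_one)]
print(opnames)

# Print a human-readable disassembly of the same bytecode.
dis.dis(add_one)
```

Each printed instruction is one step that the interpreter loop normally has to decode and dispatch, which is the overhead a JIT tries to reduce.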
The JIT changes this flow. Instead of only interpreting bytecode one instruction at a time, the JIT monitors which parts of your code are executed most frequently (the “hot” paths). When a function or loop is deemed “hot,” the JIT translates the bytecode into native machine code (instructions the CPU understands). Then, the next time that code runs, the CPU executes the native code directly, with no interpretation step. This can be a great time-saver, as we’ll see later on.
How the JIT fits into CPython
The Python 3.14 JIT is not a total rewrite. It is designed as an opt-in component that works alongside the existing interpreter. It uses a technique called “copy-and-patch,” in which small machine-code templates are prepared when CPython itself is built, then copied and filled in at runtime. This keeps the JIT lightweight and portable across different CPU architectures without needing a massive, complex compiler backend like LLVM at runtime.
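The idea behind copy-and-patch can be sketched with a toy model. This is not CPython’s implementation: the template names, the one-byte “opcodes,” and the 8-byte hole layout are all invented for illustration. Real templates are machine code generated when CPython is built.

```python
import struct

# Toy "pre-compiled templates": one opcode byte followed by an 8-byte hole.
# These names and byte layouts are hypothetical, chosen only for illustration.
HOLE = b"\x00" * 8
TEMPLATES = {
    "LOAD_CONST": b"\x01" + HOLE,
    "ADD_CONST": b"\x02" + HOLE,
}


def copy_and_patch(ops):
    """Copy each template into one buffer, then patch its hole with the operand."""
    code = bytearray()
    for name, operand in ops:
        start = len(code)
        code += TEMPLATES[name]  # copy the template
        struct.pack_into("<q", code, start + 1, operand)  # patch the hole
    return bytes(code)


stitched = copy_and_patch([("LOAD_CONST", 40), ("ADD_CONST", 2)])
print(stitched.hex())
```

The point of the design is that the expensive work (producing the templates) happens ahead of time, so the runtime step is little more than a memory copy plus a few writes.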
What Changed in Python 3.14?
Python 3.13 had a basic, experimental JIT, but it was disabled by default. If you wanted to test it, you had to clone the CPython source tree and compile it with specific experimental flags such as --enable-experimental-jit.
With Python 3.14, that changed. The JIT now ships in the official Windows and macOS installers, so you no longer need a C compiler on your machine to try it. While still “experimental,” the inclusion in official binaries signals that the core team believes the JIT is stable enough for broad community testing.
Getting Python 3.14
Head over to https://www.python.org/downloads/, and you’ll see a download option for 3.14. Click that, then follow the instructions.
Alternatively, if you have the uv tool installed, you can run the following.
PS C:\ > uv python install 3.14
Enabling the JIT
By default, the JIT is disabled. This is a safety measure; because it is experimental, the Python Steering Council wants to ensure that users don’t face unexpected regressions in stability or memory usage without explicitly choosing to.
To activate the JIT, you use an environment variable. This tells the CPython runtime to initialise the JIT engine upon startup.
On Windows (PowerShell):
$env:PYTHON_JIT=1
python my_script.py
On macOS/Linux (Bash/Zsh):
export PYTHON_JIT=1
python my_script.py
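To confirm that the setting took effect, you can ask the interpreter itself. The sketch below assumes the sys._jit introspection namespace documented for Python 3.14; it is underscore-prefixed, so treat it as subject to change. The helper name jit_status is my own, and it falls back to None values on interpreters that lack the namespace.

```python
import sys


def jit_status():
    """Report JIT state, or None values on interpreters without sys._jit."""
    jit = getattr(sys, "_jit", None)  # assumption: present on Python 3.14+
    if jit is None:
        return {"available": None, "enabled": None}
    return {
        "available": jit.is_available(),  # was this build compiled with JIT support?
        "enabled": jit.is_enabled(),      # is the JIT switched on for this process?
    }


print(sys.version_info[:2], jit_status())
```

Running this once with PYTHON_JIT=1 and once without should show the "enabled" value change on a JIT-capable build.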
Once enabled, CPython doesn’t JIT-compile everything immediately. It uses a tiering system. Basically, it tries to run code as cheaply as possible first, and only spends compilation/optimisation effort on the parts that prove to be hot.
- Tier 0: Standard interpretation.
- Tier 1: Specialised bytecode (introduced in 3.11).
- Tier 2 (The JIT): Machine code generation for the most frequently used paths.
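The “only compile what proves hot” idea can be sketched as a toy counter model. Everything here is hypothetical: the class name, the threshold of 3, and the string labels are invented for illustration, and CPython’s real counters and thresholds are internal details.

```python
from collections import Counter

HOT_THRESHOLD = 3  # hypothetical value; CPython's real thresholds differ


class ToyTieredRunner:
    """Count executions per code unit and promote the hot ones."""

    def __init__(self):
        self.counts = Counter()
        self.compiled = set()

    def run(self, name):
        if name in self.compiled:
            return "native"
        self.counts[name] += 1
        if self.counts[name] >= HOT_THRESHOLD:
            self.compiled.add(name)  # stand-in for emitting machine code
        return "interpreted"


runner = ToyTieredRunner()
trace = [runner.run("inner_loop") for _ in range(5)]
print(trace)
```

Code that runs only once or twice never pays any compilation cost in this model, which is the trade-off the tiering is designed around.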
Measuring the Impact of the JIT
When testing a JIT, you can’t simply wrap a single function call in time.time() and trust the number. JITs require a warm-up period. The first few iterations of a loop might be slower than normal as the JIT profiles the code, but subsequent iterations can be significantly faster.
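One way to keep warm-up visible is to time each call separately with time.perf_counter() rather than timing a single call. This is a minimal sketch: hot_loop and time_calls are made-up helpers, and treating the first three calls as warm-up is an arbitrary assumption, not a CPython rule.

```python
import time


def hot_loop(n=200_000):
    # Hypothetical CPU-bound workload: a sum of squares in pure Python.
    total = 0
    for i in range(n):
        total += i * i
    return total


def time_calls(func, repeats=10):
    """Time each call separately so early (warm-up) calls stay visible."""
    timings = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        func()
        timings.append(time.perf_counter() - t0)
    return timings


timings = time_calls(hot_loop)
warmup, steady = timings[:3], timings[3:]  # the split point is an assumption
print(f"first 3 calls: {[f'{t:.4f}' for t in warmup]}")
print(f"best steady-state call: {min(steady):.4f}s")
```

Comparing the early calls against the later ones under PYTHON_JIT=0 and PYTHON_JIT=1 gives a rough picture of whether warm-up matters for a given workload.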
The Benchmark Suite
Below is a small test suite designed to exercise different aspects of the JIT, from heavy math to complex object manipulation.
File 1: workloads.py
This file contains three different CPU-bound tasks.
1/ The Mandelbrot function iterates the Mandelbrot formula over a pixel grid and returns a checksum of per-pixel iteration counts.
2/ The Dijkstra function builds a deterministic random weighted graph and runs Dijkstra from node 0, returning how many nodes were finalised/visited.
3/ The Levenshtein function generates N deterministic random string pairs and returns the sum of their Levenshtein distances.
from __future__ import annotations
import random
import heapq


# Workload 1: Mandelbrot (CPU + math loops)
def mandelbrot(width: int = 1000, height: int = 1000, iters: int = 500) -> int:
    checksum = 0
    for y in range(height):
        cy = (y / height) * 2.4 - 1.2
        for x in range(width):
            cx = (x / width) * 3.2 - 2.2
            zx, zy, count = 0.0, 0.0, 0
            while zx * zx + zy * zy <= 4.0 and count < iters:
                zx, zy = zx * zx - zy * zy + cx, 2.0 * zx * zy + cy
                count += 1
            checksum += count
    return checksum


# Workload 2: Dijkstra (heap + list + logic)
def dijkstra(n: int = 10000, edges_per_node: int = 50, seed: int = 123) -> int:
    rng = random.Random(seed)
    graph = [[] for _ in range(n)]
    for u in range(n):
        for _ in range(edges_per_node):
            v = rng.randrange(n)
            if v != u:
                graph[u].append((v, rng.randrange(1, 30)))
    dist = [10**12] * n
    dist[0] = 0
    pq = [(0, 0)]
    visited = 0
    while pq:
        d, u = heapq.heappop(pq)
        if d != dist[u]:
            continue
        visited += 1
        for v, w in graph[u]:
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return visited


# Workload 3: Levenshtein distance (dynamic programming)
def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(cur[j - 1] + 1, prev[j] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def levenshtein_batch(n: int = 10000, seed: int = 7, k: int = 50) -> int:
    """
    Deterministic batch: fixed RNG seed, fixed alphabet, fixed string length.
    Returns the sum of distances.
    """
    rng = random.Random(seed)
    alphabet = "abc"
    total = 0
    for _ in range(n):
        a = "".join(rng.choices(alphabet, k=k))
        b = "".join(rng.choices(alphabet, k=k))
        total += levenshtein(a, b)
    return total
File 2: benchmark.py
This script automates comparing different workloads with JIT enabled and disabled.
import os
import time
import json
import subprocess
from pathlib import Path

PYTHON_EXE = r"C:\Users\thoma\AppData\Local\Programs\Python\Python314\python.exe"
PROJECT_DIR = Path(__file__).resolve().parent

# Original workloads (statement prints a result for sanity)
WORKLOADS = [
    ("mandelbrot", 'from workloads import mandelbrot; print(mandelbrot())'),
    ("dijkstra", 'from workloads import dijkstra; print(dijkstra())'),
    ("levenshtein_batch", 'from workloads import levenshtein_batch; print(levenshtein_batch())'),
]

N_RUNS = 10  # average of ALL runs (set to 6/10/20 as you like)
OUTFILE = PROJECT_DIR / "results_avg.json"


def run_once(stmt: str, jit_val: int) -> tuple[float, str]:
    env = os.environ.copy()
    env["PYTHON_JIT"] = str(jit_val)
    # Ensure local workloads.py is importable in subprocess
    env["PYTHONPATH"] = str(PROJECT_DIR) + (os.pathsep + env.get("PYTHONPATH", ""))
    t0 = time.perf_counter()
    p = subprocess.run(
        [PYTHON_EXE, "-c", stmt],
        env=env,
        cwd=str(PROJECT_DIR),
        capture_output=True,
        text=True,
    )
    t1 = time.perf_counter()
    if p.returncode != 0:
        raise RuntimeError(
            f"Run failed (PYTHON_JIT={jit_val})\n\n"
            f"Statement:\n{stmt}\n\n"
            f"STDOUT:\n{p.stdout}\n\nSTDERR:\n{p.stderr}"
        )
    return (t1 - t0, p.stdout.strip())


def summarize(times: list[float]) -> dict:
    return {
        "avg": sum(times) / len(times),
        "min": min(times),
        "max": max(times),
        "runs": times,
    }


def bench_workload(name: str, stmt: str) -> dict:
    results = {}
    outputs = {}
    for jit_val in (0, 1):
        times = []
        outs = []
        print(f" PYTHON_JIT={jit_val}: running {N_RUNS} times…")
        for i in range(1, N_RUNS + 1):
            dt, out = run_once(stmt, jit_val)
            times.append(dt)
            outs.append(out)
            print(f" run {i}/{N_RUNS}: {dt:.6f}s")
        results[jit_val] = summarize(times)
        outputs[jit_val] = outs
    avg0 = results[0]["avg"]
    avg1 = results[1]["avg"]
    speedup = avg0 / avg1 if avg1 else float("inf")
    delta_pct = (avg1 - avg0) / avg0 * 100.0 if avg0 else 0.0
    return {
        "workload": name,
        "jit0": results[0],
        "jit1": results[1],
        "speedup_jit0_over_jit1": speedup,
        "delta_pct_jit1_vs_jit0": delta_pct,
        "outputs": outputs,  # sanity: should be stable
    }


def main() -> int:
    all_results = []
    print(f"Using Python: {PYTHON_EXE}")
    print(f"Project dir: {PROJECT_DIR}")
    print(f"Runs per setting (avg of all runs): {N_RUNS}\n")
    for name, stmt in WORKLOADS:
        print(f"=== {name} ===")
        r = bench_workload(name, stmt)
        all_results.append(r)
        print(f"\n Averages:")
        print(f" JIT=0 avg: {r['jit0']['avg']:.6f}s (min {r['jit0']['min']:.6f}, max {r['jit0']['max']:.6f})")
        print(f" JIT=1 avg: {r['jit1']['avg']:.6f}s (min {r['jit1']['min']:.6f}, max {r['jit1']['max']:.6f})")
        print(f" Speedup (JIT=0 / JIT=1): {r['speedup_jit0_over_jit1']:.3f}× (Δ={r['delta_pct_jit1_vs_jit0']:+.2f}%)\n")
        # Optional: warn if outputs vary across runs (nondeterminism)
        if len(set(r["outputs"][0])) != 1:
            print(" !! WARNING: JIT=0 output differs across runs (nondeterministic workload?)")
        if len(set(r["outputs"][1])) != 1:
            print(" !! WARNING: JIT=1 output differs across runs (nondeterministic workload?)")
    OUTFILE.write_text(json.dumps(all_results, indent=2), encoding="utf-8")
    print(f"Wrote: {OUTFILE}")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
Here are my results.
C:\Users\thoma\projects\python_jit>C:\Users\thoma\AppData\Local\Programs\Python\Python314\python.exe benchmark.py
Using Python: C:\Users\thoma\AppData\Local\Programs\Python\Python314\python.exe
Project dir: C:\Users\thoma\projects\python_jit
Runs per setting (avg of all runs): 10
=== mandelbrot ===
PYTHON_JIT=0: running 10 times…
run 1/10: 6.890924s
run 2/10: 6.950737s
run 3/10: 7.265357s
run 4/10: 6.947150s
run 5/10: 6.932333s
run 6/10: 6.939378s
run 7/10: 7.194705s
run 8/10: 6.995550s
run 9/10: 6.902696s
run 10/10: 7.256164s
PYTHON_JIT=1: running 10 times…
run 1/10: 5.216740s
run 2/10: 5.241888s
run 3/10: 5.350822s
run 4/10: 5.246767s
run 5/10: 5.294771s
run 6/10: 5.273295s
run 7/10: 5.272135s
run 8/10: 5.617062s
run 9/10: 5.251656s
run 10/10: 5.239060s
Averages:
JIT=0 avg: 7.027499s (min 6.890924, max 7.265357)
JIT=1 avg: 5.300420s (min 5.216740, max 5.617062)
Speedup (JIT=0 / JIT=1): 1.326× (Δ=-24.58%)
=== dijkstra ===
PYTHON_JIT=0: running 10 times…
run 1/10: 0.235401s
run 2/10: 0.227603s
run 3/10: 0.244492s
run 4/10: 0.232971s
run 5/10: 0.249589s
run 6/10: 0.232229s
run 7/10: 0.229422s
run 8/10: 0.238399s
run 9/10: 0.230657s
run 10/10: 0.235772s
PYTHON_JIT=1: running 10 times…
run 1/10: 0.238862s
run 2/10: 0.239266s
run 3/10: 0.240312s
run 4/10: 0.231413s
run 5/10: 0.232692s
run 6/10: 0.233783s
run 7/10: 0.230016s
run 8/10: 0.237760s
run 9/10: 0.240895s
run 10/10: 0.246033s
Averages:
JIT=0 avg: 0.235653s (min 0.227603, max 0.249589)
JIT=1 avg: 0.237103s (min 0.230016, max 0.246033)
Speedup (JIT=0 / JIT=1): 0.994× (Δ=+0.62%)
=== levenshtein_batch ===
PYTHON_JIT=0: running 10 times…
run 1/10: 2.176256s
run 2/10: 2.171253s
run 3/10: 2.171834s
run 4/10: 2.170444s
run 5/10: 2.149874s
run 6/10: 2.162820s
run 7/10: 2.171975s
run 8/10: 2.199151s
run 9/10: 2.168398s
run 10/10: 2.167821s
PYTHON_JIT=1: running 10 times…
run 1/10: 1.575666s
run 2/10: 1.612615s
run 3/10: 1.571106s
run 4/10: 1.584650s
run 5/10: 1.579948s
run 6/10: 1.582633s
run 7/10: 1.593924s
run 8/10: 1.573608s
run 9/10: 1.581427s
run 10/10: 1.578553s
Averages:
JIT=0 avg: 2.170983s (min 2.149874, max 2.199151)
JIT=1 avg: 1.583413s (min 1.571106, max 1.612615)
Speedup (JIT=0 / JIT=1): 1.371× (Δ=-27.06%)
Interpreting the Results
As you can see, the results are a mixed bag. This is normal for an experimental JIT.
- 10–30% Speedup: Common in “pure Python” loops (like the Mandelbrot or Levenshtein tests) where the JIT can avoid the overhead of the bytecode dispatch loop.
- 0% Improvement: Common in I/O-bound tasks or code that heavily uses C extensions. The Dijkstra code didn’t speed up because its runtime is dominated by heap/tuple operations and memory-heavy, allocation-driven work that the current CPython JIT doesn’t optimise significantly, so any interpreter savings are lost in the noise.
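One way to check the “lost in the noise” reading is to compare the gap between the two averages with the run-to-run spread. The sketch below reuses the ten Dijkstra timings from the output above. Using one sample standard deviation as the yardstick is a rough rule of thumb I’m assuming here, not a formal significance test.

```python
import statistics

# Dijkstra timings in seconds, copied from the benchmark output above.
jit0 = [0.235401, 0.227603, 0.244492, 0.232971, 0.249589,
        0.232229, 0.229422, 0.238399, 0.230657, 0.235772]
jit1 = [0.238862, 0.239266, 0.240312, 0.231413, 0.232692,
        0.233783, 0.230016, 0.237760, 0.240895, 0.246033]

mean0, mean1 = statistics.mean(jit0), statistics.mean(jit1)
spread0 = statistics.stdev(jit0)  # sample standard deviation of the JIT=0 runs
diff = mean1 - mean0

# Rough heuristic: a gap smaller than the run-to-run spread is treated as noise.
is_noise = abs(diff) < spread0
print(f"mean JIT=0: {mean0:.6f}s, mean JIT=1: {mean1:.6f}s")
print(f"difference: {diff:+.6f}s, JIT=0 stdev: {spread0:.6f}s, noise: {is_noise}")
```

The same comparison applied to the Mandelbrot or Levenshtein timings gives a gap far larger than the spread, which is why those speedups can be taken more seriously.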
When to Use the Python 3.14 JIT
The JIT is a powerful tool, but it is not a “magic button.” From my experience, you should try the JIT when you have…
- CPU-Bound Logic: Your application performs heavy calculations, data processing, or complex logic in pure Python.
- Long-Running Processes: Web servers (Gunicorn/Uvicorn) or background workers (Celery) that run for hours, allowing the JIT plenty of time to warm up and optimise hot paths.
- Experimental Testing: You want to prepare your codebase for future versions of Python (3.15+), where the JIT will likely be more aggressive.
And avoid it when you have…
- I/O-Bound Apps: If your app just waits for database queries or API responses, the JIT won’t help.
- Memory-Constrained Environments: Small Lambda functions or tiny containers might suffer from the increased memory footprint of the JIT cache.
- Short-Lived CLI Tools: A script that runs in under a second doesn’t need a JIT.
Future Directions: Beyond 3.14
The CPython core team views 3.14 as the “foundation year.” Future iterations (Python 3.15 and 3.16) are expected to include:
- Deeper Optimisation Passes: Using the type information gathered at runtime to perform even more aggressive machine code generation.
- Better Heuristics: Smarter decisions on when to compile, reducing the “warm-up” penalty.
- Lower Overhead: Refining the copy-and-patch mechanism to reduce memory consumption.
Summary
Python 3.14’s JIT is more than just a performance patch. It’s a statement of intent. It shows that Python is serious about closing the performance gap with languages like Java or Go while maintaining the “batteries-included” simplicity that made it famous.
For most developers, the JIT is simply another tool worth keeping an eye on. If performance matters in your projects, it’s worth testing Python 3.14 against your existing workloads. A few benchmarks on your most important code paths might reveal performance gains where you weren’t expecting them.
Here is the link to my previous article on GIL-free Python that I mentioned at the start.
