Profilers measure the performance of a whole program to identify where most of the time is spent.
But once you’ve found a target function, re-profiling the whole program to see if your changes helped can be slow and cumbersome.
The profiler adds overhead to the whole run, and you then have to pick the stats for the one function you care about out of the report.
I have often gone through this loop while optimizing client or open source projects, such as when I optimized Django’s system checks framework (previous post).
The pain here inspired me to create tprof, a targeting profiler for Python 3.12+ that only measures the time spent in specified target functions.
Use it to measure your program before and after an optimization to see if it made any difference, with a quick report on the command line.
For example, say you’ve realized that creating pathlib.Path objects is the bottleneck for your code.
You could run tprof with that constructor as its sole target, timing those calls while leaving the rest of the program unmeasured.
Benchmark with comparison mode
Sometimes when optimizing code, you want to compare several functions, such as “before” and “after” versions of a function you’re optimizing.
tprof supports this with its comparison mode, which adds a “delta” column to the report showing how much faster or slower each function is compared to a baseline.
For example, given this code:
def before():
    total = 0
    for i in range(100_000):
        total += i
    return total


def after():
    return sum(range(100_000))


for _ in range(100):
    before()
    after()
…you can run tprof like this to compare the two functions:
$ tprof -x -t before -t after -m example
🎯 tprof results:
function          calls  total  mean ± σ       min … max       delta
example:before()    100  227ms  2ms ± 34μs     2ms … 2ms       -
example:after()     100   86ms  856μs ± 15μs   835μs … 910μs   -62.27%
The output shows that, in this case, after() is about 62% faster than before().
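The delta column appears to express each function’s time relative to the first-listed target. As a rough sanity check (my assumption about the formula, not tprof’s documented behaviour), recomputing it from the totals in the report above lands close to the reported figure:

```python
# Recompute the delta column by hand from the report's "total" values,
# assuming delta = change relative to the first-listed (baseline) target.
before_total_ms = 227
after_total_ms = 86

delta = (after_total_ms - before_total_ms) / before_total_ms * 100
print(f"{delta:.1f}%")  # -62.1%
```

The small difference from the reported -62.27% would come from tprof using the unrounded per-call means rather than these rounded totals.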
Python API
tprof also provides a Python API via a context manager / decorator, tprof().
Use it to profile functions within a specific block of code.
For example, to recreate the previous benchmarking example within a self-contained Python file:
from tprof import tprof


def before():
    total = 0
    for i in range(100_000):
        total += i
    return total


def after():
    return sum(range(100_000))


with tprof(before, after, compare=True):
    for _ in range(100):
        before()
        after()
…which produces output like:
$ python example.py
🎯 tprof results:
function           calls  total  mean ± σ       min … max      delta
__main__:before()    100  227ms  2ms ± 83μs     2ms … 3ms      -
__main__:after()     100   85ms  853μs ± 22μs   835μs … 1ms    -62.35%
How it works
tprof uses Python’s sys.monitoring, a new API introduced in Python 3.12 for triggering events when functions or lines of code execute.
sys.monitoring allows tprof to register callbacks for only specific target functions, meaning it adds no overhead to the rest of the program.
Timing is done in C to further reduce overhead.
Thanks to Mark Shannon for contributing sys.monitoring to CPython!
This is the second time I’ve used it—the first time was for tracking down an unexpected mutation (see previous post).
Fin
If tprof sounds useful to you, please give it a try and let me know what you think!
Install tprof from PyPI with your favourite package manager.
May you hit your Q1 targets,
—Adam