gh-142659: Optimize set_swap_bodies for intersection_update#148155
gh-142659: Optimize set_swap_bodies for intersection_update#148155Siyet wants to merge 3 commits intopython:mainfrom
Conversation
|
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
|
@Siyet Could you add a few benchmarks for the affected cases? |
|
Benchmarks on a dedicated server (AMD EPYC 1 vCPU, 2 GB RAM, Ubuntu 24.04, no background load, Default build (with GIL):
Free-threading build (
The end-to-end improvement is modest (~1%) because Benchmark script"""Benchmark set.intersection_update for gh-142659.
Measures the performance of intersection_update across different set sizes
and usage patterns. Each benchmark recreates the set on every iteration
to measure the full intersection_update path including set_replace_body.
"""
import statistics
import sys
import timeit
ROUNDS = 50
BENCHMARKS = {
"small (5 & 5)": {
"setup": "",
"stmt": "set(range(5)).intersection_update({3, 4, 5, 6, 7})",
"number": 2_000_000,
},
"medium (100 & 100)": {
"setup": "b = set(range(50, 150))",
"stmt": "set(range(100)).intersection_update(b)",
"number": 500_000,
},
"large (10k & 10k)": {
"setup": "b = set(range(5000, 15000))",
"stmt": "set(range(10000)).intersection_update(b)",
"number": 2_000,
},
"multi-arg (100 & 100 & 50)": {
"setup": "b = set(range(50, 150)); c = set(range(75, 125))",
"stmt": "set(range(100)).intersection_update(b, c)",
"number": 500_000,
},
"empty result (100 & 0 overlap)": {
"setup": "b = set(range(200, 300))",
"stmt": "set(range(100)).intersection_update(b)",
"number": 500_000,
},
}
def main():
print(f"Python {sys.version}")
print(f"Rounds: {ROUNDS}")
print()
print(f"{'Benchmark':<32} {'Mean (ns)':>10} {'Stdev':>10} {'Min':>10} {'Max':>10}")
print("-" * 78)
for name, bench in BENCHMARKS.items():
times = timeit.repeat(
bench["stmt"],
setup=bench["setup"],
number=bench["number"],
repeat=ROUNDS,
)
per_iter = [t / bench["number"] * 1e9 for t in times]
mean = statistics.mean(per_iter)
stdev = statistics.stdev(per_iter)
print(
f"{name:<32} {mean:>10.1f} {stdev:>9.1f} {min(per_iter):>10.1f} {max(per_iter):>10.1f}"
)
if __name__ == "__main__":
main() |
Replace the general-purpose set_swap_bodies() with a specialized set_replace_body() that exploits the invariant that src is always a uniquely-referenced temporary about to be discarded.
d3f1a20 to
37a95e6
Compare
Objects/setobject.c
Outdated
|
|
||
| assert(!PyType_IsSubtype(Py_TYPE(dst), &PyFrozenSet_Type)); | ||
| assert(!PyType_IsSubtype(Py_TYPE(src), &PyFrozenSet_Type)); | ||
| assert(Py_REFCNT(src) == 1); |
There was a problem hiding this comment.
| assert(Py_REFCNT(src) == 1); | |
| assert(_PyObject_IsUniquelyReferenced(src)); |
There was a problem hiding this comment.
This assertion is going to be prone to gc.get_objects()/gc.get_referrers() issues like in #148180.
I'm not sure if we should do anything about that.
There was a problem hiding this comment.
Will gc.get_objects and similar ever see this set? They use a stop-the-world heap traversal, which can't happen until set_intersection_update / set_intersection_update_multi_impl returns, and by the time that happens, this src set is already gone.
There was a problem hiding this comment.
I agree with @zhuyifei1999 - src is a local variable that gets freed before the function returns, so gc can't observe it.
Switched to _PyObject_IsUniquelyReferenced per @eendebakpt's suggestion regardless, since I believe it's the correct API for free-threaded builds.
| moving dst's old contents into src for proper cleanup on Py_DECREF. | ||
|
|
||
| t=set(a); a.clear(); a.update(b); b.clear(); b.update(t); del t | ||
| The caller guarantees that src is a uniquely-referenced temporary set |
There was a problem hiding this comment.
If the called is required to cleanup src immediately afterwards, maybe it is cleaner to let the set_replace_body steal the reference and do the decref.
There was a problem hiding this comment.
In set_intersection_update_multi_impl, I placed Py_DECREF(tmp) after Py_END_CRITICAL_SECTION().
My reasoning was that moving the decref inside set_replace_body would run the deallocation (and decref of all keys) while still holding the critical section lock, which I thought might be undesirable.
By keeping it outside, the caller has control over when deallocation happens.
This is my first PR though and I'm still getting familiar with the project, so if I'm wrong here - happy to rework it.
Summary
Replace the general-purpose
set_swap_bodies()with a specializedset_replace_body()that exploits the invariant that the source argument is always a uniquely-referenced temporary set about to be discarded.Follow-up to the observation in #132290 (comment) and #142659.
Changes
set_swap_bodies()was designed for arbitrary two-way swaps between any two sets, but it is only called fromset_intersection_update()andset_intersection_update_multi_impl(), where the second argument (tmp) is always a freshly created temporary withPy_REFCNT == 1.The new
set_replace_body()exploits this invariant:src: the temporary is not visible to other threads, so plain assignments suffice (saves atomic fence overhead in the free-threaded build).copy_small_tableforsrc: use plainmemcpyinstead of per-entry atomic stores when writing back to src's smalltable (Py_GIL_DISABLEDpath).assert).srcis never shared (enforced viaassert), so only one direction of the shared-marking check is needed — propagate shared status fromdsttosrcfor proper deallocation of old entries.Py_hash_t hvariable.All assumptions are guarded by
assert()to document and enforce the contract.