Feature or enhancement
Proposal:
Support PGO (profile-guided optimization) for clang-cl on Windows, using an approach similar to the one the Linux makefiles use for clang.
Has this already been discussed elsewhere?
No response given
Links to previous discussion of this feature:
Discussion started in PR #129907 while it was still a draft.
Linked PRs
64-bit pyperformance results on my Windows 10 PC (dusty i5-4570 CPU), run with `--fast --affinity 0` for commit 9db1a29:

| Benchmark | msvc.release.9db1a297d9 | clang.release.9db1a297d9 | msvc.pgo.9db1a297d9 | clang.pgo.9db1a297d9 |
|---|---|---|---|---|
| Geometric mean | (ref) | 1.27x faster | 1.28x faster | 1.47x faster |

| Benchmark | msvc.release.9db1a297d9 | clang.release.9db1a297d9 |
|---|---|---|
| Geometric mean | (ref) | 1.27x faster |

clang 18.1.8 is faster than 19.1.1, and 20.1.0.rc2 with tailcalling is the fastest:

| Benchmark | msvc.pgo.9db1a297d9 | clang.pgo.18.1.8.9db1a297d9 | clang.pgo.9db1a297d9 | clang.pgo.tc.20.1.0.rc2.9db1a297d9 |
|---|---|---|---|---|
| Geometric mean | (ref) | 1.19x faster | 1.15x faster | 1.25x faster |

Details
Benchmarks with tag 'apps':

| Benchmark | msvc.release.9db1a297d9 | clang.release.9db1a297d9 | msvc.pgo.9db1a297d9 | clang.pgo.9db1a297d9 |
|---|---|---|---|---|
| 2to3 | 586 ms | 491 ms: 1.19x faster | 462 ms: 1.27x faster | 426 ms: 1.38x faster |
| docutils | 4.27 sec | 3.75 sec: 1.14x faster | 3.50 sec: 1.22x faster | 3.31 sec: 1.29x faster |
| html5lib | 104 ms | 81.6 ms: 1.28x faster | 77.9 ms: 1.34x faster | 74.5 ms: 1.40x faster |
| Geometric mean | (ref) | 1.20x faster | 1.28x faster | 1.35x faster |

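
For readers who want to check how the "Geometric mean" rows relate to the per-benchmark rows, here is a minimal sketch that recomputes them for the 'apps' tag from the timings shown above. The names `APPS_MS` and `speedups` are made up for this illustration and are not part of pyperformance or pyperf. The inputs are the rounded values as displayed, so the last digit can differ slightly from what the tool reports (presumably computed from unrounded data).

```python
from statistics import geometric_mean

# Mean timings in milliseconds, copied from the 'apps' table above.
# These are the rounded values as displayed, so results are approximate.
APPS_MS = {
    "2to3": {
        "msvc.release": 586.0, "clang.release": 491.0,
        "msvc.pgo": 462.0, "clang.pgo": 426.0,
    },
    "docutils": {
        "msvc.release": 4270.0, "clang.release": 3750.0,
        "msvc.pgo": 3500.0, "clang.pgo": 3310.0,
    },
    "html5lib": {
        "msvc.release": 104.0, "clang.release": 81.6,
        "msvc.pgo": 77.9, "clang.pgo": 74.5,
    },
}


def speedups(timings, ref="msvc.release"):
    """Return {build: geometric mean of ref_time / build_time} over all benchmarks."""
    # Every row is assumed to contain the same set of builds.
    builds = [b for b in next(iter(timings.values())) if b != ref]
    return {
        build: geometric_mean([row[ref] / row[build] for row in timings.values()])
        for build in builds
    }


if __name__ == "__main__":
    for build, factor in speedups(APPS_MS).items():
        print(f"{build}: {factor:.2f}x faster than msvc.release")
```

A value above 1 means the build is faster than the reference. The geometric mean is used rather than the arithmetic mean because it averages ratios symmetrically: a 2x speedup and a 2x slowdown cancel out to 1x.
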
Benchmarks with tag 'asyncio':

| Benchmark | msvc.release.9db1a297d9 | clang.release.9db1a297d9 | msvc.pgo.9db1a297d9 | clang.pgo.9db1a297d9 |
|---|---|---|---|---|
| async_tree_none | 511 ms | 383 ms: 1.33x faster | 394 ms: 1.30x faster | 357 ms: 1.43x faster |
| async_tree_cpu_io_mixed | 933 ms | 805 ms: 1.16x faster | 749 ms: 1.25x faster | 697 ms: 1.34x faster |
| async_tree_cpu_io_mixed_tg | 891 ms | 776 ms: 1.15x faster | 716 ms: 1.24x faster | 665 ms: 1.34x faster |
| async_tree_eager | 209 ms | 153 ms: 1.37x faster | 160 ms: 1.31x faster | 133 ms: 1.57x faster |
| async_tree_eager_cpu_io_mixed | 656 ms | 630 ms: 1.04x faster | 567 ms: 1.16x faster | 535 ms: 1.23x faster |
| async_tree_eager_cpu_io_mixed_tg | 830 ms | 741 ms: 1.12x faster | 681 ms: 1.22x faster | 646 ms: 1.28x faster |
| async_tree_eager_io | 1.12 sec | 870 ms: 1.29x faster | 874 ms: 1.28x faster | 817 ms: 1.37x faster |
| async_tree_eager_io_tg | 1.12 sec | 890 ms: 1.26x faster | 898 ms: 1.25x faster | 840 ms: 1.33x faster |
| async_tree_eager_memoization | 393 ms | 304 ms: 1.29x faster | 304 ms: 1.29x faster | 281 ms: 1.40x faster |
| async_tree_eager_memoization_tg | 546 ms | 420 ms: 1.30x faster | 427 ms: 1.28x faster | 397 ms: 1.37x faster |
| async_tree_eager_tg | 408 ms | 312 ms: 1.31x faster | 321 ms: 1.27x faster | 297 ms: 1.38x faster |
| async_tree_io | 1.14 sec | 868 ms: 1.31x faster | 889 ms: 1.28x faster | 824 ms: 1.38x faster |
| async_tree_io_tg | 1.14 sec | 871 ms: 1.31x faster | 877 ms: 1.30x faster | 807 ms: 1.41x faster |
| async_tree_memoization | 649 ms | 493 ms: 1.32x faster | 509 ms: 1.28x faster | 458 ms: 1.42x faster |
| async_tree_memoization_tg | 605 ms | 453 ms: 1.34x faster | 462 ms: 1.31x faster | 425 ms: 1.42x faster |
| async_tree_none_tg | 497 ms | 371 ms: 1.34x faster | 382 ms: 1.30x faster | 352 ms: 1.41x faster |
| Geometric mean | (ref) | 1.26x faster | 1.27x faster | 1.38x faster |

Benchmarks with tag 'math':

| Benchmark | msvc.release.9db1a297d9 | clang.release.9db1a297d9 | msvc.pgo.9db1a297d9 | clang.pgo.9db1a297d9 |
|---|---|---|---|---|
| float | 145 ms | 108 ms: 1.35x faster | 116 ms: 1.25x faster | 96.8 ms: 1.50x faster |
| nbody | 203 ms | 155 ms: 1.31x faster | 171 ms: 1.19x faster | 128 ms: 1.58x faster |
| pidigits | 245 ms | 250 ms: 1.02x slower | 250 ms: 1.02x slower | 240 ms: 1.02x faster |
| Geometric mean | (ref) | 1.20x faster | 1.13x faster | 1.34x faster |

Benchmarks with tag 'regex':

| Benchmark | msvc.release.9db1a297d9 | clang.release.9db1a297d9 | msvc.pgo.9db1a297d9 | clang.pgo.9db1a297d9 |
|---|---|---|---|---|
| regex_compile | 237 ms | 180 ms: 1.31x faster | 180 ms: 1.31x faster | 157 ms: 1.51x faster |
| regex_dna | 226 ms | 256 ms: 1.14x slower | 210 ms: 1.07x faster | 211 ms: 1.07x faster |
| regex_effbot | 4.05 ms | not significant | 3.66 ms: 1.11x faster | 3.39 ms: 1.20x faster |
| regex_v8 | 38.7 ms | 35.7 ms: 1.08x faster | 33.7 ms: 1.15x faster | 29.8 ms: 1.30x faster |
| Geometric mean | (ref) | 1.06x faster | 1.16x faster | 1.26x faster |

Benchmarks with tag 'serialize':

| Benchmark | msvc.release.9db1a297d9 | clang.release.9db1a297d9 | msvc.pgo.9db1a297d9 | clang.pgo.9db1a297d9 |
|---|---|---|---|---|
| json_dumps | 19.6 ms | 16.9 ms: 1.16x faster | 15.0 ms: 1.31x faster | 12.9 ms: 1.52x faster |
| json_loads | 48.1 us | 46.7 us: 1.03x faster | 36.8 us: 1.31x faster | 32.7 us: 1.47x faster |
| pickle | 21.5 us | 17.9 us: 1.20x faster | 19.1 us: 1.13x faster | 15.0 us: 1.44x faster |
| pickle_dict | 46.0 us | 34.3 us: 1.34x faster | 43.2 us: 1.07x faster | 27.6 us: 1.67x faster |
| pickle_list | 8.16 us | 6.19 us: 1.32x faster | 6.89 us: 1.18x faster | 5.05 us: 1.62x faster |
| pickle_pure_python | 672 us | 455 us: 1.48x faster | 463 us: 1.45x faster | 378 us: 1.78x faster |
| tomli_loads | 3.84 sec | 2.79 sec: 1.38x faster | 2.88 sec: 1.33x faster | 2.38 sec: 1.61x faster |
| unpickle | 26.2 us | 24.0 us: 1.09x faster | 19.8 us: 1.32x faster | 17.9 us: 1.46x faster |
| unpickle_list | 7.29 us | 6.03 us: 1.21x faster | 6.87 us: 1.06x faster | 5.38 us: 1.36x faster |
| unpickle_pure_python | 505 us | 321 us: 1.57x faster | 336 us: 1.50x faster | 257 us: 1.96x faster |
| xml_etree_parse | 232 ms | 228 ms: 1.02x faster | 200 ms: 1.16x faster | 210 ms: 1.10x faster |
| xml_etree_iterparse | 185 ms | 160 ms: 1.16x faster | 154 ms: 1.21x faster | 145 ms: 1.27x faster |
| xml_etree_generate | 181 ms | 148 ms: 1.22x faster | 135 ms: 1.35x faster | 119 ms: 1.53x faster |
| xml_etree_process | 128 ms | 100 ms: 1.28x faster | 94.4 ms: 1.36x faster | 82.0 ms: 1.56x faster |
| Geometric mean | (ref) | 1.24x faster | 1.26x faster | 1.51x faster |

Benchmarks with tag 'startup':

| Benchmark | msvc.release.9db1a297d9 | clang.release.9db1a297d9 | msvc.pgo.9db1a297d9 | clang.pgo.9db1a297d9 |
|---|---|---|---|---|
| python_startup | 45.4 ms | not significant | 43.1 ms: 1.05x faster | 43.7 ms: 1.04x faster |
| python_startup_no_site | 37.1 ms | not significant | 35.4 ms: 1.05x faster | 35.9 ms: 1.03x faster |
| Geometric mean | (ref) | 1.00x faster | 1.05x faster | 1.04x faster |

Benchmarks with tag 'template':

| Benchmark | msvc.release.9db1a297d9 | clang.release.9db1a297d9 | msvc.pgo.9db1a297d9 | clang.pgo.9db1a297d9 |
|---|---|---|---|---|
| django_template | 75.6 ms | 55.6 ms: 1.36x faster | 52.1 ms: 1.45x faster | 42.1 ms: 1.79x faster |
| genshi_text | 44.5 ms | 31.4 ms: 1.42x faster | 32.5 ms: 1.37x faster | 26.3 ms: 1.69x faster |
| genshi_xml | 102 ms | 74.0 ms: 1.37x faster | 74.6 ms: 1.36x faster | 63.1 ms: 1.61x faster |
| mako | 23.3 ms | 17.7 ms: 1.31x faster | 16.7 ms: 1.39x faster | 14.4 ms: 1.61x faster |
| Geometric mean | (ref) | 1.36x faster | 1.39x faster | 1.67x faster |

All benchmarks:

| Benchmark | msvc.release.9db1a297d9 | clang.release.9db1a297d9 | msvc.pgo.9db1a297d9 | clang.pgo.9db1a297d9 |
|---|---|---|---|---|
| 2to3 | 586 ms | 491 ms: 1.19x faster | 462 ms: 1.27x faster | 426 ms: 1.38x faster |
| async_generators | 696 ms | 565 ms: 1.23x faster | 577 ms: 1.21x faster | 514 ms: 1.35x faster |
| async_tree_none | 511 ms | 383 ms: 1.33x faster | 394 ms: 1.30x faster | 357 ms: 1.43x faster |
| async_tree_cpu_io_mixed | 933 ms | 805 ms: 1.16x faster | 749 ms: 1.25x faster | 697 ms: 1.34x faster |
| async_tree_cpu_io_mixed_tg | 891 ms | 776 ms: 1.15x faster | 716 ms: 1.24x faster | 665 ms: 1.34x faster |
| async_tree_eager | 209 ms | 153 ms: 1.37x faster | 160 ms: 1.31x faster | 133 ms: 1.57x faster |
| async_tree_eager_cpu_io_mixed | 656 ms | 630 ms: 1.04x faster | 567 ms: 1.16x faster | 535 ms: 1.23x faster |
| async_tree_eager_cpu_io_mixed_tg | 830 ms | 741 ms: 1.12x faster | 681 ms: 1.22x faster | 646 ms: 1.28x faster |
| async_tree_eager_io | 1.12 sec | 870 ms: 1.29x faster | 874 ms: 1.28x faster | 817 ms: 1.37x faster |
| async_tree_eager_io_tg | 1.12 sec | 890 ms: 1.26x faster | 898 ms: 1.25x faster | 840 ms: 1.33x faster |
| async_tree_eager_memoization | 393 ms | 304 ms: 1.29x faster | 304 ms: 1.29x faster | 281 ms: 1.40x faster |
| async_tree_eager_memoization_tg | 546 ms | 420 ms: 1.30x faster | 427 ms: 1.28x faster | 397 ms: 1.37x faster |
| async_tree_eager_tg | 408 ms | 312 ms: 1.31x faster | 321 ms: 1.27x faster | 297 ms: 1.38x faster |
| async_tree_io | 1.14 sec | 868 ms: 1.31x faster | 889 ms: 1.28x faster | 824 ms: 1.38x faster |
| async_tree_io_tg | 1.14 sec | 871 ms: 1.31x faster | 877 ms: 1.30x faster | 807 ms: 1.41x faster |
| async_tree_memoization | 649 ms | 493 ms: 1.32x faster | 509 ms: 1.28x faster | 458 ms: 1.42x faster |
| async_tree_memoization_tg | 605 ms | 453 ms: 1.34x faster | 462 ms: 1.31x faster | 425 ms: 1.42x faster |
| async_tree_none_tg | 497 ms | 371 ms: 1.34x faster | 382 ms: 1.30x faster | 352 ms: 1.41x faster |
| asyncio_tcp | 1.64 sec | 1.55 sec: 1.06x faster | 1.48 sec: 1.11x faster | not significant |
| asyncio_websockets | 732 ms | 578 ms: 1.27x faster | 758 ms: 1.04x slower | not significant |
| chaos | 132 ms | 88.6 ms: 1.48x faster | 90.8 ms: 1.45x faster | 74.3 ms: 1.77x faster |
| comprehensions | 34.7 us | 24.5 us: 1.42x faster | 25.2 us: 1.38x faster | 19.2 us: 1.80x faster |
| bench_mp_pool | 213 ms | 196 ms: 1.09x faster | 177 ms: 1.20x faster | 190 ms: 1.12x faster |
| bench_thread_pool | 1.95 ms | 1.74 ms: 1.12x faster | 1.68 ms: 1.16x faster | 1.63 ms: 1.19x faster |
| coroutines | 45.3 ms | 33.9 ms: 1.34x faster | 36.1 ms: 1.25x faster | 26.9 ms: 1.68x faster |
| coverage | 130 ms | 119 ms: 1.09x faster | 120 ms: 1.09x faster | 103 ms: 1.26x faster |
| crypto_pyaes | 147 ms | 109 ms: 1.35x faster | 109 ms: 1.35x faster | 86.3 ms: 1.70x faster |
| deepcopy | 516 us | 391 us: 1.32x faster | 388 us: 1.33x faster | 309 us: 1.67x faster |
| deepcopy_reduce | 5.30 us | 4.19 us: 1.26x faster | 3.95 us: 1.34x faster | 3.23 us: 1.64x faster |
| deepcopy_memo | 67.1 us | 41.6 us: 1.61x faster | 46.8 us: 1.44x faster | 34.8 us: 1.93x faster |
| deltablue | 7.72 ms | 4.52 ms: 1.71x faster | 4.92 ms: 1.57x faster | 3.80 ms: 2.03x faster |
| django_template | 75.6 ms | 55.6 ms: 1.36x faster | 52.1 ms: 1.45x faster | 42.1 ms: 1.79x faster |
| docutils | 4.27 sec | 3.75 sec: 1.14x faster | 3.50 sec: 1.22x faster | 3.31 sec: 1.29x faster |
| dulwich_log | 156 ms | 141 ms: 1.11x faster | 129 ms: 1.20x faster | 131 ms: 1.19x faster |
| fannkuch | 770 ms | 592 ms: 1.30x faster | 637 ms: 1.21x faster | 516 ms: 1.49x faster |
| float | 145 ms | 108 ms: 1.35x faster | 116 ms: 1.25x faster | 96.8 ms: 1.50x faster |
| create_gc_cycles | 1.62 ms | 1.71 ms: 1.05x slower | not significant | 1.71 ms: 1.05x slower |
| gc_traversal | 5.03 ms | not significant | 4.02 ms: 1.25x faster | 5.71 ms: 1.13x slower |
| generators | 65.1 ms | 40.4 ms: 1.61x faster | 44.4 ms: 1.47x faster | 36.0 ms: 1.81x faster |
| genshi_text | 44.5 ms | 31.4 ms: 1.42x faster | 32.5 ms: 1.37x faster | 26.3 ms: 1.69x faster |
| genshi_xml | 102 ms | 74.0 ms: 1.37x faster | 74.6 ms: 1.36x faster | 63.1 ms: 1.61x faster |
| go | 255 ms | 147 ms: 1.73x faster | 170 ms: 1.50x faster | 132 ms: 1.94x faster |
| hexiom | 13.4 ms | 8.49 ms: 1.58x faster | 9.22 ms: 1.46x faster | 7.11 ms: 1.89x faster |
| html5lib | 104 ms | 81.6 ms: 1.28x faster | 77.9 ms: 1.34x faster | 74.5 ms: 1.40x faster |
| json_dumps | 19.6 ms | 16.9 ms: 1.16x faster | 15.0 ms: 1.31x faster | 12.9 ms: 1.52x faster |
| json_loads | 48.1 us | 46.7 us: 1.03x faster | 36.8 us: 1.31x faster | 32.7 us: 1.47x faster |
| logging_format | 21.2 us | 16.4 us: 1.29x faster | 14.7 us: 1.44x faster | 13.6 us: 1.56x faster |
| logging_silent | 213 ns | 143 ns: 1.49x faster | 152 ns: 1.40x faster | 109 ns: 1.95x faster |
| logging_simple | 19.4 us | 14.6 us: 1.33x faster | 13.5 us: 1.44x faster | 12.2 us: 1.60x faster |
| mako | 23.3 ms | 17.7 ms: 1.31x faster | 16.7 ms: 1.39x faster | 14.4 ms: 1.61x faster |
| mdp | 3.99 sec | 4.12 sec: 1.03x slower | 3.76 sec: 1.06x faster | 3.37 sec: 1.18x faster |
| meteor_contest | 175 ms | 133 ms: 1.32x faster | 139 ms: 1.26x faster | 124 ms: 1.41x faster |
| nbody | 203 ms | 155 ms: 1.31x faster | 171 ms: 1.19x faster | 128 ms: 1.58x faster |
| nqueens | 179 ms | 129 ms: 1.38x faster | 131 ms: 1.37x faster | 103 ms: 1.73x faster |
| pathlib | 278 ms | 266 ms: 1.04x faster | 256 ms: 1.09x faster | 262 ms: 1.06x faster |
| pickle | 21.5 us | 17.9 us: 1.20x faster | 19.1 us: 1.13x faster | 15.0 us: 1.44x faster |
| pickle_dict | 46.0 us | 34.3 us: 1.34x faster | 43.2 us: 1.07x faster | 27.6 us: 1.67x faster |
| pickle_list | 8.16 us | 6.19 us: 1.32x faster | 6.89 us: 1.18x faster | 5.05 us: 1.62x faster |
| pickle_pure_python | 672 us | 455 us: 1.48x faster | 463 us: 1.45x faster | 378 us: 1.78x faster |
| pidigits | 245 ms | 250 ms: 1.02x slower | 250 ms: 1.02x slower | 240 ms: 1.02x faster |
| pprint_safe_repr | 1.46 sec | 1.09 sec: 1.34x faster | 1.09 sec: 1.34x faster | 934 ms: 1.57x faster |
| pprint_pformat | 3.00 sec | 2.22 sec: 1.35x faster | 2.23 sec: 1.35x faster | 1.91 sec: 1.57x faster |
| pyflate | 875 ms | 626 ms: 1.40x faster | 668 ms: 1.31x faster | 537 ms: 1.63x faster |
| python_startup | 45.4 ms | not significant | 43.1 ms: 1.05x faster | 43.7 ms: 1.04x faster |
| python_startup_no_site | 37.1 ms | not significant | 35.4 ms: 1.05x faster | 35.9 ms: 1.03x faster |
| raytrace | 587 ms | 385 ms: 1.52x faster | 414 ms: 1.42x faster | 321 ms: 1.83x faster |
| regex_compile | 237 ms | 180 ms: 1.31x faster | 180 ms: 1.31x faster | 157 ms: 1.51x faster |
| regex_dna | 226 ms | 256 ms: 1.14x slower | 210 ms: 1.07x faster | 211 ms: 1.07x faster |
| regex_effbot | 4.05 ms | not significant | 3.66 ms: 1.11x faster | 3.39 ms: 1.20x faster |
| regex_v8 | 38.7 ms | 35.7 ms: 1.08x faster | 33.7 ms: 1.15x faster | 29.8 ms: 1.30x faster |
| richards | 102 ms | 65.3 ms: 1.56x faster | 64.7 ms: 1.58x faster | 49.7 ms: 2.05x faster |
| richards_super | 116 ms | 74.3 ms: 1.57x faster | 74.7 ms: 1.56x faster | 56.2 ms: 2.07x faster |
| scimark_fft | 664 ms | 485 ms: 1.37x faster | 493 ms: 1.35x faster | 358 ms: 1.85x faster |
| scimark_lu | 227 ms | 159 ms: 1.43x faster | 164 ms: 1.39x faster | 132 ms: 1.72x faster |
| scimark_monte_carlo | 138 ms | 91.6 ms: 1.51x faster | 101 ms: 1.37x faster | 74.6 ms: 1.85x faster |
| scimark_sor | 256 ms | 176 ms: 1.46x faster | 195 ms: 1.31x faster | 151 ms: 1.69x faster |
| scimark_sparse_mat_mult | 8.76 ms | 6.31 ms: 1.39x faster | 6.06 ms: 1.45x faster | 5.01 ms: 1.75x faster |
| spectral_norm | 179 ms | 136 ms: 1.32x faster | 151 ms: 1.19x faster | 110 ms: 1.63x faster |
| sqlglot_normalize | 204 ms | 156 ms: 1.31x faster | 151 ms: 1.35x faster | 131 ms: 1.55x faster |
| sqlglot_optimize | 97.0 ms | 77.7 ms: 1.25x faster | 74.5 ms: 1.30x faster | 66.2 ms: 1.47x faster |
| sqlglot_parse | 2.52 ms | 1.72 ms: 1.46x faster | 1.81 ms: 1.39x faster | 1.51 ms: 1.66x faster |
| sqlglot_transpile | 3.02 ms | 2.15 ms: 1.41x faster | 2.21 ms: 1.37x faster | 1.85 ms: 1.63x faster |
| sqlite_synth | 4.08 us | 3.81 us: 1.07x faster | 3.75 us: 1.09x faster | 3.44 us: 1.18x faster |
| sympy_expand | 818 ms | 681 ms: 1.20x faster | 640 ms: 1.28x faster | 578 ms: 1.42x faster |
| sympy_integrate | 33.5 ms | 28.2 ms: 1.19x faster | 27.1 ms: 1.24x faster | 24.2 ms: 1.38x faster |
| sympy_sum | 258 ms | 222 ms: 1.16x faster | 213 ms: 1.21x faster | 199 ms: 1.29x faster |
| sympy_str | 484 ms | 405 ms: 1.20x faster | 383 ms: 1.26x faster | 344 ms: 1.41x faster |
| telco | 13.1 ms | 11.1 ms: 1.17x faster | 10.7 ms: 1.22x faster | 9.37 ms: 1.40x faster |
| tomli_loads | 3.84 sec | 2.79 sec: 1.38x faster | 2.88 sec: 1.33x faster | 2.38 sec: 1.61x faster |
| typing_runtime_protocols | 296 us | 239 us: 1.24x faster | 223 us: 1.32x faster | 193 us: 1.53x faster |
| unpack_sequence | 152 ns | 58.2 ns: 2.61x faster | 84.8 ns: 1.79x faster | 59.3 ns: 2.56x faster |
| unpickle | 26.2 us | 24.0 us: 1.09x faster | 19.8 us: 1.32x faster | 17.9 us: 1.46x faster |
| unpickle_list | 7.29 us | 6.03 us: 1.21x faster | 6.87 us: 1.06x faster | 5.38 us: 1.36x faster |
| unpickle_pure_python | 505 us | 321 us: 1.57x faster | 336 us: 1.50x faster | 257 us: 1.96x faster |
| xml_etree_parse | 232 ms | 228 ms: 1.02x faster | 200 ms: 1.16x faster | 210 ms: 1.10x faster |
| xml_etree_iterparse | 185 ms | 160 ms: 1.16x faster | 154 ms: 1.21x faster | 145 ms: 1.27x faster |
| xml_etree_generate | 181 ms | 148 ms: 1.22x faster | 135 ms: 1.35x faster | 119 ms: 1.53x faster |
| xml_etree_process | 128 ms | 100 ms: 1.28x faster | 94.4 ms: 1.36x faster | 82.0 ms: 1.56x faster |
| Geometric mean | (ref) | 1.27x faster | 1.28x faster | 1.47x faster |

Benchmark hidden because not significant (1): asyncio_tcp_ssl
More benchmarks (including clang-cl 18.1.8, 20.1.0.rc2, computed gotos and tailcall) can be found in https://gist.github.com/chris-eibl/114a42f22563956fdb5cd0335b28c7ae.
Raw data is here https://gist.github.com/chris-eibl/c73b02762a7c467e9a410a0aa19c7701.