Non-blocking `_PyMutex_LockTimed` spins and may fail unnecessarily in no-GIL builds

# Bug report

### Bug description:

### Non-blocking `_PyMutex_LockTimed` spins and may fail unnecessarily (no-GIL build)

**Branch:** `main` (commit `2793b68f758c10fb63b264787f10d46a71fc8086`)  
**Build:** configured with `--disable-gil` (`MAX_SPIN_COUNT > 0`)  
**OS:** all

---

## Summary

`_PyMutex_LockTimed()` is supposed to return immediately when called with
`timeout == 0` (non-blocking).  In a no-GIL build it:

1. **Spins for up to `MAX_SPIN_COUNT` yields** before returning;  
2. Can still return **`PY_LOCK_FAILURE` even though the lock was released** during that spin.

Timed/blocking calls (`timeout > 0`) also waste CPU: the spin loop never
reloads the lock word, so they cannot notice an unlock until the spin count
reaches `MAX_SPIN_COUNT`.

The bug does not show up in default (GIL-enabled) builds, where
`MAX_SPIN_COUNT` is 0 and the spin branch is never taken.

---

## Root cause

```c
/* Python/lock.c — excerpts */

if ((v & _Py_LOCKED) == 0) {
    if (_Py_atomic_compare_exchange_uint8(&m->_bits, &v, v | _Py_LOCKED))
        return PY_LOCK_ACQUIRED;
}
else if (timeout == 0) {            // executes only if FIRST load saw LOCKED
    return PY_LOCK_FAILURE;         // non-blocking, OK
}

/* … later … */
if (!(v & _Py_HAS_PARKED) && spin_count < MAX_SPIN_COUNT) {
    _Py_yield();
    spin_count++;
    continue;                       // BUG: never refreshes v
}
```

* If the fast-path CAS loses a race (the first load saw the lock free, but
  another thread took it before the CAS), the `else if` guard is skipped, so a
  *non-blocking* call drops into the spin loop.  
* Inside that loop `v` is never reloaded, so the thread can’t see that the
  lock became free.  After `MAX_SPIN_COUNT` iterations it falls through to
  the `timeout == 0` check that follows the spin branch and fails spuriously.
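
The failure path described above can be reproduced deterministically with a small single-threaded C11 model. This is only a sketch: `model_mutex`, the two hooks that stand in for a racing thread, and `MODEL_MAX_SPIN` are hypothetical names, the retry branch at the top of the loop is assumed, and parking is not modeled at all.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for _Py_LOCKED, _Py_HAS_PARKED and MAX_SPIN_COUNT. */
#define MODEL_LOCKED     1u
#define MODEL_HAS_PARKED 2u
#define MODEL_MAX_SPIN   8

enum model_status { MODEL_FAILURE = 0, MODEL_ACQUIRED = 1 };

typedef struct {
    _Atomic uint8_t bits;                         /* the lock word */
    int yields;                                   /* spin iterations taken */
    void (*after_first_load)(_Atomic uint8_t *);  /* simulated racing locker */
    void (*on_yield)(_Atomic uint8_t *);          /* simulated racing unlocker */
} model_mutex;

void racer_locks(_Atomic uint8_t *bits)   { atomic_store(bits, MODEL_LOCKED); }
void racer_unlocks(_Atomic uint8_t *bits) { atomic_store(bits, 0); }

/* Same branch structure as the excerpt; the retry branch at the top of the
 * loop and the missing parking path are assumptions of this model. */
enum model_status lock_timed_buggy(model_mutex *m, int64_t timeout)
{
    uint8_t v = atomic_load_explicit(&m->bits, memory_order_relaxed);
    if (m->after_first_load) {
        m->after_first_load(&m->bits);            /* another thread wins the race */
    }
    if ((v & MODEL_LOCKED) == 0) {
        if (atomic_compare_exchange_strong(&m->bits, &v, (uint8_t)(v | MODEL_LOCKED))) {
            return MODEL_ACQUIRED;
        }
        /* CAS failed: v now reads as locked, but the else-if below is skipped. */
    }
    else if (timeout == 0) {
        return MODEL_FAILURE;
    }

    int spin_count = 0;
    for (;;) {
        if ((v & MODEL_LOCKED) == 0) {
            if (atomic_compare_exchange_strong(&m->bits, &v, (uint8_t)(v | MODEL_LOCKED))) {
                return MODEL_ACQUIRED;
            }
            continue;
        }
        if (!(v & MODEL_HAS_PARKED) && spin_count < MODEL_MAX_SPIN) {
            m->yields++;
            if (m->on_yield) {
                m->on_yield(&m->bits);            /* the holder releases the lock */
            }
            spin_count++;
            continue;                             /* v is stale from here on */
        }
        if (timeout == 0) {
            return MODEL_FAILURE;                 /* spurious: the lock is free */
        }
        return MODEL_FAILURE;                     /* real code would park; not modeled */
    }
}

/* One non-blocking attempt: the lock is taken right after the first load and
 * released again during the first spin iteration. */
enum model_status demo_buggy(int *yields, uint8_t *final_bits)
{
    assert(yields != NULL && final_bits != NULL);
    model_mutex m = { .yields = 0, .after_first_load = racer_locks,
                      .on_yield = racer_unlocks };
    atomic_init(&m.bits, 0);
    enum model_status s = lock_timed_buggy(&m, 0);
    *yields = m.yields;
    *final_bits = atomic_load(&m.bits);
    return s;
}
```

Under these assumptions, `demo_buggy` returns `MODEL_FAILURE` after `MODEL_MAX_SPIN` yields even though the lock word is unlocked when it returns.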

---

## Proposed fix

```c
/* 1 — reload v each spin */
if (!(v & _Py_HAS_PARKED) && spin_count < MAX_SPIN_COUNT) {
    _Py_yield();
    spin_count++;
    v = _Py_atomic_load_uint8_relaxed(&m->_bits);   // ← added
    continue;
}

/* 2 — early-out for timeout == 0 */
if ((v & _Py_LOCKED) == 0) {
    if (_Py_atomic_compare_exchange_uint8(&m->_bits, &v, v | _Py_LOCKED))
        return PY_LOCK_ACQUIRED;
}
if (timeout == 0) {                               // ← moved outside else
    return PY_LOCK_FAILURE;
}
```

### Result

* **Non-blocking calls** now return immediately: success if the fast-path CAS
  wins, failure if the lock is held or the CAS loses, with no spinning and no
  parking.
* **Timed/blocking calls** still spin before parking, but they now reload the
  lock word each iteration, so they acquire promptly once the lock is free.
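
The two changes can be checked against a small single-threaded C11 model. As a sketch only: `sketch_mutex`, the hooks that stand in for a racing thread, and `SKETCH_MAX_SPIN` are hypothetical, the retry branch at the top of the loop is assumed, and parking is not modeled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for _Py_LOCKED, _Py_HAS_PARKED and MAX_SPIN_COUNT. */
#define SKETCH_LOCKED     1u
#define SKETCH_HAS_PARKED 2u
#define SKETCH_MAX_SPIN   8

enum sketch_status { SKETCH_FAILURE = 0, SKETCH_ACQUIRED = 1 };
typedef void (*sketch_hook)(_Atomic uint8_t *);

typedef struct {
    _Atomic uint8_t bits;            /* the lock word */
    int yields;                      /* spin iterations taken */
    sketch_hook after_first_load;    /* simulated racing locker, may be NULL */
    sketch_hook on_yield;            /* simulated racing unlocker, may be NULL */
} sketch_mutex;

void hook_lock(_Atomic uint8_t *bits)   { atomic_store(bits, SKETCH_LOCKED); }
void hook_unlock(_Atomic uint8_t *bits) { atomic_store(bits, 0); }

enum sketch_status lock_timed_fixed(sketch_mutex *m, int64_t timeout)
{
    uint8_t v = atomic_load_explicit(&m->bits, memory_order_relaxed);
    if (m->after_first_load) {
        m->after_first_load(&m->bits);
    }
    if ((v & SKETCH_LOCKED) == 0) {
        if (atomic_compare_exchange_strong(&m->bits, &v, (uint8_t)(v | SKETCH_LOCKED))) {
            return SKETCH_ACQUIRED;
        }
    }
    if (timeout == 0) {              /* change 2: no longer an else-if */
        return SKETCH_FAILURE;
    }

    int spin_count = 0;
    for (;;) {
        if ((v & SKETCH_LOCKED) == 0) {
            if (atomic_compare_exchange_strong(&m->bits, &v, (uint8_t)(v | SKETCH_LOCKED))) {
                return SKETCH_ACQUIRED;
            }
            continue;
        }
        if (!(v & SKETCH_HAS_PARKED) && spin_count < SKETCH_MAX_SPIN) {
            m->yields++;
            if (m->on_yield) {
                m->on_yield(&m->bits);
            }
            spin_count++;
            /* change 1: refresh v so an unlock is seen on the next pass */
            v = atomic_load_explicit(&m->bits, memory_order_relaxed);
            continue;
        }
        return SKETCH_FAILURE;       /* real code would park; not modeled */
    }
}

/* Runs one attempt and reports the spin count and the final lock word. */
enum sketch_status run_fixed(uint8_t initial, int64_t timeout,
                             sketch_hook after_first_load, sketch_hook on_yield,
                             int *yields, uint8_t *final_bits)
{
    assert(yields != NULL && final_bits != NULL);
    sketch_mutex m = { .yields = 0, .after_first_load = after_first_load,
                       .on_yield = on_yield };
    atomic_init(&m.bits, initial);
    enum sketch_status s = lock_timed_fixed(&m, timeout);
    *yields = m.yields;
    *final_bits = atomic_load(&m.bits);
    return s;
}
```

In this model, `run_fixed(0, 0, hook_lock, NULL, ...)` (a non-blocking call that loses the CAS race) fails without a single yield, while `run_fixed(SKETCH_LOCKED, 1000, NULL, hook_unlock, ...)` (a timed call whose holder unlocks during the first spin) acquires the lock after one yield.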



### CPython versions tested on:

CPython main branch

### Operating systems tested on:

macOS


### Linked PRs
* gh-135872
* gh-135946
* gh-135947
* gh-146064

Non-blocking `_PyMutex_LockTimed` spins and may fail unnecessarily in no-GIL builds #135871
