_____________________________________ test _____________________________________
@pytest.mark.timeout(10)
def test():
ppe = concurrent.futures.ProcessPoolExecutor(1)
ppe.submit(int).result()
ppe.submit(int).cancel()
> ppe.shutdown(wait=True)
test_reproduce_python_bug.py:14:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/concurrent/futures/process.py:686: in shutdown
self._queue_management_thread.join()
/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py:1011: in join
self._wait_for_tstate_lock()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <Thread(QueueManagerThread, started daemon 140003176535808)>
block = True, timeout = -1
def _wait_for_tstate_lock(self, block=True, timeout=-1):
# Issue #18808: wait for the thread state to be gone.
# At the end of the thread's life, after all knowledge of the thread
# is removed from C data structures, C code releases our _tstate_lock.
# This method passes its arguments to _tstate_lock.acquire().
# If the lock is acquired, the C code is done, and self._stop() is
# called. That sets ._is_stopped to True, and ._tstate_lock to None.
lock = self._tstate_lock
if lock is None: # already determined that the C code is done
assert self._is_stopped
> elif lock.acquire(block, timeout):
E Failed: Timeout >10.0s
/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py:1027: Failed
----------------------------- Captured stderr call -----------------------------
+++++++++++++++++++++++++++++++++++ Timeout ++++++++++++++++++++++++++++++++++++
~~~~~~~~~~~~~~~~~ Stack of QueueFeederThread (140003159754496) ~~~~~~~~~~~~~~~~~
File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/multiprocessing/queues.py", line 227, in _feed
nwait()
File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py", line 302, in wait
waiter.acquire()
~~~~~~~~~~~~~~~~ Stack of QueueManagerThread (140003176535808) ~~~~~~~~~~~~~~~~~
File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/concurrent/futures/process.py", line 362, in _queue_management_worker
ready = mp.connection.wait(readers + worker_sentinels)
File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/multiprocessing/connection.py", line 931, in wait
ready = selector.select(timeout)
File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/selectors.py", line 415, in select
fd_event_list = self._selector.poll(timeout)
+++++++++++++++++++++++++++++++++++ Timeout ++++++++++++++++++++++++++++++++++++
Bug report
With a ProcessPoolExecutor, after submitting and quickly canceling a future, a call to
shutdown(wait=True) would hang indefinitely. This happens on pretty much all platforms and all recent Python versions.
Here is a minimal reproduction:
The first submission gets the executor going and creates its internal
queue_management_thread. The second submission appears to get that thread to loop, enter a wait state, and never receive a wakeup event.
Introducing a tiny sleep between the second submit and its cancel request makes the issue disappear. From my initial observation it looks like something in the way the
queue_management_worker internal loop is structured doesn't handle this edge case well. Shutting down with
wait=False would return immediately as expected, but the queue_management_thread would then die with an unhandled OSError: handle is closed exception.

Environment
Additional info
When tested with
pytest-timeout under Ubuntu and CPython 3.8.13, these are the tracebacks at the moment of timing out:

Details
_____________________________________ test _____________________________________ @pytest.mark.timeout(10) def test(): ppe = concurrent.futures.ProcessPoolExecutor(1) ppe.submit(int).result() ppe.submit(int).cancel() > ppe.shutdown(wait=True) test_reproduce_python_bug.py:14: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ /opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/concurrent/futures/process.py:686: in shutdown self._queue_management_thread.join() /opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py:1011: in join self._wait_for_tstate_lock() _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ self = <Thread(QueueManagerThread, started daemon 140003176535808)> block = True, timeout = -1 def _wait_for_tstate_lock(self, block=True, timeout=-1): # Issue #18808: wait for the thread state to be gone. # At the end of the thread's life, after all knowledge of the thread # is removed from C data structures, C code releases our _tstate_lock. # This method passes its arguments to _tstate_lock.acquire(). # If the lock is acquired, the C code is done, and self._stop() is # called. That sets ._is_stopped to True, and ._tstate_lock to None. 
lock = self._tstate_lock if lock is None: # already determined that the C code is done assert self._is_stopped > elif lock.acquire(block, timeout): E Failed: Timeout >10.0s /opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py:1027: Failed ----------------------------- Captured stderr call ----------------------------- +++++++++++++++++++++++++++++++++++ Timeout ++++++++++++++++++++++++++++++++++++ ~~~~~~~~~~~~~~~~~ Stack of QueueFeederThread (140003159754496) ~~~~~~~~~~~~~~~~~ File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py", line 890, in _bootstrap self._bootstrap_inner() File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py", line 932, in _bootstrap_inner self.run() File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py", line 870, in run self._target(*self._args, **self._kwargs) File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/multiprocessing/queues.py", line 227, in _feed nwait() File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py", line 302, in wait waiter.acquire() ~~~~~~~~~~~~~~~~ Stack of QueueManagerThread (140003176535808) ~~~~~~~~~~~~~~~~~ File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py", line 890, in _bootstrap self._bootstrap_inner() File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py", line 932, in _bootstrap_inner self.run() File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py", line 870, in run self._target(*self._args, **self._kwargs) File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/concurrent/futures/process.py", line 362, in _queue_management_worker ready = mp.connection.wait(readers + worker_sentinels) File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/multiprocessing/connection.py", line 931, in wait ready = selector.select(timeout) File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/selectors.py", line 415, in select fd_event_list = 
self._selector.poll(timeout) +++++++++++++++++++++++++++++++++++ Timeout ++++++++++++++++++++++++++++++++++++

Tracebacks in PyPy are similar on the
concurrent.futures.process level. Tracebacks in Windows are different in the lower-level areas, but again similar on the concurrent.futures.process level.

Linked PRs:
Linked PRs