-
Notifications
You must be signed in to change notification settings - Fork 115
fix(screentracker): prevent Send() from blocking when ReadyForInitialPrompt stays false #215
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: fix/write-stabilize-non-fatal-phase1
Are you sure you want to change the base?
Changes from all commits
99b9c11
761b7d5
e80b116
934e832
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,215 @@ | ||
| # Research: Send() blocks indefinitely when ReadyForInitialPrompt returns false (#209) | ||
|
|
||
| ## Problem Context | ||
|
|
||
| When `POST /message` with `type: "user"` is called, `Send()` hangs indefinitely if `ReadyForInitialPrompt` never returns `true` for the current screen content, even though `GET /status` returns `"stable"`. | ||
|
|
||
| **Root cause chain:** | ||
|
|
||
| 1. `statusLocked()` does NOT check `initialPromptReady`. It returns `"stable"` when the screen is stable and the queue is empty. | ||
| 2. `Send()` checks `statusLocked() != ConversationStatusStable` — this passes, so the message is enqueued. | ||
| 3. The send loop blocks on `<-c.stableSignal`. | ||
| 4. The snapshot loop only fires `stableSignal` when `c.initialPromptReady && len(c.outboundQueue) > 0 && c.isScreenStableLocked()`. | ||
| 5. Since `initialPromptReady` is `false` and never becomes `true`, the signal never fires. | ||
| 6. `Send()` blocks forever on `<-errCh`. | ||
|
|
||
| **Real-world trigger:** Claude Code v2.1.87 shows a theme selection onboarding screen using `╌╌╌` (U+254C, BOX DRAWINGS LIGHT DOUBLE DASH HORIZONTAL) instead of `───` (U+2500, BOX DRAWINGS LIGHT HORIZONTAL). The message box detection fails because `findGreaterThanMessageBox` / `findGenericSlimMessageBox` look for `───────────────` which isn't present. `ReadyForInitialPrompt` stays `false`. | ||
|
|
||
| ## Code Analysis | ||
|
|
||
| ### `statusLocked()` — `lib/screentracker/pty_conversation.go:505-537` | ||
|
|
||
| ```go | ||
| func (c *PTYConversation) statusLocked() ConversationStatus { | ||
| // ...sanity checks... | ||
| snapshots := c.snapshotBuffer.GetAll() | ||
| if len(c.messages) > 0 && c.messages[len(c.messages)-1].Role == ConversationRoleUser { | ||
| return ConversationStatusChanging | ||
| } | ||
| if len(snapshots) != c.stableSnapshotsThreshold { | ||
| return ConversationStatusInitializing | ||
| } | ||
| if !c.isScreenStableLocked() { | ||
| return ConversationStatusChanging | ||
| } | ||
| // Handle initial prompt readiness: report "changing" until the queue is drained | ||
| if len(c.outboundQueue) > 0 || c.sendingMessage { | ||
| return ConversationStatusChanging | ||
| } | ||
| return ConversationStatusStable | ||
| } | ||
| ``` | ||
|
|
||
| **Key observation:** `initialPromptReady` is never consulted. The status can be `"stable"` even when `initialPromptReady` is `false`. | ||
|
|
||
| ### `Send()` — `lib/screentracker/pty_conversation.go:358-378` | ||
|
|
||
| ```go | ||
| func (c *PTYConversation) Send(messageParts ...MessagePart) error { | ||
| // ...validation... | ||
| c.lock.Lock() | ||
| if c.statusLocked() != ConversationStatusStable { | ||
| c.lock.Unlock() | ||
| return ErrMessageValidationChanging | ||
| } | ||
| c.lock.Unlock() | ||
| errCh := make(chan error, 1) | ||
| c.outboundQueue <- outboundMessage{parts: messageParts, errCh: errCh} | ||
| return <-errCh // blocks forever if stableSignal never fires | ||
| } | ||
| ``` | ||
|
|
||
| ### Snapshot loop signal logic — `lib/screentracker/pty_conversation.go:229-236` | ||
|
|
||
| ```go | ||
| if c.initialPromptReady && len(c.outboundQueue) > 0 && c.isScreenStableLocked() { | ||
| select { | ||
| case c.stableSignal <- struct{}{}: | ||
| c.sendingMessage = true | ||
| default: | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
| ### `findGreaterThanMessageBox` — `lib/msgfmt/message_box.go:11-22` | ||
|
|
||
| ```go | ||
| func findGreaterThanMessageBox(lines []string) int { | ||
| for i := len(lines) - 1; i >= max(len(lines)-6, 0); i-- { | ||
| if strings.Contains(lines[i], ">") { | ||
| if i > 0 && strings.Contains(lines[i-1], "───────────────") { | ||
| return i - 1 | ||
| } | ||
| return i | ||
| } | ||
| } | ||
| return -1 | ||
| } | ||
| ``` | ||
|
|
||
| Only checks for `───────────────` (U+2500). Does not handle `╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌` (U+254C). | ||
|
|
||
| ### `findGenericSlimMessageBox` — `lib/msgfmt/message_box.go:28-38` | ||
|
|
||
| ```go | ||
| func findGenericSlimMessageBox(lines []string) int { | ||
| for i := len(lines) - 3; i >= max(len(lines)-9, 0); i-- { | ||
| if strings.Contains(lines[i], "───────────────") && | ||
| (strings.Contains(lines[i+1], "|") || strings.Contains(lines[i+1], "│") || strings.Contains(lines[i+1], "❯")) && | ||
| strings.Contains(lines[i+2], "───────────────") { | ||
| return i | ||
| } | ||
| } | ||
| return -1 | ||
| } | ||
| ``` | ||
|
|
||
| Same limitation — only checks for U+2500. | ||
|
|
||
| ### `isGenericAgentReadyForInitialPrompt` — `lib/msgfmt/agent_readiness.go:34-38` | ||
|
|
||
| ```go | ||
| func isGenericAgentReadyForInitialPrompt(message string) bool { | ||
| message = trimEmptyLines(message) | ||
| messageWithoutInputBox := removeMessageBox(message) | ||
| return len(messageWithoutInputBox) != len(message) | ||
| } | ||
| ``` | ||
|
|
||
| Returns `true` only if `removeMessageBox` actually removes something. If neither message box detector matches, the message is unchanged and readiness returns `false`. | ||
|
|
||
| ### Existing test: "agent not ready - status is stable" — `pty_conversation_test.go:~1082` | ||
|
|
||
| The existing test `TestInitialPromptReadiness/"agent not ready - status is stable until agent becomes ready"` already asserts that status is `stable` when the agent is not ready. This is the **current expected behavior** when there IS an initial prompt configured but the agent hasn't become ready yet. | ||
|
|
||
| However, this test ALSO has `InitialPrompt` configured. The issue scenario is different: `Send()` is called by the user (not as initial prompt) while `initialPromptReady` is `false`. | ||
|
|
||
| ### Existing test: "no initial prompt - normal status logic applies" — `pty_conversation_test.go:~1160` | ||
|
|
||
| When `ReadyForInitialPrompt` always returns `false` AND there is no `InitialPrompt` configured, status is currently `stable`. From a pure screen-stability perspective this is correct — the screen IS stable. | ||
|
|
||
| However, `Send()` will still block in this state because `stableSignal` requires `initialPromptReady` to fire. This means status says `stable` but the system cannot actually process user messages — an inconsistency that is the root cause of the bug. | ||
|
|
||
| ### Existing test: "no initial prompt configured - normal status logic applies" — `pty_conversation_test.go:~1207` | ||
|
|
||
| When `ReadyForInitialPrompt` is NOT set (nil → defaults to `return true`) and no `InitialPrompt` is configured, status correctly reaches `stable`. This test is UNAFFECTED by the fix because `initialPromptReady` becomes `true` on the first snapshot tick via the default function. | ||
|
|
||
| ## Approaches | ||
|
|
||
| ### Approach A: Guard `statusLocked()` with `initialPromptReady` check | ||
|
|
||
| **Description:** When `initialPromptReady` is `false`, return `ConversationStatusChanging` (or a new status) from `statusLocked()`. This prevents `Send()` from enqueueing and returns `ErrMessageValidationChanging` immediately. | ||
|
|
||
| **Precedent:** `statusLocked()` already returns `ConversationStatusChanging` when there are items in the outbound queue or a message is being sent. This follows the same pattern. | ||
|
|
||
| **Strongest argument for:** Fail-fast. Any future detection failures fail immediately with a clear error instead of hanging. This is a general safety net. | ||
|
|
||
| **Strongest argument against:** Changes the public status semantics. Currently, `statusLocked()` reports on screen state. Adding `initialPromptReady` couples it to agent detection. Also, callers currently expect `"stable"` to mean "screen is stable" — now it would also mean "agent detection succeeded". This could break the existing test `TestInitialPromptReadiness/"agent not ready - status is stable until agent becomes ready"` which explicitly asserts status is `stable` when readiness is `false`. | ||
|
|
||
| **Consideration:** The `stableSignal` only gates the signal when `initialPromptReady` is false. But this is orthogonal to the **user-initiated** `Send()` path. The initial prompt path and the user-message path both go through the same queue and same signal. The real issue is that `initialPromptReady` gates the signal for ALL queued messages, not just the initial prompt. | ||
|
|
||
| **Nuance:** We need to be careful about when `ReadyForInitialPrompt` is `nil` (defaults to `func(string) bool { return true }`). When there's no readiness function, `initialPromptReady` becomes `true` on the first snapshot tick. This won't cause regressions. | ||
|
|
||
| ### Approach B: Decouple `stableSignal` from `initialPromptReady` for user-sent messages | ||
|
|
||
| **Description:** Only gate the `stableSignal` on `initialPromptReady` for the initial prompt. For user-sent messages, fire the signal based purely on screen stability. This could be done by tracking whether the queued message is the initial prompt or a user message. | ||
|
|
||
| **Strongest argument for:** Precisely targets the bug without changing status semantics. The initial prompt legitimately needs readiness gating; user messages do not. | ||
|
|
||
| **Strongest argument against:** Adds complexity to the queue/signal mechanism. The `outboundQueue` currently treats all messages uniformly. Adding message-type awareness complicates the design. Also, if the agent truly isn't ready, sending a user message to it may not work correctly anyway. | ||
|
|
||
| **What this makes easy:** Preserves existing status semantics and test assertions. | ||
| **What this makes hard:** Complicating the send path and potentially allowing messages to be sent to an unready agent. | ||
|
|
||
| ### Approach C: Improve message box detection to handle `╌` (U+254C) | ||
|
|
||
| **Description:** Add `╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌` as an alternative pattern in `findGreaterThanMessageBox` and `findGenericSlimMessageBox`. | ||
|
|
||
| **Strongest argument for:** Fixes the specific real-world trigger. The onboarding screen for Claude Code v2.1.87 uses this character. | ||
|
|
||
| **Strongest argument against:** Only fixes this one variant. Future Claude Code versions might use yet another character. Does not prevent the indefinite hang for other detection failures. | ||
|
|
||
| **What this makes easy:** Simple, targeted fix. | ||
| **What this makes hard:** Doesn't address the systemic issue. | ||
|
|
||
| ### Approach D: Both A and C (recommended by the issue) | ||
|
|
||
| **Description:** Fix the detection for this specific Claude Code version (C) AND add the `statusLocked()` guard (A) so future detection failures fail fast. | ||
|
|
||
| **Strongest argument for:** Belt and suspenders. Fixes the immediate problem and prevents the class of bugs. | ||
|
|
||
| **Strongest argument against:** Status semantics change (same as A). However, the issue author explicitly recommends this. | ||
|
|
||
| ## Decisions | ||
|
|
||
| ### Decision 1: Guard `statusLocked()` with `initialPromptReady` AND fix detection | ||
| - **Question:** How to prevent `Send()` from hanging when readiness detection fails? | ||
| - **Options:** (A) guard statusLocked only, (B) decouple stableSignal, (C) fix detection only, (D) both A+C | ||
| - **Chosen:** D — both guard and detection | ||
| - **Classification:** Agent-recommended (issue author also recommends option D) | ||
| - **Reasoning:** The `stableSignal` gates ALL outbound messages on `initialPromptReady`. `statusLocked()` must reflect this. The detection fix handles the immediate trigger; the guard prevents the class of bugs. | ||
|
|
||
| ### Decision 2: Return `ConversationStatusChanging` when `initialPromptReady` is false | ||
| - **Question:** What status to return when readiness is false? | ||
| - **Options:** `changing` vs `initializing` | ||
| - **Chosen:** `changing` | ||
| - **Classification:** Agent-recommended | ||
| - **Reasoning:** The snapshot buffer IS full (past the `initializing` phase). `changing` matches the error `ErrMessageValidationChanging`. | ||
|
|
||
| ### Decision 3: Apply the guard unconditionally (not only when `InitialPrompt` is configured) | ||
| - **Question:** Should the `initialPromptReady` guard only apply when `InitialPrompt` is configured? (Open Question 2) | ||
| - **Options:** Conditional (only when InitialPrompt set) vs unconditional | ||
| - **Chosen:** Unconditional | ||
| - **Classification:** Agent-recommended | ||
| - **Reasoning:** The `stableSignal` gates on `initialPromptReady` for ALL queued messages, not just the initial prompt. If `initialPromptReady` is false and a user calls `Send()`, the message hangs regardless of whether `InitialPrompt` is configured. Status must reflect actual send capability. When `ReadyForInitialPrompt` is nil (default), it auto-returns `true` and `initialPromptReady` becomes `true` on the first snapshot tick — before status could transition to `stable`. So the unconditional guard causes no regressions for the default case. | ||
|
|
||
| ### Decision 4: Add U+254C only to detection | ||
| - **Question:** Which additional Unicode box-drawing characters to support? | ||
| - **Options:** Just U+254C, also U+254D, broad set | ||
| - **Chosen:** U+254C only | ||
| - **Classification:** Agent-recommended | ||
| - **Reasoning:** This is the specific character seen in the wild. The `statusLocked()` guard provides the safety net for future unknown characters. | ||
|
|
||
| ## Open Questions | ||
|
|
||
| All open questions have been resolved — see Decisions section above. | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,54 @@ | ||
| # Plan: Fix Send() blocking indefinitely when ReadyForInitialPrompt returns false (#209) | ||
|
|
||
| Research: [issue-209-research.md](./issue-209-research.md) | ||
|
|
||
| Stacked on: [PR #208](https://github.com/coder/agentapi/pull/208) (`fix/write-stabilize-non-fatal-phase1`) | ||
|
|
||
| ## Problem | ||
|
|
||
| `Send()` hangs indefinitely when `ReadyForInitialPrompt` never returns `true`, even though `Status()` reports `"stable"`. This is because `statusLocked()` doesn't check `initialPromptReady`, allowing `Send()` to enqueue messages, but the `stableSignal` (which the send loop waits on) requires `initialPromptReady == true`. If readiness never arrives, the signal never fires and `Send()` blocks forever. (See research: Problem Context) | ||
|
|
||
| The real-world trigger is Claude Code v2.1.87's onboarding screen using `╌` (U+254C) instead of `─` (U+2500) in its box-drawing characters, causing message box detection to fail. (See research: Code Analysis — findGreaterThanMessageBox) | ||
|
|
||
| ## Decisions | ||
|
|
||
| 1. **Guard `statusLocked()` with `initialPromptReady` check AND fix detection.** | ||
| - Options: (A) guard statusLocked only, (B) decouple stableSignal, (C) fix detection only, (D) both A+C. | ||
| - Chosen: D — both guard and detection. | ||
| - Classification: Agent-recommended (issue author also recommends option D). | ||
| - Reasoning: The `stableSignal` gates ALL outbound messages on `initialPromptReady`. `statusLocked()` must reflect this — otherwise status says "stable" but the system cannot process messages. The detection fix handles the immediate trigger; the guard prevents the class of bugs. | ||
|
|
||
| 2. **Return `ConversationStatusChanging` (not `Initializing`) when `initialPromptReady` is false.** | ||
| - Options: `changing` vs `initializing`. | ||
| - Chosen: `changing`. | ||
| - Classification: Agent-recommended. | ||
| - Reasoning: The snapshot buffer IS full (past the `initializing` phase). The error `ErrMessageValidationChanging` says "message can only be sent when the agent is waiting for user input" — which is semantically correct when readiness hasn't been detected. | ||
|
|
||
| 3. **Apply the guard unconditionally (not only when `InitialPrompt` is configured).** | ||
| - Options: Conditional (only when InitialPrompt set) vs unconditional. | ||
| - Chosen: Unconditional. | ||
| - Classification: Agent-recommended. | ||
| - Reasoning: The `stableSignal` gates on `initialPromptReady` for ALL queued messages, not just the initial prompt. If `initialPromptReady` is false and a user calls `Send()`, the message hangs regardless of whether `InitialPrompt` is configured. Status must reflect actual send capability. When `ReadyForInitialPrompt` is nil (the default), it auto-returns `true` and `initialPromptReady` becomes `true` on the first snapshot tick — before status could ever transition to `stable`. So the unconditional guard causes no regressions for the default case. (See research: Approach A — Nuance) | ||
|
|
||
| 4. **Add `╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌` (U+254C repeated) as an alternative pattern in message box detection.** | ||
| - Options: just U+254C, also U+254D, broad set of horizontal box-drawing characters. | ||
| - Chosen: U+254C only. | ||
| - Classification: Agent-recommended. | ||
| - Reasoning: This is the specific character seen in the wild. The `statusLocked()` guard provides the safety net for future unknown characters. Adding a broad set risks false positives. | ||
|
|
||
| 5. **Update existing tests that assert `stable` when `initialPromptReady` is false.** | ||
| - The tests `"agent not ready - status is stable until agent becomes ready"` and `"no initial prompt - normal status logic applies"` currently assert `stable` when readiness is false. The research notes the second test's assertion was correct from a pure screen-stability perspective, but is inconsistent with `Send()` behavior: `stableSignal` gates on `initialPromptReady` for ALL messages, so `Send()` would hang despite `stable` status. The status must reflect actual send capability. | ||
| - The test `"no initial prompt configured - normal status logic applies"` is **unaffected** because it doesn't set `ReadyForInitialPrompt`, so the default (`return true`) applies and `initialPromptReady` becomes `true` before stability is reached. | ||
| - Classification: Agent-recommended. | ||
|
|
||
| ## Implementation Flow | ||
|
|
||
| 1. **Message box detection** — Extend both message box detection functions to also match `╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌` patterns. Add a testdata fixture for the Claude onboarding screen with `╌` characters. Run readiness tests to verify. | ||
|
|
||
| 2. **Status guard** — Add a check in the status logic: when `initialPromptReady` is false and the screen is otherwise stable, return `changing` instead of `stable`. This prevents `Send()` from enqueueing messages that can never be processed. | ||
|
|
||
| 3. **Update existing tests** — Fix the two tests that assert `stable` when readiness is false. Update them to expect `changing`. Confirm the third "no initial prompt configured" test is unaffected. | ||
|
|
||
| 4. **Add reproducing test** — Add a test that demonstrates `Send()` returns an error instead of hanging when `initialPromptReady` is false. | ||
|
|
||
| 5. **Run full test suite** — Verify no regressions. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -5,14 +5,14 @@ import ( | |
| ) | ||
|
|
||
| // Usually something like | ||
| // ─────────────── | ||
| // ─────────────── (or ╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌) | ||
| // > | ||
| // ─────────────── | ||
| // ─────────────── (or ╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌) | ||
| // Used by Claude Code, Goose, and Aider. | ||
| func findGreaterThanMessageBox(lines []string) int { | ||
| for i := len(lines) - 1; i >= max(len(lines)-6, 0); i-- { | ||
| if strings.Contains(lines[i], ">") { | ||
| if i > 0 && strings.Contains(lines[i-1], "───────────────") { | ||
|
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Obs
Agreed this is fine — more permissive detection reduces false negatives, and the narrow scan window (last 9 lines) plus the prompt-character requirement on the middle line make false positives unlikely. Worth knowing if a false-positive concern arises later. |
||
| if i > 0 && (strings.Contains(lines[i-1], "───────────────") || strings.Contains(lines[i-1], "╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌")) { | ||
| return i - 1 | ||
| } | ||
| return i | ||
|
|
@@ -22,14 +22,14 @@ func findGreaterThanMessageBox(lines []string) int { | |
| } | ||
|
|
||
| // Usually something like | ||
| // ─────────────── | ||
| // ─────────────── (or ╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌) | ||
| // | | ||
| // ─────────────── | ||
| // ─────────────── (or ╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌) | ||
| func findGenericSlimMessageBox(lines []string) int { | ||
| for i := len(lines) - 3; i >= max(len(lines)-9, 0); i-- { | ||
| if strings.Contains(lines[i], "───────────────") && | ||
| if (strings.Contains(lines[i], "───────────────") || strings.Contains(lines[i], "╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌")) && | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. P3 The Separately:
|
||
| (strings.Contains(lines[i+1], "|") || strings.Contains(lines[i+1], "│") || strings.Contains(lines[i+1], "❯")) && | ||
| strings.Contains(lines[i+2], "───────────────") { | ||
| (strings.Contains(lines[i+2], "───────────────") || strings.Contains(lines[i+2], "╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌")) { | ||
| return i | ||
| } | ||
| } | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
P3 Line numbers for
pty_conversation.gofunctions are wrong at the base SHA.statusLocked()is cited at 505 (actual: 549),Send()at 358 (actual: 379), snapshot loop signal at 229 (actual: 250). The offsets are non-uniform, ruling out a version mismatch. The code content in each snippet is accurate, so readers will find the right code by function name, but the specific line citations are fabricated precision. The smaller files (message_box.go,agent_readiness.go) have correct numbers, suggesting confabulation for the large file. (Mafu-san, downgraded from P2: consequence is wasted reader time, not wrong conclusions.)