Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
215 changes: 215 additions & 0 deletions docs/plans/issue-209-research.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,215 @@
# Research: Send() blocks indefinitely when ReadyForInitialPrompt returns false (#209)

## Problem Context

When `POST /message` with `type: "user"` is called, `Send()` hangs indefinitely if `ReadyForInitialPrompt` never returns `true` for the current screen content, even though `GET /status` returns `"stable"`.

**Root cause chain:**

1. `statusLocked()` does NOT check `initialPromptReady`. It returns `"stable"` when the screen is stable and the queue is empty.
2. `Send()` checks `statusLocked() != ConversationStatusStable` — this passes, so the message is enqueued.
3. The send loop blocks on `<-c.stableSignal`.
4. The snapshot loop only fires `stableSignal` when `c.initialPromptReady && len(c.outboundQueue) > 0 && c.isScreenStableLocked()`.
5. Since `initialPromptReady` is `false` and never becomes `true`, the signal never fires.
6. `Send()` blocks forever on `<-errCh`.

**Real-world trigger:** Claude Code v2.1.87 shows a theme selection onboarding screen using `╌╌╌` (U+254C, BOX DRAWINGS LIGHT DOUBLE DASH HORIZONTAL) instead of `───` (U+2500, BOX DRAWINGS LIGHT HORIZONTAL). The message box detection fails because `findGreaterThanMessageBox` / `findGenericSlimMessageBox` look for `───────────────` which isn't present. `ReadyForInitialPrompt` stays `false`.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3 Line numbers for pty_conversation.go functions are wrong at the base SHA. statusLocked() is cited at 505 (actual: 549), Send() at 358 (actual: 379), snapshot loop signal at 229 (actual: 250). The offsets are non-uniform, ruling out a version mismatch. The code content in each snippet is accurate, so readers will find the right code by function name, but the specific line citations are fabricated precision. The smaller files (message_box.go, agent_readiness.go) have correct numbers, suggesting confabulation for the large file. (Mafu-san, downgraded from P2: consequence is wasted reader time, not wrong conclusions.)

🤖

## Code Analysis

### `statusLocked()` — `lib/screentracker/pty_conversation.go:505-537`

```go
func (c *PTYConversation) statusLocked() ConversationStatus {
// ...sanity checks...
snapshots := c.snapshotBuffer.GetAll()
if len(c.messages) > 0 && c.messages[len(c.messages)-1].Role == ConversationRoleUser {
return ConversationStatusChanging
}
if len(snapshots) != c.stableSnapshotsThreshold {
return ConversationStatusInitializing
}
if !c.isScreenStableLocked() {
return ConversationStatusChanging
}
// Handle initial prompt readiness: report "changing" until the queue is drained
if len(c.outboundQueue) > 0 || c.sendingMessage {
return ConversationStatusChanging
}
return ConversationStatusStable
}
```

**Key observation:** `initialPromptReady` is never consulted. The status can be `"stable"` even when `initialPromptReady` is `false`.

### `Send()` — `lib/screentracker/pty_conversation.go:358-378`

```go
func (c *PTYConversation) Send(messageParts ...MessagePart) error {
// ...validation...
c.lock.Lock()
if c.statusLocked() != ConversationStatusStable {
c.lock.Unlock()
return ErrMessageValidationChanging
}
c.lock.Unlock()
errCh := make(chan error, 1)
c.outboundQueue <- outboundMessage{parts: messageParts, errCh: errCh}
return <-errCh // blocks forever if stableSignal never fires
}
```

### Snapshot loop signal logic — `lib/screentracker/pty_conversation.go:229-236`

```go
if c.initialPromptReady && len(c.outboundQueue) > 0 && c.isScreenStableLocked() {
select {
case c.stableSignal <- struct{}{}:
c.sendingMessage = true
default:
}
}
```

### `findGreaterThanMessageBox` — `lib/msgfmt/message_box.go:11-22`

```go
func findGreaterThanMessageBox(lines []string) int {
for i := len(lines) - 1; i >= max(len(lines)-6, 0); i-- {
if strings.Contains(lines[i], ">") {
if i > 0 && strings.Contains(lines[i-1], "───────────────") {
return i - 1
}
return i
}
}
return -1
}
```

Only checks for `───────────────` (U+2500). Does not handle `╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌` (U+254C).

### `findGenericSlimMessageBox` — `lib/msgfmt/message_box.go:28-38`

```go
func findGenericSlimMessageBox(lines []string) int {
for i := len(lines) - 3; i >= max(len(lines)-9, 0); i-- {
if strings.Contains(lines[i], "───────────────") &&
(strings.Contains(lines[i+1], "|") || strings.Contains(lines[i+1], "│") || strings.Contains(lines[i+1], "❯")) &&
strings.Contains(lines[i+2], "───────────────") {
return i
}
}
return -1
}
```

Same limitation — only checks for U+2500.

### `isGenericAgentReadyForInitialPrompt` — `lib/msgfmt/agent_readiness.go:34-38`

```go
func isGenericAgentReadyForInitialPrompt(message string) bool {
message = trimEmptyLines(message)
messageWithoutInputBox := removeMessageBox(message)
return len(messageWithoutInputBox) != len(message)
}
```

Returns `true` only if `removeMessageBox` actually removes something. If neither message box detector matches, the message is unchanged and readiness returns `false`.

### Existing test: "agent not ready - status is stable" — `pty_conversation_test.go:~1082`

The existing test `TestInitialPromptReadiness/"agent not ready - status is stable until agent becomes ready"` already asserts that status is `stable` when the agent is not ready. This is the **current expected behavior** when there IS an initial prompt configured but the agent hasn't become ready yet.

However, this test ALSO has `InitialPrompt` configured. The issue scenario is different: `Send()` is called by the user (not as initial prompt) while `initialPromptReady` is `false`.

### Existing test: "no initial prompt - normal status logic applies" — `pty_conversation_test.go:~1160`

When `ReadyForInitialPrompt` always returns `false` AND there is no `InitialPrompt` configured, status is currently `stable`. From a pure screen-stability perspective this is correct — the screen IS stable.

However, `Send()` will still block in this state because `stableSignal` requires `initialPromptReady` to fire. This means status says `stable` but the system cannot actually process user messages — an inconsistency that is the root cause of the bug.

### Existing test: "no initial prompt configured - normal status logic applies" — `pty_conversation_test.go:~1207`

When `ReadyForInitialPrompt` is NOT set (nil → defaults to `return true`) and no `InitialPrompt` is configured, status correctly reaches `stable`. This test is UNAFFECTED by the fix because `initialPromptReady` becomes `true` on the first snapshot tick via the default function.

## Approaches

### Approach A: Guard `statusLocked()` with `initialPromptReady` check

**Description:** When `initialPromptReady` is `false`, return `ConversationStatusChanging` (or a new status) from `statusLocked()`. This prevents `Send()` from enqueueing and returns `ErrMessageValidationChanging` immediately.

**Precedent:** `statusLocked()` already returns `ConversationStatusChanging` when there are items in the outbound queue or a message is being sent. This follows the same pattern.

**Strongest argument for:** Fail-fast. Any future detection failures fail immediately with a clear error instead of hanging. This is a general safety net.

**Strongest argument against:** Changes the public status semantics. Currently, `statusLocked()` reports on screen state. Adding `initialPromptReady` couples it to agent detection. Also, callers currently expect `"stable"` to mean "screen is stable" — now it would also mean "agent detection succeeded". This could break the existing test `TestInitialPromptReadiness/"agent not ready - status is stable until agent becomes ready"` which explicitly asserts status is `stable` when readiness is `false`.

**Consideration:** The `stableSignal` only gates the signal when `initialPromptReady` is false. But this is orthogonal to the **user-initiated** `Send()` path. The initial prompt path and the user-message path both go through the same queue and same signal. The real issue is that `initialPromptReady` gates the signal for ALL queued messages, not just the initial prompt.

**Nuance:** We need to be careful about when `ReadyForInitialPrompt` is `nil` (defaults to `func(string) bool { return true }`). When there's no readiness function, `initialPromptReady` becomes `true` on the first snapshot tick. This won't cause regressions.

### Approach B: Decouple `stableSignal` from `initialPromptReady` for user-sent messages

**Description:** Only gate the `stableSignal` on `initialPromptReady` for the initial prompt. For user-sent messages, fire the signal based purely on screen stability. This could be done by tracking whether the queued message is the initial prompt or a user message.

**Strongest argument for:** Precisely targets the bug without changing status semantics. The initial prompt legitimately needs readiness gating; user messages do not.

**Strongest argument against:** Adds complexity to the queue/signal mechanism. The `outboundQueue` currently treats all messages uniformly. Adding message-type awareness complicates the design. Also, if the agent truly isn't ready, sending a user message to it may not work correctly anyway.

**What this makes easy:** Preserves existing status semantics and test assertions.
**What this makes hard:** Complicating the send path and potentially allowing messages to be sent to an unready agent.

### Approach C: Improve message box detection to handle `╌` (U+254C)

**Description:** Add `╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌` as an alternative pattern in `findGreaterThanMessageBox` and `findGenericSlimMessageBox`.

**Strongest argument for:** Fixes the specific real-world trigger. The onboarding screen for Claude Code v2.1.87 uses this character.

**Strongest argument against:** Only fixes this one variant. Future Claude Code versions might use yet another character. Does not prevent the indefinite hang for other detection failures.

**What this makes easy:** Simple, targeted fix.
**What this makes hard:** Doesn't address the systemic issue.

### Approach D: Both A and C (recommended by the issue)

**Description:** Fix the detection for this specific Claude Code version (C) AND add the `statusLocked()` guard (A) so future detection failures fail fast.

**Strongest argument for:** Belt and suspenders. Fixes the immediate problem and prevents the class of bugs.

**Strongest argument against:** Status semantics change (same as A). However, the issue author explicitly recommends this.

## Decisions

### Decision 1: Guard `statusLocked()` with `initialPromptReady` AND fix detection
- **Question:** How to prevent `Send()` from hanging when readiness detection fails?
- **Options:** (A) guard statusLocked only, (B) decouple stableSignal, (C) fix detection only, (D) both A+C
- **Chosen:** D — both guard and detection
- **Classification:** Agent-recommended (issue author also recommends option D)
- **Reasoning:** The `stableSignal` gates ALL outbound messages on `initialPromptReady`. `statusLocked()` must reflect this. The detection fix handles the immediate trigger; the guard prevents the class of bugs.

### Decision 2: Return `ConversationStatusChanging` when `initialPromptReady` is false
- **Question:** What status to return when readiness is false?
- **Options:** `changing` vs `initializing`
- **Chosen:** `changing`
- **Classification:** Agent-recommended
- **Reasoning:** The snapshot buffer IS full (past the `initializing` phase). `changing` matches the error `ErrMessageValidationChanging`.

### Decision 3: Apply the guard unconditionally (not only when `InitialPrompt` is configured)
- **Question:** Should the `initialPromptReady` guard only apply when `InitialPrompt` is configured? (Open Question 2)
- **Options:** Conditional (only when InitialPrompt set) vs unconditional
- **Chosen:** Unconditional
- **Classification:** Agent-recommended
- **Reasoning:** The `stableSignal` gates on `initialPromptReady` for ALL queued messages, not just the initial prompt. If `initialPromptReady` is false and a user calls `Send()`, the message hangs regardless of whether `InitialPrompt` is configured. Status must reflect actual send capability. When `ReadyForInitialPrompt` is nil (default), it auto-returns `true` and `initialPromptReady` becomes `true` on the first snapshot tick — before status could transition to `stable`. So the unconditional guard causes no regressions for the default case.

### Decision 4: Add U+254C only to detection
- **Question:** Which additional Unicode box-drawing characters to support?
- **Options:** Just U+254C, also U+254D, broad set
- **Chosen:** U+254C only
- **Classification:** Agent-recommended
- **Reasoning:** This is the specific character seen in the wild. The `statusLocked()` guard provides the safety net for future unknown characters.

## Open Questions

All open questions have been resolved — see Decisions section above.
54 changes: 54 additions & 0 deletions docs/plans/issue-209.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,54 @@
# Plan: Fix Send() blocking indefinitely when ReadyForInitialPrompt returns false (#209)

Research: [issue-209-research.md](./issue-209-research.md)

Stacked on: [PR #208](https://github.com/coder/agentapi/pull/208) (`fix/write-stabilize-non-fatal-phase1`)

## Problem

`Send()` hangs indefinitely when `ReadyForInitialPrompt` never returns `true`, even though `Status()` reports `"stable"`. This is because `statusLocked()` doesn't check `initialPromptReady`, allowing `Send()` to enqueue messages, but the `stableSignal` (which the send loop waits on) requires `initialPromptReady == true`. If readiness never arrives, the signal never fires and `Send()` blocks forever. (See research: Problem Context)

The real-world trigger is Claude Code v2.1.87's onboarding screen using `╌` (U+254C) instead of `─` (U+2500) in its box-drawing characters, causing message box detection to fail. (See research: Code Analysis — findGreaterThanMessageBox)

## Decisions

1. **Guard `statusLocked()` with `initialPromptReady` check AND fix detection.**
- Options: (A) guard statusLocked only, (B) decouple stableSignal, (C) fix detection only, (D) both A+C.
- Chosen: D — both guard and detection.
- Classification: Agent-recommended (issue author also recommends option D).
- Reasoning: The `stableSignal` gates ALL outbound messages on `initialPromptReady`. `statusLocked()` must reflect this — otherwise status says "stable" but the system cannot process messages. The detection fix handles the immediate trigger; the guard prevents the class of bugs.

2. **Return `ConversationStatusChanging` (not `Initializing`) when `initialPromptReady` is false.**
- Options: `changing` vs `initializing`.
- Chosen: `changing`.
- Classification: Agent-recommended.
- Reasoning: The snapshot buffer IS full (past the `initializing` phase). The error `ErrMessageValidationChanging` says "message can only be sent when the agent is waiting for user input" — which is semantically correct when readiness hasn't been detected.

3. **Apply the guard unconditionally (not only when `InitialPrompt` is configured).**
- Options: Conditional (only when InitialPrompt set) vs unconditional.
- Chosen: Unconditional.
- Classification: Agent-recommended.
- Reasoning: The `stableSignal` gates on `initialPromptReady` for ALL queued messages, not just the initial prompt. If `initialPromptReady` is false and a user calls `Send()`, the message hangs regardless of whether `InitialPrompt` is configured. Status must reflect actual send capability. When `ReadyForInitialPrompt` is nil (the default), it auto-returns `true` and `initialPromptReady` becomes `true` on the first snapshot tick — before status could ever transition to `stable`. So the unconditional guard causes no regressions for the default case. (See research: Approach A — Nuance)

4. **Add `╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌` (U+254C repeated) as an alternative pattern in message box detection.**
- Options: just U+254C, also U+254D, broad set of horizontal box-drawing characters.
- Chosen: U+254C only.
- Classification: Agent-recommended.
- Reasoning: This is the specific character seen in the wild. The `statusLocked()` guard provides the safety net for future unknown characters. Adding a broad set risks false positives.

5. **Update existing tests that assert `stable` when `initialPromptReady` is false.**
- The tests `"agent not ready - status is stable until agent becomes ready"` and `"no initial prompt - normal status logic applies"` currently assert `stable` when readiness is false. The research notes the second test's assertion was correct from a pure screen-stability perspective, but is inconsistent with `Send()` behavior: `stableSignal` gates on `initialPromptReady` for ALL messages, so `Send()` would hang despite `stable` status. The status must reflect actual send capability.
- The test `"no initial prompt configured - normal status logic applies"` is **unaffected** because it doesn't set `ReadyForInitialPrompt`, so the default (`return true`) applies and `initialPromptReady` becomes `true` before stability is reached.
- Classification: Agent-recommended.

## Implementation Flow

1. **Message box detection** — Extend both message box detection functions to also match `╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌` patterns. Add a testdata fixture for the Claude onboarding screen with `╌` characters. Run readiness tests to verify.

2. **Status guard** — Add a check in the status logic: when `initialPromptReady` is false and the screen is otherwise stable, return `changing` instead of `stable`. This prevents `Send()` from enqueueing messages that can never be processed.

3. **Update existing tests** — Fix the two tests that assert `stable` when readiness is false. Update them to expect `changing`. Confirm the third "no initial prompt configured" test is unaffected.

4. **Add reproducing test** — Add a test that demonstrates `Send()` returns an error instead of hanging when `initialPromptReady` is false.

5. **Run full test suite** — Verify no regressions.
14 changes: 7 additions & 7 deletions lib/msgfmt/message_box.go
Original file line number Diff line number Diff line change
Expand Up @@ -5,14 +5,14 @@ import (
)

// Usually something like
// ───────────────
// ─────────────── (or ╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌)
// >
// ───────────────
// ─────────────── (or ╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌)
// Used by Claude Code, Goose, and Aider.
func findGreaterThanMessageBox(lines []string) int {
for i := len(lines) - 1; i >= max(len(lines)-6, 0); i-- {
if strings.Contains(lines[i], ">") {
if i > 0 && strings.Contains(lines[i-1], "───────────────") {
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Obs findGenericSlimMessageBox accepts mixed border styles — e.g. ─── on top and ╌╌╌ on bottom would match. (Edge Case Analyst, Contract Auditor)

The top and bottom borders are checked independently with ||. In practice, Claude Code uses the same character for both borders, so this is defensive rather than problematic.

Agreed this is fine — more permissive detection reduces false negatives, and the narrow scan window (last 9 lines) plus the prompt-character requirement on the middle line make false positives unlikely. Worth knowing if a false-positive concern arises later.

if i > 0 && (strings.Contains(lines[i-1], "───────────────") || strings.Contains(lines[i-1], "╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌")) {
return i - 1
}
return i
Expand All @@ -22,14 +22,14 @@ func findGreaterThanMessageBox(lines []string) int {
}

// Usually something like
// ───────────────
// ─────────────── (or ╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌)
// |
// ───────────────
// ─────────────── (or ╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌)
func findGenericSlimMessageBox(lines []string) int {
for i := len(lines) - 3; i >= max(len(lines)-9, 0); i-- {
if strings.Contains(lines[i], "───────────────") &&
if (strings.Contains(lines[i], "───────────────") || strings.Contains(lines[i], "╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌")) &&
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3 The addition in findGenericSlimMessageBox has no test coverage. The fixture (msg_dashed_box.txt) uses > as the prompt character, so findGreaterThanMessageBox matches first and this function is never reached. A fixture with borders and or as the prompt indicator would close the gap. Without it, a typo in the dashed string on any of these three lines would pass the entire test suite, and the statusLocked() guard would mask the regression as permanent "running" status rather than a clear detection failure. (Bisky P3, Mafuuu P3)

Separately: "╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌" is now inlined 3 times (lines 15, 30, 32), mirroring the pre-existing 3x "───────────────". A containsHorizontalBorder(line string) bool helper would collapse all six sites and prevent copy-paste miscounts on these visually-identical 15-char Unicode strings. (Gon P3)

🤖

(strings.Contains(lines[i+1], "|") || strings.Contains(lines[i+1], "│") || strings.Contains(lines[i+1], "❯")) &&
strings.Contains(lines[i+2], "───────────────") {
(strings.Contains(lines[i+2], "───────────────") || strings.Contains(lines[i+2], "╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌")) {
return i
}
}
Expand Down
Loading
Loading