fix: improve disconnect recovery UX — don't show onboarding to existing users #8311

@jeffa-block

Summary

When a session fails with "Lost connection to server", the only recovery option ("Go home") can dump existing users into the onboarding Welcome screen. The error screen itself offers no retry, no context, and no way to recover.

Problem — Two Bugs Compounding

Bug 1: "Go home" sends configured users to onboarding

Repro: Have two Goose windows open. Kill/restart goosed (or lose network). One window shows "Failed to Load Session". Click "Go home".

What happens: setView('chat') navigates to /, which is wrapped in OnboardingGuard. The guard calls readConfig('GOOSE_PROVIDER') — a POST to goosed. If goosed is still recovering, the catch block sets hasProvider = false and renders the full "Welcome to goose — Connect an AI model provider" screen.

File: ui/desktop/src/components/onboarding/OnboardingGuard.tsx:42-44

} catch (error) {
  console.error('Error checking provider:', error);
  setHasProvider(false);  // ← Wrong: assumes no provider when server is just unreachable
}

Impact: Existing users think they need to reconfigure. Some do, breaking their setup.

Bug 2: No retry or context recovery on the error screen

File: ui/desktop/src/components/BaseChat.tsx:342-370

The error screen shows a red banner and a single "Go home" button. No retry. No session ID. No way to recover the last message. The SSE layer (useSessionEvents.ts) already has reconnect logic with exponential backoff — but the UI error screen was never updated to leverage it.

Proposed Fix — Two Small PRs

PR1: OnboardingGuard fail-open (1 file, +9/-2 lines)

Persist a localStorage flag (goose_has_provider) on successful provider detection. On config read failure, check the flag before falling through to onboarding. Existing users see the error screen instead of Welcome.

+        const configured = provider.trim() !== '';
+        setHasProvider(configured);
+        if (configured) {
+          localStorage.setItem('goose_has_provider', 'true');
+        }
       } catch (error) {
         console.error('Error checking provider:', error);
-        setHasProvider(false);
+        const previouslyConfigured = localStorage.getItem('goose_has_provider') === 'true';
+        setHasProvider(previouslyConfigured);
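The decision logic in this diff can be isolated as a pure function, which makes the fail-open behavior easy to check without a running goosed. This is a minimal sketch, not the actual OnboardingGuard code: `resolveHasProvider`, `ProviderLookup`, `FlagStore`, and `createMemoryStore` are hypothetical names introduced for illustration, and the in-memory store stands in for `localStorage`.

```typescript
// Minimal stand-in for the part of the Web Storage API the guard needs.
interface FlagStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Outcome of the provider lookup: either a value came back or the call threw.
type ProviderLookup =
  | { ok: true; provider: string }
  | { ok: false; error: unknown };

const HAS_PROVIDER_KEY = 'goose_has_provider';

// Hypothetical helper: returns the value that would be passed to setHasProvider.
function resolveHasProvider(lookup: ProviderLookup, store: FlagStore): boolean {
  if (lookup.ok) {
    const configured = lookup.provider.trim() !== '';
    if (configured) {
      // Remember the last known good state for later failures.
      store.setItem(HAS_PROVIDER_KEY, 'true');
    }
    return configured;
  }
  // Config read failed: fall back to the persisted flag instead of assuming false.
  return store.getItem(HAS_PROVIDER_KEY) === 'true';
}

// In-memory store so the sketch runs outside a browser (assumption for testing).
function createMemoryStore(): FlagStore {
  const data = new Map<string, string>();
  return {
    getItem: (key) => data.get(key) ?? null,
    setItem: (key, value) => {
      data.set(key, value);
    },
  };
}
```

With an empty store, a failed lookup still resolves to `false`, so first-time users reach onboarding; once a successful lookup has written the flag, the same failure resolves to `true`.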

PR2: Error screen retry + new session buttons (2 files, +33/-12 lines)

Replace the single "Go home" button with:

  1. "Retry connection" — clears error state, re-triggers session load via a retryCount state in the useEffect dependency array
  2. "New session" — navigates to hub (same as old "Go home" but clearer label)
  3. Session ID — displayed and selectable for context recovery / bug reports
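The retry behavior can be modeled as a small state machine. The sketch below is framework-free and assumes nothing about the real BaseChat.tsx implementation: `LoadState`, `LoadAction`, and `loadReducer` are hypothetical names. In the React component, `retryCount` would be the value listed in the `useEffect` dependency array, so incrementing it re-runs the session load.

```typescript
// Hypothetical state for the session-load error screen.
interface LoadState {
  status: 'loading' | 'ready' | 'error';
  error: string | null;
  retryCount: number; // in React, this would sit in the useEffect dependency array
}

type LoadAction =
  | { type: 'retry' }
  | { type: 'loadSucceeded' }
  | { type: 'loadFailed'; message: string };

const initialLoadState: LoadState = { status: 'loading', error: null, retryCount: 0 };

function loadReducer(state: LoadState, action: LoadAction): LoadState {
  switch (action.type) {
    case 'retry':
      // Clear the error and bump the counter; the bump is what re-triggers the load.
      return { status: 'loading', error: null, retryCount: state.retryCount + 1 };
    case 'loadSucceeded':
      return { ...state, status: 'ready', error: null };
    case 'loadFailed':
      return { ...state, status: 'error', error: action.message };
  }
}
```

Each retry is a single user-initiated transition, so a permanently unreachable server produces a loading state followed by the error screen again rather than a loop.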

Slack Evidence

Date | Channel | Quote
03/04 | #goose-help | "I'm running into Failed to Load Session / Stream error: Lost connection to server"
03/04 | #ai-help-community | "I am facing Failed to load session issue... I restarted goose and restarted my machine"
03/04 | #goose-help | "The Goose error reads: Failed to Load Session - Stream Error - Lost connection to Server"
24/03 | #goose-slackbot | "exposing some kind of 'clear/reset session' functionality would help this which has been a common request"
16/03 | #nexus | "every now and then im getting this error, and have to abandon the session and start a new one. Is there a way to recover from this?"

Prior Art

Reference | Status | Notes
#1520 (Retry button) | Closed | Never shipped for disconnect errors
#7972 (Safe agent recovery) | Closed | @kyledef attempted this. Closed for process reasons, not technical.
#7834 (SSE separation) | Merged | SSE reconnects automatically now, but UI error screen wasn't updated
#8298 (Preserve context on retry) | Open | Different scope: automated retry context, not the disconnect screen

Edge Cases

  1. First-time user with no localStorage flag: getItem returns null, === 'true' is false → falls through to onboarding as expected. No regression.
  2. User clears browser storage — same as first-time, falls through to onboarding. Acceptable — they'll set up once and the flag is re-persisted.
  3. Retry when server is permanently down — retry clears error, shows loading state, then error reappears after resumeAgent fails again. User can retry again or click "New session". No infinite loop.
  4. Retry when server recovers — retry clears error, resumeAgent succeeds, session loads normally. Best case.
