Listens to you always, watches your screen and watches you via camera to learn from you.
Totally local (really - all local models, all the time, no external LLMs)
Experimental - This project is actively evolving. The core capture and insight pipeline works, but the automated actions system (DSL-based macOS automation) is still being implemented and tested.
A macOS menu bar app that acts as a personal ambient intelligence assistant. Captures screen, voice, and face data, analyzes it with on-device LLMs, and surfaces insights to help you stay aware of your work patterns, collaborators, and wellbeing.
WIP Much of this project is WIP, but especially the actions/agentic component: this is using work documented here: https://arxiv.org/abs/2409.00608 with a simplified "Tool Rag" approach for taking local actions with things like apple script, but barely works (am working on it!)
100% local. No cloud. No tracking.
# Build the app
just build
# Run (builds if needed)
just run
# Clean build artifacts
just cleanRequires macOS 14+ and Apple Silicon (M1/M2/M3).
- Screen Capture - OCR text from your focused windows every 20 seconds
- Voice Capture - Speech-to-text using WhisperKit (local Whisper model)
- Face Detection - Presence and emotion tracking via Vision framework
- Knowledge Extraction - LLM identifies projects, collaborators, interests, and TODOs
- Wellness Monitoring - Detects overwork, stress, and late-night patterns
- Smart Actions - Popup notifications when you need a break or have pending tasks
┌─────────────────────────────────────────────────────────────────────────┐
│ DATA CAPTURE (continuous) │
│ │
│ Screen (20s) ──► OCR ───┐ │
│ Voice ──► WhisperKit ───┼──► SQLite Database │
│ Face ──► Vision ────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────┐
│ REFINERS (every 20 min) │
│ │
│ Raw data ──► LLM Refiners ──► Knowledge (projects, people, topics) │
│ │
│ ProjectsRefiner │ CollaboratorsRefiner │ InterestsRefiner │ TodosRefiner
└─────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────┐
│ INSIGHT GENERATORS │
│ │
│ Knowledge + Mood ──► Generators ──► Insights (observations) │
│ │
│ WorkSummary │ PatternDetector │ Progress │ Collaboration │ Wellness │
└─────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────┐
│ ACTION GENERATORS │
│ │
│ Insights + Mood ──► Generators ──► Actions (popups/notifications) │
│ │
│ WellnessAction │ ReminderAction │ FocusAction │ LateNightAction │
└─────────────────────────────────────────────────────────────────────────┘
| Layer | Purpose | Output |
|---|---|---|
| Capture | Raw data collection | screen_captures, voice_segments, face_events |
| Refiners | Extract structured knowledge | projects, collaborators, interests, todos |
| Insight Generators | Observe patterns, create observations | insights |
| Action Generators | Decide when to interrupt user | actions (popups) |
| Component | Technology |
|---|---|
| Screen Capture | ScreenCaptureKit |
| OCR | Vision framework |
| Voice | WhisperKit (whisper-tiny.en, ~40MB) |
| Face/Emotion | Vision framework |
| LLM | MLX-Swift-LM (Qwen2.5-3B-Instruct-4bit, ~4GB) |
| Database | GRDB.swift (SQLite) |
| UI | SwiftUI + AppKit |
GoosePerception/
├── App/
│ └── AppDelegate.swift # Service init, menu bar, callbacks
├── Analysis/
│ └── AnalysisScheduler.swift # Orchestrates analysis pipeline
├── Database/
│ ├── Database.swift # GRDB wrapper, migrations
│ └── Models/ # Data models (Action, Insight, etc.)
├── Services/
│ ├── ScreenCapture/ # Screenshot + OCR
│ ├── Voice/ # WhisperKit transcription
│ ├── Face/ # Camera + emotion detection
│ ├── FileActivity/ # Directory activity tracking
│ └── LLM/
│ ├── LLMService.swift # MLX model runner
│ ├── Refiners/ # Knowledge extractors
│ └── Generators/ # Insight & Action generators
└── Views/
├── DashboardView.swift # Main UI (Services, Knowledge, etc.)
└── InsightPopupManager.swift # Floating popups
| Table | Purpose |
|---|---|
| screen_captures | OCR text from focused windows |
| voice_segments | Transcribed speech |
| face_events | Presence and emotion |
| projects | Extracted project names |
| collaborators | Extracted people |
| interests | Extracted topics |
| todos | Tasks found in screen text |
| insights | Generated observations |
| actions | Triggered popups/notifications |
| app_usage | Aggregated app usage stats |
| directory_activity | Recent file activity |
| Tab | Shows |
|---|---|
| Services | Toggle Screen/Voice/Face capture, audio levels, current emotion |
| Knowledge | Mood summary, Projects, Collaborators, Interests, Apps, Directories, TODOs |
| Insights | Generated observations from analysis |
| Actions | Pending/completed/dismissed action items |
| Activity | Real-time event log |
| Captures | Historical screen captures with OCR preview |
| LLM | Full LLM session history |
| Generator | Cooldown | Triggers When |
|---|---|---|
| WorkSummary | 2h | 10+ captures, has projects |
| PatternDetector | 4h | 2+ projects, 20+ captures |
| ProgressTracker | 1h | Has pending TODOs |
| Collaboration | 3h | Has collaborators + voice activity |
| Wellness | 30m | 2h+ work, late night, or stress signals |
| Generator | Cooldown | Triggers When |
|---|---|---|
| Wellness | 45m | 2+ wellness-related insights |
| Reminder | 60m | TODOs pending > 2 hours |
| Focus | 45m | Insights about context-switching |
| LateNight | 60m | After 10pm + late-night insights |
- 100% local - All processing on-device
- No images stored - Screenshots processed and discarded
- No network - Except for one-time model downloads
- User control - Toggle any capture service independently
# Watch for changes and rebuild
just watch
# Run tests
just test
# Check for issues
just lintSee: TinyAgent: Function Calling at the Edge (Erdogan et al., 2024)