tui: only check for emojis in visibleWidth when necessary #369

Closed

nathyong wants to merge 1 commit into badlogic:main from nathyong:feature/less-emoji

Conversation

@nathyong
Contributor

@nathyong nathyong commented Dec 30, 2025

Speeds up loading/resizing of long sessions by 10x or more when running on bun.

The initial render of a session, and any re-draws caused by terminal resizing, are noticeably slow (>1 second), especially on conversations with 20+ turns and many tool calls.

From profiling with `bun --cpu-prof` (available since bun 1.3.2), the majority of the rendering time (90%) is spent on detection of emojis in the string-width library, which runs the expensive `/\p{RGI_Emoji}$/v` regular expression on every individual grapheme cluster in the entire scrollback. I believe it essentially expands to a fixed search against every possible emoji sequence, hence the amount of CPU time spent in it.

This change replaces the `stringWidth` from string-width with a `graphemeWidth` function that performs a similar check, but avoids running the `/\p{RGI_Emoji}$/v` regex for emoji detection unless the grapheme contains codepoints that could be emojis.

The `visibleWidth` function also has two more optimisations:

  • Short-circuits string length detection for strings that are entirely printable ASCII characters
  • Adds a cache for non-ASCII segments to avoid recomputing string length when resizing

@nathyong nathyong force-pushed the feature/less-emoji branch 2 times, most recently from 9b6e959 to e4a226f on December 30, 2025 09:50
@nathyong nathyong marked this pull request as draft December 30, 2025 09:51
@nathyong
Contributor Author

nathyong commented Dec 30, 2025

I tested this with a separate feature flag in the pi coding-agent to run `ui.requestRender`, so that I could get CPU stats on loading the same session file without any additional overhead.

0001-Create-rendering-testing-mode.patch

The resulting flame graphs were pretty insightful. From the beginning (manually testing an interactive session, without the profiling mode):

[flame graph screenshot: 2025-12-30_21-01]

To the final numbers, down to 0.5s, although now `colorize` is the main culprit in slow rendering:

[flame graph screenshot: 2025-12-30_21-03]

It's definitely possible to improve further (e.g. this isn't as fast as gemini-cli), but I stopped here because I didn't want to overcomplicate the code. Resizing and loading sessions is now an "acceptable" speed; there are other bugs I want to look at next.

I'm not familiar with the full aesthetics of this codebase, so let me know if e.g. the cache is considered bad taste.

```typescript
widthCache.delete(firstKey);
}
}
widthCache.set(str, width);
```
Contributor Author

Cache is FIFO, not LRU (`Map.keys()` returns keys in insertion order, and `.set()` does not re-insert).

I haven't tested if FIFO is faster, but my assumption from testing is that this only gets stressed during resize events.
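
A minimal demonstration of the FIFO behaviour described above (the `fifoSet` helper is illustrative, not the PR's code):

```typescript
// FIFO insertion with a size cap: Map iterates keys in insertion order,
// and set() on an existing key does not move it to the back, so the
// evicted entry is the oldest *inserted* one, regardless of reads.
function fifoSet(cache: Map<string, number>, cap: number, key: string, value: number): void {
  if (!cache.has(key) && cache.size >= cap) {
    const firstKey = cache.keys().next().value as string;
    cache.delete(firstKey); // evict oldest-inserted entry
  }
  cache.set(key, value);
}

const cache = new Map<string, number>();
fifoSet(cache, 2, "a", 1);
fifoSet(cache, 2, "b", 2);
cache.get("a");            // a read does not refresh "a" under FIFO
fifoSet(cache, 2, "c", 3); // evicts "a" anyway
console.log([...cache.keys()]); // keys are now "b" and "c"
```

An LRU would instead re-insert `"a"` on access and evict `"b"` here; FIFO keeps the bookkeeping to a single `delete`.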

Owner

I think the cache size can be quite a bit bigger actually.

Contributor Author

I'll test with a 200-turn multilingual conversation and varying cache sizes. I don't even know yet if the cache is getting populated.

Comment on lines +28 to +33
```typescript
(cp >= 0x1f300 && cp <= 0x1faff) || // Main emoji blocks
(cp >= 0x2600 && cp <= 0x27bf) || // Misc symbols, dingbats
(cp >= 0x1f1e0 && cp <= 0x1f1ff) || // Regional indicators (flags)
(cp >= 0x231a && cp <= 0x23ff) || // Misc technical
segment.includes("\uFE0F") || // Contains VS16 (emoji presentation selector)
segment.length > 2 // Multi-codepoint sequences (ZWJ, skin tones, etc.)
```
Contributor Author

Need to double-check this one against the latest ICU, and note that it could diverge over time as well. However, it is much faster than a direct check against `/^\p{RGI_Emoji}$/`.

Contributor Author

I broadened the blocks slightly, particularly the main emoji blocks, absorbing the regional indicators.

Contributor Author

If y'all think it's necessary, I can add some unit tests to make sure we haven't made any regressions.

Owner

Nah, let's have things explode in prod :D People and models shouldn't use the emoji plane anyways.

@nathyong nathyong marked this pull request as ready for review December 30, 2025 10:52
@badlogic
Owner

What terminal, OS, machine, and Node runtime are you using? I don't experience > 1s rerenders even on my humongous sessions.

Owner

@badlogic badlogic left a comment

lgtm, need to test it a bit before merge.

@badlogic
Owner

I have to dig into cli-highlight; that `colorize` call timing is crazy.

@nathyong
Contributor Author

nathyong commented Dec 30, 2025

I'm seeing this on both my systems. It's painfully slow, so it's good to hear that it seems to be just me, I guess? I was wondering how everyone else was getting by. The cli-highlight slowness might also be a symptom of my machine, but it shouldn't be slower than a frame or two, even on a 10-year-old computer.

Here are the configurations I've tested:

| CPU | OS | Node / Runtime | Terminal |
| --- | --- | --- | --- |
| x64 Xeon Skylake (4 CPU) | NixOS 25.11 (Linux 6.12.59) | bun compiled into pi v0.30.2 GH release | ghostty 1.2.3 |
| x64 Xeon Skylake (4 CPU) | NixOS 25.11 (Linux 6.12.59) | bun compiled into pi v0.30.2 GH release | kitty 0.44.0 |
| x64 Xeon Skylake (4 CPU) | NixOS 25.11 (Linux 6.12.59) | bun 1.3.2 | ghostty 1.2.3 |
| aarch64 Cortex-A72 (4 CPU) (Raspberry Pi 4B) | NixOS 25.11 (Linux 6.12.47) | bun compiled into pi v0.30.2 GH release | zellij 0.43.1 over termux 0.118.3 |

I've patched the interpreter (patchelf) so that the binary release runs on NixOS, but it's still equally slow when launched with bun from source.

I'm still using pi as my main coding agent (I really appreciate the minimalism), but have switched to using Discord + --mode json --print as the frontend for the time being 😂

@badlogic
Owner

Could you try running pi via Node instead of bun? I can't repro this on my machines that are similar to what you have.

@nathyong
Contributor Author

Looks like this is related to a known issue with bun where JSC's regex engine is much slower than V8's (used in node). The bun issue is tracked at oven-sh/bun#3464 and the upstream WebKit/JSC issue at https://bugs.webkit.org/show_bug.cgi?id=258706, with no activity since 2024.

This would also explain the poor performance of cli-highlight in my experience.

I used `hyperfine` with my `--profile-render` patch to get these numbers.

```sh
hyperfine --warmup 2 --runs 10 \
   -n 'bun (no fix)' "bun packages/coding-agent/src/cli.ts --session $SESSION --profile-render 1" \
   -n 'tsx (no fix)' "npx tsx packages/coding-agent/src/cli.ts --session $SESSION --profile-render 1"
```
| Configuration | Mean Time | Relative Speedup |
| --- | --- | --- |
| bun (no fix) | 11.335s ± 0.325s | 1x |
| tsx (no fix) | 2.619s ± 0.086s | 4.3x |
| bun (with fix) | 1.130s ± 0.184s | 10x |
| tsx (with fix) | 2.273s ± 0.196s | 5x |

Are most users using the npm install method?

@badlogic
Owner

badlogic commented Jan 1, 2026

Yeah, I'd assume most users use npm install, as custom hooks and tools are currently not working in the Bun self-contained executables for "reasons". I personally never use the Bun method, mostly because I develop with tsx + node.

@badlogic
Owner

badlogic commented Jan 2, 2026

Merged manually after rebasing. Thanks for the contribution! This also fixed a bug we had with ZWJ emoji sequences (like 🏳️‍🌈) where our \p{Cf} stripping was breaking them.
