tui: only check for emojis in visibleWidth when necessary#369
#369 · nathyong wants to merge 1 commit into badlogic:main from
Conversation
Force-pushed from 9b6e959 to e4a226f
I tested this with a separate feature flag in the pi coding-agent to run 0001-Create-rendering-testing-mode.patch. The resulting flame graphs were pretty insightful, from the beginning (manually testing an interactive session, without the profiling mode) to the final numbers, down to 0.5s, although now colorize is the main culprit in slow rendering. It's definitely possible to improve further (e.g. this isn't as fast as gemini-cli), but I stopped here because I didn't want to overcomplicate the code. Resizing and loading sessions is now an "acceptable" speed; there are other bugs I want to look at next. I'm not familiar with the full aesthetics of this codebase, so let me know if e.g. the cache is considered bad taste.
```ts
    widthCache.delete(firstKey);
  }
}
widthCache.set(str, width);
```
Cache is FIFO, not LRU (Map.keys() returns keys in insertion order, .set() does not re-insert).
I haven't tested if FIFO is faster, but my assumption from testing is that this only gets stressed during resize events.
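For reference, a Map-based cache like this can be made LRU rather than FIFO by re-inserting an entry on every read, since a JS Map iterates keys in insertion order. A minimal sketch of that pattern (the names `widthCache` and the cap are taken from the diff, but this is not the PR's actual code):

```typescript
// Sketch: turning a FIFO Map cache into an LRU one by re-inserting on access.
// MAX_CACHE_SIZE is an assumed cap, not the value used in the PR.
const MAX_CACHE_SIZE = 1024;
const widthCache = new Map<string, number>();

function cacheGet(str: string): number | undefined {
  const width = widthCache.get(str);
  if (width !== undefined) {
    // delete + set moves the key to the end of insertion order, so
    // widthCache.keys().next() always yields the least recently used key.
    widthCache.delete(str);
    widthCache.set(str, width);
  }
  return width;
}

function cacheSet(str: string, width: number): void {
  if (widthCache.size >= MAX_CACHE_SIZE && !widthCache.has(str)) {
    const lru = widthCache.keys().next().value;
    if (lru !== undefined) widthCache.delete(lru);
  }
  widthCache.set(str, width);
}
```

Whether the extra delete/set on every hit pays for itself depends on the access pattern; for resize-driven bursts over the same scrollback, FIFO and LRU likely behave similarly.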
I think the cache size can be quite a bit bigger actually.
I'll test with a 200-turn multilingual conversation with varying cache sizes. I don't even know if it's getting populated.
packages/tui/src/utils.ts
```ts
(cp >= 0x1f300 && cp <= 0x1faff) || // Main emoji blocks
(cp >= 0x2600 && cp <= 0x27bf) || // Misc symbols, dingbats
(cp >= 0x1f1e0 && cp <= 0x1f1ff) || // Regional indicators (flags)
(cp >= 0x231a && cp <= 0x23ff) || // Misc technical
segment.includes("\uFE0F") || // Contains VS16 (emoji presentation selector)
segment.length > 2 // Multi-codepoint sequences (ZWJ, skin tones, etc.)
```
Need to double-check this one against the latest ICU, and note that it could diverge over time as well. However, it is much faster than a direct check against `/^\p{RGI_Emoji}$/`.
I broadened the blocks slightly, particularly the main emoji blocks, absorbing the regional indicators.
If y'all think it's necessary, I can add some unit tests to make sure we haven't made any regressions.
Nah, let's have things explode in prod :D People and models shouldn't use the emoji plane anyways.
The initial render of a session, and any re-draws caused by terminal
resizing, are noticeably slow (>1 second), especially on conversations
with 20+ turns and many tool calls.
From profiling with `bun --cpu-prof` (available since bun 1.3.2), the
majority of the rendering (90%) is spent on detection of emojis in the
string-width library, running the expensive `/\p{RGI_Emoji}$/v`
regular expression on every individual grapheme cluster in the entire
scrollback. I believe it essentially expands to a fixed search against
every possible emoji sequence, hence the amount of CPU time spent in it.
This change replaces the `stringWidth` from string-width with a
`graphemeWidth` function that performs a similar check, but avoids
running the `/\p{RGI_Emoji}$/v` regex for emoji detection unless the
grapheme contains codepoints that could be emojis.
The `visibleWidth` function also has two more optimisations:
- Short-circuits string length detection for strings that are entirely
printable ASCII characters
- Adds a cache for non-ASCII segments to avoid recomputing string length
when resizing
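The ASCII short-circuit described above can be sketched roughly like this (the function names and the exact printable range are assumptions for illustration, not the actual pi implementation):

```typescript
// Sketch: a string consisting only of printable ASCII (0x20..0x7E) has a
// visible terminal width equal to its length, so grapheme segmentation and
// emoji detection can be skipped entirely for it.
function isPrintableAscii(str: string): boolean {
  for (let i = 0; i < str.length; i++) {
    const c = str.charCodeAt(i);
    if (c < 0x20 || c > 0x7e) return false;
  }
  return true;
}

// Hypothetical dispatch: fast path for plain ASCII, otherwise fall back to
// the expensive grapheme-by-grapheme measurement (which the PR then caches).
function visibleWidthSketch(str: string, slowPath: (s: string) => number): number {
  if (isPrintableAscii(str)) return str.length;
  return slowPath(str);
}
```

Since most terminal output in a coding-agent session (code, paths, logs) is plain ASCII, this fast path alone avoids the regex machinery for the bulk of the scrollback.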
Force-pushed from e4a226f to acbc866
What terminal, OS, machine, and Node runtime are you using? I don't experience >1s rerenders even on my humongous sessions.
badlogic
left a comment
lgtm, need to test it a bit before merge.
I have to dig into cli-highlight, that colorize call timing is crazy.
I'm seeing this on both my systems. It's painfully slow - good to hear that it seems to be just me, I guess? I was wondering how everyone else was getting by. The cli-highlight slowness might also be a symptom of my machine - but it shouldn't be slower than a frame or two, even on a 10-year-old computer. Here are the configurations I've tested:
I've patched the interpreter (…). I'm still using pi as my main coding agent (I really appreciate the minimalism), but have switched to using Discord + …
Could you try running pi via Node instead of bun? I can't repro this on my machines that are similar to what you have.
Looks like this is related to a known issue with bun where JSC's regex is much slower than V8 (used in node). The Bun issue is tracked at oven-sh/bun#3464 and the upstream WebKit/JSC issue is at https://bugs.webkit.org/show_bug.cgi?id=258706, with no activity on this since 2024. This would also explain the poor performance of cli-highlight in my experience. I used hyperfine with my …:

```shell
hyperfine --warmup 2 --runs 10 \
  -n 'bun (no fix)' "bun packages/coding-agent/src/cli.ts --session $SESSION --profile-render 1" \
  -n 'tsx (no fix)' "npx tsx packages/coding-agent/src/cli.ts --session $SESSION --profile-render 1"
```

Are most users using the …
Yeah, I'd assume most users use …
Merged manually after rebasing. Thanks for the contribution! This also fixed a bug we had with ZWJ emoji sequences (like 🏳️‍🌈) where our …


Speeds up loading/resizing of long sessions by 10x or more, when running on bun.