16 comments

  • rented_mule 9 minutes ago
    Something not unlike this happened to me when moving some batch processing code from C++ to Python 1.4 (this was 1997). The batch started finishing about 10x faster. We refused to believe it at first and started looking to make sure the work was actually being done. It was.

    The port had been done in a weekend just to see if we could use Python in production. The C++ code had taken a few months to write. The port was pretty direct, function for function. It was even line for line where language and library differences didn't offer an easier way.

    A couple of us worked together for a day to find the reason for the speedup. Just looking at the code didn't give us any clues, so we started profiling both versions. We found out that the port had accidentally fixed a previously unknown bug in some code that built and compared cache keys. After identifying the small misbehaving function, we had to study the C++ code pretty hard to even understand what the problem was. I don't remember the exact nature of the bug, but I do remember thinking that particular type of bug would be hard to express in Python, and that's exactly why it was accidentally fixed.

    We immediately started moving the rest of our back end to Python. Most things were slower, but not by much, because most of our back end was I/O-bound. We soon found out that we could make algorithmic improvements so much more quickly, so a lot of the slowest things got a lot faster than they had ever been. And, most importantly, we (the software developers) got quite a bit faster.

  • blundergoat 2 hours ago
    The real win here isn't TS over Rust, it's the O(N²) -> O(N) streaming fix via statement-level caching. That's a 3.3x improvement on its own, independent of language choice. The WASM boundary elimination is 2-4x, but the algorithmic fix is what actually matters for user-perceived latency during streaming. Title undersells the more interesting engineering imo.
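    The shape of that fix, roughly (toy ';'-terminated grammar and invented names, not their actual code):

      // Naive streaming reparses the whole buffer on every chunk -> O(N^2).
      // Statement-level caching parses each completed statement once and
      // only ever reparses the trailing partial statement -> O(N) overall.
      type Node = { kind: "stmt" | "partial"; text: string };

      const parseStatement = (text: string): Node => ({ kind: "stmt", text });
      const parsePartial = (text: string): Node => ({ kind: "partial", text });

      class IncrementalParser {
        private nodes: Node[] = []; // completed statements, parsed exactly once
        private consumed = 0;       // prefix of the buffer already covered

        // Called with the full buffer so far, once per streamed chunk.
        feed(buffer: string): Node[] {
          let end: number;
          // Scan only the unconsumed tail for newly completed statements.
          while ((end = buffer.indexOf(";", this.consumed)) !== -1) {
            this.nodes.push(parseStatement(buffer.slice(this.consumed, end + 1)));
            this.consumed = end + 1;
          }
          // Only the trailing partial statement gets reparsed each time.
          const partial = buffer.slice(this.consumed);
          return partial ? [...this.nodes, parsePartial(partial)] : this.nodes;
        }
      }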
    • azakai 27 minutes ago
      O(N²) -> O(N) was 3.3x faster, but before that, eliminating the boundary (replacing wasm with JS) led to speedups of 2.2x, 4.6x, 3.0x (see one table back).

      It looks like neither is the "real win": both the language and the algorithm made a big difference, as you can see in the first column of the last table - moving off wasm was a big speedup, and improving the algorithm on top of that was another big speedup.

    • nulltrace 11 minutes ago
      Yeah, the algorithmic fix is doing most of the work here. But call that parser hundreds of times on tiny streaming chunks and the WASM boundary cost per call adds up fast. The same thing would happen with C++ compiled to WASM.
    • socalgal2 1 hour ago
      Same for uv, but no one gets that message. They just think "rust rulez!" and ignore that all of uv's benefits are algo, not lang.
      • estebank 1 hour ago
        Some architectures are made easier by the choice of implementation language.
      • rowanG077 24 minutes ago
        That's a pretty big claim. I don't doubt that a lot of uv's benefits are algo. But everything? Non-IO-bound native code alone should be an order of magnitude faster than Python.
        • thfuran 3 minutes ago
          More than one, I'd think.
    • Aurornis 1 hour ago
      > Title undersells the more interesting engineering imo.

      Thanks for cutting through the clickbait. The post is interesting, but I'm so tired of being unnecessarily clickbaited into reading articles.

    • sroussey 1 hour ago
      Yeah, though the n^2 is overstating things.

      One thing I noticed was that they time each call and then take a median. Sigh. In a browser. :/ With timing-attack defenses built into the JS engine.
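      Per-call medians are especially shaky when the clock itself is coarsened; a sketch of the issue and the usual batching workaround (helper names are mine):

        // Timing one cheap call against a coarsened performance.now()
        // often yields 0; the median of a pile of zeros is still 0.
        function timeOnce(fn: () => void): number {
          const t0 = performance.now();
          fn();
          return performance.now() - t0;
        }

        // Batching amortizes the clock granularity across many calls.
        function timePerCall(fn: () => void, iterations = 10_000): number {
          const t0 = performance.now();
          for (let i = 0; i < iterations; i++) fn();
          return (performance.now() - t0) / iterations;
        }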

      • fn-mote 21 minutes ago
        For those of us not in the know, what are we expecting the results of the defenses to be here?
    • shmerl 1 hour ago
      More like misleading clickbait.
  • nine_k 1 hour ago
    "We rewrote this code from language L to language M, and the result is better!" No wonder: it was a chance to rectify everything that was tangled or crooked, avoid every known bad decision, and apply newly-invented better approaches.

    So this holds even for L = M. The speedup is not in the language, but in the rewriting and rethinking.

    • MiddleEndian 1 hour ago
      Now they just need a third party who's never seen the original to rewrite their TypeScript solution in Rust for even more gains.
      • nine_k 1 hour ago
        Indeed! But only after a year or so of using it in production, so that the drawbacks would be discovered.
    • azakai 24 minutes ago
      You're generally right - rewrites let you improve the code - but there is an actual reason the new language was better here: avoiding copies at the boundary.

      They say they measured that cost, and it was most of the runtime in the old version (though they don't give exact numbers). That cost does not exist at all in the new version, simply because of the language.

    • baranul 1 hour ago
      Truth. You can see improvement even when rewriting code in the same language.
  • evmar 1 hour ago
    By the way, I did a deeper dive on the problem of serializing objects across the Rust/JS boundary, noticed the approach used by serde wasn’t great for performance, and explored improving it here: https://neugierig.org/software/blog/2024/04/rust-wasm-to-js....
  • spankalee 1 hour ago
    I was wondering why I hadn't heard of Open UI doing anything with WASM.

    This new company chose a very confusing name that has been used by the Open UI W3C Community Group for over 5 years.

    https://open-ui.org/

    Open UI is the standards group responsible for HTML having popovers, customizable select, invoker commands, and accordions. They're doing great work.

  • nssnsjsjsjs 18 minutes ago
    Rewrite bias. You'd want to also rewrite the Rust one in Rust, for comparison.
    • jeremyjh 4 minutes ago
      It would be surprising if rewriting in Rust could change the WASM boundary tax that the article identified as the actual problem.
  • nallana 25 minutes ago
    Why not a shared buffer? Serializing into JSON on this hot path should be entirely avoidable.
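    E.g., roughly, on the JS side (every export here is invented, just to show the shape):

      // Sketch: the wasm module writes its output into linear memory and
      // hands back a pointer; JS views it in place, with no JSON round-trip.
      interface ParserExports {
        memory: WebAssembly.Memory;
        alloc(len: number): number;                    // reserve input space
        parse_chunk(ptr: number, len: number): number; // returns output ptr
        output_len(): number;                          // output byte length
      }

      function parseViaSharedBuffer(wasm: ParserExports, chunk: string): Uint8Array {
        // Copy the input chunk into wasm linear memory.
        const input = new TextEncoder().encode(chunk);
        const inPtr = wasm.alloc(input.length);
        new Uint8Array(wasm.memory.buffer).set(input, inPtr);

        // Parse; the output stays in wasm memory and JS just views it.
        const outPtr = wasm.parse_chunk(inPtr, input.length);
        return new Uint8Array(wasm.memory.buffer, outPtr, wasm.output_len());
      }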
  • joaohaas 21 minutes ago
    God I hate AI writing.

    That final summary benchmark means nothing. It lists a 'baseline' value for the 'Full-stream total' of the Rust implementation, then says `serde-wasm-bindgen` is '+9-29% slower', but it never gives us that baseline value, because clearly the only benchmark run against the Rust codebase was the per-call one.

    Then it mentions: "End result: 2.2-4.6x faster per call and 2.6-3.3x lower total streaming cost."

    But the "2.6-3.3x" is by their own definition a comparison against the naive TS implementation.

    I really think the guy just prompted Claude to "get this shit fast and then publish a blog post".

  • ivanjermakov 24 minutes ago
    Good software is usually written on the 2nd+ try.
  • dmix 1 hour ago
    That blog post design is very nice. I like the 'scrollspy' sidebar which highlights all visible headings.

    Claude tells me this is https://www.fumadocs.dev/

    • sroussey 1 hour ago
      Interesting, thanks. I need to make some good docs soon.
      • dmix 1 hour ago
        Good documentation is always worth the effort. Markdown explaining your products is gold these days with LLMs.
  • neuropacabra 34 minutes ago
    This is a very unusual statement :-D
  • caderosche 1 hour ago
    What is the purpose of the Rust WASM parser? Didn't understand that easily from the article. Would love a better explanation.
    • joshuanapoli 57 minutes ago
      They use a bespoke language to define LLM-generated UI components. I think this is supposed to prevent exfiltration if the LLM is prompt-injected. In any case, the parser compiles chunks streaming from the LLM to build a live UI. The WASM parser restarted from the beginning on every chunk received; making the algorithm work incrementally (while porting from Rust to TypeScript) improved performance a lot.
  • szmarczak 44 minutes ago
    > Attempted Fix: Skip the JSON Round-Trip
    > We integrated serde-wasm-bindgen

    So you're reinventing JSON, but binary? V8's JSON is highly optimized nowadays [1] and can process gigabytes per second [2]; I doubt it is the bottleneck here.

    [1] https://v8.dev/blog/json-stringify
    [2] https://github.com/simdjson/simdjson

    • kam 2 minutes ago
      No, serde-wasm-bindgen implements serde's Serializer interface by calling into JS to construct the JS objects directly on the JS heap, with no intermediate serialization/deserialization step. You pay the cost of one or more FFI calls for every object, though.
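      To make the trade-off concrete, from the JS side (both functions are hypothetical glue, not a real API):

        declare const wasm: {
          parse_to_json(src: string): string;   // serde_json inside, one string out
          parse_to_value(src: string): unknown; // serde-wasm-bindgen path
        };
        declare const src: string;

        // Path 1: a single FFI call, but the whole payload goes through
        // a JSON string and gets reparsed on the JS side.
        const viaJson = JSON.parse(wasm.parse_to_json(src));

        // Path 2: no JSON at all, but one or more boundary crossings per
        // object/field while the result is built on the JS heap.
        const viaBindgen = wasm.parse_to_value(src);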

      https://docs.rs/serde-wasm-bindgen/

  • slowhadoken 58 minutes ago
    Am I mistaken or isn’t TypeScript just Golang under the hood these days?
    • jeremyjh 2 minutes ago
      There is too much wrong here to call it a mistake.
    • iainmerrick 41 minutes ago
      Hmm, there's an in-progress rewrite of the TypeScript compiler in Go; is that what you mean?

      I don't think that's actually out yet, and more importantly, it doesn't change anything at runtime -- your code still runs in a JS engine (V8, JSC etc).

  • SCLeo 1 hour ago
    They should rewrite it in Rust again to get another 3x performance increase /s