AI has been slowing down, relatively speaking, considering its trajectory over the past 20-30 years. For one, even if LLMs may have plateaued in terms of intelligence-to-parameter ratio, research is ongoing on new frontiers for ML, including (but not limited to) world models. Other research directions study backpropagation and its physical analogies, such as equilibrium of chaotic states.
In addition, there's a lot of research on the hardware angle, and actual prototypes are already being built, such as the AI-on-chip efforts from Cerebras and Taalas.
What approach are you using? I've been working on a similar in-browser Node runtime based on a Rust/WASM kernel + Service-Worker HTTP intercept + CJS→ESM transform.
Feature-wise, does this compare to StackBlitz WebContainers?
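For anyone unfamiliar with the CJS→ESM step mentioned above, here is a minimal sketch of the general idea, not the commenter's actual transform. It assumes only static `require("...")` calls, which it finds with a regex (real tools parse an AST), hoists them into ESM imports, and runs the original source inside a CommonJS-style wrapper. The names `cjsToEsm`, `__deps`, `__require`, and `__module` are hypothetical and chosen just for this example.

```typescript
// Hypothetical helper for illustration only; not the commenter's actual transform.
// Assumption: dependencies appear as static require("...") calls with string literals.
function cjsToEsm(source: string): string {
  // Collect each distinct module specifier passed to require().
  const specifiers: string[] = [];
  const pattern = /require\(\s*["']([^"']+)["']\s*\)/g;
  let match: RegExpExecArray | null = pattern.exec(source);
  while (match !== null) {
    if (specifiers.indexOf(match[1]) === -1) specifiers.push(match[1]);
    match = pattern.exec(source);
  }

  // Hoist every dependency into a static ESM import at the top of the module.
  const imports = specifiers.map(
    (spec, i) => "import * as __dep" + i + " from " + JSON.stringify(spec) + ";"
  );
  // Lookup table so the shimmed require() can stay synchronous.
  const table = specifiers.map(
    (spec, i) => "  " + JSON.stringify(spec) + ": __dep" + i + ","
  );

  return [
    ...imports,
    "const __deps = {",
    ...table,
    "};",
    // Prefer the default export (CJS interop), fall back to the namespace object.
    "const __require = (id) => __deps[id].default ?? __deps[id];",
    "const __module = { exports: {} };",
    // Run the original CommonJS source with the usual module/exports/require bindings.
    "(function (module, exports, require) {",
    source,
    "})(__module, __module.exports, __require);",
    "export default __module.exports;",
  ].join("\n");
}
```

Calling `cjsToEsm('const path = require("path");\nmodule.exports = { sep: path.sep };')` returns module text that starts with an `import * as __dep0 from "path";` line and ends with a default export of `module.exports`. Dynamic requires, conditional requires, and named exports are deliberately out of scope here.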
I had previous experience with QuickJS - specifically using the rquickjs crate (awesome project) - so my approach was to first check whether it was possible to run a Wasmtime binary that both executes the JS code and handles HTTP requests and responses.
Then the second part, which was really important to me, was figuring out if I could find a way to embed the developer's JS code within the worker without requiring them to install Cargo (thanks to Wizer it's possible, love it).
Once I had those two, the rest was basically execution (not saying it was straightforward though ;)
I was also a bit lucky: at the same time as I was developing it, Rolldown announced version 1 of their standalone crate. So it was perfect timing to use it as well.
As for StackBlitz WebContainers, I actually don't know much about them. They run in the browser as I understand, so fundamentally different, but feature-wise I'm sure that project is way more mature and therefore offers way more features.
Awesome, thanks for detailing the thought flow and choices that led you here. I chose not to go the QuickJS route for performance reasons but I think it's a solid choice depending on the use case.
> They run in the browser as I understand, so fundamentally different
Yes, runs entirely in the browser, while this is a hosted product. StackBlitz technology is really good but it is closed source.
Yeah I was surprised by this when I opened their website.
Your setup - Rust/WASM kernel + Service worker - sounds really sweet. If already public, please do share the link, else looking forward to your launch!
Our technology is much more general than WebContainers, and it's based on a Linux-compatible WebAssembly kernel. It also supports real command line tools, including git, bash and the complete set of busybox utilities.
The version of Claude Code you see running is completely unmodified.
The architecture is a fairly straightforward WebAssembly-native monolithic kernel. Most of the complexity comes from making things work well within the browser's constraints for large, real-world apps.
We have quite a bit of experience on the topic, however. These are previous projects of ours:
WebVM (https://webvm.io): x86 Debian shell running client-side in the browser via x86 -> WebAssembly JIT compilation
As a matter of fact WebVM and BrowserPod share the same kernel, the difference is all on the performance side.
WebVM uses x86 virtualization and hence has a significant performance penalty, with the upside of running any existing software without needing the source code.
BrowserPod on the other hand runs WebAssembly binaries at almost native speed. Source code is required, but that is a fair compromise in the world of sandboxing. Most language runtimes and CLI tools are FOSS anyway, and many closed-source tools (such as Claude Code) are written in scripting languages and run on top of FOSS engines.
> WebVM uses x86 virtualization and hence has a significant performance penalty
That is precisely the reason why we chose to avoid any solution that uses virtualization, even though you get a full OS. QuickJS also pays a performance tax (no JIT) and still doesn't give you the OS.
On our part we're mostly focused on JS for the time being, and we think that the best bet is to reuse the browser's V8 engine.
Claude workflows in ultra code mode work in a very similar fashion, and they consume a moderate amount of the session usage limit, depending on the complexity of the task. With the API it would probably get expensive quickly, though.
> Wait, wait, wait: browsers allow websites to store junk on my drive?
Technically even a cookie is junk on your drive
> Without even asking whether the site can use local storage?
Would it be practical to ask permission for every site you visit? It would be better to periodically check the size of your home folder (where the browser profiles normally reside)
The V100 and the 4090 are based on vastly different architectures: the former uses the older Volta, while the latter uses Ada. Last I checked you cannot meaningfully combine them. The 3090 is better than the V100; just get two 3090s and an NVLink bridge.
There are a variety of inference engines that support this, regardless of whether or not there is native FP8 in Ampere - llama.cpp will do it quite happily. With vLLM you can do W8A16 quantization too.
There are a whole lot of ways to quantize models in general.
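To make "W8A16" concrete, here is a minimal sketch of the idea: weights are stored as 8-bit integers plus a scale, while activations stay in higher precision and the weights are rescaled on the fly during the dot product. This is not how llama.cpp or vLLM implement it. It assumes symmetric per-tensor scaling, uses `Float32Array` as a stand-in for 16-bit activations, and the function names are hypothetical.

```typescript
// Illustrative only: symmetric per-tensor int8 weight quantization (the "W8" part).
// Assumption: one scale for the whole tensor; real engines often use per-channel or per-group scales.
function quantizeWeightsInt8(weights: Float32Array): { q: Int8Array; scale: number } {
  let maxAbs = 0;
  for (let i = 0; i < weights.length; i++) {
    maxAbs = Math.max(maxAbs, Math.abs(weights[i]));
  }
  // Map the largest magnitude to 127; fall back to 1 for an all-zero tensor.
  const scale = maxAbs > 0 ? maxAbs / 127 : 1;
  const q = new Int8Array(weights.length);
  for (let i = 0; i < weights.length; i++) {
    const rounded = Math.round(weights[i] / scale);
    q[i] = Math.max(-127, Math.min(127, rounded));
  }
  return { q, scale };
}

// The "A16" part: activations are left unquantized (Float32Array stands in for fp16 here),
// and each int8 weight is dequantized with the scale as it is multiplied.
function dotW8A16(q: Int8Array, scale: number, activations: Float32Array): number {
  let acc = 0;
  for (let i = 0; i < q.length; i++) {
    acc += q[i] * scale * activations[i];
  }
  return acc;
}
```

Because only the weights are compressed, this kind of scheme does not need hardware support for 8-bit arithmetic: the math still happens in the wider format, and the saving is in weight memory and bandwidth.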