binyu · 2026-06-08T23:14:26 1780960466

A tool to manage Claude Code conversations based on my typical workflow which integrates with my desktop OS and terminal app.

binyu · 2026-06-08T18:23:38 1780943018

> Right now Claude is faster than me on some tasks but we’re at least close.

I don't doubt it, but I don't think you can spawn 10 copies of yourself working simultaneously.

AlecSchueler · 2026-06-08T18:28:46 1780943326

No, but nor can you keep track of what 10 agents are doing simultaneously. Hence the multitasking regret.

pixel_popping · 2026-06-08T18:32:23 1780943543

An agent can. You don't need to watch the tasks yourself; you can get a live digest from another tool.

AlecSchueler · 2026-06-09T08:38:25 1780994305

Who watches the watchers?

logankeenan · 2026-06-08T19:57:25 1780948645

Do you have any recommendations for a live digest tool?

binyu · 2026-06-08T18:18:39 1780942719

AI has been slowing down relatively, considering its trajectory over the past 20-30 years. For one, even if LLMs may have plateaued in terms of intelligence-to-parameter ratio, research is ongoing on new frontiers for ML, including (but not limited to) world models. Other research directions are studying backpropagation and its physical analogies, such as equilibrium of chaotic states.

In addition, there's a lot of research on the hardware angle, and actual prototypes are already being built, such as the AI-on-chip designs from Cerebras and Taalas.

binyu · 2026-06-07T15:29:20 1780846160

Very cool work.

What approach are you using? Been working on a similar in-browser Node runtime based on a Rust/WASM kernel + Service Worker HTTP intercept + CJS→ESM transform.

Feature-wise, how does this compare to StackBlitz WebContainers?

le_chuck · 2026-06-07T15:46:04 1780847164

I had previous experience with QuickJS, specifically through the rquickjs crate (awesome project), so my first step was checking whether it was possible to run a Wasmtime binary that both executes the JS code and handles HTTP requests and responses.

Then the second part, which was really important to me, was figuring out whether I could embed the developer's JS code within the worker without requiring them to install Cargo (thanks to Wizer it's possible, love it).

Once I had those two, the rest was basically execution (not saying it was straightforward though ;)

I was also a bit lucky: at the same time as I was developing it, Rolldown announced version 1 of their standalone crate. So it was perfect timing to use it as well.

As for StackBlitz WebContainers, I actually don't know much about them. They run in the browser as I understand, so they are fundamentally different, but feature-wise I'm sure that project is way more mature and therefore offers way more features.

binyu · 2026-06-07T15:58:34 1780847914

Awesome, thanks for detailing the thought flow and choices that led you here. I chose not to go the QuickJS route for performance reasons but I think it's a solid choice depending on the use case.

> They run in the browser as I understand, so fundamentally different

Yes, runs entirely in the browser, while this is a hosted product. StackBlitz technology is really good but it is closed source.

le_chuck · 2026-06-07T16:11:27 1780848687

Yeah I was surprised by this when I opened their website.

Your setup - Rust/WASM kernel + Service worker - sounds really sweet. If already public, please do share the link, else looking forward to your launch!

binyu · 2026-06-07T20:48:56 1780865336

Will do, thanks!

binyu · 2026-06-07T03:29:52 1780802992

Been working on something similar based on WebContainers. How does your Node.js support compare with StackBlitz technology?

Are you running the version of Claude Code that Anthropic distributes in the browser, or did you have to adapt it to run on your stack?

Cheers

apignotti · 2026-06-07T07:26:19 1780817179

Our technology is much more general than WebContainers, and it's based on a Linux-compatible WebAssembly kernel. It also supports real command line tools, including git, bash and the complete set of busybox utilities.

The version of Claude Code you see running is completely unmodified.

binyu · 2026-06-07T15:42:20 1780846940

Awesome, what approach are you using? Is this a real microkernel architecture or just a containerized VM?

apignotti · 2026-06-07T16:26:57 1780849617

The architecture is a fairly straightforward WebAssembly-native monolithic kernel. Most of the complexities come from making things work well within the browser constraints for real world, large apps.

We have quite a bit of experience on the topic, however. These are previous projects of ours:

WebVM (https://webvm.io): x86 Debian shell running client-side in the browser via x86 -> WebAssembly JIT compilation

Browsercraft (https://browsercraft.cheerpj.com): Minecraft running unmodified in the browser via our WebAssembly JVM (CheerpJ)

binyu · 2026-06-07T17:15:03 1780852503

Oh, you are the author of WebVM, pretty cool! I looked at it while choosing the stack for our project and it seems very solid.

Keep up the great work

apignotti · 2026-06-07T17:43:17 1780854197

As a matter of fact WebVM and BrowserPod share the same kernel, the difference is all on the performance side.

WebVM uses x86 virtualization and hence has a significant performance penalty, with the upside of running any existing software without needing the source code.

BrowserPod on the other hand runs WebAssembly binaries at almost native speed. Source code is required, but that is a fair compromise in the world of sandboxing. Most language runtimes and CLI tools are FOSS anyway, and many closed-source tools (such as Claude Code) are written in scripting languages and run on top of FOSS engines.

binyu · 2026-06-08T14:39:24 1780929564

> WebVM uses x86 virtualization and hence has a significant performance penalty

That is precisely the reason why we chose to avoid any solution that uses virtualization, even though you get a full OS. QuickJS also pays a performance tax (no JIT) and still doesn't give you the OS.

On our part we're mostly focused on JS for the time being, and we think the best bet is to reuse the browser's V8 engine.

binyu · 2026-06-04T21:34:21 1780608861

Claude workflows in ultra code mode work in a very similar fashion and consume a moderate amount of the session usage limit, depending on the complexity of the task. With the API it would probably get expensive quickly though.

binyu · 2026-06-02T15:43:02 1780414982

The Internet will be the Internet. Expect it to get worse if anything.

binyu · 2026-06-02T01:18:48 1780363128

> "exploit"

More like social engineering meets AI and stupidity

binyu · 2026-06-01T14:32:21 1780324341

> Wait, wait, wait: browsers allow websites to store junk on my drive?

Technically even a cookie is junk on your drive

> Without even asking whether the site can use local storage?

Would it be practical to ask permission for every site you visit? It would be better to periodically check the size of your home folder (where the browser profiles normally reside)
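A minimal sketch of that periodic check, in Python. The profile path is only an assumption (a common Firefox location on Linux), so substitute wherever your browser keeps its profiles; `dir_size_bytes` is a name made up for this example.

```python
import os
from pathlib import Path


def dir_size_bytes(root: Path) -> int:
    """Sum the sizes of all regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                if not path.is_symlink():
                    total += path.stat().st_size
            except OSError:
                # Files can vanish or be unreadable while the browser is running.
                continue
    return total


if __name__ == "__main__":
    # Hypothetical location; adjust for your browser and OS.
    profile_root = Path.home() / ".mozilla" / "firefox"
    if profile_root.is_dir():
        print(f"{profile_root}: {dir_size_bytes(profile_root) / 1e6:.1f} MB")
    else:
        print(f"{profile_root} not found")
```

Run it from a scheduled job and compare the numbers over time if you want to spot a site that keeps growing its storage.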

binyu · 2026-05-31T20:46:25 1780260385

The V100 and the 4090 are based on vastly different architectures: the former uses the older Volta while the latter uses Ada. Last I checked you cannot meaningfully combine them. The 3090 is better than the V100, just get two 3090s and an NVLink bridge.

tymscar · 2026-06-01T00:12:27 1780272747

Well, I did in fact meaningfully combine them without an issue; that was the whole point of the blog post.

binyu · 2026-06-01T15:18:48 1780327128

Yes but it creates a bottleneck that negates the benefit of using multiple cards that way. Look into it. Cheers

tymscar · 2026-06-01T15:20:23 1780327223

Well, it doesn’t matter because the bottleneck here is actually quite small for me. The issue is VRAM. If anything the bottleneck is my 4080.

binyu · 2026-06-01T15:22:06 1780327326

Gotcha, I am not saying your setup is inherently wrong or useless. I am glad it works for your use cases. Godspeed

tymscar · 2026-06-01T15:25:46 1780327546

I think it's a very fair thing you have flagged!

cthalupa · 2026-06-01T00:19:05 1780273145

You can split tensors across an AMD GPU and Nvidia GPU - different architectures are not an issue. People run LLMs across some pretty crazy setups.
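As a rough illustration of how such a split can be decided: one common approach is to give each device a share of the model's layers in proportion to its memory. The sketch below is only that arithmetic in plain Python. The device names, VRAM figures, and the `split_layers` helper are assumptions for illustration, not any engine's actual API.

```python
def split_layers(n_layers: int, vram_gb: dict[str, float]) -> dict[str, range]:
    """Assign contiguous layer ranges to devices in proportion to their VRAM."""
    total = sum(vram_gb.values())
    if n_layers < 0 or total <= 0:
        raise ValueError("need a non-negative layer count and positive total VRAM")
    assignment = {}
    start = 0
    cumulative = 0.0
    for name, gb in vram_gb.items():
        cumulative += gb
        # Round the cumulative boundary so the ranges always cover every layer.
        end = round(n_layers * cumulative / total)
        assignment[name] = range(start, end)
        start = end
    return assignment


if __name__ == "__main__":
    # Hypothetical mixed setup: sizes are examples, not measurements.
    plan = split_layers(32, {"gpu0": 24, "gpu1": 8})
    for device, layers in plan.items():
        print(device, f"layers {layers.start}..{layers.stop - 1}")
```

Nothing in this calculation depends on the vendor or architecture of each card, only on how much memory it has. The cost of mixing shows up elsewhere, in the transfers between devices at each boundary.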

binyu · 2026-06-01T15:21:27 1780327287

It depends, but you cannot directly mix, for example, Ampere with Ada because of the lack of native FP8 support in Ampere.

cthalupa · 2026-06-01T22:34:47 1780353287

There are a variety of inference engines that support this, regardless of whether or not there is native FP8 in Ampere - llama.cpp will do it quite happily. With vLLM you can do W8A16 quantization too.

There are a whole lot of ways to quantize models in general.
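For anyone unfamiliar with the notation, W8A16 means 8-bit weights with 16-bit activations, so no FP8 hardware is involved. A minimal sketch of the idea in plain Python, assuming symmetric per-row scaling; the function names are made up for this example, and ordinary Python floats stand in for the 16-bit activations.

```python
def quantize_weights_int8(rows):
    """Symmetric per-row quantization: one scale per row, codes in [-127, 127]."""
    quantized, scales = [], []
    for row in rows:
        max_abs = max(abs(w) for w in row)
        scale = max_abs / 127 if max_abs > 0 else 1.0
        quantized.append([round(w / scale) for w in row])
        scales.append(scale)
    return quantized, scales


def matvec_w8a16(quantized, scales, activations):
    """Multiply int8 weight codes by float activations, rescaling each output once."""
    # Activations stay in floating point (Python floats here, 16-bit in practice).
    return [
        scale * sum(q * a for q, a in zip(q_row, activations))
        for q_row, scale in zip(quantized, scales)
    ]


if __name__ == "__main__":
    weights = [[0.5, -1.0, 0.25], [2.0, 0.0, -2.0]]
    x = [1.0, 2.0, 3.0]
    q, s = quantize_weights_int8(weights)
    print(matvec_w8a16(q, s, x))  # close to the exact result [-0.75, -4.0]
```

Only the stored weights shrink. The arithmetic still runs in a format every recent architecture supports, which is why this kind of scheme works across mixed cards.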

binyu · 2026-06-02T04:16:27 1780373787

Yeah, you'd need to use asymmetric quantization and other software techniques.
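For reference, asymmetric quantization stores a zero point alongside the scale, so a value range that is not centered on zero still uses the full integer range. A minimal sketch in plain Python; the function names are illustrative and not taken from llama.cpp or vLLM.

```python
def asymmetric_quantize(values, bits=8):
    """Map floats onto unsigned integer codes using a scale and a zero point."""
    qmax = (1 << bits) - 1
    # Include 0.0 in the range so that zero is exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = round(-lo / scale)
    quantized = [min(qmax, max(0, round(v / scale) + zero_point)) for v in values]
    return quantized, scale, zero_point


def dequantize(quantized, scale, zero_point):
    """Recover approximate floats from the integer codes."""
    return [scale * (q - zero_point) for q in quantized]


if __name__ == "__main__":
    data = [-0.2, 0.0, 0.4, 1.0]
    codes, scale, zp = asymmetric_quantize(data)
    print(codes, scale, zp)
    print(dequantize(codes, scale, zp))
```

The round trip loses at most roughly one quantization step per value, and an input of exactly 0.0 comes back as exactly 0.0.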