Tool call as code call

This is the story of how the tooliscode repo came to be.

The backstory

In my last couple of weeks at work, it felt like we might be holding tool calls wrong sometimes, especially when they involved giant responses.

First, while trying an internal MCP server that ran SQL queries, the context window kept blowing up. But after replacing the MCP with an SDK that the LLM could write code to call (and write results to a file and process them), things were much better!

Next, Claude launched context management with a way to trim tool call responses.

Finally, that same week, Cloudflare made the case best in their Code Mode post.

Since then, I had been wishing for a generic Python library that would transparently turn tool calls into code calls, with a lightweight runtime under the hood.

Coding it up

When I found myself with some time on my hands, I tried to vibe code my way into building one.

Since this is code generated by LLMs, I wanted the runtime to be isolated (no network access, read/write access to just the shared per-session directory, etc.), but I wanted to avoid dealing with Docker-style containers if possible.

GPT-5 nudged me toward the WebAssembly (wasm) based wasmtime / cpython-wasm stack. It was interesting to learn that wasm is now usable outside the browser and from Python. Apparently, it comes with a somewhat different standard library and restricted support for C-based packages, but it is fine for pure Python. It was wasm time!
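
To make that concrete, here is a minimal sketch of what spinning up a CPython guest under wasmtime could look like from the host, using the wasmtime Python package. It assumes you have a python.wasm build of CPython for WASI on disk; the paths and file names are illustrative, not the repo's actual layout.

```python
# Minimal sketch: run a WASI build of CPython (python.wasm) inside wasmtime,
# with only a per-session directory preopened and no network access.
# Paths and file names are illustrative, not tooliscode's actual layout.
from wasmtime import Engine, Linker, Module, Store, WasiConfig

engine = Engine()
linker = Linker(engine)
linker.define_wasi()                               # provide the WASI imports to the guest

module = Module.from_file(engine, "python.wasm")   # CPython compiled to wasm/WASI

wasi = WasiConfig()
wasi.argv = ["python", "/session/main.py"]         # what the guest interpreter should run
wasi.preopen_dir("./session", "/session")          # the only host dir the guest can see
wasi.stdout_file = "./session/stdout.log"          # stdin/stdout/files are the I/O channel
wasi.stderr_file = "./session/stderr.log"
# No sockets are granted, so the guest has no network access.

store = Store(engine)
store.set_wasi(wasi)
instance = linker.instantiate(store, module)
instance.exports(store)["_start"](store)           # run the interpreter to completion
```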

I used gpt-5-codex-medium to vibe code it. Much of the struggle had to do with codex not following the rules of operating within the wasm runtime. Pointed questions to GPT-5-thinking (ChatGPT) were helpful!

How it works

Given a new LLM session with a set of function tools:

  • Generate an SDK that represents the function tools. The SDK runs in the guest, but its implementation calls back to the host to execute the actual tool function (a hypothetical stub is sketched after this list).
  • Spin up a guest instance that stays alive for the duration of the session, with a shared directory, the generated SDK, and helpers.
  • When the LLM generates code, the host ships it to the guest for execution and streams stdout/stderr back.
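
I won't claim this is what the generated SDK actually looks like, but as a sketch, each tool could be exposed to the guest as an ordinary function whose body serializes the call, sends it to the host, and waits for the answer. The line-delimited JSON wire format and the run_sql_query example below are assumptions for illustration.

```python
# Hypothetical shape of a generated SDK stub running inside the guest.
# The wire format (line-delimited JSON over stdout/stdin) is an assumption.
import json
import sys

_HOST_OUT = sys.stdout            # the real guest stdout, reserved for host RPC


def _call_host(tool: str, **kwargs):
    _HOST_OUT.write(json.dumps({"op": "tool_call", "tool": tool, "args": kwargs}) + "\n")
    _HOST_OUT.flush()
    reply = json.loads(sys.stdin.readline())   # host runs the real tool, replies on stdin
    return reply["result"]


# One generated wrapper per function tool, e.g. a SQL query tool:
def run_sql_query(query: str, limit: int = 100):
    """Run a SQL query via the host; big results can go to the shared session dir."""
    return _call_host("run_sql_query", query=query, limit=limit)
```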

How hard can it be

I was surprised at the subtle issues in implementing a “simple library to execute some Python code” :).

First, it seems the only way to communicate with the “guest” is via stdin/stdout or plain files (no FIFOs, Unix domain sockets, etc.). Next, you need to come up with an on-wire protocol (Jupyter gives this for free). Then, in the guest, you execute the generated code and capture its stdout in a string (to ship back to the LLM), but that code also calls back to the host over stdout, and the callback has to use the real guest-level stdout rather than the captured one!
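
One way out (and roughly what I'd expect something like this to do; the names below are mine, not the library's) is to grab a handle to the real stdout before any capture happens, so host callbacks bypass the redirection while the user code's prints are collected into a string:

```python
# Sketch of the "two stdouts" trick: capture the user code's prints in a buffer,
# while host callbacks keep using the real guest-level stdout saved up front.
import contextlib
import io
import sys

_HOST_OUT = sys.stdout            # saved before any redirection; tool stubs write here


def run_cell(code: str, G: dict) -> str:
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):      # user prints land in the buffer...
        exec(compile(code, "", "exec"), G, G)
    return buf.getvalue()                      # ...and get shipped back to the LLM
```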

I was also surprised by how easy it is to handle statefulness: exec(compile(code, "", "exec"), G, G). (With session-scoped globals in G, variables persist across code cells.)
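
As a toy illustration of that persistence (not the library's actual cell runner):

```python
# Two "cells" sharing one session-scoped globals dict: names defined by the
# first exec stay visible to the second, much like Jupyter cells.
G = {}
exec(compile("import math\nradius = 2", "", "exec"), G, G)
exec(compile("print(math.pi * radius ** 2)", "", "exec"), G, G)   # prints ~12.566
```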

Where it's at

The basic version works, but I’m looking to test it with heavyweight, real MCP servers, perhaps as part of my next project.

There are still things I’d like to implement: concurrent calls with asyncio, agent-sdk integration, custom packages, a shell runtime variant, etc.

p.s: a human wrote this :)