Porting KittenTTS to the Browser with OpenAI Codex

KittenTTS Web Demo

KittenTTS is a neat open-source text-to-speech model that packs a surprising amount of punch in sizes as low as 25 MB. It has over 11,000 stars on GitHub and for good reason — the output quality is impressive for something you can run locally. The catch? It's Python-only, needs pip install gymnastics, and you certainly can't just hand someone a link and have them try it out.

The KittenTTS folks say they have a browser version planned, but I was a bit impatient, and also eager to find out if a modern AI agent like Codex could pull off a JavaScript port. The plan was simple: you open a page, pick a voice, type some text, and hear it speak. No server, no Python, no installs: something client-side web projects could use easily.

Spoiler alert: it actually worked almost in a single shot (I basically just had to prompt my way through the plan and run a bunch of pnpm install commands to work around the sandbox). The results are up at https://github.com/jokkebk/KittenTTS-JS, but read on to find out how I got there with ChatGPT Codex 5.3 doing the heavy lifting.

The Plan

I started by asking Codex how hard it would be to create a browser JS port. It studied the Python codebase, identified the pipeline stages, and came back with a surprisingly thorough analysis. The core pipeline is:

  1. Text preprocessing (cleanup, sentence chunking)
  2. Phonemization via eSpeak
  3. Token mapping (phonemes to integer IDs)
  4. ONNX model inference per chunk
  5. Waveform concatenation

The model itself is a single ONNX graph that takes tokenized phonemes, a voice style vector, and a speed parameter, and outputs raw audio samples directly. No separate vocoder needed. This is key — it meant the browser port just needed ONNX Runtime Web and the right input pipeline.
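As a sketch of what the browser side has to assemble for that graph: the token map, the input names (input_ids, style, speed), and the output handling below are illustrative assumptions, with ort standing for the onnxruntime-web module and session for an InferenceSession loaded from the model file.

```javascript
// Hypothetical phoneme-to-ID map; the real vocabulary ships with KittenTTS.
const TOKEN_IDS = { _: 0, a: 1, t: 2, k: 3 };

function phonemesToIds(phonemes) {
  // Unknown phonemes fall back to the padding ID 0.
  return [...phonemes].map((p) => TOKEN_IDS[p] ?? 0);
}

// Run one text chunk through the model with ONNX Runtime Web.
// Input/output names are guesses; check the exported graph for the real ones.
async function synthesizeChunk(session, ort, phonemes, styleVector, speed = 1.0) {
  const ids = phonemesToIds(phonemes);
  const feeds = {
    input_ids: new ort.Tensor('int64', BigInt64Array.from(ids.map(BigInt)), [1, ids.length]),
    style: new ort.Tensor('float32', styleVector, [1, styleVector.length]),
    speed: new ort.Tensor('float32', new Float32Array([speed]), [1]),
  };
  const outputs = await session.run(feeds);
  return outputs[session.outputNames[0]].data; // raw audio samples (Float32Array)
}
```

Per-chunk outputs are then concatenated into one waveform (step 5) before playback.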

Codex also noted the tricky bits: voice embeddings stored in NumPy's .npz format (basically a ZIP of .npy files), and the eSpeak phonemizer which is a C library. Both have WebAssembly ports available, making the whole thing feasible. It laid out a 7-phase plan with API design, repo layout, risk mitigations, and estimated about 7–8 days for a robust V1. I had it write the plan to PLAN.md before proceeding — having that concrete reference turned out to be valuable.
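Reading those voice embeddings is mostly unzipping plus a simple header parse. Here is a minimal sketch of a .npy reader for the common little-endian float32 case, assuming the .npz has already been unzipped (e.g. with fflate) into per-entry byte arrays; a robust port would handle more dtypes and header versions.

```javascript
// Minimal .npy parser: magic bytes, version, little-endian header length,
// an ASCII Python-dict header, then the raw array data.
function parseNpy(bytes) {
  const magic = String.fromCharCode(...bytes.subarray(1, 6));
  if (bytes[0] !== 0x93 || magic !== 'NUMPY') throw new Error('not a .npy file');
  const view = new DataView(bytes.buffer, bytes.byteOffset);
  const headerLen = view.getUint16(8, true); // version 1.0 header length
  const header = new TextDecoder().decode(bytes.subarray(10, 10 + headerLen));
  if (!header.includes('<f4')) throw new Error('expected little-endian float32');
  const shapeStr = header.match(/'shape':\s*\(([^)]*)\)/)[1];
  const shape = (shapeStr.match(/\d+/g) || []).map(Number);
  const count = shape.reduce((a, b) => a * b, 1);
  // Data starts right after the (padded, so aligned) header.
  const data = new Float32Array(bytes.buffer, bytes.byteOffset + 10 + headerLen, count);
  return { shape, data };
}
```

Each entry in the .npz then becomes one named voice style vector for the model.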

Letting Codex Work

With the plan saved, I told it to go ahead. Here's roughly what happened over the next few hours, based on the commit history:

Read post

Novation Launchpad Mk3 Test Suite and Games

Launchpad Mini

tl;dr: Wrote an HTML-based debug tool and two mini games for the Novation Launchpad; try it out in Chrome if you have one:

https://tools.codeandlife.com/launchpad_test.html

Longer version (templated with Claude Sonnet using the GitHub commit): I've been thinking about getting a MIDI controller to use as a Stream Deck type of device — not for any music stuff, but to use the MIDI messages to automate workflows. Not having found anything compact, robust and with cool twiddly knobs, I thought to test my actual interest by borrowing a Novation Launchpad Mk3 from a friend.

I started by asking ChatGPT if we could control this thing with MIDI, and the answer was yes. It even suggested a Python test script to try it out. I obliged and ran it, and it didn't work, but a few rounds of discussion revealed the culprit: the Launchpad exposes two input and output devices, and the script had defaulted to the wrong one.

Chrome nowadays has all these awesome local USB capabilities, so what about MIDI? Turns out you can skip all the Python package dependencies and just write a single HTML page that connects to the Launchpad and programs it using the Web MIDI API!

The Launchpad Mk3 also uses MIDI messages to control the LEDs, so you don't just receive input; you can actually control the 8x8 LED matrix with 127 colors (ChatGPT claimed the first half of the palette is static and the other half pulsing colors, but I wasn't quite sure how this worked).
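Both halves, reading pad presses and lighting LEDs, fit in a few lines of Web MIDI. The pad-to-note mapping and the port-name matching below are my assumptions about the Mk3's programmer mode, so verify them against Novation's programmer reference:

```javascript
// Sketch of Launchpad control over the Web MIDI API (Chrome).
// Assumes programmer mode, where pad note numbers run 11..88 as row*10+col.
const NOTE_ON = 0x90; // note-on, channel 1

function padNote(row, col) { // row, col in 0..7, bottom-left origin
  return (row + 1) * 10 + (col + 1);
}

function ledMessage(row, col, color) { // color: palette index 0..127
  return [NOTE_ON, padNote(row, col), color & 0x7f];
}

async function connectLaunchpad(onPress) {
  const access = await navigator.requestMIDIAccess(); // prompts for permission
  // The device exposes two port pairs (DAW + MIDI); pick the plain MIDI one,
  // which is exactly what the original Python script got wrong.
  const out = [...access.outputs.values()].find((o) => /Launchpad.*MIDI/i.test(o.name));
  const inp = [...access.inputs.values()].find((i) => /Launchpad.*MIDI/i.test(i.name));
  inp.onmidimessage = (e) => onPress(e.data); // [status, note, velocity]
  return out; // use out.send(ledMessage(row, col, color)) to light a pad
}
```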

Snake and Blob Games

Once I had the basic LED control working and an HTML debug console showing the MIDI messages flowing, I couldn't resist adding some games. Snake is the classic: use the arrow pads to guide the snake around the 8x8 grid. The Blob game is a variant I asked Claude Code to whip up for me, or actually for my daughter, who still lacks the dexterity to play Snake, but might be happy with some animation when she succeeds in hitting the blob on the grid.

Read post

New Additions to tools.codeandlife.com

I've been doing some housekeeping on my website, moving various small tools and experiments that were scattered in subfolders to a proper home at tools.codeandlife.com. While at it, I used Claude Code to give most of them a UI refresh while keeping the core functionality intact.

Here's what was added:

  • Blackjack Trainer: Practice blackjack basic strategy decisions
  • Piano Trainer: Learn to read notes for left and right hand
  • Wordle Solver: Find possible Wordle letter orderings from your guesses
  • Find Fraction: Convert decimal numbers to fractions
  • Rush Hour Solver: Solve Rush Hour sliding block puzzles
  • Firefighter: Simple (PWA) arcade game for the Christmas holidays

Most of these are simple single-file HTML+JS apps that I've accumulated over the years. Nothing fancy, but occasionally useful. The full list is at tools.codeandlife.com.

Read post

New Look for 2025

You might have noticed things look a bit different around here. I finally got around to refreshing the blog design with the aid of Claude Code. Credit for drafting this post also goes the same way.

What's New

  • Dark and light mode — The site now respects your system preference. No more blinding white pages at 2 AM.

  • Retro-inspired styling — I went for a look that nods to the old-school web while staying readable. Monospace headers, clean lines, nothing fancy.

  • Search — There's now a search box in the header! Finding that one post about Arduino timers from 2014 should be much easier now. Try it out and see if Claude just hallucinated that joke!

  • Related posts — Single post pages now show related articles in the sidebar (desktop) and at the bottom (mobile). Hopefully this surfaces some older posts that might still be useful.

The underlying tech is still Eleventy with Liquid templates, but I did upgrade to a newer version of 11ty with some AI help — Opus 4.5 did most of the work and even advised against upgrading to a fancier framework, which I appreciated greatly.

Let me know if anything looks broken on your end.

Read post

JoonasPihlajamaa.com updated

A very small cross-advertisement: I finally got around to revamping my personal website, eliminating a lot of autobiographical jabber. That level of detail felt very necessary around 1998 when I first made my homepage, and I had followed the tradition since, but now we're in a more streamlined mode. Having said that, it's still pretty verbose.

The looks are now very clean, though they smell AI-generated (and rightly so), but it's not a heavy-traffic site and most readers won't notice, so I'll let it stand. It's not like the previous version was better, so I'll take the improvement and spend my twiddly-time elsewhere. :)

Check it out: https://joonaspihlajamaa.com

Read post

First Tools on tools.codeandlife.com: AI-Generated Zip Image Viewers

After setting up tools.codeandlife.com last week, I needed to actually create some tools for it. Simon Willison's detailed example of using LLMs for code inspired me — his https://tools.simonwillison.net/ has over 80 single-file HTML/JS apps, all built by prompting LLMs.

My first tool idea: a browser-based zip image viewer. I had just updated JZipView (a native C app) and thought a pure HTML/JS version would be a nice complement.

Two AI-Generated Viewers

I had both Claude Opus 4.5 and ChatGPT 5.2 Pro create their own versions from the same basic prompt. The results:

Both needed some polishing with Claude Code (using Sonnet) to get them working similarly — handling drag-and-drop, keyboard navigation, and the usual edge cases. But the core functionality came out surprisingly well from both models.

  • You can drag a .zip file over, and it uses fflate (import { unzip } from 'https://esm.sh/fflate@0.8.2';) to decompress it and show a gallery view
  • Left click on an image opens it in "fit to view" mode
  • Another left click opens 1:1 mode and captures the mouse, so you can pan around the image just by moving it
  • Right click goes back to "fit to view", and again to grid mode
  • Arrow keys and the scroll wheel let you browse through the images
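The unzip-and-gallery step from the first bullet might look roughly like this; the #gallery element and the extension filter are illustrative assumptions rather than the exact code the models produced:

```javascript
// Accepts only common image extensions from the zip's file listing.
const isImage = (name) => /\.(png|jpe?g|gif|webp|bmp)$/i.test(name);

// Build a gallery from a dropped zip file (file: File from a drop event).
async function showZip(file) {
  const { unzip } = await import('https://esm.sh/fflate@0.8.2');
  const bytes = new Uint8Array(await file.arrayBuffer());
  unzip(bytes, (err, entries) => {
    if (err) throw err;
    for (const [name, data] of Object.entries(entries)) {
      if (!isImage(name)) continue; // skips directories and non-images
      const url = URL.createObjectURL(new Blob([data]));
      const img = Object.assign(document.createElement('img'), { src: url });
      document.getElementById('gallery').append(img); // assumes a #gallery div
    }
  });
}
```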

Read post