
The Evolution of Shiki v1.0

Shiki v1.0 came with many improvements and features - see how Nuxt drives the evolution of Shiki!

Shiki is a syntax highlighter that uses TextMate grammars and themes, the same engine that powers VS Code. It provides some of the most accurate and beautiful syntax highlighting for your code snippets. It was created by Pine Wu back in 2018, when he was part of the VS Code team, and started as an experiment in using Oniguruma to do syntax highlighting.

Unlike existing syntax highlighters such as Prism and Highlight.js, which are designed to run in the browser, Shiki takes a different approach by highlighting ahead of time. It ships the highlighted HTML to the client, producing accurate and beautiful syntax highlighting with zero JavaScript. It soon took off and became a very popular choice, especially for static site generators and documentation sites.

While Shiki is awesome, it's still a library designed to run on Node.js. This means it is limited to highlighting static code and struggles with dynamic code, because it can't work in the browser. This is because Shiki relies on the WASM binary of Oniguruma, as well as a bunch of heavy grammar and theme files in JSON. It uses the Node.js filesystem and path resolution to load these files, neither of which is available in the browser.

To improve the situation, I started this RFC, which later landed with this PR and shipped in Shiki v0.9. While it abstracted the file-loading layer to use fetch or the filesystem depending on the environment, it was still quite complicated to use: you had to manually serve the grammar and theme files somewhere in your bundle or on a CDN, then call the setCDN method to tell Shiki where to load these files from.
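For reference, highlighting in the browser with Shiki v0.x looked roughly like this (a sketch based on the v0.x API; the CDN URL is only an example):

```ts
import * as shiki from 'shiki'

// Tell Shiki where to fetch the grammar, theme and WASM files from
shiki.setCDN('https://unpkg.com/shiki/')

const highlighter = await shiki.getHighlighter({
  theme: 'nord',
  langs: ['ts'],
})

const html = highlighter.codeToHtml(`const hello = 'shiki'`, { lang: 'ts' })
```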

The solution isn't perfect, but at least it made it possible to run Shiki in the browser to highlight dynamic content. We have been using that approach ever since - until the story of this article began.

The Start

Nuxt is putting a lot of effort into pushing the web to the edge, making the web more accessible with lower latency and better performance. Edge hosting services like CloudFlare Workers are deployed all over the world, much like CDN servers, so users can get content from the nearest edge server without round trips to an origin server that could be thousands of miles away. Along with these awesome benefits come some trade-offs: edge servers use a restricted runtime environment. CloudFlare Workers, for example, does not support filesystem access and usually doesn't preserve state between requests. Since Shiki's main overhead is loading the grammars and themes upfront, it wouldn't work well in an edge environment.

It started with a chat between Sébastien and me about making Nuxt Content, which uses Shiki to highlight code blocks, work on the edge.

Chat History Between Sébastien and Anthony

I started experimenting by patching Shiki locally to convert the grammars and themes into ECMAScript Modules (ESM), so that they can be understood and bundled by build tools. This creates a code bundle that CloudFlare Workers can consume without reading the filesystem or making network requests.

Before - Read JSON assets from filesystem

```ts
import fs from 'fs/promises'

const cssGrammar = JSON.parse(await fs.readFile('../langs/css.json', 'utf-8'))
```

After - Using ESM import

```ts
const cssGrammar = await import('../langs/css.mjs').then(m => m.default)
```

We need to wrap the JSON files into ESM as inline literals so that we can use import() to dynamically import them. The difference is that import() is a standard JavaScript feature that works everywhere JavaScript runs, while fs.readFile is a Node.js-specific API that only works in Node.js. Having import() statically analyzable also lets bundlers like Rollup and webpack construct the module graph and emit the bundled code in chunks.
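A generated language module might look something like the following simplified sketch (not the exact generated output; the fields come from the TextMate grammar JSON):

```ts
// langs/css.mjs (simplified): the grammar JSON inlined as a JavaScript object
const grammar = {
  name: 'css',
  scopeName: 'source.css',
  patterns: [
    // ...TextMate rules copied verbatim from css.json
  ],
}

export default grammar
```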

Then I realized that there's more to it than that to support workers properly. Bundlers expect imports to be resolvable at build time, which means that in order to support all languages and themes, we would need to list import statements for every single grammar and theme file in the codebase. That would end up producing a huge bundle full of grammars and themes you might never actually use. This problem is particularly important in edge environments, where bundle size is critical to performance.
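To illustrate the problem, a hypothetical "support everything" registry would have to reference every grammar, and the bundler would have to emit code for each of them, whether or not they are ever used:

```ts
// Hypothetical registry: every entry has to be listed so the bundler can
// resolve it, so every grammar ends up as part of the build output.
const allLanguages = {
  css: () => import('../langs/css.mjs'),
  html: () => import('../langs/html.mjs'),
  typescript: () => import('../langs/typescript.mjs'),
  // ...and a couple of hundred more entries
}
```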

So we needed to figure out a better middle ground to make it work well.

The Fork - Shikiji

Knowing this might fundamentally change the way Shiki works, and not wanting to risk breaking existing Shiki users with our experiments, I started a fork of Shiki called Shikiji. The idea was to revisit the previous design decisions and start from scratch. The goal is to make Shiki runtime-agnostic, performant and efficient, in line with the philosophy we have at UnJS.

To make that happen, we needed to make Shikiji completely ESM-friendly, pure and tree-shakable. This goes all the way up the dependency chain: both vscode-oniguruma and vscode-textmate are provided in CommonJS (CJS) format, and vscode-oniguruma contains a WASM binding generated by emscripten with dangling promises that make CloudFlare Workers fail to finish the request. We ended up embedding the WASM binary into a base64 string shipped as an ES module, manually rewriting the WASM binding to avoid dangling promises, and vendoring vscode-textmate to compile it from its source code into efficient ESM output.
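The user-facing payoff is a fine-grained core API where only the themes, languages and WASM source you explicitly import end up in your bundle. A sketch based on the Shiki v1 API (entry names in earlier Shikiji versions differ slightly):

```ts
import { getHighlighterCore } from 'shiki/core'

// Only the explicitly imported themes, languages and WASM are bundled
const highlighter = await getHighlighterCore({
  themes: [import('shiki/themes/vitesse-dark.mjs')],
  langs: [import('shiki/langs/typescript.mjs')],
  loadWasm: import('shiki/wasm'),
})

const html = highlighter.codeToHtml(`let shiki = 'v1'`, {
  lang: 'typescript',
  theme: 'vitesse-dark',
})
```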

The end result is pretty promising: we managed to get Shikiji working in any runtime environment. It's even possible to import it from a CDN and run it in the browser with a single line of code.
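For example, in a plain HTML page you can highlight code entirely in the browser along these lines (a sketch; the CDN URL and version are just examples):

```ts
// Runs directly in the browser - no build step, no server
import { codeToHtml } from 'https://esm.sh/shiki@1.0.0'

const html = await codeToHtml(`console.log('Hello, Shiki!')`, {
  lang: 'ts',
  theme: 'vitesse-dark',
})

document.body.innerHTML = html
```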

We also took the chance to improve the API and the internal architecture of Shiki. We switched from simple string concatenation to hast, an Abstract Syntax Tree (AST) format for HTML, for generating the HTML output. This opens up the possibility of a Transformers API that allows modifying the intermediate HAST, enabling many cool integrations that would have been very hard to achieve before.
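For instance, a transformer can hook into the generated hast nodes to tweak the output. A sketch based on the v1 transformers API:

```ts
import { codeToHtml } from 'shiki'

const html = await codeToHtml(`const answer = 42`, {
  lang: 'ts',
  theme: 'vitesse-dark',
  transformers: [
    {
      // Add a class to the root <pre> element
      pre(node) {
        this.addClassToHast(node, 'my-code-block')
      },
      // Tag each line element with its line number
      line(node, line) {
        node.properties['data-line'] = line
      },
    },
  ],
})
```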

Dark/light mode support was a frequently requested feature. Because of the static approach Shiki takes, it isn't possible to change the theme on the fly at render time. In the past, the solution was either to generate the highlighted HTML twice and toggle their visibility based on the user's preference, which isn't efficient as it duplicates the payload, or to use a CSS-variables theme, which loses the granular highlighting Shiki is great at. With Shikiji's new architecture, I took a step back, rethought the problem, and came up with the idea of breaking down the common tokens and merging multiple themes as inlined CSS variables, which provides efficient output while staying aligned with Shiki's philosophy. You can learn more about it in Shiki's documentation.
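With this approach, dual themes boil down to passing multiple themes and letting a bit of client-side CSS decide which one applies. A sketch based on the v1 dual-themes API:

```ts
import { codeToHtml } from 'shiki'

// Tokens shared by both themes are emitted once; colors for the alternate
// theme are attached as inline CSS variables (e.g. --shiki-dark).
const html = await codeToHtml(`const mode = 'auto'`, {
  lang: 'ts',
  themes: {
    light: 'vitesse-light',
    dark: 'vitesse-dark',
  },
})
```

On the client, a small CSS snippet (switching on a dark class or the prefers-color-scheme media query) decides which set of variables takes effect; see Shiki's documentation for the exact rules.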

To make the migration easier, we also created the shikiji-compat compatibility layer, which builds on Shikiji's new foundation and provides backward-compatible APIs so existing Shiki users can migrate more easily.

Getting Shikiji to work on Cloudflare Workers posed one extra challenge: Cloudflare Workers does not support initializing a WASM instance from inlined binary data; for security reasons, it requires importing a static .wasm asset. This means our "all-in-ESM" approach does not work well on CloudFlare. It would require extra work from users to provide a different WASM source, making the experience not as smooth as we expected. At this point, Pooya Parsa stepped in and created the universal layer unjs/unwasm, which supports the upcoming WebAssembly/ES Module Integration proposal. It has been integrated into Nitro to provide automated WASM targets. We hope unwasm helps the ecosystem get a better experience with WASM, one that also aligns with the standard and stays future-proof.

Overall, the Shikiji rewrite worked out quite well: tools like Nuxt Content, VitePress and Astro have migrated to it, and the feedback we received has been very positive. That confirms the rewrite is the right direction to go, and it should solve many of the problems other users have had in the past.

Merging Back

While I was already a Shiki team member and helped with releases from time to time, Pine was still leading Shiki's direction. As he was busy with other things, Shiki's iteration had slowed down. During the Shikiji experiments, I proposed a few improvements that could give Shiki a more modern structure. While everyone generally agreed with that direction, there was quite a lot of work to do and no one had really started on it.

While we were happy using Shikiji to solve our own problems, we certainly didn't want to see the community split between two different versions of Shiki. So I got on a call with Pine, explained what we had changed in Shikiji and what problems it solves. We reached a consensus to merge the two projects into one, and soon we had:

feat!: merge Shikiji back into Shiki for v1.0 #557

We are really happy to see our work in Shikiji merged back into Shiki, benefiting not only ourselves but the entire community. The merge resolves around 95% of the open issues Shiki had accumulated over the years:

Shikiji Merged Back to Shiki

Shiki now also has a brand new documentation site where you can play with it right in your browser (thanks to the agnostic approach!). Many frameworks now have built-in Shiki integrations, so you might already be using it somewhere!

Twoslash

Twoslash is an integration tool that retrieves type information from the TypeScript Language Service and renders it into your code snippets. It essentially gives your static code snippets hover type information similar to what you get in the VS Code editor. It was made by Orta Therox for the TypeScript documentation site, and you can find the original source code here. Orta also created the Twoslash integration for Shiki v0.x. Back then, Shiki didn't have a proper plugin system, so shiki-twoslash had to be built as a wrapper over Shiki, which made it a bit hard to set up since existing Shiki integrations wouldn't work directly with Twoslash.
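With the new transformer system, a Twoslash integration can plug straight into any Shiki setup. A sketch based on the @shikijs/twoslash transformer (it also needs its CSS and a renderer for the hover popups, omitted here):

```ts
import { codeToHtml } from 'shiki'
import { transformerTwoslash } from '@shikijs/twoslash'

const html = await codeToHtml(`const greeting: string = 'hello'`, {
  lang: 'ts',
  theme: 'vitesse-dark',
  transformers: [transformerTwoslash()],
})
```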

// TODO:

Conclusions

Our mission at Nuxt is not only to make a better framework for developers, but also to make the entire frontend and web ecosystem a better place. We keep pushing the boundaries and endorsing modern web standards and best practices. We hope you enjoy the new Shiki, unwasm, Twoslash and the many other tools we made in the process of making Nuxt and the web better.