Every example I thought "yeah, this is cool, but I can see there's space for improvement" — and lo! did the author satisfy my curiosity and improve his technique further.
Bravo, beautiful article! The rest of this blog is at this same level of depth, worth a sub: https://alexharri.com/blog
> It may seem odd or arbitrary to use circles instead of just splitting the cell into two rectangles, but using circles will give us more flexibility later on.
I still don’t really understand why the inner part of the rectangle can’t just be split in a 2x3 grid. Did I miss the explanation?
It reminds me quite a bit of collision engines for 2D physics/games. Could probably find some additional clever optimisations for the lookup/overlap (better than kd-trees) if you dive into those. Not that it matters too much. Very cool.
This is amazing all round - in concept, writing, and coding (both the idea and the blog post about it).
I feel confident stating that - unless fed something comprehensive like this post as input, and perhaps not even then - an LLM could not do something novel and complex like this, and will not be able to for some time, if ever. I’d love to read about someone proving me wrong on that.
There's a lot of nitty gritty concerns I haven't dug into: how to make it fast, how to handle colorspaces, or like the author mentions, how to exaggerate contrast for certain scenes. But I think 99% of the time, it will be hard to beat chafa. Such a good library.
I did something very similar to this (searching for similar characters across the grid, including some fuzzy matching for nearby pixels) around 1996. I wonder if I still have the code? It was exceedingly slow, think minutes for a frame at the Pentiums of the time.
What a great post. There is an element of ascii rendering in a pet project of mine and I’m definitely going to try and integrate this work. From great constraints comes great creativity.
What about the explanation presented in the next paragraph?
> Consider how an exponent affects values between 0 and 1. Numbers close to experience a strong pull towards while larger numbers experience less pull. For example 0.1^2=0.01, a 90% reduction, while 0.9^2=0.81, only a reduction of 10%.
That's exactly the reason why it works, it's even nicely visualized below. If you've dealt with similar problems before you might know this in the back of your head. Eg you may have had a problem where you wanted to measure distance from 0 but wanted to remove the sign. You may have tried absolute value and squaring, and noticed that the latter has the additional effect described above.
It's a bit like a math undergrad wondering about a proof 'I understand the argument, but how on earth do you come up with this?'. The answer is to keep doing similar problems and at some point you've developed an arsenal of tricks.
In general for analytic functions like e^x or x^n the behaviour of the function on any open interval is enough to determine its behaviour elsewhere. By extension in mathematics examining values around the fundamental additive and multiplicative units \{ 0, 1 \} is fruitful in illustrating of the quintessential behaviour of the function.
Author here. There isn't a library around this yet, but the source code for the blog is open source (MIT licensed): https://github.com/alexharri/website
The code for this post is all in PR #15 if you want to take a look.
It's important to note that the approach described focuses on giving fast results, not the best results.
Simply trying every character and considering their entire bitmap, and keeping the character that reduces the distance to the target gives better results, at the cost of more CPU.
This is a well known problem because early computers with monitors used to only be able to display characters.
At some point we were able to define custom character bitmap, but not enough custom characters to cover the entire screen, so the problem became more complex.
Which new character do you create to reproduce an image optimally?
And separately we could choose the foreground/background color of individual characters, which opened up more possibilities.
Yeah, this is good to point out. The primary constraint I was working around was "this needs to run at a smooth 60FPS on mobile devices" which limits the type and amount of work one can do on each frame.
I'd probably arrive at a very different solution if coming at this from a "you've got infinite compute resources, maximize quality" angle.
Thinking more about the "best results". Could this not be done by transforming the ascii glyphs into bitmaps, and then using some kind of matrix multiplication or dot production calculation to calculate the ascii character with the highest similarity to the underlying pixel grid? This would presumably lend itself to SIMD or GPU acceleration. I'm not that familiar with this type of image processing so I'm sure someone with more experience can clarify.
You said “best results”, but I imagine that the theoretical “best” may not necessarily be the most aesthetically pleasing in practice.
For example, limiting output to a small set of characters gives it a more uniform look which may be nicer. Then also there’s the “retro” effect of using certain characters over others.
And a (the?) solution is using an algorithm like k-means clustering to find the tileset of size k that can represent a given image the most faithfully. Of course that’s only for a single frame at a time.
In the appendix, he talks about reducing the lookup space by quantising the sampled points to just 8 possible values. That allowed him to make a look up table about 2MB in size which were apparently incredibly fast.
I've been working on something similar (didn't get to this stage yet) and was planning to do something very similar to the circle-sampling method but the staggering of circles is a really clever idea I had never considered. I was planning on sampling character pixels' alignment along orthogonal and diagonal axes. You could probably combine these approaches. But yeah, such an approach seemed particularly powerful for the reason you could encode it all in a table.
Nice! Now add colors and we can finally play Doom on the command line.
More seriously, using colors (not trivial probably, as it adds another dimension), and some select Unicode characters, this could produce really fancy renderings in consoles!
At least six dimensions, right? For each character, color of background, color of foreground, and each color has at least three components. And choosing how the components are represented isn’t trivial either - RGB probably isn’t a good choice. YCoCg?
I had been thinking of messing around with a DOM-based ‘console’ in Tauri that could handle a lot more font manipulation for a pseudo-TUI application similar to this. It's definitely possible! It would be even simpler to do in TS.
Bravo, beautiful article! The rest of this blog is at this same level of depth, worth a sub: https://alexharri.com/blog
> It may seem odd or arbitrary to use circles instead of just splitting the cell into two rectangles, but using circles will give us more flexibility later on.
I still don’t really understand why the inner part of the rectangle can’t just be split in a 2x3 grid. Did I miss the explanation?
I feel confident stating that - unless fed something comprehensive like this post as input, and perhaps not even then - an LLM could not do something novel and complex like this, and will not be able to for some time, if ever. I’d love to read about someone proving me wrong on that.
It reminds me of how chafa uses an 8x8 bitmap for each glyph: https://github.com/hpjansson/chafa/blob/master/chafa/interna...
There's a lot of nitty gritty concerns I haven't dug into: how to make it fast, how to handle colorspaces, or like the author mentions, how to exaggerate contrast for certain scenes. But I think 99% of the time, it will be hard to beat chafa. Such a good library.
EDIT - a gallery of (Unicode-heavy) examples, in case you haven't seen chafa yet: https://hpjansson.org/chafa/gallery/
and damn that article is so cool, what a rabbithole.
I think there's a small problem with intermediate values in this code snippet:
Replace x by value.GitHub: https://github.com/symisc/ascii_art/blob/master/README.md Docs: https://pixlab.io/art
It probably has a different looking result, though.
How do you arrive at that? It's presented like it's a natural conclusion, but if I was trying to adjust contrast... I don't see the connection.
> Consider how an exponent affects values between 0 and 1. Numbers close to experience a strong pull towards while larger numbers experience less pull. For example 0.1^2=0.01, a 90% reduction, while 0.9^2=0.81, only a reduction of 10%.
That's exactly the reason why it works, it's even nicely visualized below. If you've dealt with similar problems before you might know this in the back of your head. Eg you may have had a problem where you wanted to measure distance from 0 but wanted to remove the sign. You may have tried absolute value and squaring, and noticed that the latter has the additional effect described above.
It's a bit like a math undergrad wondering about a proof 'I understand the argument, but how on earth do you come up with this?'. The answer is to keep doing similar problems and at some point you've developed an arsenal of tricks.
The code for this post is all in PR #15 if you want to take a look.
https://github.com/cacalabs/libcaca
Simply trying every character and considering their entire bitmap, and keeping the character that reduces the distance to the target gives better results, at the cost of more CPU.
This is a well known problem because early computers with monitors used to only be able to display characters.
At some point we were able to define custom character bitmap, but not enough custom characters to cover the entire screen, so the problem became more complex. Which new character do you create to reproduce an image optimally?
And separately we could choose the foreground/background color of individual characters, which opened up more possibilities.
I'd probably arrive at a very different solution if coming at this from a "you've got infinite compute resources, maximize quality" angle.
For example, limiting output to a small set of characters gives it a more uniform look which may be nicer. Then also there’s the “retro” effect of using certain characters over others.
More seriously, using colors (not trivial probably, as it adds another dimension), and some select Unicode characters, this could produce really fancy renderings in consoles!