The article’s title is misleading: “The Man Who Created a Written Language for the Cherokee Did It So Efficiently and Elegantly, His Peers Thought It Was Magic.”
His peers thought it was magic because they were unfamiliar with the concept of writing, not because his writing system was so efficient. He was put on trial for witchcraft because people thought he was communicating via magic. https://education.nationalgeographic.org/resource/sequoyah-a....
For those just encountering this like me, the man in question was Sequoyah, a monolingual Cherokee. His own tribe put him on trial, with his chief presiding.
Slightly different from what I’d normally assume had happened from just reading the above comment.
Really impressive on his part: he basically saw it was possible, looked at some examples of what others had done, then got to work.
The notion that Sequoyah was a monolingual Cherokee is dubious. He had a European father (though he was raised with his mother) and worked as a trader and served in the U.S. Army. His cousin, to whom he presented his syllabary, was also half European, “George Lowery.” He had extensive contact with Europeans. Moreover, his syllabary includes adaptations of Latin, Greek, and Cyrillic letters. Part of the story is that he copied some character shapes from his wife’s family’s Bible. (Presumably they could read English if they had a Bible.)
He was obviously exposed to a variety of European writing. He completed his syllabary in 1821, many years after his military service. It seems highly unlikely that someone so linguistically gifted as to be able to invent a syllabary would not have picked up some familiarity with spoken and written English through that exposure.
This article does a good job of reviewing the conflicting narratives of his history: https://www.jstor.org/stable/26467045. It’s all very uncertain, and there’s a lot of mythology.
> Eventually, he hit on 86 syllables that expressed specific sounds, each syllable represented by symbols borrowed from Greek, Hebrew and English. Later reduced to 85 symbols...
> The syllabary was widely lauded, as its phonetic accuracy and simplicity made it far easier to grasp than English.
I mean, that feels like it's bound to happen when an alphabet is built to represent current language or pronunciation. English is notoriously awful for not doing that.
Fun fact: all (non-Cherokee?) alphabets in use today stem from an ancient Canaanite alphabet called the proto-Sinaitic script [1]. This is why Hebrew's alphabet near-perfectly phonetically represents the spoken language: Hebrew is just a dialect of Canaanite, and all Canaanite dialects are mutually intelligible, and alphabets were invented to represent spoken Canaanite. As the alphabet was cribbed by the Greeks (who were taught a simplified version by seafaring Canaanites — the Phoenicians — and termed it the "Phoenician alphabet" [2] despite the Phoenicians not specifically inventing it), significant alterations had to be made and it's been an imperfect match for most Western languages ever since.
"This is why Hebrew's alphabet near-perfectly phonetically represents the spoken language" - nonsense. That's just because modern Hebrew is based on the written language and thus reflects spelling pronunciation rather than historical pronunciation.
Also, proto-Sinaitic is not an alphabet. That's why Persian writing became harder to read when they switched from the nearly alphabetic Old Persian cuneiform to Aramaic abjad descended from proto-Sinaitic.
It wasn't directly cribbed (unlike Western alphabets), but given that Hangul was invented in the 1400s after exposure to Western alphabets, most scholars still consider alphabets to have only been invented once [1] and then copied, much like the wheel. Although I suppose that's true of Cherokee too!
My understanding is that it’s the earliest known alphabet but not the ancestor of all alphabetic scripts, as there are Asian and other scripts that are not derived from Western or Arabic alphabets. The Greek and Latin alphabets and their descendants are based on it. Japanese hiragana and katakana, by contrast, are syllabic scripts derived from kanji (and ultimately Chinese characters) as a simplification, not from proto-Sinaitic. Others are possibly linked, like Thai and Khmer, through an Aramaic -> Brahmi -> Pallava -> Khmer chain, but the Brahmi link is not fully established.
No: most scholars believe alphabets were only invented once, much like the wheel. All Western alphabets are direct descendants, and the non-Western alphabets were directly inspired by it. [1]
Phonetic alphabets were introduced to most of Asia by various Brahmic scripts; the most widely-used (albeit briefly-used) one being the Mongolian Phags-pa script [2], derived from Tibetan, derived from various Brahmic scripts, derived from Aramaic, derived from Phoenician, derived from — sure enough — proto-Sinaitic. Thai and Khmer are derived from Pallava [3], which is derived from Tamil-Brahmi, derived from other Brahmic scripts, again derived from Aramaic and thus eventually from proto-Sinaitic; etc etc.
Technically, the proto-Sinaitic script is an abjad, with the Greek alphabet being the first true alphabet (symbols for both consonants and vowels).
Proto-Sinaitic/Phoenician can be described as the “first alphabetic system,” Greek the “first true alphabet.”
Fun fact: Greek is the world’s oldest recorded living language.
The Greek alphabet has been in use for approximately 2,800 years; previously, Greek was recorded in writing systems such as Linear B and the Cypriot syllabary.
"and all Canaanite dialects are mutually intelligible": That is the definition of a dialect.
Also, I don't know how you can claim Hebrew is phonetically represented by its alphabet rather than the other way around: as a revived language, its pronunciations are largely a matter of convention based on Yiddish. It would be more accurate to say that modern Hebrew uses an ancient writing system, which happens to be closely related to the ancestor of modern European alphabets.
This is like speciation but for languages: there's no "ah-ha!" moment, but we know a lemur can't produce viable offspring with a zebra. Likewise we know Italian isn't French even though some words are kinda similar. If you want to be technical about it, it's a spectrum: I understand British people and people from the American deep South, but it's far from certain they will understand each other. Hard to be precise with social sciences.
That said, two people who understand each other are, by any reasonable definition, speaking dialects of the same tongue (if not, obviously, the very same dialect).
Egyptian hieroglyphics already had alphabetic elements, and the Canaanites borrowed those: https://en.wikipedia.org/wiki/Egyptian_hieroglyphs (“Egyptian hieroglyphs are the ultimate ancestor of the Phoenician alphabet, the first widely adopted phonetic writing system”).
Egyptian hieroglyphs were not an alphabet, even if they had alphabetic elements (in addition to pictographic ones). Scholars generally agree that proto-Sinaitic was the first alphabet, and that all alphabets used today are either direct descendants of it or directly inspired by it. https://en.wikipedia.org/wiki/History_of_the_alphabet
The article said nothing like that. It was an original script invented by a Tibetan monk on commission from Kublai Khan.
> Descending from Tibetan script, it is part of the Brahmic family of scripts, which includes Devanagari and scripts used throughout Southeast Asia and Central Asia.[5] It is unique among Brahmic scripts in that it is written from top to bottom,[5] as how classical Chinese used to be written, and as the Mongolian alphabet or later Manchu alphabet is still written.
> The origin of the script is still much debated, with most scholars stating that Brahmi was derived from or at least influenced by one or more contemporary Semitic scripts. Some scholars favour the idea of an indigenous origin,[19] or connection to the much older and as yet undeciphered Indus script[20][21] but the evidence is insufficient at best.
There's an International Phonetic Alphabet for transcribing speech literally.[1]
Automation is now available. Languages to IPA, IPA to various languages, text to speech, speech to text, evaluation of pronunciation.
The IPA still relies on convention to transcribe sounds. There are plenty of academic papers out there describing lesser-studied languages and, where those conventions don't yet exist, the papers often contradict each other.
A writing system that used strict phonetic transcription for everything would be unusably bad. Everyone pronounces words differently than the writing system prescribes, in every language. Words are shortened and blended together constantly in connected speech.
That's not quite what I meant by unusably bad, though that does have its own set of challenges for sure. I was just in Toronto for the first time and appreciated the designers of the Ojibwe Latin alphabet for pulling it off without diacritics.
What's happening with your example is just that the symbols chosen for the phonemic transcription are non-Latin so they're unfamiliar to read aloud and harder to type for non-speakers. What I meant was if we all wrote with all of our individual idiosyncrasies of speech without converging on a prescribed standard (a writing system separate from speech transcription).
"Amnu ge sum'm frum upsterz, gimmi u sek" but even more so, with IPA characters for all the 40-odd individual sounds of my dialect of English - then you write your response in the same level of phonetic detail. Exactly what a writing system shouldn't do.
English is three* languages in a trenchcoat. All languages borrow, but English in particular is a cobbled-together mess. It's like a sailors' pidgin, except that instead of sailors it was driven by the ruling class of Britain, at the boundary of several language families who kept conquering each other.
English is a West Germanic language with vocabulary from other languages, primarily French and Latin. But most of the core words are Germanic. It is not a pidgin, whose defining feature is simplified grammar.
"Ye Olde Mill" or whatever archaic silliness you'll find at fairs and whatnot was the result of the printing press dropping þ (as in þe, þ is just th-) and was never supposed to be pronounced with a "y" sound.
The "ye" in "Ye Olde" was not the same word as the one in "Hear ye, hear ye!"; that "ye" is a plural 'you', basically the same word as "y'all", and never had a thorn.
It’s great compression: Y sometimes a vowel, sometimes a consonant.
And while not encoded on a keyboard, it still blows my mind that English has a crazy number of past tenses, and such a bad hack of a future tense that it’s hard to classify as such.
You just press backspace and hit the accent mark key or for a printing press stack the accent mark on top of the letter. People ditched accents because they were rarely used in English writing (only really being used for some loanwords), not because simplifications were forced by typewriters or the printing press (which handle non-English languages just fine).
Amazing: "By 1825, the majority of Cherokees could read and write in their newly developed orthography.[5]". It even has a reference so it must be true.
Claude Shannon talks about this in A Mathematical Theory of Communication. He defines redundancy as one minus relative entropy, where relative entropy is the ratio of the language's actual average uncertainty per symbol to the maximum possible uncertainty if all alphabet symbols were completely random and equally likely.
He gives some rather cute examples, like the language of Finnegans Wake by Joyce being very low redundancy (high efficiency, in your words). He also notes that redundancy limits crossword puzzles: with zero redundancy any grid of letters is a valid puzzle, with too much redundancy large puzzles become impossible, and his rough estimates are that 2-d puzzles are just possible at about 50% redundancy and 3-d puzzles at about 33%. This has always been one of my favorite, and to my mind most random, corollaries in a paper.
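The definition above (redundancy = 1 - H / H_max) is easy to try out. Here is a minimal Python sketch, with the loud caveat that it only uses single-character frequencies from one sample string, so it is a crude first-order estimate and not Shannon's figure for English, which relies on longer-range statistics. The function name `redundancy` and its `alphabet_size` parameter are made up for illustration.

```python
import math
from collections import Counter
from typing import Optional


def redundancy(text: str, alphabet_size: Optional[int] = None) -> float:
    """First-order estimate of 1 - H / H_max for the symbols in `text`.

    Assumption: symbols are treated as independent, so only single-symbol
    frequencies count. `alphabet_size` defaults to the number of distinct
    symbols actually seen in the sample.
    """
    if not text:
        raise ValueError("text must not be empty")
    counts = Counter(text)
    total = sum(counts.values())
    # Average uncertainty per symbol, in bits.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    k = alphabet_size if alphabet_size is not None else len(counts)
    if k < 2:
        raise ValueError("need at least two symbols in the alphabet")
    # Maximum uncertainty: all k symbols equally likely.
    max_entropy = math.log2(k)
    return 1 - entropy / max_entropy


print(redundancy("abcd"))                   # every symbol equally likely
print(redundancy("abab", alphabet_size=4))  # only half the alphabet is used
```

Passing a fixed `alphabet_size` (say 26 for letters only) makes samples comparable; leaving it out measures the sample against its own observed symbols.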
I feel like you're going to run up against the definitions of "efficient" and "compression".
For example, a language with a larger alphabet will be able to express more in fewer characters. Is that more efficient?
Similarly, you could think of each word as a sort of lookup table for information in the mind of the reader. We don't define words as we're writing, we expect the speaker to know them already. If a language has more words, each word is more precise, and fewer words can be used to express an idea—but is that efficiency? You're just relying on the reader having more preexisting knowledge.
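To put a rough number on the alphabet-size point: with k distinct symbols there are k^n strings of length n, so the shortest fixed length that can tell N messages apart is the smallest n with k^n >= N. A minimal Python sketch follows; `min_length` is a hypothetical helper, and the figures (26 letters versus the 85 syllabary symbols quoted earlier, one million messages) are only illustrative. It counts raw strings and says nothing about how real languages actually use their symbols.

```python
def min_length(alphabet_size: int, messages: int) -> int:
    """Smallest n such that alphabet_size ** n >= messages."""
    if alphabet_size < 2 or messages < 1:
        raise ValueError("need at least 2 symbols and 1 message")
    n, capacity = 0, 1
    # Grow the string length until there are enough distinct strings.
    while capacity < messages:
        capacity *= alphabet_size
        n += 1
    return n


for size in (26, 85):
    print(size, min_length(size, 1_000_000))
```

The gain from a bigger alphabet is only logarithmic, which is one reason "fewer characters" alone is a shaky measure of efficiency.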
You are citing a single use of em-dashes in a single 30 year old article as proof of something.
If anything, the length of that article shows how rarely em-dashes were used by most writers. They're like exclamatory versions of semicolons, a contrived sudden interruption, a sort of inversion of the three dot "…" ellipsis. Maybe the em-dash cracked and fell on the floor.
The reason LLMs use a lot of em-dashes is because that's a format they've chosen for output. Thinking that LLMs have a lot of em-dashes because works in the wild have a lot of em-dashes is like thinking that LLM output has a lot of emoticons because a lot of essayists use emoticons to mark subject divisions in the text.
If you read a lot of books, particularly older ones, you'll find em dashes in all kinds of writing, and used often. It's functional punctuation; once you understand it, you may even find yourself using it (and then being accused of being an AI, lol).
Maybe the symbols themselves aren't new.
https://en.wikipedia.org/wiki/Cherokee_syllabary#Unicode
You (or anyone else here) might enjoy Omniglot, an old web 1.0 site that was amazing for its comprehensive treatment of all writing scripts:
https://www.omniglot.com/writing/cherokee.htm
1: https://en.wikipedia.org/wiki/Proto-Sinaitic_script
2: https://en.wikipedia.org/wiki/Phoenician_alphabet
1: https://en.wikipedia.org/wiki/History_of_the_alphabet
https://en.wikipedia.org/wiki/Bopomofo
1: https://en.wikipedia.org/wiki/History_of_the_alphabet
2: https://en.wikipedia.org/wiki/%CA%BCPhags-pa_script
3: https://en.wikipedia.org/wiki/Pallava_script
See https://en.wikipedia.org/wiki/Revival_of_the_Hebrew_language
I dunno, some English dialects don't seem particularly intelligible to me, and I'm a natively fluent speaker of it.
That said, two people who understand each other are, by any reasonable definition, speaking dialects of the same tongue (if not, obviously, the very same dialect).
Counterexample: Korean Hangul [0]
[0] https://en.wikipedia.org/wiki/Hangul
https://www.amazon.com/A-to-Z-Season-1/dp/B0CWCHTM3B
Episode 2 then covers the printing press.
https://en.wikipedia.org/wiki/%CA%BCPhags-pa_script
> Descending from Tibetan script, it is part of the Brahmic family of scripts, which includes Devanagari and scripts used throughout Southeast Asia and Central Asia.[5] It is unique among Brahmic scripts in that it is written from top to bottom,[5] as how classical Chinese used to be written, and as the Mongolian alphabet or later Manchu alphabet is still written.
https://en.wikipedia.org/wiki/%CA%BCPhags-pa_script
> The origin of the script is still much debated, with most scholars stating that Brahmi was derived from or at least influenced by one or more contemporary Semitic scripts. Some scholars favour the idea of an indigenous origin,[19] or connection to the much older and as yet undeciphered Indus script[20][21] but the evidence is insufficient at best.
https://en.wikipedia.org/wiki/Brahmi_script
So maybe, but probably not; and this particular script, though it has roots elsewhere of debated origin, was an original, spontaneous creation.
[1] https://easypronunciation.com/en/english-phonetic-transcript...
A writing system that used strict phonetic transcription for everything would be unusably bad. Everyone pronounces words differently than the writing system prescribes, in every language. Words are shortened and blended together constantly in connected speech.
This is, for better or worse, what is being done to incorporate aboriginal names into things like streets and bridges in places like Vancouver.
- [stal̕əw̓asəm Bridge](https://en.wikipedia.org/wiki/Stal%CC%95%C9%99w%CC%93as%C9%9...)
- [šxʷməθkʷəy̓əmasəm Street](https://vancouver.ca/news-calendar/musqueamview-street-signs...)
I see the practicalities of adopting this IPA-lite form, but it's a struggle to use, even though I've previously been trained in IPA.
*(or 7 or whatever number makes you feel best)
In Unicode, that's ſ and þ. Both historical English letters that are no longer used.
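For anyone who wants to check the code points, Python's standard `unicodedata` module can look up the official character names. A small sketch; the helper name `describe` is just for illustration, and the Cherokee letter is included only because the thread is about the syllabary, which has its own Unicode block.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# Long s, thorn, and the first letter of the Cherokee block.
for ch in ("ſ", "þ", "Ꭰ"):
    print(ch, describe(ch))
```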
Linguistics is fun. The accents are alright.
This was caused by the printing press and the typewriter (keyboard) both of which forced simplifications in the written English language.
https://people.math.harvard.edu/~ctm/home/text/others/shanno...
[0]https://en.wikipedia.org/wiki/Ithkuil
[1]https://news.ycombinator.com/item?id=29036441
Now, if you want to say that they wrote in the same annoyingly pretentious way that AIs often do, I could agree with that...
https://www.smithsonianmag.com/arts-culture/review-of-the-pr...
Where do you think LLMs learned these things from? They are widely used in literary writing, like magazines and books.