Roman Letters

(romanletters.org)

107 points | by diodorus 3 days ago

5 comments

vessenes 1 day ago
A few things about AI-led projects like this come to my mind — first, it’s cool to see all this pulled together. I’m sure the design will read “Claude 2026” soon, but that’s fine - it’s clean and generally has reasonable UX.
There are some real rough spots - for instance, the Latin texts are generated via OCR from scanned documents directly; they’re not from some other scholarly corpus that’s been checked. I only looked at a few, but they all have significant transcription difficulties. Sources are linked, and those sources seem to be archive.org scans. Of course, getting a fluid-sounding translation out of a somewhat shitty transcription is something AI will do for you happily, but it’s harder to get it to tell you where it’s gone off the rails.
That’s not the thing that comes to mind, though. What comes to mind is that projects like this are super useful scaffolding, and I hope it’s built as such. Transcription will get better. Actually I’m pretty sure it could be better now, given the output quality. Translations of better transcriptions will be better. Plus we will likely have higher quality translation tech available.
So, I’d like to see a project like this lean into that iterative side of this kind of scholarship/hobby/historical work and make versioning and logging of updates part of the interface. Starting in the late 1990s, many academic projects did this with large corpora of documents (I’m familiar at least with the Yale Jonathan Edwards project) and used crowdsourced support. There’s no reason not to include facilities that interleave the AI and interested Latin/Roman scholars here.
In my mind with that done, this could turn into a genuinely useful tool. Which would be cool!
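A minimal sketch of what "versioning and logging of updates" could look like per letter, assuming each transcription pass (OCR, LLM, or human correction) is stored as a revision. The names `Revision` and `LetterHistory` are hypothetical and are not taken from the project's repository.

```python
import difflib
from dataclasses import dataclass, field
from typing import List


@dataclass
class Revision:
    text: str    # full transcription text at this revision
    author: str  # e.g. "ocr", "llm", or a human editor's handle
    note: str    # short log message describing the change


@dataclass
class LetterHistory:
    letter_id: str
    revisions: List[Revision] = field(default_factory=list)

    def add(self, text: str, author: str, note: str) -> None:
        """Append a new revision; earlier revisions are never overwritten."""
        self.revisions.append(Revision(text, author, note))

    def current(self) -> str:
        """Return the text of the most recent revision."""
        return self.revisions[-1].text

    def diff(self, older: int, newer: int) -> List[str]:
        """Return a unified diff between two revisions, line by line."""
        a = self.revisions[older].text.splitlines()
        b = self.revisions[newer].text.splitlines()
        return list(difflib.unified_diff(a, b, lineterm=""))
```

Keeping every revision with its author makes it possible to show readers which parts came from OCR, which from a model, and which from a human correction.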
- craig_vg 4 hours ago
  Thank you so much for the feedback on this. I just implemented some updates that should better track our iterations and changes to the project. Obviously, GitHub does some tracking, but I've added some more formal updates.
  Additionally, I completely agree on the OCR issues. Long-term, the goal is to use higher quality OCR and a broader set of data to make this even more accessible to people.
  I'll note that the primary goal is not so much scholarship as giving access to hobbyists or people interested in the data itself. If the data could get to the point where it is useful to scholars, of course that would be something I'd like to achieve.
- wongarsu 1 day ago
  I haven't checked any texts from the 500s. But I did some work with texts from the 1700s. Most of them had terrible transcriptions on archive.org, made using old tesseract versions. You could probably improve a lot with newer tesseract versions. I went for the nuclear option and just passed the image of each page (along with some context on how the previous page ended) to Qwen2.5vl:32b and got near-perfect transcriptions. As you can tell from the old model, that was months ago, and vision models have only gotten better since.
  Of course, in some cases vision models are a liability for OCR because the errors they do make come out as plausible-sounding substitutions instead of alphabet soup. But if you only use the transcription as input for an LLM, that doesn't matter. It only becomes an issue of how much compute you are willing to throw at it.
  - vessenes 1 day ago
    Yes, exactly. What could be durable is not the specific transcription as of today - until it’s perfect or at least ‘good enough’ - but the web site, comments, and process that can be run and turn into improved results - that part seems likely to be valuable to me.
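The page-by-page approach described in this subthread (send each page image to a vision model, along with how the previous page ended) can be sketched roughly as follows. This is an illustration under stated assumptions, not wongarsu's actual code: `model` is a hypothetical callable wrapping whatever vision model is used, and the prompt wording and `tail_chars` default are made up.

```python
from typing import Callable, Iterable, List

# Hypothetical signature: (page_image_bytes, prompt) -> transcribed text.
# This stands in for a real vision-model client; it is not a library API.
VisionModel = Callable[[bytes, str], str]


def build_prompt(previous_tail: str) -> str:
    """Build the instruction, adding the end of the prior page as context."""
    base = "Transcribe this page exactly as printed. Do not correct or modernize spelling."
    if previous_tail:
        return base + "\nThe previous page ended with: " + previous_tail
    return base


def transcribe_book(
    pages: Iterable[bytes], model: VisionModel, tail_chars: int = 200
) -> List[str]:
    """Transcribe pages in order, carrying the tail of each page forward."""
    results: List[str] = []
    previous_tail = ""
    for image in pages:
        text = model(image, build_prompt(previous_tail))
        results.append(text)
        # Keep only the last few characters as context for the next page.
        previous_tail = text[-tail_chars:]
    return results
```

Carrying the tail forward gives the model a chance to resolve words and sentences split across a page break, at the cost of processing pages sequentially.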
Rendello 3 days ago
What a cool project, I like this one where Pliny the Younger complains about a no-show at his dinner party:
https://romanletters.org/letters/pliny_younger/1015/
- zeusdclxvi 1 day ago
  This appears to be written to this guy: https://en.wikipedia.org/wiki/Gaius_Septicius_Clarus
- CGMthrowaway 3 days ago
  Had to look up "sow's matrices."
  > A "sow's matrix" (or vulva in Latin) is a dish from ancient Rome consisting of the uterus of a sow (a female pig), often specifically from one that has never farrowed or that was slaughtered shortly after farrowing. It was considered a delicacy among the wealthy elite and was a common dish served at lavish Roman banquets and dinner parties, often used as a sign of luxury, wealth, and status.
serious_angel 1 day ago
The website looks like any LLM ("AI") generated one, usually via Claude, considering the design that model frequently uses.
And it is (300,755++ lines from Claude): https://github.com/CraigVG/roman-letters-network
Here, I am sorry, but I just cannot consider it serious or accountable, since I cannot trust its data.
If all the information there is valid and verified, every single letter and the authors' word after the LLM's processing, then the "AI" may be dimmed.
Yet, I don't believe so, knowing how unlimitedly every subjective word may change contexts, and using objectified and limited LLM for it?
There's a `?scholarly=true` GET parameter mentioned in `:/CLAUDE.md`, but a quick check of its behavior didn't show any change.
Regardless, the idea and overall intention that highlights the impact and importance of history, and presents connections between infinitely unique and miraculous people around the infinite world... where every single word carries a life moment... is ineffably magnificent...
Thank you, Craig Vander Galien, for the idea and your love of history!
---
```
    > Modern English translations were produced using Claude (Anthropic), working from either the Latin/Greek original or an existing 19th-century English version. Translation work was guided by two internal documents: a translation guide covering late antique epistolary conventions, rhetorical register, and how to handle common formulaic phrases; and a modern voice guide specifying tone, vocabulary level, and how to avoid archaism while remaining faithful to the original.
    > 
    > AI-generated translations are clearly marked in the interface. They are provided for accessibility and research convenience, not as authoritative scholarly translations. The original Latin or Greek is preserved alongside every translation, and 19th-century English versions are shown where available. Corrections from domain experts are welcome.
    > 
    > Source: https://romanletters.org/about/
```
- craig_vg 4 hours ago
  Serious Angel, thank you for the feedback on the project. The scholarly parameter should be working again now.
  Overall, I completely agree with your criticisms about the LLM nature of this. Yes, the project is completely coded by Claude. It's a side project that I threw together based on my love of history. I'm not an academic nor a researcher, but I do want to provide value for those who have a hobbyist-level interest in Roman history. If the project can reach the quality level required for real scholarship, I'd like to achieve that. If not, I want to be clear that it's not at that level.
  On that note, I have tried to include original sources wherever possible. Wherever an LLM does translation, it is noted in the user interface, as you also quoted from my methodology.
  Thanks again, and if you have any direct feedback or changes you'd like to see, I would love to hear it.
- yreg 1 day ago
  The design is good. It is unoriginal but not every project needs to use an original design.
  - Igrom 1 day ago
    serious_angel is not contending with you that the design is bad, or that it is bad because it is unoriginal. In fact, they are not even specifically calling out the design.
    They have noticed the design, recognized it as the output of an LLM, then proceeded to discover that an LLM was involved in much of the creation of the project. This is an academic project. Whatever the pedigree of the researcher is, this implies to the grandparent that the final result of the work may be amateurish or worse, to an extent generated. Therefore, they're concerned that it puts the legitimacy of the research outcomes in question (e.g. completeness, contents of letters, classification, maybe even hallucinations in the thesis proper).
    Preemptive arguments:
    1. "The author's a researcher, not a programmer; therefore it's fine to use an LLM. It is preposterous to ask each researcher to learn web development to publish their research." You are right, but given the number of vibe-coded websites we see, all with the same default (Astro?) style, the grandparent all the same has the right to associate that style with untrustworthy crap. I'm not saying that this academic website is necessarily crap. However, I think it's useful for the grandparent to share their sentiment, because the researcher might not know.
    2. "A lot of pages have links to sources; you could verify the legitimacy yourself." Perhaps, but doubting the veracity of research is a bad first impression, isn't it?
    It's a bit sad, because the website is non-trivial, and would have taken quite a bit of effort without an LLM. But it is difficult to separate webdev enablement from the rest of the LLM baggage.
    - craig_vg 4 hours ago
      Thanks for the feedback on this. I'll note that I am not an academic nor a researcher. I'm a hobbyist who wanted to bring together data from different letters and read them for myself. I don't aspire to be a researcher nor publish academic work.
      The primary goal is to provide access to these letters to non-academics who would like to read what Romans were writing in language they can understand.
      I take all the points about research outcomes and the quality of the data itself. That is going to be an ongoing process to continue to improve it alongside LLMs. I have a day job, and this is just a side project, but where it can provide value, I want to lean into that side of things.
cjs_ac 1 day ago
A pity the Latin text isn't made available as well.
- amszmidt 1 day ago
  But it is. There is a link to the original text at the bottom of the translation.
- yorwba 1 day ago
  It is? Below the English translation.
  - cjs_ac 1 day ago
    Not on all of them.
ixv0 20 hours ago
Would be helpful to mention Latin in the title