I just got back from a trip to the University of Chicago, where I was participating in the second installment of the Humanistic AI project, led by Hoyt Long and Chris Kennedy. Several things at this workshop led me to reflect on OCR, and this has been happening more and more recently.
OCR (optical character recognition) is the process of turning images into machine-readable text. It's long been a foundational tool for digital humanities and cultural studies, as so much of the data in these fields is locked away in books, handwritten letters, ancient scrolls, and other physical media, and transliterating and digitizing that data by hand requires expertise, patience, and access. People have been trying to use computers to automate this work for a long time, and new AI models have been very successful at the task, even for handwriting.
I can't do the history and significance of OCR justice in a single post, but for a bit more context, see this 2019 post from Ryan C. Cordell on "Why You (A Humanist) Should Care About Optical Character Recognition" and this 2025 post from Dan Cohen, "The Writing is on the Wall for Handwriting Recognition," which generated some discussion on Bluesky.
All that said, until now, I've successfully avoided dealing with OCR, as it involves images (and I like text), was far away from the analysis work I've always been more interested in, and seemed like one of those fiddly research tasks whose intricacy you can get lost in forever. But things have changed recently (see Dan Cohen's post above), and in keeping with the broader shift toward models that handle both text and vision, I've started dipping my toes into the world of OCR.
In Chicago, we heard a wonderful presentation by Christopher Wolfram about Babylonian tablets and extracting astronomical observations from these ancient diaries. The project relies on human transliterations, and Christopher explained how difficult OCR is for these tablets, where 3D scans are essential to capture the depth and shadows of the markings. Fascinating!
We also spent a lot of time discussing humanities benchmarks, and a comment from Tess McNulty left me thinking hard about who benchmarks are for. They're often designed by computer scientists, for computer science problems. But computational humanities projects and datasets are often too niche for broad leaderboards to be informative. What is more helpful is infrastructure that lets us run our own evaluations on our own niche data.
And that led me to think, as usual, about the work Hugging Face and Daniel van Strien have been doing to open up technical infrastructure (not just code but models, data, and compute power) specifically for humanities scholars. I feel like they're not being celebrated enough for this. Daniel is a GLAM scholar turned "Machine Learning Librarian" who has been building and sharing OCR information and infrastructure for years.
Most recently, Daniel released this benchmarking system, where you can bring your own data and it will run a suite of OCR models and produce comparison results. You don't need any GPUs or infrastructure of your own! We used this for a recent project and found it easy to deploy.
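For a sense of what "comparison results" can mean on your own data, here is a minimal sketch of one common OCR metric, character error rate (CER): the edit distance between a model's output and a human transcription, divided by the length of the transcription. This is not Daniel's system or its code; the function names, model names, and sample strings below are all hypothetical and only illustrate the idea.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by the length of the human transcription."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def rank_models(reference: str, outputs: dict[str, str]) -> list[tuple[str, float]]:
    """Score each model's output against the reference, lowest error first."""
    scores = {name: character_error_rate(reference, text)
              for name, text in outputs.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical data: one human transcription and two made-up model outputs.
reference = "the quick brown fox"
outputs = {
    "model_a": "the quick brown fox",
    "model_b": "the qu1ck brovvn fox",
}
ranking = rank_models(reference, outputs)
for name, cer in ranking:
    print(f"{name}: CER = {cer:.3f}")
```

A single number like this hides a lot (layout, reading order, which characters a scholar actually cares about), which is part of why running the comparison on your own niche data matters more than a general leaderboard.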
I've been following Daniel's work from the beginning (as soon as I saw that HF hired a GLAM scholar, I was interested), but I've been especially interested in his OCR tooling because of (1) how many humanities scholars have approached me for OCR advice in the past year (despite, again, my having very little experience with OCR) and (2) a special dataset I've been working with that has bumped up directly against the limits of what current OCR systems can and cannot do.
All this is to say:
OCR for handwriting is still not perfect, but as Dan Cohen writes, off-the-shelf models are doing very well; I think this can be a solidly good use of AI for the humanities (as long as it's paired with scholarly expertise).
You should try out Daniel van Strien's OCR benchmarking system.
Hugging Face is great, and you can follow Daniel van Strien and Adina Yakup on Bluesky for regular updates about OCR and other technology hosted on HF, such as these new, tiny, and intriguing OCR models from Baidu.
Check out Christopher's very cool work on Babylonian tablets and astronomical patterns.