A New Life for Ancient Texts
July 14, 2015—Assistant Professor of Computational Linguistics Amir Zeldes has always had an interest in the data of language—in particular, the ability to test and verify how language works. Computational linguists specialize in the intersection of language and technology; specifically, how computers process “natural” languages, meaning those spoken by human beings.
“Think about how a computer sees language,” Zeldes said. “When you’re saying things in English, or whatever language you’re speaking, how would you expect a computer to understand that?”
Expertise in this field is what led Zeldes to serve as the co-director of Coptic SCRIPTORIUM, an interdisciplinary, collaborative project that focuses on digitizing and sharing texts written in the ancient Egyptian language of Coptic. A direct descendant of the language once written in hieroglyphs, Coptic enjoyed a heyday between the second and tenth centuries before being replaced by Arabic. Many texts from the earliest periods of Christianity (including the Bible) were written in or translated into Coptic.
Today, the study of these texts is extremely valuable to religious scholars, historians, and linguists, among others. Access to such texts, however, has been relatively limited—something that Zeldes and his co-director, Caroline Schroeder (associate professor of religious and classical studies at the University of the Pacific), hope to change.
The two first met in 2012, when Schroeder attended Zeldes’ course during a summer school session at Tufts University. At the time, Schroeder was already working with Coptic. She and Zeldes soon realized that, as partners, they could vastly improve access to and study of Coptic texts.
“She had the expertise in Coptic studies and I had expertise in computational linguistics techniques that hadn’t yet been applied to that language,” Zeldes explained. “It just seemed like something that could really be done now.”
After receiving an initial grant from the National Endowment for the Humanities (NEH), Zeldes and Schroeder focused on accumulating and coding materials, as well as on segmenting words into their constituent parts and determining their parts of speech.
“Coptic has a rather challenging system of multiple segments. Like a lot of languages from the Middle East, the things you end up writing together with spaces between them are not exactly what you would call an individual word—you need to split that up even smaller, and you can have a computer program help you with that,” Zeldes explained.
With its initial funding, SCRIPTORIUM also built a Coptic part-of-speech tagger that automatically determines which category a word belongs to, such as noun, verb, or preposition, among many others.
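To give a rough sense of what segmenting and tagging involve, here is a minimal sketch in Python. It is not the project's tagger, which the article does not describe in detail. The lexicon, the Latin-letter transliterations, the tag names, and the functions `segment` and `tag` are all assumptions made up for illustration, and a simple dictionary lookup stands in for whatever method the real tools use.

```python
# Toy lexicon: transliterated Coptic units mapped to illustrative tags.
# The entries and tag names are assumptions for this sketch, not the
# project's actual lexicon or tag set.
LEXICON = {
    "a": "PAST",     # past-tense marker
    "f": "PRON",     # "he" (subject pronoun)
    "sōtm": "VERB",  # "hear"
    "hm": "PREP",    # "in"
    "p": "ART",      # "the"
    "ēi": "NOUN",    # "house"
}


def segment(word, lexicon=LEXICON):
    """Split a written group into known units, trying longer units first.

    Returns a list of units, or None if the word cannot be fully split.
    """
    if not word:
        return []
    for end in range(len(word), 0, -1):
        head = word[:end]
        if head in lexicon:
            rest = segment(word[end:], lexicon)
            if rest is not None:
                return [head] + rest
    return None


def tag(word, lexicon=LEXICON):
    """Segment a word, then look up a tag for each unit."""
    units = segment(word, lexicon)
    if units is None:
        return [(word, "UNKNOWN")]
    return [(unit, lexicon[unit]) for unit in units]


if __name__ == "__main__":
    print(tag("afsōtm"))  # "he heard", written as one group
    print(tag("hmpēi"))   # "in the house", written as one group
```

In this toy setup, a single written group such as "afsōtm" comes apart into three units, each with its own category. A lookup table like this breaks down quickly on real text, where the same letters can be split in more than one plausible way.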
In May 2015, the project received a second NEH grant, titled “KELLIA” (Koptische/Coptic Electronic Language and Literature International Alliance), that will provide funding for two years. The $192,500 grant is one of six awarded nationally by the NEH/DFG Bilateral Digital Humanities Program.
KELLIA will support improved international coordination of Coptic projects through a collaboration between Coptic SCRIPTORIUM and other partners, including Germany’s University of Göttingen and the University of Münster. Funds from the grant will support efforts in gathering, annotating, sharing, and editing Coptic texts.
On the technical side, one of the challenges is taking existing tools that were designed for English or other mainstream languages and making them work for Coptic.
“What I’d really like to be able to do is understand what makes Coptic difficult in specific ways, and then, by reusing tools that are already available, not make Coptic an exception, but make the rules work for it. As soon as you do that, there’s a flood of other tools that become useable for you,” said Zeldes.
Once published, the project’s website will offer various ways of viewing and interacting with its Coptic texts, including a normalized view with an option to view a translation. Users will also be able to see what a piece of text looked like in the original manuscript: how it was laid out (columns and lines), as well as the various colors of ink. For those interested in linguistic analysis, there will also be a view that offers part-of-speech analysis.
For Zeldes, the importance of making these texts available goes beyond the technical and linguistic opportunities. The KELLIA project takes its name from an area in the Egyptian desert where monks lived alongside one another in cells—known as “kellia”—as opposed to living in isolation.
“You were supposed to stay in your cell as if you were on a mountain alone, but they did it in a community,” Zeldes explained. “When Christianity started out, there wasn’t the idea that people should live together for the purpose of worship—the idea of a monastery that we now take for granted developed in Egypt in this period. And if you’re interested in learning how that came about, then these texts are what you need.”
For news and updates about Coptic SCRIPTORIUM, visit the project’s blog.