As a follow-up to my recent Varia post, I’d like to explain two programs that I used recently in my Textual Criticism class: Juxta and CTE.  To do so, I’ll run through how the final product came together from start to finish.  Our goals were traditional: we wanted to use Lachmannian methods to create a stemma and establish the archetypal text, to the degree possible.  

The first part of preparing an edition, of course, is to choose a text, and then to acquire images of as many of the manuscripts as possible.  This requires reading through any prior literature about the text, but also includes combing through manuscript catalogs to determine which, if any, mss contain your text.  Digital catalogs are thankfully making this process much easier (see, e.g., the marvelously helpful website Pinakes: http://pinakes.irht.cnrs.fr/).  This task is still a chore, though.  Thankfully my professor, Dr. Mantello, had already done this work for us.  He had both selected the text (a sermon of Bishop Robert Grosseteste on clerical orders) and obtained PDF copies of all of the relevant mss.  The mss came to 13 in total.  One ms’s text was partial, and another two were partially illegible, due either to poor imaging or to fire damage.  There were six students in the class, so Prof. Mantello split the sermon into 3 sections.  Each pair was responsible for a third of the text (my section came out to about 1400 words).  

The next order of business was to prepare collations: that is, to determine where the mss varied from one another.  This is where I found Juxta helpful.  Juxta allows one to compare two or more transcriptions of a given text very easily.  Unfortunately, perhaps, this requires full-text transcriptions of each ms.  This can take a lot of time, especially with 13 mss.  Some texts, of course, have dozens, or even hundreds of manuscripts, and most texts will be much longer than the small 1400 word section of our sermon.  That said, preparing accurate transcriptions of 13 mss took me only 2-3 months, and I was also working on plenty of other stuff in the meantime.  For those with longer texts, doing a smaller chunk (say about 1,500 words) from one part of the text will generally allow one to highlight the most important mss without having to transcribe every single ms in toto.  

Now, regarding transcriptions: In an ideal world, one would have at least two people making transcriptions of the same ms.  This allows one to compare the two transcriptions at the end to highlight trouble spots and to eliminate typos and other errors.  As my teammate chose to do a manual collation, this option wasn’t available, so I made do in other ways (her manual collations were invaluable later in the process, however).  Once I had transcriptions of two different mss, I normalized the orthography [1] then compared these two transcriptions to one another.  At each difference, I checked the mss to ensure that my transcriptions were correct.  At the end of this process, I had two fairly accurate transcriptions which I then used to correct the rest of my transcriptions as I finished them.  This is by far the most tedious part.  Even after I had ferreted out most of the problems in my initial pass, I still found myself consistently returning later to the mss to check particular readings (and often found that my transcriptions still contained errors).  Unfortunately, I also took the longer approach of typing each new transcription from scratch.  It occurred to me later, through reading a paper of Tara Andrews, that it’s much faster to modify an existing transcription to fit a new ms instead of starting from scratch.  In any case, accurate transcriptions are a necessity for any further work.  This stage, though often tedious and monotonous, is extremely important.  Juxta (or another comparison tool) is quite useful even at this stage, since highlighting the differences between transcriptions will often expose errors in your transcriptions.
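For anyone who wants to try the normalize-then-compare step without Juxta, here is a minimal sketch in Python using the standard library's difflib.  It is not how Juxta works internally, and the normalization rules (j to i, v to u, ae to e) are only illustrative assumptions that will occasionally misfire; the two sample "transcriptions" are invented for the example and are not from the sermon.

```python
import difflib
import re

def normalize(text):
    """Lowercase and flatten a few spelling variants, then split into words.

    The rules below are illustrative assumptions, not a complete scheme.
    """
    text = text.lower()
    text = text.replace("j", "i").replace("v", "u")
    text = text.replace("ae", "e")
    return re.findall(r"[a-z]+", text)

def differences(words_a, words_b):
    """List (tag, words_from_a, words_from_b) wherever the word lists diverge."""
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    return [
        (tag, words_a[i1:i2], words_b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]

# Invented sample transcriptions (hypothetical, for illustration only).
ms_a = "Sacerdotes in ecclesia collocantur et Deum laudant"
ms_b = "sacerdotes collocantur et deum laudent"

for tag, a, b in differences(normalize(ms_a), normalize(ms_b)):
    print(tag, a, b)
```

Each reported difference is a place to go back to the ms images and check whether the variant is real or a typo in the transcription.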

After transcribing, one can then proceed to examining the differences between mss.  Juxta is helpful here.  Here’s a screenshot:

[Screenshot: Juxta, 2014-06-07, 11:57:45 AM]

Right now, I’m using the ms K as my “base text.” Areas with darker highlighting indicate that a larger number of mss have a variant reading at a certain point.  In this case, there’s an important omission shared by 8 mss at the beginning of our section (running from collocantur existentes … ecclesiastice hierarchie).  Clicking on the dark text will show what the variant mss read:

[Screenshot: Juxta, 2014-06-07, 12:01:46 PM]

Unfortunately, Juxta is not smart enough to group similar readings together.  In this case, N O R Rf all have the exact same omission.  R6 has the omission too, but inserts an et to try to make the resulting text make more sense.  Ideally, Juxta would group all of the readings together (perhaps it will in the future, or perhaps I’ll create my own version that does that: it’s free and open source after all!).  It still, however, provides a useful overview of the tradition at any given point.  Here’s a less complicated example:

[Screenshot: Juxta, 2014-06-07, 1:26:06 PM]

This shows that 4 mss have the text in ecclesia or in ecclesiam.  As these four mss have a number of other shared readings that are unique to them, it’s clear that they belong to a family.  After further analysis, it becomes clear that this is an addition that doesn’t belong in the archetypal text.  If you’d like a file to test with, I’ve uploaded a test file with a selection of manuscripts.
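The grouping of identical readings that I wished for in the omission example above is easy enough to sketch.  What follows is my own rough illustration in Python, not a Juxta feature: it takes the reading of each ms at one point in the text and lists which sigla agree.  The sigla are the ones mentioned above, but the reading strings are placeholders rather than the actual text, and only six of the 13 mss appear.

```python
from collections import defaultdict

def group_readings(readings):
    """Map each distinct reading to the sorted list of sigla that attest it."""
    groups = defaultdict(list)
    for siglum, reading in readings.items():
        groups[reading].append(siglum)
    # Largest groups first; ties are broken by the reading for a stable order.
    return sorted(
        ((reading, sorted(sigla)) for reading, sigla in groups.items()),
        key=lambda item: (-len(item[1]), item[0]),
    )

# Placeholder readings at a single point of variation (not the real text).
readings = {
    "K": "[passage present]",
    "N": "[omitted]",
    "O": "[omitted]",
    "R": "[omitted]",
    "Rf": "[omitted]",
    "R6": "[omitted, with et added]",
}

for reading, sigla in group_readings(readings):
    print(" ".join(sigla), "->", reading)
```

A smarter version would also recognize that R6 belongs with N O R Rf despite the added et, but deciding that two readings are "nearly the same" is a judgment call that this sketch leaves to the editor.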

Using Juxta, I was able to work out a provisional stemma of the 13 mss.  Traditional Lachmannian methods worked pretty well.  There were a number of omissions and other agreements in error that allowed us to group the mss into families and then into a stemma.  Furthermore, our examination of the internal evidence (the text) corresponded fairly well with the relationships that Thomson[2] had posited based on external criteria (like dates, and the number and order of the sermons contained in the mss).  My initial stemma required some reworking, both because of errors in my transcriptions (that my partner thankfully discovered) and because the place of one ms wasn’t clear when looking only at our sections.  Incorporating data from the other sections allowed us to place that ms with more confidence.  
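As a rough first pass at spotting families, one can simply count how many variant readings each pair of mss shares.  The Python sketch below does only that, under some loud assumptions: the sigla A-D and their variants are invented, each variant is recorded as a (location, reading) pair, and every agreement counts equally.  A real Lachmannian analysis has to judge which agreements are genuine shared errors, which no tally can do for you.

```python
from collections import Counter
from itertools import combinations

def shared_variant_counts(variants_by_ms):
    """Count, for each pair of mss, how many variant readings they share."""
    counts = Counter()
    for a, b in combinations(sorted(variants_by_ms), 2):
        shared = variants_by_ms[a] & variants_by_ms[b]
        if shared:
            counts[(a, b)] = len(shared)
    return counts

# Invented data: each ms maps to a set of (location, reading) variants.
variants_by_ms = {
    "A": {(1, "om."), (2, "in ecclesia")},
    "B": {(1, "om."), (2, "in ecclesia"), (3, "et")},
    "C": {(1, "om.")},
    "D": {(3, "et")},
}

for pair, n in shared_variant_counts(variants_by_ms).most_common():
    print(pair, n)
```

Pairs that rise to the top of such a list are candidates for a closer look, nothing more.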

The final step was to incorporate all of this information into a critical edition, replete with critical apparatus and source apparatus.  The information for the apparatus of sources was more straightforward.  Prof. Mantello had helped us track down the important sources.  Creating the critical apparatus naturally required us to decide what the original text was.  The stemma made this straightforward in most cases.  In a few cases, the better attested reading was less satisfactory on internal grounds.  In a few places, I chose a poorly attested reading, or even ventured a few emendations (though for most of them, I failed to convince Prof. Mantello).  When examining trouble spots, the electronic Grosseteste was immensely helpful.  It allowed me to check a particular construction across a wide swathe of Grosseteste’s corpus.  

I used the Classical Text Editor (CTE) to assemble my final product.  The CTE is quite a powerful tool.  It has the ability to create a wide variety of critical editions.  Ours was a fairly simple text+notes+apparatus, but one can also add further apparatuses, or even add parallel texts/translations.  There are a few downsides.  First, the program is quite expensive (to the tune of several hundred USD, though there is a free trial that is fully functional except for the ability to generate non-watermarked output).  Second, the program is difficult to use if you don’t have someone to show you the basics.  I have a computer science degree, and found myself frequently frustrated at first.  That said, the basics aren’t difficult once you’ve been shown how the program works.  I gave a presentation for my classmates, and everyone decided to use it for their text.  Only one other student in the class had a technical background, but everyone was able to use the program to assemble their text.  

And I must say, the output is pretty sharp.  The only other means I know to create something comparable is LaTeX, and that requires quite a bit more technical knowledge than needed for the CTE.  (It was LaTeX, for instance, that I used to create my text and translation of Origen’s 3rd homily on Ps. 76)  As an example of CTE output, here’s the first page of our final text: InLibroNumerorum_mapoulos_excerpt.pdf.  If anyone knows of CTE tutorials (besides the help files), I’d love to know about them.  Sometime soon I’ll post some basic walkthroughs that I created for my classmates.  

I should say that there are a number of useful tools that I’ve not mentioned here.  Our final goal for this project remained a printed text.  Things look different if web-publication is in view (the CTE does support TEI output, but I’ve not tested it to see how it works).  Also, there’s much work being done in the field of digital stemmatology.  Tools like stemmaweb allow one to use a number of different algorithms to create a stemma digitally.  Variant graphs, for instance, look like a useful way to look at the tradition.  I don’t read Armenian, but I’m very impressed by the technical aspects of Tara Andrews’s digital edition of Matthew of Edessa.  Her academia.edu page is well worth a look if you’re interested in digital editions.  

Do apprise me of anything important I’ve omitted in the comments, particularly if you’ve advice on better ways to approach the task.

ἐν αὐτῷ,
ΜΑΘΠ 

[1] Normalizing the orthography is an important step as orthographic variants usually aren’t important for distinguishing the relationships between mss.  I kept my original transcriptions, which followed the orthography of the mss, but did most of my analysis on the basis of the normalized files.  
[2] Thomson, H., The Writings of Robert Grosseteste (Cambridge 1940)


Yesterday, I received news that my abstract had been accepted for the “Preaching After Easter” conference which will take place in March, 2013 in Leuven.  The title of the abstract is “For those who love learning,” Gregory of Nazianzus on the Miracle of Pentecost.  It will essentially be a more detailed write-up of the passage I’ve examined here and here from Gregory’s Or. 41 on Pentecost.  I’d like to publicly thank Charles Sullivan, through whom I became interested in the passage, and whose dialogue has been extremely helpful in sorting out the intricacies of Gregory’s argument and its later reception.  I’m particularly curious about the philosophical background he may be pulling in, and also the way he weaves different scriptural passages together.  I think it’ll be fun to do a paper that’s not, strictly speaking, “digital humanities.”  

But back in the “digital” domain, I’ve submitted an abstract for the meeting of the North American Patristics Society next May.  The paper will essentially be a digital authorship analysis of as much as I can transcribe from the recently discovered Origen codex.  I hope to show that stylometric analyses support an attribution of the homilies in the codex to Origen, and I’d also like to examine the stylometric differences within the codex.  Hopefully it’ll be accepted!  I’ve yet to attend a NAPS conference, but I’ve heard good things.

ἐν αὐτῷ,

ΜΑΘΠ

I haven’t often mentioned my interest in things digital on this blog, but earlier this year, I was fortunate to attend a workshop in Belgium entitled, “Means and Methods for the Digital Analysis of Ancient and Medieval Texts and Manuscripts.”  I got to hear a variety of interesting papers and debates, all while enjoying terrific hospitality.  One of the happy consequences of this visit was that I met several people working on “digital humanities”[1] type projects.  One of my great interests as a budding text-critic is in digital stemmatology.  The question essentially boils down to: how can we use digital/statistical methods to help us map the history of a text’s transmission?  Ideally, the end result is a stemma, or family tree, detailing the copying history of the extant manuscripts.  This is helpful either for traditional philology (establishing the archetypal text), or for those interested in reception history.  Tara Andrews, whom I was fortunate to meet in Leuven, recently wrote a blog post which captures the history and status quaestionis quite well, here.  All of this makes me wish I were in Hamburg this week at the Digital Humanities 2012 conference.  There are a number of interesting abstracts listed here.

As a Computer Science undergraduate turned (soon-to-be) Greek and Latin graduate student, I’m naturally very interested in how computers can help us study ancient texts.  Two areas, in particular, hold my interest right now: digital stemmatology and digital stylometry.

Stemmatology I mentioned earlier: I’m attempting to apply these sorts of methods to the Palaea Historica, a 10th century Byzantine Greek retelling of the Old Testament.  One of my professors at NC State is working on a critical edition, and so I hope to put these stemmatological methods to good use.  Time will tell if I’m successful, but I’ll be presenting a paper in Nov. so I’ll definitely have something to say then!

Digital Stylometry is a more recent interest of mine.  The basic question is this: can we somehow quantify style and then use that measure to compare different texts?  Perhaps the most common application is authorship attribution.  If the methods develop well enough, this might, for instance, help us sort out anonymous catenae fragments, or anonymous homilies like the ones in the recently discovered CMB 314 (which we’re pretty sure, at least currently, belong to Origen). 
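To make "quantifying style" a little more concrete, here is a toy sketch in Python.  It is not a method I have tested on real texts: it assumes that the relative frequency of a handful of function words is a usable fingerprint, strips Greek accents and breathings so that καὶ and καί count as the same word, and compares two texts by the mean absolute difference of those frequencies.  Established measures such as Burrows's Delta go further and standardize each frequency against a reference corpus, which this sketch does not attempt.  The word list and sample strings are invented for illustration.

```python
import re
import unicodedata
from collections import Counter

def strip_accents(text):
    """Lowercase the text and drop combining marks (accents, breathings)."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def profile(text, function_words):
    """Relative frequency of each function word in the text."""
    words = re.findall(r"\w+", strip_accents(text))
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty input
    return [counts[word] / total for word in function_words]

def distance(p, q):
    """Mean absolute difference between two frequency profiles."""
    return sum(abs(x - y) for x, y in zip(p, q)) / len(p)

# Hypothetical word list (accents already stripped) and invented samples.
function_words = ["και", "δε"]
text_a = "καὶ λέγει καὶ γράφει"
text_b = "λέγει δὲ γράφει δὲ"

print(distance(profile(text_a, function_words),
               profile(text_b, function_words)))
```

With real texts one would want far longer samples and a far longer word list before reading anything into the numbers.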

[1]  I still find this phrase frustratingly vague (I’m interested in a narrower type of research), but I employ it nevertheless.

ἐν αὐτῷ,

ΜΑΘΠ