It turns out it wasn't as hard as I anticipated to start getting useful information extracted from my born-digital-for-printing-on-dead-trees dissertation. Here's a not-yet-perfect xml serialization (borrowing tags from the TEI) of "instance" information found in the diss narrative:
Each instance is a historical event (or in some cases event series) relating to boundary demarcation or dispute within the empire. Here's a comparison between the original formatting for paper and the xml.
<?xml version="1.0" encoding="UTF-8"?>
<div type="instance" xml:id="INST9">
  <idno type="original">INST9</idno>
  <head>A Negotiated Boundary between the <placeName type="ancient">Zamucci</placeName> and the <placeName type="ancient">Muduciuvi</placeName></head>
  <p rend="indent">Burton 2000, no. 78</p>
  <p>Date(s): <date>AD 86</date></p>
  <p type="treDisputeStatement">This boundary marker was placed in accordance with the agreement of both parties (<foreign xml:lang="la">ex conven/tione utrarumque nationum</foreign>), and therefore may be taken as evidence of a <hi rend="bold">boundary dispute</hi>.</p>
  <p rend="indent">This single boundary marker from coastal <placeName type="modern">Libya</placeName> provides the only evidence for the resolution of a boundary dispute between these two indigenous peoples. The date of the demarcation, as calculated from the imperial titulature, places the event in the same year as the reported ‘destruction’ of the <placeName type="ancient">Nasamones</placeName> by <placeName type="ancient">Legio III Augusta</placeName> as a consequence of a tax revolt in which tax collectors were killed.<note n="286"> Zonaras 11.19. </note> It is not clear whether the boundary action was related to the conflict, or merely took advantage of the temporary presence of the legionary legate in what ought to have been part of the proconsular province. Surviving documentation for proconsuls during the 80s AD is incomplete, and therefore we cannot say who was governing <placeName type="ancient">Africa Proconsularis </placeName>at the time of this demarcation.<note n="287"> Thomasson 1996, 45-48. </note> Neither party seems to have been related to the <placeName type="ancient">Nasamones</placeName>; rather, they are thought to be sub- tribes of the <placeName type="ancient">Macae.</placeName><note n="288">Mattingly 1994, 27-28, 32, 74, 76.. </note></p>
</div>
One thing that made this a lot easier than it might have been was the way I used styles in Microsoft Word back when I created the original version of the document. Rather than just painting formatting onto my text for headings, paragraphs, strings of characters, and so forth, I created a custom "style" for each type of thing I wanted to paint (e.g., an "instance heading" or a "personal name"). I associated the desired visual formatting with each of these, but the names themselves (since they captured the semantic distinctions that I was interested in) provided hooks today for writing this stuff out as sort-of TEI XML.
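To make the "style names as hooks" idea concrete, here is a minimal Python sketch of the general approach, not the actual conversion used for the dissertation. It assumes the Word document has already been read into paragraphs, each with a paragraph style name and a list of (text, character style) runs. The style names in the two mapping tables (such as "ancient place name") and the function `paragraph_to_tei` are hypothetical, invented for illustration; only "instance heading" echoes a name mentioned above.

```python
import xml.etree.ElementTree as ET

# Hypothetical style names mapped to (TEI tag, attributes).
# The real document's style names may well have differed.
PARAGRAPH_STYLES = {
    "instance heading": ("head", {}),
    "instance citation": ("p", {"rend": "indent"}),
}
CHARACTER_STYLES = {
    "ancient place name": ("placeName", {"type": "ancient"}),
    "modern place name": ("placeName", {"type": "modern"}),
}


def paragraph_to_tei(style, runs):
    """Build one element from a paragraph style and its (text, char_style) runs."""
    # Paragraph styles that are not in the table fall back to a plain <p>.
    tag, attrs = PARAGRAPH_STYLES.get(style, ("p", {}))
    para = ET.Element(tag, attrs)
    last = None  # most recently added child element, if any
    for text, char_style in runs:
        if char_style in CHARACTER_STYLES:
            # A named character style becomes an inline element.
            child_tag, child_attrs = CHARACTER_STYLES[char_style]
            last = ET.SubElement(para, child_tag, child_attrs)
            last.text = text
        elif last is None:
            # Unstyled text before any inline element.
            para.text = (para.text or "") + text
        else:
            # Unstyled text following an inline element.
            last.tail = (last.tail or "") + text
    return para


# The heading of the instance shown above, expressed as styled runs.
head = paragraph_to_tei(
    "instance heading",
    [
        ("A Negotiated Boundary between the ", None),
        ("Zamucci", "ancient place name"),
        (" and the ", None),
        ("Muduciuvi", "ancient place name"),
    ],
)
head_xml = ET.tostring(head, encoding="unicode")
print(head_xml)
```

The point of the design is that the visual formatting never enters the picture: the conversion only looks at style names, so any text that was merely "painted" bold or indented without a named style would come out as plain text.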
There's more to do, obviously, but this was a satisfying first step.