
Digital Editions, Start to Finish: Introduction
Table of Contents
- About Digital Editing
- Why Learn TEI?
- How to Use this Textbook and System
- Alternative methods
- Working in Code
- Further Reading
- List of Digital Resources (discussed here)
- Notes
About Digital Editing
Why Learn to encode in TEI / XML?
In publishing texts digitally, why not use WordPress, or the wiki-like language of Markdown, especially given that it is so much simpler and faster to do so?
- WordPress documents are created in the WordPress software. They can be exported as
- Markdown is a presentational language, not a descriptive one. It uses arbitrarily assigned symbols to mark content, whereas TEI is called "semantic markup" because TEI tags describe the content inside them: in Markdown, you can say, "italicize this"; in TEI, you can say "this is the title of a book," "a journal," or "an article."
- Because TEI documents are semantically enriched with information about both the content and the form of the content, I believe that they will be most valuable to researchers of the future, even if everything else about the edition's display is lost to the obsolescence of software and code.
- We won't ALWAYS be reading digital editions in web browsers. New software will be developed for, say, reading digital editions by inserting a chip in our brains (à la William Gibson's Neuromancer).
- I have had experience transforming documents from information-poor environments to those that are information rich, using semantic code -- and not only is it extremely difficult, but the results are also unpredictable. For example, italics can be used for many different things, and a simple code for italics won't tell me why an item is being italicized in any given instance. Or, to give another example, it could be that authors always appear in <h2> tags in a corpus of web pages, or are set off by underlines or two hash marks (##) in Markdown, but those codes are used for many other things besides authors, and NOTHING about those symbols by themselves tells me which items encoded in this way are authors' names.
- I believe that our best sustainability plan is coding scholarship in TEI: so many people are using it -- there is a whole community and a listserv* for consulting experts. It isn't simply that TEI-encoded documents can be easily transformed into HTML files, currently the best preserved digital files (via Internet Archive's Wayback Machine*, and some even stored by the Library of Congress). It is also that if programmers of the future, creating those book-brain-chips, find it much easier to work from TEI/XML, then TEI-encoded editions will be converted first. In a sense, editors using TEI take on some difficulty that would otherwise fall to programmers; our work will most likely be transformed into whatever code is used to create future software if our code provides programmers with the path of least resistance, just as water flows downhill.
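To make the contrast between presentational and semantic markup concrete, here is a minimal Python sketch. It is only an illustration, not part of the DigEd System: the essay and journal titles are invented, and the function names `titles_by_level` and `italic_strings` are hypothetical. It relies on the TEI `level` attribute of `<title>` ("m" for a monograph, "a" for an article, "j" for a journal).

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Semantic markup: each item that print would italicize says what it is.
# The essay and journal titles below are invented for illustration.
tei_fragment = """<p xmlns="http://www.tei-c.org/ns/1.0">
  Mary Shelley published <title level="m">Frankenstein</title> in 1818; see
  <title level="a">Monstrous Readings</title> in
  <title level="j">Journal of Invented Examples</title>, which calls the
  novel a <foreign xml:lang="la">sui generis</foreign> work.
</p>"""

# Presentational markup: the same sentence, where everything is just "italic".
html_fragment = """<p>Mary Shelley published <i>Frankenstein</i> in 1818; see
"Monstrous Readings" in <i>Journal of Invented Examples</i>, which calls the
novel a <i>sui generis</i> work.</p>"""


def titles_by_level(tei_xml):
    """Group the text of every TEI <title> by its level attribute."""
    root = ET.fromstring(tei_xml)
    found = {}
    for title in root.iter("{%s}title" % TEI_NS):
        level = title.get("level", "unspecified")
        found.setdefault(level, []).append(title.text)
    return found


def italic_strings(html):
    """Return every italicized string; nothing says WHY it is italic."""
    root = ET.fromstring(html)
    return [i.text for i in root.iter("i")]


if __name__ == "__main__":
    print(titles_by_level(tei_fragment))
    print(italic_strings(html_fragment))
```

From the TEI fragment, the book, the article, and the journal can each be retrieved by kind. From the HTML fragment, the program gets back three italic strings, one of which (the Latin phrase) is not a title at all, and the article title is not found because it was never italicized.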
But there are hurdles for encoding digital editions in TEI:
- Learning TEI code* can be difficult at first; it is definitely harder than learning Markdown.
- The programming used for transforming these coded documents into HTML (web pages) that are styled with CSS (cascading stylesheets) is XSLT (eXtensible Stylesheet Language Transformations), and that code is quite difficult to learn.15
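For readers curious about what such a transformation does, the following Python sketch imitates the idea on a very small scale. It is a stand-in written for illustration, not the XSLT supplied with this textbook; the function name `render` and the CSS class names (`title-m`, `title-a`) are assumptions. Each TEI element is visited in turn and rewritten as an HTML element that a CSS stylesheet could then style.

```python
import xml.etree.ElementTree as ET
from html import escape

TEI_NS = "http://www.tei-c.org/ns/1.0"


def render(node):
    """Recursively rewrite a TEI element as an HTML string."""
    # Strip the namespace so that "{http://...}title" becomes "title".
    tag = node.tag.replace("{%s}" % TEI_NS, "")

    # Collect the element's own text, its children, and the text after each child.
    inner = escape(node.text or "")
    for child in node:
        inner += render(child) + escape(child.tail or "")

    if tag == "p":
        return "<p>" + inner + "</p>"
    if tag == "title":
        level = node.get("level", "unspecified")
        if level == "a":
            # Article titles are conventionally shown in quotation marks.
            return '<span class="title-a">"' + inner + '"</span>'
        # Other titles get a <cite>, which CSS can italicize.
        return '<cite class="title-' + level + '">' + inner + "</cite>"
    # Any element without a rule simply passes its content through.
    return inner


if __name__ == "__main__":
    sample = (
        '<p xmlns="http://www.tei-c.org/ns/1.0">'
        '<title level="m">Frankenstein</title> and '
        '<title level="a">On Monsters</title></p>'
    )
    print(render(ET.fromstring(sample)))
```

An XSLT stylesheet expresses the same kind of rules as templates matched against XPath patterns, which is why XPath matters so much when troubleshooting.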
The goal of this textbook is to reduce the complexity of creating archival-quality digital editions by:
- Providing detailed instructions about how to buy a domain name (URL) and publicly viewable web space and how to buy, download, and use the oXygen XML editor.
- Providing instructions for TEI coding that limit the number of choices one would have to make in selecting tags, attributes, and values;16
- Providing all the programming for creating and displaying the digital edition out of TEI-encoded documents. You can use the system without learning how it works.17
The most important part of the Digital Edition is the TEI: because it is in human-readable code, it preserves all the information comprising the edition. Wherever the TEI-encoded document washes up on the shore, future programmers will know what to do with it, even if they have never seen such a thing before.18
How this textbook works:
As described in the main page for the Introduction section, this textbook is usable by focussing only on the Setup and TEI sections, and, again, the TEI document is the most important piece of your digital edition.
Later, if you want to alter how your digital edition looks online, you can learn the presentation codes by focussing on the HTML and CSS sections. All the code provided can be altered or replaced.
And finally, if you want to learn how to alter and/or write your own XSLT code that transforms TEI into HTML, the XPath and XSLT sections are most important.
Alternatives
As has been made clear by the founders of the "minimal DH" movement, complexity is involved in every system for digital publishing, whether it be WordPress, Markdown, or TEI-encoded digital editions. But in each case these complexities differ in kind:
- For WordPress, the complexity lies with the System Administrator of the server hosting your WordPress instance.
- For Markdown, it is in the software that transforms Markdown code into web pages.
- For encoding documents in TEI, and creating your own "software" to transform them into web (HTML) pages (see the Endings Project*), the complexity is in the hands of the producer of digital editions, us, in two ways:
- learning the TEI tagset and rules;
- creating the programs that translate TEI documents into web pages.
Use whatever you need to use, for whatever reason, at the moment that you begin digitizing documents. Regardless of my preference for taking on complexity as a scholar, I recognize the wisdom of the Minimal Computing movement in helping people who lack technological support to publish digital editions. Roopika Risam and Alex Gil suggest thinking about what you actually have at hand when designing your digital project.
Dr. Gil teaches minimal computing techniques and has also written two important programs for creating digital editions in Markdown. Using Jekyll*, a static site generator that works with Markdown code, Dr. Gil's "Ed*" software provides an interface specifically for digital editions. Dr. Gil has also created "no-connect*," software that allows creating a digital edition without an Internet connection.20
The difficulty of learning TEI is mitigated by using oXygen*: think of the oXygen XML editor as Microsoft Word for XML coding. oXygen Tutorials are located in both the Setup and TEI/XML sections of this textbook.
But there are other great ways to learn TEI, especially:
- LEAF-Writer* (which helps you both learn TEI and encode entities such as authors, events, and places)
- Tutorials at the Women Writers Project*
- The Programming Historian*
- The Digital Humanities Summer Institute* at the University of Victoria
The difficulty of learning programming code such as XSLT is mitigated by using the DigEd System offered here. This textbook resembles TEIBoilerplate*, with the addition here of explanations about and instructions in how the code and programming work. There are other ways to publish TEI-encoded documents:
- TEI Publisher* is an excellent publication system; it may require having or setting up an eXist Database in your web space (using Docker), and so, if you have help from a System Administrator in doing so, it is an excellent choice. Explore more about TEI Publisher at e-editiones*.
- Your TEI documents can be published using TAPAS*, the TEI Archiving, Publishing, and Access Service.
Working in Code
Software created by practitioners in the field of Digital Humanities can be quite buggy. Please understand: every ounce of patience you donate proves your resistance to big tech companies; using what your colleagues create gives us, and students, a future of alternative, not-for-profit digital platforms. The kindest gift you can give to a developer or someone like me is an email or Slack message describing a bug!!21
Creating cultural textual data is a way of courting complexity. An XSLT Tutorial offered by King's College London gave students this caveat: we will show you how to do something in XSLT, and, the third time we run it, it will work. Yes, it means that the XSLT did NOT do what it was supposed to do during the first two runs, the first two attempts to test the code by running it on a document. Every document is so distinctive, and sometimes it seems that every single new one demands customization.
All of which is to say that you have to be patient with your own learning process as well as with the people and tools trying to help you.
It might be easier to be patient by accepting in advance: learning to code and run programs is pretty much "troubleshooting." It's rare for anyone to create any digital resource and get everything right the first time. There is a constant push and pull between the technological realities and the desire to represent a cultural artefact in a certain way, giving those of us who are creating digital editions the opportunity to question our own representational biases: do imagined user needs arise from all of us having a "print hangover"? If digital environments can't replicate certain features of a printed book for example, are those features meaningful in specific ways? What features are offered digitally that are not available in print?
I write in praise of troubleshooting: think of it as doing Wordles or Sudoku. Any sociologist or humanist who has had to learn citation systems for publishing articles and books can learn to create a state-of-the-art digital edition. When you write in Word, you click on an "i" icon to make a title into italics; when coding in TEI, instead of clicking, you type <title>, and oXygen helps you by both opening and closing the tag and offering drop-down lists of possible attributes and values. It takes longer, but it is not more difficult.
There is a recipe for troubleshooting as you create your digital edition that I follow religiously:
- When something isn't working, go back through the set-up steps to make sure everything is where and as it should be, remembering that almost all problems are XPath problems and/or problems related to file structure.
- Whether creating a new TEI file, editing an old one, adjusting the CSS styling, or transforming your TEI into HTML, make sure that whatever you have changed or made using oXygen has been saved to the file being modified. That is, in your finder or explorer windows, click on the file to see when it was "last modified" and make sure that the date and time are correct.
- Because you are dealing with so many variables (Is the TEI properly encoded? Is the XSLT actually running? Is something in the CSS file inhibiting a desired change?), troubleshooting sometimes requires other eyes. Walk away until you can get help. Slack* or email me at mandell@tamu.edu with the subject line "Digital Editions": I am happy to help troubleshoot!
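Two of the checks in this recipe can also be run from a short script. The Python sketch below is a hypothetical helper (the name `check_tei_file` and the example filename are assumptions, not part of the DigEd System). It reports when a file was last modified and whether it parses as well-formed XML. Note that well-formed is a weaker test than valid: it does not check the file against the TEI schema, which oXygen does for you.

```python
import os
import time
import xml.etree.ElementTree as ET


def check_tei_file(path):
    """Report whether a file exists, when it was last saved, and whether it parses."""
    report = {"path": path, "exists": os.path.isfile(path)}
    if not report["exists"]:
        return report

    # The "last modified" time that Finder or Explorer would show.
    modified = time.localtime(os.path.getmtime(path))
    report["last_modified"] = time.strftime("%Y-%m-%d %H:%M:%S", modified)

    # Well-formedness only: every tag opened is closed, attributes are quoted, etc.
    try:
        ET.parse(path)
        report["well_formed"] = True
    except ET.ParseError as err:
        report["well_formed"] = False
        report["error"] = str(err)  # includes the line and column of the problem
    return report


if __name__ == "__main__":
    # "my_edition.xml" is a placeholder; substitute the path to your own file.
    print(check_tei_file("my_edition.xml"))
```

If the reported time is older than your most recent change, the file was probably not saved, or you are editing a copy in a different folder.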
Further Reading
Resources Mentioned in the Introduction and Notes
- Eve and Gray, Reassembling Scholarly Communications, open access
- Manifold Press
- Whitney Trettien, Cut/Copy/Paste, open access
- Whitney Trettien, The Little Gidding Harmonies
- Whitney Trettien, Susanna [Ferrar] Collet's Commonplace Book
- Whitney Trettien, Collet Commonplace Book database
- Whitney Trettien, Manicule software, open-source
- Stéfan Sinclair
- Voyant-Tools.org
- Rockwell and Sinclair, Hermeneutica "Interludes," open access
- Hermeneutica, printed book
- Spyral
- Python class
- JSTOR's Constellate
- Humanities Data Analysis: Case Studies with Python
- Ted Underwood
- The Stone and the Shell
- Katherine Bode
- WordPress XML, "WXR"
- TEI listserv
- Wayback Machine
- TEI coding guidelines -- elements
- Markdown software
- The Endings Project
- Introduction to the Minimal Computing Issue
- Digital Humanities Quarterly (DHQ)
- Jekyll
- Alex Gil, open-source software "Ed"
- Alex Gil, open-source software "no-connect"
- Alex Gil (as "Professor Hacker"), "How (and Why) to Generate a Static Website Using Jekyll, Part 1," Chronicle of Higher Education Blog (free with signup)
- oXygen
- LEAF-Writer Commons
- The Women Writers Project
- Programming Historian
- Digital Humanities Summer Institute
- TEIBoilerplate
- TEI Publisher
- e-editiones
- TAPAS
- Slack Digital Editions Channel
Notes
1. Cambridge, MA: MIT Press, 2013.
2. "Paratext and Genre System: A Response to Franco Moretti," Critical Inquiry 6 (Autumn 2009): p. 168.
3. Breaking the Book: Print Humanities in the Digital Age, Malden, MA: Wiley Blackwell, 2015, p. 157.
4. "What's the Matter with Computational Literary Studies," Critical Inquiry 49.4 (Summer 2023): 507-529, p. 515.
5. For example, see "Toward Linked Open Data for Latin America," in Reassembling Scholarly Communications: Histories, Infrastructures, and Global Politics of Open Access, ed. Martin Paul Eve and Jonathan Gray, Cambridge, MA: MIT Press, 2020; also available open access*.
6. Becoming Beside Ourselves: The Alphabet, Ghosts, and Distributed Human Being (Durham, NC: Duke Univ. Press, 2008) starts with the alphabet.
7. Whitney Trettien, Cut/Copy/Paste: Fragments from the History of Bookwork (Minneapolis: University of Minnesota Press, 2021). The printed book itself is lovely to read, but Dr. Trettien also undertakes her own "bookwork" (which is to say, modding of the publication environment) by making the best use I have seen so far of the University of Minnesota's experimental digital publishing platform, Manifold Press*. Reading Cut/Copy/Paste online* gives access to datasets visible in spreadsheets, explanatory videos, web publications, and more that offer readers the opportunity to follow the author's research and thinking trajectory, making it very much an exciting path.
8. Judith Butler, Erving Goffman.
10. Rockwell and Sinclair, Hermeneutica* (Cambridge, MA: MIT Press, 2016).
11. The Python class* offered by Bryan Tarpley and the excellent classes offered through JSTOR's Constellate* teach beginning to intermediate Python, which can then be used for a particular project with the help of the excellent book Humanities Data Analysis: Case Studies with Python* (Princeton, NJ: Princeton University Press, 2021).
12. See note above.
13. I haven't yet investigated the WXR* XML, but, as far as I can tell, there are no actively maintained TEI plugins for transforming TEI into WordPress's own WXR and vice versa.
14. While it could be that the PDF format will always be used, the format is proprietary, owned by Adobe, and it fundamentally threatens the preservation of cultural heritage materials by creating a "dark archive."
15. The Markdown software* that generates web pages from Markdown code is written in the Perl and PHP programming languages, but you typically would not need to change the software itself except for updating the code as systems evolve.
16. These choices can be easily changed should you modify the programming offered here or develop your own transforms. See how to modify a large corpus of TEI documents all at once.
17. The only problem with the simplest approach is that things do go wrong. I provide the Slack Channel* so that I can help at those moments.
18. I am alluding here to Against Theory: Literary Studies and the New Pragmatism, ed. W. J. T. Mitchell, Chicago: Univ. of Chicago Press, 1985.
19. Roopika Risam and Alex Gil, "Introduction: The Questions of Minimal Computing," DHQ* 16.2 (2022), available online*.
20. Two articles by Alex Gil offer lessons in minimal computing: "How (and Why) to Generate a Static Website Using Jekyll, Part 1" and "Part II," in The Chronicle of Higher Education blog* (which can be accessed for free), August and September 2015.
21. Slack* or email me at mandell@tamu.edu with the subject line "Digital Editions," and add to your CV "Code Review for Digital Editions, Start to Finish," https://www.diged.org/DigitalEditions/reviewers.html.