How we store, provide, interchange and consume textual content in the digital age is shifting fundamentally. The possibilities for presenting digital texts are endless, which means that we are about to read documents that appear in a variety like never before: with different layout and rendering, and with all the features we may add to digital texts, such as hyperlinks, tooltips and other event-based additions. All features of the presentation layer are implementation-specific, even if an overarching design guide is applied.
This document aims to describe how we may interchange textual resources and their additional features, no matter whether they are born-digital or transcribed.
Of course this specification is not an end in itself. Its archetype is IIIF; it will implement JSON-LD and connects to Open Annotation. While IIIF allows links from the image object to a corresponding textual source, this mechanism fails when dealing with texts that are not derived from a source, that represent a reassembled version, or that are born-digital.
What we define as a text
We will keep this short and assume that any sequence of characters may form a text. This sequence is subject to further annotation, represented as layers. Neither this specification nor any implementation SHALL try to answer the question of what a text is; both SHOULD remain agnostic in this respect.
Why we do not rely on TEI
That is quite easy to answer: because of its overhead, and for the same reasons why TEI will never become an ISO standard. TEI provides «guidelines» to describe text. One project MAY follow the guidelines while others MAY apply a different application profile for the same features. TEI offers diverse ways to encode a reference to an external entity. Approaches that try to constrain the schema, as TEI Simple did, ended up with a schema that is more or less exclusively for printed texts and is not applicable to handwritten material.
TEI documents are now part of larger software and not always usable standalone. We want to present a variety of documents in a single viewer: an application that needs both the HTML (or any other appropriate) representation of the text and parts of the teiHeader. TextAPI combines these in a standardized way by defining mandatory metadata fields and putting them in a web-application-friendly format. We also add the benefit of a standardized REST API in a declarative way: we prepare a recommendation of REST paths to put an interoperable topping on your chocolatey TEI cake.
The TextAPI is completely agnostic to TEI, but most projects are able to transform their TEI data to comply with it. Much of the information from the teiHeader can be used, and the main artifact, the representation of the text in a version provided by the editor, will most often be present in HTML. To use TextAPI, the technicians simply have to serialize the metadata into the objects and prepare endpoints to load the specific resources. Easy.
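The serialization step can be sketched roughly as follows. This is a minimal illustration, not part of the specification: the sample TEI document, the function name `header_to_metadata`, the example URL and the keys of the resulting object (`title`, `author`, `content`) are all assumptions made for this example. The actual mandatory fields are those defined by this specification.

```python
import json
import xml.etree.ElementTree as ET

# Namespace of TEI P5 documents.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# A tiny, made-up TEI document used only for illustration.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Sample Letter</title>
        <author>Jane Doe</author>
      </titleStmt>
      <publicationStmt><p>Unpublished sample.</p></publicationStmt>
      <sourceDesc><p>Born-digital sample.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text><body><p>Hello, world.</p></body></text>
</TEI>"""


def header_to_metadata(tei_xml: str, content_url: str) -> dict:
    """Collect a few teiHeader values into a JSON-serializable object.

    The keys used here are hypothetical placeholders, not the field
    names mandated by the specification.
    """
    root = ET.fromstring(tei_xml)
    title = root.findtext(".//tei:titleStmt/tei:title", default="", namespaces=TEI_NS)
    author = root.findtext(".//tei:titleStmt/tei:author", default="", namespaces=TEI_NS)
    return {
        "title": title.strip(),
        "author": author.strip(),
        # Points to the endpoint that serves the HTML representation of the text.
        "content": {"url": content_url, "type": "text/html"},
    }


if __name__ == "__main__":
    metadata = header_to_metadata(
        SAMPLE_TEI, "https://example.org/api/sample/content.html"
    )
    print(json.dumps(metadata, indent=2))
```

An endpoint would then return this object as JSON, while a second endpoint delivers the HTML fragment referenced under `content`.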
We reference the W3C data model for providing web annotations. This way, we can omit complex and possibly interactive content from the XHTML fragments and move parts of the markup into annotations, especially markup that is not intended for formatting the output, e.g. semantic information. The data model is a flexible framework for attaching further information to the text, and it separates text and annotation so that each client can find its own way to deal with the additional information.
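To illustrate how a piece of semantic information could live outside the XHTML fragment, here is a minimal sketch of an annotation built along the lines of the W3C Web Annotation Data Model. The helper name `make_annotation`, the example URLs and the CSS selector value are assumptions made for this example. The `@context`, `type`, `body`, `target` and `selector` keys follow the W3C model.

```python
import json


def make_annotation(anno_id: str, target_url: str, css_selector: str,
                    text: str, purpose: str = "commenting") -> dict:
    """Build a Web Annotation that points at one element of a text fragment."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": anno_id,
        "type": "Annotation",
        "body": {
            "type": "TextualBody",
            "value": text,
            "format": "text/plain",
            "purpose": purpose,
        },
        "target": {
            # The served text fragment the annotation refers to.
            "source": target_url,
            # Selects the annotated element inside that fragment.
            "selector": {"type": "CssSelector", "value": css_selector},
        },
    }


if __name__ == "__main__":
    annotation = make_annotation(
        anno_id="https://example.org/api/sample/annotations/1",  # hypothetical
        target_url="https://example.org/api/sample/content.html",  # hypothetical
        css_selector="#person-1",  # hypothetical element id in the fragment
        text="Jane Doe, the author of the letter.",
        purpose="identifying",
    )
    print(json.dumps(annotation, indent=2))
```

Because the annotation only refers to the fragment by URL and selector, the fragment itself can stay free of semantic markup, and each client decides how to display the annotation.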