More about @rend

Lou Burnard has provided a technical summary of some of the recent issues discussed concerning @rend, but I thought I might provide some more explanation for those not as familiar with the technical background to the discussion. I would have done so sooner but was on holiday in Cornwall, driving around farm roads that were far too narrow, without much reception on my phone. What follows are my own opinions and interpretations of the TEI Guidelines, which are continually evolving based on community consensus.

The @rend attribute

The TEI provides a @rend attribute which indicates an interpretation of how the element in question was rendered or presented in the source text. It has nothing to say about what should be done with the element in any particular output from processing or displaying the TEI text. Many people assume that processing TEI means outputting HTML designed to help you read the text, but this is not necessarily the case. The TEI text might have any number of outputs: just for reading it might be HTML, ePub, PDF, DOCX, and many more. Moreover, those encoding the texts might not intend to read them at all, but to process them for other forms of text analysis in any number of formats. While individual projects can provide project documentation on how they intend certain elements to be presented in particular forms of output, other people processing those texts could choose to do something completely different.

@type and @rend values and their whitespace

During a TEI-L discussion concerning why the @type attribute did not allow spaces it was explained that this is because the @type attribute does not contain free text, but a special token that categorises the element in some way. Moreover, the recommended practice is for projects to customise the TEI to constrain the choices available for the value of the @type attribute on some elements and document in their customisation exactly what those special tokens mean. @type attribute values have the datatype data.enumerated, which means that they are “expressed as a single XML name taken from a list of documented possibilities”. That means that this value has to obey the rules of what it means to be an XML name, and it should be from a set list that the project has documented (preferably in its TEI customisation, but possibly just in prose documentation preserved with the TEI file). Most elements that have a @type attribute get it from claiming membership in the att.typed attribute class, and if a secondary type classification is allowed they also get @subtype.
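As a rough illustration of those two conditions (a single XML name, taken from a documented list), here is a minimal Python sketch. The regular expression is only an ASCII approximation of the XML Name rule, and the list of documented values is a hypothetical one that a project might define for @type on <div>; neither comes from the Guidelines.

```python
import re

# ASCII-only approximation of the XML Name production. The real rule also
# admits many non-ASCII letters, so treat this as a sketch, not a validator.
XML_NAME_ASCII = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*")

# Hypothetical list of values a project might document for @type on <div>.
DOCUMENTED_TYPES = {"chapter", "section", "appendix"}


def check_type(value: str) -> str:
    """Report whether a @type value is a single name from the documented list."""
    if XML_NAME_ASCII.fullmatch(value) is None:
        return "not a single XML name"
    if value not in DOCUMENTED_TYPES:
        return "well-formed name, but not in the documented list"
    return "ok"


for candidate in ["chapter", "front matter", "preface"]:
    print(candidate, "->", check_type(candidate))
```

In a real project this check would normally be done by the schema generated from the TEI customisation rather than by a script.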

The discussion moved on (possibly because I referenced my earlier post on @rend) to how the @rend attribute differs, and to using CSS inside it. With the @rend attribute the situation is slightly more confusing. It allows one or more occurrences of the datatype data.word, with no upper limit. A data.word datatype “defines the range of attribute values expressed as a single word or token.” As I’ve discussed elsewhere, this means if someone marks up a text using:

<hi rend="It looks a bit like that other one">text</hi>

This actually has 8 tokens: “It”, “looks”, “a”, “bit”, “like”, “that”, “other”, “one”. The point is that the whitespace between these words in the attribute makes each of them a separate value or token, not part of a phrase. The encoder might just have written:

<hi rend="big bold beautiful">text</hi>

or indeed

<hi rend="largeStyle42">text</hi>

The data.word datatype says that “Attributes using this datatype must contain a single ‘word’ which contains only letters, digits, punctuation characters, or symbols: thus it cannot include whitespace.”
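To make the token reading concrete, here is a minimal Python sketch of how a processor might see a @rend value. The function name rend_tokens is my own invention, and Python's str.split() treats a slightly wider range of characters as whitespace than XML does, so this is an approximation.

```python
def rend_tokens(value: str) -> list[str]:
    """Split a @rend attribute value into its whitespace-separated tokens."""
    # str.split() with no argument splits on any run of whitespace and
    # ignores leading and trailing whitespace.
    return value.split()


phrase = "It looks a bit like that other one"
print(len(rend_tokens(phrase)))          # number of separate tokens in the 'phrase'
print(rend_tokens("big bold beautiful"))
print(rend_tokens("largeStyle42"))
```

However the encoder thought of the value, a processor following the datatype sees only a list of independent tokens.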

Some encoders believe that the TEI should reverse its decision on free text in attributes and allow @rend to contain “It looks like that other one” and this not to be a set of discrete tokens. Personally, I disagree and feel that would be a retrograde step.

@rend values and their order

Other than defining it as a set of data.word occurrences the TEI does not dictate what the @rend values should look like. In my opinion it would be wrong if the TEI tried to codify all the possible rendition values that appear in every sort of text. Moreover, describing the way something appears in a text is always an interpretative process and two separate encoders looking at the same text, or looking at it for different reasons, might perceive it in very different ways. In fact the Guidelines explicitly say:

“These Guidelines make no binding recommendations for the values of the rend attribute; the characteristics of visual presentation vary too much from text to text and the decision to record or ignore individual characteristics varies too much from project to project.”

Some encoders believe that it is a shame that the TEI has not defined a syntax by which they should specify the @rend attribute values. I disagree because I feel that the greatest flexibility should be given to projects and sub-communities to customise and constrain such values for themselves. It could be argued that the TEI has indeed provided a syntax, but in a very general way: these are whitespace-separated tokens containing only letters, digits, punctuation characters or symbols. The point is that these are meant to be magic tokens whose meaning individual projects can decide (and document) for their own use. If I put in the magic token ‘bold’ it might mean something different in my project than it means in yours.

It came out in the TEI-L discussion that some encoders believe that the order of @rend values provided should be important, as if they are making a phrase. Others tend to put the most important rendition classification first, and still others always provide different types of classification in the same order. I find these all prone to human inconsistency and so I choose to believe that they are an unordered set of values that could be entered in any order. i.e. that:

<hi rend="big bold beautiful">text</hi>

should be understood to be semantically equivalent to:

<hi rend="beautiful big bold">text</hi>

My beliefs here are, perhaps unduly, influenced by long and painful experience in processing hand-encoded texts (which also influences my beliefs on the value of automatic and semi-automatic up-converting markup). In my encoding projects I recommend that no special significance be granted based on the order of the tokens present in the @rend value. The TEI, I think sensibly, allows individual projects to do what they want but does specify that these are individual tokens.
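One way to express that reading in code is to compare @rend values as sets rather than as strings. This is only a sketch of my own convention, not something the Guidelines require, and the helper name rend_set is hypothetical.

```python
def rend_set(value: str) -> frozenset[str]:
    """Treat a @rend value as an unordered set of tokens."""
    # Converting to a set discards both the order and any repeated tokens.
    return frozenset(value.split())


first = rend_set("big bold beautiful")
second = rend_set("beautiful big bold")

print(first == second)                    # True: same tokens, different order
print(first == rend_set("big bold"))      # False: a token is missing
```

A project that does attach meaning to order would need to keep the list form instead, and say so in its documentation.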

Some projects decide to put various standard presentation-description formats, e.g. Cascading StyleSheets, into the @rend attribute. I personally feel that this is misguided and sloppy. Partly this is because I suspect that some of them are actually encoding for a particular output format (rather than documenting what the original source looked like) and this is the wrong place to store this information. Partly this is because such presentation-description formats often use significant whitespace (which then means an abuse of the data.word datatype). And partly this is because I feel there is a better and easier way to do this more consistently using the @rendition attribute.

@rendition and <rendition> really aren’t extreme

As with many other things in the TEI, the Guidelines provide a simple use-case (@rend’s magic tokens) and a more complex system (@rendition). The @rendition attribute allows you to point to a <rendition> element up in the header where you can use any form of free text to describe how this was rendered in the original source. This means that instead of putting a set of magic tokens or classifications like “largeStyle42” an encoder can completely transparently point to a fuller description using the standard URI fragment pointing mechanism that is common throughout the TEI recommendations. Thus instead of writing:

<hi rend="largeStyle42">text</hi>

and having it documented somewhere what this meant, the encoder can point to a <rendition> element by its @xml:id attribute and have a fuller description there. For example this could be:

<hi rendition="#largeStyle42">text</hi>

and while that doesn’t look much different the URL fragment ‘#largeStyle42’ points to a place inside the TEI file’s <teiHeader> (specifically inside the <tagsDecl> element) where there is a better description:

<rendition scheme="free" xml:id="largeStyle42">
This text is really big, bold, and beautiful
</rendition>

Okay, admittedly that might not be a very useful description. But the point with the ‘free’ scheme is that it is free text. It can be any prose, in any language, and any way of describing it. The @scheme attribute also allows for ‘css’ for those people wishing to use cascading stylesheet language, ‘xslfo’ for those wanting to use extensible stylesheet language formatting objects, and ‘other’ for those using another rendition description language. So ‘#largeStyle42’ could point to something using CSS that looked like:

<rendition scheme="css" xml:id="largeStyle42">
font-size: 75pt;
font-family:"brushstroke", fantasy;
</rendition>

If a more precise description (in whatever language) is able to be provided for ‘largeStyle42’, then this can be changed at a later date. Equally this could be broken up into multiple <rendition> elements and you can have:

<rendition scheme="css" xml:id="bold">font-weight:bold;</rendition>
<rendition scheme="css" xml:id="big">font-size:75pt;</rendition>
<rendition scheme="css" xml:id="beautiful">font-family:"brushstroke", fantasy;</rendition>
<rendition scheme="css" xml:id="oxBlue">color:#002147;</rendition>

and in the text:

<hi rendition="#big #oxBlue #bold #beautiful">text</hi>
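To show that following these pointers is not much work for a processor, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The sample document is heavily abbreviated (it is well-formed but not a valid TEI file), the function names are my own, and only same-document pointers of the form "#id" are handled.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Abbreviated sample: well-formed, but missing much that a valid TEI file needs.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><encodingDesc><tagsDecl>
    <rendition scheme="css" xml:id="bold">font-weight:bold;</rendition>
    <rendition scheme="css" xml:id="big">font-size:75pt;</rendition>
    <rendition scheme="css" xml:id="beautiful">font-family:"brushstroke", fantasy;</rendition>
    <rendition scheme="css" xml:id="oxBlue">color:#002147;</rendition>
  </tagsDecl></encodingDesc></teiHeader>
  <text><body><p>
    <hi rendition="#big #oxBlue #bold #beautiful">text</hi>
  </p></body></text>
</TEI>"""


def local_renditions(root):
    """Map each xml:id to the text content of its <rendition> element."""
    return {
        el.get(XML_ID): (el.text or "").strip()
        for el in root.iter(f"{{{TEI_NS}}}rendition")
    }


def resolve(element, renditions):
    """Return the descriptions that an element's @rendition pointers refer to."""
    pointers = element.get("rendition", "").split()
    # Only same-document pointers ("#id") are followed in this sketch.
    return [renditions[p[1:]] for p in pointers if p.startswith("#")]


root = ET.fromstring(SAMPLE)
hi = root.find(f".//{{{TEI_NS}}}hi")
for description in resolve(hi, local_renditions(root)):
    print(description)
```

The descriptions come back in the order the pointers were written, and what a processor then does with them (if anything) remains the processor's decision.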

Moreover, because @rendition is one of the TEI’s many pointing attributes it does not need to point to a <rendition> element in the very same file! Instead a project could centralise all its rendition information in a single place. So that might look like:

<hi rendition="renditionFile.xml#largeStyle42">text</hi>

or indeed


Some encoders feel that pointing to a <rendition> element is a lot harder than just sticking some tokens into the @rend attribute. Others argue that as part of the process of hand encoding users should be able to add whatever they want to @rend, and for this to be valid because rationalising these in advance is more difficult than doing so afterwards. Or indeed that it is more convenient to encode unusual variants ‘in-line’ rather than pointing back to the header. Both of these are good points, and have some truth to them. In the first case, it depends on the level of specification needed. Most encoders in my experience use very general and imprecise @rend categorisations. That is, they could have a rend value of ‘big72pt’ but they tend to just use ‘big’ (or small/medium/large/x-large).

How much time and energy one wants to spend worrying about specifying @rend and/or @rendition values depends on how important it is to your project that this information is documented, and documented consistently. If you just want to record whether something is in one of a handful of different colours, sizes, or styles, then you probably just want to agree a project specification of @rend values (and what they mean) for your TEI customisation.

Other @rend issues

Some encoders believe that there is no formal way of indicating what syntax you have used for your @rend values. I disagree because I believe these are magic tokens which are most properly documented in the TEI customisation. This enables an encoder to give a free text description for every magic token used in @rend attribute values, and moreover if they wish it enables a project to constrain it to be just this set of values. If a project is using a specified syntax inside their @rend attribute values (so-called ‘rendition ladders’ are one such format) then this should be documented inside the <encodingDesc>, perhaps in prose or perhaps the TEI will add a mechanism in response to the TEI-L discussion which enables categorisation and description of the taxonomy of @rend attribute values.
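The idea of documenting every magic token and optionally constraining @rend to that set can be sketched outside a schema as well. The Python below is only an illustration: the token list and its descriptions are hypothetical, and in practice the constraint would live in the project's TEI customisation and be enforced by the schema generated from it.

```python
# Hypothetical project documentation: each @rend token and what it means here.
REND_TOKENS = {
    "big": "noticeably larger than the surrounding text",
    "bold": "heavier weight than the surrounding text",
    "beautiful": "set in the decorative display face",
}


def undocumented(rend_value: str) -> list[str]:
    """Return the tokens in a @rend value that the project has not documented."""
    return [token for token in rend_value.split() if token not in REND_TOKENS]


print(undocumented("big bold beautiful"))   # nothing to report
print(undocumented("big largeStyle42"))     # reports the undocumented token
```

Running a check like this over a corpus is a quick way to find tokens that encoders have introduced without documenting them.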

Changing @rend

My arguments here are based on my own views and understanding of the current (P5 2.0.2) version of the TEI Guidelines. However, these are subject to change (both my views and the Guidelines). I’ve often been told that the TEI recommendations seem like dictates coming down from on high saying “do it this way”, but that is really not how I view the TEI Guidelines or the community that creates them. The TEI is an open source project which accepts bug reports and feature requests from anyone and everyone. This can be from someone encoding their very first TEI document, reading the Guidelines for the first time, or it can be from those with a long history of experience with the TEI. Each and every bug report and feature request should be considered on its own merits by the TEI Technical Council elected by the TEI community. [Note: there is scope for electoral reform, but this is a very different topic.] The recommendations of the TEI are not a fixed quantity but an evolving record of the concerns and experience of the community that produces them. In many ways hearing what users new to the TEI have difficulty with, or where they find the Guidelines confusing, is more valuable in the long run than some of the more arcane technical discussions.

Self Study (part 1): Introducing XML and Markup

I’m occasionally asked what people should read and do if they want to teach themselves TEI P5 XML. Where should they start? This depends, obviously, on what time and resources they have. I tend to recommend directed intensive training such as the Digital.Humanities@Oxford Summer School as a good way to get an introduction to such topics.

However, some people are unable to participate in such training and prefer self-directed learning. What should they do? There are lots of resources online such as TEI By Example and the TEI Guidelines. Where to start?

When people are taking an Introduction to TEI workshop I usually introduce markup but move onto TEI and XML very quickly because in such intensive workshops time is limited. Instead, when people are undertaking self-directed learning I think they should use the time they have to learn more about HTML and then XML before starting to learn about the TEI vocabulary of XML itself.

There is a great deal of reading one could suggest for an initial exploration of XML and markup. I would suggest at least looking at:

as a good start.

If I were to suggest a series of assignments someone might undertake based on this reading it would be to do the following, writing up answers to the questions.

  1. Read the W3Schools HTML basic section and XHTML section, do the HTML and XHTML quizzes
  2. Read the W3Schools XML basic section and XML Namespaces page, do the XML quiz
  3. Read the TEI Guidelines Gentle Introduction to XML, and the Wikipedia article on XML.
  4. How does XML differ from HTML? Why might it be more powerful to describe what some piece of data is, rather than say how it should be presented?
  5. Download and install the oXygen XML editor (you can get a 1 month free trial license, otherwise costs $64 USD)
  6. Choose a very short (1 page) sample of a document you are interested in.
  7. Create a list of the overall structural aspects you feel define this sort of document. Create a list of any of data-like entries (like names or dates) in the document. Create a list of presentational aspects of the document that you think important to record.
  8. Funding challenge part 1: Hypothetically, imagine you had funding to mark up several thousand pages of this material. Look at the list of aspects you would like to record. Why is each one important? What benefit does recording each of these things give those wanting to use or understand the text (or culture from which it originates)? Which would you choose to markup? How consistently can you mark up this feature? Such document analysis should be done long before any project starts (or asks for funding).
  9. Funding challenge part 2: An uncaring government has slashed its funding for higher education research projects and has reduced your project’s funding by 50%! What would you do? Will you mark up only 50% of the material? If so, how do you decide which parts? Will you only mark up certain aspects? If so, which ones and why?
  10. Using the ‘Text’ (code view) mode of the oXygen XML editor, create a well-formed XML file of your sample document with elements and attributes that you have invented yourself. What difficulties do you encounter doing this?
  11. Why might it be better for communities of users to agree on elements, what they mean, and how they should be used?
  12. What are the central ideas of Michael Wesch’s YouTube video? How do they relate to the nature of XML and how it is used?
  13. Read the Wikipedia article on RSS, and find an RSS feed to subscribe to in Google Reader to see its application.
  14. Does order really matter in an XML document?  What is the difference between:

    <list><item n="1">item 1</item><item n="2">item number 2</item></list>  and
    <list><item n="2">item number 2</item><item n="1">item 1</item></list>

    And how much difference does this make when viewing XML as a data storage format rather than a presentational one?

  15. Join the TEI-L mailing list and start lurking.

This certainly isn’t exhaustive, but with a bit of support, I suggest someone undertaking this would be much better placed to start learning about TEI P5 XML from the online sources available.

The next post in this series is an Introduction to the Text Encoding Initiative Guidelines.