The Battle for the Body Field

by Jeff Eaton

30 Reader Comments

  1. Apologies, but this is a very Drupal-oriented article. You seem to have taken a long time to say ‘let’s add custom tags and attributes into our WYSIWYG editors, except that part won’t be WYSIWYG and guess what it will be in Drupal 8’. As a web developer I’ve spent years using a wide variety of CMS, and Drupal’s approach is highly limited by the fact that it’s based in PHP, which is just a scripting language (and Perl-based, so everything is seen as a text-processing problem, which is what Perl is good at). If all you have is a hammer, everything looks like a nail. Talking about XML transformations and CKEditor is giving me flashbacks to the late 90s and early 2000s when those were the primary CMS options. You also seem to be contradicting yourself a bit by saying that we don’t want to make users dive into the HTML, but then we do want them to have to learn custom tags and attributes (which don’t get visually represented during editing). Modern systems based on object-oriented languages (like C# or Javascript) give developers the chance to build reusable modules that can separate data from design, not require authors to learn any markup, and be visually represented during content construction. I’ve built widgets in Sitefinity, for example, that don’t allow any design decisions by the author, but give them complete flexibility to place specialized content wherever they need in their article, and see how it will look at design time. Custom tags and attributes are indeed the future, but modern technologies will allow the author to still be abstracted away from the semantic markup and simply focus on content. Thank you for the article, but this is for a specific technology community, and leaves out the many other approaches available even now.
  2. I think ultimately what we’re looking at is the need for more metadata tagging the content, allowing a means of managing that metadata, and then defining a means of leveraging that metadata toward a variety of purposes (SEO, presentation, interaction, filtering). That really is the thing lacking in HTML. Tags are fine to define basic and common structures (i.e. this is a list of stuff, this is an article, this is a quote); where it fails is in describing the specific content more. So yeah, it’s an article, but what kind of article, about what, from whom, when, what’s the context, who’s the audience, etc.? At the same time we have to worry about “clean code” and not becoming the bloat that XML can become if not properly curated. It’s funny in that what we basically need is XML and XSL, but not XML or XSL. We need the ability to create chunks of content and specify all those extra data points and then transform it in some separate step into something the browser will render and the user can interact with, but without the bloat of XML and without the speed issues of XSL. So we’re left to writing our own parsers and coming up with our own meta-languages and domain-specific languages to fix this problem across a variety of platforms. I think another challenge is that the publishers don’t want restrictions. I’ve run into this with some marketing teams and designers. If our developers say “we need to live within this box due to budget, infrastructure, time, etc.” the designers and marketers and business partners balk… “why are you putting restrictions on us! We’re designers! You’re not designers!” and then we go over time or budget trying to figure out how to make their designs implement like they expect (which rarely works out 100%). I think the same holds true here. 
Instead of designing a system that works consistently and fluidly for 80% of consumers, we’re worrying about fitting every single possible use case that could ever occur and in the process making it difficult for everyone. So how do we put a stake in the ground and say we’re going to do something here, it won’t make everyone happy, but it will work for most and not have everyone reinventing their own version of the same thing.
  3. Tony, thanks for your comments! I’ve definitely spent the past several years in the Drupal world, but I don’t think the narrative-structure challenge (or the approaches I’m discussing) are in any way specific to one CMS or a particular programming language. You’re correct in that many different CMS platforms have arrived at similar conclusions… and yet, on project after project we find clients who’ve been left with a body field, a WYSIWYG editor, and a pile of HTML-insertion buttons. The challenge isn’t simply technical, since the tools to build intelligent, modular content have been around for decades. The biggest hurdle is getting CMS integrators, web developers, and designers on the same page when it comes to the meaning of the underlying content they’re storing, manipulating, and presenting. XML and DITA are important not because they’re perfect technical solutions, but because their communities have learned to articulate many challenging questions about content structure. I agree wholeheartedly that presenting raw markup to users isn’t the answer, and I hope that the space constraints of the article didn’t leave anyone with the wrong impression. Precise, meaningful markup (the kind that vanilla HTML can’t currently provide) is a critical foundation for the easy-to-use widgets and editing interfaces you describe. While it can simplify the markup jumble, it’s not the end of the process.
  4. The quandary is that, although mistrusted by many, XML and XSLT actually enhance publishing opportunities as well as enable stepping away from solutions that only exist in silos. Shared content architectures lower the cost of development and maintenance of tools, enhance content portability across the Web, enable common design patterns to attract wider communities of users (training content, how-to articles, and encyclopedic content as a few examples), and give product vendors wider markets for widely-shared solutions, potentially lowering costs thanks to competition that could not exist before. HTML5’s structural and semantic elements certainly help out with creating ever more adaptive content delivery solutions, but nothing in the Web architecture helps with the problem of umpteen different authors creating umpteen different interpretations of what the repeatable structure of particular content type ought to be. Herein is where some form of schema-coached content authoring tool really does make sense. I’ve been beaten up enough about technology biases to avoid letting you see whether or not I use XML under the covers. I prefer to suggest that HTML5 can be augmented by XML for far greater roles, just as Ripley was augmented by her Power Loader suit for doing outmatched battle (Alien meme there).  It is a simple matter of finding scaffolding for your content goals in the standards that can do the most to lower your costs and enhance your reach.
  5. @Tony, just to point out, the latest iterations of PHP, and specific PHP frameworks, including Symfony, which is the basis for much of Drupal 8, are object-oriented (http://stackoverflow.com/questions/4699519/is-php-object-oriented), so I don’t think that’s at issue. (Also, Perl-based… really?!) I think the issue of how to break down content for its reuse (here, we’re concerned about the content’s code, not that of the CMS/html framework) while allowing contextual presentation of aggregated content in a specific display format at a specific url is an important one that all web developers have to work through in order to accommodate the proliferating number of uses of our content (e.g., in apps, output in APIs, displayed on different types of devices, consumed by screen-readers, and the like). It’s an important question for long-term preservation of content, as well, something that’s coming to a head given the age of the web and the pervasiveness of web-published content.
  6. Pixel and Tonic have added a great feature into Craft CMS called Matrix. It lets you create ‘blocks’ of content types that make it easy for a person to enter in the control panel and also give developers the ability to use proper markup.
  7. @Jeff Eaton Thanks for the reply - perhaps I misunderstood the intent of your article, but the struggles you suggested seemed specific to the ‘WYSIWYG editor and buttons’ problem that you mention, which I haven’t found to be a problem in all CMS. That seems to me more a technical problem of implementation. Most of your article seemed to me focused on the experience of the author, not the challenges of the developer, thus my feeling that it seemed contradictory. Thanks for clarifying.
  8. @sclapp Yes Perl-based, in that PHP started as a C-translation of Perl scripts, and inherited Perl-like syntax. I find that languages, no matter how they progress, keep a root way of thinking. C++ and C# still ‘think’ like C in many ways though they’ve moved far from their origins. Also PHP 5 has objects and interfaces, but that doesn’t necessarily make it ‘object-oriented’ yet. That’s an open debate. You can get into questions of polymorphism, and implementation (heaps, stacks, etc.) As far as data, I think it’s important as developers not to get caught up in abstractions. Custom tags, attributes, etc. are also just data -  metadata that someone has to write code to translate. Moving from <div> to <address> is to accommodate humans (whether screen readers or reading code), whatever the device is, the computer system doesn’t care. You’re abstracting away from what’s happening within the browser anyway. We’re just building deeper layers of abstraction. So the question is how far do you abstract the user, and how far do you abstract the developer, and overall how clean and understandable can you make your abstraction.
  9. The need for more specific elements is there for sure. Microformats never really got full support, RDFa strikes fear that another marquee or blink tag will be introduced under a money-backed namespace, and data attributes, while useful, unfortunately have little to do with semantics at this time. I agree that rel attributes need to play a more prominent role in defining and describing content relationships.
    “There is a very real problem that needs to be solved here. We need mechanisms in HTML that clearly and unambiguously enable developers to add richer, more meaningful semantics—not pseudo semantics—to their markup. This is perhaps the single most pressing goal for the HTML 5 project.” - John Allsopp
    And yet, here we are. I think we need to introduce a small new set of elements that represent content categories: general enough to be reusable, but specific enough to be semantic. Just like Microformats, I think the 80/20 rule is a good model to follow. Why shouldn’t the following tags exist? calendar event location (lat, long) vCard (person or profile) product review resume/cv feed (RSS, timelines) img type="logo" (solving the h1 vs img debate) disclosure type="summary | spoiler | sensitive etc" (replacing the detail tag) You could go even more granular for tags: post type="status | image | audio | video | comment" audio type="music | voiceover | language | show (podcast/interview)" Your warning example above could be abstracted to notification with a type attribute that accepts values of error, warning, or message. Think about the countless parsers, aggregators, and API calls just this set would save, and how this would affect object-oriented CSS. I think we’d all benefit from paving the cowpaths and re-approaching HTML as if it were being created for tomorrow’s web. The other, larger elephant in the room is that content strategy, content management, and semantic, reusable markup exist in a delicate balance. There are IAs who are concerned with content management, content strategists concerned with copywriting and context planning, and front-end devs who want to keep markup lean and reusable, not to mention page weight low. It’s becoming increasingly difficult to draw clear distinctions in ownership, because content management and authoring are dictating the HTML markup, which, more often than not, heavily influences CSS/JavaScript hooks, even source control. The interdependence can quickly become political and the user ends up suffering. I’m also heavily invested in helping find or create an amicable solution.
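Editorial aside: the `notification` idea in the comment above can be sketched in a few lines. This is only an illustration, assuming a CMS that stores well-formed XML-style fragments and rewrites them at output time. The `notification` element is the commenter's proposal, not real HTML, and the function name and CSS class names are made up for this sketch.

```python
import html
import xml.etree.ElementTree as ET

# The three values suggested in the comment above.
ALLOWED_TYPES = {"error", "warning", "message"}


def render_notifications(fragment: str) -> str:
    """Rewrite hypothetical <notification type="..."> elements as plain divs."""
    # Wrap the fragment so it has a single root element for the parser.
    root = ET.fromstring(f"<body>{fragment}</body>")
    for element in list(root.iter("notification")):
        kind = element.attrib.pop("type", "message")
        if kind not in ALLOWED_TYPES:
            raise ValueError(f"unknown notification type: {kind!r}")
        # The stored markup says what the content is; the output decides
        # how it is expressed in today's HTML.
        element.tag = "div"
        element.set("class", f"notification notification-{kind}")
    # Serialize everything inside the temporary wrapper.
    parts = [html.escape(root.text or "", quote=False)]
    parts.extend(ET.tostring(child, encoding="unicode") for child in root)
    return "".join(parts)
```

A different output channel could map the same stored element to something else entirely, which is the point of keeping the vocabulary separate from the presentation.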
  10. Fully agree with the author here. For those familiar with Entity Relationship (ER) modeling, we could say there is a need for two things in a CMS: 1. Custom entity types (meaning entity types consisting of several attribute types). In typical CMSs such as Drupal, these would be the “custom content types”.
    2. Custom attribute content. We should be able to define any attribute type as a simple type, such as text, string, options, ..., or as a complex type (such as the body field described here). This body field can then (technically) consist of e.g. a combination of standard HTML tags and custom elements (such as an image gallery), which can be themed independently of their structure. We are currently implementing such a system in a new Angular-based CMS. Angular is great in this regard, because it has something called directives, which is a great way of adding these custom elements. For more information check this: http://docs.angularjs.org/guide/directive, or wait for our upcoming CMS :)
  11. That’s a really interesting article Jeff, I totally agree with your central point that complex content in the body of page content is really underrepresented in CMSs at present. Page content types work fine (see content channels, content types, etc), and complex areas in fixed locations seem well catered for too (for example Matrix in EE and Craft). However, I agree that users often want the flexibility to add complex content anywhere in the page. But that content is still modular and needs to be separated from what it looks like. From what I’ve seen most CMSs don’t really cater for that. You seem to be indicating users will have to write markup to express complex content in the body, in my experience this is usually a hard sell. I think a successful solution would have to marry a decent visual editor along with an expressive markup language. My instinct is HTML5 could be OK for this, with things like data attributes as you suggest. There are also some interesting projects like Made by Many’s Sir Trevor http://madebymany.github.io/sir-trevor-js/ which allows the user to add blocks of modular content in any order. It feels like content strategy is smashing up against CMS tools and we’re starting to see some real progress in how people think about content and how we publish it. It will be interesting to see how this all evolves…
  12. Jeff,
    Great points about the limitations of HTML5, which fails to provide true semantic markup, despite claims otherwise. I think the bigger issue to solve is determining the purpose of markup within the body. You mention the role of markup in helping to structure a document. There is far more of that that can be done. That is really what DITA has been about, and it has been a walled garden, serving the needs of individual content-creating organizations. The other kind of meaning-based markup is about relating internal content to other content outside of the document. This is what the various flavors of data markup (RDFa, JSON-LD, etc.) are about. The web standards community has been more active in this area, largely because committee members are interested in linking data, but it hasn’t been as much of a priority for ordinary authors, who largely don’t understand how it works. As a result, this kind of markup is still rather cryptic to many. And it only covers a limited range of named entities that, while important to businesses, don’t reflect the full range of interests of the general public. For either kind of markup to be more widely adopted, it needs to be concise, precise, and easy to understand. I would love to see standards bodies care more about these issues. HTML is now being used for everything, including books, so it needs more robust markup capabilities.
  13. I feel that the curve of our industry is, and should be, bending towards less coding complexity in the body field, rather than more. Most of the problems that Mr. Eaton describes have a straightforward and non-coding-based solution already: oEmbed. I can’t think of a more semantic, easy-to-use, and responsive way of putting some piece of content into a specific part of your content than a simple HTTP link at that point in the text, then allowing your CMS to auto-discover from the content source the code required to display that rich content, and then work out its own rules for integrating that content into your design. You drop a link to a Slideshare presentation into your body field. When that body field is pulled into some context where a rich media object is appropriate (like a full-size article view), an oEmbed-enabled CMS will render the presentation; in other contexts (an RSS feed, an older feature phone, a ‘help’ tooltip) the CMS will just include the link. WordPress, Drupal, Plone and I’m sure many other CMSs have supported oEmbed for years (in WordPress, I much prefer using oEmbed to using the shortcode system that Mr. Eaton described). Most of us are familiar with oEmbed as a way of rendering Twitter cards and YouTube videos, or one of the many other social media services, but in fact anyone can set up an oEmbed provider for their own content, both for internal use and to share with the world. Finally, rich media aside, Mr. Eaton brings up his core point: adding semantic-ish custom markup to text (‘warning’ or ‘task’ etc. tags within an XMLish framework). I guess there are probably some specific uses for this, but at a core level I’m uncomfortable with trying to use a technical structure to create meaning. The content is the meaning.
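Editorial aside: the context-dependent behaviour described in the comment above (rich embed in a full article view, plain link elsewhere) might look roughly like this. It is a minimal sketch, not how any particular CMS does it. The provider lookup is faked with an in-memory table so the example runs offline, and the URLs, the table, and the `render_body` function are all hypothetical. A real consumer would instead request the provider's oEmbed endpoint over HTTP and read the `html` field of the response.

```python
import html
import re

# Stand-in for real oEmbed responses. The URL and markup are invented
# for this sketch; no network request is made.
FAKE_OEMBED_RESPONSES = {
    "https://slides.example.com/deck/42": {
        "type": "rich",
        "html": '<iframe src="https://slides.example.com/embed/42"></iframe>',
    },
}

# A line that contains nothing but a URL is treated as an embed candidate.
BARE_URL_LINE = re.compile(r"^\s*(https?://\S+)\s*$")


def render_body(body: str, rich_context: bool) -> str:
    """Render body text, expanding bare-URL lines only in rich contexts."""
    rendered = []
    for line in body.splitlines():
        match = BARE_URL_LINE.match(line)
        if not match:
            rendered.append(html.escape(line))
            continue
        url = match.group(1)
        response = FAKE_OEMBED_RESPONSES.get(url)
        if rich_context and response and "html" in response:
            # Full article view: use the provider-supplied embed markup.
            rendered.append(response["html"])
        else:
            # Feed, tooltip, or unknown provider: fall back to a plain link.
            safe = html.escape(url, quote=True)
            rendered.append(f'<a href="{safe}">{safe}</a>')
    return "\n".join(rendered)
```

The stored body never changes; only the `rich_context` flag passed at render time decides which form the reader gets.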
  14. Interesting article and interesting discussions in the comments section. One can take ideas from here and there to get closer towards a solution for this daunting task. In my search for CMSs that more or less implement what’s discussed here, I found the below two, not very known, CMSs. I’m sharing them in hope they’ll help someone looking for a similar solution. http://www.impresspages.org/
    http://www.pimcore.org/ Cheers.
  15. Thanks for the interesting article. It’s indeed about an old problem that is still at the very heart of the difficulty of Web content management. I appreciate the effort to compare the approach of the “Web folks” (a rich text editor in an object- or page-oriented CMS) with the DITA way. These two worlds are too disconnected, and it is good sometimes to try to connect the dots! I’m working on eZ Publish (side note: it’s spelled “eZ Publish”; I know this doesn’t always make it easy...) and we’ve been dealing with this problem for many years (from day 1, actually) with the approach of forbidding HTML markup in the rich text editor and relying on an internal XML pivot markup (ezxml). We definitely think this is the right approach. I just wanted to add that we are in the middle of reworking our solution here (moving to a new version of our internal format based on DocBook, and using transformations to different HTML5 views by default). Ping me if you are ever interested. One of the main points in the process is the editorial experience within the rich text editor, often misnamed “WYSIWYG” (we use a customization of TinyMCE): how to “see” the chunks when the content is edited while embracing the “Create Once, Publish Everywhere” approach of NPR (and many others; in fact I’m not sure they invented it…). By design, WYSIWYG doesn’t make any sense in that scenario, when you have multiple channels, multiple designs, and multiple screen sizes consuming the content… and we are back to the big dilemma of separating content from presentation BUT giving meaning and context to the editors who need it to deliver good content! An exciting challenge that, more than ever, is giving many of us a lot of fun.
  16. It says, “Wikipedia recently rolled out an assistive editing tool to help new users navigate the complexity of the site’s content.” However, the screenshot provided is not the new tool (it’s an old one called WikiEditor). The new tool is called VisualEditor, and can be accessed directly. It’s interesting, since it’s a visual view, but still lets you use templates (which can be added and edited, with parameters), to avoid the issue of e.g. every article coming up with its own way of displaying info on the right. You can also see a screenshot of VisualEditor in action. You can try VisualEditor more broadly by signing up (if you don’t yet have an account). Then, go to Beta Features, select VisualEditor, and save. There will then be an “Edit” tab at the top, with Beta next to it, which leads to VisualEditor. I work for the Wikimedia Foundation as a software engineer, though not on the VisualEditor team.
  17. In my experience, many content authors have little idea about writing with meaning, or writing for the web specifically at all. They are very visually oriented, as if they’re writing in Word. In fact, only people comfortable in abstract thinking grasp the concept of semantics vs visuals at all. It is very well possible to build draggable, reusable components that authors can embed in a content body, with direct visual feedback, and dialogs that allows them to tune that instance of the component. They can even be perfectly responsive. And the markup is clean. I’ve seen it work. The problem though is that such an approach is hugely expensive, and requires a lot of custom development, as well as ongoing maintenance. Therefore I think, we need to improve the authoring tools, but authors also need to become more skilled in writing content for the web. Yes it’s a tough sell, but if it’s your job to write content for the web, you might as well learn it. We need to close the gaps from both sides.
  18. Jeff: Excellent article. You are spot on. And, this is why you need to be at the next Intelligent Content Conference. We just ended this year’s show a few days ago, but I have a free ticket with your name on it for next year’s event. You’re exactly the kind of guy who should be there amongst the hundreds of us who get it. I used to spend my time trying to recruit new believers, but, given the awesome amount of work and great paying gigs out there, now I spend most of my time finding qualified people to work on amazing projects with real impact. Congrats on summarizing so well what we’ve been preaching for a decade or more. It’ll take some time, but the rest will come kicking and screaming to our party, like it or not. Scott Abel
    The Content Wrangler
  19. @Matthew Flaschen—Thanks for the update on Wikipedia’s visual editor. It’s a great example of how a project can iterate on its tools when the underlying “vocabulary” is understood, and I’m going to be giving it a closer look…
  20. @fchristant: You’re absolutely correct that visual tools aren’t incompatible with the approach being discussed, and they can have a significant impact on the user experience and content quality. The problem of training writers to capture meaning rather than appearance is definitely a key challenge. That’s actually one of the reasons I’ve been focusing on lessons from the XML community, and the problematic vocabulary mismatch between HTML and the work that most content creators do every day. Although we can never build foolproof systems, the “language” we offer them in the form of markup, assistive editor buttons, and widgets can shape the work in a positive direction. When the training and the functional vocabulary of the tools *complement* each other, it helps reduce many of the reuse challenges we see on large content projects.
  21. Congrats on summarizing so well what we’ve been preaching for a decade or more. It’ll take some time, but the rest will come kicking and screaming to our party, like it or not.
    www.fmyykj.com
  22. Thanks for this, I’m inspired. The use of Web Components (currently through Polymer.js) along with a CSS architecture such as SUIT.css make this approach a plausible and attractive one.
  23. Hey Jeff,
    Thanks for sharing this. I thought it was a really interesting post that brought up a number of good questions. I wanted to let you know that I thought our readers would enjoy this, and I included your post in my roundup of February’s best web design/development, CMS, and security content. http://www.wiredtree.com/blog/februarys-best-web-designdevelopment-cms-security/ Thanks again for the nice work. Rachel
  24. Well, I never had this kind of problem because I use the power of MODX Revolution with a customized TinyMCE plugin and other PHP plugins. In my opinion, we should create rules for the writers, and try to cover all the situations they’ll need to create their articles, by developing specific templates/chunks. And of course, if we can give them some training in HTML, even better. As Tony said “... for example, that don’t allow any design decisions by the author, but give them complete flexibility to place specialized content wherever they need in their article, and see how it will look at design time.” This is the point. We just create the flexibility for an author to publish their content… and that’s it. And I think at this time we’ve got all the tools to solve this problem… we just need to be creative!!! :) Sorry about my English.
  25. Very interesting read, though I’m not a fan of drupal, I think there are greater technologies out there.
  26. Hi Brother How r u? I hope you will b fine. i read your Colom, Tutorial, But i cant understand how to use this on with normal this. Brother applogeys to take that i m disagree with you and i want more qustion for you.
    best regart.
    www.truewisdomcltohing.com
  27. This is precisely the problem I have been struggling with in recent years. I’m comfortable coding a website for a client. The difficult part is handing it over to them with a CMS that strikes the right balance of usability and flexibility. Whilst generating clean code and responsive layouts. I now think some form of custom tags are the answer to this problem. Even if a WYSIWYG editor is built on top of them. The question is now one of finding the technology to implement this… I’d like to keep most pre-processing server-side until Web Components are more widely, natively supported. And I favour Ruby/Rails for development. Something like the Radius templating engine is a contender. (I think it’s a shame it hasn’t been more widely adopted.) But it would be even better to use pure XML or HTML for forward-compatibility, perhaps in combination with Markdown for simplicity. Does anyone have any suggestions?
  28. Wow Jeff, thank you very much for this article. I got the impression that as soon as you work outside the popular web publishing world of PHP-based CMSs, in an environment more dominated by engineering, you do indeed find very good solutions that have made use of XML (maybe even XSLT) for quite a while. Please let me point out a small comment by sc5 that CMS will die in the transition to responsive HTML5 services, which is interesting but won’t be the case, as I understand it now. During my research on this topic I found Symphony CMS, which is PHP-based but makes use of XML and XSLT exclusively. It would be interesting to see how and if they make use of it in the native editor. Any experiences?
  29. This feels like a complex way to say “use semantic bbcodes where possible, limited styling bbcodes, then convert to markup on output”. Haven’t forums and CMSes been doing this for ages with comment editors inserting such square-bracketed tags in lieu of direct HTML?
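Editorial aside: the "convert to markup on output" step mentioned in the comment above can be sketched as follows. This is a toy converter under stated assumptions, not the behaviour of any real forum package. The tag vocabulary (`warning`, `quote`, `b`) and the HTML each tag maps to are invented for the example, and nesting a tag inside another tag of the same name is not handled.

```python
import html
import re

# Hypothetical vocabulary: each square-bracket tag maps to an HTML wrapper.
TAG_TEMPLATES = {
    "warning": '<aside class="warning">{}</aside>',
    "quote": "<blockquote>{}</blockquote>",
    "b": "<strong>{}</strong>",
}

# Matches [name]...[/name]; the backreference keeps open and close paired.
TAG_PATTERN = re.compile(r"\[(\w+)\](.*?)\[/\1\]", re.DOTALL)


def bbcode_to_html(text: str) -> str:
    """Escape raw HTML, then expand known square-bracket tags."""

    def expand(match: re.Match) -> str:
        name, inner = match.group(1), match.group(2)
        template = TAG_TEMPLATES.get(name)
        if template is None:
            # Unknown tags are left exactly as the author typed them.
            return match.group(0)
        # Recurse so tags nested inside this one are expanded too.
        return template.format(TAG_PATTERN.sub(expand, inner))

    # Escaping first means authors cannot inject markup directly;
    # square brackets pass through html.escape unchanged.
    return TAG_PATTERN.sub(expand, html.escape(text))
```

Because the stored text only ever contains the bracketed tags, the mapping in `TAG_TEMPLATES` can change later without touching existing content.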
  30. In our attempt to solve the body field problem we identified a list of content patterns (28 so far) that would help us streamline the authoring process and improve the reading experience. We developed shortcodes for many of these patterns that authors or editors could use. Check out the work at
    Content patterns to improve the reading experience.