Unwebbable

by Joe Clark

42 Reader Comments

  1. Joe, a minor correction for accuracy: uppercase uses text-transform, not text-decoration.

  2. Another good example of ‘unwebbable’ text is poetry, especially the modern kind that plays with page margins, text orientation (another inherent HTML weakness), font faces and font sizes, and even page size sometimes. In fact, this is text that deliberately defies conventions and structure.

  3. Great article, raises some good questions.

    Personally I would use a definition list if I were to mark up a masthead. Would that not be most appropriate? Also, I believe the HTML5 <aside> element is intended to cover callouts, sidebars and footnotes (to some degree).

  4. I love to see people coming up with new ways of solving problems. I’d be curious to see how the HTML5 <aside> would work, but it’s the XML idea that got my attention. It seems to me like the most inherently flexible method.

  5. So, I suppose I am in the boat with the people who think anything can be “webbable.” I think what it comes down to is how accustomed and comfortable people have become with a particular format or routine.

    I do agree that XML would be a viable solution to consider here, although I wish you had discussed an XML solution in more detail rather than just naming the technology.

    Bottom line: a message needs to be conveyed from the writers to the actors and whoever else. How you get there is the question. Another question would be why go to the web with the scripts?

  6. I didn’t notice if anyone mentioned it, but HTML5 actually has a new <dialog> element.

  7. In reading this article, I was reminded of Plotbot, which makes (at least) a small attempt to break the paradigm of poor formatting by providing an XML download of one’s screenplay. The problem, of course, is that it’s not in a format that’s usable by any other application.

    However, I see this as a good first step.

  8. For digitizing manuscripts, etc., I’ve found the DTDs from the Text Encoding Initiative (http://www.tei-c.org/) to be really useful. Joe, et al., the TEI might be worth checking out, if you haven’t already.

  9. Before you exclaim “MathML!” the way a pensioner might yell out “Bingo!,”

    I don’t think I’ve ever laughed out loud at an ALA article. Say what they will about Mr. Clark, nobody can deny he’s got a way with words.

    @AfroNinja (http://www.alistapart.com/comments/unwebbable//#7): if the document is XML, then it has to be usable by other applications, doesn’t it?

  10. Before you exclaim “MathML!” the way a pensioner might yell out “Bingo!,”

    I don’t think I’ve ever laughed out loud at an ALA article. Say what they will about Mr. Clark, nobody can deny he’s got a way with words.

    @AfroNinja (http://www.alistapart.com/comments/unwebbable//#7): if the document is XML, it has to be usable by other applications, doesn’t it?

  11. I thought the article was interesting in that it shows how custom XML comes in handy over simple XHTML as a means of storage… but then again, was (X)HTML even intended for general information storage? XML marks up information, and (X)HTML just happens to mark up information specifically about web page content.

    The semantic web is, for a great part, about automation, and therefore storing a screenplay as XML, reading that in using a script (say, PHP) and then passing it to the browser as XHTML seems to me to not only be very possible, but also very sensible… so I don’t understand what about a screenplay is ‘unwebbable’, any more than, say, a blog with comments. It would be just as bad to store a, but blogs are quite obviously alive and well on the web. You store it in one format (database) and display it in another (X/HTML).
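    A minimal sketch of the store-as-XML, serve-as-XHTML pipeline described above, written in Python rather than PHP. The screenplay vocabulary (screenplay, scene, slugline, action, character, dialogue) and the class names it maps to are invented for illustration; they are not Final Draft's format or any published schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical screenplay vocabulary, invented for this sketch.
SCREENPLAY_XML = """<screenplay>
  <scene>
    <slugline>INT. VIRGINIA CHURCH - SAME TIME</slugline>
    <action>As Stuart's tech throws some more switches.</action>
    <character>MCCLANE</character>
    <dialogue>Jesus...</dialogue>
  </scene>
</screenplay>"""

# Screenplay element -> (XHTML element, class name); also hypothetical.
TAG_MAP = {
    "slugline": ("h3", "slugline"),
    "action": ("p", "direction"),
    "character": ("p", "speaker"),
    "dialogue": ("p", "speech"),
}


def screenplay_to_xhtml(source: str) -> str:
    """Turn the stored screenplay XML into an XHTML fragment for the browser."""
    root = ET.fromstring(source)
    page = ET.Element("div", {"class": "screenplay"})
    for scene in root.iter("scene"):
        section = ET.SubElement(page, "div", {"class": "scene"})
        for child in scene:
            # Unknown elements fall back to a paragraph classed by their name.
            tag, cls = TAG_MAP.get(child.tag, ("p", child.tag))
            out = ET.SubElement(section, tag, {"class": cls})
            out.text = child.text
    return ET.tostring(page, encoding="unicode")


if __name__ == "__main__":
    print(screenplay_to_xhtml(SCREENPLAY_XML))
```

    The stored file keeps its own vocabulary, so the XHTML output is just one view of it; the fallback for unknown elements is an arbitrary choice for the sketch.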

  12. Oops! I didn’t finish my sentence! Meant to say…

    “It would be just as bad to store a weblog as a static HTML page.”

    Sorry. :)

  13. I think the main problem is that you need a bit of software to actually understand a given XML document, rather than just validate it.

    DTDs (or XSDs) make it easy for a program, say a web browser, to check that a document is valid, but they don’t describe what the content actually means or how it should be displayed. This makes accessibility difficult; you can’t apply any semantic meaning to the content and it will just be presented as a mess of text.

    The only reason that browsers can understand (X)HTML, for example, is that their developers have read the extensive (human readable) documentation made available by the W3C. Therefore, they know that the H1 element is a heading and can offer tools (e.g. a document outline) and default styling that makes use of that knowledge.

    Getting a single piece of software that can do that with any type of XML document is never going to be feasible.

  14. “Bringing scripts to the web is noticeably worse than filming a stageplay.” – oh so damn spot on!

    “You could, without too much of a stretch, mark up a script as a table.” – Yep, that’s exactly what I thought.

    “But a masthead marked up with H1 through H6 essentially pollutes the tag stream of the surrounding web page.” – Even with HTML5 <footer>, <aside>, etc.? Perhaps not.

    “Armed with this knowledge, what are we going to do? Prediction: nothing.” – Agreed, and nothing is exactly what should be done too.

    “After all, isn’t our new future wrapped up in HTML5? Just as our old future was wrapped up in XHTML2?” – NO!!!! HTML5 is wrapped up in fun; XML is wrapped up in the invisible extensibility of usefulness. Fun is easier to deliver on than usefulness. Add “fun” to “ding” and you get “funding,” another reason HTML5 will grow quicker than XML… faster ROI.

    “The web is, of course, a wondrous thing, but its underlying language lacks the vocabulary to express even the things that humans have already expressed elsewhere.” – Good! The idea that HTML could express everything humans have expressed elsewhere is insane. The law of “least resistance” dictates that a super-language with a rigid syntax would never accommodate a race whose very expression is ‘to break the mold’. How many commands can a mind hold? It is so much easier to navigate a small set of fluid meta controls than a galaxy full of rocks.

  15. To give web documents the rich semantics of print documents, XML is finally a viable option.

    I use XHTML because the experts who have guided my growth as a web designer seem to believe in it. I think it has potential to do all sorts of amazing things… but I don’t know how to use it for anything beyond typical semantic web page markup that validates. I have no concept of its potential. This concluding sentence seems to indicate I’m not far off the mark. So my understanding is that if I become proficient in XML, I could in theory create elegant ways to extend my XHTML markup to convey things like screenplays.

    The truth is I find this all very confusing, and I’m not even sure what question to ask. Is my understanding correct? I think I need more “how” to better understand the “why.” Where to next? Where do I go for an XML 101? (Please don’t say W3C!)

    Thank you.

  16. “The creation myth of the web tells us that Tim Berners-Lee invented HTML as a means of publishing physics research papers. True? It doesn’t matter; it’s a founding legend of the web whose legacy continues to this day.”

    Okay I’ll bite. Someone care to set the record straight?

    “We are well past the stage where browsers could not be expected to display valid, well-formed XML.”

    XML (and related technologies) is hard for beginners. I think every HTML author who first hears about XML is like “Cool! HTML where you can invent your own tags!” So they dive in and about the time they encounter namespaces something dies inside them.

    It’s not so much that HTML is the lingua franca of the web, but that tag-soup is the lingua franca of the web.

  17. Interesting and thorough article, Joe. People, not unreasonably, want to use common and widely understood conventions from print. As you rightly point out, this can rarely be done without adapting them to the medium.

    Are you suggesting that one solution for screenplays is to use something like the Final Draft XML format and get browser manufacturers to create default styles for it? Or are we talking XSLT here?

    Of your other suggestions, the humble PRE tag seems the best/most pragmatic (though it has no semantic meaning behind it).

  18. You say that XML is a solution, but the web is not only about having the data there; I want to display it. That’s why I put it on my public server and not on my bookshelf. I don’t see what the problem is with the MathML kind of stuff. Those are solutions for this kind of problem. Somebody took the trouble to make a DTD for a pretty troublesome field. I wonder if there’s any XML markup for say music sheets…
    But the main question: how am I supposed to format my own XML? No way. Am I wrong?

  19. Marking up a film script is complicated, and when it gets complicated nobody does it.

    @valerauko wonders ‘if there’s any XML markup for say music sheets’. Check out XML and Music (http://xml.coverpages.org/xmlMusic.html) for one of many abandoned resources dedicated to the online storage, transfer and display of musical notation. Clever and passionate people work on wonderful DTDs for their theses. Then they leave uni and never update (or use) the markup they created.

    It’s too complicated. Like Joe said about screenplays, it’s easier to use PDFs or proprietary formats (http://www.sibelius.com/products/scorch/index.html) for display and super-simple markup (http://chordie.com) for everyday use.

  20. Not all meaning can be captured in hierarchical trees; not all meaning can be rendered machine-readable. Sometimes the essential part of a designed object is space and the way that small pieces form a whole, and I don’t think computers can understand that yet.

  21. Back when I was a tot, before anyone even dreamed of things like icons, drop-down menus, and windows, “computer literacy” was a big concern. There was a fear that the current crop of young people would fall behind in their communication skills, because having those skills meant knowing how to program a computer. In hindsight, expecting people to climb a learning curve like that to publish a simple document is a laugh.

    Right now, I’m typing into a little box, and after I click “Submit” the whole world will be able to read it. Big learning curve involved with that one, eh?

    Are we all going to learn XML now? Or is the idea to build it into products that ordinary people can use?

    My wife writes a lot of papers in the APA style. So do others – millions of them are generated every year. Ever see what MS Word spits out when you save one of them as HTML? Smart stuff in (hopefully), garbage out.

    What we need right now are well-thought-out, ad hoc, interim solutions. Using class names may be hacky but at a later date, at least the document can be machine-parsed and transformed into something better.

    Since I do a lot of thinking about stuff like this, that’s just my off-the-top take. Terrific article. I did a post on Readable Web (http://readableweb.com/moving-from-print-to-screen-a-case-study-from-joe-clark/) recommending it. Ciao.

  22. … for courses. The web is not a paginated format. Text-indentation is not the way to indicate separations in content. A script’s semantic structure is something you infer from its appearance; a document’s is implicitly defined.

    Your use case is something I’m dubious of – though if the intent is to completely reproduce the appearance of a script in the classic style, whilst creating / editing in a digital medium a document that retains semantic structure, you’d be better off using XML and looking at something like XSL-FO for presentation.

  23. @vai who wrote:
    “Text-indentation is not the way to indicate separations in content.”
    If you don’t count the typographic convention of the last 500 years or so that uses indentation to separate textual content, you’re absolutely right.
    Or am I misunderstanding you?

  24. ‘In a document’ should be implicit in that sentence, by which I mean ‘from the document’s point of view’.

    Case in point – being typewritten, the script is bound to the medium of paper. A document can end up on multiple media, in various presentations. How do you represent text-indent in a vocal presentation? You don’t – it has no ‘meaning’ beyond what you infer about the content from it. Model your domain (db / xml schema / what-have-you), populate, take care of presentation issues afterward. This applies to software as a whole, and even the humble document.

    Yes, you’re misunderstanding me.

  25. Vai, text formatting like screenplay indentation would not translate to a “vocal” format, because one experiences spoken texts in sequence (one word at a time, let’s say), whereas the sighted reader can scan an entire page in seconds. You see alignments and indents out of the corner of your eye and they guide your reading of the page. The same isn’t true when, to use an example, the page is enunciated by a screen reader. Then it’s just one bit of content after another.

  26. You’re right that some texts don’t particularly lend themselves well to the flow of web formats. For now, it appears that we must use various hacks (or old-school tables), neither of which are a particularly good way to go (but could at least be visually readable).

    I’ve used Final Draft, which is not the most user-friendly program (it feels stiff and clunky). That said, I’d be hard put to believe that studios or other owners of movie scripts actually want them out there on the Web.

    I’d be at least as interested in the ability to use more than the few fonts that Macs, PCs and Linux machines share in common.

  27. With reference to Joe’s latest comment, it seems to me that all the indentation exists to make the screenplays easier to read. If that is the case, I don’t see why XML or any other language is going to make any difference.

    On the other hand, MathML or SVG make a lot of sense as examples of why XML is a good language for markup. But using it for screenplays seems to be thin ice.

  28. Divya, we could fake the appearance of screenplay pages, except of course for the fact that we don’t have pages and the entire purpose of a printed screenplay is at odds with whatever purpose, if any, an online screenplay has.

    But at root HTML doesn’t give us rich enough semantics to mark up the actual content. XML might.

  29. I would argue that footnotes are not webable. Aren’t hyperlinks the World Wide Web’s alternative to footnotes? Instead of providing bibliography after the document and marking the references to the titles in it with [1], [2], [3] we use hyperlinks to the original articles. Instead of explaining a word in a footnote, we just create a hyperlink to Wikipedia or other source of knowledge somewhere in the web.

    I can imagine situations where traditional footnotes can’t be replaced with traditional hyperlinks, but a little creativity always comes in handy. After all, the web is not print, is it? So web documents are not meant to be the same as print pages, are they?

  30. I have to admit that until now I used paragraphs or divs styled with CSS to format difficult parts of web sites. Having read this article, I realize that it should be much easier to create structures like footnotes or scientific formulas. Nowadays it seems contradictory that we can buy things and watch TV with a browser, but we cannot display simple equations like x=1/2 in a good-looking way (and in my eyes PNG images cannot be a solution for this issue). But in the end I am optimistic: even though HTML5 does not solve all problems, it shows that the web languages are making progress.

  31. I would argue that footnotes are not webable.

    I wouldn’t, although I find it easier to use endnotes than footnotes (i.e., when you print them, they all appear at the end of the document rather than on the relevant page).

    In the text, use an <abbr title="Content of footnote" class="note">1</abbr> element for the footnote reference. You can number it manually, or if you want to be clever (and not worry about legacy browsers), you could probably get it to display the number with a counter. This will show the contents of the footnote as a tooltip on hover. Then add a <dl> list at the end of the document defining each number with the appropriate footnote as the definition.

    Other options would be to display the content of the footnotes in a frame/iframe at the bottom of the page or as a lightbox if you want it to look fancy.
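    The <abbr>-plus-<dl> footnote technique can be generated rather than typed by hand. A rough Python sketch, assuming a made-up [[footnote text]] authoring syntax and a hypothetical render_footnotes helper; it numbers the references manually (the option above that does not depend on CSS counters), and the footnotes class on the list is likewise an assumption.

```python
import html
import re

# Hypothetical authoring syntax: [[...]] marks the text of a footnote.
NOTE_PATTERN = re.compile(r"\[\[(.+?)\]\]")


def render_footnotes(paragraph_html: str) -> str:
    """Swap each [[note]] for a numbered <abbr> reference and append a <dl>
    pairing each number with its footnote text.

    The paragraph itself is assumed to be trusted HTML already; only the
    footnote text is escaped.
    """
    notes = []

    def to_reference(match: re.Match) -> str:
        notes.append(match.group(1))
        title = html.escape(match.group(1), quote=True)
        # The footnote text becomes the tooltip via the title attribute.
        return f'<abbr title="{title}" class="note">{len(notes)}</abbr>'

    body = NOTE_PATTERN.sub(to_reference, paragraph_html)
    items = "".join(
        f"<dt>{number}</dt><dd>{html.escape(note)}</dd>"
        for number, note in enumerate(notes, start=1)
    )
    return f'<p>{body}</p>\n<dl class="footnotes">{items}</dl>'


if __name__ == "__main__":
    print(render_footnotes("Scripts are set in Courier[[Usually 12-point.]] throughout."))
```

    A real page would collect the <dl> entries once for the whole document instead of per paragraph; the sketch keeps them together only to stay short.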

  32. What’s so hard about semantically marking up a film script? No, you won’t get the page break every 60 seconds, but I’d be surprised if that works out as a hard-and-fast rule anyway, and maybe that’s the one thing that has to give.

    <h3>Reverse angle – over their shoulders</h3>
    Slowly, without any fuss, and without a pattern of sorts, that would be pretty if the impact wasn’t so frightening… slowly, all the runway lights are going out.
    McClane
    Jesus…
    <h3>Int. Virginia Church – same time</h3>
    As Stuart’s tech throws some more switches –

    and so on.

    Use a counter on the <h3> element (I’ve assumed that a couple of higher levels will be needed but use whatever level is appropriate). Or use an <ol> if you’re worried about legacy browsers.

    Use margin-left to indent the different classes of paragraph to the appropriate point.

    Use text-transform:uppercase; on <h3> and <strong>.

    There’s absolutely no reason why it needs to have a fixed-pitch font, if you’re lining the edges up with proper margins, but you can choose to use one if you prefer the feel of it.

    Obviously with HTML5, you can use the <dialog> element to improve the markup even more.

    What in all that (a) is not perfectly semantic, or (b) fails to replicate the rendering used in the original script?

  33. OK, so the Textile preview bears no relation to what is actually output…

    I used normal angle brackets around h3, which it left alone; strong, which it converted to square brackets; and p, which it stripped altogether.

    The screenplay should have read:
    [p class=“direction”]Slowly, without any fuss, and without a pattern of sorts, that would be pretty if the impact wasn’t so frightening… slowly, all the runway lights are going out.[/p]
    [p class=“speaker”]McClane[/p]
    [p class=“speech”]Jesus…[/p]

  34. In the author bio it has the phrase “His ongoing missions.” To me, this smacks of bringing religion to the ignorant natives. Here’s a piece of dogma about using “presentational HTML and inline styles,” delivered without benefit of any technical justification: “These are, of course, outmoded development methods.” Just what is your objection to John August’s desire to have his RSS feeds look the way his readers, screenwriters, expect them to look? And what is it that makes them outmoded if the politically correct methods don’t work in RSS?

    So I guess I missed the sermon on why everything on the Internet should be marked up systematically. What I’ve never seen explained, in detail, is what the benefit is to this and why it is promoted with such zeal. Give me some examples of where this has been done and what benefit was derived.

    The first step in justifying any technology is to clearly explain the goal of that technology. So let’s talk about the screenplay on the Internet. First, who is it there for and what are they doing with it? In the case of John August’s blog, they are there to be read by the people that read his blog and, I think, no other reason. They are there for education … to illustrate the point he’s making in his post.

    Now I know that a script that is going to be produced does serve some additional purposes in the pre-production phase of film making. It is called the script breakdown. This is usually a manual step where the various film making departments comb through the script to find all the characters, props, locations, sounds and visuals that will be needed during production. There are script software systems which allow the author (or someone else) to “markup” the script to identify these elements and automate the breakdown. These are WYSIWYG tools and not HTML-like at all.

    So let’s talk about marking up a script in HTML:
    “Would class names really suffice here—that is, H2 class=“slugline” versus H2 class=“charactername”? Really, the answer is no. Script headings and HTML headings are two different things.” They are? Who says? Why? Again, no technical justification … just dogma. You might as well say, “Well, everyone knows that’s just wrong.”

    Each of these IS a heading that introduces and pertains to what follows until another heading is reached. That, to me, is the essence of what a heading is. Do class names not add to the systematic metadata about that entry? Here’s my shot at marking it up:

    <DIV class="manuscript">
    … stuff left out here …
    <DIV class="slugline">
    <H1>THE CAB</H1>
    <DIV class="pagenumber">32</DIV>
    <DIV class="continuetopofpage">CONTINUE -</DIV>
    <DIV class="description">But Barns doesn’t reply … just tries – and fails – to point out the window. Everybody turns.</DIV>
    </DIV>
    <DIV class="slugline">
    <H1>REVERSE ANGLE – OVER THEIR SHOULDERS</H1>
    <DIV class="description">Slowly without any fuss, and with a pattern of sorts that would be pretty if the impact weren’t so frightening … slowly <SPAN class="breakdownVisual">ALL THE RUNWAY LIGHTS GO OUT.</SPAN></DIV>
    <DIV class="charactername">MCCLANE
    <DIV class="dialog">Jesus …</DIV>
    </DIV>
    </DIV>
    … stuff left out …
    </DIV>

    (I hope this comment preview is accurate.)
    I see no reason this markup, which is systematic to me, can’t be styled using CSS and I know that I could write a parser to automate the breakdown of a script formatted this way.
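    The claim that a parser could automate the breakdown holds up at least for a toy case. A minimal sketch using Python’s standard html.parser, run against a simplified, well-formed excerpt of the markup above; choosing charactername and breakdownVisual as the categories to collect is an assumption, and the class attribute is matched as a whole string, so elements with several class names are not handled.

```python
from html.parser import HTMLParser

# Class names from the markup above; treating these two as the breakdown
# categories is an assumption made for this sketch.
BREAKDOWN_CLASSES = ("charactername", "breakdownVisual")
# Elements that never get a closing tag, so they must not touch the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class BreakdownParser(HTMLParser):
    """Collect the text directly inside elements carrying a breakdown class."""

    def __init__(self):
        super().__init__()
        self.stack = []  # class attribute (or None) of each open element
        self.found = {name: [] for name in BREAKDOWN_CLASSES}

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(dict(attrs).get("class"))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        # Only text whose innermost open element has a breakdown class counts.
        if text and self.stack and self.stack[-1] in BREAKDOWN_CLASSES:
            self.found[self.stack[-1]].append(text)


SAMPLE = """<div class="slugline">
<h1>REVERSE ANGLE - OVER THEIR SHOULDERS</h1>
<div class="description">Slowly ... slowly
<span class="breakdownVisual">ALL THE RUNWAY LIGHTS GO OUT.</span></div>
<div class="charactername">MCCLANE
<div class="dialog">Jesus ...</div>
</div>
</div>"""

if __name__ == "__main__":
    parser = BreakdownParser()
    parser.feed(SAMPLE)
    parser.close()
    print(parser.found)
```

    Because the stack only tracks the innermost element, the dialog nested inside charactername is not mistaken for a character name.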

    So I guess I’ve missed your whole point, being one of the ignorant natives.

    “but what you should feel is cheated” and don’t tell me how I should feel either. I don’t feel cheated when I buy those screenplay books. If you feel cheated then own that.

    After writing this I guess my objection is to the tone of the article. I guess you may not like the tone of my reply either. Well the Internet is a big place.

    Peace,

    Rob:-]

  35. I agree with most of what RobShaver said.  While I think I understand Joe Clark’s point, I agree that functions like movie script writing and formatting need a “Movie Script Editor” program.  If there is adequate demand, it would probably provide a web formatted output, maybe even like Joe suggests.

    As far as “outmoded development methods”…  there are so many responses to that statement that I can’t even begin to enumerate them here.  Try using modern methods with HTML email.  There is also the implication that people who still use “outmoded development methods” are Wrong.  I know of people who built their sites with Front Page 4 and are still maintaining them with it.  Would I recommend Front Page for anything?  Hell, no… but I refuse to say these people are Wrong for doing something that works for them.  Fortunately, the web browser manufacturers believe they need to support the existing web including those “outdated methods”.

    Speaking of browsers, I think the browser writers are going to trump all of the supposed standards and keep trying to make software that works. HTML5? Yeah, we’ll support that… and HTML 1, 2, 3.2, 4, XHTML, JavaScript, ECMAScript, DOM, whatever it takes to make it work. ‘Deprecated’? Not in this browser!

    If W3C can’t make things that Improve the situation, maybe Adobe will.  The web page of the future:

    <html>
    <head>
    <title>Untitled</title>
    </head>
    <body>
    <runFlash src=“MyFlashPage”>
    </body>
    </html>

  36. Thanks to Rob Shaver for going to the trouble of creating a demo for what I had in mind when I said:
    “Using class names may be hacky but at a later date, at least the document can be machine-parsed and transformed into something better.”
    There might be a smart way to get rel, rev, and title attributes in on the act, too.
    BTW – is this not the basic technique used in Microformats?
    If Rob’s example were to be codified – spec’d to exactly when and where these combinations of tags and classnames are to be applied, it certainly would provide a “schema” of sorts.
    It’s one way to approach the problem, surely.

  37. Surely we could accomplish any of this with a simple <pre> tag? That would give the author pretty much the same freedom as the typewriter. A little clever JavaScript to deal with interface differences like tabbing (or any other typewriter oddities) shouldn’t be too hard either. The argument for semantics is voided by the counter-argument that the printed version is automatically semantic by reader interpretation, and the same would apply to preformatted text… My pencil doesn’t need to know whether it’s writing a paragraph or a title, and nor would an online screenplay. It’s fair to say that typical web semantics don’t apply, and that the screenplay format could very easily be replicated.

    Also, the notion that an HTML document is automatically “pageless” is daft. If you want to represent one-minute intervals, what’s wrong with marking every 50th line or so?
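    Marking every 50th line is straightforward if the script stays as preformatted text. A sketch under that assumption: the paginate_pre helper and the page-marker class are hypothetical, and 50 lines is only the rough figure from this comment for a page (and so about a minute of screen time), not a measured standard.

```python
import html

LINES_PER_PAGE = 50  # the commenter's "every 50th line or so"; not a standard


def paginate_pre(script_text: str, lines_per_page: int = LINES_PER_PAGE) -> str:
    """Split typewriter-style text into <pre> blocks of a fixed number of
    lines, each preceded by a page marker standing in for one minute."""
    lines = script_text.splitlines()
    blocks = []
    for start in range(0, len(lines), lines_per_page):
        page_number = start // lines_per_page + 1
        chunk = "\n".join(lines[start:start + lines_per_page])
        # Escape the script text so stray < or & cannot break the markup.
        blocks.append(
            f'<p class="page-marker">Page {page_number}</p>\n'
            f"<pre>{html.escape(chunk)}</pre>"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    sample = "\n".join(f"line {n}" for n in range(1, 121))
    print(paginate_pre(sample))
```

    Counting lines only approximates typed pages, since it ignores how long each line is; that trade-off is the same one the comment accepts with "or so."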

    PS – apologies for my rudeness, I think that your technical argument is overshadowed by your obvious sentimentality for the typewritten form. If you’d presented your article with more romanticism and less “fact” I would have enjoyed and appreciated the article as I imagine you meant!

  38. That should have read <PRE>. I would’ve given an example if it weren’t for the terror that is Textile formatting!!

  39. Rob Shaver, for “missions” read “goals.” At least I have some.

  40. @Michael Newton (http://www.alistapart.com/comments/unwebbable//#9): “if the document is XML, it has to be usable by other applications, doesn’t it?”

    No, it will not. Any XML application is opaque to any other XML application unless you have prior knowledge. The claim that an XML processor somehow magically knows what an XML application is about is a myth that stems from mismarketing XML (http://my.opera.com/jax/blog/html5-xml-stealth), and probably part of the reason for the perceived backlash against XML. XML solves problems, just not the problems it often is claimed to solve.

    Back to the article: there is no reason why the internal format shouldn’t be a task-specific XML format (or any other format, for that matter), and XML has the advantage that it can be transformed into HTML fairly easily. However, I don’t think the particular example of film scripts was that well chosen, as they can be encoded in HTML with no loss of information. The example in the comments with musical notation might be a better one, as there is no adequate support for that in HTML.

  41. “Another question would be why go to the web with the scripts?”

    @mattrossidesigns (Post #5)

    To share them with community theatre and school drama groups is my first thought.

  42. I’m surprised to see no reference to the Text Encoding Initiative (http://www.tei-c.org), which has been saying more or less the same thing as this article since the mid-1990s. It is now more or less the de facto XML vocabulary of choice for marking up the meaning structure of texts rather than their accidental appearance.
