Community Creators, Secure Your Code!

by Niklas Bivald

33 Reader Comments

  1. This can be partially mitigated by using a proper DTD for your documents, but you’re right. I suppose the idea about giving users a limited toolset would help prevent malformed code. However… if the parser makes the code valid as well, it shouldn’t be a problem.

  2. Using real HTML, CSS and URI parsers seems like the most secure solution, and it only has to be done when processing the input, not every time it’s displayed.

    In Java, there’s “TagSoup”:http://mercury.ccil.org/~cowan/XML/tagsoup/ for parsing just about any input HTML, which is a good idea anyway, a few “CSS Parsers”:http://www.w3.org/Style/CSS/SAC/ and the provided URI parser, and presumably other languages have the same.

    If you include only the elements, attributes, CSS rules and URI methods you do understand, and correctly escape the output with the right “character encoding”:http://www.w3.org/International/O-charset.html, I don’t see how anything can slip through.
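
    A minimal sketch of the parse-then-allowlist idea, written in Python rather than Java purely for brevity. The tag list, the attribute list and the `sanitize` name are assumptions made for illustration, and the sketch does not rebalance unclosed tags the way TagSoup would.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist: tag name -> attributes permitted on that tag.
ALLOWED_TAGS = {"p": set(), "em": set(), "strong": set(), "a": {"href"}}
# Hypothetical set of URL prefixes accepted in href values.
ALLOWED_PREFIXES = ("http://", "https://")


class AllowlistSanitizer(HTMLParser):
    """Re-emit only allowlisted tags and attributes; escape all text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # unknown element: drop the tag itself
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_TAGS[tag] or value is None:
                continue  # unknown attribute (onclick, style, ...): drop it
            if name == "href" and not value.strip().lower().startswith(ALLOWED_PREFIXES):
                continue  # unknown URI scheme: drop the attribute
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is always escaped, so stray markup cannot survive as markup.
        self.out.append(escape(data))


def sanitize(fragment: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)
```

    Because this runs once when the input is processed, the stored result can be displayed without re-filtering, which is the point made above.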

  3. Seems to me that if I was to summarise this article in a few words, capturing all the important information that is not already second-nature to most web developers, it would be: “The word ‘javascript’ can have line breaks (and spaces, and other separating chars?) in it”.

    All the stuff about escaping special characters, white-listing HTML elements, and being careful about CSS input has been well-known for years. It’s just that this new ‘ja-vas-cript’ IE trick has come into the limelight recently, because of the MySpace exploit that the author mentions.
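
    One way to guard against the split-up ‘javascript’ trick is to strip whitespace and control characters before looking at the scheme, and then accept only known schemes. This is a sketch only: the scheme list and the `is_safe_url` name are assumptions, it accepts absolute URLs only, and it assumes character entities were already decoded by an HTML parser.

```python
import re

# Hypothetical allowlist of URL schemes.
ALLOWED_SCHEMES = {"http", "https", "mailto"}

# Whitespace and control characters, which may be ignored inside a scheme.
_IGNORED = re.compile(r"[\x00-\x20\x7f]+")


def is_safe_url(url: str) -> bool:
    """Return True only for absolute URLs whose scheme is allowlisted."""
    cleaned = _IGNORED.sub("", url)
    scheme, separator, _rest = cleaned.partition(":")
    if not separator:
        return False  # no scheme at all; this sketch rejects relative URLs
    return scheme.lower() in ALLOWED_SCHEMES
```

    Checking against an allowlist after normalising means new spellings of a bad scheme fail by default instead of needing a new rule each time.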

  4. For the web app I develop in Perl, I’ve found HTML::Scrubber to be a good way to help in cleaning up input from untrusted sources – nicely customizable in a very perlish way. Check it out –

    http://search.cpan.org/~podmaster/HTML-Scrubber-0.08/Scrubber.pm

    does well with javascript, although I do not know about css (you can strip out any tag you’d like), or js in css.

  5. It is fascinating to see this issue discussed. We work on a number of community-based sites and will now work on a solution for this. Up to now we have been using the HTML Tidy variations for .NET at…

    http://tidy.sourceforge.net/

    I’m sure this will resolve most issues. We have found it fantastic! Also, for editing HTML, try using

    http://tinymce.moxiecode.com/

    We find this brilliant, and it has a range of options for narrowing the HTML tags allowed.

    Hope this helps

     

  6. Partially in response to Brian Lepore (comment #8), I’d like to help underscore the threat of XSS. Any time user input is accepted (even if that input comes directly from a GET or POST variable), it needs to be properly escaped on output or you are at risk of an attack.

    Here’s an example that brings it home for me. Imagine a site where you store a cookie for authorization. As you may know, cookie contents are accessible through Javascript. Imagine if this site’s login page would display error messages in the URL, like login.php?error=Incorrect password.

    Seems innocent and common enough, but if I am an attacker, and I IM one of the site’s users with a URL like login.php?error=Incorrect password [removed]alert([removed])[removed]
    (contrived example), you can see how I’d be able to manipulate the cookies and send their login cookie to my site (through a Javascript redirect). Similarly this can be used for phishing by sending the user to a fake login page via a link on the real site.

    With XmlHttpRequest, I’d even be able to force the user to perform actions (via HTTP POST/GET) on the site – such as the voting example given in this article.

    To make matters worse, many of the new community sites that are springing up encourage the user to enter HTML, and correctly differentiating valid HTML from invalid HTML is a difficult process. This means that you can’t really use stuff like PHP’s strip_tags() function.

    Hope this helps!
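
    The login.php example above comes down to escaping on output. A minimal sketch in Python, assuming a hypothetical `render_login_error` helper that receives the raw `error` query parameter; the payload in the usage lines is invented for illustration and is not the one elided from the comment.

```python
from html import escape


def render_login_error(error_param: str) -> str:
    """Build the error paragraph, escaping the untrusted value on output."""
    # quote=True also escapes " and ', so the value is safe in attributes too.
    return f'<p class="error">{escape(error_param, quote=True)}</p>'


# Hypothetical attacker-supplied value arriving via ?error=...
hostile = "Incorrect password <script>steal()</script>"
print(render_login_error(hostile))
```

    The markup in the parameter is displayed as literal text instead of being run by the browser.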

  7. At the moment we’ve got a half-empty glass here. I can’t judge the contents just yet, because if I really want to get to the taste I need the whole thing.
    I’m sorry to say, but I find this half of the article useless. It might start making more sense when part two is out and about, but until then this seems like a very lengthy introduction.

  8. thomas:

    I’m sorry that my previous post did not state this, but I am aware of the idea of validating input to protect users.

    That said, I have never really understood why many sites have a tendency to use GET data like in your example, rather than keeping to the use of error code numbers. I know, it is quite annoying to have to look up the different numbers when you want to use something, but it saves the worry of someone injecting HTML into your site. In my opinion, the security benefit outweighs the simplicity in development.
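
    A sketch of the error-code approach, in Python for brevity. The codes, the messages and the `error_message` name are all made up for illustration.

```python
# Hypothetical table of error codes; only these strings are ever displayed.
ERROR_MESSAGES = {
    "1": "Incorrect password.",
    "2": "Unknown user name.",
}


def error_message(code: str) -> str:
    """Map an untrusted ?error= value to a fixed, server-side message."""
    return ERROR_MESSAGES.get(code, "An unknown error occurred.")
```

    Whatever arrives in the URL, the page can only ever show one of the fixed strings.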

    I like the check_tags function in the first link that ban jax posted. It looks like a beefed up version of strip_tags that fits the needs of most developers that need to allow HTML.

  9. “I like the check_tags function in the first link that ban jax posted. It looks like a beefed up version of strip_tags that fits the needs of most developers that need to allow HTML.”

    The Iamcal code is quite interesting, but it doesn’t guarantee XHTML 1.0 valid code, since it doesn’t check the children of the elements (a tag within a tag). Also, it’s not easily extensible to environments that need a broader tag base: there’s a lot more to XSS in attributes than a few protocols. Especially true if you decide to allow the style attribute (which, as the article points out, can execute JavaScript too! Fun.)

    Shoot me, but I’m not sure why anyone would need image tags for most applications either.

  10. Am I being totally short-sighted here, or can all these security holes be resolved in one simple stroke: don’t let your users personalise their space through real code!

    My other half asked me a few days ago to help her style her MySpace stuff (and as it’s the first time I’ve really bothered going there I almost threw up when I saw how s**t it is code-wise) and then gave up after tearing my hair out for ages.

    For a start it clearly says “please don’t use CSS to remove any MySpace ads” so immediately I set about doing just that – and succeeded. I then decided to play a little prank on her by adding “table {display:none}” to her style sheet and promptly destroyed the entire site in preview mode.

    IMO these sorts of places – MySpace in particular – are so shoddily written it’ll take a CSS expert to write code to successfully style the soup of nested tables, divs and junk to get anything worthwhile looking at – and I doubt the majority of the user base will be these CSS experts (I know I’m not) – and certainly from my own experience anybody half web-dev savvy has their own blog with a crisp, clean blogging system or has just written their bloody own!

    I’m all for personalisation and marking your cyber-territory, but surely it’s quicker and easier for the users and safer for the admins to allow personalisation through forms and options. Let the user click the settings they want and the system generates the styles.
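
    A rough sketch of “the system generates the styles”, in Python. The option names, the font table and the `build_profile_css` function are assumptions for illustration only.

```python
import re

# Hypothetical choices offered in the settings form.
FONTS = {"serif": "Georgia, serif", "sans": "Verdana, sans-serif"}
# Only a plain six-digit hex colour is accepted from the colour picker.
HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}")


def build_profile_css(font_choice: str, background: str) -> str:
    """Generate CSS from form options; user text never reaches the stylesheet."""
    font = FONTS.get(font_choice, FONTS["sans"])
    color = background if HEX_COLOR.fullmatch(background) else "#ffffff"
    return f"body {{ font-family: {font}; background-color: {color}; }}"
```

    Anything that is not one of the offered options falls back to a default, so there is nothing to filter afterwards.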

    Obviously you’ll still need to filter text input areas to avoid IE’s incompetence, but you’re already running a lot tighter ship.

    Or, I say again, am I being short-sighted?

  11. MySpace didn’t get popular because it was well-written ;-) It got popular because it gave users control.

    However, I think that you do have a point in not letting users customize space through “real code.” Isn’t that what most forums and blogs do right now by not allowing HTML when they can avoid it? BBCode and Textile are much easier to secure than raw web input.

  12. Well of course you don’t have to let them use real code. You could let them pick between the options you give them. But you could also just not accept their content, or their photos, or their comments..

    Like the previous poster said, MySpace sold for many millions of dollars, and it was only because “making profiles pretty” really appeals to teenaged girls and the boys who lust for them. People like to do their own thing.

  13. “But you could also just not accept their content, or their photos, or their comments..”

    I think that’s a totally different ball game – not accepting customisation through real code has nothing to do with censorship.

    I entirely agree that making “pretty profiles” is the attraction of MySpace and its ilk, but as it’s been mentioned, there are much more secure ways of going about it – allowing real-code submission is just asking for too much trouble.

  14. This is important stuff. Blogging communities allow other users to view their code, but for web 2.0 and secure programs it’s important to know that the code can be hacker-proof.

  15. As well as allowing javascript URLs in CSS, IE also has a “feature” that lets properties be set using expressions, written in (of course) JavaScript. So you can use:

    <body [removed]alert(‘hi’));”>…</body>

    Just something else to watch out for…

  16. I think MySpace giving their users freedom to edit their templates’ CSS was a huge mistake. Sure, the default theme is ugly, but the things that the majority of people do to their MySpaces are much worse. They just destroy them beyond any level of readability or sanity.

    MySpace would be a nicer place without theme editing. Pure Volume is proof of this.

  17. Your article makes a good case for the security codes necessary to hold back abuse of the system. The system still needs to be refined so that legitimate users are not kept out in the same stroke we use to stop abusers. Thanks for raising the topic so well…

  18. I know the article is about XSS, but the example used points to another problem with a lot of these types of sites, not using a validation scheme for the ‘voting’ script, or scripts that control other types of changes.

    A simple check for a valid random unique id in voteOnAuser.php would kill any chance of an XSS vulnerability such as this from having any effect because the ‘vote’ would automatically be rejected.
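
    A sketch of such a check in Python, assuming the session is a plain dict; `issue_vote_token` and `is_valid_vote` are hypothetical names, and the real voteOnAuser.php is not shown in the article.

```python
import hmac
import secrets


def issue_vote_token(session: dict) -> str:
    """Create a random token, remember it server-side, and return it for the form."""
    token = secrets.token_urlsafe(32)
    session["vote_token"] = token
    return token


def is_valid_vote(session: dict, submitted: str) -> bool:
    """Accept a vote only if it carries the token issued for this session."""
    expected = session.pop("vote_token", None)  # single use: removed on first check
    if expected is None:
        return False
    # Constant-time comparison; bytes avoid errors on non-ASCII input.
    return hmac.compare_digest(expected.encode(), submitted.encode())
```

    A forged request from another site cannot know the token, so the vote is rejected. A script injected into the same page could still read the token, so this complements output escaping rather than replacing it.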

    And a big applause to #28, if you’re going to allow customization then by all means have complete control over the code yourself.

  19. I’m sure there are two clear strategies to prevent XSS attacks:
    1. Format using tidy and then remove anything unexpected – leave only basic set of tags.

    2. Separate administrating and displaying markup content on different sites (like blogger.com does).

    Correct me if I’m wrong.

  20. Instead of
    one should use
    ;) Makes things valid as well :D

  21. All that customization of web application reminds me of an article on network security. 

    Pick your poison: do you want to restrict the user to a known and limited set of features or chase all the hacks that people will find?

    With the first approach you define and design it once; the other approach is a never ending race to ensure your application is safe from the known tricks and hacks.

    Ok, it takes more time to develop the white list, and it most certainly feels more restrictive from a user standpoint, but I guess it depends on whether you prefer to appear on the front page because your application is great or because somebody took control of other people’s accounts.

    Maybe I’m paranoid, but I’d rather spend time expanding my applications than fixing my damaged reputation.

    Cheers!

  22. http://kevin.mesiab.com/wordpress/index.php/cragslist-vulnerability/

    Apparently they need to heed the message. ;)  At the time of writing, this hole has not been patched.

  23. The ad hoc “tricks” the article prescribes can fall victim to clever attackers. For instance, if you were to use str_replace(‘javascript’, ‘’, $html) your script would still be vulnerable to javasjavascriptcript (this is documented in the XSS cheatsheet posted above, excellent reading for anybody interested in HTML validation).
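
    The same failure is easy to reproduce outside PHP. A small Python sketch, with `strip_once` and `strip_until_stable` as hypothetical helper names:

```python
def strip_once(html: str) -> str:
    """Single-pass removal, equivalent in spirit to one str_replace call."""
    return html.replace("javascript", "")


def strip_until_stable(html: str) -> str:
    """Repeat the removal until the string stops changing."""
    previous = None
    while previous != html:
        previous = html
        html = html.replace("javascript", "")
    return html


payload = "javasjavascriptcript:alert(1)"
print(strip_once(payload))          # the removed middle leaves "javascript" behind
print(strip_until_stable(payload))  # nothing is left to reassemble
```

    Looping closes this particular hole, but it is still a blocklist; an allowlist of known-good schemes is the sturdier fix.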
