A List Apart

Menu
A random assortment of socks hangs from a line with a sneaky, slithery companion hiding among them. Will our hero discover the snake before it's too late?

Illustration by Dougal MacPherson

What I Talk About When I Talk About Sorting: Untangling Array#sort

Sorting things is a fundamental part of our daily lives—it’s something we do everyday to make our lives easier, following all kinds of criteria. Whether you’re looking for a person’s phone number, the location of your favorite book, or even matching up your socks, sorting allows us to find what we are looking for in a faster and more effective way.

Article Continues Below

This is also the case in the world of web development. But if you thought you knew exactly how JavaScript’s Array#sort works under the hood, think twice.

Scratchin’ the surface

No matter your skill level, if you’re a JavaScript developer you’ve probably come across the Array#sort method at some point. Do you remember the first time you tried sorting numbers in JavaScript? You were probably astonished to discover (just like the rest of us) that the sort method does NOT sort things quite as we might expect.  You don’t know what I’m talking about? Let’s dive into some code:

const myArray = [33, 2, 98, 25, 4]
myArray.sort() // [ 2, 25, 33, 4, 98 ]

Wait, what? Is JavaScript nuts? In which world are 25 and 33 smaller than 4? Before you start rethinking your whole life, let’s figure this out.

Lexicographical sorting

What is actually happening here is that JavaScript is sorting our numerical array in a lexicographical manner—think alphabetical order, where every value is treated as a string.

The catch here is that Array#sort can take a compare function as a parameter, but if you don’t supply it, “elements are sorted by converting them to strings and comparing strings in Unicode code point order” (according to the MDN docs). This means that JavaScript will treat the following arrays in a similar fashion when calling the sort method:

const numbers = [80, 9]
numbers.sort() // [80, 9]

const strings = ['80', '9']
strings.sort() // ['80', '9']

In this case, “80” comes before “9” because it has a smaller Unicode code point. If you don’t believe me, let’s take a look at the code point value of the first character of each:

"8".codePointAt(0)  // 56
"9".codePointAt(0)  // 57

Basically the function codePointAt() is simply a method of the String object that is used to get the Unicode code point value of any character at a given index.

At this point, the following code shouldn’t be shocking to anybody because now we know that JavaScript is just converting all the elements in those arrays to strings and comparing their Unicode values. (Yes, Emojis also have Unicode code point values.)

const emojis = ["😍","😂","😰"]
emojis.sort() // ["😂", "😍", "😰"]

const wtfJavaScript = [390, "😂", 1, "2325"]  
wtfJavaScript.sort() // [1, "2325", 390, "😂"]

Numerical sorting

After all that mumbo jumbo, what if we actually JUST wanted to sort our array numerically? As stated before, we need to provide a compare function that will sort the array elements according to the return value of that compare function.

  • If the return value of compareFunction(a, b) is less than 0, a will come before b.
  • If the return value is greater than 0, b will come before a.
  • If the return value is 0, a and b will remain unchanged.

To compare numbers instead of strings, provide a function that subtracts b from a. Here is an example:

const myArray = [33, 2, 98, 25, 4]
myArray.sort((a, b) => a - b) // [ 2, 4, 25, 33, 98 ]

Rollin’ in the deep

During all this JavaScript sorting fun, I bet you wondered at some point about the algorithm used by the native JavaScript sort method. No? That was just me? Either way, let’s check it out.

Now, here’s the thing: the ECMAScript standard doesn’t specify which algorithm should be used, so each JavaScript engine is allowed to use whatever algorithm it wants. Why should you care about this? Keep reading, but first let’s find out which engines use which algorithms. (The good thing is that most of these engines are open source so we can look at the code and check what they are using.)

How different JavaScript engines sort
JavaScript Engine Sort Algorithm(s)
SpiderMonkey (Mozilla) Insertion Sort (for short arrays)
Merge Sort
V8 (Google) Insertion Sort (for short arrays)
Quick Sort
Nitro (Apple) Merge Sort
Chakra (Microsoft) Quick Sort

Figure 1: As you can see, each JavaScript engine has its own algorithm for sorting.

This isn’t a computer science article, but let’s get some things straight. Because JavaScript engines don’t use the same algorithm behind the native method, it is very likely that you’ll encounter different results when running the sort method in different browsers. You might find that elements are being sorted in a different way, or some sorts run faster than others (depending on the JavaScript engine). If your application relies crucially on sorting data, you have to pay attention to these kinds of details.

For example, Google’s V8 that powers Chrome and NodeJS uses the quick sort algorithm, which is not a stable algorithm. Stability in the context of sorting means that it preserves the original order of the input set when having elements with equal values. If you have to sort a list of people in your database who were previously sorted alphabetically by last name, you might want to preserve that original order when sorting again, this time according to age and looking for people of the same age. This means you will need a stable sort.

Since each JavaScript engine implements the Array#sort method with different algorithms (that may or may not be stable), stability is not guaranteed. Let’s check an example:

const people = [
	{ name: 'Kei Akamatsu', age: 32 },
	{ name: 'Fumiaki Haida', age: 42 },
	{ name: 'Tengo Kawana', age: 26 },
	{ name: 'Sara Kimoto', age: 11 },
	{ name: 'Midori Kobayashi', age: 11 },
	{ name: 'Eri Kurono', age: 54 },
	{ name: 'Haruki Murakami', age: 6 },
	{ name: 'Satoru Nakata', age: 26 },
	{ name: 'Yoshio Oumi', age: 26 },
	{ name: 'Miss Saeki', age: 17 },
	{ name: 'Yuzuki Shirane', age: 26 },
	{ name: 'Kafka Tamura', age: 26 },
	{ name: 'Tsukuru Tazaki', age: 32 },
	{ name: 'Toru Watanabe', age: 12 }
]

people.sort((a, b) => a.age - b.age)

In V8, this is the result:

{ name: 'Haruki Murakami', age: 6 },
{ name: 'Midori Kobayashi', age: 11 },
{ name: 'Sara Kimoto', age: 11 },
{ name: 'Toru Watanabe', age: 12 },
{ name: 'Miss Saeki', age: 17 },
{ name: 'Kafka Tamura', age: 26 },
{ name: 'Satoru Nakata', age: 26 },
{ name: 'Yuzuki Shirane', age: 26 },
{ name: 'Yoshio Oumi', age: 26 },
{ name: 'Tengo Kawana', age: 26 },
{ name: 'Tsukuru Tazaki', age: 32 },
{ name: 'Kei Akamatsu', age: 32 },
{ name: 'Fumiaki Haida', age: 42 },
{ name: 'Eri Kurono', age: 54 }

Notice how the previous alphabetical order for the people who are 26 years old is not preserved? In JavaScript engines implemented with stable algorithms, this is not a problem.  Just keep in mind that the sort method will sort differently depending on where you’re running it.

What if it’s crucial for your application to maintain consistent behavior across engines? Is implementing your own version of the sort method even an option? Maybe yes, maybe no. If stability and performance are high on your priority list, it’s probably yes.

Actually, implementing your own homemade JavaScript sorting function is not that difficult and takes only a few lines of code. There are plenty of books that explain how to implement the most popular sorting algorithms. Two good resources are The Algorithm Design Manual by Steven S. Skiena and Foundations of Algorithms by Richard Neapolitan et al.

You can even extend the Array prototype to define your shiny new implemented sort functions:

Array.prototype.InsertionSort = function() {
	/* your implementation here */
}

Array.prototype.MergeSort = function() {
	/* your implementation here */
}
   
Array.prototype.QuickSort = function() {
	/* your implementation here */
}
  
myArray.InsertionSort()
myArray.MergeSort()
myArray.QuickSort()

Believe it or not, self-implemented JavaScript sorting functions can be faster than the native method, though it depends on various things, such as the amount of space you have available, the kind and quantity of data you are sorting, and the JavaScript engine you are using.

Testing and benchmarking

Too hard to believe? Let’s do some benchmarking! The following table shows the results of testing the native JavaScript sort method against my own implementations of insertion sort, merge sort, and quick sort for dynamically created arrays with x elements. The values represent the operations per second done by each method:

Operations per second
10
Elements in Array
100 1000 100,000 1,000,000 10,000,000
Array#sort 1,967,829 127,999 5,601 8.61 1 0.08
Insertion Sort 28,520,017 (fastest) 4,014,363 (fastest) 314,407 (fastest) 0.19 (slowest) - -
Merge Sort 109,299 (slowest) 10,559 (slowest) 711 (slowest) 6.16 0.45 (slowest) 0.05 (slowest)
Quick Sort 880,522 96,941 6,316 35.75 (fastest) 1.98 (fastest) 0.18 (fastest)

Figure 2: Operations per second by different sort implementations. Fastest results are shown in bold and slowest results are shown in italics.

As mentioned before, the quantity of data to be sorted directly impacts the performance of every algorithm. Notice how insertion sort performs better than the other methods (including the native sort) for the first thousand elements. As the data input increases, insertion sort becomes slower, and for a hundred thousand elements it becomes the least performant. At this point, quick sort takes the lead. For the remaining test cases, it continues to be more performant than the native version. (It is very important to clarify that the previous benchmark was tested in Chrome 56 on macOS 10.12.3.)

Your homework now is to perform the same benchmark on different machines, with different input sizes and different JavaScript engines to see what results you get!

JavaScript inception

I might not be a fortune teller, but I bet you’re probably thinking: “How come self-implemented JavaScript sorting functions can beat the native sort C/C++ implementations?”

First of all, let’s backtrack a little bit. C/C++ implementations? Are we even sure about that?

If you peeked at the source code of the JavaScript engines, perhaps you noticed that in the case of V8 and Nitro, the implementation of the sort method is actually done in JavaScript itself. Wait again, what? Am I saying that those JavaScript engines are written in JavaScript? Is this some kind of JavaScript inception? Yes, indeed.

In the world of computer science this is actually called self hosting: the art of implementing parts of a language in that very language itself. But then again, isn’t C++ supposed to be faster than JavaScript? Yes and no. C++ programs are definitely faster than JavaScript when they operate on C++ data structures but not on JavaScript ones.

JavaScript arrays have methods like forEach, map, reduce, and of course sort. Each one takes a callback as an argument. The method then iterates over every element of the list and invokes the callback during each iteration. This means that the execution has to switch between compiled C++ code and interpreted JavaScript, and this context switch is expensive. By staying in the same execution context within the same language, we can boost performance. If you need some proof, try comparing the performances of Array#forEach and a simple for loop.

In certain cases, you may notice that an engine’s Array#sort method implementation causes unneeded overhead. Calling callback functions for every element in our array adds an extra layer of complexity to the task of simply sorting numbers. It seems that the native implementations just do more overall (in terms of error handling and features) than the simple self-implementations.

TL;DR

Yes, most people are aware of Array#sort, its funky behavior, and the fact that you’ll need a compare function to achieve a numerical sort. Because JavaScript. But did you know that the algorithms used to natively implement this method vary across engines? And that this produces different results depending on where you are running it?

If you care about performance and consistency, creating your own self-implementation of the sort method might not be a wild idea. Of course, you should learn how to choose your battles depending on the needs of your applications, whether you’re handling complex databases in a NodeJS application or sorting in the front end. Don’t implement your own version for the sake of it, but take time to level the pros and cons. Only you can decide how pragmatic it is to spend time creating a homemade sort algorithm, as opposed to the native one.

About the Author

9 Reader Comments

Load Comments