Taming Data with JavaScript

I love data. I also love JavaScript. Yet, data and client-side JavaScript are often considered mutually exclusive. The industry typically sees data processing and aggregation as a back-end function, while JavaScript is just for displaying the pre-aggregated data. Bandwidth and processing time are seen as huge bottlenecks for dealing with data on the client side. And, for the most part, I agree. But there are situations where processing data in the browser makes perfect sense. In those use cases, how can we be successful?


Think about the data

Working with data in JavaScript requires complete data and an understanding of the tools available, so that you can avoid unnecessary server calls. It helps to draw a distinction between trilateral data and summarized data.

Trilateral data consists of raw, transactional data. This is the low-level detail that, by itself, is nearly impossible to analyze. On the other side of the spectrum you have your summarized data. This is the data that can be presented in a meaningful and thoughtful manner. We’ll call this our composed data. Most important to developers are the data structures that reside between our transactional details and our fully composed data. This is our “sweet spot.” These datasets are aggregated but contain more than what we need for the final presentation. They are multidimensional in that they have two or more different dimensions (and multiple measures) that provide flexibility for how the data can be presented. These datasets allow your end users to shape the data and extract information for further analysis. They are small and performant, but offer enough detail to allow for insights that you, as the author, may not have anticipated.

Getting your data into perfect form so you can avoid any and all manipulation in the front end doesn’t need to be the goal. Instead, get the data reduced to a multidimensional dataset. Define several key dimensions (e.g., people, products, places, and time) and measures (e.g., sum, count, average, minimum, and maximum) that your client would be interested in. Finally, present the data on the page with form elements that can slice the data in a way that allows for deeper analysis.

Creating datasets is a delicate balance. You’ll want to have enough data to make your analytics meaningful without putting too much stress on the client machine. This means coming up with clear, concise requirements. Depending on how wide your dataset is, you might need to include a lot of different dimensions and metrics. A few things to keep in mind:

  • Is the variety of content an edge case or something that will be used frequently? Go with the 80/20 rule: 80% of users generally need 20% of what’s available.
  • Is each dimension finite? Dimensions should always have a predetermined set of values. For example, an ever-increasing product inventory might be too overwhelming, whereas product categories might work nicely.
  • When possible, aggregate the data—dates especially. If you can get away with aggregating by years, do it. If you need to go down to quarters or months, you can, but avoid anything deeper.
  • Less is more. A dimension that has fewer values is better for performance. For instance, take a dataset with 200 rows. If you add another dimension that has four possible values, the most it will grow is 200 * 4 = 800 rows. If you add a dimension that has 50 values, it’ll grow 200 * 50 = 10,000 rows. This will be compounded with each dimension you add.
  • In multidimensional datasets, avoid summarizing measures that need to be recalculated every time the dataset changes. For instance, if you plan to show averages, you should include the total and the count. Calculate averages dynamically. This way, if you are summarizing the data, you can recalculate averages using the summarized values.
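The last point can be sketched with a small example. Assuming each aggregated row carries both a summed total and a record count (the field names here are illustrative, not from the article's dataset), averages survive any further summarization because they are derived at the end:

```javascript
// Each aggregated row stores the sum ("total") and the record count,
// never a pre-computed average.
const rows = [
  { state: "Alabama", total: 1386, count: 3 },
  { state: "Alaska",  total: 989,  count: 2 }
];

// Summarize further (here: across both states) by adding the parts...
const summed = rows.reduce(
  (acc, r) => ({ total: acc.total + r.total, count: acc.count + r.count }),
  { total: 0, count: 0 }
);

// ...then derive the average from the summarized values.
const average = summed.total / summed.count; // (1386 + 989) / 5 = 475
console.log(average);
```

Had the rows stored only averages, there would be no correct way to combine them after the fact.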

Make sure you understand the data you’re working with before attempting any of the above. You could make some wrong assumptions that lead to misinformed decisions. Data quality is always a top priority. This applies both to the data you are querying and to the data you are manufacturing.

Never take a dataset and make assumptions about a dimension or a measure. Don’t be afraid to ask for data dictionaries or other documentation about the data to help you understand what you are looking at. Data analysis is not something that you guess. There could be business rules applied, or data could be filtered out beforehand. If you don’t have this information in front of you, you can easily end up composing datasets and visualizations that are meaningless or—even worse—completely misleading.

The following code example will help explain this further. Full code for this example can be found on GitHub.

Our use case

For our example we will use BuzzFeed’s dataset from “Where U.S. Refugees Come From—and Go—in Charts.” We’ll build a small app that shows us the number of refugees arriving in a selected state for a selected year. Specifically, we will show one of the following depending on the user’s request:

  • total arrivals for a state in a given year;
  • total arrivals for all years for a given state;
  • and total arrivals for all states in a given year.

The UI for selecting your state and year would be a simple form:

Our UI for our data input

The code will:

  1. Send a request for the data.
  2. Convert the results to JSON.
  3. Process the data. (To ensure that this step does not execute until after the complete dataset is retrieved, we use the then() method and do all of our data processing within that block.)
  4. Log any errors to the console.
  5. Display results back to the user.

We do not want to pass excessively large datasets over the wire to browsers for two main reasons: bandwidth and CPU considerations. Instead, we’ll aggregate the data on the server with Node.js.

Source data:

[{"year":2005,"origin":"Afghanistan","dest_state":"Alabama","dest_city":"Mobile","arrivals":0},
{"year":2006,"origin":"Afghanistan","dest_state":"Alabama","dest_city":"Mobile","arrivals":0},
... ]

Multidimensional Data:

[{"year": 2005, "state": "Alabama","total": 1386}, 
 {"year": 2005, "state": "Alaska", "total": 989}, 
... ]
Diagram: Transactional Details (Year, Origin, Destination State, City, and Arrivals) are filtered through semi-aggregate data (By Year, By State, and Total), producing a table of fully composed data in the final column.
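The server-side reduction from transactional details to the multidimensional dataset can be sketched roughly as follows. This is an illustrative reduction only, not the repo's actual /preprocess/index.js, whose logic may differ:

```javascript
// Collapse transactional rows to one row per (year, state) pair,
// summing arrivals and dropping origin and destination city.
// Sample rows follow the source data's shape; arrival counts are made up.
const source = [
  { year: 2005, origin: "Afghanistan", dest_state: "Alabama", dest_city: "Mobile", arrivals: 0 },
  { year: 2005, origin: "Iraq", dest_state: "Alabama", dest_city: "Mobile", arrivals: 12 },
  { year: 2005, origin: "Iraq", dest_state: "Alaska", dest_city: "Anchorage", arrivals: 5 }
];

const byYearState = source.reduce((acc, row) => {
  const key = `${row.year}|${row.dest_state}`;
  // create the aggregate row the first time we see this (year, state) pair
  acc[key] = acc[key] || { year: row.year, state: row.dest_state, total: 0 };
  acc[key].total += row.arrivals;
  return acc;
}, {});

const aggregate = Object.values(byYearState);
// e.g. [{ year: 2005, state: "Alabama", total: 12 },
//       { year: 2005, state: "Alaska",  total: 5 }]
```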

How to get your data structure into place

AJAX and the Fetch API

There are a number of ways to retrieve data from an external source with JavaScript. Historically you would use an XHR request. XHR is widely supported but is also fairly complex and requires several different methods. There are also libraries like Axios or jQuery’s AJAX API. These can be helpful for reducing complexity and providing cross-browser support, and they might be an option if you are already using them, but we want to opt for native solutions whenever possible. Lastly, there is the more recent Fetch API. This is less widely supported, but it is straightforward and chainable. And if you are using a transpiler (e.g., Babel), it will convert your code to a more widely supported equivalent.

For our use case, we’ll use the Fetch API to pull the data into our application:

window.fetchData = window.fetchData || {};

fetch('./data/aggregate.json')
  .then(response => {
    // when the fetch resolves we convert the response
    // to JSON format and pass it to the next .then()
    return response.json();
  })
  .then(jsonData => {
    // take the resulting dataset and assign it to a global object
    window.fetchData.jsonData = jsonData;
  })
  .catch(err => {
    console.log("Fetch process failed", err);
  });

This code is a snippet from main.js in the GitHub repo.

The fetch() method sends a request for the data, and we convert the results to JSON. To ensure that the next statement doesn’t execute until after the complete dataset is retrieved, we use the then() method and do all our data processing within that block. Lastly, we console.log() any errors.
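One caveat worth noting: fetch() resolves its promise even for HTTP error statuses like 404 or 500, so a defensive version of the conversion step might check response.ok first. The helper below is a common pattern, not part of the article's snippet:

```javascript
// Defensive wrapper around the JSON-parsing step: fetch() resolves
// even for 404/500 responses, so check response.ok before parsing.
function toJson(response) {
  if (!response.ok) {
    throw new Error(`HTTP error: ${response.status}`);
  }
  return response.json();
}

// Usage in the article's chain:
// fetch('./data/aggregate.json')
//   .then(toJson)
//   .then(jsonData => { window.fetchData.jsonData = jsonData; })
//   .catch(err => console.log("Fetch process failed", err));
```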

Our goal here is to identify the key dimensions we need for reporting—year and state—before we aggregate the number of arrivals for those dimensions, removing country of origin and destination city. You can refer to the Node.js script /preprocess/index.js from the GitHub repo for more details on how we accomplished this. It generates the aggregate.json file loaded by fetch() above.

Multidimensional data

The goal of multidimensional formatting is flexibility: data detailed enough that the user doesn’t need to send a query back to the server every time they want to answer a different question, but summarized so that your application isn’t churning through the entire dataset with every new slice of data. You need to anticipate the questions and provide data that formulates the answers. Clients want to be able to do some analysis without feeling constrained or completely overwhelmed.

As with most APIs, we’ll be working with JSON data. JSON is a standard that is used by most APIs to send data to applications as objects consisting of name and value pairs. Before we get back to our use case, let’s look at a sample multidimensional dataset:

const ds = [{
  "year": 2005,
  "state": "Alabama",
  "total": 1386,
  "priorYear": 1201
}, {
  "year": 2005,
  "state": "Alaska",
  "total": 989,
  "priorYear": 1541
}, {
  "year": 2006,
  "state": "Alabama",
  "total": 989,
  "priorYear": 1386
}];

With your dataset properly aggregated, we can use JavaScript to further analyze it. Let’s take a look at some of JavaScript’s native array methods for composing data.

How to work effectively with your data via JavaScript

Array.filter()

The filter() method of the Array prototype (Array.prototype.filter()) takes a function that tests every item in the array, returning another array containing only the values that passed the test. It allows you to create meaningful subsets of the data based on select dropdown or text filters. Provided you included meaningful, discrete dimensions for your multidimensional dataset, your user will be able to gain insight by viewing individual slices of data.

ds.filter(d => d.state === "Alabama");

// Result
[{
  state: "Alabama",
  total: 1386,
  year: 2005,
  priorYear: 1201
},{
  state: "Alabama",
  total: 989,
  year: 2006,
  priorYear: 1386
}]

Array.map()

The map() method of the Array prototype (Array.prototype.map()) takes a function and runs every array item through it, returning a new array with an equal number of elements. Mapping data gives you the ability to create related datasets. One use case for this is to map ambiguous data to more meaningful, descriptive data. Another is to take metrics and perform calculations on them to allow for more in-depth analysis.

Use case #1—map data to more meaningful data:

ds.map(d => d.state === "Alaska" ? "Continental US" : "Contiguous US");

// Result
[
  "Contiguous US", 
  "Continental US", 
  "Contiguous US"
]

Use case #2—map data to calculated results:

ds.map(d => Math.round(((d.priorYear - d.total) / d.total) * 100));

// Result
[-13, 56, 40]

Array.reduce()

The reduce() method of the Array prototype (Array.prototype.reduce()) takes a function and runs every array item through it, returning an aggregated result. It’s most commonly used to do math, such as adding or multiplying every number in an array, although it can also be used to concatenate strings or do many other things. I have always found this one tricky; it’s best learned through example.

When presenting data, you want to make sure it is summarized in a way that gives insight to your users. Even though you have done some general-level summarizing of the data server-side, this is where you allow for further aggregation based on the specific needs of the consumer. For our app we want to add up the total for every entry and show the aggregated result. We’ll do this by using reduce() to iterate through every record and add the current value to the accumulator. The final result will be the sum of all values (total) for the array.

ds.reduce((accumulator, currentValue) =>
  accumulator + currentValue.total, 0);

// Result
3364

Applying these functions to our use case

Once we have our data, we will assign an event to the “Get the Data” button that will present the appropriate subset of our data. Remember that we have several hundred items in our JSON data. The code for binding data via our button is in our main.js:

document.getElementById("submitBtn").onclick = function(e) {
  e.preventDefault();
  let state = document.getElementById("stateInput").value || "All";
  let year = document.getElementById("yearInput").value || "All";
  let subset = window.fetchData.filterData(year, state);
  if (subset.length === 0)
    subset.push({'state': 'N/A', 'year': 'N/A', 'total': 'N/A'});
  document.getElementById("output").innerHTML =
    `<table class="table">
      <thead>
        <tr>
          <th scope="col">State</th>
          <th scope="col">Year</th>
          <th scope="col">Arrivals</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <td>${subset[0].state}</td>
          <td>${subset[0].year}</td>
          <td>${subset[0].total}</td>
        </tr>
      </tbody>
    </table>`;
};
The final output once our code is applied

If you leave either the state or year blank, that field will default to “All.” The following code is available in /js/main.js. You’ll want to look at the filterData() function, which is where we keep the lion’s share of the functionality for aggregation and filtering.

// with our data returned from our fetch call, we are going to 
// filter the data on the values entered in the text boxes
fetchData.filterData = function(yr, state) {
  // if "All" is entered for the year, we will filter on state 
  // and reduce the years to get a total of all years
  if (yr === "All") {
    let total = this.jsonData
      // return all the rows where state
      // is equal to the input box
      .filter(dState => dState.state === state)
      // aggregate the totals for every row that has
      // the matched value
      .reduce((accumulator, currentValue) => {
        return accumulator + currentValue.total;
      }, 0);

    return [{'year': 'All', 'state': state, 'total': total}];
  }

  ...

  // if a specific year and state are supplied, simply
  // return the filtered subset for year and state based 
  // on the supplied values by chaining the two function
  // calls together 
  let subset = this.jsonData.filter(dYr => dYr.year === yr)
    .filter(dSt => dSt.state === state);

  return subset; 
};

// code that displays the data in the HTML table follows this. See main.js.

When a state or a year is blank, it will default to “All” and we will filter down our dataset to that particular dimension, and summarize the metric for all rows in that dimension. When both a year and a state are entered, we simply filter on the values.
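The filter-then-reduce pattern at the heart of filterData() generalizes nicely to any dimension. The helper below (sumBy is a hypothetical name, not part of the repo's code) filters the multidimensional dataset on a single dimension and reduces the totals:

```javascript
// Filter the dataset on one dimension, then sum the totals of the
// matching rows. (sumBy is an illustrative helper, not repo code.)
function sumBy(data, key, value) {
  return data
    .filter(row => row[key] === value)
    .reduce((acc, row) => acc + row.total, 0);
}

// The sample multidimensional dataset from earlier in the article:
const ds = [
  { year: 2005, state: "Alabama", total: 1386 },
  { year: 2005, state: "Alaska",  total: 989 },
  { year: 2006, state: "Alabama", total: 989 }
];

console.log(sumBy(ds, "state", "Alabama")); // 2375 (all years for Alabama)
console.log(sumBy(ds, "year", 2005));       // 2375 (all states for 2005)
```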

We now have a working example where we:

  • Start with a raw, transactional dataset;
  • Create a semi-aggregated, multidimensional dataset;
  • And dynamically build a fully composed result.

Note that once the data is pulled down by the client, we can manipulate the data in a number of different ways without having to make subsequent calls to the server. This is especially useful because if the user loses connectivity, they do not lose the ability to manipulate the data. This is useful if you are creating a progressive web app (PWA) that needs to be available offline. (If you are not sure if your web app should be a PWA, this article can help.)

Once you get a firm handle on these three methods, you can create just about any analysis that you want on a dataset. Map a dimension in your dataset to a broader category and summarize using reduce. Combined with a library like D3, you can map this data into charts and graphs to allow a fully customizable data visualization.
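As a closing sketch, here is map and reduce chained together: each state is mapped to a broader category (a made-up grouping, used purely for illustration) and the totals are then summarized per category:

```javascript
// The sample multidimensional dataset from earlier in the article:
const ds = [
  { year: 2005, state: "Alabama", total: 1386 },
  { year: 2005, state: "Alaska",  total: 989 },
  { year: 2006, state: "Alabama", total: 989 }
];

const totalsByRegion = ds
  // map each row's state to a broader (illustrative) region
  .map(d => ({
    region: d.state === "Alaska" ? "Non-contiguous" : "Contiguous",
    total: d.total
  }))
  // summarize the totals per region
  .reduce((acc, d) => {
    acc[d.region] = (acc[d.region] || 0) + d.total;
    return acc;
  }, {});

// totalsByRegion is { Contiguous: 2375, "Non-contiguous": 989 }
console.log(totalsByRegion);
```

A result object shaped like this drops straight into a charting library as series data.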

Conclusion

This article should give you a better sense of what is possible in JavaScript when working with data. As I mentioned, client-side JavaScript is in no way a substitute for translating and transforming data on the server, where the heavy lifting should be done. But by the same token, it also shouldn’t be completely ruled out when datasets are treated properly.

About the Author

Brian Greig

Brian Greig is a manager and developer in Charlotte, NC where he works on vendor technology integration and analytics. Having grown up in the era of the first home PCs, Brian has a passion for both modern technology and vintage computers. You will often find him out and about with his family camping, biking, and generally enjoying the outdoors.

