A Complete Guide to Data Layers

All the information you need to understand how data layers work, their benefits and how to create them.

Download the guide

Don't have time to read it now? Download the PDF version for later.

If you’re working in digital, whether that’s in marketing, analytics or web development, it’s very likely that you’ve heard the term “data layer” used more and more recently, especially in relation to Tag Management Systems and the implementation of various web technologies using these systems.

Data layers make data collection easier, more efficient, more robust and more reliable when used as the source of website usage and performance data. This gives marketers additional data to inform decisions and appraise the effectiveness of websites.

Standard analytics packages, once implemented on a site, will collect some data by default. Using a data layer allows you to combine the information that is recorded by the various tracking solutions that may be in place on a site (e.g. Google Analytics and a Facebook pixel).

We all crave additional information that can be used to show the effectiveness of our marketing activity, properly understand our customers’ actions and inform future decisions. This guide will give you a complete understanding of how data layers work, how they can be used and how you can benefit from them.

So, what is a data layer?

A data layer, to put it simply, is a list of your business requirements as data points in a format that can be read as a technical specification by your IT team and interpreted/read by a client-side script.  

The data layer acts as the single point of reference for the data that needs to be collected by your Tag Management System (if relevant) and sent to any third-party technologies that are implemented across your digital estate.  

For the less tech savvy, this means you specify the information you need and want to collect about how your website is used. The data layer is then used to collect all of this and provide it to other programs that allow you to visualise and analyse the data (e.g. your analytics or reporting software).
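
In practice, a data layer is usually just a JavaScript object embedded in the page. As a very simple sketch (the property names below are purely illustrative):

var digitalData = {
    page : {
        name : "homepage",    // a human-readable page name
        category : "home"     // the section of the site
    },
    user : {
        logged_in : false     // whether the visitor is signed in
    }
};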

Get data collection and analysis support

If you’re looking to improve your online marketing results by making more informed decisions, we can help you accurately collect and analyse the most meaningful data.

A brief history

Around 10 years ago, Yard created a Tag Management System called SiteTagger. Back then, none of the clients being serviced by SiteTagger were using a data layer. All the data being sent to third parties was collected using a method we used to call “screen scraping”.  

This meant that we used to write a lot of custom JavaScript to grab the information we needed from the HTML of the webpage.  

The clients saw the benefits of us implementing technology like this, the main ones being that their IT team didn’t need to lift a finger and we could start the implementation right away. No need to wait for any releases over and above the addition of SiteTagger to the site.  

There were (and still are) drawbacks to this implementation method. Being aware of these imperfections allows a greater appreciation of the benefits of a data layer. The challenges of screen scraping include: 

Lengthier implementation process

A lot of code needed to be written, especially for complex implementations like full analytics deployment. This could easily reach thousands of lines in total. 

The code written also had to be cross-browser compatible. Given the quantity of code written, there could be native functions used that weren’t supported in certain versions of Internet Explorer, for example. This meant that testing took a lot longer to ensure that the code worked properly in all browsers.

For example, let’s look at grabbing the price of a product on a product page where the element we need doesn’t have an ID: 

(function() { 
    var product_details = document.getElementById("product-details"); 
    var divs = product_details.getElementsByTagName("div"); 
    var price = ""; 
    for (var i=0; i < divs.length; i++) { 
        if (divs[i].className === "product-price") { 
            price = divs[i].getElementsByTagName("p")[0].innerHTML.replace("£", ""); 
        } 
    } 
})(); 

Now, let’s imagine that a similar amount of code would be required to capture all the required information about the product, like product name, colour, brand, etc. You can see how the lines of code would begin to mount up.  

On top of that, we could also find that the mobile version of the website was totally different. In which case, you’d need a similar amount of code again. So, you’d have two different blocks of code for the same page: one for mobile and one for desktop.

Data not available everywhere

If data was required to be sent on page B, but this information was only available client-side on page A, we had to grab the information on page A, write it into a cookie and then read the cookie on page B. 
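
As a rough sketch of that cookie workaround (the cookie name and value here are illustrative):

// on page A: store the value for later
document.cookie = "product_price=10.00; path=/";

// on page B: read it back out of document.cookie
var match = document.cookie.match(/(?:^|; )product_price=([^;]*)/);
var price = match ? match[1] : "";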

There were instances where some information was not available to us client-side at all, for example a customer ID that was only stored on the server. This meant that some requirements could not be met.

If the information wasn’t in the HTML (visible or otherwise) then we couldn’t capture it. 

Website releases could break the code

When an HTML page is loaded by a browser, it creates a Document Object Model (DOM) of the page. This treats HTML elements as objects and arranges them in a tree structure.

Screen scraping required trawling the DOM for information and getting text from within HTML elements.  

As a result, code had to be written that referenced these elements. If the web developers changed an attribute of a given element that we were looking for, our code would no longer return anything, or would return incorrect information.

If we look at the above code example for getting the price of a product, there are four points of reference we’re using to get the price:  

  1. An element with the ID “product-details”
  2. The divisions (<div>) within this element
  3. A specific div with the class name “product-price”
  4. A paragraph (<p>) element within this div

If any of these should change (for example, the ID, the class name of the div, or the paragraph element being swapped for a span element), then our code would no longer return the price.

New website pages would need to be accounted for too: if a new page type was released, code would need to be written to add tracking for that page type.

Impact of HTML scraping

The drawbacks of the “HTML scraping” method meant that an increasing number of clients were looking for more robust ways of capturing data from their websites.  

In 2013, a number of contributors from various companies created version 1.0 of the W3C standard Customer Experience Digital Data Layer. This is a comprehensive guide to how a data layer should be created and what should be contained within it.  

At the time, this was the go-to resource for digital data layers, their design and implementation. More and more companies were adopting this standardised approach to data layers and realising their importance.  

However, it’s 6 years old at the time of writing and companies are now looking for a more bespoke data layer, designed for their specific needs.  

Still collecting data using HTML scraping?

Our team are ready to help you move to using a data layer for better quality MI.

Benefits of today’s data layers

Currently, as part of every new implementation we do, we recommend a data layer specification and design piece which is bespoke to the client.  

We also do this for clients with existing implementations who want to migrate their old “HTML scraping” setup to use a data layer. We do this for two key reasons.

Implementations are more efficient

This applies to both cost and page load speeds.

The cost of us creating a data layer specification for a client’s IT team to implement is far outweighed by the additional cost of implementing without a data layer using the HTML scraping method.

Regarding page load speeds, the amount of code we need to write, which is subsequently delivered to the client’s machine, is greatly reduced. Let’s return to the example above, which collected the price on a product page. The snippet of code used for this was 336 bytes. If the product page had a data layer, the code might look something like:

var price = digitalData.productDetails.price;

This snippet of code would do the exact same thing and is only 45 bytes – an 87% reduction. That’s for a single variable; if you consider this over the whole implementation, you can see how significant the reduction is.

Implementations are more robust

With a data layer, page updates can be made without risk of data loss (as long as the data layer itself remains unchanged). Different page types don’t need different code either.

For example, a mobile version of a page might be totally different to the desktop version of the same page. This means that the old HTML scraping method would have two sections of code to capture data for the same page as mentioned above. However, with a data layer, provided it was the same on mobile and desktop (which it should be) the same line of code would work for both, regardless of page structure.  

The fact that less code needs to be written for implementations also means that any debugging is easier and there is less chance of errors occurring.  


Designing a data layer

Now you’re aware of the benefits of a data layer, read on for a step-by-step guide to creating a bespoke solution to your data gathering requirements. 


KPIs and requirements

The first and most important task is to gather the business requirements. This means you have to determine your KPIs.  

Once KPIs are defined you can consider, for each page view or event, what kinds of data points you need to collect.

Without fully documented requirements, the data captured by the data layer might be arbitrary and might not cover all requirements, resulting in a data layer that is not fit for purpose from the outset.

Who should be involved?

This will obviously be different for each company, but it should be any individual or team that has any vested interest in the ingestion of the data, or those responsible for its design and implementation. The teams we usually meet when designing data layers for clients include:  

  • Analytics/Data and Insight
  • Marketing 
  • Data Science 
  • Business Intelligence 
  • IT/Developers 
  • Trading  

We’ve also found that many other teams and individuals become interested in this part of the project. Understanding customers, improving product and service offerings and evaluating the impact of work are key to success for people across all departments of a company.

Therefore, it’s worth making sure that the wider business is at least aware of the plan so the opportunity for input is available to all.

Data layer format

The next step is to consider the design approach. The developers should be involved in the decision-making process here. This is a technical requirement, so it should be the responsibility of the IT and development teams to decide how the data layer should look.

The main considerations are whether it should be a flat data layer, or a multi-level object. There are different schools of thought about which is better. Tealium recommends a flat structure, whereas the current W3C standard is multi-level.  

A flat data layer means all the data points will be direct properties of the variable, whereas a multi-level data layer will have multiple objects or “levels” for different groups of data. For example, a flat data layer:

var digitalData = {
    property1 : "value1",
    property2 : "value2",
    property3 : "value3",
    property4 : "value4"
}

Multi-level object: 

var digitalData = {
    level_1a : {
        property1 : "value1",
        property2 : "value2",
        property3 : "value3",
        property4 : "value4"
    },
    level_1b : {
        property1 : "value1",
        property2 : "value2",
        property3 : "value3",
        property4 : "value4"
    }
}

There are pros and cons to each, for example:

Multi-level data layers

Pros 

  • Very structured – all related data points are within their own sub-object, so it’s easy to follow and interpret
  • Arguably preferred by developers
  • Look-up speed may be quicker but, if so, the difference will be negligible

Cons 

  • Errors occur more easily – usually when referencing a child of an object that doesn’t exist (see the sketch after these lists)

Flat data layers

Pros 

  • Less chance of an error – referencing a missing property of the main object will not cause an error

Cons 

  • Less structured – if you don’t know the name of the key you’re looking for, it can take a while to find it, especially if the object is large
  • Look-up speed may be slower but, as above, any difference will be negligible
  • Related data points aren’t usually grouped 
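
To illustrate the error point above, here is a minimal sketch (the object shapes are illustrative):

var multiLevel = { product : { price : "10.00" } };
var flat = { product_price : "10.00" };

// flat: looking up a key that doesn't exist simply returns undefined
var colour = flat.product_colour;              // undefined, no error thrown

// multi-level: referencing a child of an object that doesn't exist throws
// a TypeError, which can halt the rest of the script
try {
    var colour2 = multiLevel.settings.colour;  // TypeError
} catch (e) {
    colour2 = "";
}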

At Yard, we usually provide a specification that is multi-level, unless it’s for a client using TealiumIQ, which works better with a flat object format; in that case we provide a flat data layer. For any other Tag Management System, we provide a multi-level format that is universal, even outside of Tag Management Systems.

Rhys Hogsden, Technical Services Manager

What to do about empty values?

Not all data points will be relevant for all pages. There will therefore be instances where a data point or object does not belong on a particular page, resulting in empty values.

We’ve worked with data layers where all data points were always present but with empty values, as well as data layers where data points irrelevant to certain pages were omitted.

This is entirely dependent on which option is best for the developers. However, if all the values are there all of the time, there is less chance of a JavaScript error.

Using this approach also means you don’t need to check for the existence of objects before referencing their properties, so it’s worth considering.

The downside of adopting this method of having all values present in the data layer on all pages is that it may take the developers longer to implement.
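
As a sketch of the “always present” approach (the property names are illustrative):

// on a non-product page the product object still exists, just with empty values
var digitalData = {
    product : {
        name : "",
        price : ""
    }
};

// safe on every page type, with no existence checks required
var price = digitalData.product.price;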

Adding data points

Once you have all the business requirements and the design has been chosen, then it’s time to decide which data points should be added.

These should cover all the current business requirements, as well as potential future ones.


There are some variables that we believe are always relevant, regardless of industry. We recommend always including the following variables by default:

  • Page URL
  • Page path
  • Referrer
  • Page category
  • Campaign code

Some developers will note that some of these are easily available using JavaScript’s “location” or “document” objects. However, the purpose of the data layer is to provide a single source for all required data, so it makes sense to include them.
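
For example, the developers could populate those defaults from the browser’s built-in objects (the structure below is illustrative):

var digitalData = {
    page : {
        url : location.href,           // page URL
        path : location.pathname,      // page path
        referrer : document.referrer   // referrer
        // page category and campaign code would come from the server/CMS
    }
};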

For the custom data there is a broader set of considerations. If a requirement or KPI is complex, then the chances are that measuring performance will require several data points. One such example is measuring the performance of a carousel on a homepage.

Take the carousel on our test e-commerce website as an example. With a “vague” requirement like measuring the performance of a carousel, there are a number of different ways you could measure this.

Performance measurement examples

  • Total number of CTA clicks
  • Number of CTA clicks per banner
  • Most popular CTA position/banner (1, 2 or 3)
  • Number of interactions with carousel navigation buttons

Based on this, you know that there are multiple data points that should be included either as part of the event on the same page, or on the following page.

Let’s take the following page (the page the visitor lands on after clicking) as an example. The following data points could be included to meet the above performance points:

{
	carousel_clicked : true,
	banner_title: "The Yard Sale",
	banner_id : "tys_1234",
	banner_position : 1,
	CTA_text : "Find out more"
}

With this information, we can easily capture the data that would be needed to answer several different performance questions related to the home page banner.

The above should give you an idea about the considerations for what kinds of data points are needed to answer business requirements.

Despite it generally being better to have thorough and specific requirements, they’re sometimes vague for a number of reasons. When they’re vague, the best thing to do is to list potential questions that could be asked about the requirement and add data points that could answer each of the questions, as above.

Data layers for single page applications

Possibly the biggest change since the W3C spec was released is the increasing popularity of single page applications for improved user experience.

Single-page applications are web apps that serve users a single HTML page and then dynamically update that page as a user interacts with the app. Facebook is a good example of this.

There are a few ways that single page applications can be tracked. The most important thing to do, though, is to trigger an event to say that a new page has been loaded. That’s because modern single page applications don’t use “document fragments” (or hash change events), so there is no built-in browser event that reliably signals newly loaded content.

This can instead be handled by using a JavaScript custom event (or a jQuery custom trigger) when a new page has finished loading and the data layer has been updated.
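
For instance, a minimal sketch using a native JavaScript CustomEvent, if jQuery isn’t available:

// dispatch once the new "page" has rendered and the data layer has been updated
document.dispatchEvent(new CustomEvent("new_page_loaded"));

// elsewhere (e.g. in the Tag Management System), listen for it
document.addEventListener("new_page_loaded", function() {
    // the data layer is safe to read at this point
});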

There are two approaches that could be taken. In both cases, there should always be just one trigger, with a consistent name, that executes for each page load after the data layer has been updated.

The first approach is to have the trigger by itself and read the data layer to identify the page that has been navigated to. For example (using jQuery):

jQuery( document ).trigger( "new_page_loaded" );

You can then bind to (or “listen out” for) this custom event to detect that a new page has been loaded. As long as the data layer has been updated by the time this trigger is executed, you can check to see what kind of page has been loaded.
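
For example, a sketch of that binding (the property name page_type is illustrative):

jQuery( document ).on( "new_page_loaded", function() {
    // the data layer has already been updated by the time this runs
    if (digitalData.page_type === "product page") {
        // fire the product page tags here
    }
});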

Alternatively, you could pass the page information as event data as part of the trigger. For example (again, using jQuery):

jQuery( document ).trigger( "new_page_loaded" , {
	page_type : "product page"
});

This way, the check can be done as part of the trigger binding, and if it’s the page you’re looking for then you can read the data layer variables required for that page.
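
A sketch of that check inside the trigger binding, reusing the illustrative productDetails object from earlier:

jQuery( document ).on( "new_page_loaded", function( event, data ) {
    if (data.page_type === "product page") {
        // it's the page we're looking for, so read its data layer variables
        var price = digitalData.productDetails.price;
    }
});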

Handling dynamic events

As well as single page applications, there are dynamic events that occur within pages that wouldn’t be considered a new page view. These are usually events that are triggered after some sort of user interaction with a page. Some examples could be applying filters to search results or adding a product to the cart.

When these events happen, it’s a good idea to consider adding custom triggers that are dispatched when the AJAX request has completed. We always recommend this over binding to clicks (or any other JavaScript event types), especially as it is better suited to collecting data from visitors using mobile devices.
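
As a sketch, the trigger can be dispatched from the AJAX callback itself (the endpoint and payload below are hypothetical):

jQuery.ajax({
    url : "/cart/add",                 // hypothetical endpoint
    method : "POST",
    data : { sku : "12345-pwt", quantity : 1 }
}).done(function() {
    // only fire once the server has confirmed the action
    jQuery( document ).trigger( "add_to_cart", { /* event data, as below */ } );
});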

It’s best that the trigger data contains all the information related to that event. The reason for this is multiple events can happen on the same page, making the identification of the correct information for the event more challenging.

This is particularly the case if you attempt to update the data layer specifically for the event and the event updates aren’t added to the data layer in order.

The code below is an example snippet that could be sent when a product is added to the cart.

jQuery( document ).trigger('add_to_cart', {
	product_name : "plain t-shirt",
	colour : "white",
	price : 10.00,
	parent_sku : "12345",
	variant_sku : "pwt",
	full_sku : "12345-pwt",
	product_review_status : "4.8"
});

We could bind to the above custom trigger and read the event data associated with it. This way, there will be no confusion as to what data and values are associated with the event.
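
For example, a sketch of reading that event data in a binding:

jQuery( document ).on( "add_to_cart", function( event, data ) {
    // data contains everything related to this specific event
    console.log( data.product_name + " (" + data.full_sku + ") added at £" + data.price );
});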


The Yard approach

As you can probably tell, no single data layer model will work for every site. Of course, there are data points that will be relevant across different sites, even in different verticals/industries. However, it is a piece of work that is bespoke to each website.

Every company or website will have different KPIs, different measurement requirements and different business objectives that need to be taken into consideration when a data model is designed and created. A good example of handling data and measurement requirements across multiple sites that could also be analysed collectively can be seen in this case study of our work with the Dining Club Group.

Whenever we are asked to create a data layer specification, our first priority is to meet with the client. Depending on the number of stakeholders or teams involved in the project, this may be one meeting with all stakeholders or require a series of meetings with different teams in the business.

The purpose of these meetings is for us to get a deep understanding of the business, its KPIs and the requirements. These meetings are almost always more effective when held face-to-face.

Meeting in person not only helps build rapport and trust with the wide-ranging stakeholders for these projects, it also results in more effective communication. As such, we get richer information from attendees and greater insight into the requirements.

It’s important for us to get as many requirements as possible as this enables us to include more in the first draft of the data layer. It also means we get a better understanding of the kinds of requirements the business has, so we can include some additional data layer points we think may be of value in future.

A continuous process

We don’t believe that the data layer should ever really be “finished”. Depending on the company’s resources and the speed of its maturity, the data layer should be updated often with new requirements over and above what Yard has added.

It’s essential that the data layer specification design is a collaborative effort. We will collect the requirements and create the data model based on them. As much as we gather large amounts of information from our clients at the start of these projects, we’re realistic enough to know we’re unlikely to come away from the sessions with the same amount of knowledge about the company as the internal stakeholders have.

The piece very much relies on knowledge sharing. A good example of really getting a handle on a client’s business and data requirements was the work we delivered for dealchecker. This was achieved through a collaborative workshop between key stakeholders from the client, ourselves and our partners at Adobe. It laid the foundations for a market-leading piece of work that you can read more about in our case study.

Improve your data capture

Interested in finding out how our experts can help you collect more reliable and meaningful data?