Cover of masterclass

Masterclass for Web Managers, incl. editable templates ($29.99)

(Video transcript)


Content Groups analysis spreadsheet (XLS 100KB)

Use this spreadsheet to identify the Content Topics that users of your website are most interested in. Watch the video above for set-up and usage instructions.

I bet you want to know what content your website users are most interested in. Right?

Of course, you do. Me too.

If your situation is anything like mine, you likely have very little time to spend on useful things—like making your content easier to find, read and understand. So, when you do get this time, you want to make sure you're prioritising the right content.

Not the stuff no-one really cares about.

Content volumes are unmanageably high!

But, here's the problem.

Not only are you likely to have vast amounts of content on your site to track as it is—I bet more and more is being added all the time.

And more and more.
And more and more and more.
And more and more and more and more.
And more and more and more and more and more.
And more and more and more and more and more and more ... you know what I mean!

So, volumes are unmanageably high and, yes, I mean "unmanageably". There are very few web teams that could—hand on heart—say they are "managing" their content. We're not managing. We're coping!

And that's not all. The basic metric of content engagement, Views per page, is at best highly suspect for tracking user interests across an entire site. At worst it's misleading and often meaningless.

I have written about this at length on my website.


There is no such thing as a "standard" web page

Just consider all the different types of web pages you have on your site. Think about how varied they are in length, scope, density and any other aspect of content design.

Probably you have SOME that are short, highly targeted and well organised—and MANY others that are extremely long, loosely structured and cover many different subjects all in one place.

The reality is that there is no such thing as a "standard" web page. Trying to compare or rank them all across an entire site is just WRONG and will lead to bad decisions.

You can make far better decisions by ditching pages and Views, and tracking user interest based on Visits to Content Topics.

Content Topics encompass pages about discrete subject matters

A Content Topic is a set of pages that encompass some discrete subject matter or other criteria of interest to you.

How you organise them is down to you.

For example, you might define a topic "Brexit" for all pages on your site about the UK's exit from the EU. Importantly, these pages do not need to be located in a single place in your site's Information Architecture.

They can be anywhere on your website.

All that matters is that they concern the same subject matter or division of content you are interested in tracking as regards users' interests.

Measuring content in this aggregated way, instead of using individual pages, means you don't have to worry whether one topic has 25 short pages and another just 1 long page, or indeed about any other difference in length, scope, volume, density, etc.

This is because you're no longer tracking user interests based on how many times they look at pages, only how they engage with high-level topics based on Visits.

This "flattens" issues caused by the varying numbers of pages per topic. The number of pages in each topic no longer matters.

In a loose sense, every topic is treated as if it were (sort-of) a single page. This imposes an equality of measurement between each topic and allows us to make direct and valid comparisons for engagement at a cross site level.

Ultimately, it allows us to identify and rank, in the most accurate way possible, the content that is of most interest to your users.

As we'll see in a little while, this data is captured by the advanced Content Groups feature in Google Analytics.

Content Topics result in waaaaaay better decisions

As you can probably guess, the data is incredibly powerful and will allow you to see (maybe for the first time) the topics that your users are and are NOT actually interested in.

This then can feed back into your decision making.

If your experience is anything like mine, you'll find that quite a small number of topics attract the overwhelming amount of your users' attention. For instance, maybe just a fifth or a quarter of your Content Topics account for over 80% of activity.

This is very significant.

It essentially means that three-quarters of your content is of little or no interest to users. And for a team with very limited manpower this is fantastically useful information.

It means you now know exactly where NOT to spend any of your limited resources for UX or other improvements.

Apart from basic QA, you can effectively ignore that content with little or no impact on the majority of user engagements.
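As a rough illustration, the "small share of topics, over 80% of activity" pattern described above can be checked against your own numbers with a few lines of Python. The topic names and visit figures below are made up, and the function name topics_covering is hypothetical.

```python
# Hypothetical average daily Visits per topic (made-up numbers).
visits = {"A": 500, "B": 320, "C": 40, "D": 50, "E": 40, "F": 30, "G": 15, "H": 5}

def topics_covering(visits, share=0.8):
    """Return the top topics, by Visits, needed to reach the given share of activity."""
    total = sum(visits.values())
    running = 0
    top = []
    for topic, count in sorted(visits.items(), key=lambda kv: kv[1], reverse=True):
        top.append(topic)
        running += count
        if running >= share * total:
            break
    return top

top = topics_covering(visits)
print(top, f"{len(top) / len(visits):.0%} of topics")
```

With these sample figures, 2 of the 8 topics carry more than 80% of all Visits. Everything else is the long tail.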


You must ignore the long tail of content

Now don't get me wrong.

Of course, it would be wonderful to improve all the content on our websites.

But the truth is that, absent some radical and transformational change in how web teams are resourced, you simply do not have the manpower, skills or time to justify it.

Not only would improving this long, long tail of content demand massive effort, the marginal benefit would be vanishingly small. Only a tiny number of engagements will actually benefit.

On a limited web team, it seems crazy to me to spend your resources on content that gets the least engagement. All effort must go into top Content Topics first.

Only if and when they are as good as they can possibly be, should secondary content be considered.

Let's find out how to set-up Content Topics

So look, that's the context for measuring cross site activity using Content Topics. Again you can read more about the rationale on my website at www.diffily.com.

But now we're going to get practical.

Step 1: First we're going to explore how to build your topics by creating a taxonomy for your content.

Step 2: Then we'll see how to set up the topics as Content Groups in Google Analytics.

Step 3: After that we'll look at how to import the resulting data from GA into a custom spreadsheet I developed.

Step 4: And finally we'll see the ranking of Content Topics emerge as figures and charts.

You can see an example of my spreadsheet on screen.

The challenge of tracking Content Topics in Google Analytics

The reason we need to import the data into the old webmaster's friend, Excel, is that, as good as GA is, the level of analysis it provides is not adequate for Content Topics if we want to accurately identify users' real interests.

Pulling the data into Excel allows us to see, in a simple, robust and visual way, the topics that are of most and least interest to users. It thus highlights what you should prioritise for attention... and what to IGNORE.

The spreadsheet achieves this by combining 2 numbers:

  1. Visits per topic per day
  2. Average count of pages visited per topic per day

The basic rule of thumb is that priority should go to Content Topics with a high average number of Visits and a low average count of pages visited per day.

Improvement to those topics will benefit the maximum number of engagements at the lowest cost for you. This is simply because improving topics with a small number of pages generally takes far less effort than those with lots of pages.
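The rule of thumb above could be sketched in Python as follows. This is only an illustration, not the formula used inside the spreadsheet. The daily figures are made up, and judging "high" and "low" against the median topic is an assumption chosen for the example.

```python
from statistics import mean, median

# Hypothetical daily figures per topic: one (visits, pages visited) pair per day.
daily = {
    "Website governance": [(120, 30), (140, 32), (130, 31)],
    "UX and content": [(90, 6), (110, 5), (100, 7)],
    "Downloads": [(15, 3), (10, 2), (20, 4)],
}

# Average each of the 2 numbers per topic.
averages = {
    topic: (mean(v for v, _ in days), mean(p for _, p in days))
    for topic, days in daily.items()
}

# Assumed cut-offs: "high" and "low" are judged against the median topic.
visit_cutoff = median(v for v, _ in averages.values())
page_cutoff = median(p for _, p in averages.values())

# Priority topics: high average Visits AND low average Page Count.
quick_wins = [
    topic
    for topic, (visits, pages) in averages.items()
    if visits >= visit_cutoff and pages <= page_cutoff
]
print(quick_wins)  # ['UX and content']
```

In this sample, "Website governance" gets the most Visits but is spread over many pages, so it would cost more effort to improve. "UX and content" combines high Visits with a low Page Count, so it comes out as the priority.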

It's a no-brainer.

And for websites that may have scores or hundreds of Content Topics—like the sample spreadsheet on screen—this type of data is pure magic.

Or at least it has been for me.

So feel free to download the spreadsheet and re-use and amend it as you see fit. I'm interested to see what extensions you come up with.

Yes, there's a fair bit of work involved, but it's something I have found to be really, really worthwhile doing.

There are 3 sheets in the file and it includes all the active formulae and other elements you need.

The first 2 sheets are examples related to my own website diffily.com—the first illustrates the taxonomy of content topics and Regular Expression queries used for my site as Content Groups in Google Analytics (which we'll explore in a moment).

The second sheet is where I record the output data and see the results.

The third sheet is essentially the same as the second sheet, but has been loaded with extra dummy data to show what the output for an enterprise scale website would look like.

Now, let's get started.

Step 1: Define your taxonomy

In my introductory article to Content Topics, I describe the first step as defining your topic taxonomy. The taxonomy is the list of content topics you wish to track on your website—including the pages that make up each topic.

If you look at my own website (which admittedly is quite small in scale), we see it has sections labelled "Blog & Articles", "Web Manager's Handbook" and "Online training", and some others for "About" and "Contact".

These elements are then broadly reflected in my topic taxonomy.

So here we have the topic "About" which contains pages from the "About" directory, the "Contact" directory and "Speaking".

We also have topics for "Downloads" and "Publications".

We then have the 2 major topics which encompass most of the pages on my site.

One that contains all my articles about "Website governance" and the other articles about "UX and content".

So all in all I defined 9 topics for my site, and they comprise the discrete divisions of content I am interested in for tracking my users' interests.

Again, it is worth noting that the taxonomy does not exactly match my site's Information Architecture. That is, not all the pages within a topic are from the same section of the site.

All that matters is that they belong to the same subject matter or division of content I am interested in tracking.
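If it helps to see a taxonomy written down as data, here is a minimal sketch in Python. The topic names follow the examples above, but the paths are hypothetical and are not the real structure of any site. The helper find_overlaps is also hypothetical. It simply flags any path that has been listed under more than one topic.

```python
# Hypothetical taxonomy: each topic lists the directories or pages it covers.
# Note the paths in one topic can come from different sections of the site.
taxonomy = {
    "About": ["/about/", "/contact/", "/speaking/"],
    "Downloads": ["/downloads/"],
    "Publications": ["/book/"],
}

def find_overlaps(taxonomy):
    """Return paths that are listed under more than one topic."""
    seen = {}
    overlaps = {}
    for topic, paths in taxonomy.items():
        for path in paths:
            if path in seen:
                overlaps.setdefault(path, [seen[path]]).append(topic)
            else:
                seen[path] = topic
    return overlaps

print(find_overlaps(taxonomy))  # {}
```

An empty result means every path belongs to exactly one topic.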

Now having defined my topics—the next step is to prepare them for input into Google Analytics as Content Groups.


Step 2: Create your Content Groups in Google Analytics

To do so, you go into Admin in GA and then into Content Groups.

If you research this area, you'll find you have a maximum of 5 groups you can add—at least for the free version of GA.

I have set up 2 groups already, one previously as a test and a new one which we'll explore today.

There are a number of ways to configure Content Groups and the method I have chosen is Rule Definitions. These are based on Regular Expression queries, which are something I am familiar with and are quite flexible in terms of data definition.

You'll see that the Content Groups here are the same as in my taxonomy spreadsheet.

To set up the topics as Content Groups, you need to define them using Regular Expressions to match the selection of pages or other criteria you have chosen for each.

The first step is simply to list all the pages or other criteria you want captured by each topic, and then build the query using RegEx rules.

You'll see the most common method I use is Page Filenames, though there are other methods and I use the criterion of Page Title for one topic, the 404s.


Then we build the combined RegEx queries so as to encompass each sub-element.

Now, I can't go into how to write RegEx here but there is some very good online guidance from Google. It requires a bit of practice to get used to, but it is relatively straightforward.

For page-based topics, I first isolate the RegEx appropriate for each page element and then build the full query as shown. Next we transfer it into Google Analytics.
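To make the idea of building a full query from page elements a little more concrete, here is a minimal sketch in Python. The paths are hypothetical and the helper build_topic_regex is made up for illustration. Python's RegEx engine is also not identical to the one in Google Analytics, so treat this as a way to test the logic and check the final query in GA itself.

```python
import re

# Hypothetical page elements for one topic; the real list comes from your taxonomy.
about_pages = ["/about/", "/contact/", "/speaking/"]

def build_topic_regex(paths):
    """Join per-page patterns into one alternation, escaping special characters."""
    parts = [re.escape(p) for p in paths]
    return "^(" + "|".join(parts) + ")"

about_regex = build_topic_regex(about_pages)
pattern = re.compile(about_regex)

print(about_regex)
print(bool(pattern.match("/about/team.html")))     # True: inside the About directory
print(bool(pattern.match("/blog/about-us.html")))  # False: a different section
```

Anchoring the query with "^" is what stops a page like "/blog/about-us.html" from being scooped up just because it contains the word "about".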

So, back to GA.

Look under Rule Definitions and here is where we define the topics.

So for example, if I open topic "About", we see that it is set up to capture everything based on Pages that exactly match the RegEx as defined.

For the element that doesn't use Page Filenames, but instead uses Page Title, we see that it is set up to capture any instances where the Page Title matches the RegEx "page not found".

The reason I built the query like that is that on my website if you get a 404 page, the title of that page comes back as "404 not found". So that's a convenient way to scoop up all 404s within a single topic.

Hard lessons

Now, before we move on I want to emphasise that the order in which you add topics to the Content Groups list is extremely important.

I suggest you refer back to my original Content Groups article to find out more about this and the veeeerrrrryyyyy hard lessons I had to learn to get this right.

The first lesson was that topic measurement is mutually exclusive and cumulative.

What that means is that, if I add a RegEx query at the top of the Content Groups list and if that query is clumsily written and accidentally encompasses too much content, it will make that data inaccessible to subsequent topics, based on the order in which they appear.

This means your topic queries need to be—A—very accurately written and—B—the order in which they appear must be very well thought out.

Almost certainly you won't get it right first time—I didn't! It took a lot of time and effort before things stabilized. Plus remember that you need to continue to tweak these queries over time, as new content is added to your site or moved about!

The second lesson is that I advise you to set up a discrete Content Group to capture all instances of 404s and put that first in your list. This was very useful for me as it scooped up activity for such pages and ensured only valid visits were logged in core topics.

So, overall, as I said, put your 404 topic first, then all your general Content Topics after that and finally, your Wayfinding, Navigation and subsidiary Content Topics at the bottom.

That order seems to work best for me. Though again this is more art than science.
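The ordering lessons above can be illustrated with a small "first match wins" sketch in Python. This is an assumption about how to model the behaviour, not GA's actual implementation, and the rules, paths and titles are all hypothetical.

```python
import re

def classify(hit, rules):
    """Return the first topic whose rule matches; later rules never see the hit."""
    for topic, field, pattern in rules:
        if re.search(pattern, hit[field], flags=re.IGNORECASE):
            return topic
    return None  # no rule matched

# Hypothetical rules in the suggested order: 404s first, general topics, wayfinding last.
rules = [
    ("404s", "title", r"not found"),
    ("About", "path", r"^/(about|contact|speaking)/"),
    ("UX and content", "path", r"^/articles/"),
    ("Wayfinding", "path", r"^/$"),
]

missing = {"path": "/articles/old-post.html", "title": "404 not found"}
article = {"path": "/articles/writing-about-pages.html", "title": "Writing about pages"}

print(classify(missing, rules))  # 404s
print(classify(article, rules))  # UX and content

# A clumsily written rule placed first swallows pages meant for later topics.
careless = [("About", "path", r"about")] + rules
print(classify(article, careless))  # About
```

The missing page sits in the articles directory, but because the 404 rule comes first it never reaches the "UX and content" topic. The careless rule shows the opposite problem: a valid article is lost to "About" simply because its filename contains the word.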

Step 3: Import data from GA into Excel

And now everything is set up, GA will start to capture and record the data.

You will find the main output under the Behaviour report, Site Content, All Pages—the place to see it is under Content Groups.

There we see the 2 Content Groups I set up. Select All Content and now we see the data for my Content Topics coming through.

Great!

Again remember we are not tracking topics based on Page Views, we're tracking them using Visits or Sessions.

And for some odd reason GA uses the label "Unique Views" to describe Visits or Sessions in the Content Groups report. Yes, that is weird and don't ask me why they do that.

So, make sure to sort on that column first and change the chart also.

Sometimes you may get some stray data in your report, the most common being "(not set)". To exclude that from the report, go into Advanced, Exclude, Matching RegEx, type in the query and that will cause it to disappear.
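One thing to watch when typing that query is that parentheses are special characters in RegEx, so they need a backslash in front of them. The same filter can be sketched in Python, with made-up report rows, to show the escaped pattern at work.

```python
import re

# Parentheses are special characters in RegEx, so they are escaped here.
NOT_SET = re.compile(r"^\(not set\)$")

# Hypothetical report rows: (topic, unique views).
rows = [("About", 12), ("(not set)", 7), ("Downloads", 5)]

# Keep every row whose topic label is not "(not set)".
kept = [row for row in rows if not NOT_SET.search(row[0])]
print(kept)  # [('About', 12), ('Downloads', 5)]
```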

So, now we have the data we want and we can begin to export it and pull it into Excel for further analysis.

Back to Excel.

Here again we see the spreadsheet I showed before for my own website. It may look complicated but it's not really.

For each of the Content Topics all we're doing is pulling in 2 numbers from Google Analytics per day.

Those are:

  • First, the number of Sessions or Visits to each topic per day
  • And second, the number of Pages Visited per topic per day—or as I call it the Page Count per day

Those are the only 2 numbers we need. The spreadsheet does the rest in terms of calculations and graphing.

The reason we need to pull this data out of GA into Excel is that GA can't really present it in the straightforward way we want, in particular the Page Count. And the charting per topic is not great either.

Now maybe there's a way to do it better in GA-4 (which I know also includes Content Groups, but haven't explored yet) or indeed in some other data crunching tool like Power BI. I'll leave that to you to investigate.

Anyway, to pull out the numbers, back to Google Analytics we go.

Because the data is recorded on a day-by-day basis, we need to focus on a single day. So let's go to a recent full day as shown here.

So now we can see the data.

The first numbers we want to pull out are the Unique Views as they are called in GA.

Here, you can either do it by hand as it's a small number or else export it to Excel.

When downloaded, open the file, go into the dataset, and there we see the Content Topics. As the only numbers we are interested in are the Unique Views, we can delete all the others.

We then transpose the numbers into the main spreadsheet.

For large numbers of topics I suggest using a Lookup table for this exercise, but for the small numbers here, we can do it by hand.
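The same lookup idea can be sketched outside Excel. The Python below is an illustration only: the column names in the sample export are assumptions rather than GA's exact headers, and the topic order stands in for however your own spreadsheet is laid out.

```python
import csv
import io

# Hypothetical export: the column names are assumptions, not GA's exact headers.
export = io.StringIO(
    "Topic,Unique Views\n"
    "UX and content,100\n"
    "About,12\n"
    "Website governance,130\n"
)

# Topic order as laid out in the main spreadsheet (assumed).
sheet_order = ["About", "Downloads", "Website governance", "UX and content"]

# Build the lookup table: topic name -> Unique Views for the day.
lookup = {row["Topic"]: int(row["Unique Views"]) for row in csv.DictReader(export)}

# If a topic is absent from the export, this sketch records 0 for that day.
day_row = [lookup.get(topic, 0) for topic in sheet_order]
print(day_row)  # [12, 0, 130, 100]
```

The result is one row of numbers in the spreadsheet's own order, ready to paste in for that day.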

Next we need to get the Page Count, so back to GA again.

As the Page Count does not appear by default, we first need to add a Secondary Dimension. Search for Pages and then select.

It then gives us a list of all the Pages visited within each topic for the selected time period.

I suggest sorting by topic so you can see the pages within each topic more clearly.

Again as the numbers are quite low, we can count things straight off the screen—but for big numbers you'll need to export and use the counting function in Excel before transposing the various totals into the main Excel sheet.

Just one thing to note here for my own website, some of the pages are doubled up.

For example the root directory of "book" and the index file of "book" are both counted as separate pages. However, I know they are actually the same page, so I'll just count them as a single page. You may find similar anomalies you'll need to work around on your own site.
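For bigger sites, the counting and the doubled-up pages could be handled together along these lines. This is a minimal sketch in Python with made-up rows. The topic name, the paths and the "index.html" filename are assumptions, so adjust the normalise rule to whatever anomalies your own site throws up.

```python
from collections import defaultdict

# Hypothetical rows from the report with Page added as a Secondary Dimension.
rows = [
    ("Publications", "/book/"),
    ("Publications", "/book/index.html"),
    ("Publications", "/book/chapter-1.html"),
    ("About", "/about/"),
]

def normalise(path):
    """Treat a directory root and its index file as the same page."""
    if path.endswith("/index.html"):
        return path[: -len("index.html")]
    return path

# Collect the distinct pages visited within each topic.
pages_by_topic = defaultdict(set)
for topic, path in rows:
    pages_by_topic[topic].add(normalise(path))

page_count = {topic: len(pages) for topic, pages in pages_by_topic.items()}
print(page_count)  # {'Publications': 2, 'About': 1}
```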

And really, that's it.

Step 4: Explore the ranking of Content Topics

From then on, all you'll need to do is grab the data day-by-day and add it to the Content Topics spreadsheet. It'll soon fill up with numbers here on the left—and as time passes the insight generated will become more and more robust and useful.

Personally, I found that about a month's data was needed before I could start to use it to inform decision making.

On the bottom, you also see the trend lines for engagement on the Content Topics emerge over time. These trend lines show in a fairly simple and visual way the topics that get most attention and also how they compare to one another. They also show how the Page Count per topic changes over time.

Again for a small website like my own, the results don't look so impressive—but for a large corporate or enterprise scale website, the insight can be extremely powerful.

Off you go!

And, that's it. Off you go. Try it yourself and I promise you won't be disappointed.

You can read more about Content Topics on my website.

In a future article I plan to expand on how to use these numbers to decide what content to prioritise for UX improvements - in particular the Content Topics that will benefit most from better Findability and Readability.

But that's it for now.

Thanks for watching.

