(Skip to 'lessons learned' for implementing Content Groups.)

As you probably know, measuring aggregate web activity based on Page Views in Google Analytics is not a good idea.

For instance, imagine a site with 2 content topics: Economics and Politics. There are 5 pages about Economics and 10 about Politics.

You look at your analytics. Economics gets 500 views per month. But wow! Politics gets 1000.

It's obvious, therefore, that Politics is much more popular and should get the most attention in terms of UX, optimisation, etc.


Look deeper and you discover that each topic gets exactly the same number of visits: 100 each.

2 content topics with web pages: Economics has 500 and Politics has 1000

It is simply because Politics has twice as many pages that it appears twice as popular using Page Views.

(Why it has twice as many pages is a separate question. Perhaps there is much more information on that topic? Or maybe it has the same volume of information but uses a different content design approach? Or perhaps the information has been poorly planned and arbitrarily separated among too many pages?)

This shows that measuring or comparing activity using Page Views (including Unique Page Views) at an aggregate level is a bad idea. Visits are far better.
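The arithmetic behind this example can be sketched in a few lines of Python. This is only an illustration, not real Google Analytics data: the hit log is generated, and it assumes (purely for simplicity) that every visit views each page in its topic exactly once.

```python
from collections import defaultdict

# Figures taken from the example above; the hit log itself is invented.
PAGES_PER_TOPIC = {"economics": 5, "politics": 10}
SESSIONS_PER_TOPIC = 100


def build_hits():
    """Generate (session_id, topic, page) hits, one per page per session."""
    hits = []
    for topic, n_pages in PAGES_PER_TOPIC.items():
        for s in range(SESSIONS_PER_TOPIC):
            session_id = f"{topic}-{s}"
            for p in range(n_pages):
                hits.append((session_id, topic, f"/{topic}/page-{p}"))
    return hits


def summarise(hits):
    """Return {topic: (page_views, visits)}."""
    page_views = defaultdict(int)
    sessions = defaultdict(set)
    for session_id, topic, _page in hits:
        page_views[topic] += 1           # every hit counts as a Page View
        sessions[topic].add(session_id)  # a session counts once per topic
    return {t: (page_views[t], len(sessions[t])) for t in page_views}


summary = summarise(build_hits())
print(summary)  # {'economics': (500, 100), 'politics': (1000, 100)}
```

Page Views scale with the number of pages in a topic, while Visits stay at 100 for both.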

The problem is that Google Analytics' default Behaviour report does not count total Visits at a topic level, not even in Content Drilldown. (Content Drilldown does show total Unique Views, but that is not the same as total Visits.)

This is where Content Groups step in.

Content Groups flatten issues caused by varying numbers of pages

Content Groups allow you to cluster pages based on topic and then track activity using Visits.

This is incredibly powerful.

It no longer matters if one topic has 10 pages and another has 5. Activity is based on Visits to the topic overall – not to individual pages.

This "flattens" issues caused by varying numbers of pages per topic. In a sense, the Content Group treats the entire topic as if it were a single page.

This gives you much better insight about which content is truly most popular.

Screengrab of content groups in Google Analytics

Start with a taxonomy

Creating content topics is like creating a taxonomy. The aim is to cluster pages based on subject matter (or some other factor of interest to you, e.g. audience, geographic targeting, etc).

The Information Architecture (IA) of your site is probably a good place to start. This is because many IAs are subject matter based – though not all.

IAs are built to match user behaviours, not strict taxonomies. Think of the difference between the Dewey Decimal system and the IA for a library website.

For instance, you may want to create a content topic for a subject where the relevant pages have been located in several parts of your site.

Using the example of our imaginary website again, maybe you want to set up Brexit as a discrete content topic, but find that relevant pages have been placed within both Politics and Economics.

Default measurement in Google Analytics does not allow you to easily account for such variances.

Content groups are the way around this.

As described, they allow you to cluster pages into content topics no matter where they are on your site – and then count total activity using Visits.

Brexit pages within Economics and Politics sections

Four hard implementation lessons learned

I won't describe the step-by-step for setting up a Content Group here. There is waaaay too much detail and it is better described elsewhere.

Personally, I found Content Groups quite tricky to configure and measurement outcomes are often confusing.

It took me a LOT of trial and error to unpack how they (seem to!) work. Many of the pitfalls are not well documented online.

To help you get started, below I have listed some of the key lessons I learned. They should be of use to you, but still ... proceed with caution!

1. Topic measurement is mutually exclusive and cumulative

GA offers 3 methods for building content groups. I chose Rule Definitions as it is based on Regular Expressions (RegEx) and that is something I am familiar with from elsewhere.

The Content Group itself is composed of topics based on RegEx rules that define the pages to be included in each topic. For example:

Content Group: International News

  • Topic 1: Politics. RegEx = "^\/politics"
  • Topic 2: Economics. RegEx = "^\/economics"
  • Topic 3: Brexit. RegEx = "^\/politics\/brexit|^\/economics\/brexit"

Screengrab of rule definitions in Google Analytics

As you add each new topic, the most important thing to remember is that all queries are mutually exclusive and cumulative.

It took me weeks to understand this, so I want to explain it.

As above, the RegEx queries for the first and second topics (Politics and Economics) capture ALL activity within those directories and subdirectories – including activity in any Brexit subdirectories they contain.

That means that, even though the RegEx query for topic 3 has also been set up to measure activity for Brexit, the ultimate report in GA will not show any metrics. This is because Brexit has already been fully captured by topic 1 and 2.

In summary, any content captured by a RegEx topic is inaccessible to a subsequent RegEx topic, based on the order in which they appear in the Rule Definition list.

To capture activity on Brexit as a standalone topic, I need to reorder the Rule Definitions to ensure the Brexit query runs first, as below:

Screengrab of modified rule definitions order in Google Analytics

This new order will capture all activity in the Brexit subdirectories first and then activity in the Politics and Economics directories.

And remember - this is important - the GA reports for Politics and Economics will now show zero activity for Brexit. This is because that activity has already been captured by topic 1, which is now Brexit (even though the way the RegEx queries for Politics and Economics are written means Brexit activity should be captured).
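The "first matching rule wins" behaviour described here can be imitated outside GA. The sketch below is my own simulation in Python, not GA's actual implementation: the page paths are hypothetical, and "(not set)" is used only as a placeholder label for pages that match no rule.

```python
import re

# The three rules from the example, in their original order.
ORIGINAL_ORDER = [
    ("Politics", r"^\/politics"),
    ("Economics", r"^\/economics"),
    ("Brexit", r"^\/politics\/brexit|^\/economics\/brexit"),
]

# The same rules with Brexit moved to the top.
REORDERED = [ORIGINAL_ORDER[2], ORIGINAL_ORDER[0], ORIGINAL_ORDER[1]]


def assign_topic(path, rules):
    """Return the first topic whose pattern matches; later rules never see the page."""
    for topic, pattern in rules:
        if re.search(pattern, path):
            return topic
    return "(not set)"  # placeholder for unmatched pages


# Hypothetical page paths, for illustration only.
paths = [
    "/politics/elections",
    "/politics/brexit/deal",
    "/economics/brexit/trade",
    "/economics/inflation",
]

for label, rules in (("original", ORIGINAL_ORDER), ("reordered", REORDERED)):
    print(label, [assign_topic(p, rules) for p in paths])
# original ['Politics', 'Politics', 'Economics', 'Economics']
# reordered ['Politics', 'Brexit', 'Brexit', 'Economics']
```

In the original order the Brexit rule never receives a page. Once it runs first, it takes the Brexit pages away from Politics and Economics.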

Weird, I know.

So, you can see that you need to plan your approach to Rule Definitions with care and build queries very carefully from top to bottom.

This is awkward, but it can be done.

Unhappily, it does mean that individual pages can only be allocated to a single topic within a Content Group. This does not reflect how true taxonomies work, where pages often belong to more than 1 topic, e.g. a page can be both about Brexit AND Economics.

However, you have 5 Content Groups to play with, so you may find a way around it.

2. Exclude 404s first

In my Content Groups, I ensure the very first rule definition captures all Visits to potential 404 pages.

Based on the rule explained above, capturing all 404s in the first topic means that all subsequent topics are 404-free. This gives me a more accurate view of actual activity – unpolluted by Visits caused by 404s.
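To illustrate the idea, here is a rough Python simulation of a 404 rule placed ahead of the topic rules. How you detect a 404 depends entirely on your own site. This sketch assumes, hypothetically, that error pages carry the page title "Page not found"; the paths and titles are invented too.

```python
import re

# Each rule: (topic, field to test, pattern). First match wins.
# The 404 rule relies on an ASSUMED page title; adapt it to your own site.
RULES = [
    ("404s", "title", r"Page not found"),
    ("Politics", "path", r"^\/politics"),
    ("Economics", "path", r"^\/economics"),
]


def assign_topic(page, rules=RULES):
    """Return the first topic whose pattern matches the named field of the page."""
    for topic, field, pattern in rules:
        if re.search(pattern, page[field]):
            return topic
    return "(not set)"  # placeholder for unmatched pages


# Hypothetical pages: one real page and one broken URL in the same directory.
real_page = {"path": "/politics/elections", "title": "Elections"}
broken_page = {"path": "/politics/old-page", "title": "Page not found"}

print(assign_topic(real_page))               # Politics
print(assign_topic(broken_page))             # 404s
print(assign_topic(broken_page, RULES[1:]))  # Politics (no 404 rule)
```

Without the leading 404 rule, the broken URL would be counted as Politics activity.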

3. Do not add Visits together

For some reason, in the Content Groups report "Visits" are called "Unique Views" - but this does NOT correspond to the usual understanding of "Unique Page Views".

Unique Views are not Unique Page Views. Unique Views = Visits.

Screengrab of the unique views report in Google Analytics

As a general rule, do not add Visits together. If you do, the total will exceed the true number of Visits to your site. The reason is that many users access more than 1 topic per session.

For example, you could see a result like:

  • Economics : 1000 Unique views (i.e. Visits)
  • Politics : 500 Unique views (i.e. Visits)

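In this instance, you cannot conclude that your site got 1500 Visits. The true total is somewhere between 1000 and 1500, depending on how many Visits included pages in both topics. If the total were 1000, for example, then all 500 Politics Visits also included pages in Economics.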
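A short Python sketch shows why the per-topic figures cannot simply be summed. The session data is invented and assumes the extreme case in which every Visit to Politics also included an Economics page.

```python
# Hypothetical data: which topics each session touched (not real GA output).
sessions = {}
for i in range(1000):
    topics = {"Economics"}
    if i < 500:
        topics.add("Politics")  # these sessions touched both topics
    sessions[f"session-{i}"] = topics


def unique_views_per_topic(sessions):
    """Count, for each topic, the number of sessions that touched it."""
    counts = {}
    for topics in sessions.values():
        for topic in topics:
            counts[topic] = counts.get(topic, 0) + 1
    return counts


per_topic = unique_views_per_topic(sessions)
naive_total = sum(per_topic.values())  # adds overlapping sessions twice
true_total = len(sessions)             # each session counted once

print(per_topic["Economics"], per_topic["Politics"])  # 1000 500
print(naive_total, true_total)                        # 1500 1000
```

With less overlap between the topics, the true total would rise towards 1500. It can never exceed the naive sum.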

So be careful about how you interpret results. Remember, GA just crunches numbers – YOU need to do the work of interpretation.

4. Manage closely and keep your queries up to date

Websites change. New pages are added. Pages are moved. Directory names change. Depending on how tightly you define RegEx topics, you may need to update your queries a lot.

Trial and error taught me to keep my queries relatively loose to avoid daily updates.
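As an illustration of "tight" versus "loose", compare the two hypothetical patterns below. Neither comes from my actual setup, and the page paths are invented. The tight one lists individual pages, while the loose one matches the whole directory.

```python
import re

# Hypothetical patterns for a Brexit topic.
TIGHT = r"^\/politics\/brexit\/(deal|timeline)$"  # names each page explicitly
LOOSE = r"^\/politics\/brexit"                    # matches the whole directory

# Hypothetical site content before and after a new page is published.
old_pages = ["/politics/brexit/deal", "/politics/brexit/timeline"]
new_pages = old_pages + ["/politics/brexit/northern-ireland"]


def matched(pattern, pages):
    """Return the pages that the pattern captures."""
    return [p for p in pages if re.search(pattern, p)]


print(len(matched(TIGHT, new_pages)))  # 2: the new page is missed
print(len(matched(LOOSE, new_pages)))  # 3: the new page is picked up
```

The trade-off is that a loose pattern may also pick up pages you did not intend, so it still needs checking from time to time.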

Now use the data to build a content "topography"

So, now you see – Content Groups are hard to configure, easy to get wrong and need lots of attention.

But used correctly, they are an incredibly powerful way to identify the content that your users are really most interested in – and ditch the issues caused by cumulative Page Views.

For me, that has made them more than worth it.

Read my follow-on article: "How to identify the website content you need to prioritise (and ignore) using Content Groups" (Sept 2021).

Update 22 May 2021 - This article about Content Groups has been shared on the official Google Analytics Twitter account. Must be something useful in it after all!