Subscription Confusion, and the State of Podcasting Data

a newsletter smörgåsbord of mixed metaphorical delights

It’s been a weird week. I mean, they’re all weird now, but this one has had an extra soupçon of weird. Thus, you are getting a smörgåsbord of a newsletter. Today, I’ve got something extra to think about when you ask people to subscribe to your podcast, a bit of an extended rant on podcasting’s “real data,” and some detective work.

Ready? låt oss äta!

A Prescription for Subscription Decryption

Recently, I was a guest on a podcast and was asked about Apple’s rumored move into offering a podcast subscription service. Now, Spotify is also going to get into the paid podcast subscription model, joining the ranks of Luminary, Stitcher Premium, and Wondery Plus. When it comes to alternate (as in, non-advertising supported) models for monetizing podcasts, I am all in. The more ways to make money from podcasting, the more robust the medium is and the more ways we all have to make a living. With the content creators themselves, offering a premium tier is a wonderful way to offer listeners a choice of experiences. Ultimately, I want to get you paid so that I get paid and it keeps me off the streets for another week.

I do have one niggling little concern about all of this, however. I hate to even bring it up.

A couple of years ago, I published a long piece on NiemanLab and Medium called Podcasting’s Next Frontier: A Manifesto for Growth. It remains the most widely read thing I’ve ever put on Medium. The centerpiece of the article was some bespoke research we conducted for Podcast Movement with people who were familiar with the term “podcasting,” but had yet to listen to one. The reasons this sizable portion of the population had yet to take the plunge were legion, but among them was this choice nugget:

Note the stat in the lower right corner. Nearly half of the yet-to-be converted cited a belief that podcasts cost money to listen to. The reason they believe this (and still do) is simple: that troublesome word we use to describe the behavior we want from our audience—subscribe.

Other than podcasts, we don’t subscribe to anything for free. Disney+, Time Magazine, the Cheese-of-the-Month Club, and the Bourbon-of-the-Morning Club all charge us money, every month, to subscribe to their wares. Every time we promote our podcast with a banner or widget that implores people to Subscribe To The Podcast On (laundry list of services) we are telegraphing to at least some portion of the population that podcasts are not free.

For years, even with that complication, our call-to-action was simple: subscribe to the show (for free) and get notified when new episodes are available. Except that on Spotify, there is no subscribe—it’s “follow,” and you don’t exactly get notified when a new episode is available. And soon, on the two most-popular pure audio distribution platforms in podcasting, we will have to communicate to people that we want them to subscribe to our podcast which is free unless of course they want to subscribe to our podcast which is not free.

The word “subscribe” has always been a stone in the shoe of podcast growth. Now that stone is going to rattle around a bit, and maybe settle someplace less comfortable. I want you to get paid—believe me. But if you go the way of creating a paid podcast or a paid tier to your podcast offering, you may want to give some lengthy consideration as to how you are going to message this.

Of course, you could only offer a paid subscription podcast, with no free option. This reminds me of all of the articles talking about the lessons to be learned from the sales success of Radiohead’s In Rainbows album. Radiohead famously circumvented the machinery of the recording industry and just put the album up for sale on their site for whatever listeners wanted to pay for it. Despite the fact that many chose to pay “zero” for the album, it made more money before it was physically released than their previous album made in total. There were many articles about the marketing and content lessons this exercise had to teach us, but I’m going to distill the secret of its success into three words: Already Be Radiohead.

In any case, I think it’s a thing we are going to have to think about in addition to all the other things we have to think about, I think.

OK. I’m well hydrated, I am in a comfortable chair, and I have a snack. It’s time for a good, solid rant.

Does Podcasting Have Real Data?

Hard Data

I had someone contact me this week looking for data about the audience of about a dozen or so top podcasts. Not just any data, however--they were looking for the same kind of data files that Facebook has about you and me. When I responded to him that this kind of personally identifiable profile data was not readily available in podcasting, he asked me a simple question in response, one that for someone who buys and sells ads on Facebook is perfectly reasonable: How do advertisers make informed buying decisions without “hard” audience data?

Look, I get the question. Facebook lets you target (and I am not kidding) Expat single Moms from Wisconsin who love professional football and buy pet food and own a timeshare and drink wine and support veterans who are in the market for a used vehicle for their farm-related small business. Should they have this data? This is between Mark Zuckerberg and his god. Do they have it? You betcha.

I assured the gentleman that there is indeed a robust, data-driven advertising market for podcasts, and sent him a list of resources from our clients and the IAB. Podcasting throws off much better data than broadcast media, for example, and agencies have no trouble sending billions of dollars towards network television and AM/FM radio despite their being pretty "data light" compared to podcasting and other digital media. Still, this perception that podcasting lacks hard data was troublesome to me. And I am not sure marketers should get too used to "hard data," either.

Real Data

So, if we don't have "hard data," like Facebook, what does podcasting have? This week I heard about another type of data: "real data." A few days ago I was part of a panel on the current state of audio, and at the end of the session we were all asked to bring out our crystal balls and talk about what we are going to see in the future. One of the participants, an executive with an ad-tech company, talked about his anticipation for a future where we could finally get real data on audience, and not have to rely on survey research. Dude, I'm right here. But I get it. Ad-tech companies have their hammer, I have mine, and everything looks like a nail.

Right now, pretty much everything you have done online is available to be cross-referenced and matched with other data by various providers of ad-tech. You probably don't want this. Europe and California really don't want this. You may have noticed on every single site you visit that you are asked about cookies--this is why. Third-party cookies designed to track your movements and collect data for cross-site purposes are going to go away--even Google is getting rid of them by default in Chrome next year, and their whole business is advertising. But rest assured, work on ad-tech continues apace, and though the fate of the cookie may be sealed, there will continue to be innovation on advertising technology and tracking. For better or worse, digital marketers have been spoiled by having all of this data on you. Just as you can't tell a dieter not to want that chocolate chip cookie, you can't get marketers not to want your delicious data cookies either.

People in the advertising industry have been conditioned, I think, to look at the digital data thrown off by our online activities as "behavioral data," while survey research is "self-reported data." Or, as I have heard some people say, the former is "real data." Ultimately, though, any advertising performance report or digital source of audience data is going to tell you some things that--I hate to tell you--had to have come from survey research. It's just way, way back in the food chain.

The Platform

There is a Spanish dystopian sci-fi movie currently on Netflix called The Platform, which describes life in a deep, underground pit called a "Vertical Self-Management Center"--essentially, a tower but stretching under the ground instead of above it. Residents are fed from a central platform that is filled with an embarrassment of food riches on floor one, and then lowered floor by floor while the deeper denizens get the leftovers from the floors above. It's like a dumbwaiter, open on all sides. The platform is completely empty of food by the time it gets to floor 171, but it starts to look pretty nasty not far below the surface, as hoarding and all-around brutish behavior quickly become the norm.

To me, the audience and demographic data associated with your cookies, pixels, and other ad-tech reporting is kinda like what's on that platform by, say, the fifth floor down. It seems like magic. It has descended from heaven. It still looks like food! But all of that obscures the fact that the food was originally put on the platform by human hands, you didn't see them make it, the quality has degraded, and you have no idea what has already been eaten.

I may be going to metaphor jail for this. I dunno. But my point is not to denigrate ad-tech, which ultimately puts food on my family. It's to convince you that "survey data" versus "real data" is a ridiculous false choice. Survey data is not the thing you have to settle for when you lack "digital data." Properly-sampled survey data is your meal, prepared tableside to exacting standards, before it gets on The Platform. It's a balanced part of your ad-tech data breakfast. Whether it is apparent to you or not, somewhere in the chain of your attribution data is information that could only have been obtained from a human being answering questions about themselves.

There is no perfect data. The download sure isn't perfect. Google's inferred demographic data is pretty good, but only in that it minimizes errors of commission. If Google says you are a young female, you probably are. But it doesn't minimize errors of omission--those instances in which the algorithm isn't sure what you are, and thus doesn't hazard a guess. Survey research also has an error factor. But it is important to note that the outputs of good survey research aren't guesses, they are estimates. Estimates behave themselves. Estimates are reproducible. A proper estimate is a thing that, better than nine times out of ten, is really damn close.

Since 2003, my company has executed the largest single-day survey research project in America--the National Election Pool exit polls (you can see our work here). In a presidential election year, we will sample, process, clean, and weight the data from a six-figure sample of voters, with the networks reporting that data the very same day. In the 18 years we have been the sole providers of this data, our network clients have never projected a single race incorrectly. I say this not to brag (well, ok--a little) but to make a more profound point: we are competent at a reproducible craft that adheres to a public code of ethics and standards, and is informed by more than four centuries of peer-reviewed research on statistics and two centuries of opinion polling. It's not tucked away in an inscrutable proprietary black box, and it isn't guesswork.

Which brings me to my favorite kind of data. No, it isn't real, real hard, hardly real, or even survey research data. It's something else that is enabled by the fact that we are now getting quality digital data and quality survey research data, in concert. And that's insight.

PSI: Podcast Scene Investigation

For example, here's a little detective work I found mentally fun. A few weeks ago we published the Top 50 most listened-to podcasts in the U.S. for 2020. The point of such a list was not to compete with the various download charts out there or to challenge them--it was to present a different thing that we can now measure. It's not meant to look like a download ranker. It's a people ranker. We got a ton of great feedback on the list, but a few complaints. Interestingly, ALL of the complaints centered around the same podcast.

I won't publish the whole list again, but look at the top 20 podcasts by 2020 audience reach:

  1. The Joe Rogan Experience

  2. The Daily

  3. Crime Junkie

  4. This American Life

  5. My Favorite Murder

  6. Stuff You Should Know

  7. Office Ladies

  8. Pod Save America

  9. Planet Money

  10. Wait Wait...Don't Tell Me

  11. Radiolab

  12. The Ben Shapiro Show

  13. Serial

  14. Fresh Air

  15. Call Her Daddy

  16. Up First

  17. The Dave Ramsey Show

  18. Armchair Expert with Dax Shepard

  19. Conan O'Brien Needs a Friend

  20. (TIE)
    WTF with Marc Maron
    Freakonomics Radio

This list comprises three types of podcasts:

  • Podcasts that are currently measured by various download charts and make sense

  • Podcasts that are not participating in the various download charts, but still make sense

  • Serial.

If you eyeball this list, it should make sense to you. The fact that the digital breadcrumbs thrown off by server-based measures or URL-wrapping actually line up with what people tell us they listened to is validation for both measures. You'd be tempted to think that this is incredible, except, see the bit on reproducible science, above. It's kinda the job. The fact that survey-based audience demographic data and digital tracking of download traffic agree so neatly simply tells me that podcasting has hard data, real data, and more importantly good data about the medium and its audience from the medium's leading data providers.

Still, the one nit people picked about our list was Serial. How could Serial be in our Top 20 for 2020 when their last new episode was in 2018? Doesn't this prove that a survey-based list only spits back the first show people can think of? There are two easy ways to partially respond to this argument, but there's a more compelling insight that makes the whole question fascinating.

First, have a look again at the list. Except for Serial, it's a mix of long-standing shows like This American Life and new shows like Office Ladies. People in the podcast industry might know that Dax Shepard has a huge podcast, but outside? Not necessarily. The rest of the list is completely plausible. Serial’s presence is an apparent outlier, not the norm.

Second, regardless of the fact that Serial hasn't had a new show since 2018, if you ask anyone new to podcasting what shows they have listened to in the last year, you are going to hear a ton of mentions of Serial. It remains one of the main and most important gateway shows to the entire medium.

But still, even if you stipulate both to be true, that doesn't completely explain Serial's prominence on our list. So, some detective work. In the methodology for the Podcast Consumer Tracker for this list, we asked over 8,000 weekly podcast listeners to look at their podcast clients/apps and write down all the shows they listened to in the last week (the margin of error on an 8,000-person sample, by the way, is +/- 1% at a 99% confidence interval).
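If you want to sanity-check a margin of error like that yourself, the textbook formula for a proportion under simple random sampling is z * sqrt(p(1-p)/n). This is a simplified illustration: a real tracker like this one involves weighting and design effects, so treat these numbers as ballpark figures, not the actual study methodology.

```python
import math

def margin_of_error(n, z, p=0.5):
    """Margin of error for an estimated proportion p on a sample of n,
    assuming simple random sampling: z * sqrt(p * (1 - p) / n)."""
    return z * math.sqrt(p * (1 - p) / n)

n = 8000  # weekly podcast listeners sampled

# Worst case (a show named by about half the sample):
print(f"99% CI, p=0.50: +/- {margin_of_error(n, 2.576):.1%}")

# A more typical single-show estimate (named by ~5% of the sample):
print(f"99% CI, p=0.05: +/- {margin_of_error(n, 2.576, 0.05):.1%}")
```

Note that the interval shrinks as the proportion moves away from 50%. Since any individual show is named by only a small fraction of respondents, the per-show intervals sit comfortably inside that +/- 1%.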

Over the course of the year, Serial was named enough to be the 13th most listened-to podcast of 2020. It did not reach a similar position in Podtrac, to use an example. But here's what DID reach a very prominent position in downloads:

Nice White Parents, from Serial, was the #1 most-downloaded new show of 2020, according to Podtrac. Nice White Parents has its own feed, and its own cover art, but Serial also tried an experiment--they dropped the show into Serial's feed, as well. Here's what that looks like on my phone:

It looks like Serial. And when you play the episode, right after the pre-roll, you hear: "From Serial, this is Nice White Parents."

Now, you can tag the actual audio file as "Nice White Parents" and that's what a download measure is going to tell you. But for the millions of Americans who form the built-in, already subscribed audience for Serial, it looked like Serial, and it sounded like Serial. And many of the podcast listeners we sampled over the course of 2020 looked at their playing history and told us that Serial is what they heard.

Are they right? Well, that's the fascinating bit, isn't it? A server thinks Nice White Parents was downloaded. Humans think a new version of Serial was downloaded. And thanks to the marvelous sulci and gyri of our human brains, we can hold two seemingly competing thoughts in our head at the same time. Nice White Parents was one of the most successful podcasts of 2020. And a lot of people thought it was Serial: School Justice in the same way we continue to get CSI iterations.

You can look at Serial being different on our list and say "that's wrong." Instead, I would encourage you to look at it and say "that's different--and I wonder why." I know our data isn’t wrong—we octuple-check it. So, are the people “wrong” for writing in Serial? In the words of the noted philosopher, Admiral Ackbar…

The audience is never wrong.

The show was legitimately listened to. The ads were correctly served. A good time was had by all. And we have learned something here about introducing a new show into the feed of an existing podcast, haven’t we? We have learned a thing that would not be possible to know without both well-executed download measurement AND well-executed quantitative research. This is good, hard, real insight.

The next time someone tells you that podcasting doesn’t have “hard data,” you just make them read this entire newsletter. That should be punishment enough.

Thanks for reading this far. If you have found value here, I hope you’ll share and subscribe FOR FREE REALLY.

I leave you with some instructions on how to make a smörgåsbord of your own. Have a great weekend.


Photo credit: Smörgåsbord by bigmick - flickr, CC BY-SA 2.0, Bork! Bork! Bork!