Who Said That?
Context clues in massive datasets help us understand the communities where certain narratives take hold
Photo by Nainoa Shizuru on Unsplash
July 16, 2017
By Aaron Harms
I’ve written before about how the Protagonist Platform uncovers meaning from the billions of pieces of text scattered around the digital nether, but in reality that’s only half of the story. When we’re analyzing Narratives, we don’t just want to understand what’s being said; we want to understand who’s saying it.
Considering the scale of data we’re generally working with at Protagonist, it doesn’t make sense to try to track the opinions of individual people, so rather than examining what Beth on Twitter thinks or what u/JJ2017 on Reddit has to say, we’re identifying how demographics and narratives intersect within and across channels. Unfortunately for Narrative Analysts, people don’t always self-report their age, gender and occupation alongside their opinions, so we have to use a blend of self-reported data and inference analysis.
Advanced data inference can identify the likely background of subjects based on clues in their word usage or behavior patterns. The great thing about this type of analysis is that it enables uniform, consistent analysis across millions of sources, regardless of the specific parameters of each individual channel. Being source-agnostic means that no dataset gets left out, an omission that would skew the findings. We go wherever the narratives take us.
The accuracy for any single data point is modest, but once you apply these approaches at scale, the insights get more powerful. Audience inference can uncover demographics like age, gender, profession, or location, all of which are hugely valuable for businesses looking for insight into a specific audience. The results can also be divided to expose sub-segments (like “males over forty”).
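To make the point about scale concrete, here is a minimal sketch of how the uncertainty around an estimated audience share narrows as the number of inferred data points grows. The function name `share_with_margin`, the 30% share, and the sample sizes are all made up for illustration, and the margin uses a textbook normal approximation rather than anything specific to the Protagonist Platform. It only captures random noise: if a classifier is consistently wrong in one direction, more data will not fix that.

```python
import math


def share_with_margin(positive, total, z=1.96):
    """Return an estimated share and an approximate 95% margin of error.

    Uses the normal approximation for a proportion; `z` = 1.96 corresponds
    to roughly 95% confidence.
    """
    share = positive / total
    margin = z * math.sqrt(share * (1 - share) / total)
    return share, margin


# Hypothetical counts: 30% of authors inferred to belong to a segment.
for n in (100, 10_000, 1_000_000):
    share, margin = share_with_margin(positive=3 * n // 10, total=n)
    print(f"n={n:>9,}: share={share:.3f} +/- {margin:.4f}")
```

Each hundredfold increase in data points shrinks the margin by a factor of ten, which is why an inference that is shaky for one person can still describe a large community well.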
The algorithms are rooted in statistical techniques such as conditional probability. For example, what’s the probability that someone is a man over 40 if that person uses certain types of profanity and writes in short sentences? Achieving reasonable accuracy rates requires troves of human-labeled data. Fortunately, there’s enough of that out there to train a statistical classifier, and we’re adding to the body of knowledge with each incremental analysis. The same principles can be applied to other attributes beyond basic demographics, such as psychographics, though these types of inferences are in their infancy.
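The conditional-probability question above can be sketched with a tiny naive Bayes classifier. This is an illustration only, not the classifier we actually run: the labeled examples, the feature names (`profanity`, `short_sentences`, `emoji`), and the two labels are invented, and a real system would learn from far larger human-labeled datasets with many more signals.

```python
import math
from collections import Counter, defaultdict

# Hypothetical human-labeled examples: (features observed, demographic label).
LABELED = [
    ({"profanity", "short_sentences"}, "man_over_40"),
    ({"profanity", "short_sentences"}, "man_over_40"),
    ({"short_sentences"}, "man_over_40"),
    ({"profanity"}, "other"),
    ({"emoji"}, "other"),
    ({"emoji", "short_sentences"}, "other"),
    ({"emoji"}, "other"),
]
FEATURES = ["profanity", "short_sentences", "emoji"]


def train(examples):
    """Count how often each label and each (label, feature) pair occurs."""
    label_counts = Counter(label for _, label in examples)
    feature_counts = defaultdict(Counter)
    for feats, label in examples:
        for f in feats:
            feature_counts[label][f] += 1
    return label_counts, feature_counts


def posterior(feats, label_counts, feature_counts):
    """Return P(label | features) for every label via Bayes' rule."""
    total = sum(label_counts.values())
    log_scores = {}
    for label, n in label_counts.items():
        log_p = math.log(n / total)  # prior P(label)
        for f in FEATURES:
            # Laplace-smoothed P(feature present | label)
            p_f = (feature_counts[label][f] + 1) / (n + 2)
            log_p += math.log(p_f if f in feats else 1 - p_f)
        log_scores[label] = log_p
    # Normalize the scores so they sum to 1.
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    z = sum(weights.values())
    return {label: w / z for label, w in weights.items()}


label_counts, feature_counts = train(LABELED)
print(posterior({"profanity", "short_sentences"}, label_counts, feature_counts))
```

The "naive" part is the assumption that each clue is independent given the label. That is rarely true of real language, but it keeps the arithmetic simple and often works well enough as a baseline.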
Bridging narrative data with demographic data enables us to put more texture and detail behind the people who actually hold these essential beliefs. These connections also make Narrative Analytics more actionable for marketers, powering campaign targeting, media planning, and more advanced segmentation.
Aaron Harms, EVP, Product and Technology