Diving Into Media Trends With Machine Learning: A Case Study in U.S. Election Coverage
Michael Burke, MSR Communications
PR practitioners are probably better versed in media trends than members of any other profession--perhaps even more so than journalists themselves--yet we often find media coverage patterns as baffling as the rest of the public does. The next frontier for communications will likely involve using technology to gain a greater understanding of why certain topics, people and events resonate. Fortunately, some surprisingly accessible machine learning tools, combined with data that most agencies have at their fingertips, can provide insight into hidden currents that would previously have been impossible to identify.
Those of us who remember collecting media clips from the local Barnes & Noble can attest that media monitoring has come a long way. If you’ve got access to any of the major monitoring platforms, you can get an exhaustive list of coverage on any subject you can imagine. In addition to dashboard reports that track what’s happened, platforms like Meltwater allow you to download media coverage reports as Excel files, which contain a wealth of data that can be analyzed with any number of tools for further insight. And while ‘Machine Learning’ may sound a little intimidating, machine learning techniques are closely related to much of what you may have learned in a college statistics class.
For example, at my firm, MSR Communications, we were curious about what happens in a given week of election coverage, so we analyzed more than 6,000 election-related articles that ran in top news sources. We used a platform called RStudio to take a closer look at the trends and somewhat mysterious relationships in media coverage, and what we found was fascinating.
POTUS dominates the news cycle
Eighty percent of all election coverage in the U.S. included a mention of the President. This may not be terribly surprising on the surface, but considering that the coverage we looked at includes state and even local election coverage, it does indicate how heavily the President figures in the conversation. In the press’ view, it appears that he plays a part in almost every election in the country, either helping or hurting the candidates. Furthermore, it’s rather surprising to see how much more coverage he receives compared to ANY Democratic candidate. The President got more than four times as much coverage as anyone else!
Frontrunner Biden is also the media coverage ‘frontrunner’
At a not-so-close second, Democratic front-runner Joe Biden was mentioned in 17 percent of all coverage. However, in terms of total coverage, he handily beat out his Democratic competitors, with Bernie Sanders mentioned in 11 percent of the coverage and Elizabeth Warren in 10 percent.
AOC gets massive coverage, despite not even being in the race
Interestingly, Alexandria Ocasio-Cortez, who is not running for president, appears in more coverage than all but the top Democratic candidates, beating out Pete Buttigieg, Cory Booker, Amy Klobuchar and Julian Castro in terms of total coverage.
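Mention shares like the ones quoted above come down to counting the articles in which a name appears at least once and dividing by the total number of articles. The following is a minimal Python sketch of that calculation, not the R workflow used for the study; the sample texts, the `patterns` mapping and the `mention_share` helper are all made up for illustration.

```python
import re

# Hypothetical sample snippets; the study's real dataset is not reproduced here.
articles = [
    "Trump rallies in Ohio as Biden leads new poll",
    "Biden and Sanders clash over health care",
    "Local school board election draws record turnout",
    "Trump endorses candidate in state senate race",
    "Warren unveils plan; Trump responds on Twitter",
]

# Hypothetical mapping from a tracked name to the pattern that counts as a mention.
patterns = {
    "Trump": r"\btrump\b",
    "Biden": r"\bbiden\b",
    "Sanders": r"\bsanders\b",
    "Warren": r"\bwarren\b",
}

def mention_share(texts, patterns):
    """Return the fraction of texts that mention each name at least once."""
    total = len(texts)
    shares = {}
    for name, pattern in patterns.items():
        regex = re.compile(pattern, re.IGNORECASE)
        hits = sum(1 for text in texts if regex.search(text))
        shares[name] = hits / total
    return shares

shares = mention_share(articles, patterns)
# Print names from most to least mentioned.
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {share:.0%}")
```

Counting each article once, however many times it repeats a name, is what makes the result a share of coverage rather than a raw word count.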
What machine learning reveals about inflammatory and accusatory language
To discover relationships between various accusatory words and candidates, we applied a machine learning technique called Association Rules. This employs the Apriori algorithm to sort through data describing people’s behaviors and determine how frequently certain actions are accompanied by other actions. This technique, famous for its use in recommendation engines, can be applied to any set of human behaviors, including journalists’ coverage of election topics. In particular, we were interested to learn how various negative or accusatory terms were used in conjunction with each other, and with candidates.
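To give a feel for what the Apriori algorithm does, here is a minimal Python sketch that treats each article as the set of tracked terms it contains and finds the term combinations that appear together often enough. It is an illustration under stated assumptions, not the code used for the study: the toy `transactions`, the `apriori` function and the `min_support` threshold are hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: each article reduced to the set of tracked terms it contains.
transactions = [
    {"trump", "racist", "sexist"},
    {"trump", "racist"},
    {"biden", "socialist"},
    {"trump", "biden"},
    {"racist", "sexist", "homophobic"},
    {"biden", "sanders", "socialist"},
]

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset whose support meets min_support."""
    n = len(transactions)

    def support(itemset):
        # Share of articles that contain every term in the itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single terms.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        prev = list(current)
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        current = {}
        for cand in candidates:
            # Prune step: every (k-1)-subset of a candidate must itself be frequent.
            if all(frozenset(sub) in frequent for sub in combinations(cand, k - 1)):
                s = support(cand)
                if s >= min_support:
                    current[cand] = s
        frequent.update(current)
        k += 1
    return frequent

frequent = apriori(transactions, min_support=0.3)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(s, 2))
```

The prune step is the heart of the algorithm: a combination can only be frequent if all of its smaller sub-combinations are frequent, which lets Apriori skip most of the possible term combinations without counting them.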
“Racist” appeared more than ALL Democratic candidates (except Biden)
Clearly, to the U.S. press, racism and race-related issues are very important in this election cycle. With the exception of Biden and Trump, the term “racist” appears more often than the name of any other candidate in the election.
It comes in threes: Homophobic, racist and sexist
Beyond reporting mere frequencies, Association Rules can tell us when the presence of one word is predictive of the presence of another. As it turns out, the terms “homophobic”, “racist” and “sexist” were by far the most predictive of one another. An article containing the term “sexist” was 50 times more likely to contain the term “homophobic”. However, if an article contained the terms “homophobic” and “racist”, it was 85 times more likely to also contain the term “sexist”.
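“Times more likely” figures of this kind resemble what association rule mining calls lift: the probability of seeing a term given that the other terms are present, divided by the term’s overall probability. The article does not name the statistic, so treating these figures as lift is an assumption. A minimal Python sketch on made-up data, with a hypothetical `rule_metrics` helper:

```python
# Hypothetical toy data: each article reduced to the set of tracked terms it contains.
transactions = [
    {"sexist", "homophobic", "racist"},
    {"sexist", "homophobic"},
    {"sexist"},
    {"racist"},
    {"racist"},
    {"trump"},
    {"trump"},
    {"biden"},
    {"biden"},
    {"trump", "biden"},
]

def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    n = len(transactions)
    count_a = sum(1 for t in transactions if antecedent <= t)
    count_c = sum(1 for t in transactions if consequent <= t)
    count_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    if count_a == 0 or count_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    support = count_both / n          # share of all articles containing both sides
    confidence = count_both / count_a  # P(consequent | antecedent)
    lift = confidence / (count_c / n)  # how much the antecedent raises that probability
    return support, confidence, lift

support, confidence, lift = rule_metrics(transactions, {"sexist"}, {"homophobic"})
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift of 1 means the terms appear together no more often than chance would predict; the larger the value, the stronger the association. The antecedent can hold several terms, which is how a rule such as “homophobic” plus “racist” predicting “sexist” would be scored.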
Socialists and Communists
Many of these terms, not surprisingly, were heavily associated with Trump. For example, if an article included the terms “racist”, “sexist” and “Trump”, it was 57 times more likely to also contain the term “homophobic”. Similarly, if an article contained the terms “sexist” and “Trump”, it was 56 times more likely to contain the term “homophobic”. Interestingly, if an article referenced both Mike Pence and Kamala Harris, it was 7 times more likely to contain the term “racist”.
But the Democrats have to deal with their own accusatory terms. In particular, the term “socialist” was heavily associated with virtually all candidates. For example, an article that included Biden, Castro and Klobuchar was 6.8 times more likely to contain “socialist”. Similarly, an article containing Biden, Booker and Castro was 5.7 times more likely to contain the term.
It should of course be noted that not all of the politicians in this study consider “socialist” to be a bad thing, and both Sanders and AOC embrace it. “Communist”, however, is universally avoided like the plague in American political discourse. While the term was used relatively infrequently in general (in only about 0.8 percent of the articles, compared to “socialist”, which appeared in 3.5 percent of the articles), the combination of candidates mentioned was a strong predictor of whether or not it was included in an article. As it turned out, if an article contained a reference to AOC, Pete Buttigieg and Elizabeth Warren, it was 15 times more likely to contain the term “communist”. If an article contained AOC, Buttigieg and Sanders, it was 13 times more likely to contain the term “communist”.
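One way to put these multipliers in perspective is to combine them with the base rates quoted above. If the multipliers are read as lift (an assumption; the article does not name the statistic), then multiplying a multiplier by the term’s overall share gives the implied share of articles with that candidate combination that also contain the term. A small Python sketch of the arithmetic, with a hypothetical `implied_confidence` helper:

```python
def implied_confidence(lift, base_rate):
    """P(term | candidates) implied by lift = P(term | candidates) / P(term)."""
    return lift * base_rate

# Base rates quoted in the article: share of all articles containing each term.
base_rate = {"socialist": 0.035, "communist": 0.008}

# (candidate combination, term, reported multiplier), read here as lift (an assumption).
reported = [
    ("Biden + Castro + Klobuchar", "socialist", 6.8),
    ("AOC + Buttigieg + Warren", "communist", 15.0),
    ("AOC + Buttigieg + Sanders", "communist", 13.0),
]

implied = {
    (combo, term): implied_confidence(lift, base_rate[term])
    for combo, term, lift in reported
}
for (combo, term), p in implied.items():
    print(f"{combo} -> {term}: about {p:.1%} of such articles")
```

On that reading, roughly 24 percent of articles naming Biden, Castro and Klobuchar would contain “socialist”, and roughly 10 to 12 percent of articles naming the AOC combinations would contain “communist”. A large multiplier on a rare term can therefore still describe a minority of the articles involved.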
Lessons in media trends
What does it mean for Biden that being mentioned alongside Castro and Klobuchar meant the article was 6.8 times more likely to contain the term “socialist”? I’ll leave it to Biden’s media team to figure that one out. But this kind of information ought to raise eyebrows with any media strategist interested in presenting their client in the most favorable light possible. We’re certainly only scratching the surface of the potential of machine learning, but for any PR or communications professional interested in understanding not just what’s being covered, but why it’s being covered--and I suspect that’s just about all of us--machine learning is the new frontier.
About the Author: Michael Burke has worked with some of the world’s top brands on marketing and PR strategy, including The Myers-Briggs Company and Airbnb, as well as dozens of cutting-edge technology clients. As a director and data scientist at MSR Communications, he’s living his dream of applying data science to MarComm.