Tuesday, May 21, 2013

Even in an article about Ngrams, the NYT fails to care enough about language to fact-check

Anyone who knows me knows how much I love Google Ngrams. So I was interested to see that David Brooks had written a column based on it.

The beauty of Google Ngrams is that it makes it easy even for journalists to check facts about language. But that doesn't mean they bother to do the work. Alas, the column, which promised to string together a series of Ngram observations, in fact got its facts wrong.

For example, Brooks claims that:
That is to say, over those 48 years [between 1960 and 2008], words and phrases like “personalized,” “self,” “standout,” “unique,” “I come first” and “I can do it myself” were used more frequently. Communal words and phrases like “community,” “collective,” “tribe,” “share,” “united,” “band together” and “common good” receded.
For the time being, let's set aside the question of whether the use of the word "tribe" in published books really reflects communal feeling, and just check some facts. I began with "common good" and "band together," and saw that they do not both "recede" as he says. "Common good" did drop over the middle of the 20th century, but it has been rising over the last twenty years; "band together" has risen steadily over the last two centuries.

Next I tried the phrase "I come first." It seemed only fair to put it in context by searching a family of such phrases, so I included phrases such as "family comes first," "faith comes first," and so on.

It turns out that "I come first" and "God comes first" track each other nearly perfectly, both rising and falling over the last one hundred years without the clear overall trend he describes. The only clear trend that emerges from the set is a steady rise in the phrase "family comes first."

I decided to test his general theory that we've seen a rise in individualism and a decline in collective thinking using the most basic words I could think of, so I searched for "myself" and "together." Alas, the resulting chart offers nothing to support Mr. Brooks' argument.

To be clear, even if the trends he describes were real, they wouldn't necessarily tell us much. Few of the phrases in his lists are genuinely comparable. Comparing trends for "economic justice" and "prudence" (two terms that actually do move in the ways he describes) doesn't tell you much, since a speaker would virtually never be in a context where they might choose one over the other.

But even if his analysis were valid, his data are not.

It's hard to believe the NYT would be this lazy about fact-checking anything else. Can you imagine the paper allowing a sports column in which the win-loss records were all made up on the spot? Or a politics column that got the number of Republicans in the Senate wrong? With language, you might argue, facts are harder to check, but in this case the article itself describes the tool needed to do the fact-checking, a tool that is free to the public and a mouse click away.
