You know the feeling, waking up achy with a fever and an unhappy stomach. It’s a day home from the office watching reruns and slowly working up to dry toast #flu kind of feeling. Since misery loves company, many of us send our flu sadness into the universe via Twitter. In fact, the social media site’s becoming increasingly reliable as a surveillance tool for flu-like symptoms. The way it works now, tracking flu cases from tally to report takes a week or two. That’s hardly timely in a world that moves at the speed of Internet. Researchers at San Diego University got to thinking about this—and decided to study Twitter’s potential for keeping an eye on flu-like symptoms in a city in real time.
The researchers chose 11 U.S. cities, gathering tweets containing the keyword “flu” within a specific radius of each city. New York and Chicago were obvious choices. The others included Boston, Cleveland, Columbus, Denver, Detroit, Fort Worth, Nashville-Davidson, and San Diego. These cities have two characteristics that made them great “research hubs”: large populations and city-level flu stats.
Using a social media search tool that pinpoints tweet locations (from user-enabled GPS or the user’s listed hometown), the researchers gathered nearly 160,000 tweets containing the word “flu.” They collected tweets from late September 2013 through March 2014.
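To make the "within a specific radius" idea concrete, here is a minimal sketch of keeping only tweets that mention "flu" and fall inside a circle around a city center. It is not the researchers' search tool. The 25 km radius, the function names, and the sample tweets are all made up for illustration, and the great-circle (haversine) distance is just one reasonable way to measure "near."

```python
import re
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def flu_tweets_near(tweets, center, radius_km):
    """Keep tweets that contain the word 'flu' and lie within radius_km of center."""
    lat0, lon0 = center
    return [
        t for t in tweets
        if re.search(r"\bflu\b", t["text"], re.IGNORECASE)
        and haversine_km(lat0, lon0, t["lat"], t["lon"]) <= radius_km
    ]


NEW_YORK = (40.7128, -74.0060)

# Hypothetical tweets, not data from the study.
sample_tweets = [
    {"text": "Home sick with the #flu", "lat": 40.7580, "lon": -73.9855},
    {"text": "Flu season is rough", "lat": 42.3601, "lon": -71.0589},  # Boston
    {"text": "Great pizza tonight", "lat": 40.7300, "lon": -74.0000},
]

# radius_km=25 is an assumed value; the article does not give the radius used.
nearby = flu_tweets_near(sample_tweets, NEW_YORK, radius_km=25)
print([t["text"] for t in nearby])
```

The word-boundary pattern is a deliberate choice here: it matches "flu" and "#flu" but skips "influenza" and "fluent."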
The main goal was to make sense of the tweets and measure how accurately Twitter reflected official surveillance reports and lab-confirmed cases of the flu.
Not every flu tweet among the nearly 160,000 represented a “valid” case of the flu. Researchers tested filtering and classifying methods to narrow down which tweets came from sick people and which were just “noise.”
They filtered tweets by asking two questions: Did the tweet have a link? Had it been retweeted? Then they analyzed which types of tweets correlated better with each city’s flu data.
The tweets that best reflected the flu situation were ones that hadn’t been retweeted and those without a link. The findings make sense. A flu-related tweet with a URL probably links to a health article—and I can’t think of a good reason to retweet someone’s “I’m sick” tweet.
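For the curious, the two-question filter and the comparison against city data could look something like the sketch below. Everything in it is an assumption for illustration: the field names, the weekly grouping, the made-up numbers, and the choice of a Pearson correlation (the article only says the tweets "correlated" with official data).

```python
from math import sqrt


def is_candidate(tweet):
    """Keep a tweet only if it has no link and has not been retweeted."""
    return not tweet["has_link"] and not tweet["retweeted"]


def weekly_counts(tweets, n_weeks):
    """Count the kept tweets for each week index (0-based)."""
    counts = [0] * n_weeks
    for tweet in tweets:
        if is_candidate(tweet):
            counts[tweet["week"]] += 1
    return counts


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical tweets for one city over three weeks.
tweets = [
    {"week": 0, "has_link": False, "retweeted": False},
    {"week": 0, "has_link": True, "retweeted": False},   # dropped: has a link
    {"week": 1, "has_link": False, "retweeted": False},
    {"week": 1, "has_link": False, "retweeted": False},
    {"week": 1, "has_link": False, "retweeted": True},   # dropped: retweeted
    {"week": 2, "has_link": False, "retweeted": False},
    {"week": 2, "has_link": False, "retweeted": False},
    {"week": 2, "has_link": False, "retweeted": False},
]

# Hypothetical official weekly case counts for the same city.
official_cases = [2, 4, 6]

tweet_counts = weekly_counts(tweets, n_weeks=3)
correlation = pearson(tweet_counts, official_cases)
print(tweet_counts, correlation)
```

A correlation near 1 would mean the filtered tweet counts rise and fall with the official numbers. Real data would be far noisier than this toy example.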
Filtering by type of tweet is only so helpful, though. The actual content matters. Researchers tested a machine-learning classifier to evaluate how well it could read tweets and judge whether each one was valid. Since the machine “learns,” the researchers started by training it. They took 1,500 tweets that had been manually sorted and scored each tweet based on how statistically significant each of its terms was. To be valid, a tweet needed to reach a minimum score.
Then the researchers fed the machine 1,000 different tweets, also sorted by hand. Call it a pop quiz for the machine. Researchers were curious if it would sort the tweets the same way they had.
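Here is a rough sketch of the "score each term, then require a minimum score" idea. It is not the classifier from the study. As a stand-in for "statistical significance," it weights each term by a smoothed log-ratio of how often the term shows up in valid versus invalid training tweets. The training tweets, the threshold of 0.0, and all function names are hypothetical.

```python
from collections import Counter
from math import log


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train_term_weights(labeled_tweets):
    """Weight each term by the add-one-smoothed log-ratio of its rate in
    valid tweets versus invalid tweets."""
    valid_counts, invalid_counts = Counter(), Counter()
    for text, is_valid in labeled_tweets:
        (valid_counts if is_valid else invalid_counts).update(tokenize(text))
    vocab = set(valid_counts) | set(invalid_counts)
    valid_total = sum(valid_counts.values()) + len(vocab)
    invalid_total = sum(invalid_counts.values()) + len(vocab)
    return {
        term: log((valid_counts[term] + 1) / valid_total)
        - log((invalid_counts[term] + 1) / invalid_total)
        for term in vocab
    }


def score(text, weights):
    """Sum the weights of the known terms in a tweet; unknown terms add 0."""
    return sum(weights.get(term, 0.0) for term in tokenize(text))


def classify(text, weights, threshold=0.0):
    """Label a tweet valid when its score reaches the minimum score."""
    return score(text, weights) >= threshold


# Hypothetical hand-sorted training tweets: (text, is_valid).
training = [
    ("home sick with the flu", True),
    ("i hate being sick with the flu", True),
    ("getting my flu shot today", False),
    ("flu shot clinic opens today", False),
]

weights = train_term_weights(training)
print(classify("stuck home sick with flu", weights))
print(classify("free flu shot", weights))
```

Notice that "flu" itself ends up with a weight near zero, since it appears in every tweet. Words like "sick" and "shot" do the real work of separating the two piles.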
Here are a few examples of tweets the classifier labeled valid:
- “I hate being sick with the flu”
- “Not a good time to be hit by a flu”
- “Been home sick with the flu the last 2 days.”
The classifier labeled “getting flu shot” tweets and those with health article links as invalid. But it labeled this one invalid, too: “Now it’s my turn to have the stomach flu. Ugh.” The person tweeting that was probably actually sick—or the last one standing in a flu-stricken family.
Oddly enough—and I had to read the results section of the paper twice to get this—the machine did really well making the call about tweets the researchers had labeled valid. The classifier’s precision fell a little, though, when it had to think about tweets researchers had put in the invalid pile.
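If you want to see how a "pop quiz" like this gets graded, the sketch below compares hand labels with machine labels for each pile. The article does not spell out exactly which metric the paper reports, so this computes both precision and recall per class. The five labels are made up, and the function name is hypothetical.

```python
def per_class_metrics(hand_labels, predicted_labels, target):
    """Return (precision, recall) for one class, given hand and machine labels."""
    pairs = list(zip(hand_labels, predicted_labels))
    true_pos = sum(1 for h, p in pairs if h == target and p == target)
    predicted_target = sum(1 for _, p in pairs if p == target)
    actual_target = sum(1 for h, _ in pairs if h == target)
    precision = true_pos / predicted_target if predicted_target else 0.0
    recall = true_pos / actual_target if actual_target else 0.0
    return precision, recall


# Hypothetical quiz results: True means "valid", False means "invalid".
hand_labels = [True, True, True, False, False]
machine_labels = [True, True, False, False, False]  # one valid tweet missed

valid_precision, valid_recall = per_class_metrics(hand_labels, machine_labels, True)
invalid_precision, invalid_recall = per_class_metrics(hand_labels, machine_labels, False)
print(valid_precision, valid_recall, invalid_precision, invalid_recall)
```

In this toy case, a single truly valid tweet sent to the invalid pile (like the stomach flu example) is enough to pull down precision for the invalid class while leaving precision for the valid class untouched.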
While not perfect, the classifier worked better than past classifying methods. The tweets it labeled “valid” mostly reflected the flu trends in the cities, correlating with official city-level reports. No one’s turning to Twitter to analyze the number of flu cases just yet, but it did prove more reliable than in the previous flu season.
If Twitter achieves full reliability, it will probably be a supplementary surveillance tool at best, a way to keep an eye on the flu situation in real time. Regular people don’t tweet about coming down with influenza. We call it the flu*. The CDC has a set definition of “influenza-like illnesses” for reporting purposes (a fever of 100°F or higher, coughing, possible sore throat, and no “KNOWN” cause apart from influenza). We don’t.
But Twitter’s limitation could also be its strength one day. With real people giving updates in real time, Twitter could be a timely way to track flu cases—and keep an eye out for pandemics.
*In a previous study, results revealed, rather unsurprisingly, that more people tweeted about having the “flu” than having “influenza.”