There’s always a food angle, even in text analytics

Text analytics was one of those things I heard about every so often. Like so many terms in this business, the term comes out of a speaker’s mouth or PR person’s press release only to blow away. There’s no story, no context, nothing to chew on.

Then came a press release at BI This Week with a rare combination: surprise and concreteness. It said text analytics would help with food safety. I’m all for food, but I had no idea what text analytics had to do with it.

I emailed UK-based Linguamatics, publisher of the nifty tool they call I2E. What’s this I hear about food? Product manager Phil Hastings, ready to call it a day in Croatia, called to explain the features to me while I was barely post-breakfast and not fully verbal. I2E was indeed a powerful little thing, but I still didn’t get the food angle.

It wasn’t until I got William Hayes on the phone that things started making sense. He’s director of library and literature informatics at pharmaceutical research company Biogen Idec. They don’t do food, but close enough.

If you think the Sunday New York Times is enough for one day, consider what the research community has to bear. Hayes says, “If you’ve got 20 million articles to read, where do you start?”

“The research industry works under a tougher knowledge model than terrorist intelligence gathering,” says Hayes. “Our ability to tap that ocean of literature is like dropping a line into the ocean for fish.”

In general, a scientist can read 150 to 200 full-text journal articles a year, he explains. A curator can review about 100 abstracts a day “for a few days before you start going nuts.” Text mining is the only way to keep up with the ocean of literature produced each year.

The food industry fries potatoes, but it also has to keep an eye on research.

TNO information analyst Fred van de Brug told me the acrylamide story: Most people in the food industry missed the first warning. Scientists had published a discovery in 2000 about a possible carcinogen known as acrylamide, which can develop in starch-rich foods like potatoes as they are fried. When the warning finally hit the public media in 2002, millions of people became frightened, perhaps unnecessarily. Text mining would have given food processors time to head off a crisis.

I2E is more agile than standard text mining. You can learn to use it in a few hours. Hayes told me, “If you can remember bits of grammar and have some concept of what you’re researching, it’s a piece of cake.”

It’s a story in progress for BI This Week.
