There’s always a food angle, even in text analytics

Text analytics was one of those things I heard about every so often. Like so many terms in this business, it comes out of a speaker’s mouth or a PR person’s press release, only to blow away. There’s no story, no context, nothing to chew on.

Then came a press release at BI This Week with a rare combination: surprise and concreteness. It said text analytics would help with food safety. I’m all for food, but I had no idea what text analytics had to do with it.

I emailed UK-based Linguamatics, publisher of the nifty tool they call I2E. What’s this I hear about food? Product manager Phil Hastings, ready to call it a day in Croatia, called to explain the features to me while I was barely post-breakfast and not fully verbal. I2E was indeed a powerful little thing, but I still didn’t get the food angle.

It wasn’t until I got William Hayes on the phone that things started making sense. He’s director of library and literature informatics at pharmaceutical research company Biogen Idec. They don’t do food, but close enough.

If you think the Sunday New York Times is enough for one day, consider what the research community has to bear. Hayes says, “If you’ve got 20 million articles to read, where do you start?”

“The research industry works under a tougher knowledge model than terrorist intelligence gathering,” says Hayes. “Our ability to tap that ocean of literature is like dropping a line into the ocean for fish.”

In general, a scientist can read 150 to 200 full-text journal articles a year, he explains. A curator can review about 100 abstracts a day “for a few days before you start going nuts.” Text mining is the only way to keep up with the ocean of literature produced each year.

The food industry fries potatoes, but it also has to keep an eye on research.

TNO information analyst Fred van de Brug told me the acrylamide story: most people in the food industry missed the first warning. Scientists had published a discovery in 2000 about a possible carcinogen known as acrylamide, which can develop in starch-rich foods like potatoes as they are fried. When the warning finally hit the public media in 2002, millions of people became frightened, perhaps unnecessarily. Text mining would have given food processors time to head off a crisis.

I2E is more agile than standard text mining. You can learn to use it in a few hours. Hayes told me, “If you can remember bits of grammar and have some concept of what you’re researching, it’s a piece of cake.”

It’s a story in progress for BI This Week.

