Month: September 2016

Six surprises on my way from data to analysis

Originally published on November 29, 2011 in BI This Week, a TDWI publication.

What do you do when you have a bunch of data on your hands and your skills fail to make sense of it? That’s where I was the other day.

As much as I’ve written about data analysts, I’m not one myself. When, in late September, I finally closed my survey of analysts, I needed a pro to help interpret my new data. I was like the amateur cook who could flip eggs and found a side of beef in his kitchen. That was one meal I wasn’t going to cook alone.

It’s not easy to find the right guide. What skilled analyst has the time? Who cares about this kind of data? Who has the patience to answer my beginner’s questions? It’s a problem I suspect that many business people face.

I was lucky. I knew someone: He’s the expert’s expert in the Tableau Software community, the one I’m most familiar with. He’s renowned among fellow analysts for his expertise and his generosity. He’s on the Tableau forum every day answering questions from users of all levels.

His name is Joe Mako, and he is mostly self-taught. Soon after two Army tours in Iraq, he went to work at a large Internet service provider managing documents. His first taste of data analysis came when his boss asked him to figure out if certain customers were getting a free ride, and with that Joe found a career and a passion.

My transition to data analyst was serendipitous.

Surprise #1. Even as bare numbers, data’s thrilling if it’s yours. It must be like coming across cave paintings: someone left them there, full of meaning waiting to be revealed.

Surprise #2. Data analysis is harder than marketing makes it look. It looks so easy when the experts fling dimensions and measures around, dragging them on and off of shelves to make bar charts, scatterplots, and heat maps appear and reshape and altogether dazzle. (TV cooks make their acts look easy, too.)

Surprise #3. Data analysis doesn’t begin with analysis. It begins with data. I told Joe that I wanted to see the whole process from the start, so he first walked me through “the boring part”: data preparation. My export of the data from the service provider, Survey Monkey, was far from Tableau-ready. Multiple-choice questions posed special problems with their double row of headers.
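For readers curious about what that double row of headers involves, here is a minimal Python sketch. It is not what Joe did; it only illustrates the idea. I am assuming an export in which the first header row holds the question text (blank over continuation columns) and the second holds the answer choices. The function name `flatten_headers` and the sample headers are hypothetical.

```python
def flatten_headers(top, bottom):
    """Merge two header rows into one list of column names.

    Assumed layout: `top` holds question text, left blank over the
    extra columns of a multiple-choice question; `bottom` holds the
    answer choice for each of those columns.
    """
    names = []
    current = ""
    for question, choice in zip(top, bottom):
        if question:  # a new question starts in this column
            current = question
        # Join question and choice, skipping whichever part is empty.
        names.append(" | ".join(part for part in (current, choice) if part))
    return names


# Hypothetical header rows from a survey export.
top = ["Respondent ID", "Which tools do you use?", "", ""]
bottom = ["", "Excel", "Tableau", "Other"]

columns = flatten_headers(top, bottom)
for name in columns:
    print(name)
```

Each multiple-choice column ends up with a single name that carries both the question and the choice, which is the kind of one-row header an analysis tool can read.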

Surprise #4. Data analysts have their own style. Other analysts, Joe acknowledged, would not go to the same lengths of data preparation. In fact, he suggested, some are careless.

Joe takes measures I think other analysts would skip. For example, instead of going from Excel straight to Tableau, he took the data through a third tool, Lyza. There he reshaped the data so that it showed more easily who had not responded to all questions.

Once data was imported into Tableau, Joe checked the data once more. “I’m confirming that I have every possible combination of response ID and question ID available,” he explained deliberately and calmly. “That’s how I get my high degree of flexibility with Tableau.”
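Joe’s check can be pictured with a small Python sketch. This is my illustration under stated assumptions, not his actual workflow, which ran through Lyza and Tableau. The function name `complete_grid` and the sample answers are hypothetical. The idea is to build one row for every pairing of response ID and question ID, so that a skipped question shows up as an explicit blank rather than a missing row.

```python
from itertools import product


def complete_grid(answers, response_ids, question_ids):
    """Return one row per (response ID, question ID) pair.

    `answers` maps (response_id, question_id) to the recorded answer.
    Pairs with no recorded answer get None, so skipped questions
    become visible instead of silently missing.
    """
    return [
        (r, q, answers.get((r, q)))
        for r, q in product(response_ids, question_ids)
    ]


# Hypothetical survey data: respondent 2 skipped question "Q2".
answers = {
    (1, "Q1"): "Beginner",
    (1, "Q2"): "Very important",
    (2, "Q1"): "Expert",
}

grid = complete_grid(answers, response_ids=[1, 2], question_ids=["Q1", "Q2"])
skipped = [(r, q) for r, q, a in grid if a is None]

print(len(grid))  # 4 rows: 2 respondents x 2 questions
print(skipped)    # [(2, 'Q2')]
```

With every combination present, counting nonrespondents or comparing groups no longer depends on which rows happened to exist in the export.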

Surprise #5. The analyst’s personal interest matters. Within the first few minutes, I came to appreciate Joe’s passionate interest in this data.

One question in my survey asked about the relative importance of statistics. I wanted to know whether beginners would rate its importance lower than experienced analysts. Some marketing seems to suggest that it’s optional. In the results, however, only a few rated it unimportant.

Still, Joe didn’t like to see even those few. He didn’t say anything for a few seconds after the stacked bar chart appeared. His cursor ceased the usual circular movement that preceded leaps into new views. “That throws me for a loop,” he said. “It just seems strange to me that someone would say that statistics is unimportant.”

“Why don’t we compare these guys with someone else?” I said, “or see what skills they do value.”

“Let’s see,” he said, enunciating with delicious anticipation. These respondents would now either save themselves or prove themselves idiots. His cursor circled busily again, and he flipped through a new succession of views and dialog boxes.

Surprise #6. Some data’s a dud. We found little about respondents who judged statistics of low importance. We couldn’t explain it except to assume that they had understood the question differently from others. I had written a faulty question.

The next questionnaire will be better. What may surprise me, though, is to find an expert in questionnaire design as generous with his time as Joe is with his data analysis.

How to find a story in data: What a news reporter would do

Originally published on December 15, 2015 in BI This Week, a TDWI publication.

A data analyst raised her hand in a class I taught on data storytelling and asked the question I hadn’t even thought about since journalism school: How do you “see” a story in a jumble of facts?

It’s a novel problem for data analysts, but it’s an old one for journalists. In fact, as confusing as the task seems to analysts, the confusion is a mystery to journalists. Don’t analysts know a story when they see one?

Now, in the grand new confluence, journalists use data and analysts tell stories, and each side shudders at the other’s ham-handed work. Yet, as other once-irreconcilable factions have done and others may do yet, we all might as well get used to it and learn from each other.

What advice do journalists give analysts about seeing a story? I thought I’d find an easy answer with Google, and searches came up with page upon page of advice — just about all of which stayed on the data analysis side of the chasm. Not one looked across to journalism.

I gave an impromptu answer to the data analyst: Take off your analyst’s hat and put on your journalist’s hat. Here are a few approaches to do that.

Focus on the audience. Stop thinking about the data and think about what the audience wants to know: what it means, what’s new, or what’s different.

Think of what story you would tell a member of the audience over coffee. Forget your grand entrance, forget the brass band, forget about your boss staring at you. Just tell the story simply and plainly between sips of coffee. What aspects would you emphasize and what would you leave out? How would you structure it? You might find your story’s germ there.

Is anything significant in your data? Events become newsworthy with timeliness, proximity, novelty, or impact. The reporter covering a house fire, for example, may ponder various angles: a house that burned down within the news medium’s area is more significant than one outside of it. Yesterday’s fire is more significant than last year’s fire. The mayor’s house is more significant than those of most former mayors. On the other hand, George Washington having slept in the house even one night trumps everything.

Look for anomalies. Everyone’s heard of “man bites dog,” the anomaly that explains what becomes news. “When everything goes as you expect — the sun comes up, spring follows winter, the airplane works flawlessly — there’s no story,” writes Stephen Denning in The Leader’s Guide to Storytelling (2011; Jossey-Bass). “Paying attention to apparent anomalies is one of the reasons that we have survived as a species.”

Remember that the data is not necessarily the story. This is the most common discovery I’ve heard from data analysts. A vice president at AT&T once told Fern Halper, now director of TDWI Research for advanced analytics, to just tell him something that is 80 percent correct. Don’t get too down in the weeds. “For him,” she said, “good enough was good enough.”

Max Galka, a cofounder of Revaluate, an apartment-rating service for renters in Manhattan, found that his customers wanted simple data. “You have to focus on the high level,” Max said. At first, he displayed data the way he likes it. “I wouldn’t put much credence in a building’s overall score [if] there wasn’t any detail behind it,” he said. In fact, he could offer deep, rich data on scores of apartments, “but consumers like [simplicity].”

He tried to lure people into logging in to the site to get rich data in elaborate tables and hierarchies, the way he likes it. Few did. “One guy checked 10 or so buildings every week without logging in.”

Deciding what data to show, lose, or summarize has to be guided by audience and medium. What does the audience really want to know? What does it know already? What incomplete stories can you support or question?

Is the anecdote about the guy checking so many buildings without logging in meaningful as data? No, it’s about just one person — but it’s the story that’s remembered and retold. Galka, a data analyst, had taken off his data analyst’s hat and put on his journalist’s hat.