Category: Data analysis

Democratic pollster: Hillary campaign’s data malpractice

Hillary Clinton’s data analysis failed her — even with the help of Barack Obama’s 2008 data cruncher. The problem, says a Democratic pollster, wasn’t in how they crunched the data. The problem was the data they ignored — a failure whose consequences are rarely so visible in business.

Democratic pollster and strategist Stanley Greenberg explained what went wrong in a blog post a few days ago:

… When campaign developments overtake the model’s assumptions, you get surprised by the voters — and this happened repeatedly. … Astonishingly, the 2016 Clinton campaign conducted no state polls in the final three weeks of the general election and relied primarily on data analytics to project turnout and the state vote. They paid little attention to qualitative focus groups or feedback from the field, and their brief daily analytics poll didn’t measure which candidate was defining the election or getting people engaged.

Some on the team were worried, among them campaign chair John Podesta, who wanted to fire the data guy, Robby Mook. But Clinton refused, recalling his work for Obama.

The trouble was that Hillary lacked Obama’s star power, something she probably understood but dismissed. Listen to her post-election interviews and you’ll hear her miss the point about people again and again. That star power created a soft but crucial margin, one that put Obama over the top and left Clinton losing to a candidate no one should have lost to.

Without the Obama zing, Mook was riding bareback. The data analysis itself had to be dead on, but it wasn’t — built on bad assumptions that went uncorrected, shielded by what sounds like a smug disregard for everything that fell outside the model.

Data: it’s just notation, not reality

The always fascinating Donald Farmer, former Qlik exec and now Treehive Strategy principal, has news for users of business data: “Data isn’t the real world.” It’s just a reflection, framed by the stories we tell ourselves.

Stories come first, contrary to the data industry’s dubious vision. Data, the marketing likes to imply, is a divine compass delivered by virgin birth. Just get some and you’ll know the way.

There was no virgin birth for data but, as Donald Farmer illustrates, there is jazz. In his presentation to two dozen industry experts at this year’s annual Pacific Northwest BI and Analytics Summit, he proposed an experiment: force a John Coltrane song into musical notation, then give the score to 10 jazz musicians. They’ll produce 10 different songs — and not one will be Coltrane’s.

“People say we’re recording [business events],” said Donald, “but we’re not. We’re notating it. It’s a representation. Sampling is more like it.” He’s not the first one to say such things, but it takes someone of Donald’s authority to win much notice.

Data needs interpretation, and that’s always based on assumptions. “When we say we ‘lost an opportunity,’” he said, “that’s just a story we tell ourselves.” Salespeople often come back from meetings gloomy about lost sales. “They say, ‘I’m going to miss my quarterly target, or my girlfriend will leave me because I couldn’t give her the vacation I promised.’ We think that’s the real world.”

The “lost” sale may not be lost for long, such as when the prospect comes back in six months after the competitor failed to deliver. The salesperson may also cultivate a trusted-advisor role and win in the long run. And the girlfriend leaving just because she couldn’t go to Cancun, well, maybe that’s a good thing.

Even in the IoT (Internet of Things), what’s assumed to be pure data, hot off the sensor, was configured according to someone’s beliefs. What looks like “just a binary signal” is limited, for example, to a given spectrum.

What’s a business person to do? Farmer suggests that data users “walk back down the ladder” and inspect any unconsciously adopted limits. There, on the lower rungs of the mind, you might find unfounded assumptions, stories, and alternate premises.

Donald’s observations stirred up concerns, of course. Suzanne Hoffman, a veteran BI software executive now with ZenOptics, asked about the effect of too many individual interpretations. “That’s chaos,” she said. “You can’t have that.” Donald replied that such variety is simply competitive advantage: “Businesses do things in different ways,” he said. Suzanne: “Isn’t the goal of methodology to accept thinking ‘outside the box’?” Donald: “Methodology can get in the way of doing that.”

Merv Adrian, vice president of research at Gartner, said, “It’s the difference between implicit and explicit. … We live every day in the implicit set of choices and the ideology that represents. … If we can deconstruct how we got here, we might make different choices.”

Ideology is embedded even in the design of analysis tools. Tableau makes certain things easy for the users it assumes it has: skilled analysts (at least according to Qlik dogma). Qlik assumes it serves different users, everyday business people. Those users, less skilled in analytics, are spared statistics-laden trend lines, Donald explains — though he hasn’t yet said what Qlik offers instead.

Donald’s forthcoming book will go into far more depth on the subject in its first half; the second half will address handling ambiguity. He expects it to be out in the second quarter of 2018.

Six surprises on my way from data to analysis

Originally published on November 29, 2011 in BI This Week, a TDWI publication.

What do you do when you have a bunch of data on your hands and your skills fail to make sense of it? That’s where I was the other day.

As much as I’ve written about data analysts, I’m not one myself. When, in late September, I finally closed my survey of analysts, I needed a pro to help interpret my new data. I was like the amateur cook who could flip eggs and found a side of beef in his kitchen. That was one meal I wasn’t going to cook alone.

It’s not easy to find the right guide. What skilled analyst has the time? Who cares about this kind of data? Who has the patience to answer my beginner’s questions? It’s a problem, I suspect, that many business people face.

I was lucky. I knew someone: He’s the expert’s expert in the Tableau Software community, the one I’m most familiar with. He’s renowned among fellow analysts for his expertise — and his generosity. He’s on the Tableau forum every day answering questions from users of all levels.

His name is Joe Mako and he is mostly self-taught. Soon after two Army tours in Iraq, he went to work managing documents at a large Internet service provider. His first taste of data analysis came when his boss asked him to figure out whether certain customers were getting a free ride — and with that, Joe found a career and a passion.

My own turn at data analysis was serendipitous, and it was full of surprises.

Surprise #1. Even as bare numbers, data’s thrilling if it’s yours. It must be like coming across cave paintings: someone left them there, full of meaning waiting to be revealed.

Surprise #2. Data analysis is harder than marketing makes it look. It looks so easy when the experts fling dimensions and measures around, dragging them on and off of shelves to make bar charts, scatterplots, and heat maps appear and reshape and altogether dazzle. (TV cooks make their acts look easy, too.)

Surprise #3. Data analysis doesn’t begin with analysis. It begins with data. I told Joe that I wanted to see the whole process from the start, so he first walked me through “the boring part”: data preparation. My export of the data from the service provider, SurveyMonkey, was far from Tableau-ready. Multiple-choice questions posed special problems with their double row of headers.
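For the curious, that double-header reshaping has a rough code equivalent. Here is a minimal sketch in Python’s pandas, assuming a classic SurveyMonkey CSV in which the first header row carries the question text (blank under the continuation columns of a multiple-choice question) and the second carries the answer option. The file name and the respondent-ID column are my inventions, not the survey’s actual layout.

```python
import pandas as pd

# A minimal sketch of the prep step, not Joe's actual process. The file
# name and the assumption that the first column holds the respondent ID
# are invented for illustration.
raw = pd.read_csv("survey_export.csv", header=[0, 1])

# Forward-fill the question text across the blank continuation columns,
# then flatten the two header rows into single labels.
cols, last_q = [], None
for question, option in raw.columns:
    if not str(question).startswith("Unnamed"):
        last_q = question
    cols.append(f"{last_q} :: {option}")
cols[0] = "respondent_id"  # assumed: first column identifies the respondent
raw.columns = cols

# Melt to one row per respondent and question option: the tall shape
# Tableau handles most gracefully.
tall = raw.melt(id_vars=["respondent_id"], var_name="question_option",
                value_name="response")
tall[["question", "option"]] = tall["question_option"].str.split(
    " :: ", expand=True)
```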

Surprise #4. Data analysts have their own style. Other analysts, Joe acknowledged, would not go to the same lengths of data preparation. In fact, he suggested, some are careless.

Joe, for example, takes measures I think other analysts would skip. Instead of going from Excel straight to Tableau, he took the data through a third tool, Lyza. There he reshaped the data to show more plainly who had not responded to every question.

Once the data was imported into Tableau, Joe checked it once more. “I’m confirming that I have every possible combination of response ID and question ID available,” he explained, deliberately and calmly. “That’s how I get my high degree of flexibility with Tableau.”
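In rough code form, Joe’s check amounts to building the full grid of respondents crossed with questions and padding any gaps explicitly. A minimal pandas sketch, with assumed file and column names:

```python
import pandas as pd

# A sketch of the completeness check, with invented names. Assumes one
# row per respondent/question pair in the tall table built earlier.
tall = pd.read_csv("survey_tall.csv")  # respondent_id, question, response

# Build the full grid: every respondent crossed with every question.
full_grid = pd.MultiIndex.from_product(
    [tall["respondent_id"].unique(), tall["question"].unique()],
    names=["respondent_id", "question"],
)

# Reindex against the grid so missing pairs appear as explicit rows
# instead of silent absences, then label them.
padded = (
    tall.set_index(["respondent_id", "question"])
        .reindex(full_grid)
        .fillna({"response": "(no response)"})
        .reset_index()
)

# Respondents who skipped questions now show up rather than vanish.
print(padded[padded["response"] == "(no response)"]
      .groupby("respondent_id").size())
```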

Surprise #5. The analyst’s personal interest matters. Within the first few minutes, I came to appreciate Joe’s passionate interest in this data.

One question in my survey asked about the relative importance of statistics. I wanted to know whether beginners would rate its importance lower than experienced analysts did. Some marketing seems to suggest that statistics is optional. In the results, however, only a few rated it unimportant.

Still, Joe didn’t like to see even those few. He didn’t say anything for a few seconds after the stacked bar chart appeared. His cursor ceased the usual circular movement that preceded leaps into new views. “That throws me for a loop,” he said. “It just seems strange to me that someone would say that statistics is unimportant.”

“Why don’t we compare these guys with someone else,” I said, “or see what skills they do value?”

“Let’s see,” he said, enunciating with delicious anticipation. These respondents would now either save themselves or prove themselves idiots. His cursor circled busily again, and he flipped through a new succession of views and dialog boxes.
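In tabular terms, the comparison we were after looks something like the following sketch, written in pandas with invented column names (a “statistics” rating alongside a few other skills), not my survey’s actual fields:

```python
import pandas as pd

# A rough tabular equivalent of the comparison, with invented names.
# Assume one row per respondent, skills rated on a 1-5 importance scale.
resp = pd.read_csv("responses.csv")

# Split respondents by how they rated statistics.
rated_low = resp[resp["statistics"] <= 2]
everyone_else = resp[resp["statistics"] > 2]

# Compare the average importance each group assigned the other skills.
skills = ["data_preparation", "visualization", "sql", "domain_knowledge"]
comparison = pd.DataFrame({
    "rated_stats_low": rated_low[skills].mean(),
    "everyone_else": everyone_else[skills].mean(),
})
print(comparison.round(2))
```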

Surprise #6. Some data’s a dud. We found little about respondents who judged statistics of low importance. We couldn’t explain it except to assume that they had understood the question differently from others. I had written a faulty question.

The next questionnaire will be better. What would surprise me, though, is finding an expert in questionnaire design as generous with his time as Joe is with his.

Alteryx and Tableau, yin and yang

Many of those who watch the amazing acrobatics in Tableau or another visualization tool discover there’s another, less glamorous spectacle that comes first: setting up the data — cleaning it, fixing incomplete sets, combining sources, and so on. Only when that’s done can the real analysis commence.

Many advanced users still prefer a specialized tool for prep. Strong contenders have appeared for graduates of Excel, including Paxata and Trifacta. But it was Alteryx that I finally got a look at not long ago.

I was curious about Alteryx. It seems to have become the go-to tool among advanced users of Tableau, which at first seemed odd. To my novice eye, Paxata and Trifacta look more like Tableau, so why wouldn’t users flock to them instead? Based on what I’ve heard from one Tableau super-user I know, Joe Mako, the answer must lie in the contrast: Alteryx and Tableau simply approach the work differently.

Before I talked to Joe, I followed Alteryx chief scientist Dan Putler as he walked through a data analysis challenge. The question: Which of the many alumni who have never donated any money are most likely to give some if asked in a new donation drive? A certain public university’s development office would solicit the most likely ones by phone, and its staff could make no more than about 10,000 calls per year. Who should they call first? Data on each alumnus included age, area of study, social or familial associations with the university, and a handful of other facts. The answer: the strongest predictor was the alumnus’s area of study. Putler showed it in a line chart comparing the results of three competing models with past experience.
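To give a feel for the shape of the problem, though emphatically not Putler’s actual models, here is a minimal sketch in Python: train on alumni whose outcomes from past drives are known, score the never-solicited, and keep the 10,000 the phone bank can reach. The file and column names are invented for illustration.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Not Putler's models -- an illustrative sketch. File and column names
# ("alumni.csv", "age", "area_of_study", "family_ties", "donated",
# "solicited_before") are invented.
alumni = pd.read_csv("alumni.csv")
history = alumni[alumni["solicited_before"] == 1]    # outcomes known
prospects = alumni[alumni["solicited_before"] == 0]  # never asked

features = ["area_of_study", "age", "family_ties"]
model = make_pipeline(
    make_column_transformer(
        (OneHotEncoder(handle_unknown="ignore"), ["area_of_study"]),
        (StandardScaler(), ["age", "family_ties"]),
    ),
    LogisticRegression(max_iter=1000),
)
model.fit(history[features], history["donated"])

# Score the never-solicited alumni and keep the 10,000 the call center
# can actually reach, highest predicted probability first.
scores = model.predict_proba(prospects[features])[:, 1]
call_list = prospects.assign(p_donate=scores).nlargest(10_000, "p_donate")
```

The models Putler compared were more sophisticated, but the ranking step at the end is the part that answers “who should they call first?”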

It was an interesting problem, but getting to that answer was most of the fun.

Putler describes Alteryx as a “visual programming framework,” one often compared to Visio, the Microsoft diagramming application. Like Tableau’s, its drag-and-drop interface is friendly to beginners. Unlike Tableau, it also offers coding for advanced users. A wide range of ready-made tools lines an upper row of the workspace, and handy documentation offers explanations and samples, effective even for those whose ghost-like memory of a statistics class from years ago lingers on the edge of awareness.

Also unlike in Tableau, all the details are right out in the open. You see exactly what happens to the data as you do what you will with it.

Putler’s models grow into a diagram that shows exactly what will happen to the data at every step. This is the part of data analysis that resembles a detective story, where you sift facts to find the most cogent narrative.

That transparency and control is why Joe Mako prefers Alteryx. Joe is the almost legendary Tableau super-user and volunteer guide. For more than five years, he’s spent hours every week helping even some of the most advanced users figure out how to make Tableau do what they want it to do.

“It’s like having a kitchen,” he says, “and Tableau and Alteryx are your appliances. You can cook without them, but your life in the kitchen is so much better if you have them.” Even better if you use them the way they work best.

“Tableau and Alteryx are wonderful together,” he says. “To me, they’re like yin and yang. They’re both wonderful, but they’re both challenging.”

Tableau’s goal is to make its complexity invisible, which is both good and bad. It powerfully enables the cycle of visual analysis, the uninterrupted flow of questions, insights, and more questions. He says, “No one else comes close.”

But the magic has a flip side. Though the unseen magician almost always guesses correctly what you want to do, it can be fooled by unusually complex work — such as the kind Joe often does. That’s when the magic fails. The computation logic could change without the user’s knowledge or control. A feature known as data densification seems to be one of Joe’s main culprits. (See his video on data densification.)

If Tableau employs an unseen magician, you might say that Alteryx employs a fully visible carpenter. What it does, it does in full view, consistently. Joe says, “It’s the tool you turn to once you need to work with millions of records and do hard computing and make it a repeatable, self-documenting process instead of a manual, brute-force, hand-jammed thing. It lets me codify a thought process.”

Complex computations can be done without writing code, but you can write code if you want to. A user with sketchy knowledge of some complex predictive logic can open a module and find a use case and a description of what goes in and what comes out the other end.

Even the icons use consistent colors and shapes to signify what each one does; in a series, they reveal a chain of functions at a glance. Joe says, referring to a process he’s constructed, “I see here that I’m pulling the data in, joining it, and I split it out into some groups, and then I put it all back together one piece at a time, precisely how I want it to be pulled back together, and then output it. I get exact control.”
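For a sense of what “codifying a thought process” means, here is that same chain rendered as a minimal pandas sketch, with invented file and column names: pull the data in, join it, split it into groups, rebuild it piece by piece, and write it out.

```python
import pandas as pd

# An illustrative pipeline in the spirit of Joe's chain, not his actual
# process. File and column names are invented.
orders = pd.read_csv("orders.csv")                # pull the data in
customers = pd.read_csv("customers.csv")
joined = orders.merge(customers, on="customer_id", how="left")  # join it

# Split it out into groups and treat each one precisely as intended.
pieces = []
for region, group in joined.groupby("region"):
    group = group.sort_values("order_date")
    group["running_total"] = group["amount"].cumsum()
    pieces.append(group)

# Put it back together one piece at a time, then output it.
result = pd.concat(pieces, ignore_index=True)
result.to_csv("orders_with_running_totals.csv", index=False)
```

Run it again next quarter and it does exactly the same thing, which is the repeatability Joe prizes.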

The main downside, he says, is weak interactive visualization. He believes Alteryx simply wasn’t designed for it.

You pick the tool for the job: Tableau to do the visual analysis, and Alteryx to do the tech-side stuff. Joe says, “Together, they’re nirvana.”

My latest in Information Management: “‘Sexy’ Data Science is a Team Sport”

The word got out last year: data scientist is the “sexiest job,” per a late-2012 declaration by the renowned Tom Davenport of “Competing on Analytics” fame. Trouble is, “sexy” goes bad faster than fish.

“Data scientist,” still fresh, is my word of the year. In 2013, the data analysis industry discovered it; many loved it or hated it, but most of all, we repeated it. Google Trends shows mentions of it soaring like the 1990s Dow Jones Industrial Average — and you know what happens next.

Alert as data scientists are to patterns, I wonder if many don’t shudder at the “sexy” label. If so, they might have taken some comfort from a discussion around the big table at the Pacific Northwest BI Summit, where calm conversation displaced the industry’s noise on the topic for nearly two hours last summer.

Read the rest of it here, on the Information Management site.