Month: January 2017

Free-the-data movement meets privacy

Back when data was little and simple, self-service analysis advocates started the chant, “Free the data!” IT stood in the way, they said. Fast forward to 2016: “democratized data” has become common, but so has public concern over privacy.

That nettlesome struggle drove what now stands as the data industry’s most important discussion of 2016. Around the conference table at last summer’s Pacific Northwest BI Summit — the annual, invitation-only confab held in Grants Pass, Oregon — two dozen data leaders pondered the issue for almost two hours. They concluded with an idea that broke open industry assumptions.

UK-based consultant Mike Ferguson told of a meeting held in continental Europe. He expected the usual stakeholders, such as representatives from marketing and HR. But alongside them sat two staff members from the corporate counsel’s office. The lawyers made it clear that anything decided there would need their approval.

Their worry? Compliance with the European Union’s impending data privacy law, the General Data Protection Regulation. When it takes effect in 2018, privacy violations — including failure to erase individuals’ online presence on request — could bring fines of up to 4 percent of global revenue. Companies in other regions will likely comply as well, eager to preserve access to the EU market and to EU data. Across the globe, the old democratization-versus-privacy struggle is about to grow some big, sharp teeth.

It poses a dilemma. Everyday business now requires ready access to data. Even compliance with the new privacy regulations requires access, even as those regulations seek to limit it.

At the problem’s root, says Ferguson, is data integration. Multiple platforms and tools have evolved to serve big data’s proliferating, specific workloads. Streaming data, Hadoop, the enterprise data warehouse, NoSQL and others chug away, each one possibly processing another platform’s data. And all that data keeps coming in faster and faster.

Data integration’s too expensive

“What I hear from clients,” said Ferguson, managing director of Manchester-based Intelligent Business Strategies, “is that the cost of data integration is way too high.” Skills are spread across many tools, everything gets reinvented continually, metadata is fractured or lost entirely as it runs through multiple tools, and there’s just too much repetition all around. Data integration among platforms grows more complex all the time.

Self-service data integration is cheap. Many in IT like it. But, says Ferguson, it quickly results in “a kind of Wild West.” Data moves uncontrollably, with no one guarding the sources. Users apply countless tools for data prep, ETL, data integration, and other functions, and silos proliferate.

“There’s got to be a better way,” said Ferguson. He suggests supplanting the “data lake” with what he calls a “data reservoir”: a governed collection of raw, in-progress and trusted data that incorporates multiple stores and streams. The “reservoir” would define data once to run anywhere and supply information fast.

“The smart thing is to offer virtual views, Amazon-like,” said Ferguson. Instead of copying the data, it would be offered in virtualized form, ready to use but not copied. Data’s “Wild West” would be tamed with riding stables: Ride a trusted horse on a known trail.

Local policies could be applied as the data is dispensed. Users with proper rights would see the data. Those without rights would be told, “Sorry, Dude, you can’t see it. Wrong jurisdiction!”
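A minimal sketch of how such a policy-aware virtual view might behave, in Python. Everything here — the records, the policy table, and the request_view function — is hypothetical, invented only to illustrate the idea, not drawn from any vendor’s product:

```python
# Hypothetical illustration of Ferguson's "virtual view with local policy":
# the caller gets a filtered view at request time; the source is never
# copied wholesale to the user's side.

RECORDS = [
    {"id": 1, "name": "Anna", "jurisdiction": "EU"},
    {"id": 2, "name": "Bob", "jurisdiction": "US"},
]

# Policy per record jurisdiction: which user regions may see that data.
POLICIES = {"EU": {"EU"}, "US": {"EU", "US"}}

def request_view(user_region):
    """Return only the records the requesting region is allowed to see."""
    allowed = [r for r in RECORDS
               if user_region in POLICIES[r["jurisdiction"]]]
    if not allowed:
        return "Sorry, Dude, you can't see it. Wrong jurisdiction!"
    return allowed
```

In a real “reservoir,” the same policy check would sit in the virtualization layer, so every tool downstream inherits it instead of re-implementing it.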


Underscoring the urgency of controlling data, IBM vice president of marketing Harriet Fryman told of a crashed drone on her roof and an unsettling tweet. The tweet read, “I think my drone is on your roof. Can I have it back?” Fryman went up and, sure enough, found the crashed drone. As the owner explained later, the drone was equipped to send one last photo home before it crashed. From that image, he matched the visible roofline with a Google Maps satellite view, and from there he followed a circuitous path to Fryman’s Twitter account.

Meanwhile, explained SAS vice president of best practices Jill Dyché, executives are fed up waiting for a solution to the problem Ferguson described. Dyché has observed “an utter lack of confidence” among executives in the ability of organizations to govern data.

Donald Farmer, principal at TreeHive Strategy, raised another problem. “It’s incredibly difficult to prove something’s been deleted,” especially when the data’s already been propagated. “How do you track it back?”


The typically voluble group went quiet for a moment, attesting to the challenge.

A surprising suggestion came from Farmer: The solution may be organizational, he said, not technological. Companies could minimize the risk of violating privacy laws by isolating it in spinoffs. The mother company would grow as far as it could with current technology, governance, and practices. Then it would spin off a subsidiary that would own the risky data along with the liability. Eventual innovations would transfer homeward, leaving the risk behind with the spinoff’s shell.

Merv Adrian, however, disagreed. “I don’t believe that for a minute,” he said. “They’ll find a way around it.” Later, he wrote to me in email, “Companies don’t do spinouts lightly. It’s disruptive, complex and costly.” The incentive would have to be strong.

Farmer had a second, even more intriguing idea: “One of the myths is that we need more information,” he said. If we think again about the data we use and why we use it, he explained, we might find nearly the same value with Bayesian noise added. Data can be slightly wrong, with enough noise inserted to prevent hacks, and still deliver equal benefit to business users.

That is, the data doesn’t have to be right, just slightly wrong — at first glance an outlandish idea. It invited quips, perhaps a natural response to an implicit admission that technology may not be the answer. But who can really hope that these new difficulties, born of unheard-of complexity and an aroused public, will be solved by technology alone? Farmer’s idea, or something like it, may prove itself yet.
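Farmer’s idea resembles what the differential-privacy literature formalizes with Laplace noise: perturb each individual value enough to obscure it while aggregates stay usable. A minimal standard-library Python sketch — the function names are mine, not from any talk:

```python
import math
import random

def laplace_noise(scale):
    """Draw one sample from a Laplace(0, scale) distribution via inverse CDF."""
    u = random.random() - 0.5          # uniform on [-0.5, 0.5)
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def privatize(values, scale=1.0):
    """Return the values 'slightly wrong': each perturbed independently."""
    return [v + laplace_noise(scale) for v in values]

# Individual values are obscured, but the aggregate survives:
random.seed(0)
salaries = [100.0] * 10_000
noisy = privatize(salaries, scale=5.0)
# the mean of `noisy` stays close to 100.0 even though no single value is exact
```

Choosing the scale is the whole game: a larger scale means stronger privacy and weaker fidelity — exactly the trade-off Farmer points at.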

“These are hard problems,” observed Robert Eve, director of product management, data and analytics software at Cisco Systems, with one last quip before lunch. With a colloquialism denoting the need for deep thought and newfound finesse, he added, “At run time, you have to understand the kung fu.”

Stephen Few: data’s “harmful ways”

Visualization guru and data-industry skeptic Stephen Few has a worthwhile review of Weapons of Math Destruction by Cathy O’Neil. He writes:

Data can be used in harmful ways. This fact has become magnified to an extreme in the so-called realm of Big Data, fueled by an indiscriminate trust in information technologies, a reliance on fallacious correlations, and an effort to gain efficiencies no matter the cost in human suffering.

Read his review here.

Qlik road goes past white coated smart guys

An earlier version of this post, with a different conclusion and minor differences, appeared in late November.

Qlik CTO Anthony Deighton was drying his hands on the thick, almost cloth-like paper towels in the men’s room at the Miami Edition hotel during the recent Qlik analyst event. He heard another man in the room comparing the towels to some at a past hotel. There, the man said, the towels were thin. Deighton replied, “They were useless, like Tableau.”

Back in 2012, the T word was barely uttered at that year’s gathering of industry analysts. Not long before, the bright and playful Tableau had stung the plodding, script-laden Qlik in what felt like a surprise attack. This year, Qlik seemed to have regained its poise — and two dozen or so industry analysts gathered at the hotel with the good paper towels to hear about the progress.

First, the analysts wanted to know about the buyout. As of late 2016, Qlik is no longer publicly traded. CEO Lars Björk introduced Chip Virnig, a principal at Thoma Bravo, the private equity firm that bought Qlik, and now a Qlik board member. The buyout is “a very big bet” for the firm, Virnig said, but one that felt not only “safe” but also well positioned to thrive. Deighton, speaking afterward, praised the new “cloak of darkness” that frees management from an old distraction: public scrutiny.

Decline of “white-coated smart guys”

Qlik sees the end of BI “as a destination,” in which “white-coated smart guys” serve hapless data consumers. In its place comes BI “as a platform,” Deighton said: one that feeds on a wide variety of data sources, whether in the cloud or under a desk, and supplies bits of analysis to vertical applications.

You might imagine BI disappearing into everyday business. Applications will serve specific needs, and embedded apps will weave into “real work” throughout the day. Deighton cited the Uber app, at first glance hardly a data-analysis tool. Only under the hood does the analysis show itself.

During a break afterward, a few analysts grumbled about Qlik’s road toward the cloud: “They’re late,” said one person. Later, others seemed to agree.

Does lateness matter? As if in answer to the grumbles outside, Deighton declared, “I don’t care what competitors do. What really matters is ‘know thyself.’” Imitations usually compare poorly with the original. You’re better off knowing what you do best and doing it for all you’re worth. I agree.

True to the be-who-you-are strategy, Qlik emphasizes three known differentiators. One is the platform; a second is its traditional fondness for governance, which has given Qlik an edge on Tableau.

The third differentiator is the troublesome one: the “associative experience.” The concept is easy: It answers not only the direct question, “What’s in this set?” but also the implied question, “What’s not in this set?” Hey boss, it might say, I know what you asked to see, but did you notice this over here?
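The in-set/out-of-set split can be mimicked in a few lines of plain Python. The sales table and function here are hypothetical, made up only to show the shape of the idea, not how Qlik’s engine actually works:

```python
# Hypothetical illustration of the "associative" idea: a selection returns
# not only the matching values but also what the selection excluded.

SALES = [
    {"region": "North", "product": "A"},
    {"region": "North", "product": "B"},
    {"region": "South", "product": "B"},
]

def associate(selected_region):
    """Split product values into those inside and outside the selection."""
    all_products = {row["product"] for row in SALES}
    included = {row["product"] for row in SALES
                if row["region"] == selected_region}
    return included, all_products - included

# Selecting "South" surfaces product A as the answer to the implied
# question: what's NOT in this set?
included, excluded = associate("South")
```

A plain filter would return only the included set and silently discard the excluded one — which is Qlik’s argument for why filtering alone would have hidden the rebounding fraud in the bank example below.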

Actual examples of the feature at work seem scarce. Many of the supposed proofs don’t prove anything: money saved, decisions made, and other fine outcomes that fail to demonstrate the feature at work. I can recall only one example that truly illustrates the value: IT consultant Don Marks, one of four Qlik customers flown in for this year’s UnSummit, told me in a one-to-one meeting about a fraud-prevention project at a bank. They had managed to suppress fraud in areas where it had occurred. But then Qlik Sense let them see it pop up in areas they hadn’t thought to look.

Tableau users I’ve heard from seem to think little of the feature. They say it compares poorly with Tableau’s filtering, though Qlik argues that filtering, by definition, would have hidden the rebounding fraud.


Deighton asked the assembled seers, “Does this resonate?” Well, sure. If you squint, you might even see his trends coming true already.

But what does it matter? What’s it matter that Qlik is, as some say, “late” to the cloud? What’s it matter that it can do some things better than Tableau or any other tool? Each constituency insists that its chosen tool is more useful. Each side trivializes the other’s advantages. Each one’s pitch to industry analysts assumes roughly the same trends. Only the emphasis varies.

My impression: Generally, Qlik seems to be building out to a bigger, bolder ecosystem. Its three differentiators — platform, governance, and the “associative” feature — contrast boldly with Tableau’s differentiators, which seem best expressed as flow, art, and expression.

Overall, Tableau looks like fun, and Qlik looks like work. Both are useful, but each tells a different story.

Which one goes on my short list depends on the type of organization: Qlik if users want a routine and relatively limited set of analyses in a mature organization; Tableau if analyses must range more freely, used by more intellectual or creative people than the average business user in a dynamic, creative organization.

Useful stories

Any paper towel is useful depending on location and intent. To compare brands, some might turn to the Handle-O-Meter, an actual machine developed by Johnson & Johnson to measure surface friction and flexibility — the same way that industry analysts like to add up features.

But the Handle-O-Meter is useless for judging a towel’s most important aspect: its message to the user. Does its plush, silky finish tell you that you’re a treasured guest worthy of comfort? Or does it say with its cheap, rapidly disintegrating fiber that you’d better hurry up and get out?

What do Qlik and Tableau tell the user? Tableau says, “Ask! Explore! Play!,” which appeals to some cultures. Qlik says, “Be serious!” which appeals more to other cultures.

Deighton’s quip is fair enough. But whichever is more useful depends on who’s asking and why.