
Category: events

Free-the-data movement meets privacy

Back when data was little and simple, self-service analysis advocates started the chant, “Free the data!” IT stood in the way, they said. Fast forward to 2016: “democratized data” has become common, but so has public concern over privacy.

That nettlesome struggle drove what now stands as the data industry’s most important discussion of 2016. Around the conference table at last summer’s Pacific Northwest BI Summit, the annual, invitation-only confab held in Grants Pass, Oregon, two dozen data leaders pondered the issue for almost two hours. They concluded with an idea that broke open industry assumptions.

UK-based consultant Mike Ferguson told of a meeting held in continental Europe. He expected the usual stakeholders, such as those from marketing and HR. But alongside them were two staff from the corporate counsel’s office. The lawyers made it clear that anything decided there would need their approval.

Their worry? Compliance with the European Union’s impending data privacy law, the General Data Protection Regulation. When it takes effect in 2018, fines for privacy violations (including failure to erase individuals’ online presence on request) could amount to 4 percent of global revenue. Other regions will likely comply, eager to ensure continuing access to the EU market and to EU data. Across the globe, the old democratization-versus-privacy struggle is just about to grow some big, sharp teeth.

It poses a dilemma. Everyday business now requires ready access to data. Compliance with the new privacy regulations itself requires access to data, even as the regulations seek to limit that access.

At the problem’s root, says Ferguson, is data integration. Multiple platforms and tools have evolved to serve big data’s proliferating, specific workloads. Streaming data, Hadoop, the enterprise data warehouse, NoSQL and others chug away, each one possibly processing another platform’s data. And all that data keeps coming in faster and faster.

Data integration’s too expensive

“What I hear from clients,” said Ferguson, who is managing director of Manchester-based Intelligent Business Strategies, “is that the cost of data integration is way too high.” Skills are spread across lots of tools, everything gets re-invented continually, metadata is fractured or lost entirely as it runs through multiple tools, and there’s just too much repetition all around. Data integration among platforms seems to become more complex all the time.

Self-service data integration is cheap. Many in IT like it. But, says Ferguson, it quickly results in “a kind of Wild West.” Data moves uncontrollably, with no one guarding the sources. Users apply countless tools for data prep, ETL, data integration, and other functions, and silos proliferate.

“There’s got to be a better way,” said Ferguson. He suggests supplanting the “data lake” with what he calls a “data reservoir”: a governed collection of raw, in-progress and trusted data that incorporates multiple stores and streams. The “reservoir” would define data once to run anywhere and supply info fast.

“The smart thing is to offer virtual views, Amazon-like,” said Ferguson. Instead of being copied, the data would be offered in virtualized form, ready to use. Data’s “Wild West” would be tamed with riding stables: Ride a trusted horse on a known trail.

Local policies could be applied as the data’s dispensed. Users with proper rights would see the data. But those without rights would be told, “Sorry, Dude, you can’t see it. Wrong jurisdiction!”
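For readers who think in code, the idea of applying local policy as the data is dispensed can be sketched in a few lines of Python. This is only an illustration under assumed names: `User`, `VirtualView`, and the jurisdiction codes are hypothetical, not anything Ferguson showed and not any product’s API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    jurisdiction: str  # hypothetical region code, e.g. "EU" or "US"


class VirtualView:
    """Hypothetical read-only view over one governed data source."""

    def __init__(self, rows, allowed_jurisdictions):
        self._rows = rows  # a reference to the source, not a copy
        self._allowed = frozenset(allowed_jurisdictions)

    def read(self, user):
        # The policy is checked every time data is dispensed.
        if user.jurisdiction not in self._allowed:
            raise PermissionError(
                f"Sorry, {user.name}, you can't see it. Wrong jurisdiction!"
            )
        return iter(self._rows)


# Made-up sample data for the illustration.
customers = [{"id": 1, "country": "DE"}, {"id": 2, "country": "FR"}]
view = VirtualView(customers, allowed_jurisdictions={"EU"})

eu_rows = list(view.read(User("Ana", "EU")))  # allowed

try:
    view.read(User("Bob", "US"))
    denied = False
except PermissionError:
    denied = True  # Bob is turned away at read time
```

The view hands out an iterator over the governed source instead of a copy, and the jurisdiction check runs at read time, which is roughly the riding-stable picture: a trusted horse, a known trail, and a gate.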

Urgency

Underscoring the urgency of controlling data, vice president of marketing at IBM Harriet Fryman told about a crashed drone on her roof and an unsettling tweet. The tweet read, “I think my drone is on your roof. Can I have it back?” Fryman went to her roof and, sure enough, there was a crashed drone. As the owner explained later, his drone was equipped to send one last photo home before it crashed. From that image, he matched the visible roofline with a Google Maps satellite view, and from there he followed a circuitous path to Fryman’s Twitter account.

Meanwhile, explained SAS vice president of best practices Jill Dyché, executives are fed up waiting for a solution to the problem Ferguson described. Dyché has observed “an utter lack of confidence” among executives in the ability of organizations to govern data.

Donald Farmer, principal at TreeHive Strategy, raised another problem. “It’s incredibly difficult to prove something’s been deleted,” especially when the data’s already been propagated. “How do you track it back?”

Solutions

The typically voluble group went quiet for a moment, attesting to the challenge.

A surprising suggestion came from Farmer: The solution may be organizational, he said, not technological. The risk of violating privacy laws could be minimized if companies isolated risk with spinoffs. The parent company would grow as far as it could with the current technology, governance, and practices. Then it would spin off a subsidiary that would own the risky data along with the liability. Eventual innovations would transfer homeward, abandoning the risk with the spinoff’s shell.

Merv Adrian, however, disagreed. “I don’t believe that for a minute,” he said. “They’ll find a way around it.” Later, he wrote to me in an email, “Companies don’t do spinouts lightly. It’s disruptive, complex and costly.” The incentive would have to be strong.

Farmer had a second, even more intriguing idea. “One of the myths is that we need more information,” he said. If we think again about the data we use and why we use it, he explained, we might find just about the same value with Bayesian noise added. Data can be slightly wrong, with enough noise inserted to prevent hacks, and still have equal benefit to business users.

That is, the data doesn’t have to be right, just slightly wrong. At first glance, it’s an outlandish idea. It invited quips, perhaps a natural response to an implicit admission that technology may not be the answer. But who can even hope that until-now unknown difficulties, founded on a new world of unheard-of complexity and an aroused public, could be solved with technology alone? Farmer’s idea, or something like it, may prove itself yet.
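The claim that slightly wrong data can keep its business value lends itself to a quick numerical illustration. The sketch below assumes zero-mean Gaussian noise, a stand-in chosen for simplicity and not necessarily what Farmer had in mind; the ages, the sample size, and the `add_noise` helper are all made up for the example.

```python
import random
import statistics


def add_noise(values, sigma, seed=0):
    """Return a copy of values with zero-mean Gaussian noise added to each one."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, sigma) for v in values]


# Hypothetical data set: 10,000 customer ages.
source = random.Random(42)
ages = [source.randint(18, 90) for _ in range(10_000)]

# Assumed noise level: a standard deviation of five years per record.
noisy_ages = add_noise(ages, sigma=5.0)

true_mean = statistics.mean(ages)
noisy_mean = statistics.mean(noisy_ages)

# Count the records that ended up more than a year off.
changed = sum(1 for a, b in zip(ages, noisy_ages) if abs(a - b) > 1)

print(f"true mean:  {true_mean:.2f}")
print(f"noisy mean: {noisy_mean:.2f}")
print(f"records off by more than a year: {changed} of {len(ages)}")
```

Most individual records come out wrong by a year or more, yet the average across the whole set barely moves, because zero-mean noise tends to cancel out in aggregate. Whether that is enough to “prevent hacks” is a separate question this sketch does not answer.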

“These are hard problems,” observed Robert Eve, director of product management, data and analytics software at Cisco Systems, with one last quip before lunch. With a colloquialism denoting the need for deep thought and newfound finesse, he added, “At run time, you have to understand the kung fu.”

Qlik finally set to leapfrog Tableau?

Who’s your rival? I carelessly asked a Qlik person at the company’s annual analyst reception Monday night in Miami if she hadn’t once worked for Tableau. Her revulsion was immediate. “No! Never!” she said.

We smiled. There was so much more to talk about. For one thing, how will private equity change things? Qlik wasn’t doing so well at the public-equity thing, you may recall, and over the last few months they went private.

Knowledgeable Qlikkers assure me with apparent sincerity that “good things” will ensue. I can think of no reason to doubt them. It must be nice to have the riffraff off your back, which one experienced business person described to me as “having ants in your pants.”

Tableau’s still public, though not quite as shiny as it was. It has that well-worn feel of a recently plush restaurant. No one notices in the mood lighting and boozy good vibes, but the cleaning crew sees it plainly enough when the bright lights go on after closing.

To be rid of ants might just set Qlik on the way to leapfrogging Tableau. Old-timers will recall that Tableau was once the upstart that caught Qlik by surprise. Now Qlik might show off what it’s learned.

We’ll see on Tuesday and Wednesday.

Bohemian Grove a la BI

The Bohemian Grove of the BI industry convenes for the fifteenth time in just three weeks. Naturally, you ask the obvious question: Are you serious? The Grove? A summit?

The answer begins with a fond recollection of the Grove. If you’ve never attended the Bohemian Grove yourself — I haven’t, though I live in the same metropolitan area — you may know of it as that century-old, mid-July pow-wow of leaders from big-iron industry, national politics, and old-time movie-making. Ronald Reagan slept there.

They ate, and they told stories, and they all went back to work a little more satisfied with the world as it was. Or not. Something in the stories coming through that fine Cuban cigar smoke might have stirred their hearts.


Conversation: data’s roots

Is time spent at a TDWI conference worthwhile? How would a prospective exhibitor or attendee judge beforehand? Perhaps the data would dictate — if the data could really tell the whole story.

At the recent TDWI conference in Boston, I counted a mere 18 booths in the exhibit hall. Most of the big names had stayed away, including Tableau, Qlik, and MicroStrategy. Only IBM planted itself there on the Hynes Convention Center’s big floor.

Even with much of the cavernous hall draped off, the stage designers still had a large space to fill, which they did with what seemed like an overabundance of banquet tables. But what a surprise: The tables seemed reasonably full at the height of lunch, showing at a glance unexpected, healthy growth in attendance from last year. It was even better than the previous conference, in Chicago.

But that data was hardly all there was to it. No complex human event can be summed up so neatly, no matter what the “data driven” people insist. I asked around, starting with Dave Wells. He’s a former TDWI education director and still a guy who knows what’s going on around the organization.

Last fall, he returned to TDWI to help revive the moribund education program. He’s in the thick of the big story that has so far included an energetic new leader, sharpened marketing, experiments with new course formats, and a return to four annual conferences instead of the disastrous five. He and I also co-teach a class in data storytelling, a welcome broadening of the agenda for the data-warehouse weary among us.

“Attendance precedes exhibitors,” said Dave. He means that the 18-booth data point doesn’t exist alone. It is just one point on a trend line, which could be flat, declining, or rising. Dave suggests that it’s a rising line, and because I think he’s a smart guy and worthy of trust, I try on that story.

I also listened to other, darker stories. One faculty member, also a smart guy, worried about rumors of a new friendliness with vendors. He calls it “whorishness.” Will the old firewall break down to allow too much compromise for education and media? Just how friendly will TDWI be, and what exactly is the plan? The imaginary trend line suggested by Dave seems to level off slightly with those questions.

The data devotees among us would at this point object to my approach. How can you compare rumors to data? But what makes them assume that data has integrity while stories, even unverified, do not? When will the data devotees hatch from that cocoon?

Information does not derive from data. Conversation and stories, whether actual or just anticipated, always precede and color data.

What does Boston portend for coming events in San Diego, Orlando, or Las Vegas? Obviously, TDWI’s revival has only begun, and while the new leader’s stride inspires us to see a rising trend line, he still has some thinking ahead. Overall, though, the buzz is good, and the trend line looks promising. What’s more, the catering is always better at the next event, in San Diego.

Impressions of Strata Conference

Strata buzzes. Other events go to sleep for long stretches. But Strata+Hadoop World, at least the one in Silicon Valley if not those held in New York and London, is the only event I’ve seen with buzz that comes close to the buzziest of all, the Tableau conference. And like Tableau, Strata is growing. It switched venues this year from the Santa Clara Convention Center to the much bigger San Jose Convention Center, and it still sold out.

As usual in the tech world, the tagline “make data work” merely implies the many elephants in the room, the humans. Who does the work? The lazy data sure doesn’t, and the tools are like the golden retriever that forgets basic training. The humans gather at events like Strata to train themselves to make technology do what they need it to do.

These are my impressions from a full day as reporter and industry analyst.

Mark Madsen’s storytelling session

Mark Madsen, the man many find remarkable for his “intellectual energy,” spoke for 20 minutes Wednesday morning about storytelling. How to begin a story? The usual advice, he says, is wrong. Don’t start with the data. And don’t start with “the story.” No, instead start with your intent. Business stories should always aim for action, whether it’s to explore and understand, to change behavior, or to just change minds. His intense, stimulating presentations (which he’s often made final just a few hours before, sometimes late, late at night) never wrap up without goring a sacred cow or two. The cows this time were visualization bulls Edward Tufte and Stephen Few, who advocate data density. But that’s not always appropriate, says Madsen. It’s more effective, he says, to ask whether the audience wants to see the details. Or do they just want “the basic interpretation so they can get on with the decision making and go out to play a round of golf”? If Florence Nightingale had gone by today’s dogma, she would have shown detailed charts to doctors and persuaded fewer of them to sanitize their surgical instruments, and thousands more would have died from infection.

Teradata’s AppCenter

I had listened to the Teradata senior director of Aster marketing Manan Goel and vice president of UDA product marketing Chad Meley explain the new Teradata Aster AppCenter and its “build, deploy, consume” theme for self-service with big data. I had heard all about how it lets users fit functions together. I had heard about user-made apps that monitor a retail customer’s path to purchase or a patient’s path to surgery. It all sounded like fun, actually, and I said so. Their faces brightened. Then we went on to Loom, Teradata’s tool for weaving value from Hadoop’s snarl.

Pentaho: the big question that went unanswered

I arrived at my meeting with Pentaho with a question. No industry observer I asked could make sense of Hitachi’s just-announced purchase of this open-source warhorse, so I wondered whether it was the other shoe dropping after JasperSoft dropped into Tibco.

Is Hitachi thinking about the Internet of things? No one knew. I asked three Pentaho representatives about one particularly provocative comment I had heard: “I’m surprised anyone would see value in a company that controls so little intellectual property.” The three paused, sighed, and looked away from the table. Finally, director of big data product marketing Chuck Yarbrough began to explain his position, and director of corporate communications Rebecca Shomair got up to close the door. We control “lots” of IP, they argued. True, 80 percent of the IP contained in Pentaho’s enterprise edition is, indeed, open source.

However, what one of them called technology’s “shifting sands”—the normal advances and adjustments all business software undergoes all the time—requires regular updates of the other, proprietary 20 percent. None of them would discuss this point. I suspect that the real value is the knowledge that comes with the workforce. The whole gang is staying, which at least ensures that customer support of Pentaho Business Analytics will continue without interruption. As if considering the broad landscape ahead, one of them stared into the distance and concluded, “It’ll be interesting.” Indeed, it will be.

Four disruptions that paved the way for Paxata

I asked Paxata co-founder Nenshad Bardoliwalla about the long-dead rumor that Tableau was about to buy Paxata (pronounced pax-AH-ta), which was just an appetizer for what I knew would be a meaty conversation. He had that cheerful laugh of someone who is happy with what he has: by all accounts, an interesting and disruptive success. Paxata is a self-service tool meant for ordinary users to prepare data for analysis. I wanted to hear more of a story spun by VP of marketing Cari Jaquet: the four disruptions that paved the way for Amazon, eBay, Netflix, and now Paxata. There was the mature browser, machine learning at scale, distributed computing, and the infinitely flexible cloud that expands and contracts on demand. Paxata’s now got 35 customers, with seven new ones in the last quarter alone. So I asked him when he would buy Tableau, and he gave the best laugh I heard all day.

Get-to-know-you with Trifacta

Much later, it was get-to-know-you time at my first-ever meeting with Trifacta, a Paxata rival. CEO Adam Wilson and the officially no-title marketing chief Michael Hiskey told me about “data wrangling,” the cost and security advantages of all-on-premises data, and the advantages of sampling versus the all-data approach. One more thing: “It’s all about Hadoop”; Trifacta requires it. Thirty customers have signed on.

O’Reilly report on active learning

You may not know yet what active learning can do for you, but let me tell you something right now: It’s coming to an analytics tool near you if it’s not there already. Read all about active learning (essentially, a human layer over machine learning that dramatically improves accuracy) in a new O’Reilly Media report that showed up in the O’Reilly booth.

Two new sources

I met two people I’ll call for updates and opinion: WebAction director of marketing Jonathan Geraci and Metanautix CEO Theodore Vassilakis.

Articulating buzz

“What did you hear?” is everyone’s question to those who go out to wander through buzz. What I heard or overheard is what I saw in the buffet: self-service with excellent features along with inevitable shortcomings that show up only with close inspection. You wanted things to work exactly as you expected? Silly you! At the Strata buffet, the salmon was good, the eggplant parmesan was tasty and filling, the string beans had a little bit of garlic and were not overcooked, and the green salad was exactly like the salad I’ve had at dozens of other events in each of the last three decades. But there was no dessert, not even a bowl of fortune cookies. Those who saw silver chafing dishes and expected ceramic plates and silverware rolled in cloth napkins instead found paper plates, plastic forks, and napkins that tore on mere 15-hour stubble. Will the update for Lunch Buffet have cloth napkins and metal forks? Let’s put that on the wish list.