
Category: events

Data lake: compositional or architectural?

Is the data lake following the typical path for new technology? Josh Good, senior director of product marketing at Qlik, asked that question of Merv Adrian, research VP for data management and integration at Gartner, who was talking about data lakes and big data projects at the just-concluded Pacific Northwest BI and Analytics Summit.

Merv’s answer:

That’s a terrific question. We’re talking about a phenomenon of some recency which is the notion of the new platform sell. [It’s] not a new application, not a new function, but a new platform designed to replace existing ones or supplement them (usually the first until they figure out that’s not practical). And that, I think, is the larger market failure … or the blunting of the thrust that there’s this new opportunity to build new platforms.

I’m relatively convinced that people coming into the market now are not thinking about the replacement of the end to end. They are looking for parts. If they’ve gotten at all sophisticated or knowledgeable about how to achieve the outcome that they presumably have defined, then they have put together in their head at least some sort of chart they can draw on the wall, which is a bunch of boxes that connect to one another with flows, and they’re identifying the APIs among them.

That’s becoming an issue especially as we move to the cloud and people start talking about services-based architectures and are thinking about the way they want to get to where they want to go is a composition exercise, not an architecture one.

BI Summit / Goin’ up north where the wind blows tall

I’ve never figured out why one hard-thumping song by Tom Waits brings to my mind the annual Pacific Northwest BI and Analytics Summit. Yet it does, even now as I prepare for the six-hour drive up to Grants Pass, Oregon. It’s my sixth consecutive time, and the Summit’s sixteenth.

I start out the drive with “Goin’ Out West” on my mind. “I’m goin’ out west where the wind blows tall / ‘Cause Tony Franciosa used to date my ma.” By the time I arrive on the Weasku Inn’s big lawn, dig out a beer from the ice chest, and say hello to the nearest person, the song’s gone. It’s show time.

The pessimism of “Goin’ Out West” seems like a raspberry blown at the event’s breezy talk and such traditions as the grilled-salmon dinner on the deck, tequila shots later, and friendly conversation until you can’t stay up any longer. Very late, you can glance up at one window and imagine Clark Gable mourning Carole Lombard. He spent two weeks doing that up there.

But forget the raspberry. It’s just fun. Though the leading men and women there do obsess about technology and bluff about everything else, this is a summit, you know. Wind happens at high altitude, and everyone’s got altitude here.

Astute readers will observe that “Goin’ Out West” makes fun of those who, it would seem, should stay home. “I’m no extra, baby, I’m a leading man,” says Waits’ character. He drives his “Olds 88” with “a hole in the roof the shape of a heart.” He’s “goin’ out west where they’ll appreciate me.” He’s headed for Hollywood.

No one wears a name badge here. Anyone can hang out on the deck and be a leading man or leading woman. Everyone knows each other or is about to. You can change your name to Hannibal or maybe just Rex.

Twitter hashtag is #BISUM.

BI Summit / Putting one more V on big data: virtue

Big data needs a bigger heart than it’s shown so far — essentially the point that Jill Dyché will make this Friday at the sixteenth annual Pacific Northwest BI and Analytics Summit in Grants Pass, OR.

Organizations have a responsibility to improve lives, as she puts it, “one citizen, patient, taxpayer, sports fan, and dog at a time.” To report on her presentation, which precedes a 90-minute discussion among 20 industry experts and observers, will be three dutiful reporters: longtime industry observer Steve Swoyer, TechTarget executive editor Craig Stedman, and me.

Jill’s session will be one of four. The first two occur on Friday, one on Saturday, and the last one on Sunday. The three others are by Donald Farmer, recently of Qlik and now of his own TreeHive Strategy, on the analytic experience; Mike Ferguson of his own UK-based Intelligent Business Strategies, on the new-and-cool edge analytics; and Merv Adrian of Gartner, on data lake architectures.

Jill’s topic continues her theme from last year, when she told how a dog shelter using pre-digital processes sent a dog to be euthanized just as would-be adopters asked to take it. That was sad, but the shelter’s eventual adoption of digital processes, which she drove, certainly helped prevent future tragedies.

Getting for-profit organizations to use data for more than profit might be harder. Do companies really care about philanthropy? Or does most business leadership believe that one-offs are good enough? Is it good enough to ally with the Sierra Club?

We’ll see what she and others have to say.

Twitter hashtag is #BISUM.

Free-the-data movement meets privacy

Back when data was little and simple, self-service analysis advocates started the chant, “Free the data!” IT stood in the way, they said. Fast forward to 2016: “democratized data” has become common, but so has public concern over privacy.

That nettlesome struggle drove a discussion that now stands as the data industry’s most important discussion of 2016. Around the conference table at last summer’s Pacific Northwest BI Summit (the annual, invitation-only confab held in Grants Pass, Oregon), two dozen data leaders pondered the issue for almost two hours. They concluded with an idea that broke open industry assumptions.

UK-based consultant Mike Ferguson told of a meeting held in continental Europe. He expected the usual stakeholders, such as people from marketing and HR. But alongside them were two staff members from the corporate counsel’s office. The lawyers made it clear that anything decided there would need their approval.

Their worry? Compliance with the European Union’s impending data privacy law, the General Data Protection Regulation. When it takes effect in 2018, fines for privacy violations, including failure to erase individuals’ online presence on request, could reach 4 percent of global revenue. Organizations in other regions will likely comply as well, eager to keep access to the EU market and to EU data. Across the globe, the old democratization-versus-privacy tension is just about to grow some big, sharp teeth.

It poses a dilemma. Everyday business now requires ready access to data. Even complying with the new privacy regulations requires access to data, even as those regulations seek to limit it.

At the problem’s root, says Ferguson, is data integration. Multiple platforms and tools have evolved to serve big data’s proliferating, specific workloads. Streaming data, Hadoop, the enterprise data warehouse, NoSQL and others chug away, each one possibly processing another platform’s data. And all that data keeps coming in faster and faster.

Data integration’s too expensive

“What I hear from clients,” said Ferguson, who is managing director of Manchester-based Intelligent Business Strategies, “is that the cost of data integration is way too high.” Skills are spread across lots of tools, everything gets re-invented continually, metadata is fractured or lost entirely as it runs through multiple tools, and there’s just too much repetition all around. Data integration among platforms seems to become more complex all the time.

Self-service data integration is cheap. Many in IT like it. But, says Ferguson, it quickly results in “a kind of Wild West.” Data moves uncontrollably, with no one guarding the sources. Users apply countless tools for data prep, ETL, data integration, and other functions, and silos proliferate.

“There’s got to be a better way,” said Ferguson. He suggests supplanting the “data lake” with what he calls a “data reservoir”: a governed collection of raw, in-progress and trusted data that incorporates multiple stores and streams. The “reservoir” would define data once to run anywhere and supply info fast.

“The smart thing is to offer virtual views, Amazon-like,” said Ferguson. Instead of being copied, the data would be offered in virtualized form, ready to use. Data’s “Wild West” would be tamed with riding stables: Ride a trusted horse on a known trail.

Local policies could be applied as the data’s dispensed. Users with proper rights would see the data. But those without rights would be told, “Sorry, Dude, you can’t see it. Wrong jurisdiction!”
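Ferguson didn’t describe an implementation, but the idea of one stored copy dispensed through policy-checked views can be sketched in a few lines of Python. Everything here is hypothetical: `VirtualView`, `User`, `AccessDenied`, and the jurisdiction codes are illustrative names, not any product’s API, and a real reservoir would presumably enforce policy in its query layer rather than in application code.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterator, Mapping, Sequence

Row = Mapping[str, object]


@dataclass(frozen=True)
class User:
    name: str
    jurisdiction: str  # illustrative codes such as "EU" or "US"


class AccessDenied(Exception):
    """Raised when a user's jurisdiction may not see a view (hypothetical)."""


@dataclass
class VirtualView:
    """Hypothetical virtual view: holds a reference to the source, never a copy."""
    source: Sequence[Row]                # the single stored copy of the data
    allowed_jurisdictions: FrozenSet[str]
    row_filter: Callable[[Row], bool] = lambda row: True

    def read(self, user: User) -> Iterator[Row]:
        # The policy is checked when the data is dispensed, not when it is stored.
        if user.jurisdiction not in self.allowed_jurisdictions:
            raise AccessDenied(f"Sorry, {user.name}, you can't see it. Wrong jurisdiction!")
        # Rows are streamed lazily from the source rather than copied up front.
        return (row for row in self.source if self.row_filter(row))


if __name__ == "__main__":
    customers = [
        {"id": 1, "country": "DE"},
        {"id": 2, "country": "US"},
    ]
    eu_view = VirtualView(customers, frozenset({"EU"}),
                          lambda r: r["country"] in {"DE", "FR"})
    print(list(eu_view.read(User("Ana", "EU"))))
    try:
        eu_view.read(User("Dude", "US"))
    except AccessDenied as err:
        print(err)
```

The design choice worth noticing is that the check lives in `read`, at the moment of dispensing, so the same stored rows can serve users in different jurisdictions under different rules.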

Urgency

Underscoring the urgency of controlling data, IBM vice president of marketing Harriet Fryman told of a crashed drone on her roof and an unsettling tweet. The tweet read, “I think my drone is on your roof. Can I have it back?” Fryman went to her roof and, sure enough, there was a crashed drone. As the owner explained later, his drone was equipped to send one last photo home before it crashed. From that image, he matched the visible roofline with a Google Maps satellite view, and from there he followed a circuitous path to Fryman’s Twitter account.

Meanwhile, explained SAS vice president of best practices Jill Dyché, executives are fed up waiting for a solution to the problem Ferguson described. Dyché has observed “an utter lack of confidence” among executives in the ability of organizations to govern data.

Donald Farmer, principal at TreeHive Strategy, raised another problem. “It’s incredibly difficult to prove something’s been deleted,” especially when the data’s already been propagated. “How do you track it back?”

Solutions

The typically voluble group went quiet for a moment, attesting to the challenge.

A surprising suggestion came from Farmer: The solution may be organizational, he said, not technological. The risk of violating privacy laws could be minimized if companies isolated risk with spinoffs. The mother company would grow as far as it could with the current technology, governance, and practices. Then it would spin off a subsidiary that would own the risky data along with the liability. Eventual innovations would transfer homeward, leaving the risk behind in the spinoff’s shell.

Merv Adrian, however, disagreed. “I don’t believe that for a minute. They’ll find a way around it,” he said. Later, he wrote to me in an email, “Companies don’t do spinouts lightly. It’s disruptive, complex and costly.” The incentive would have to be strong.

Farmer had a second, even more intriguing idea: “One of the myths is that we need more information,” he said. If we think again about the data we use and why we use it, he explained, we might find just about the same value with Bayesian noise added. Data can be slightly wrong, with enough noise inserted to prevent hacks, and still have equal benefit to business users.

That is, the data doesn’t have to be right, just slightly wrong: at first glance an outlandish idea. It invited quips, perhaps a natural response to an implicit admission that technology may not be the answer. But who can even hope that previously unknown difficulties, founded on a new world of unheard-of complexity and an aroused public, could be solved with technology alone? Farmer’s idea, or something like it, may prove itself yet.
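Farmer didn’t spell out a mechanism. One common way to make each record slightly wrong while keeping aggregates useful is to add zero-mean Laplace noise, the distribution behind the Laplace mechanism in differential privacy. This is a swapped-in technique, not necessarily what Farmer meant by Bayesian noise, and the ages data and the `scale` value below are made up for illustration.

```python
import random
from statistics import mean


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two independent exponential draws, each with mean
    # `scale`, follows a Laplace(0, scale) distribution.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def perturb(values, scale: float, seed=None):
    """Return a copy of `values` with zero-mean Laplace noise added to each one."""
    rng = random.Random(seed)
    return [v + laplace_noise(scale, rng) for v in values]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical data: 10,000 customer ages.
    ages = [rng.randint(18, 80) for _ in range(10_000)]
    noisy_ages = perturb(ages, scale=5.0, seed=1)

    # Each record is off by roughly `scale` years on average...
    print("mean absolute error per record:",
          mean(abs(a - b) for a, b in zip(ages, noisy_ages)))
    # ...but because the noise averages out, the aggregate barely moves.
    print("true mean age: ", mean(ages))
    print("noisy mean age:", mean(noisy_ages))
```

A larger `scale` hides individual values better at the cost of noisier aggregates; how much wrongness a business can tolerate is exactly the question Farmer raised.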

“These are hard problems,” observed Robert Eve, director of product management, data and analytics software at Cisco Systems, with one last quip before lunch. With a colloquialism denoting the need for deep thought and newfound finesse, he added, “At run time, you have to understand the kung fu.”

Qlik finally set to leapfrog Tableau?

Who’s your rival? I carelessly asked a Qlik person at the company’s annual analyst reception Monday night in Miami if she hadn’t once worked for Tableau. Her revulsion was immediate. “No! Never!” she said.

We smiled. There was so much more to talk about. For one thing, how will private equity change things? Qlik wasn’t doing so well at the public-equity thing, you may recall, and over the last few months they went private.

Knowledgeable Qlikkers assure me with apparent sincerity that “good things” will ensue. I can think of no reason to doubt them. It must be nice to have the riffraff off your back, a condition one experienced business person described to me as being rid of “ants in your pants.”

Tableau’s still public, though not quite as shiny as it was. It has that well-worn feel of a recently plush restaurant. No one notices in the mood lighting and boozy good vibes, but the cleaning crew sees it plainly enough when the bright lights go on after closing.

Being rid of the ants might just set Qlik on the way to leapfrogging Tableau. Old-timers will recall that Tableau was once the upstart that caught Qlik by surprise. Now Qlik might show off what it’s learned.

We’ll see on Tuesday and Wednesday.