Look, Ma. No ETL

One of the first things you learn about in business intelligence is ETL. Raw data gets harvested, washed and served. But Sandy Steier hadn’t heard.

Sandy had been busy analyzing data. For years on Wall Street, he pored over mortgage-backed securities with a tool he and peers developed for themselves.

He only learned of ETL recently. He’d become acquainted with a data architect with whom he shared a bus ride every day to and from their offices in downtown Manhattan. “I had never really spoken to him before,” Sandy recalls. “He was in a different world even though we both dealt with data.”

Sandy described to him his rapidly maturing tool. As I imagine the scene, the calm data architect suddenly twisted himself on the cramped bus seat to face Sandy. “You don’t do ETL? You work with raw data??”

No, he didn’t do any ETL, Sandy explained. “We didn’t realize how important that was,” he recalled. “We had always just stuck the raw data into the database and then realized, ‘Hey, this data’s a mess.’” He instructed users to clean it themselves. “You get the data from the horse’s mouth. You’re the expert. We didn’t realize how powerful this was.”

In Sandy’s system, you don’t worry about database design. He and his partners not only didn’t worry about ETL, they wondered how data analysis could not be done their way — import first, clean later. “It makes good sense if you can get away with it.”

A crucial factor that lets the tool work as it does is speed. It allows the 1010Data engine to calculate and recalculate repeatedly. The summaries that cubes harbor for anticipated queries are no longer necessary. Parallel processing with a columnar database runs fast enough. In place of ETL, he uses what he now calls “ELTAR,” for extract, load, and transform as required.
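
To make the “transform as required” idea concrete, here is a minimal Python sketch. It is not 1010Data’s engine or API. The sample data, the `load_raw` and `average_balance` names, and the cleaning rule are all assumptions invented for illustration. Raw rows are loaded untouched, and cleaning happens only when a query asks for it.

```python
import csv
import io

# Hypothetical raw extract, messy on purpose: stray spaces, mixed case,
# a blank balance and a non-numeric one.
RAW_CSV = """loan_id,state,balance
1, ny ,100000
2,NY,
3,nj,250000
4,NJ,n/a
5,ny,300000
"""

def load_raw(text):
    """Extract and load: keep every row exactly as it arrived, as strings."""
    return list(csv.DictReader(io.StringIO(text)))

def average_balance(rows, state):
    """Transform as required: clean only the fields this query touches."""
    balances = []
    for row in rows:
        # Normalize the state code at query time, not at load time.
        if row["state"].strip().upper() != state:
            continue
        try:
            balances.append(float(row["balance"]))
        except ValueError:
            # Assumed rule: the analyst chooses to skip unparseable balances.
            continue
    return sum(balances) / len(balances) if balances else None

rows = load_raw(RAW_CSV)
ny_average = average_balance(rows, "NY")
```

The loaded rows never change, so a different question can apply a different cleaning rule to the same raw data. The approach depends on each query recomputing quickly, which is the role speed plays in the description above.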

One hurdle, he says, is the conventional beliefs his sales prospects hold. In a recent phone call, he explained to a prospect that ETL was unnecessary. The man replied, “That’s not credible.” In fine sales form, Sandy said, “Then you’ll be impressed when I prove it to you.” The prospect replied more firmly, “You don’t understand. That’s not credible.”

Actually, the technology’s credibility doesn’t matter much. The company, 1010Data, offers reporting and analytics in the cloud, invisible to customers except for the results. Sandy says, “We could have monkeys writing on scratchpads.” To prospects willing to try, he offers to prove it with their own data.

Their technology’s speed allows a team of a few people to do the work of dozens, he says, and to finish in weeks large data warehouse projects that would otherwise take months or years. If multiple customers use the same data, such as stock market data, the time required is even less.

All without ETL.
