Monday, February 23, 2009

Why Isn’t Predictive Analytics a Big Thing?

 

I’m baffled as to why Predictive Analytics still has the status of a fringe, niche, and advanced realm of the Microsoft Business Intelligence world. Business Intelligence without Predictive Analytics is like a bus taking you most of the way home from work, but dropping you off on the main drag, leaving you with a two-mile walk. Those two miles are doable and it is great exercise, but what if you’re short on time, ill or injured, it’s pouring rain, or the road is full of dangerous drivers? The bus may have taken you 90% of the way home in terms of miles, but that last 10% of miles is really half the journey in terms of time and effort.

Predictive Analytics as a whole is the discipline of formulating insights or forecasts based on statistics garnered from raw, historical data. This is essentially what human BI users ultimately do manually. Highly skilled analysts sift through oceans of data looking for patterns such as correlations, associations, similarities, and dependencies, which are the elements of decisions.

Data Mining algorithms shipped in SQL Server Analysis Services since SQL Server 2000 provide functionality for automating, or at least semi-automating, this process. At this time, computers recognize certain patterns better than humans do, especially straightforward patterns buried in an overwhelming amount of data or a tricky “background” (like “Where’s Waldo”). The human’s ability, on the other hand, is more robust and versatile, and thus able to recognize fuzzier, more complex patterns. Team up this “machine intelligence” and “human intelligence” and you have a smarter information worker.

Figure 1 depicts a tank commander¹ who must take action based on decisions which are based on data gathered from various sources and angles: his own senses, radar, satellites, his commanders, etc. Without Predictive Analytics, the tank commander must expend more effort and time in order to make well-informed decisions. If he had all the time in the world and making a mistake didn’t matter too much, this would be fine. Unfortunately, the tank commander is making life and death decisions against smart enemies who are also making decisions against him. Whoever makes the best decision first wins.

Figure 1 – The chasm between the OLAP cube and the Information Worker.

The life of the tank commander’s counterpart in the corporate world, the line manager, isn’t quite as mission-critical, but the analogy carries over. American corporate culture reflects military culture with a hierarchy of ranks and battles and strategic alliances with other corporations. The smartest, best-equipped, most optimized, most agile, most highly-trained, most strategically aligned corporation will win.

Notice that data mining as presented in Figure 1 bridges a chasm between data in an OLAP cube and the intelligent tank commander; it takes you that last step of the journey, which is often the hardest part of the journey. Also notice that data mining doesn’t make decisions, it just assists with making decisions.

Data Mining is NOT Artificial Intelligence

Earlier I highlighted the word, “recognize”, to point out that Data Mining as the term applies in the Microsoft BI world stops at the point of “recognizing things”. That is, “recognizing things” is just the beginning of the process of strategy execution. Let us step back a little and think about life on Earth from the point of view of the interplay of species of creatures fighting towards the goal of “survival of the species” or even the interplay of corporations each fighting for the goal of success (profit, growth, dominance, etc).

From that point of view, the purpose of intelligence is to formulate a strategy towards those goals then to execute that strategy. Execution of a strategy takes this form:

Strategy ← Execution ← Action ← Decision ← Recognition

Execution of a strategy consists of actions of all sorts. Actions executed by executives have broader scope for the corporate “creature”, whereas actions executed by peons such as myself have more local, specialized scope. Actions are actually physical manifestations of decisions we arrive at in our heads.

Decisions are the result of a process of logic. The logic could be simple, as in a “no-brainer” decision. Or it can be complicated, taking into consideration dynamic, ambiguous factors. Such decisions are generally beyond the capability of software today … and that is not what Data Mining is about.

The factors that are “plugged” into our decision process are “recognitions”. Recognitions are values, and are simpler concepts than decisions, actions, and strategies. Recognitions (values), like decisions, are the result of a process, but the process for recognitions follows well-defined rules. For example, a recognition of “an up-tick in sales” is the result of summing sales for today and yesterday and determining if today is greater than yesterday. Because the rules for recognitions are well-defined (there is no ambiguity about the rules), they are programmable. Microsoft’s Data Mining algorithms are programs recognizing patterns by well-defined rules.
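As a minimal sketch of how programmable such a rule is, the “up-tick in sales” recognition could be written as follows in Python. The function name and the sample figures are hypothetical, chosen purely for illustration; this is not how Analysis Services implements its algorithms.

```python
def is_sales_uptick(yesterday_sales, today_sales):
    """Recognize an up-tick: today's total sales exceed yesterday's total."""
    return sum(today_sales) > sum(yesterday_sales)


# Hypothetical individual sale amounts for two consecutive days.
yesterday = [120.0, 75.5, 310.0]
today = [200.0, 150.0, 180.25]

print(is_sales_uptick(yesterday, today))
```

The rule leaves no room for interpretation, which is exactly what separates a recognition from a decision: what to do about the up-tick is still up to the person (or application) consuming the value.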

In the example illustrated in Figure 1, the President determines the goal, the generals formulate the strategy, the junior officers execute the strategy, the soldiers (such as the tank commander) take action and make decisions, and the data mining algorithms recognize things that are hard for the tank commander to recognize in a timely and accurate manner.

So Again, Why isn’t Data Mining a Big Thing?

As I mentioned, the data mining features of SQL Server Analysis Services have existed since SQL Server 2000. Why are they still so shamefully under-utilized? My opinion is that data mining is perceived as a high-end skill requiring extensive training beyond the capability and desire of the vast majority of people. In other words, only the really smart and geeky people, the people we either can’t be or don’t want to be, can do this. That relatively small audience makes Predictive Analytics a niche, fringe realm, which is better served in niche, fringe scenarios by focused players such as SAS and SPSS than by a general player like Microsoft.

On the other hand, maybe it’s not a matter of wondering why Predictive Analytics isn’t bigger, but rather, why isn’t Microsoft selling way more BI? BI without Predictive Analytics drops the customer off at “Here are your reports. Analyze away!” With all the hype around BI, I can see that it could sometimes leave a customer a bit underwhelmed.

There is also a sort of “Goldilocks and the Three Bears” syndrome. On one end of the spectrum, there are the folks at the university labs I’ve worked with who laugh at our data mining. Their requirements are quite sophisticated and our “<<fill in the blank>> for the masses” approach doesn’t yield the right tool for them. On the other end of the spectrum are those I mentioned a couple of paragraphs ago who think of data mining as something only smart and geeky people do. Can we effectively implement Predictive Analytics without a PhD in math and an MBA?

BI is IT Brain Surgery

Microsoft has done an incredible job of making the tough task of getting a computer to recognize and highlight patterns from a chaotic ocean of data pretty accessible. See for yourself by playing with the tutorials. But many folks, both implementers and potential customers, are scared of DATA MINING and therefore shy away from it. Thus there currently isn’t a critical mass of consultants, architects, or engagement managers, or of demand from customers willing to take the leap into data mining. I think most recognize the potential. But there is a chicken-and-egg problem: there aren’t enough skilled practitioners to deliver a critical mass of successful predictive analytics engagements, and there isn’t demand from customers, in part because there isn’t a critical mass of talented architects and consultants.

However, this lull in the economy presents an opportunity to break the chicken and the egg cycle. Idle consultants and architects can prepare for the eventual upswing, which will usher in fun IT things like Web 3.0 (Semantic Networks!) and the big leap to parallel processing as a mainstream development skill (if you old VB6 or COBOL programmers thought the leap from “procedural” to object-oriented was big …).

Analysis Services abstracts much of the ugly math and statistics away from you. What is left for the successful Predictive Analytics consultant is to possess a great deal of business savvy and common sense. Additionally, even though a predictive analytics project may fall under well-defined categories such as Risk Analysis and Customer Churn, each implementation is at least somewhat unique to each enterprise. Therefore, a practitioner needs at least a good measure of a software developer’s mindset.

Is this going to be hard? Of course it is. BI is IT brain surgery. BI integrates data from multiple data sources for an all-encompassing view, applies that integrated data to rules, executes actions, and waits for the feedback, which results in more data. That sounds like what the brain does to me.

The phrase, “It ain’t brain surgery”, suggests we think of the brain surgeon as the smartest of the smart (all doctors have to be smart). In the case of the BI consultant, we may not necessarily be the smartest of the smart (the TV character, “House”, isn’t a brain surgeon, but he is smarter than everyone), but it’s hard to argue against the idea that we face more ambiguity, and need high-end capability across a broader range of skills, than any other IT discipline I can think of.

As an IT consultant I wish for tools that will make my job easier, but I also pray that the field continues to evolve at a snappy pace so there is room on the cutting edge requiring a high level of experience so I can continue to add significant value while I make a fairly decent living. With the growing number of product group and subject matter expert blogs, early-adopter programs like TAP and CTPs from the product groups, fantastic on-line tutorials (free or purchased for really practically nothing), Wikipedia, inexpensive off-shore support, etc, it becomes increasingly hard for a field consultant to add the level of value that justifies the sort of rates the customer is charged.

Predictive Analytics requires more than “best practices” packaged in cook books. Every company is unique and hopefully operates on a unique strategy (otherwise, how can it compete?), which in turn means Predictive Analytics manifests itself differently. It is like software development where the requirements are unique (otherwise why wouldn’t you buy an existing package instead?). There is certainly a core of skills requiring a high degree of competency and there are common patterns, but usually no readily reusable solution. Predictive Analytics is at the cutting edge where thinking (so much harder than merely applying) reigns supreme.

Now that data warehousing and even ETL and OLAP (“Gemini” will make my high-end Analysis Services skill a little less needed) are maturing, Predictive Analytics opens the door for BI to continue requiring highly skilled practitioners with the ability to improvise for a long time to come. Customers will require increasingly intelligent automated assistance to wade through oceans of data because of further integration of systems around the world, the sheer increase in the volume of wired people and data, the addition of potentially orders of magnitude more mobile and embedded devices and RFID-tagged items onto the Internet, and the ever-greater demands we place on our computers.

For those consultants who have tried to enter BI but were overwhelmed by Analysis Services OLAP, particularly optimization and MDX, Data Mining may prove to be a more feasible entry point into a high-end BI skill. This is particularly true for those from financial backgrounds as I write about in my blog, Data Mining in the PerformancePoint MAP Framework.

Broad and Simple Deployment of Predictive Analytics

What will open the flood gates for Predictive Analytics, at a “data mining for the masses” level, is to point out where Microsoft’s data mining is best suited, at least for now. And that is when it is deployed in a broad and simple manner. That is, predictive analytics is embedded in many places throughout an enterprise to support robust, dynamic requests for data, yet is simple enough to develop and maintain without a staff of PhDs.

Figure 2 depicts how a broad deployment of embedded predictive analytics throughout an enterprise will significantly expand the world for Microsoft Business Intelligence. The upper box, colored in navy blue, shows the Microsoft BI world today without Predictive Analytics as a common component. We see the end result as reports and visualizations such as PerformancePoint dashboards and Excel charts. (Of course, there are many other reports and visualizations such as Reporting Services reports and OLAP cube browsers such as Excel Pivot Tables which aren’t depicted here.)

Figure 2 – Microsoft’s Data Mining is uniquely suited for broad and simple implementations.

The lower box, colored in green, shows a whole new world for Microsoft BI as broadly implemented, embedded predictive analytics. Data Mining models are created from the Enterprise Data Warehouse (EDW) and feed robust, versatile data directly into the enterprise’s applications. (Figures 4 and 5, later in this blog, depict the many scenarios for such embedded predictive analytics.) BI as most customers have it implemented today, that is without predictive analytics (the upper box in Figure 2), was always about predicting the future based on the past.

Rather than focusing on the common scenario of a high-end analyst with her PhD in math and MBA solving very complex problems, where SAS and SPSS software is utilized, we focus on the “data mining for the masses” approach. That means data mining models provide simple, relatively straightforward predictions to applications throughout an enterprise.

The BI “Stimulus” Plan – BI Meets Business Process Management

Microsoft’s easy to use data mining features combined with Microsoft’s wide array of integrating technology (such as ADO.NET, XML/A, WCF, SSIS, and Linked Servers) enables “Data Mining for the Masses”. Except the “masses” can be software applications as well as information workers across an enterprise. Because software applications aren’t as smart as humans, simpler decisions can be delegated (or semi-delegated) to software across an enterprise.

Figure 3 depicts an architecture that goes a long way towards raising IT to the level of “Strategic Asset” in BPIO terms. On the left side of Figure 3, we see two examples of business processes, the sales cycle and manufacturing. In the center, we see an EDW (Enterprise Data Warehouse) capturing metrics from various points in the processes, usually where handoffs are made (such as sales to shipping). From that EDW, data mining models are created which will be queried by the business process application for robust, versatile answers. That forms a cycle (red arrows → green arrows → purple arrows) that uses BI data directly to drive the actions of the business.

Figure 3 – BI and Predictive Analytics raise an IT system to the level of “Strategic Asset”.

Towards the upper-right of Figure 3, we also see a mechanism for monitoring the effectiveness (performance) of the mining models; the predictions are compared against the eventual actual value. After all, if the Predictive Analytics will actually contribute to decisions (autonomously or semi-autonomously) its performance should be monitored within a Performance Management framework as would any other Information Worker. The blue arrows form a cycle of Monitor, Analyze, and Plan.
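A minimal sketch of that monitoring step, in Python, might compare stored predictions with the actual values once they arrive and flag the model when its average error drifts past an agreed tolerance. The metric (mean absolute error), the tolerance, and all names and figures here are assumptions made for illustration; nothing in Figure 3 prescribes them.

```python
from statistics import mean


def mean_absolute_error(predicted, actual):
    """Average size of the prediction error, ignoring its direction."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return mean(abs(p - a) for p, a in zip(predicted, actual))


def needs_review(predicted, actual, tolerance):
    """Flag the model when its average error exceeds the agreed tolerance."""
    return mean_absolute_error(predicted, actual) > tolerance


# Hypothetical monthly unit forecasts and the actuals that arrived later.
predicted_units = [100, 120, 90, 110]
actual_units = [104, 115, 97, 110]

mae = mean_absolute_error(predicted_units, actual_units)
print(mae, needs_review(predicted_units, actual_units, tolerance=5))
```

In a Performance Management framework, a measure like this would simply become one more KPI, tracked for the mining model just as a target would be tracked for a human Information Worker.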

With this in mind, software developers should begin to develop applications with Predictive Analytics in mind. They need to think in terms of probability in addition to absolutes, non-linear thinking in addition to linear workflows, heuristics in addition to algorithms, and dynamic models in addition to static graphs. Instead of stating there is a projected 3% rise in profits for the next quarter, we say “There is a probability of a 3% rise, with a standard deviation of .25 and a moderate confidence”.
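To make the shift from absolutes to probabilities concrete, here is a small Python sketch that reports a projection together with its spread rather than as a single number. The estimates are hypothetical, and summarizing them with a mean and a sample standard deviation is only one simple choice; a real mining model would supply its own probability and support figures.

```python
from statistics import mean, stdev


def summarize_projection(growth_estimates):
    """Return (expected value, sample standard deviation) of the estimates."""
    return mean(growth_estimates), stdev(growth_estimates)


# Hypothetical quarterly profit-growth estimates, in percent.
estimates = [2.75, 3.0, 3.25, 3.0, 3.0]

expected, spread = summarize_projection(estimates)
print(f"Projected rise of about {expected:.2f}% "
      f"with a standard deviation of {spread:.2f}")
```

An application consuming this pair of values can then decide how much weight to give the projection, instead of treating it as a certainty.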

Figures 4, 5, and 6 focus in on different aspects of Figure 3. Figures 4 and 5 list some of the many examples of Predictive Analytics for the enterprise applications I included. Most of the items listed in Figure 4 should be pretty familiar to you both as an IT consultant and as a consumer. It is said that Artificial Intelligence is here today, but not as it was expected in the 1980s. You would hardly notice it. It arrived in the form of a collection of simpler gems of “intelligent” tasks distributed across all aspects of IT … not like C3PO, HAL or the Terminator.

Some items listed in Figure 4 are simpler than others, but all of them are very much feasible. That means Predictive Analytics can be implemented in baby steps. Predictive Analytics can be implemented iteratively minimizing the chance for failure. That’s an important point. We’re not even really out of the days where even run of the mill BI projects are notoriously prone to failure. On one level, attack specific, digestible opportunities for Predictive Analytics, but on another level, look at Predictive Analytics as a whole as a paradigm just as much a part of enterprise software as its ability to network and run tasks in parallel.

The items listed also return significant value on the investment. Return on investment can take many forms, for example:

  • Optimizing the use of resources such as targeting the customers most likely to act on an ad.
  • Minimizing loss such as theft or accidents.
  • Improving customer satisfaction efficiently, such as by maintaining just-in-time inventory levels.
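The first item can be sketched in a few lines of Python. Assuming a mining model has already produced a response probability for each customer, an application could contact only those whose expected return exceeds the cost of the contact. The customer IDs, probabilities, and monetary figures below are all hypothetical.

```python
def customers_worth_targeting(response_probability, value_per_response,
                              cost_per_contact):
    """Return customers whose expected return exceeds the contact cost,
    ordered from most to least likely to respond."""
    worthwhile = [
        customer
        for customer, probability in response_probability.items()
        if probability * value_per_response > cost_per_contact
    ]
    return sorted(worthwhile,
                  key=lambda customer: response_probability[customer],
                  reverse=True)


# Hypothetical model output: probability that each customer acts on the ad.
scores = {"A": 0.02, "B": 0.30, "C": 0.11, "D": 0.05}

targets = customers_worth_targeting(scores, value_per_response=40.0,
                                    cost_per_contact=3.0)
print(targets)
```

The decision rule itself stays simple; the value comes from the model supplying probabilities that would be impractical to estimate by hand for every customer.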

Figure 4 – Predictive Analytics for the Sales Cycle Applications.

Figure 5 – Predictive Analytics for the Manufacturing Plant.

Figure 6 – The effectiveness of Predictive Analytics monitored in a MAP framework.

A Few Parting Thoughts

Whether you’re a customer or consultant, I hope I got you to think a little of the “what” and the “why” regarding Predictive Analytics. That will lead to the “how” questions which I will address in future blogs. However, remember, this entire blog is just the opinion of one Microsoft BI practitioner attempting to inject more life into Business Intelligence and maybe raise the level of innovative spirit.

Take my advice with a grain of salt. Although I’m a BI Architect in the MSC BI Global Practice, as they say, “The views expressed here don’t necessarily reflect the views of the Microsoft BI leadership”. To some people, what I just wrote about is obvious, but you are in the minority, which is the problem. To some, this sounds like pie in the sky, and I hope to have changed your mind. To some, you believe in the value, but were just afraid, and I hope I gave you the inspiration to tackle it and a plan of attack.

What I do know is that the world is spinning faster and faster. If this merry-go-round crashes, it will crash spectacularly. We will need our computer systems to assist us in ever increasingly intelligent fashion to help us avoid those spectacular crashes before they happen. I purposely phrase “spectacular crashes” as I realize our systems need to help us engineer “controlled crashes” for reasons similar to why we have “controlled forest fires”.

That is where the high-end skills, the thinking skills that separate the “BI doctors” from the “BI Technicians”, will be needed. That’s where I plan to be as I haven’t yet succumbed to the overly-engineered McGoogle way of operating that certainly plays a necessary role, but doesn’t inspire much passion in me. My motto is, “I make my living at the cutting edge and my hobbies are at the bleeding edge.” Now, the price to pay for living on the cutting edge is that you end up feeling stupid a good deal of the time. That’s not for everyone, and it’s often not for me as well. But it gets me up in the morning excited about what awaits out there.

In the BI classes I teach, I tell the students that BI is a tough subject and it could take months or years to get to a point in this broad, ambiguous, highly technical field where you feel like you have a handle on it. But it is worth pouring your heart and soul into it because BI marks the beginning of how humans and machines will begin to relate more in a partnership than as human/tool. I wrote a little article on this over the recent Holiday season titled, The Socialization of Business Intelligence.

Helpful Stuff:

Business Modeling and Data Mining, Dorian Pyle – This is the best book out there for teaching you why data mining is such a valuable thing as well as describing how to attack a data mining engagement. I put this book up there with Ralph Kimball’s classic The Data Warehouse Lifecycle Toolkit (this book’s Data Warehousing counterpart) and the late, great Ken Henderson’s The Guru's Guide to SQL Server Architecture and Internals (the relational database counterpart). So this book is best for experienced BI consultants who wish to branch into Data Mining.

Super Crunchers: Why Thinking-By-Numbers is the New Way to be Smart, Ian Ayres – This is an excellent introduction to the “data mining way of thinking”. If you have no idea what this blog is about, this book will convince you that data mining skills will be imperative. I reviewed this book a couple of years ago when it first appeared.

Statistics for Dummies, Deborah Rumsey – Statistics are the basis for Microsoft’s Data Mining. You must start with a good foundation. In all honesty, I’ve just flipped through this book (since I already have a good enough knowledge of statistics) in order to see if it is sufficient and easily digestible, which it is. There are many books on beginning statistics, and this one seems better than most.

The Black Swan: The Impact of the Highly Improbable, Nassim Taleb – This book will help you to better understand how statistics works in the real world and, equally important, where its limitations lie. I highly recommend this book as a way to further wrap your mind around a data mining mentality.

Predictive Analysis with SQL Server 2008 – This white paper from the SQL Server product team offers a good place to start.

Analysis Services Data Mining Mental Blocks – A little blog I wrote on my personal blog site discussing the mental blocks I see in many folks I encounter as I discuss data mining.

Notes:

1 The picture of the tank commander is a watercolor by my wife, Laurie Asahara. It won 1st Place in Portraits at the 2008 Idaho State Fair.

Published Monday, February 23, 2009 5:29 AM by EugeneA

Eugene Asahara : Why Isn’t Predictive Analytics a Big Thing?

2 comments:

SteveOLAP said...

The answer was: "[Budding practitioners] need to think in terms of probability..." This has daunting repercussions.

Developers don't like QA as it is, yet they're willing to ensure their reports tic and tie to the accounting systems. Imagine the look on their face when they realize that 2+2 only has a *probability* of equaling 4 in the world of DM.

It's the rare individual that can successfully address false positives, build models that maintain themselves (within reason), *and* can do it well enough to avoid getting fired because their system's benefits (are eventually seen to) outweigh inherent limitations.

DM *is* the future of BI, but it's a high-risk/high-reward slog for those who take it half-heartedly.
