Business Intelligence (BI) On The Cheap: Getting Valuable Business Insights on a Shoestring Budget
© by Mike Robinson
“Business Intelligence (BI)” is the holy-grail of the statistics world, and a considerable amount of its practitioners' bread-and-butter. But if you've ever looked into it for your own company, you'll have found that it's some pretty expensive bread! “Dashboards” and other lovely displays can give up-to-the-minute information to “the key executives of your Fortune-500™ business,” but ... what about your business? How can you get useful information about your own customers, and your own business practices, without mortgaging the farm to do it? The answer is: it's easier than you think.
Getting Started (On the Right Foot):
Start by asking yourself some basic but very-important questions:
- What do I actually need to know? In other words, “if I had ‘the right answer’ in my hands right now, how would I recognize it and what would I do with it?”
- What information do I have, readily available to me? Big-gun BI practitioners talk about “the data warehouse,” in which all the useful information about a business is supposedly gathered so it can be magically turned into “intelligence.” You don't have a warehouse, but you do have information out there that you can get your hands on ... even if you don't yet know how to get it. (That comes later.)
- How “good,” and how consistent, is that data? When people type-in information, there's a lot of variation to it. Computers don't deal well with differences: they can't even tell that “Smith, John” and “Smith John” refer to the same person. Such differences would have to be handled more-or-less manually by recoding, whereas any sort of common identifier such as a customer-number might not require it. The less “handling” your information-source requires, the better.
- How far back in time can I reasonably and usefully go? Businesses change. Customers come and go. If your study goes too-far back or fails to consider the impact of a business change or seasonal variations, you might be comparing apples to oranges.
- What type of data is it? Statisticians recognize that there are fundamentally several different types of information (for example ...):
- “Pass/Fail” information: “either the bag leaked, or it didn't.”
- “Categorical” information: Characteristics that divide things up into usefully distinct groups, but that don't say anything useful about the groups themselves.
- “Rank” information: Categorical information that does say something useful about the groups, i.e. that one group is “better than” another, but the terms used to make these distinctions don't say anything numerically about how much better.
- “Measurement” data: “how much does it weigh?” Information of this sort falls on a continuous scale, and the value of numerical difference between any two measurements has a distinct and clearly-defined meaning.
- How much data do I have? “The ubiquitous spreadsheet” is not designed to manage large amounts of data, nor to deal with anything that doesn't easily fall into a square grid of moderate size. You'll need to use tools that can handle, and that can randomly sample from, the volume of data that you do have.
- Why and how do I expect to obtain benefits; how quickly and at what cost? Obviously, when you are setting-out on a Business Intelligence study, to some degree you don't know what you're going to come up with and how it might turn out to be beneficial. “That's why a ship sets sail.” But you should have some firm idea of what benefits you expect to receive, and of what benefits would be most ... well, beneficial. Generally speaking, a well-thought-out series of small, focused studies will be much more useful than a “grand poohbah project” that seeks to be All Things To All Men.
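Two of the questions above (how consistent is the data, and how much of it is there) lend themselves to a small sketch. What follows is only an illustration in Python using nothing beyond the standard library; the function names normalize_name and reservoir_sample are my own inventions for this example, and a real recoding job will need rules that fit your own data.

```python
import random
import re


def normalize_name(raw: str) -> str:
    """Collapse case, punctuation, and spacing so near-duplicates compare equal."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", raw.lower())  # punctuation becomes a space
    return " ".join(cleaned.split())  # squeeze runs of whitespace


def reservoir_sample(rows, k, seed=0):
    """Pick k rows uniformly at random from an iterable of unknown length.

    Only k rows are held in memory at once, so the source can be far
    larger than anything a spreadsheet would open.
    """
    rng = random.Random(seed)  # fixed seed (an assumption) keeps runs repeatable
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)  # fill the reservoir first
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = row  # replace an earlier pick
    return sample


print(normalize_name("Smith, John") == normalize_name("Smith  John"))  # True
print(len(reservoir_sample(range(100_000), 5)))  # 5
```

A normalized key like this is a stop-gap: where a customer-number exists, use it instead. The sampler accepts any iterable, so it can be fed the lines of a file too large to load whole.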
The general approach of a statistical study:
Any application of statistics is a procedure that has a somewhat-uncertain outcome. You're setting-out to find something out, but it's a voyage-of-discovery, not only about the answers that you seek, but also about the study itself. A study begins with these conceptual steps:
- The formulation of a hypothesis: A hypothesis is something that you think, or assert, to be true; a relationship that you think exists and that you think will be useful to you. The outcome of the study will be to “confirm or deny” (not “prove or disprove”) that hypothesis.
- Deciding upon a method: This is the approach that you will take, based on the information you have available to you (and the qualities of that data), in your quest to confirm or to deny the hypothesis.
- Validating the method: Your chosen method is based upon certain “assumptions,” both about the data and about the method itself, which may or may not actually turn out to be valid. If they are not, then the results you obtain also won't be valid. For example, if you've decided to use a measurement test that presupposes the existence of “a normal distribution,” you must determine whether the data actually exhibits that distribution! If it doesn't, your results will be garbage.
- Conducting the study: If the method is validated, the study should be conducted several times and the results compared for consistency. Fortunately, when you are dealing with historical data that's already available to you, this is fairly easy to do.
- “I'm from Missouri... Show Me™” : Now that you've got these spanking-new results in front of you, cast your most-critical, most-skeptical eye upon them. Try your best to prove that they're garbage... that some mistake was made... that there's some other explanation... that some other unwanted and unforeseen influence is at work to “skew” these results. Nothing about the results themselves will really tell you that. What you're doing here is a careful and critical re-evaluation of the entire process, seeking to affirm or deny your reasons to have confidence in it.
- “Rinse and Repeat” : The outcome of one study will determine and influence the subsequent studies that you do. This is both normal and desirable. Once you have accepted the results of your study, carefully evaluate them both in terms of what they might mean to your business and in terms of how they might influence other studies. A series of short, focused studies is generally more useful than one that is more complex and difficult: hence the dubious practical value (in my view) of “dashboards.”
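The “validating the method” step can be made concrete with a rough screen for the normal-distribution assumption. This is a sketch only, in standard-library Python: the function names are hypothetical, the cut-off of 1.0 is an arbitrary assumption of mine, and a proper statistics package will offer formal normality tests that should be preferred for real work.

```python
import statistics


def skew_and_excess_kurtosis(values):
    """Return (skewness, excess kurtosis); both sit near 0 for normal data."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def looks_roughly_normal(values, limit=1.0):
    """Crude screen, not a formal test; the limit is an assumed rule of thumb."""
    skew, kurt = skew_and_excess_kurtosis(values)
    return abs(skew) < limit and abs(kurt) < limit


# Hypothetical figures: a symmetric sample, and one with a single huge invoice.
symmetric = [2, 3, 3, 4, 4, 4, 5, 5, 6]
lopsided = [10, 12, 11, 13, 12, 11, 10, 250]

print(looks_roughly_normal(symmetric))  # True
print(looks_roughly_normal(lopsided))   # False
```

If the screen fails, that is a signal to reach for a method that does not presuppose normality, not a signal to press on regardless.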
Not every study that you perform will turn out to be as valid or as useful as you expected it to be, or to produce the clear and consistent results that you expected. In statistics, though, these “failures” can be far more thought-provoking and therefore useful than any “success.” After all, if you really had a firm idea that all of your assumptions about your business were correct, you would have very little reason to be conducting a statistical study about them. “Fine-tuning” an existing business process is of-course useful to a degree, but discovering the unexpected is far more valuable.
Tricks of the trade: explaining “unexplained variations”
Probably the most-frequent outcome of a study is that you find that there are no clear correlations to be found. You find that you don't have an explanation for the many variations that you see.
What's usually happening here is that you've got a basket full of apples and oranges and grapefruits. All three fruits in the same basket, each one exhibiting its own distinct characteristics but producing a cacophony of muddled results when all of them are taken together. Trouble is, the fruits are painted identically and you can't readily tell them apart. You need to discover the meaningful and measurable differences between the various hidden sub-groups in the sample you are measuring, select the most useful and distinct difference(s) for grouping purposes, then finally, divide and measure each group distinct from the others. When you do this successfully, each group will exhibit its own distinct characteristics and the once-cacophonous original outcome will be explainable as the product of the characteristics of these groups.
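To see the apples-and-oranges effect in miniature, consider this sketch in standard-library Python. The order records and the customer_type grouping are made-up illustrations, not data from any real business: pooled together the values look wildly scattered, while each sub-group on its own is quite tight.

```python
import statistics
from collections import defaultdict

# Hypothetical order records: (customer_type, order_value)
orders = [
    ("retail", 20), ("retail", 22), ("retail", 19), ("retail", 21),
    ("wholesale", 200), ("wholesale", 210), ("wholesale", 195), ("wholesale", 205),
]


def spread_by_group(records):
    """Return {label: (mean, sample standard deviation)} for each group."""
    groups = defaultdict(list)
    for label, value in records:
        groups[label].append(value)
    return {
        label: (statistics.fmean(vals), statistics.stdev(vals))
        for label, vals in groups.items()
    }


pooled = [value for _, value in orders]
print("pooled:", statistics.fmean(pooled), round(statistics.stdev(pooled), 1))
for label, (mean, sd) in spread_by_group(orders).items():
    print(label, mean, round(sd, 1))
```

The hard part in practice is not this arithmetic but discovering which characteristic to group by, since nothing in the raw data announces it.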
This “voyage of discovery” is ruled by tests of independence, normally found in the crosstab function of a statistical software package. You proceed by “slicing” the data-sample in various ways, looking for slices in which the members of each group behave alike and the groups themselves behave differently from one another. Crosstab displays usually offer several different commonly-used measurements of independence, but each one is designed for only certain types of data ... categorical, measurement, and so-on. You must consider only the measurements that are useful to your type of data and ignore the others.
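One of the measurements a crosstab typically reports for categorical data is the chi-square statistic. The sketch below, in standard-library Python, computes it by hand so you can see what the package is doing; the function name and the table of counts are hypothetical, and a real package will also give you the p-value and warn you when the expected counts are too small for the test to be trusted.

```python
def chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


# Hypothetical crosstab: rows are two sales regions,
# columns are "repeat buyer" versus "one-time buyer".
counts = [[30, 10],
          [10, 30]]

stat, dof = chi_square(counts)
print(stat, dof)  # 20.0 1

# With 1 degree of freedom, 3.841 is the usual critical value at the 5% level.
print(stat > 3.841)  # True
```

A large statistic says only that region and repeat-buying are not independent in this sample. It says nothing about why, and nothing about how strong the relationship is.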
In particular, be wary of the common-sense term, “correlation.” Correlation is not independence, and independence is not correlation. The classical tests for correlation carry the prerequisite assumption of a normal distribution, which usually does not hold. Furthermore, when considering non-parametric tests of independence, pay very close attention to exactly what you are considering and how you are measuring. If you unwittingly “ask a loaded question,” you're going to get a nonsensical answer.
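The gap between a classical correlation measure and a rank-based (non-parametric) one is easy to demonstrate. This is a sketch in standard-library Python with invented figures; pearson, average_ranks, and spearman are names I chose for the example, and a statistics package will supply tested versions of both measures.

```python
import statistics


def pearson(xs, ys):
    """Classical (linear) correlation coefficient."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def average_ranks(values):
    """Rank from 1 upward; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of the 1-based positions
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Rank correlation: Pearson's formula applied to the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical data: y always rises with x, but not in a straight line.
xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 3 for x in xs]

print(round(pearson(xs, ys), 2))   # about 0.94: understates the relationship
print(round(spearman(xs, ys), 2))  # 1.0: the ordering agrees perfectly
```

Neither number is “the” answer. Each answers a different question, which is exactly why you must know which question you are asking before you read the output.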
Tricks of the trade: don't get lost in “a garden of goodies”
A statistics-package can offer to tell you just-about anything, and can turn it into a beautiful chart or graph in seconds. But all of the tools and techniques in such a package are actually very special-purpose. There are so-many of them for the same reason that a Swiss Army® Knife has so many blades. Any time that you embark upon a study, it is vital that you know exactly where you're going and why you're using a particular tool, and that you ignore all the others.
Remember the line that Mark Twain popularized (he credited it to Disraeli) about “lies, damned lies, and statistics.” Far more to be feared than someone setting-out to intentionally deceive you is the risk that you might unintentionally deceive yourself. Wrong or misleading answers look no different than useful, valid ones.
Tools of the trade: recommended packages “on the cheap”
Any time that you're setting-out on a statistical study, it's important to buy the right tools for the job. You need to buy tools that are known to produce the right answers for any particular algorithm that you intend to use. If you find yourself butting-up against the limitations of the wrong toolset, you're just wasting your time.
Microsoft Excel®, for instance, really isn't designed to be a statistical package, even though there are statistical plug-ins available for it.
The two packages that we use most-often are SAS® and SPSS®. The latter one is considerably less-expensive to get into: you can actually buy a copy of their base system without talking to a salesman first. Both of these are general-purpose tools with a myriad of special-purpose “add-ins” built around a central “base.”
Some business disciplines have industry-specific tools that are in widespread use, and if you are in one of those industries (but on a small scale) it still may pay to use them. It's very important, as you first begin your foray into “business intelligence on the cheap,” that you identify all the obstacles in your path and work-out a strategy to overcome them with the least amount of effort overall.
Actum Ne Agas: Do Not Do A Thing Already Done
A good practitioner knows, or learns, how to leverage all the tools that are at hand, including general-purpose programming languages like Visual Basic® or Perl®, but always seeks to do so efficiently. You're probably not really saving any money if you are re-inventing the wheel. But you're also not saving money if you've just bought a dump-truck to do the work of a wheelbarrow.
The Journey is the Reward:
Business Intelligence is not a “finding.” It's a process. When you undertake it in a serious and a useful way, you'll find that it changes your business because it changes your perspectives on your business. Developing a perspective that encompasses more than the financially-obligatory “bottom line” is Always A Good Thing, and this is probably the most-compelling benefit of Business Intelligence.