The four realms of analytics (descriptive, diagnostic, predictive, and prescriptive) can be organized along two dimensions: one running from rules-based to probability-based, and one of time (past and future). This simple two-by-two matrix offers a powerful framework for organizing and describing the differences between analytical processes. While the four realms are often cited, they are seldom presented without considerable confusion about the distinctions between them. Rather than relying on dictionary definitions and unspecified connotations, this simple framework is offered as a way to communicate different types of analytics to lay audiences.
We do a lot of work introducing the field of “analytics” to broad business audiences. Almost everyone acknowledges that businesses will have to be more “analytically adept” in the future. That is, as more and more business functions are informed and improved through the use of analytic data processing, more people within an organization will be involved in producing analytics. As more data is presented to executives, managers, and employees throughout organizations, the minimum level of skill required to understand and consume analytics keeps rising. The term “analytics” does not mean the same thing to all people (particularly those who are involved in the field!). I’m a firm believer that the vast majority of people already have decent analytic capabilities and competencies; they just need some organization for their thoughts and understandings. Organization helps in several ways. It helps prevent an intuition that works well in one situation from being misapplied to another where the same idea is unhelpful or even harmful. It also helps people talk about analytic methods and applications. Many times people understand the ideas of analytics (even when they may not understand the algorithms and math) but get frustrated when communicating with others. In short, I’m interested in developing frameworks that help organize and communicate the broad range of analytic processes and uses to business audiences.
I fully accept that all frameworks or models lose information in their attempt to simplify, so I don’t see these frameworks as being perfectly consistent, just that they’re useful. I also recognize a lot of very similar work has been done by others previously and that I’ve benefitted from reading others’ work on analytics frameworks. I’d like to know if I’ve exactly, precisely replicated someone else’s frameworks unconsciously.
The first framework I call “The Four Realms of Analytics”. The idea is to take a standard, broad categorization of analytic processes and organize them along two dimensions. The categorization has four members: Descriptive Analytics, Diagnostic Analytics, Predictive Analytics, and Prescriptive Analytics. The X axis is time and can be read as a timeline extending into the distant past on the left and into the distant future on the right. The Y axis is a bit more conceptual, but is separated into two general realms: that of conditional statements or “rules” at the bottom, and that of likelihoods and probability distributions at the top. At the bottom, simple “if-then” statements apply. As one moves up the axis, the conditions become more complex and represent distribution functions rather than discrete assignments or delineations.
One point to make explicit is that the four cases are well established in analytics literature. They are placed in their “most logical” position according to how they are generally understood; the lines between the quadrants are not perceived as boundaries restricting the definitions of the four cases. For example, certain descriptive analytic techniques may be used to help describe or characterize potential outcome states in a predictive analytic process. Likewise, a certain degree of the predictive analytics realm is included in the prescriptive analytics realm as algorithms are applied and outcome rules are developed.
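As a compact way to see the layout, the two axes can be written down as a small lookup table. This is only an illustrative sketch: the names `Time`, `Basis`, `REALMS`, and `realm_for` are my own labels for this example, not part of any established library or standard.

```python
from enum import Enum


class Time(Enum):
    """X axis: which direction in time the analysis looks."""
    PAST = "past"
    FUTURE = "future"


class Basis(Enum):
    """Y axis: rules at the bottom, probabilities at the top."""
    RULES = "rules"
    PROBABILITY = "probability"


# The two-by-two matrix: each (time, basis) pair maps to one realm.
REALMS = {
    (Time.PAST, Basis.RULES): "Descriptive Analytics",
    (Time.PAST, Basis.PROBABILITY): "Diagnostic Analytics",
    (Time.FUTURE, Basis.PROBABILITY): "Predictive Analytics",
    (Time.FUTURE, Basis.RULES): "Prescriptive Analytics",
}


def realm_for(time: Time, basis: Basis) -> str:
    """Return the realm that sits in the given quadrant."""
    return REALMS[(time, basis)]


print(realm_for(Time.PAST, Basis.PROBABILITY))
```

The lookup treats the quadrants as crisp cells purely for convenience; as noted above, the lines between them are not meant to be hard boundaries.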
The lower-left quadrant, representing the intersection of rules-based systems and past time frames, is labeled “Descriptive Analytics”. Descriptive Analytics is all about making “what” statements regarding the data set being analyzed. Descriptive Analytics typically involves “known facts” and their derived measures and is focused on describing features and characteristics of the data set. Because in this context past events and comparative states are “set by time” and are thus determined and knowable, there is no uncertainty of the kind that attaches to potential future states. Generally speaking, the purpose of Descriptive Analytics is to characterize data elements through similarity and differentiation comparisons to one another and to produce summary statements that are shorter and denser than the full set of data elements. While most traditional business intelligence reporting falls into the realm of Descriptive Analytics, complex and sophisticated analytic techniques also fall into this realm when their purpose is to describe or characterize past events and states. Summary statistics, clustering techniques, and association rules used in market basket analysis are all examples of Descriptive Analytics.
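To make the idea of a “summary statement that is shorter and denser than the data” concrete, here is a minimal sketch using Python's standard `statistics` module. The `order_totals` values are made up for illustration.

```python
import statistics

# Hypothetical past order totals (known facts, already "set by time").
order_totals = [120.0, 85.5, 230.0, 99.0, 150.5]

# A descriptive summary: a few numbers standing in for the full data set.
summary = {
    "count": len(order_totals),
    "mean": statistics.mean(order_totals),
    "median": statistics.median(order_totals),
    "min": min(order_totals),
    "max": max(order_totals),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Nothing here is uncertain or forward looking: every number is a “what” statement about data that already exists.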
The upper-left quadrant, representing the intersection of probability-based systems and past time frames, is labeled “Diagnostic Analytics”. Diagnostic Analytics is all about making “why” statements. Think of causal inference and the comparative effect of different variables on a particular known outcome. While diagnostic analytic processes often include a substantial amount of description and characterization of data elements, their primary use is to develop insights on the sequence and comparative outcomes of past events and states. While Descriptive Analytics might be concerned with describing how large or significant a particular outcome was, Diagnostic Analytics will be more focused on determining what factors and events contributed to the outcome. Much of Diagnostic Analytics lives in the area of probabilities, likelihoods, and the distribution of outcomes. As more and more cases are included in a particular analysis and more and more factors or dimensions are included, it may be impossible to make precise, limited statements regarding sequences and outcomes. Contradictory cases, data sparseness, missing factors (the “unknown unknowns”), and data sampling and preparation techniques all contribute to uncertainty and the need to qualify conclusions in Diagnostic Analytics as occurring in a “probability space”. Training algorithms for classification and regression techniques can be seen as falling into this space since they combine the analysis of past events and states with probability distributions (although they are often not included in definitions of diagnostic analytics). Examples of Diagnostic Analytics include attribute importance, principal component analysis, sensitivity analysis, and conjoint analysis.
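A very rough illustration of the “why” question is to rank candidate factors by how strongly each is associated with a known past outcome. The sketch below uses plain Pearson correlation as a crude stand-in for attribute importance; it is an assumption chosen for brevity, it does not establish causation, and the `sales`, `ad_spend`, and `discount` figures are invented.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical known outcome and candidate explanatory factors.
sales = [10, 12, 15, 19, 24]
factors = {
    "ad_spend": [1, 2, 3, 4, 5],
    "discount": [5, 3, 6, 2, 7],
}

# Rank factors by the absolute strength of their association with sales.
ranked = sorted(
    ((name, pearson(values, sales)) for name, values in factors.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)

for name, r in ranked:
    print(f"{name}: r = {r:.2f}")
```

With only five observations the ranking is itself uncertain, which is exactly why conclusions in this realm need to be qualified as living in a “probability space”.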
The lower-right quadrant is labeled “Prescriptive Analytics”. In this framework, Prescriptive Analytics is all about automated future actions or decisions which are defined programmatically through an analytic process. Our definition of Prescriptive Analytics here may be somewhat different from that in other frameworks. The emphasis in this framework is on defined future responses or actions and the rules that specify which actions to take. While the justifications or analyses may be uncertain and come from the realm of Diagnostic Analytics, the result set is defined. One can think of Prescriptive Analytics as the natural realm of automated decision making. While some frameworks show a progression with Prescriptive Analytics being the “top” or most evolved form of analytics, I think this misses an important point about discrete decision making. Most decisions are binary or non-continuous. A purchase from a supplier is made or it is not. A discount is offered to a customer or it is not. Prescriptive Analytics is mostly focused on these actions, and not as much on the contextual explanation of predictions (which is more in the realm of Predictive Analytics in this framework). If you’re focused exclusively on what the predictions are and are interested in automating them, you’re in the realm of Prescriptive Analytics. If you’re focused on confidence intervals, sources of error, attribute importance, and other descriptions or characterizations of predictions, you’re in the realm of Predictive Analytics. While simple threshold-based “if-then” statements are included in Prescriptive Analytics, highly sophisticated algorithms such as neural nets are also typically in the realm of Prescriptive Analytics because they are focused on making a specific prediction. Other examples of Prescriptive Analytics include recommendation engines, next best offer analysis, queueing analyses with automated assignment systems, and most operations research optimization analyses.
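The simplest member of this realm, a threshold-based “if-then” rule that turns a score into a discrete action, might look like the sketch below. The function name `decide_discount`, the 0.6 threshold, and the customer scores are all hypothetical choices made for this example.

```python
def decide_discount(churn_probability: float, threshold: float = 0.6) -> str:
    """Turn a score into one of two discrete actions.

    The threshold is an assumed business rule, not a recommended value.
    """
    if churn_probability >= threshold:
        return "offer_discount"
    return "no_discount"


# Hypothetical scores produced upstream by some model.
scores = {"cust_001": 0.82, "cust_002": 0.35, "cust_003": 0.60}

# The result set is fully defined: every customer gets exactly one action.
decisions = {cid: decide_discount(p) for cid, p in scores.items()}

for cid, action in decisions.items():
    print(cid, action)
```

Note that the rule says nothing about how confident the upstream score is. The score may be uncertain, but the action is not: the discount is offered or it is not.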
The upper-right quadrant of Predictive Analytics is relatively straightforward. This group is all about understanding predictions based on quantitative analyses of data sets. This is the traditional realm of “predictive modeling” and statistical evaluation of those models. It also includes most algorithms and processes that don’t clearly fit into one of the other realms. In Predictive Analytics, we evaluate the set of predictions and results and the distribution of potential inputs and outcomes rather than being focused on discrete, individual assignments (which is really more the realm of Prescriptive Analytics). In a sense, Predictive Analytics encompasses a broad range of “understanding and characterizing predictions under the conditions of uncertainty” rather than “making decisions under the conditions of uncertainty”. The process of comparing predictive models and assessing their “goodness” belongs in the realm of Predictive Analytics. Confidence intervals, t-statistics, AIC, K-S test statistics, p-values, and the like all typically belong to Predictive Analytics as they are involved in characterizing predictive models. If Prescriptive Analytics is mostly about “what to do”, Predictive Analytics is mostly about “how do we understand predictive models and characterize and describe future events and states?” Examples of Predictive Analytics include classification models, regression models, Monte Carlo analysis, random forest models, and Bayesian analyses.
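A small Monte Carlo sketch shows what it means to characterize a prediction as a distribution rather than a single assignment. Everything specific here is an assumption for illustration: the function name `simulate_quarterly_demand`, the normal distribution with mean 100 and standard deviation 15 per month, and the choice of a central 90% range.

```python
import random
import statistics


def simulate_quarterly_demand(trials: int = 10_000, seed: int = 42) -> list:
    """Simulate total demand over three months, many times over.

    Monthly demand is assumed to be normal(mean=100, sd=15); this is a
    made-up input distribution, not an estimate from real data.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same draws
    return [
        sum(rng.gauss(100, 15) for _ in range(3))
        for _ in range(trials)
    ]


totals = sorted(simulate_quarterly_demand())

# Describe the distribution of outcomes instead of picking one number.
mean_demand = statistics.mean(totals)
low = totals[int(0.05 * len(totals))]    # approximate 5th percentile
high = totals[int(0.95 * len(totals))]   # approximate 95th percentile

print(f"mean: {mean_demand:.1f}")
print(f"central 90% of simulated outcomes: {low:.1f} to {high:.1f}")
```

The output is a description of uncertainty. Deciding what to do with it, for example how much stock to order, would be a separate prescriptive step.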
In a sense, the emphasis of the top left and the bottom right (Diagnostic and Prescriptive Analytics) is on actions and decisions (past and future), while the emphasis of the bottom left and top right is on characterizations and descriptions (past and future). Developing predictive models (the “training” and “testing” phases of a model) falls into the realm of Diagnostic Analytics, while the application of a previously developed model falls into the realms of both Predictive Analytics and Prescriptive Analytics. The evaluation of results often falls into the realm of Descriptive Analytics. Asking “what is the motivation for performing analytic processes?”, in combination with either a backward-looking or a forward-looking emphasis, helps bring the business goal or objective front and center. It should be noted that while decision making is often at the center of the motivation (as it is in the Prescriptive Analytics realm), it isn’t always the immediate motivation. Sometimes improved understanding and insight, along with pattern detection and differentiation, is the immediate need. Other times we’re more interested in having an automated process for making a prediction and less interested in understanding the model behind it.
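The hand-off between realms can be sketched with a deliberately tiny model: fit a straight line to past data (the training step, which this framework places in Diagnostic Analytics), apply it to a future period (model application), and then turn the result into a discrete action (Prescriptive Analytics). The data, the helper `fit_line`, and the reorder threshold of 90 units are all hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Training on past events and states (backward looking).
past_weeks = [1, 2, 3, 4]
past_demand = [20.0, 40.0, 60.0, 80.0]
slope, intercept = fit_line(past_weeks, past_demand)

# Applying the previously developed model to a future period.
predicted_demand = slope * 5 + intercept

# A discrete, automated decision based on the prediction.
reorder = predicted_demand > 90  # assumed threshold for illustration

print(f"predicted demand for week 5: {predicted_demand:.1f}")
print("reorder" if reorder else "do not reorder")
```

The same three lines of arithmetic serve different motivations depending on where you stop: understanding the past relationship, characterizing the forecast, or simply automating the reorder.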
We often see business people who are just starting to learn about analytics get wrapped up in processes and algorithms and lose focus on the key issue of motivation. In a similar vein, it isn’t unusual for experienced data scientist/analytic developer types to get lost in what are admittedly complex and intricate processes and forget to place the results of their analyses within the context of a desired result.
Descriptive Analytics:
- Backward looking
- Focused on descriptions and comparisons
- Category description development based on similarities and differences
- Discrete assignment of individual data set members based on similarities and differences
- Pattern detection and description
- Characterization of data groups and elements
- MECE (mutually exclusive and collectively exhaustive) categorizations
Diagnostic Analytics:
- Backward looking
- Focused on causal relationships and sequences
- Relative ranking of dimensions/variables based on inferred explanatory power
- Target/dependent variable with independent variables/dimensions
- Pattern detection and diagnoses
- Inference and likelihood based
- Model training and testing
- Includes both frequentist and Bayesian causal inferential analyses
Predictive Analytics:
- Forward looking
- Focused on non-discrete predictions of future states, relationships, and patterns
- Description of prediction result set probability distributions and likelihoods
- Error sources, estimates, and bounds are important contextual outputs
- Model application
- Non-discrete forecasting (forecasts communicated in probability distributions)
Prescriptive Analytics:
- Forward looking
- Focused on optimal decisions for future situations
- Simple rules to complex models that are applied on an automated or programmatic basis
- Discrete prediction of individual data set members based on similarities and differences
- Optimization and decision rules for future events