Monday, May 1, 2017

Big Data Analytics By Philip Russom - Best Practices Report


Big data analytics is where advanced analytic techniques operate on big data sets. Hence, big data analytics is really about two things—big data and analytics—plus how the two have teamed up to create one of the most profound trends in business intelligence (BI) today. Let’s start by defining advanced analytics, then move on to big data and the combination of the two.

According to a 2009 TDWI survey, 38% of organizations surveyed reported practicing advanced analytics, whereas 85% said they would be practicing it within three years.¹ Why the rush to advanced analytics? First, change is rampant in business, as seen in the multiple “economies” we’ve gone through in recent years. Analytics helps us discover what has changed and how we should react. Second, as we crawl out of the recession and into the recovery, there are more and more business opportunities that should be seized. To that end, advanced analytics is the best way to discover new customer segments, identify the best suppliers, associate products of affinity, understand sales seasonality, and so on. For these reasons, TDWI has seen a steady stream of user organizations implementing analytics in recent years.

The rush to analytics means that many organizations are embracing advanced analytics for the first time, and hence are confused about how to go about it. Even if you have related experience in data warehousing, reporting, and online analytic processing (OLAP), you’ll find that the business and technical requirements are different for advanced forms of analytics. To help user organizations select the right form of analytics and prepare big data for analysis, this report will discuss new options for advanced analytics and analytic databases for big data so that users can make intelligent decisions as they embrace analytics.

Note that user organizations are implementing specific forms of analytics, particularly what is sometimes called advanced analytics. This is a collection of related techniques and tool types, usually including predictive analytics, data mining, statistical analysis, and complex SQL. We might also extend the list to cover data visualization, artificial intelligence, natural language processing, and database capabilities that support analytics (such as MapReduce, in-database analytics, in-memory databases, and columnar data stores).

Instead of “advanced analytics,” a better term would be “discovery analytics,” because that’s what users are trying to accomplish. (Some people call it “exploratory analytics.”) In other words, with big data analytics, the user is typically a business analyst who is trying to discover new business facts that no one in the enterprise knew before. To do that, the analyst needs large volumes of data with plenty of detail. This is often data that the enterprise has not yet tapped for analytics.

For example, in the middle of the recent economic recession, companies were constantly being hit by new forms of customer churn. To discover the root cause of the newest form of churn, a business analyst would grab several terabytes of detailed data drawn from operational applications to get a view of recent customer behaviors. The analyst might mix that data with historic data from a data warehouse. Dozens of queries later, the analyst would discover a new churn behavior in a subset of the customer base. With any luck, that discovery would lead to a metric, report, analytic model, or some other product of BI, through which the company could track and predict the new form of churn.
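
The churn scenario above can be illustrated with a very small sketch. Everything in it is hypothetical and chosen only for illustration: the record layout, the segment names, the baseline rates, the function names, and the `min_increase` threshold are assumptions, not anything taken from the report. It compares recent churn rates computed from operational data against historic rates from a warehouse and flags the segments where churn has risen sharply.

```python
from collections import defaultdict

# Hypothetical recent records from an operational application:
# (customer_id, segment, churned_recently)
recent = [
    (1, "prepaid", True),
    (2, "prepaid", True),
    (3, "prepaid", False),
    (4, "contract", False),
    (5, "contract", True),
    (6, "contract", False),
]

# Hypothetical historic churn rates per segment, as a data warehouse
# might report them.
historic_rate = {"prepaid": 0.30, "contract": 0.30}


def churn_rate_by_segment(records):
    """Return {segment: churned / total} for the given records."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for _customer_id, segment, has_churned in records:
        totals[segment] += 1
        if has_churned:
            churned[segment] += 1
    return {seg: churned[seg] / totals[seg] for seg in totals}


def segments_with_new_churn(records, baseline, min_increase=0.2):
    """Flag segments whose recent churn rate exceeds the historic
    baseline by at least min_increase (an assumed threshold)."""
    recent_rates = churn_rate_by_segment(records)
    return sorted(
        seg
        for seg, rate in recent_rates.items()
        if rate - baseline.get(seg, 0.0) >= min_increase
    )


flagged = segments_with_new_churn(recent, historic_rate)
print(flagged)
```

In practice the analyst would run many such comparisons over terabytes of detail, most likely as SQL queries or in a dedicated analytic tool rather than over in-memory lists. The sketch only shows the shape of the comparison between recent and historic behavior.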

Discovery analytics against big data can be enabled by different types of analytic tools, including those based on SQL queries, data mining, statistical analysis, fact clustering, data visualization, natural language processing, text analytics, artificial intelligence, and so on. It’s quite an arsenal of tool types, and savvy users get to know their analytic requirements before deciding which tool type is appropriate to their needs. All these techniques have been around for years, many of them appearing in the 1990s. The difference today is that far more user organizations are actually using them. That’s because most of these techniques adapt well to very large, multi-terabyte data sets with minimal data preparation. That brings us to big data.