Thursday, 20 August 2015

Predictive Analytics with Big Data - Challenges & solutions


Business analytics and Big Data help improve Customer Experience (precise customer segmentation, interaction and servicing, leading to increased loyalty and retention), Operational Efficiency (increased transparency, resource optimization, process quality and performance) and the development of new business models (expanding existing revenue streams or generating new ones).

Data becomes big data when its variety, volume or velocity exceeds the ability of traditional IT systems to ingest, store, analyze and process it. Big data often requires both a technical and a cultural change.

Traditional tools work on enterprise data captured in data warehouses. Additional statistical analysis, data mining, text mining and predictive analytics now usually take place on separate, dedicated servers. Exporting the data and creating copies on these external servers is time-consuming and becomes infeasible when data volumes grow too large.

So let us talk about Prescriptive Analytics in relation to Big Data.

Available tools make predictive analysis increasingly manageable; it is no longer the exclusive domain of data scientists. One-click predictive modelling automatically runs a series of algorithms on the data and selects the one with the highest accuracy.
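The idea behind one-click modelling can be sketched in a few lines of plain Python. This is a minimal illustration, not how any particular product works: the three toy algorithms (a majority-class baseline, nearest centroid and 1-nearest-neighbour), the names `CANDIDATES` and `one_click`, and the churn data are all hypothetical. Each candidate is fitted on training rows, scored on held-out rows, and the most accurate one wins.

```python
from collections import Counter
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

Features = Tuple[float, ...]
Row = Tuple[Features, str]          # (feature vector, label)
Model = Callable[[Features], str]   # a fitted model maps features to a label


def sq_dist(a: Features, b: Features) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def fit_majority(train: Sequence[Row]) -> Model:
    """Baseline: always predict the most frequent training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def fit_nearest_centroid(train: Sequence[Row]) -> Model:
    """Predict the label whose class mean (centroid) lies closest to x."""
    groups: Dict[str, List[Features]] = {}
    for x, y in train:
        groups.setdefault(y, []).append(x)
    centroids = {y: tuple(mean(col) for col in zip(*xs)) for y, xs in groups.items()}
    return lambda x: min(centroids, key=lambda y: sq_dist(x, centroids[y]))


def fit_one_nn(train: Sequence[Row]) -> Model:
    """Predict the label of the single closest training row (1-nearest neighbour)."""
    return lambda x: min(train, key=lambda row: sq_dist(x, row[0]))[1]


def accuracy(model: Model, test: Sequence[Row]) -> float:
    """Share of held-out rows whose label the model predicts correctly."""
    return sum(model(x) == y for x, y in test) / len(test)


# Hypothetical candidate list; commercial tools try many more algorithms.
CANDIDATES = {"majority": fit_majority, "nearest_centroid": fit_nearest_centroid, "one_nn": fit_one_nn}


def one_click(train: Sequence[Row], test: Sequence[Row]) -> Tuple[str, Dict[str, float]]:
    """Fit every candidate on train, score it on test, return the best name and all scores."""
    scores = {name: accuracy(fit(train), test) for name, fit in CANDIDATES.items()}
    best = max(scores, key=scores.get)  # ties go to the earlier candidate
    return best, scores


# Toy churn data with two features per customer (hypothetical).
TRAIN = [((0.0, 0.0), "churn"), ((1.0, 0.0), "churn"), ((0.0, 1.0), "churn"),
         ((5.0, 5.0), "stay"), ((6.0, 5.0), "stay"), ((5.0, 6.0), "stay"), ((6.0, 6.0), "stay")]
TEST = [((0.5, 0.5), "churn"), ((5.5, 5.5), "stay"), ((1.0, 1.0), "churn")]

best, scores = one_click(TRAIN, TEST)
print(best, scores)
```

Here the majority baseline scores 1/3 while the two distance-based models both reach 1.0, so the tie rule picks `nearest_centroid`. Real tools would typically use cross-validation rather than a single small hold-out set.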

A number of challenges still remain, and each calls for a corresponding action:

-          Explore and discover what data you really have and how these data sets relate to each other.

-          Develop insight gradually through a process of experimentation and iteration.

-          Mine the data to discover patterns and relationships

-          Determine how such data relates to the traditional enterprise data

-          Simplify the process to implement and automate the necessary actions.

-          Minimize data movement to conserve computing resources (ETL architectures become less efficient as data volumes grow).

-          Use intuitive discovery and BI tools as well as in-database analytics; use Hadoop for pre-processing data to identify macro-trends and special information (such as out-of-range values).

-          Enable decision making and informed action based on predictive modelling, business rules and self-learning.
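The Hadoop pre-processing idea from the list above follows the MapReduce pattern, which can be simulated locally in plain Python. This sketch does not use any Hadoop API; the `region,amount` record format, the `VALID_MIN`/`VALID_MAX` plausibility range and the function names are hypothetical. The mapper flags out-of-range values, the reducer produces per-region summaries (a simple macro-trend), and only those small summaries would need to leave the cluster, which is the point about minimizing data movement.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical plausibility range for a transaction amount; real limits are domain-specific.
VALID_MIN, VALID_MAX = 0.0, 10_000.0


def mapper(line: str) -> Iterator[Tuple[str, Tuple[float, int]]]:
    """Map step: parse a raw 'region,amount' record, emit (region, (amount, out_of_range_flag))."""
    region, amount_text = line.strip().split(",")
    amount = float(amount_text)
    yield region, (amount, int(not (VALID_MIN <= amount <= VALID_MAX)))


def reducer(region: str, values: Iterable[Tuple[float, int]]) -> Tuple[str, Dict[str, float]]:
    """Reduce step: per region, count records, total the in-range amounts, count out-of-range ones."""
    count, in_range_total, out_of_range = 0, 0.0, 0
    for amount, flag in values:
        count += 1
        out_of_range += flag
        if not flag:
            in_range_total += amount
    return region, {"count": count, "in_range_total": in_range_total, "out_of_range": out_of_range}


def run_job(lines: Iterable[str]) -> Dict[str, Dict[str, float]]:
    """Simulate map -> shuffle/sort -> reduce locally; a cluster would run each phase in parallel."""
    pairs: List[Tuple[str, Tuple[float, int]]] = [p for line in lines for p in mapper(line)]
    pairs.sort(key=itemgetter(0))  # shuffle: bring all values for one key together
    return dict(
        reducer(key, (value for _, value in group))
        for key, group in groupby(pairs, key=itemgetter(0))
    )


raw = ["north,120.0", "south,80.5", "north,-5", "south,99999", "north,30"]
summary = run_job(raw)
print(summary)
```

The negative amount in `north` and the implausibly large one in `south` are counted as out-of-range instead of silently distorting the regional totals.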

A systematic, step-by-step view of the process can help companies:

1.       Identify and gather data relevant to the business goal from a variety of sources: data silos in enterprise applications as well as external sources (social media, public, licensed). Use visualization tools to ease the work.

2.       Prepare the data. Integrate and enrich it into an analytical data set: calculate aggregate fields, merge multiple data sources, fill in missing data, strip extraneous characters, etc.

3.       Build a predictive model using statistical and machine learning algorithms (depending on the type and completeness of the available data and the level of prediction desired). Run the analysis on training data and use the model to predict the test data set.

4.       Evaluate the predictive model to ensure it is effective and accurate: it must predict the test data set well.

5.       Use the model in applications and deliver actionable prescriptions to the business (predict an opportunity or avoid a negative event).

6.       Monitor, improve and update the model (adjust algorithm parameters, add new or more data).
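Steps 2 to 4 can be illustrated end to end with a deliberately tiny example. Everything here is hypothetical: the two sources (a CRM extract with messy IDs and a churn label, and a separate transaction log), the choice of total spend as the only feature, and the one-threshold "decision stump" standing in for a real statistical or machine learning algorithm.

```python
import re
from typing import Dict, List

# Hypothetical source 1: CRM extract with messy customer IDs and a churn label.
CUSTOMERS = [(" C1 ", False), ("C-2", False), ("C3*", True), ("C4", False),
             ("C5 ", True), ("C6", True), ("C#7", False), ("C8", True)]
# Hypothetical source 2: transaction log from a separate system (C5 has no purchases).
TRANSACTIONS = [("C1", 50.0), ("C1", 30.0), ("C2", 120.0), ("C3", 10.0), ("C4", 60.0),
                ("C4", 40.0), ("C6", 5.0), ("C7", 70.0), ("C8", 15.0)]

Row = Dict[str, object]


def prepare(customers, transactions) -> List[Row]:
    """Step 2: aggregate, clean keys, merge the two sources and fill missing values."""
    spend: Dict[str, float] = {}
    for cid, amount in transactions:
        spend[cid] = spend.get(cid, 0.0) + amount       # aggregate field: total spend
    rows = []
    for raw_id, churned in customers:
        cid = re.sub(r"[^A-Za-z0-9]", "", raw_id)        # strip extraneous characters
        rows.append({"id": cid,
                     "spend": spend.get(cid, 0.0),       # merge; no purchases -> fill with 0.0
                     "churned": churned})
    return rows


def predict(threshold: float, row: Row) -> bool:
    """Toy model: flag a customer as a churn risk when total spend is below the threshold."""
    return row["spend"] < threshold


def accuracy(threshold: float, rows: List[Row]) -> float:
    """Share of rows where the predicted churn flag matches the actual one."""
    return sum(predict(threshold, r) == r["churned"] for r in rows) / len(rows)


def fit_threshold(train: List[Row]) -> float:
    """Step 3: choose the spend threshold that classifies the training rows best."""
    candidates = sorted({r["spend"] for r in train})
    return max(candidates, key=lambda t: accuracy(t, train))


rows = prepare(CUSTOMERS, TRANSACTIONS)
train, test = rows[:6], rows[6:]        # simple hold-out split (a real project would shuffle)
threshold = fit_threshold(train)
print("threshold:", threshold)
print("train accuracy:", accuracy(threshold, train))
print("test accuracy:", accuracy(threshold, test))   # Step 4: judge the model on unseen rows
```

The rule fits the training rows perfectly (threshold 80.0) but scores only 0.5 on the hold-out rows, which is exactly the kind of gap step 4 is meant to expose. With just two test rows the number says little on its own; step 6 would add more data and revisit the model.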

There are both proprietary and open source programming tools, and the open source community is a strong driver of predictive analytics. The open source programming language R is widely used across the industry, and API libraries are available in Python, Java and Scala. Many BI providers and integrators (Accenture, Deloitte, Infosys, etc.) already include some predictive analytics capabilities in their offerings.

IBM, SAS and increasingly SAP are the clear leaders in predictive analysis tools.

IBM has the most comprehensive set of tools to build models, conduct analysis and deploy predictive applications, both on-premises and in the cloud. SAS provides data scientists with an all-in-one visualization and predictive analytics solution, integrated with R, Python and Hadoop. Other providers include RapidMiner, Alteryx, Oracle, Alpine Data Labs, Angoss, Dell, FICO, KNIME and Microsoft Azure Machine Learning.

One major problem still remains: much time and effort has to be spent on data preparation (30 to 60 percent) when using data from data warehouses. A main reason is that data is often stored without context, so integrating data from multiple databases becomes very complex. Modelling the context takes another 20 to 30 percent. The following chart explains:



Traditional companies such as SAS, IBM, Oracle and Cognos try to solve the problem by leveraging their computing resources and throwing “brute force” at it.

Another option, used by online retailers and credit card companies, is to build applications that store their own transactional context and then process that data in batch after execution. The difficulties are that data volumes become large and logging becomes difficult (storage, application overhead, etc.), that it is hard to gain value from the data in real time, that significant post-processing still occurs, and that it is often not feasible to enable existing applications to do this.
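The self-logging approach can be sketched as follows. This is a hypothetical illustration, not any retailer's actual design: the event fields (`txn`, `customer`, `step`, `channel`), the step names and the abandoned-cart question are assumptions, and an in-memory `io.StringIO` stands in for durable log storage. The application writes one JSON line of context per event as it runs; a separate batch job later rebuilds each transaction from the log.

```python
import io
import json
from collections import defaultdict
from typing import Dict, List, Set, TextIO


def log_event(log: TextIO, txn_id: str, customer: str, step: str, channel: str) -> None:
    """Called inline by the application: append one JSON line carrying the transaction's context.
    Every call costs a write, which is where the logging overhead comes from."""
    record = {"txn": txn_id, "customer": customer, "step": step, "channel": channel}
    log.write(json.dumps(record) + "\n")


def abandoned_carts(log_text: str) -> List[str]:
    """Batch job run after execution: rebuild each transaction and list customers who never checked out."""
    steps: Dict[str, Set[str]] = defaultdict(set)
    owner: Dict[str, str] = {}
    for line in log_text.splitlines():
        event = json.loads(line)
        steps[event["txn"]].add(event["step"])
        owner[event["txn"]] = event["customer"]
    return sorted(owner[txn] for txn, seen in steps.items() if "checkout" not in seen)


log = io.StringIO()  # stands in for durable log storage
log_event(log, "t1", "alice", "add_to_cart", "web")
log_event(log, "t1", "alice", "checkout", "web")
log_event(log, "t2", "bob", "add_to_cart", "mobile")
log_event(log, "t3", "carol", "add_to_cart", "web")

# The insight only exists once the batch job runs, not while the customer is still online.
print(abandoned_carts(log.getvalue()))  # ['bob', 'carol']
```

The sketch makes the trade-offs from the paragraph visible: the log grows with every event, the application pays for each write, and the answer arrives only after post-processing.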

Another interesting option, proposed by OpTier, would be to create transaction context through a third-party software application and build a single stream of data from multiple sources. This area still requires more research.
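The general idea of a single stream with transaction context can be sketched in plain Python. This is not OpTier's product or method; it assumes, hypothetically, that every source already carries a shared transaction ID and emits events sorted by timestamp. In practice, creating and propagating that ID is the hard part a third-party application would have to solve.

```python
import heapq
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Event = Tuple[float, str, str, str]  # (timestamp, source, txn_id, detail)


def single_stream(*sources: Iterable[Event]) -> Iterator[Event]:
    """Merge per-source feeds (each already sorted by time) into one time-ordered stream."""
    return heapq.merge(*sources, key=lambda event: event[0])


def transaction_context(stream: Iterable[Event]) -> Dict[str, List[str]]:
    """Group the merged events by transaction ID, keeping the cross-source order of steps."""
    context: Dict[str, List[str]] = defaultdict(list)
    for _, source, txn_id, detail in stream:
        context[txn_id].append(f"{source}:{detail}")
    return dict(context)


# Hypothetical feeds from three separate systems that share a transaction ID.
web = [(1.0, "web", "t1", "login"), (4.0, "web", "t2", "login")]
payments = [(2.0, "payment", "t1", "authorized")]
database = [(3.0, "db", "t1", "order_saved"), (5.0, "db", "t2", "order_saved")]

merged = list(single_stream(web, payments, database))
print(transaction_context(merged))
# {'t1': ['web:login', 'payment:authorized', 'db:order_saved'], 't2': ['web:login', 'db:order_saved']}
```

Because `heapq.merge` consumes the inputs lazily, the same approach works on unbounded feeds, so context could in principle be assembled as events arrive rather than in a later batch.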



+++
To share your own thoughts or other best practices about this topic, please email me directly at alexwsteinberg (@) gmail.com.

Alternatively, you may also connect with me and become part of my professional network of Business, Digital, Technology & Sustainability experts at

https://www.linkedin.com/in/alexwsteinberg   or
Xing at https://www.xing.com/profile/Alex_Steinberg   or
Google+ at  https://plus.google.com/u/0/+AlexWSteinberg/posts



