Data scientists are the business ‘rock stars’ of the 21st century

Data science is a complicated discipline that requires advanced skills and competencies in areas such as statistics, computer science, data mining, mathematics, and computer programming.

As has been said countless times, data scientists are the business ‘rock stars’ of the 21st century. Although what data scientists do can be quite complex, what they are trying to achieve is not.

In fact, I find that the very best introductory book to data science is Moneyball: The Art of Winning an Unfair Game by Michael Lewis (W.W. Norton & Company, 2004).

Data science is about finding new variables and metrics that are better predictors of performance. It’s that simple.

But the power of that simple statement is game-changing. Billy Beane and the Oakland A’s achieved their success by basing player acquisitions and in-game decisions on a different, more predictive set of metrics.

BI vs data science: The questions are different

When clients ask me to explain the difference between a Business Intelligence (BI) analyst and a data scientist, I start by explaining that the two disciplines have different objectives and seek to answer different types of questions.

BI focuses on descriptive analytics, aka the “What happened?” types of questions.

Examples include: How many widgets did I sell last month? What were sales by zip code for Christmas last year? How many units of Product X were returned last month? What were company revenues and profits for the past quarter? How many employees did I hire last year?
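A descriptive question such as “What were sales by zip code for Christmas last year?” boils down to filtering and aggregating historical records. The following is a minimal sketch assuming hypothetical transaction tuples and a hypothetical `units_by_zip` helper; in a real BI environment this would usually be a query against the data warehouse.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction records: (sale date, zip code, units sold).
sales = [
    (date(2023, 12, 5), "94103", 12),
    (date(2023, 12, 18), "94103", 7),
    (date(2023, 12, 20), "10001", 20),
    (date(2023, 11, 30), "10001", 4),  # outside the December window
]


def units_by_zip(records, start: date, end: date) -> dict[str, int]:
    """Descriptive ("What happened?") query: total units sold per zip
    code for sales dated between start and end, inclusive."""
    totals: dict[str, int] = defaultdict(int)
    for sold_on, zip_code, units in records:
        if start <= sold_on <= end:
            totals[zip_code] += units
    return dict(totals)


print(units_by_zip(sales, date(2023, 12, 1), date(2023, 12, 31)))
# → {'94103': 19, '10001': 20}
```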

BI focuses on reporting the current state of the business, a practice now commonly called Business Performance Management (BPM).

It also provides retrospective reports that help business users monitor the current state of the business and answer questions about historical business performance. These reports and questions are critical to the business, and some are required for regulatory and compliance reasons.

Data science questions, on the other hand, search for the variables and metrics that are better predictors of business performance.

Consequently, data scientists focus on predictive analytics (“What is likely to happen?”) and prescriptive analytics (“What should I do?”) types of questions.

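For example, predictive questions include “How many widgets will I sell next month?” and “What will sales by zip code be over this Christmas season?” Prescriptive questions take the next step, such as “How many widget components should I order to meet next month’s projected sales?”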

The analytic approaches are different

Unfortunately, these explanations are insufficient to satisfactorily answer the question of what’s different between BI and data science. So let’s closely examine the different engagement approaches (including goals, tools, and techniques) that the BI analyst and the data scientist use to do their jobs.

The BI analyst engagement process
The BI analyst engagement process is a discipline that has been documented, taught and refined over three decades of building data warehouses and BI environments.

The data scientist engagement process
The data science process is significantly different. In fact, there is very little from the BI analyst engagement process that can be reused in the data science engagement process.

The data models are different

The data models used in the data warehouse to support an organisation’s BI efforts are significantly different from the data models that data scientists prefer to use.

Data modelling for BI
The world of BI (aka query, reporting, dashboards) requires data technology that allows business users to create their own reports and queries. To support this need, Ralph Kimball pioneered dimensional modelling — or star schemas — while at Metaphor Computers back in the 1980s.
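A star schema places one central fact table (the measures, such as units and revenue) among several dimension tables (the descriptive attributes users filter and group by). The minimal sketch below uses Python’s built-in `sqlite3` module as a stand-in for a data warehouse; the table and column names are hypothetical. Note that the typical BI query has to join the fact table back to its dimensions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive attributes for filtering and grouping.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city TEXT, zip_code TEXT);
    -- Fact table: one row per sale, with measures plus a foreign key per dimension.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        store_id   INTEGER REFERENCES dim_store(store_id),
        units      INTEGER,
        revenue    REAL
    );
    INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Gadget', 'Hardware');
    INSERT INTO dim_store   VALUES (10, 'Austin', '78701'), (20, 'Boston', '02108');
    INSERT INTO fact_sales  VALUES (1, 10, 5, 50.0), (1, 20, 3, 30.0), (2, 10, 2, 40.0);
""")

# A typical BI report: join the fact table to its dimensions, then aggregate.
rows = conn.execute("""
    SELECT s.city, p.name, SUM(f.units) AS units
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_store   s ON s.store_id   = f.store_id
    GROUP BY s.city, p.name
    ORDER BY s.city, p.name
""").fetchall()
print(rows)
# → [('Austin', 'Gadget', 2), ('Austin', 'Widget', 5), ('Boston', 'Widget', 3)]
```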

Data modelling for data science
In the world of data science, Hadoop provides an opportunity to think differently about how we do data modelling. Hadoop was developed largely at Yahoo to deal with very long, flat weblogs.

Its design included large data blocks: Hadoop’s default block size is 64 MB or 128 MB, depending on the version, compared with relational database block sizes that are typically 32 KB or less. To take advantage of these large blocks, data science teams want very long, flat records and long, flat data models.
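The size gap is easier to appreciate with a little arithmetic. The sketch below counts how many fixed-size blocks a single file would occupy under each block size mentioned above; the 10 GB file is a hypothetical example, and real block sizes are configurable per system.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB


def blocks_needed(file_bytes: int, block_bytes: int) -> int:
    """Number of fixed-size blocks needed to hold a file
    (integer ceiling division; the last block may be partly filled)."""
    return (file_bytes + block_bytes - 1) // block_bytes


file_size = 10 * GB  # hypothetical file
for label, block in [("HDFS, 128 MB", 128 * MB),
                     ("HDFS, 64 MB", 64 * MB),
                     ("RDBMS, 32 KB", 32 * KB)]:
    print(f"{label}: {blocks_needed(file_size, block):,} blocks")
# → 80, 160 and 327,680 blocks respectively
```

A 128 MB block is 4,096 times larger than a 32 KB one, which is why long sequential scans over wide, flat records suit this layout well.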

For example, some data scientists prefer to “flatten” a star schema by collapsing or integrating the dimension tables that surround the fact table into a single, flat record, so that they can construct and execute more complex queries without having to use joins.
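The flattening step can be sketched in a few lines: resolve each fact row’s foreign keys once, copy the dimension attributes into the row, and from then on query the wide records directly. This is a minimal illustration under stated assumptions; plain Python dictionaries stand in for files on Hadoop, and the schema and the `flatten` helper are hypothetical.

```python
# Hypothetical star schema held in plain Python structures.
dim_product = {1: {"product_name": "Widget", "category": "Hardware"},
               2: {"product_name": "Gadget", "category": "Hardware"}}
dim_store = {10: {"city": "Austin", "zip_code": "78701"},
             20: {"city": "Boston", "zip_code": "02108"}}
fact_sales = [
    {"product_id": 1, "store_id": 10, "units": 5},
    {"product_id": 1, "store_id": 20, "units": 3},
    {"product_id": 2, "store_id": 10, "units": 2},
]


def flatten(facts, products, stores):
    """Denormalise: merge each fact row with the attributes of the
    dimension rows it points to, producing one wide record per fact."""
    return [{**fact, **products[fact["product_id"]], **stores[fact["store_id"]]}
            for fact in facts]


flat = flatten(fact_sales, dim_product, dim_store)

# Later queries scan the flat records directly, with no joins.
austin_units = sum(r["units"] for r in flat if r["city"] == "Austin")
print(austin_units)  # → 7
```

The trade-off is redundancy: every record repeats its product and store attributes, exchanging storage for simpler, join-free scans.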

Summary

Organisations are realising that data science is very different from BI and that one does not replace the other.

Both combine to provide the “dynamic duo” of analytics — one focused on monitoring the current state of the business and the other trying to predict what is likely to happen and then prescribe what actions to take.

Big data is a key enabler of a new discipline called data science. Data science seeks to leverage new sources of structured and unstructured data, coupled with advanced predictive and prescriptive analytics, to uncover better predictors of performance.

In summary, BI differs from data science in the following ways:

The questions are different.
The analytic characteristics are different.
The analytic engagement processes are different.
The data models are different.
The business view is different.

—

The post Differentiating business intelligence from data science appeared first on e27.