“Data Science”: An Essential Skill or a Buzzword? Understanding the Data Science Pipeline
Author: Nischay Bikram Thapa

Data science is a fascinating and rapidly evolving field that has gained a lot of popularity in recent years. We find people discussing this buzzword in almost every tech industry, with or without prior knowledge. One of the most attractive things about this career path is that it is considered one of the highest-paying jobs, with an annual median salary of $95K for recent graduates in 2019, as listed on Glassdoor and Indeed. However, the essential skills required to become a data scientist are much more challenging to acquire and require strenuous effort and dedication.

There are a plethora of ways to define the term “data science”, but fundamentally it is a multi-disciplinary field that uses scientific methods, processes, algorithms, and systems to extract patterns and insights from data available in various forms (structured, semi-structured and unstructured), much as data mining does. We often think of data science as a blend of mathematics, statistics and computer science, but this does not give the full picture, which may be far broader than we imagine. To discuss the subject in more detail, let’s dive into the real picture and the essential skills required to become a data scientist.

Despite the recent hype around data science, it has been around for over thirty years. We have been practicing it under names like Business Intelligence, Business Analytics and Predictive Modeling; the term now refers to the broad concept of dealing with data and finding the relationships within it to create and enhance the business value of an organization. Mining enormous amounts of data to identify patterns can help an organization reduce costs, recognize new market opportunities and increase its competitive advantage. Many companies also use it to improve their business efficiency. Consider some examples from tools we use every day: is Google a search engine or an integrated, always-on, productivity-enhancing AI platform? Is Facebook a social network connecting people and making life easier, or a data-driven engagement platform with a learning audience? Is Tesla a self-driving car company or a new data-driven experience organization? All these firms excel at the data supply chain, which includes:

Raw Data -> Aggregated Data -> Intelligence -> Insights -> Decisions -> Operational Impact -> Financial Outcomes -> Value Formation.

All these firms are now moving from mastery of a System of Insights to a System that Learns, and the pace of this movement is staggering.

The first step toward becoming a successful data scientist is to understand the business problem of an organization, which is often overlooked by many. Having extensive technical problem-solving skills is not sufficient unless we are able to understand the complications firms face in the business environment. So, one of the predominant skills to have is domain knowledge: the ability to foresee problems and take action to prevent them before the company faces any serious challenges to its sustainability and performance.

After grasping the business perspective and stating the real problem, we are able to generate hypotheses that would eventually help to predict and overcome such difficulties. These hypotheses might include questions like:

What impact have the products we are selling had? To what extent are our customers using them? What relationships exist among our existing products? Do our customers' preferences have anything in common?

This step is essential to solving any business or data science problem, as it helps us understand the data and draw meaningful insights that can positively impact the performance of organizations. Only after completing this step do we extract and collect data from various sources to answer our questions and build the right model to support decision making and create business value.

Once the data is collected, we can use our technical abilities to manipulate and aggregate the data; detect missing values and outliers that can affect our model's accuracy; create and modify features (columns or independent variables in our data) for better performance; plot and visualize the data to detect patterns and insights that help us answer our questions and support our earlier hypotheses; draw conclusions from our dataset; and build a predictive model to classify objects and events or even forecast financial or marketing trends. There are numerous approaches to solving these technical problems, and prominent languages like Python, R, and Scala make our lives easier with their built-in packages and libraries. However, these are beyond the scope of this article and will be covered in a future one, along with the hype that machine learning has created and how we can use it in our data science projects.
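
Although a full treatment is left for later, a few of the technical steps mentioned above can be made concrete with a minimal sketch in Python that uses only the standard library. The customer records, field names and helper functions here are all hypothetical and invented for illustration; a real project would more likely rely on a library such as pandas. The sketch fills a missing value with the median, drops outliers using the common 1.5 x IQR rule, creates one simple feature and aggregates spend per product.

```python
from statistics import mean, median, quantiles

# Hypothetical toy data: names and values are invented for illustration.
# None marks a missing value.
RECORDS = [
    {"customer": "A", "product": "basic", "spend": 20.0},
    {"customer": "B", "product": "basic", "spend": None},
    {"customer": "C", "product": "basic", "spend": 22.0},
    {"customer": "D", "product": "premium", "spend": 60.0},
    {"customer": "E", "product": "premium", "spend": 58.0},
    {"customer": "F", "product": "premium", "spend": 900.0},
    {"customer": "G", "product": "basic", "spend": 24.0},
    {"customer": "H", "product": "premium", "spend": 62.0},
]


def impute_missing(records, field):
    """Replace missing values in `field` with the median of the known values."""
    known = [r[field] for r in records if r[field] is not None]
    fill = median(known)
    return [{**r, field: fill if r[field] is None else r[field]} for r in records]


def drop_outliers(records, field, k=1.5):
    """Keep only rows whose `field` lies within k * IQR of the quartiles."""
    q1, _, q3 = quantiles([r[field] for r in records], n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [r for r in records if low <= r[field] <= high]


def add_above_median_flag(records, field):
    """Create a new feature: is this row's value above the median?"""
    mid = median([r[field] for r in records])
    return [{**r, "above_median": r[field] > mid} for r in records]


def mean_by_group(records, group_field, value_field):
    """Aggregate: average of `value_field` for each value of `group_field`."""
    groups = {}
    for r in records:
        groups.setdefault(r[group_field], []).append(r[value_field])
    return {g: mean(values) for g, values in groups.items()}


def run_pipeline(records):
    """Chain the steps: impute, remove outliers, add a feature, aggregate."""
    filled = impute_missing(records, "spend")
    cleaned = drop_outliers(filled, "spend")
    featured = add_above_median_flag(cleaned, "spend")
    return featured, mean_by_group(featured, "product", "spend")


rows, summary = run_pipeline(RECORDS)
print(summary)
```

Median imputation and the IQR rule are only one choice among many. Which treatment is appropriate depends on the business problem and the hypotheses we started with, which is why those come first in the pipeline.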

Finally, after we interpret our results and conclusions, the project succeeds only when we are able to develop a story from the data and communicate it to senior managers and stakeholders. Stories are the most robust delivery tool for information, even stronger and more enduring than any other form of art. It is important to understand your audience, lead them through the major steps of your story, and point out the interesting facts and insights using captions and annotations. This will help them understand your conclusions more readily; presenting purely as a technical person might not be effective, and your work might lose its potential impact. This step is also considered one of the major stages in a data science pipeline, one that people often ignore and regret later.

Overall, data science is not a new subject; we have been using it since the ’90s under different names. The buzz it has created in the market is simply due to the emergence of big data, which refers to data of huge volume, variety and velocity generated by our enterprise transactional systems, individual smartphones and IoT devices and stored in so-called data lakes and data marts. Buzzword aside, data science is an essential skill in the twenty-first century, not only for becoming a data scientist but also for driving a business with data-driven approaches.