Data jobs, part 1

Business Analyst, Data Analyst, Data Engineer and Data Scientist

The excitement surrounding all kinds of data and AI-related topics is bringing an unprecedented amount of attention to data jobs. The issue: the associated challenges are not always well understood, and some software vendors take advantage of this lack of clarity to promote software and services that are supposed to turn an employee into a Data Scientist, Data Analyst or Data Engineer in just a few clicks. More and more data-related jobs have appeared with the rise of big data, yet each of these titles is linked to specific issues and skills.

Here is a brief overview of these different jobs. Please keep in mind that the boundaries between them are porous and constantly evolving, depending on the complexity of the projects and/or the companies. In addition, other roles exist and will be detailed in a future article (Data Architect, Data Manager, Chief Data Officer, Statistician, etc.).

Business Analyst: in the service of Business Intelligence

Business Analysts spontaneously question the numbers. They use the tools at their disposal to generate reports enriched with graphical representations. Gradually, they set up performance indicators that allow them to move from reporting to monitoring. Their descriptive analyses allow them to design hypotheses based on the data they manipulate. They do not have any particular programming skills and are not familiar with statistical correlations or database management systems (DBMS). On the other hand, they are comfortable with proprietary tools like Excel, Power BI, Tableau or Qlik.

Example of a situation: to plan for the last quarter, management requests a detailed week-by-week report on ice cream sales for the current year.

There is a 32% increase in sales between the 1st and 3rd quarters. This increase could be due to the fact that people spend more time outdoors during these periods. Indeed, we sell more ice cream on festival days and during summer activities (music festivals, etc.). We could hypothesize that the number of people outside directly impacts our sales. In this case, we should see a sales peak during the end-of-year festivities.

Data Analyst: minimizing risk

Data Analysts have a strong knowledge of statistics. Their knowledge of hypothesis testing, probability distributions and regression methods allows them to assist decision-makers by minimizing the risks associated with their decisions. Their role is not limited to manipulating data through proprietary tools: they develop complete workflows that start from raw data and go through inferential or Bayesian statistics to achieve results that can be directly used by the business. Data Analysts generally master one or more programming languages (such as R or Python) as well as DBMSs.

By analyzing the data, I was able to see that the variability in sales was closely related to the number of people outdoors. The effect is significant enough to rule out the possibility that it is due to chance. Investigating further, I found that the number of people outside is highly correlated with, and explained by, the temperature of the day. The higher the temperature, the higher the sales. Since the temperature is low during the Christmas holidays, the population we can reach is smaller and consumes differently, so we should offer a different type of product.
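To make the analyst's reasoning more concrete, here is a minimal sketch of how a correlation between temperature and sales could be measured and checked against chance. The figures are invented for illustration, and the permutation test is only one possible way to assess significance; the article does not say which method the analyst actually used.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def permutation_p_value(xs, ys, n_permutations=5000, seed=0):
    """Estimate how often a correlation at least this strong arises by chance.

    The sales values are shuffled repeatedly, which breaks any real link
    with temperature, and the shuffled correlations are compared with the
    observed one.
    """
    rng = random.Random(seed)
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson(xs, shuffled)) >= observed:
            hits += 1
    # The +1 terms keep the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical monthly averages: temperature in degrees Celsius and
# ice cream units sold per week (made-up numbers, not real data).
temperature_c = [4, 7, 11, 15, 19, 23, 27, 30, 24, 17, 9, 5]
weekly_sales = [120, 150, 210, 290, 380, 470, 560, 610, 480, 330, 180, 130]

r = pearson(temperature_c, weekly_sales)
p = permutation_p_value(temperature_c, weekly_sales)
print(f"correlation: {r:.3f}, permutation p-value: {p:.4f}")
```

A small p-value supports the claim that the relationship is unlikely to be due to chance, but it does not by itself prove that temperature causes the sales.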

Data Scientist: building models

Thanks to their expertise and, above all, their solid experience in statistics and machine learning, Data Scientists (DS) develop systems capable of performing complex analysis tasks. Their role is to create predictive models that are robust enough to provide the level of performance expected by decision-makers. Data Scientists are therefore able both to evaluate statistical models and to step back far enough to identify their weaknesses and propose improvements. They also build machine learning algorithms using models adapted to their needs, then test and optimise them, as data science is an iterative process.

Example of a situation: management wants to adjust its production and stocks according to demand, without the risk of being unprepared for peaks in demand.

I spoke with the Data Analyst, who informed me of his conclusions. With weather forecasts, it should be possible to predict the volume of ice cream sales as a function of time and other parameters such as special events or school holidays. From this model we could estimate the quantity of raw materials to be ordered. In addition, it might also be possible to build a recommendation system that will indicate which flavours to offer to a given population segment or geographical area.
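As a rough illustration of the idea, the sketch below fits a one-variable least-squares line that predicts weekly sales from temperature, then converts a forecast into a raw material estimate. This is a deliberately simple baseline, not the model a Data Scientist would put into production: the training data, the forecast and the `milk_litres_per_unit` conversion factor are all hypothetical, and other parameters such as events or school holidays are left out.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new input value."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical history: average temperature (degrees Celsius) and units
# sold per week. These numbers are invented for the example.
temperature_c = [4, 7, 11, 15, 19, 23, 27, 30, 24, 17, 9, 5]
weekly_sales = [120, 150, 210, 290, 380, 470, 560, 610, 480, 330, 180, 130]

model = fit_line(temperature_c, weekly_sales)

# Hypothetical weather forecast for the next four weeks.
forecast_c = [22, 25, 28, 26]
# Clamp at zero so a cold forecast can never produce negative sales.
predicted_sales = [max(0.0, predict(model, t)) for t in forecast_c]

# Assumed conversion factor: litres of milk needed per unit sold.
milk_litres_per_unit = 0.1
milk_to_order = sum(predicted_sales) * milk_litres_per_unit

print(f"predicted weekly sales: {[round(s) for s in predicted_sales]}")
print(f"milk to order (litres): {milk_to_order:.1f}")
```

In practice the model would be validated on data it has not seen, and extended with the additional parameters mentioned above, before anyone relied on it to place orders.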

Data Engineer: ensuring access to reliable data

Exploiting data is good; exploiting relevant, high-quality data is better. To extract the full value of their data, companies need infrastructures that can ensure reliable, up-to-date and secure data exploitation, as well as an architecture that can support growing volumes of heterogeneous data. As specialists in the Cloud, infrastructure administration and software development, Data Engineers build and optimize the architecture that supports the data production, storage and operations pipeline. Their work allows the Data Analyst and Data Scientist to focus on analytical missions.

Data Engineers manage the integration of data sources, create the APIs that make the data usable and supervise the entire infrastructure to ensure optimal performance. They juggle technologies such as Java and/or Scala, Pig, Hive, Hadoop, Spark, Kafka and NoSQL, taking advantage of the abstraction capabilities offered by Cloud environments (Amazon Web Services, Google Cloud Platform, Microsoft Azure).

We will therefore build an infrastructure that can support up-to-the-minute updates of stock market prices as well as weather data, sports calendars and many other parameters. Our infrastructure must be able to aggregate this data in near real time.
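To give a feel for what aggregating data from several sources involves, here is a minimal sketch that groups timestamped readings into one-minute windows and averages them per source. In a real infrastructure this work would typically be delegated to a streaming framework such as those listed above; the `Reading` type, the source names and the values are hypothetical and exist only for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    source: str      # hypothetical feed name, e.g. "weather.temperature_c"
    timestamp: int   # seconds since an arbitrary start
    value: float


def aggregate_by_minute(readings):
    """Average the readings of each source within one-minute windows.

    Returns a dict keyed by (source, window_start_in_seconds).
    """
    buckets = defaultdict(list)
    for reading in readings:
        # Round the timestamp down to the start of its minute.
        window_start = reading.timestamp - reading.timestamp % 60
        buckets[(reading.source, window_start)].append(reading.value)
    return {key: sum(values) / len(values) for key, values in buckets.items()}


# Invented sample readings from two feeds.
readings = [
    Reading("weather.temperature_c", 0, 21.0),
    Reading("weather.temperature_c", 30, 23.0),
    Reading("weather.temperature_c", 65, 24.0),
    Reading("shop.units_sold", 10, 3.0),
]

averages = aggregate_by_minute(readings)
for (source, window_start), mean_value in sorted(averages.items()):
    print(f"{source} @ {window_start}s: {mean_value}")
```

This version processes a finished batch in memory. A production pipeline would also have to handle late or out-of-order events, failures and much larger volumes, which is precisely where the Data Engineer's expertise comes in.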

The temptation of the five-legged sheep

Rather than recruiting different profiles, some companies are tempted to look for the “five-legged sheep”. We regularly see job offers for “full-stack Data Scientist” profiles, or searches for a Data Scientist capable of designing a big data infrastructure. From our point of view, this is a mistake: Data Scientists must, for example, continuously monitor new developments in order to keep up to date and develop their knowledge in a constantly evolving field. Similarly, the tasks and knowledge of Data Engineers and Data Analysts are so vast that they too must invest a substantial amount of effort in order to remain effective. While it is true that Data Scientists and Data Engineers have skills in common, the knowledge of machine learning, statistics, forecasting and natural language processing (NLP) that Data Scientists have does not replace the advanced skills in development, big data frameworks and system administration that Data Engineers have.

Conclusion

This first part of the overview of data-related jobs highlights the diversity of the associated missions. For a company that wants to extract value from its data, the most important thing is not the job title: the key challenge is to develop and encourage a true data culture across the company’s business units. As a result, it is necessary to understand what each of these roles is and how it can serve the company’s strategy, rather than looking for a ninja pumped up on ML algorithms who is unable to communicate with colleagues. Finally, it is essential that the company’s decision-makers embrace and embody this data-driven transformation of the organization. This will be achieved by developing a clear understanding of the benefits, the challenges and the consequences of their decisions: an essential requirement for building a data-driven culture.

Firms must become much more serious and creative about addressing the human side of data if they truly expect to derive meaningful business benefits.

Harvard Business Review, 2019