Machine Learning Engineer: what they do and what skills they need

23:22
28.11.2024
TroyMan

Hello! My name is Anton Morgunov. I am an ML engineer at "Basis.Center" and a program expert for the "Machine Learning Engineer" course at Yandex Practicum. In this article, I will explain what machine learning is, what an ML engineer does, what skills and competencies the role requires, and which companies need this specialist. At the end, I will give a couple of tips for novice ML engineers.

What is machine learning

This is a technology that allows a computer to learn from data and recognize patterns, much as a human does. The term "machine learning" was coined in the 1950s by artificial intelligence researcher Arthur Samuel, who created a program that played checkers against itself and improved as it played.

Thanks to machine learning, a computer can draw, write, read, distinguish between styles of music and visual art, and much more. The goal of ML is to teach a computer to find solutions on its own, based on data. This is what an ML engineer does: they give the computer historical data and specify what result is needed. The computer has no predetermined answer, but it has a model. Once trained, the model can answer a question with a certain degree of confidence, for example by predicting a class or a numerical value. The historical data must contain not only the input features but also the answer that was observed in each past case.
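To make "historical data that contains the answer" more concrete, here is a minimal Python sketch of a one-nearest-neighbor classifier. The loan records, the feature meanings, and the `predict_repaid` helper are hypothetical and chosen only for illustration. A real project would normally use a library model rather than this hand-written rule.

```python
import math

# Hypothetical historical data:
# (monthly income, existing debt) in thousands -> 1 if the loan was repaid, 0 if not.
history = [
    ((5.0, 0.5), 1),
    ((4.2, 0.8), 1),
    ((1.5, 2.5), 0),
    ((1.8, 3.0), 0),
]

def predict_repaid(features):
    """Return the label of the closest historical example (1-nearest neighbor)."""
    _, nearest_label = min(
        history, key=lambda record: math.dist(record[0], features)
    )
    return nearest_label

# A new client who resembles past clients that repaid their loans.
print(predict_repaid((4.8, 0.6)))
# A new client who resembles past clients that did not.
print(predict_repaid((1.6, 2.8)))
```

The model here is simply "look up the most similar past case", which shows why each past record needs both the features and the known outcome.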

Examples of machine learning applications:

Finance. Calculation of clients' credit rating, detection of fraudulent transactions, prediction of price growth or decline on the stock exchange, risk assessment, optimization of trading strategies, etc.

Business. Forecasting demand for goods, recommending content or products to customers based on their interests, targeted advertising, optimizing advertising campaigns, etc.

Medicine. Assessment of the risk of developing diseases, detection of diseases at early stages based on the patient's medical history, etc.

Technology. Dynamic pricing, trip time prediction, spam protection, etc.

What tasks does an ML engineer perform

An ML engineer can have many tasks, but today we will focus on three types that fall within the realm of classical (tabular) machine learning, which we cover in detail in the course "Machine Learning Engineer":

Classification and regression tasks. For example, predicting the probability that a loan will be repaid, whether there will be precipitation in the next hour, the credit limit for a bank client, etc.

Recommendation systems and uplift modeling. Take product recommendations on a marketplace: the client sees a personalized selection of products based on their search history.

The other task in this area is uplift modeling. An example is sending SMS messages with a discount promo code. Each message costs money, so to avoid wasting the budget, the machine learning task is to rank potential recipients by how much the message is expected to increase their likelihood of responding, and to contact those at the top of the ranking.
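The idea behind uplift can be sketched in a few lines of Python. This is a simplified, segment-level estimate and not a full uplift model: it compares the response rate of customers who received an SMS with the rate of those who did not. The pilot data, the segment names, and the `uplift_by_segment` helper are hypothetical, and the sketch assumes every segment has customers in both groups.

```python
from collections import defaultdict

# Hypothetical results of a past pilot mailing:
# (customer_segment, received_sms, made_purchase)
pilot = [
    ("new", True, True), ("new", True, True),
    ("new", True, False), ("new", True, False),
    ("new", False, True), ("new", False, False),
    ("new", False, False), ("new", False, False),
    ("loyal", True, True), ("loyal", True, True),
    ("loyal", True, True), ("loyal", True, False),
    ("loyal", False, True), ("loyal", False, True),
    ("loyal", False, True), ("loyal", False, False),
]

def uplift_by_segment(rows):
    """Uplift = response rate with SMS minus response rate without SMS."""
    # segment -> received_sms -> [purchases, customers]
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for segment, treated, purchased in rows:
        counts[segment][treated][0] += int(purchased)
        counts[segment][treated][1] += 1
    uplift = {}
    for segment, groups in counts.items():
        treated_rate = groups[True][0] / groups[True][1]
        control_rate = groups[False][0] / groups[False][1]
        uplift[segment] = treated_rate - control_rate
    return uplift

uplift = uplift_by_segment(pilot)
# Segments where the SMS changes behavior the most come first.
ranking = sorted(uplift, key=uplift.get, reverse=True)
print(uplift, ranking)
```

In this made-up data, loyal customers buy most often, but they buy just as often without the SMS, so the budget is better spent on the segment where the message actually changes the outcome.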

An ML engineer is a specialist who can take a machine learning project from start to finish, the "full cycle," as we call it. The work follows these stages:

Obtaining data from sources. The ML engineer explores the data, cleans it, analyzes it, and prepares it for further work.

Example. The team is asked to develop a demand forecasting system for products. At this stage, ML engineer Ivan collects data from various sources: sales history for the past three years, marketing campaign data, seasonal coefficients, competitor information, etc. While analyzing the data, he finds gaps in the sales history caused by a technical failure in 2021, as well as inconsistent date formats across systems. He cleans the data, restores the missing values, and brings everything to a unified format.
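The two cleaning steps from this example can be sketched as follows. This is a minimal illustration under stated assumptions: the list of date formats in `KNOWN_FORMATS` is hypothetical, and linear interpolation is only one possible way to restore missing values. The helper names `parse_date` and `fill_gaps` are also made up for this sketch.

```python
from datetime import date, datetime

# Assumed date formats used by the different source systems (hypothetical).
KNOWN_FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%m/%d/%Y")

def parse_date(raw):
    """Try each known format in turn and return a date object."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(raw, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {raw!r}")

def fill_gaps(values):
    """Linearly interpolate None values that have known values on both sides."""
    filled = list(values)
    for i, value in enumerate(filled):
        if value is not None:
            continue
        left = i - 1  # earlier gaps are already filled at this point
        right = next(
            (j for j in range(i + 1, len(filled)) if filled[j] is not None),
            None,
        )
        if left >= 0 and filled[left] is not None and right is not None:
            step = (filled[right] - filled[left]) / (right - left)
            filled[i] = filled[left] + step
    return filled

# The same day written in three different source formats.
dates = [parse_date(raw) for raw in ("2021-11-28", "28.11.2021", "11/28/2021")]
# Daily sales with a two-day gap.
sales = fill_gaps([100, None, None, 130])
print(dates, sales)
```

Gaps at the very start or end of the series have no neighbor on one side, so this sketch leaves them as `None` instead of guessing.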

Modeling. This is the second stage: we build a baseline machine learning model. The goal of the baseline model is to prove that the business problem can be solved using ML methods. Then we start improving the baseline model using advanced machine learning practices. The goal of the improved model is to solve the business problem as accurately and quickly as possible, bringing maximum benefit to the business.

Example. First, Ivan trains a model on the available data using its default configuration. The results show that the baseline model delivers unimpressive but acceptable quality. This means the model can and should be improved: by tuning it, generating new features, and experimenting with the architecture and type of model.
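A baseline and an improved model can be compared on the same held-out data with a single metric. The sketch below is only an illustration: the quarterly sales figures are invented, the baseline is a plain historical average, and the "improved" model is a seasonal naive forecast (repeat the same quarter of the previous year), which stands in for whatever improvements a real project would make.

```python
from statistics import mean

# Hypothetical quarterly sales: two years for training, one year for testing.
train = [100, 120, 150, 200, 100, 120, 150, 200]
test = [105, 125, 155, 210]
SEASON_LENGTH = 4  # quarters per year

def mae(actual, predicted):
    """Mean absolute error between actual and predicted values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))

# Baseline: always predict the historical average.
baseline_forecast = [mean(train)] * len(test)

# Improved: repeat the value from the same quarter of the last training year.
seasonal_forecast = train[-SEASON_LENGTH:]

baseline_mae = mae(test, baseline_forecast)
seasonal_mae = mae(test, seasonal_forecast)
print(baseline_mae, seasonal_mae)
```

The baseline shows that the problem is solvable at all, and the gap between the two errors shows how much the improvement is worth.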

Preparing the model for practical application. In technical terms, we deploy the model to production. At this stage, the ML engineer needs certain skills to turn the model into a service that can be accessed to get the desired answer.

Example. The improved model needs to be integrated into the company's workflow. Together with the team, Ivan creates an API for the model, integrates it with the existing procurement planning system, sets up automatic forecast uploads to the inventory management system, and organizes logging of all predictions. The model runs in real time and provides demand forecasts for the next three months.
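The core of such a service, request validation, a model call, and logging of every prediction, can be sketched without any web framework. Everything named here is hypothetical: the field names, the `forecast_demand` placeholder that stands in for the trained model, and the `handle_request` function. In a real deployment this logic would typically sit behind an HTTP endpoint.

```python
import json
import logging

logger = logging.getLogger("demand_forecast")

REQUIRED_FIELDS = {"product_id", "last_month_sales", "seasonal_coefficient"}

def forecast_demand(features):
    """Placeholder for the trained model (hypothetical formula)."""
    return round(
        features["last_month_sales"] * features["seasonal_coefficient"], 2
    )

def handle_request(payload):
    """Validate the request, call the model, and log the prediction."""
    missing = REQUIRED_FIELDS - set(payload)
    if missing:
        return {"status": "error", "detail": f"missing fields: {sorted(missing)}"}
    prediction = forecast_demand(payload)
    # Every prediction is logged so it can later be compared with actual sales.
    logger.info(
        "prediction %s",
        json.dumps({"product_id": payload["product_id"], "forecast": prediction}),
    )
    return {
        "status": "ok",
        "product_id": payload["product_id"],
        "forecast": prediction,
    }

response = handle_request(
    {"product_id": "A1", "last_month_sales": 120, "seasonal_coefficient": 1.25}
)
print(response)
```

Keeping the handler as a plain function makes it easy to test on its own before it is wrapped in a web framework and packaged into a container.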

Monitoring the ML model. The ML engineer continuously checks that the model works correctly as a service: whether it remains relevant, benefits the business, and copes with current changes in the business.

Example. After launching the model, Ivan sets up a monitoring system that tracks forecast accuracy against actual sales, model response time, and the number of anomalous predictions. After a month, monitoring shows a drop in forecast accuracy for certain product categories. Analysis reveals that the cause is new customer behavior patterns that emerged after a marketing campaign. Ivan sets up automatic retraining of the model every two weeks on current data.
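One of the checks from this example, forecast accuracy per product category, might look like the sketch below. The metric (mean absolute percentage error), the 20% threshold, the sample records, and the helper names are all assumptions made for illustration; a real setup would read logged predictions and actual sales from storage.

```python
from collections import defaultdict

MAPE_THRESHOLD = 0.20  # hypothetical alert threshold

def mape_by_category(records):
    """records: (category, forecast, actual) tuples -> MAPE per category."""
    errors = defaultdict(list)
    for category, forecast, actual in records:
        if actual != 0:  # the percentage error is undefined for zero sales
            errors[category].append(abs(forecast - actual) / abs(actual))
    return {category: sum(e) / len(e) for category, e in errors.items()}

def categories_to_review(records, threshold=MAPE_THRESHOLD):
    """Return categories whose forecast error exceeds the threshold."""
    scores = mape_by_category(records)
    return sorted(c for c, score in scores.items() if score > threshold)

# Hypothetical logged forecasts compared with actual sales.
records = [
    ("dairy", 100, 105), ("dairy", 90, 88),
    ("snacks", 100, 160), ("snacks", 80, 120),
]
flagged = categories_to_review(records)
print(flagged)
```

A check like this can run on a schedule, and a flagged category becomes the signal to investigate the cause or retrain the model.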

Skillset of an ML engineer — and how to apply it

Below is the basic skillset that I consider necessary for an ML engineer. It includes the main working tools, competencies, and soft skills:

Programming languages. For example, Python for working with data and ML modeling, and SQL for querying databases.
Data libraries, domain libraries, and tools for running experiments. Our goal is to get the best-quality model. To do this, we work with the data, doing feature selection and feature engineering, and run a series of experiments to find the best algorithm and the optimal set of parameters for it. For example, we use Airflow to build pipelines and DVC to keep work with data reproducible.
Docker. This is a tool that helps "package" an ML project into a container so that it works the same on any computer.
FastAPI. To make the ML model in Docker accessible, it needs to be turned into a service. For this, we need the FastAPI framework.
Grafana. Allows you to monitor how the model behaves and decide what to do with this model next.
Analytical thinking. It helps you approach problem-solving systematically. Critical thinking, prioritization, and the ability to draw conclusions are also important. So is the ability to argue for your ideas and defend them.
Communication. It's essential. This includes teamwork and the ability to ask business representatives the right questions: what exactly to look for in the data, what specific fields mean, etc. For example, a product manager understands the product and the business better than a technical specialist does, so you need to ask them the right questions about the product and the business goals. This will help you, as an ML engineer, to do your work more accurately.

Where and with whom an ML engineer works

To understand where ML engineers are most needed right now, look at the structure of the economy: it reflects the demand for specialists. Which industries are developing most actively? Today these are banking, retail (including e-commerce), and telecom. Companies in these areas create the lion's share of demand for ML engineers.

A machine learning engineer works in a technical team alongside other data specialists. For example, data engineers supply the ML engineer with data, system analysts help interpret customer requirements, data analysts help answer business questions, etc. The whole technical team works to automate processes well, which allows ML to deliver the most accurate results.

Tips for a novice ML engineer

A couple of recommendations for specialists who are just starting their career path as ML engineers:

Be prepared to learn constantly. The IT field is developing rapidly, and Data Science is no exception. New architectures and technologies appear, and existing libraries and solutions gain new functionality. It is important to immerse yourself in all of this and stay up to date.

Do not be afraid of large amounts of data, or of the problems that come with them. Data rarely arrives in good quality. You always need to think it through, dig into the essence of the business, and understand it, and this shapes the work process. Do not be afraid to ask yourself questions and look for the answers, whether in the data or from colleagues and stakeholders.

Practice on projects that are close to real ones. This way you will be better prepared for the job market and for the tasks employers expect you to solve.

Follow the trends. There are many IT meetups where experienced practitioners share best practices and tips. They are also a great opportunity for networking.