At DiUS we are seeing increasing interest from businesses in how to drive new value from machine learning (ML), but the technology is not yet broadly adopted across the enterprise market.

So what’s stopping adoption? Sometimes it’s a lack of understanding of which ML application will solve the problem an organisation has, but increasingly it comes down to difficulties in productionising ML applications: actually getting them to work within existing corporate systems or consumer-facing applications at the required level of accuracy or performance.

While we hear a lot about ML, it’s useful to remember that it’s still a relatively new concept. It’s generally agreed, even among ML specialists, that there’s still a fair way to go towards a set of best practices for consistently and reliably productionising ML applications. As with all areas of emerging technology, we will see best practices emerge and solidify as ML matures.

However, there’s no denying that the commercial need exists now. We are seeing increasing attention and funding dedicated to ML projects, although understanding of the ML world varies from client to client. Some clients have already identified a problem they want to solve using ML, while others have some budget dedicated to ML and ask for our help in defining the right problem to solve.

As specialists in helping organisations navigate new areas of tech, we wanted to share what we’ve learnt about how to successfully productionise ML projects. The practical points discussed here will benefit ML specialists and consultants, MLOps engineers, project managers and software developers who are involved in productionising ML projects.

It’s all about data

In recent years, data has often been described as having overtaken oil as the world’s most valuable asset. When it comes to ML, data readiness is usually the biggest bottleneck.

As a quick refresher, the power of ML lies in its ability to automate tasks that were previously cumbersome for humans to carry out, or that were impossible or highly complicated to implement with pure logic in computer systems. This includes applications of computer vision, natural language processing and chatbots, recommendation systems, time-series predictions, smart control systems and robotics, pattern recognition, learning how to interact with the environment and reward design, 3D data categorisation and multimodal data applications, just to name a few. And they all involve data.

Indeed, most practical ML projects today involve supervised learning, where the model is provided with labeled training data. In such cases, the data labels are vital. Depending on the nature of the project, the data might need to be annotated manually, or it might come with labels. For example, in computer vision projects the image data is usually annotated manually after it has been collected, whereas in a call centre scenario the customer is usually asked to rate their conversation, which provides the sentiment label. As consultants, we sometimes help our clients devise and adopt an appropriate data collection strategy.

Data readiness on ML projects

We have worked with clients who have collected their data in a structured, clean and centralised way, perfect for an ML initiative. Conversely, and more often, we have seen clients struggle to provide the data that is required for their ML project. For such clients, we ask them to take a step back and invest in data engineering to enable crisp data pipelines and processing components. This also often results in a data lake which can be used to fuel analytics, as well as ML projects.       

It should come as no surprise, then, that data cleansing is one of the most important tasks in an ML project. The traditional approach is for a data scientist to wrangle the data in a Jupyter notebook. This is sufficient for a Proof of Concept (POC). For ML productionisation, however, data cleansing needs to be provided as a service which can process batch or streaming data as part of a bigger system. Depending on the throughput and size of the data, distributed and big data techniques might be the right approach for data cleansing pipelines.
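To illustrate the difference between notebook wrangling and a reusable cleansing component, here is a minimal sketch in Python. The record fields ("text" and "rating") and the cleaning rules are hypothetical, chosen to echo the call centre example. The point is only that the same function can serve a batch (a list) or a stream (a generator).

```python
from typing import Iterable, Iterator, Optional


def clean_record(record: dict) -> Optional[dict]:
    """Clean one raw record; return None if it should be dropped.

    The field names ("text", "rating") are hypothetical.
    """
    text = (record.get("text") or "").strip()
    if not text:
        return None  # drop records with no usable text
    try:
        rating = int(record.get("rating"))
    except (TypeError, ValueError):
        return None  # drop records whose rating cannot be parsed
    if not 1 <= rating <= 5:
        return None  # drop ratings outside the assumed 1-5 scale
    # Collapse repeated whitespace and lowercase the text.
    return {"text": " ".join(text.split()).lower(), "rating": rating}


def clean_stream(records: Iterable[dict]) -> Iterator[dict]:
    """Lazily clean any iterable: a list (batch) or a generator (stream)."""
    for record in records:
        cleaned = clean_record(record)
        if cleaned is not None:
            yield cleaned


raw_batch = [
    {"text": "  Great   SERVICE ", "rating": "5"},
    {"text": "", "rating": "3"},
    {"text": "Too slow", "rating": "n/a"},
]
cleaned_batch = list(clean_stream(raw_batch))
```

Because `clean_stream` is lazy, the same code could sit behind a queue consumer or a web endpoint. At larger volumes, the per-record function is the part you would hand to a distributed framework.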

After working with our clients to collect and clean the data, the ML work can finally begin. ML is usually the most uncertain and riskiest part of the project, which is why we usually suggest an ML experiment such as a POC before productionisation. For production, implementing an ecosystem around the ML model is just as important as the model, and usually more time consuming than the POC itself. The ecosystem and model should go hand in hand.

Machine Learning success points

An ML model can be viewed abstractly as a computation unit that takes data as its input and produces its best guess, according to that data, as the output. The output can be one or more predicted classes, a predicted scalar or, in more complicated cases, a segmentation of the input image, an action the system takes, and so on. While ML experts and data scientists are specialists in building and delivering high-performance ML models, they still need to fine-tune the algorithms, data and approaches. It’s a mistake not to look under the hood, or inside the box, to check the validity and accuracy of results.

Model evaluation

Whatever the goal of the model, it’s important to boil down the performance of the model to a set of success metrics. In practice, a set of mathematical metrics is used to quantify the model’s performance. These metrics also provide a way to compare different models designed for the problem.

Choosing the right metrics, ones which give confidence in the model’s performance, is extremely important. A single metric is very unlikely to account for all aspects of the problem, so it is sometimes necessary to consider several metrics together. You want to avoid typical failures such as those that occur with unbalanced datasets, where a model can report a very high accuracy level while actually behaving very poorly, for example by always predicting the majority class.
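The unbalanced-dataset trap is easy to reproduce. The sketch below uses made-up labels (5 positive cases out of 100) and a hypothetical "model" that always predicts the negative class. It computes accuracy, precision, recall and F1 in plain Python so the arithmetic is visible. In a real project a library such as scikit-learn would normally do this.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def evaluate(y_true, y_pred, positive=1):
    """Compute several metrics together instead of relying on one."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Illustrative data: 5 positives, 95 negatives.
y_true = [1] * 5 + [0] * 95
# A degenerate model that never predicts the positive class.
y_pred = [0] * 100

metrics = evaluate(y_true, y_pred)
# accuracy is 0.95, yet recall and F1 are both 0.0
```

Accuracy alone would make this model look excellent, while recall shows that it misses every case the project presumably cares about.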

Connect the dots, then iterate and reiterate

There is a high chance of failure when it comes to ML projects. Therefore, try to find and address the pain points and bottlenecks as soon as possible. This is impossible without connecting all the dots first. This is where the ML experts, data engineers, developers, ML/DevOps and architects work hand in hand to prepare the infrastructure and data transformation pipelines. 

One good approach is to make sure to implement all of the required pipelines that send the data to the model and receive the results from the model. This can be facilitated by standardising the input and output format of the model in the early stages of the project. For this, we always suggest an encapsulation which will allow quick deployments of the newly-trained model, without requiring any change to the other parts of the system. 

Fortunately, microservices enhance this process. Using microservices, each model can be considered an independent component which can be hosted as a service working in conjunction with other services through API calls and web service requests. This provides the right infrastructure to replace the previous models and deploy newly-trained models without propagating the changes to other components.        
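As a rough sketch of this kind of encapsulation, the Python below fixes a request and response format and hides the model behind a small service class. All names here (`PredictionRequest`, `PredictionResponse`, `ModelService`, `KeywordBaseline`) are hypothetical, and the keyword rule stands in for a real model. In practice `handle` would sit behind an HTTP endpoint in the microservice.

```python
from dataclasses import asdict, dataclass
from typing import Protocol


@dataclass(frozen=True)
class PredictionRequest:
    """Standardised model input (hypothetical schema)."""
    text: str


@dataclass(frozen=True)
class PredictionResponse:
    """Standardised model output (hypothetical schema)."""
    label: str
    score: float
    model_version: str


class Model(Protocol):
    """Anything with a version and a predict method can be deployed."""
    version: str

    def predict(self, request: PredictionRequest) -> PredictionResponse: ...


class KeywordBaseline:
    """Placeholder model: flags text containing a few negative words."""
    version = "baseline-0.1"

    def predict(self, request: PredictionRequest) -> PredictionResponse:
        words = ("bad", "slow", "broken")
        negative = any(word in request.text.lower() for word in words)
        label = "negative" if negative else "positive"
        return PredictionResponse(label, 0.5, self.version)


class ModelService:
    """Owns the I/O contract; callers never touch the model directly."""

    def __init__(self, model: Model) -> None:
        self._model = model

    def swap_model(self, model: Model) -> None:
        # Deploying a newly trained model is a one-line change here.
        self._model = model

    def handle(self, payload: dict) -> dict:
        request = PredictionRequest(text=str(payload["text"]))
        return asdict(self._model.predict(request))


service = ModelService(KeywordBaseline())
result = service.handle({"text": "The app is slow"})
```

Because the rest of the system only sees the dictionary returned by `handle`, a retrained model can replace the baseline through `swap_model` without changes elsewhere.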

We also suggest fast iterations, with simultaneous model improvement and reproducibility for the model component. This means we should aim to improve the model with each iteration and always keep track of model performance. The practical approach is to first define and implement one or two benchmarks, and then try to improve on their performance by applying different techniques and models. The benchmark can be an easily obtainable model such as a cloud ML/AI service, or a simple statistical or ML model which can be implemented and spun up quickly. We suggest preparing the baseline and deploying it into the test environment as quickly as possible. This will allow the team to test the ML service end-to-end and make sure the project is not blocked by the ML model.
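A baseline can be very simple. The sketch below, with invented sensor readings and labels, builds a majority-class baseline and checks whether a candidate model improves on it. The function names and the threshold rule used as the "candidate" are illustrative assumptions. Accuracy is used only for brevity, and a real project would plug in whichever success metrics it has chosen.

```python
from collections import Counter


def majority_class_baseline(train_labels):
    """Return a predictor that always outputs the most common label."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    return lambda _features: most_common_label


def accuracy(predict, features, labels):
    """Fraction of examples where the predictor matches the label."""
    correct = sum(1 for x, y in zip(features, labels) if predict(x) == y)
    return correct / len(labels)


def beats_baseline(candidate_score, baseline_score, min_gain=0.0):
    """True if the candidate improves on the baseline by more than min_gain."""
    return candidate_score - baseline_score > min_gain


# Invented data: one numeric reading per example.
train_labels = ["ok", "ok", "ok", "fault"]
test_features = [0.1, 0.2, 0.9, 0.3]
test_labels = ["ok", "ok", "fault", "ok"]

baseline = majority_class_baseline(train_labels)
# Stand-in for a trained model: a simple threshold rule.
candidate = lambda reading: "fault" if reading > 0.5 else "ok"

baseline_score = accuracy(baseline, test_features, test_labels)
candidate_score = accuracy(candidate, test_features, test_labels)
improved = beats_baseline(candidate_score, baseline_score)
```

Recording both scores at every iteration gives the team a fixed reference point, so a new model is only promoted when it measurably outperforms what is already deployed.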

It is also important to keep an eye on a few other aspects of the data and model, along with model performance. After each iteration of model preparation, we look deeper into the data and model outputs to find learned biases and incorrect model predictions. Sometimes this reveals cases unseen in the training data where the model performs poorly, fairness problems, unbalanced classes and, in some applications, vulnerability to adversarial attacks. These findings often result in the client collecting more data to cover the bad cases, removing biases, or increasing the data size to improve model accuracy. This iterative cycle of model preparation and improvement can reduce the uncertainty and risk of the ML project.
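One lightweight way to look deeper into model outputs is to break a metric down by slices of the data. The sketch below assumes each evaluation record carries a hypothetical "slice" field (here, lighting conditions for an imagined vision model) alongside the true label and the prediction. The data is invented for illustration.

```python
from collections import defaultdict


def accuracy_by_slice(records):
    """Compute accuracy separately for each value of the "slice" field."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for record in records:
        totals[record["slice"]] += 1
        correct[record["slice"]] += int(record["label"] == record["prediction"])
    return {name: correct[name] / totals[name] for name in totals}


# Invented evaluation records.
evaluation = [
    {"slice": "daylight", "label": "car", "prediction": "car"},
    {"slice": "daylight", "label": "bike", "prediction": "bike"},
    {"slice": "night", "label": "car", "prediction": "car"},
    {"slice": "night", "label": "bike", "prediction": "car"},
]

per_slice = accuracy_by_slice(evaluation)
# daylight: 1.0, night: 0.5
```

An overall accuracy of 75% would hide the fact that the night-time slice is much weaker, which is exactly the kind of finding that prompts collecting more data for the bad cases.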

Automation rules

In ML projects, there are many code snippets and scripts that need to be executed to cleanse and prepare the data in a model-friendly format. It might look as if this code will only be executed once, but it is very likely that it will need to be executed again at some point. For example, the code that prepares the training data might be needed again whenever more data is collected and the model needs to be updated. Hence, we suggest spending some time automating the execution of any code that you predict will be run more than once. Automation can happen at different levels, from writing a main function that takes input parameters and runs the code with new parameters, to web services and web apps that allow an end user with the right permissions to run the code.
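The simplest level of automation mentioned above, a main function that takes input parameters, might look like the following sketch. The preparation step (dropping short lines from a text file) and the flag names `--input`, `--output` and `--min-length` are hypothetical placeholders for whatever a real data preparation script does.

```python
import argparse


def prepare_training_data(input_path: str, output_path: str, min_length: int) -> int:
    """Copy non-trivial lines from input to output; return how many were kept."""
    kept = 0
    with open(input_path, encoding="utf-8") as src, \
            open(output_path, "w", encoding="utf-8") as dst:
        for line in src:
            line = line.strip()
            if len(line) >= min_length:
                dst.write(line + "\n")
                kept += 1
    return kept


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Prepare training data.")
    parser.add_argument("--input", required=True, help="raw data file")
    parser.add_argument("--output", required=True, help="cleaned data file")
    parser.add_argument("--min-length", type=int, default=1,
                        help="drop lines shorter than this many characters")
    return parser


def main(argv=None) -> int:
    """Parse arguments (sys.argv by default) and run the preparation step."""
    args = build_parser().parse_args(argv)
    kept = prepare_training_data(args.input, args.output, args.min_length)
    print(f"kept {kept} rows")
    return kept
```

Calling `main()` from the usual `if __name__ == "__main__":` guard turns this into a command-line script that can be rerun whenever new data arrives. Passing a list to `main` makes the same entry point easy to call from tests or a scheduler.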

Keep an eye on machine learning tools and cutting-edge models

Model iteration and versioning are very important in the ML lifecycle, and there are tools designed for this purpose which ML experts should use. MLflow, Kubeflow, or your cloud platform’s equivalent are among the tools we suggest considering for model versioning; they also help with reproducibility and metrics comparison. We also recommend DVC for data versioning, and Docker containers for reproducibility and infrastructure agnosticism. The requirements for data and model tracking, versioning, packaging and deployment are common enough in ML projects that ML frameworks and libraries like TensorFlow and PyTorch are extending their ecosystems to address them.

ML is an evolving discipline with breakthroughs every few months. It is important for ML experts to keep up to date with these emerging techniques and models, and to quickly apply them to the problems being solved. Techniques such as data augmentation, transfer learning, self-supervised learning, neural structured learning, hyperparameter tuning, quantisation and knowledge distillation, along with state-of-the-art models, should be in an ML expert’s toolkit, where they can be drawn on to deliver significant improvements.

Model deprecation is unavoidable

We believe that ML projects need more maintenance than non-ML projects. This is because a model depends on the properties of the data it is used on; if the data changes, the model’s performance will also change. This phenomenon, known as model drift, is more common than you might think. Model drift, coupled with the constant improvements and breakthroughs in ML, means you should always be prepared to make changes to the core engine of your ML service: the model. We recommend replacing your models with more performant models trained on all the collected data. Even large-scale, established projects such as the Google search engine do this; for example, it recently adopted the BERT model.
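Drift can only be acted on if it is measured. One common heuristic for comparing the distribution of a feature at training time with its distribution in production is the population stability index (PSI). The sketch below is a minimal plain-Python version with invented data. The bin count and the small floor used to avoid taking the log of zero are assumptions, and any alerting threshold would need tuning for a real project.

```python
import math


def population_stability_index(reference, current, bins=10):
    """PSI between two samples of one numeric feature.

    Bins are equal-width over the range of the reference sample;
    current values outside that range fall into the edge bins.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature

    def proportions(values):
        counts = [0] * bins
        for value in values:
            index = int((value - lo) / width)
            counts[min(max(index, 0), bins - 1)] += 1
        # Floor empty bins at a tiny value so the logarithm is defined.
        return [max(count / len(values), 1e-6) for count in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


# Invented data: training-time values spread over [0, 1),
# production values concentrated in the upper half.
reference = [i / 100 for i in range(100)]
current = [0.5 + i / 200 for i in range(100)]

stable = population_stability_index(reference, reference)
shifted = population_stability_index(reference, current)
```

Identical samples give a PSI of zero, and the value grows as the production data moves away from the training data. Tracking it per feature over time gives an early signal that a model may need retraining.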

A highly collaborative, multidisciplinary team

ML projects can be complicated, and to succeed there is a real need for a range of expertise, from ML experts to DevOps and software engineers. Managing a cross-functional ML team can require more effort, as the team members will use different languages and have different criteria in mind when implementing a system.

DevOps and software engineers usually focus more on the functional and non-functional requirements of a project, and try to provide more resilient functional services with high throughput. ML experts, however, are accountable for model development and improvement, and are therefore inclined to choose larger models, which can conflict with the latency requirements of the project. This highlights the need for more interaction and communication within the team, which makes the success of an ML project more likely.

Key takeaways

I hope our experience in productionising successful ML projects helps pave the road for others. Here are five of our key takeaways from this blog post to get you started: 

  1. Invest in data preparation early and reap the benefits later
  2. Start with a POC to prove viability, use benchmarks to assure performance improvement, and be mindful of biases and unfairness
  3. Invest in an automated ML pipeline and cross-functional team to enable reproducibility, benchmarking, quality checks, etc.
  4. Iterate and reiterate.
  5. Keep yourself up to date!