How Computer Vision Can Augment How We Work And Design New Customer Experiences

Earlier this week we heard from Nigel Hooke (AI/Machine Learning Lead) and Shahin Namin (Machine Learning Engineer) at DiUS about their experience building commercial software applications powered by computer vision.

The team shared some common use cases and stepped us through case studies of where computer vision has been used to drive efficiencies in the mining and resources sector, and to improve the customer experience for a new digital insurance product.

You can watch the live stream replay of their talk: How computer vision can augment how we work and design new customer experiences.

Following their talk, Nigel and Shahin took some really great questions from the audience. We thought we’d capture them here for anyone who either missed the talk or wants to refer back to it.

Why would you use quantisation in the first place?

Shahin: Quantisation is a process that shrinks a model to a smaller size. If you don’t use quantisation, you need to use a much smaller model. By using quantisation, you can take a larger model, which is more accurate or more performant, and convert it into a model which is smaller in size but still accurate. And that’s why we use this technique.
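
To make the idea concrete, here is a minimal sketch of one common form of quantisation: mapping floating point weights to 8-bit integers with a scale and a zero point. It is an illustration only, not the approach used in the projects discussed in the talk; the function names and the example weights are made up for this sketch. Storing a weight in 8 bits rather than as a 32-bit float takes roughly a quarter of the space, at the cost of a small rounding error.

```python
def quantise(weights, num_bits=8):
    """Map float weights to unsigned integers using a scale and zero point."""
    qmin, qmax = 0, 2 ** num_bits - 1
    w_min, w_max = min(weights), max(weights)
    # Guard against constant weights, which would otherwise give a zero scale.
    scale = (w_max - w_min) / (qmax - qmin) or 1.0
    zero_point = round(qmin - w_min / scale)
    quantised = [
        min(qmax, max(qmin, round(w / scale) + zero_point)) for w in weights
    ]
    return quantised, scale, zero_point


def dequantise(quantised, scale, zero_point):
    """Recover approximate float weights from their integer form."""
    return [(q - zero_point) * scale for q in quantised]


# Hypothetical weights, standing in for one layer of a trained model.
weights = [-1.2, -0.4, 0.0, 0.37, 0.9, 2.5]
q, scale, zero_point = quantise(weights)
restored = dequantise(q, scale, zero_point)

# The largest difference between an original weight and its restored value.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, max_error)
```

In this sketch the rounding error per weight stays within half of one quantisation step, which is why a quantised model can remain close to the original in accuracy.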

What are some of the challenges around productising ML? Don’t you get everything worked out once you have a working model?

Nigel: In principle, your working model is a demonstration that the idea is sound. Typically when you build a model, you do it with certain tools in an environment that’s quite different from how it’s going to be run in production. Sometimes you have to change the language and the libraries with which it’s implemented. You have to deal with how fast it’s going to do the inference, and manage the fact that the model will probably have to be updated quite regularly, more often than you would update the rest of your application. So, you need the model embodied in a different form of software to put it into production, and it’s how you make that transition that people have to think their way through carefully. There’s evidence to suggest that around half of the models that are developed worldwide don’t successfully make that step for one reason or another. Sometimes it’s due to the organisation. For example, one team inside an organisation will build models but is not responsible or empowered to put them into production. So they have to hand over their work to another team who may not properly understand what the model is doing, or even how machine learning or computer vision works. So there can be organisational barriers around that. Quite a few things have to be lined up to successfully make that transition, and that’s why we call it out as a separate stage in the development process.

What are some of the ways of addressing the lack of model performance due to gaps between training data and the actual data?

Shahin: There are a lot of different techniques that can be used, and one of them is data augmentation. With augmentation, you can deal with a lack of data, or make your model less sensitive to variations in parameters like scale or illumination. That way your model becomes more accurate, more performant and more general in the end. But this is not the only technique; there are lots and lots of different ways that this can be handled, depending on the situation.
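
As a rough illustration of data augmentation, the sketch below produces several variants of one tiny greyscale image by flipping it, changing its brightness and scaling it up. The image is a plain nested list of pixel values and all function names are invented for this example; a real project would normally use an image library and a wider range of transforms.

```python
def flip_horizontal(image):
    """Mirror the image left to right."""
    return [list(reversed(row)) for row in image]


def adjust_brightness(image, factor):
    """Simulate a change in illumination, clamping pixels to 0..255."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in image]


def scale_up(image, factor=2):
    """Nearest-neighbour upscaling by an integer factor."""
    return [
        [p for p in row for _ in range(factor)]
        for row in image
        for _ in range(factor)
    ]


def augment(image):
    """Return the original image plus a handful of augmented variants."""
    variants = [image, flip_horizontal(image)]
    for factor in (0.6, 1.4):  # darker and brighter copies
        variants.append(adjust_brightness(image, factor))
    variants.append(scale_up(image))
    return variants


# A 2x2 greyscale image, made up for this sketch.
image = [[10, 200], [50, 120]]
variants = augment(image)
print(len(variants))
```

Each variant keeps the same label as the original image, so one labelled sample becomes several training examples.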

How small a dataset can you use to get started with Computer Vision?

Nigel: When we started on the geology drill core sample work, we were just playing around to see if we could get something to work. I think we started off with about half a dozen sample images, which we played with and augmented using the kind of techniques Shahin described. And we started to get a feeling that this might work. Meanwhile, we were trying to get access to lots more images, but even with just a small number, you can begin to see some results that indicate whether it’s promising or not.

You mentioned ML Services and Custom Models. Why would you go with one approach over another?

Nigel: The barrier to entry for an ML service is quite low. Essentially, it’s a programming API. Anyone who can program in Java, Python or JavaScript can access that API, pass an image over and get back a data structure that tells them what’s in the image. And for general purpose images of everyday objects and everyday scenery, that can be quite effective. If that’s what your application needs, then you’re done. However, if you want to identify a very specific kind of image, like that example I gave of cyclists in racing gear on racing bikes, then that’s when you need to go to a custom model, built from your own collection of images.

What sources of data are used for training more general purpose models, e.g. fashion?

Nigel: So they would be images typically taken with a smartphone, in reasonable lighting. I know people who play around with these models will sometimes go to online retail catalogs, and just scrape off images that they find on the web, in magazines etc. Essentially, they’re just JPEG images. And you can start to build up a collection of images that way.

Can you combine multiple models to build a more complex solution? If so, what are the additional challenges associated with that process?

Shahin: Yes, there are techniques called ensemble methods that allow you to combine lots and lots of different models together. You usually end up getting much better results in the end. However, the problem is that you have to host those models individually, and scaling of those models will be much harder. Also in terms of the infrastructure, it will be more expensive to do that. So that is something that is sometimes done, but because the cost will increase pretty dramatically through that process, sometimes we don’t recommend it.
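
One simple ensemble method is soft voting, where the class probabilities from several models are averaged. The sketch below assumes each model is a function that takes an image and returns a dictionary of label probabilities; the three models and their outputs are hypothetical stand-ins, not real trained models.

```python
def ensemble_predict(models, image):
    """Average the class probabilities of several models (soft voting)."""
    totals = {}
    for model in models:
        # Every model runs on every request, which is where the extra
        # hosting and scaling cost of an ensemble comes from.
        for label, probability in model(image).items():
            totals[label] = totals.get(label, 0.0) + probability
    averaged = {label: total / len(models) for label, total in totals.items()}
    best_label = max(averaged, key=averaged.get)
    return best_label, averaged


# Hypothetical stand-ins for three separately trained models.
def model_a(image):
    return {"dress": 0.6, "shirt": 0.4}


def model_b(image):
    return {"dress": 0.3, "shirt": 0.7}


def model_c(image):
    return {"dress": 0.7, "shirt": 0.3}


label, probabilities = ensemble_predict([model_a, model_b, model_c], image=None)
print(label, probabilities)
```

Here one model disagrees with the other two, and the averaged result follows the majority.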

How do you control for bias when training models with regards to diversity in a retail use case like Trendii?

Nigel: It definitely helps to recognise up front that there is a risk of bias. Our society has certain biases, and that’s reflected in what we see around us. So, I think the first thing is to recognise that there is going to be some bias. And if that is a concern, because it may not be a concern in a particular situation, then I think firstly you need to try and build up a training set with the kind of diversity that reflects how you want the model to perform. So, if you want a model that is equally likely to recognise men or women, for example, then you want to train your model with a training set that has equal representation. If you don’t have enough images of men in a particular outfit, and your images are mostly of women, then you probably need to take the images of men that you do have, and try and augment them with techniques such as those Shahin mentioned. That’s probably the best way I would suggest to counter bias. But I think it is also quite an insidious problem, one that we don’t fully understand yet, and is quite challenging.
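
The rebalancing Nigel describes could be sketched as follows: count the samples in each group, then top up the smaller groups with augmented copies until every group matches the largest one. The record layout, the group names and the `mark_augmented` placeholder are assumptions made for this illustration; a real pipeline would apply actual image transforms instead.

```python
import random


def balance_by_oversampling(samples, group_key, augment, seed=0):
    """Top up under-represented groups with augmented copies of their samples."""
    rng = random.Random(seed)
    groups = {}
    for sample in samples:
        groups.setdefault(sample[group_key], []).append(sample)
    target = max(len(members) for members in groups.values())
    balanced = []
    for members in groups.values():
        balanced.extend(members)
        # Add augmented copies until this group reaches the target size.
        for _ in range(target - len(members)):
            balanced.append(augment(rng.choice(members)))
    return balanced


def mark_augmented(sample):
    # Placeholder: a real pipeline would flip, crop or relight the image here.
    return {**sample, "augmented": True}


# Hypothetical training set with six images of women and two of men.
samples = [{"image": f"w{i}.jpg", "group": "women"} for i in range(6)]
samples += [{"image": f"m{i}.jpg", "group": "men"} for i in range(2)]

balanced = balance_by_oversampling(samples, "group", mark_augmented)
counts = {}
for sample in balanced:
    counts[sample["group"]] = counts.get(sample["group"], 0) + 1
print(counts)
```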

Shahin: So the first thing we have to do is to find where the biases come from, and whether we can change the training data set in such a way that makes it less biased. But there will be cases where this is impossible, or the data set is given as is and you can’t go and collect more data because, again, it becomes very expensive. That’s when you have to choose mathematical techniques that will make your model more fair. There are many different techniques, and making a model mathematically less biased and more fair is one of the research areas in computer vision and machine learning these days. Another approach is to remove the source of the bias from your training data. It might be a feature that talks about ethnicity or gender, and you make your model totally ignorant of that specific feature as an input. That way, you might get less biased results.
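
One possible starting point for finding where bias shows up, assuming you have an evaluation set with group labels, is to compare the model's accuracy for each group. This is only a sketch of one simple check, not a method described in the talk; the groups, labels and predictions below are invented for illustration.

```python
def accuracy_by_group(records):
    """Compute accuracy per group from (group, true_label, predicted_label)."""
    correct, total = {}, {}
    for group, truth, predicted in records:
        total[group] = total.get(group, 0) + 1
        correct[group] = correct.get(group, 0) + (truth == predicted)
    return {group: correct[group] / total[group] for group in total}


# Hypothetical evaluation results for two groups of images.
records = [
    ("group_a", "jacket", "jacket"),
    ("group_a", "dress", "dress"),
    ("group_a", "shirt", "shirt"),
    ("group_a", "jacket", "shirt"),
    ("group_b", "jacket", "shirt"),
    ("group_b", "dress", "dress"),
    ("group_b", "shirt", "jacket"),
    ("group_b", "jacket", "dress"),
]

scores = accuracy_by_group(records)
gap = max(scores.values()) - min(scores.values())
print(scores, gap)
```

A large gap between groups suggests the training data or the model treats them differently and is worth investigating further.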

Are there any popular ML SaaS (open source) services you can suggest that developers can use to develop applications without much ML knowledge?

Nigel: On the solution landscape slide that I showed, on the far left, there were ones from each of the major cloud providers such as Microsoft Azure, Google Cloud and Amazon. Each of those clouds offers services that don’t require you to know much at all about machine learning. Essentially you make a call to their API and you get back a data structure that tells you what it found in the image. As long as your image has everyday objects in it, and that is what you’re interested in, you really don’t need to understand anything more about how it works. It’s just a black box.

Would you ever consider developing a trigger to alert the product owner that the algorithm is no longer performing to the initial specifications?

Nigel: Yes, that’s definitely an interesting question. The first question is how you would even recognise that the model is not performing. That may not be easy. I’m sure it would help to have humans check that on some typical data, from time to time.

Shahin: Yes, you can monitor your model while it’s running in practice as well, for example through components on the cloud that are implemented specifically for that. If your model drifts for any reason, which might be that the data is changing, or that something in the system is different from the time when you trained your model, you want to be notified about that. That’s when you retrain your model using more collected data, and data that is more similar to what you are predicting on. It is an iterative process, and you always have to keep track of your models to make sure that they are working fine.
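
A very simple form of the drift check Shahin describes is to compare a summary statistic of recent inputs against the same statistic from the training data and raise an alert when it moves too far. The sketch below uses average image brightness as the statistic; the values, the threshold of 2.0 and the function names are all assumptions for this example, and managed cloud monitoring services use more thorough checks.

```python
from statistics import mean, stdev


def drift_score(reference, recent):
    """How many reference standard deviations the recent mean has moved."""
    spread = stdev(reference)
    if spread == 0:
        return 0.0 if mean(recent) == mean(reference) else float("inf")
    return abs(mean(recent) - mean(reference)) / spread


def check_drift(reference, recent, threshold=2.0):
    """Return the drift score and whether it is high enough to alert on."""
    score = drift_score(reference, recent)
    return {"score": score, "alert": score > threshold}


# Hypothetical average brightness of images seen during training.
training_brightness = [118, 122, 120, 125, 119, 121, 123, 117]

# Two hypothetical batches of production images.
stable_batch = [120, 119, 122, 121]
dark_batch = [80, 85, 78, 90]  # e.g. a camera moved to a darker location

stable_result = check_drift(training_brightness, stable_batch)
dark_result = check_drift(training_brightness, dark_batch)
print(stable_result, dark_result)
```

An alert like this does not say the model is wrong, only that the inputs no longer look like the training data, which is a signal to check performance and consider retraining.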

Do you use any tool or platforms to automate the ML pipeline?

Shahin: There are so many different aspects to a computer vision or machine learning model that there is no specific open source tool that you use for pipelining. We have lots of different tools that work hand in hand, and in the end you might use them together to handle model versioning, data and monitoring of your model, but there is no single product, open source or otherwise, that is recommended for that.

A key feature of traditional stats models is explainability. What are your thoughts on the explainability of ML models and whether that might restrain adoption?

Shahin: That’s something that we deal with a lot when it comes to different clients, depending on whether they want their model to be very accurate, or they want their model to be totally explainable. There is a trade off, because models that are explainable are usually simpler models. And when you use them, you lose some performance. But you can move to the other end of the spectrum, which is deep learning models and the models that are very complicated mathematically. They are not easily interpreted, but you get better performance. And depending on the use case, we might choose one or the other, or something in between. And that’s what we’ve seen so far with different clients. It might be something that they are interested in or might be something that they are happy to sacrifice because they want to get a better model in the end.

What comes first in solving ML problems: collecting the data, or identifying the opportunity and then focusing on data collection?

Nigel: The first thing is to identify the problem you want to solve. I think if you just have data, it’s quite possible that the data you’ve collected won’t actually be suitable for solving the problem. So it’s much better to identify your goal, and then proceed to access the data that you need to address that problem.

Watch the live stream replay of Nigel and Shahin’s talk, here.
