You’re not alone. Data is among the top three business challenges to adopting AI/ML, according to last year’s DiUS National Pulse Report on AI/ML in Australia and New Zealand.
We’ve found that clients who come to us with structured, well-labelled and centralised data ready for machine learning are rarer than hen’s teeth, particularly when an organisation is right at the start of its AI/ML journey.
As this is such a common challenge, I thought it would be useful to share the conversations we often have around data with clients, together with the advice we share.
If you’re new to AI/ML
Focus on speed to value without overinvesting in data infrastructure for your machine learning proof of concept. It’s far better to invest time in making sure you’re focusing on the right problem to solve using machine learning.
We advise organisations to pick a problem that delivers business value, but is also solvable. Demonstrating progress and success early on will deliver learnings and the momentum to continue within an organisation.
We also suggest design thinking be used to select the most promising machine learning ideas. The sweet spot for innovation is at the intersection of desirability, feasibility and viability. Applying the same lenses to ML ideas can help assess ones with the most potential.
It’s completely fine to start off by getting a data scientist to wrangle the data into the format needed in a Jupyter notebook. And you can start with a relatively small dataset. It’s much better to start with what you have, then explore ways of expanding your dataset through data augmentation techniques and/or external data sources, than to wait for a perfect dataset.
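To make data augmentation a little more concrete, here is a minimal sketch of the idea in plain Python. It assumes a tiny image dataset stored as grids of 0-255 pixel values with a label attached; the function names, the "cracked" label and the brightness range are illustrative assumptions rather than anything from a specific project, and a real pipeline would more likely lean on an image library.

```python
import random

def flip_horizontal(image):
    """Mirror each row of a 2D grid of pixel values."""
    return [list(reversed(row)) for row in image]

def shift_brightness(image, delta):
    """Add delta to every pixel, clamped to the 0-255 range."""
    return [[max(0, min(255, pixel + delta)) for pixel in row] for row in image]

def augment(dataset, seed=0):
    """Return each original example plus two altered copies with the same label."""
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    augmented = []
    for image, label in dataset:
        augmented.append((image, label))
        augmented.append((flip_horizontal(image), label))
        # Hypothetical brightness range; tune it to what your data can tolerate.
        augmented.append((shift_brightness(image, rng.randint(-30, 30)), label))
    return augmented

# One labelled 2x2 "image" becomes three training examples.
tiny_dataset = [([[0, 50], [100, 250]], "cracked")]
bigger_dataset = augment(tiny_dataset)
```

The point is that each transformation keeps the label valid while giving the model a slightly different view of the same example. Whether a given transformation is safe depends on the problem: flipping is harmless for many photos but not for, say, images containing text.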
Look at data challenges as an opportunity. By running an AI/ML experiment, you can gain insights into the quality of your data and its challenges to help inform or reinforce your data strategy for machine learning, which may include a refined data collection approach.
What if you don’t have the data you need for AI/ML?
Be clever. It’s true that data is a big determinant of machine learning success, because of how integral it is to model training. But sometimes you can find a workaround.
That was the case for a challenge posed by a hotel booking site with low conversions from searches into accommodation bookings. We developed a proof of concept that used machine learning in a custom recommendation engine to lift the conversion rate and drive more bookings, all without the user information that traditionally powers sophisticated recommendations.
Additionally, sometimes you need a little help collecting the ‘right’ data. We supported bolttech in developing a machine-learning-powered process for customers to digitally apply for device screen protection insurance, no human intervention required.
One of the keys to making the application process run seamlessly was a quantised machine learning model small enough to run on the device and efficiently guide a user to take a photo of their screen in a mirror at the right size and angle. This ‘right’ data, or photo, is sent to the cloud and processed by another computer vision model to validate that the screen isn’t already damaged.
You don’t always need machine learning to collect data for your machine learning model to run well. But sometimes it really helps, particularly when you need something very specific.
For AI/ML to successfully scale
Once you’re achieving some measure of success with your proof of concept, it’s time to think about productionising your models.
The data cleansing that was initially done by your data scientist needs to be provided as a service which can be used to process batch or streaming data as a part of a bigger system. Depending on the throughput and size of the data, distributed and big data techniques might be the right approach for data cleansing pipelines.
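As a rough illustration of that shift, the sketch below pulls notebook-style cleansing logic into one reusable function and wraps it so the same rules apply to a batch or a stream. The record fields (`name`, `price`) and the cleansing rules are hypothetical; it is a starting shape under those assumptions, not a production pipeline.

```python
from typing import Iterable, Iterator, Optional

def clean_record(record: dict) -> Optional[dict]:
    """Cleanse a single record; return None if it can't be salvaged."""
    name = str(record.get("name", "")).strip()
    if not name:
        return None  # hypothetical rule: a record without a name is unusable
    try:
        price = float(record.get("price"))
    except (TypeError, ValueError):
        return None  # missing or non-numeric price
    return {"name": name.title(), "price": round(price, 2)}

def clean_stream(records: Iterable[dict]) -> Iterator[dict]:
    """Lazily cleanse records one at a time, so it works on a list or a live feed."""
    for record in records:
        cleaned = clean_record(record)
        if cleaned is not None:
            yield cleaned

def clean_batch(records: Iterable[dict]) -> list:
    """Batch mode is the streaming path, collected into a list."""
    return list(clean_stream(records))

raw = [
    {"name": "  harbour view hotel ", "price": "189.50"},
    {"name": "", "price": "10"},
    {"name": "city lodge", "price": "n/a"},
]
cleaned = clean_batch(raw)
```

Keeping the rules in one place means the notebook, the batch job and the streaming consumer all agree on what "clean" means. If throughput outgrows a single process, the same per-record function can be handed to a distributed framework.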
Most of the time, we are helping clients take a step back and invest in data engineering to enable crisp data pipelines and processing components as part of their journey to machine learning. Essentially, we’re helping them get their data house in order.
Ultimately, the importance of data quality, engineering and building appropriate infrastructure and pipelines to enable business outcomes by scaling AI/ML cannot be overstated. If you’re ready for machine learning, ask yourself if your data is as well.
We have a deep understanding of how to enable AI/ML and generative AI by making data contextual and accessible under the hood. Talk to us to see how we can help you.