5 Tips to Align Your Data Strategy With AI
Fit-For-Purpose AI Data Strategies
Large businesses across the globe have enjoyed productivity gains through their applications of machine learning and similar technologies. Lately, their efforts have turned to exploring the potential of generative AI. Unsurprisingly, small and medium organisations are keen to follow suit and reap these possible benefits — ideally before their competition does.
However, data is the lifeblood of every machine learning system, so any AI is only as good as its data.
Accordingly, the business’s data strategy must be fit to accommodate its use of this group of technologies. If data strategy is an afterthought, the downstream difficulties can be costly to rectify.
1. Align AI with Data Governance
Data governance is a term often presented without definition. In general, it refers to the set of overarching principles and policies that inform an organisation’s day-to-day data management practices. Good data governance means having precise oversight, clear chains of responsibility, and a company-wide understanding of, and adherence to, standard practices. This provides the foundation for operationalising good, consistent data management and strong security.
Data governance is an important tool for staying abreast of your legal and ethical obligations, especially in an era of fast-changing data privacy legislation. However, research conducted by the Governance Institute of Australia indicates that the understanding of data governance among Australian businesses is not particularly robust. For example, 60% of respondents to its survey reported that their company board “does not have an understanding of the organisation’s current data governance challenges.”
Use of third-party AI products must align with business goals. Consulting with end-users to develop an explicit AI policy that staff can refer to can help direct best-practice AI use across your organisation. A clear and practical policy can also help prevent “shadow” AI and avert the very real risks associated with unauthorised AI use. The process may also clarify how AI fits in with broader business goals, which in turn can inform the selection of specific AI tools and providers.
2. Set Clear Goals
You likely have a real and practical use-case in mind for how to use AI within your organisation.
There are many possible applications, but common examples include all-hours customer service or supply chain optimisation. Before you commence an AI project, it’s vital to determine the outcomes and performance indicators you need to see delivered in order to make the process worthwhile.
Despite what enthusiastic gurus online might post, implementations of AI with the vague goal of “improving some kind of productivity” do not offer clear evidence of success. Having clear and measurable goals will keep your business on track, and it can also help protect you from the sunk cost fallacy!
Clear and measurable goals will also ensure you know what kind of data your new AI system will require. The required data will change depending on use case, and the type of data will inform your priorities and obligations.
Returning to the examples above, a customer service chatbot might require transaction and behavioural data, which means you’ll have to answer questions about privacy and personally identifiable information before going forward. On the other hand, supply chain optimisation will require information like historical sales data, market trends, and inventory data.
3. Take Stock of Current Data
Auditing your existing data will be foundational to an effective AI deployment.
High quality outputs need high quality data. That starts with complete knowledge of the data your organisation currently holds, where it’s held, and what you’re currently doing with it. This process is called a data audit.
When auditing data, we ask questions like ‘What data do we keep?’ and ‘Where do we store the data? In what format?’ and ‘Do these systems speak to one another, or are they confined to silos?’
Undertaking this process will permit you to find gaps in your existing data and raise awareness of silos, both of which ought to be addressed to get the most out of a new AI tool.
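As a rough illustration of the audit questions above, the sketch below profiles a small set of records and reports, for each field, how many values are populated and how many are missing. The record layout and field names (`customer_id`, `email`, `postcode`) are hypothetical, invented for this example; a real audit would run against exports from each system you hold data in.

```python
from collections import Counter

# Hypothetical sample records; in practice these would be exported
# from each system being audited.
records = [
    {"customer_id": "C001", "email": "a@example.com", "postcode": "3000"},
    {"customer_id": "C002", "email": "", "postcode": "2000"},
    {"customer_id": "C003", "email": "c@example.com", "postcode": None},
]


def profile_fields(rows):
    """Count populated and missing values for every field seen in rows."""
    populated = Counter()
    fields = set()
    for row in rows:
        fields.update(row.keys())
        for name, value in row.items():
            # Treat None and blank strings as missing.
            if value is not None and str(value).strip() != "":
                populated[name] += 1
    total = len(rows)
    return {
        name: {"populated": populated[name], "missing": total - populated[name]}
        for name in sorted(fields)
    }


report = profile_fields(records)
for field, counts in report.items():
    print(field, counts)
```

Fields with a high missing count are the gaps worth addressing before an AI tool is pointed at the data.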
This is also a great time to determine where you can append metadata to your existing data. Metadata is information that describes information: an AI system may use metadata to tell the difference between an arbitrary string of numbers and a phone number, for example.
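To make the phone number example concrete, here is a minimal sketch of field-level metadata. The `FIELD_METADATA` dictionary, its keys, and the field names are hypothetical and do not follow any standard schema format. The point is only that the same string of digits means different things depending on the metadata attached to its field.

```python
# Hypothetical field-level metadata describing what each column holds.
FIELD_METADATA = {
    "contact_number": {"type": "phone_number", "country": "AU"},
    "order_total_cents": {"type": "integer", "unit": "cents"},
}

# The same digits appear in both fields; only the metadata tells them apart.
record = {"contact_number": "0412345678", "order_total_cents": "0412345678"}


def describe(field, value):
    """Return a human-readable description of a value using its metadata."""
    meta = FIELD_METADATA.get(field, {"type": "unknown"})
    return f"{field}={value!r} is a {meta['type']}"


descriptions = [describe(name, value) for name, value in record.items()]
for line in descriptions:
    print(line)
```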
4. Maintain Data Quality
In the data services sector, we use the phrase ‘garbage in, garbage out’ like a mantra. When it comes to putting data to good use, nothing is more true. The same holds for machine learning systems.
Data quality factors to consider to ensure your AI system operates as intended include:
- Deduplication
- Correctness
- Accuracy and completeness
- Validation
- Verification
- Appropriate metadata
- Thoughtful, compliant and ethical data sourcing
Most available AI products assume that the information provided to them is complete, accurate and correct. More precisely, they have no reliable way of assessing the truth of that information at all. If the data they use is wrong, the outputs those systems generate can be indistinguishable from the results of deliberate data poisoning.
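A few of the checks listed above can be sketched in code. The example below deduplicates records on a normalised email address and flags records with missing or malformed fields. The field names and the simple email pattern are assumptions made for illustration; real validation and verification rules would be more thorough.

```python
import re

# Deliberately simple pattern for illustration only; not a full email validator.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

# Hypothetical sample records.
records = [
    {"name": "Ana Lee", "email": "ana@example.com"},
    {"name": "Ana Lee", "email": "ANA@example.com "},  # duplicate after normalising
    {"name": "", "email": "ben@example.com"},          # incomplete: no name
    {"name": "Cy Wu", "email": "not-an-email"},        # invalid email
]


def clean(rows):
    """Split rows into unique valid records and rejected records with reasons."""
    seen = set()
    valid, rejected = [], []
    for row in rows:
        email = row["email"].strip().lower()
        name = row["name"].strip()
        if not name:
            rejected.append((row, "missing name"))
        elif not EMAIL_PATTERN.match(email):
            rejected.append((row, "invalid email"))
        elif email in seen:
            rejected.append((row, "duplicate"))
        else:
            seen.add(email)
            valid.append({"name": name, "email": email})
    return valid, rejected


valid, rejected = clean(records)
print(len(valid), "valid;", [reason for _, reason in rejected])
```

Keeping the rejected records alongside a reason, instead of silently dropping them, makes it easier to fix problems at the source.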
5. Consider Data Enrichment
If you don’t have complete data for your purposes, you might need to consider enriching what you do have by adding to it from third-party sources.
Not all sources of data for enrichment are created equal.
It’s important to critically assess the data source you use to enrich your data. It’s not always clear where providers source their data, which presents the risk of appending low-quality data to your own records. Since the goal of enrichment is to improve how your business data performs in an AI deployment, low-quality additions defeat the purpose.
However, while some third-party data sets can be high-risk, others are secure and reliable. The most trustworthy data often comes from reputable organisations that handle business or logistical information on a massive scale, such as government sources or Australia Post.
When properly curated, third-party data used judiciously to enrich your own data sets can improve completeness as well as sample size and breadth.
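One cautious way to enrich records is to fill only the gaps, never overwrite your own values, and record where each added value came from. The sketch below assumes a hypothetical third-party lookup keyed on `customer_id`; the provider name, field names, and `_provenance` key are all invented for illustration.

```python
# Hypothetical in-house records with a gap to fill.
own_records = [
    {"customer_id": "C001", "postcode": "3000"},
    {"customer_id": "C002", "postcode": None},
]

# Hypothetical third-party data, keyed on the same identifier.
third_party = {
    "C001": {"postcode": "9999"},
    "C002": {"postcode": "2000"},
}


def enrich(rows, lookup, source_name):
    """Fill missing fields from lookup, tagging each filled field with its source."""
    enriched = []
    for row in rows:
        result = dict(row)  # copy so the original record is not modified
        provenance = {}
        for field, value in lookup.get(row["customer_id"], {}).items():
            # Only fill gaps; existing in-house values are left untouched.
            if result.get(field) in (None, ""):
                result[field] = value
                provenance[field] = source_name
        result["_provenance"] = provenance
        enriched.append(result)
    return enriched


enriched = enrich(own_records, third_party, "example_provider")
for row in enriched:
    print(row)
```

Tracking provenance this way means that if a source later proves unreliable, the values it contributed can be found and removed.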