Bad public sector data 'limiting effectiveness of AI'

An investigation into GP prescriptions and the factors affecting them has been hampered by the poor quality of public sector data.

Business intelligence and data science company Polymatica has announced the findings of an analysis of government GP data, which found that the amount of asthma medication prescribed increased by 17 per cent over six years, while the amount of antibiotics prescribed dropped by 12 per cent during the same period.

However, the company said that any conclusions were difficult to come by, with poor data quality limiting the ability to assess possible root causes.

CEO Mark Hinds said: 'We wanted to investigate the factors behind the findings, but we’ve been hindered by the quality of the data. What we’re able to see is that the message about reducing the number of antibiotics being prescribed is largely getting through. However, we can’t dig into why this has decreased, or why the level of asthma medication being prescribed has risen.'

He continued: 'We wanted to see if external factors such as socioeconomic status or pollution would affect the level of prescriptions. But the data left us questioning whether the infrastructure and processes in place for data entry and management are up to standard. The government is clearly willing to make changes to public health policy – but what are they basing these decisions on? You need clean data to understand the root cause of problems like rising asthma medication. This is something that as a society we need to monitor and understand, but poor data quality is restricting these efforts. The consequences for this could be sizeable – impacting policy decisions based on data analysis and limiting the effectiveness of new technologies such as artificial intelligence (AI).'

The main data quality issue stemmed from incorrect and inconsistent address entries. Polymatica found that cities and counties were often entered incorrectly or inconsistently, suggesting manual input rather than drop-down lists, which resulted in spelling mistakes, varying abbreviations and other inconsistencies. This made the data difficult to aggregate, which in turn made it hard to link with pollution and socioeconomic data and left the analysis unreliable.
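
To illustrate the kind of clean-up that free-text place names require before they can be aggregated, here is a minimal Python sketch. It is not Polymatica's method: the canonical place list, the alias table, the similarity cutoff and the function names are all assumptions made for illustration, and the sample records are invented rather than drawn from the GP dataset.

```python
import difflib
from collections import defaultdict

# Hypothetical reference list of canonical place names (assumption, not from the dataset).
CANONICAL = ["Birmingham", "Manchester", "Newcastle upon Tyne"]

# Hypothetical hand-maintained abbreviations that fuzzy matching would miss.
ALIASES = {"b'ham": "Birmingham", "mcr": "Manchester"}


def normalise_city(raw, cutoff=0.8):
    """Map a free-text city entry to a canonical name, or None if no good match."""
    key = " ".join(raw.strip().split()).lower()  # trim and collapse whitespace
    if key in ALIASES:
        return ALIASES[key]
    lookup = {name.lower(): name for name in CANONICAL}
    # Fuzzy match catches simple misspellings such as "Birmingam".
    matches = difflib.get_close_matches(key, list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else None


def aggregate(records):
    """Sum item counts per canonical city; return totals and unmatched raw entries."""
    totals = defaultdict(int)
    unmatched = []
    for city, items in records:
        name = normalise_city(city)
        if name is None:
            unmatched.append(city)  # keep for manual review rather than guessing
        else:
            totals[name] += items
    return dict(totals), unmatched


# Invented sample rows: (city as typed, number of items prescribed).
sample = [
    ("Birmingham", 10),
    (" birmingham ", 5),
    ("Birmingam", 3),
    ("B'ham", 2),
    ("MCR", 4),
    ("Xyz", 1),
]
totals, unmatched = aggregate(sample)
```

Entries that cannot be matched are set aside rather than forced into a category, since a wrong match would distort any later join with pollution or socioeconomic data just as much as a missing one.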