Developing an ontological framework for effective data quality assessment and knowledge modelling

Citation

Latsou C, Garcia I Minguell M, Sonmez AN, et al. (2022) Developing an ontological framework for effective data quality assessment and knowledge modelling. In: 11th International Conference on Through-life Engineering Services - TESConf 2022, 8-9 November 2022, Cranfield UK, Paper number 5379

Abstract

Big data has become a major challenge in the 21st century, with research being carried out to classify, mine and extract knowledge from data obtained from disparate sources. Abundant data sources with non-standard structures further complicate the already arduous process of data integration. Currently, the major requirement is to understand the available data and detect data quality issues, and research is being conducted to establish data quality assessment methods. A further aim is to improve data quality and maturity so that problems can be predicted early and handled effectively. However, the literature highlights that comprehensive analysis of, and research into, data quality standards and assessment methods are still lacking. To address these challenges, this paper presents a structured framework that standardises the process of assessing data quality and models the knowledge obtained from such an assessment using an ontology. The main steps of the framework are: (i) identify the users' requirements; (ii) measure the quality of data considering data quality issues, dimensions and their metrics, and visualise this information in a data quality assessment (DQA) report; and (iii) capture the knowledge from the DQA report using an ontology that models the DQA insights in a standard, reusable way. Following the proposed framework, an Excel-based tool is developed to measure the quality of data and identify emerging issues. An ontology, created in Protégé, provides a standard structure for modelling the data quality insights obtained from the assessment; it is updated frequently to enrich the captured knowledge, reducing time and costs for future projects.
An industrial case study in the context of Through-life Engineering Services, using operational data from high-value engineering assets, is employed to validate the proposed ontological framework and tool; the results show that it provides a well-structured guide for effectively assessing data quality and modelling knowledge.
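Steps (ii) and (iii) of the framework can be illustrated with a small, hypothetical Python sketch. It is not the authors' Excel-based tool or their Protégé ontology: the record fields, the -999.0 sentinel, the 0.7 acceptance threshold, the simple ratio formulas for completeness, uniqueness and validity, and the `ex:` vocabulary are all assumptions chosen for illustration. The sketch scores a few common data quality dimensions to build a minimal DQA report, then restates each report row as subject-predicate-object triples, the general shape in which an ontology stores such insights. Step (i) appears only as the choice of fields, ranges and threshold.

```python
# Hypothetical records standing in for operational asset data; field names,
# values and the -999.0 sentinel are illustrative assumptions only.
RECORDS = [
    {"asset_id": "A1", "timestamp": "2022-01-01T00:00", "spindle_temp": 41.2},
    {"asset_id": "A2", "timestamp": "2022-01-01T00:00", "spindle_temp": None},
    {"asset_id": "A1", "timestamp": "2022-01-01T00:00", "spindle_temp": 41.2},
    {"asset_id": "A3", "timestamp": "2022-01-01T00:05", "spindle_temp": -999.0},
]

THRESHOLD = 0.7  # assumed acceptance level, not taken from the paper


def completeness(records, field):
    """Fraction of records in which `field` is present and not None."""
    if not records:
        return 0.0
    return sum(r.get(field) is not None for r in records) / len(records)


def uniqueness(records, key_fields):
    """Fraction of records carrying a distinct key; duplicates lower it."""
    if not records:
        return 0.0
    keys = {tuple(r.get(f) for f in key_fields) for r in records}
    return len(keys) / len(records)


def validity(records, field, low, high):
    """Fraction of non-missing values of `field` inside [low, high]."""
    values = [r[field] for r in records if r.get(field) is not None]
    if not values:
        return 0.0
    return sum(low <= v <= high for v in values) / len(values)


def assess(records):
    """Step (ii): score each dimension and build a simple DQA report."""
    scores = {
        "completeness": completeness(records, "spindle_temp"),
        "uniqueness": uniqueness(records, ("asset_id", "timestamp")),
        "validity": validity(records, "spindle_temp", 0.0, 150.0),
    }
    return [
        {"dimension": d, "score": round(s, 3), "passed": s >= THRESHOLD}
        for d, s in scores.items()
    ]


def to_triples(report, dataset_id):
    """Step (iii): restate report rows as subject-predicate-object triples.

    The 'ex:' terms are placeholders, not the authors' ontology classes."""
    triples = []
    for i, row in enumerate(report):
        node = f"ex:{dataset_id}_measurement_{i}"
        triples += [
            (node, "rdf:type", "ex:QualityMeasurement"),
            (node, "ex:computedOn", f"ex:{dataset_id}"),
            (node, "ex:dimension", f"ex:{row['dimension']}"),
            (node, "ex:score", row["score"]),
            (node, "ex:meetsThreshold", row["passed"]),
        ]
    return triples


if __name__ == "__main__":
    report = assess(RECORDS)
    for row in report:
        print(row)
    for triple in to_triples(report, "asset_log"):
        print(triple)
```

With these sample records, completeness and uniqueness both score 0.75, while validity scores about 0.667 and falls below the assumed threshold because of the -999.0 sentinel, the kind of issue a DQA report would flag. Keeping the insights as triples rather than spreadsheet rows is what lets them be merged into, and queried from, a shared knowledge model across projects.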

Description

11th International Conference on Through-life Engineering Services - TESConf 2022, 8-9 November 2022, Cranfield UK

Keywords

data quality issues, data quality dimensions, data quality assessment, ontology, data management

Rights

Attribution 4.0 International

Funder/s

DMG Mori