Citation:
Latsou C, Garcia I Minguell M, Sonmez AN, et al. (2022) Developing an ontological framework for effective data quality assessment and knowledge modelling. In: 11th International Conference on Through-life Engineering Services - TESConf2022, 8-9 November 2022, Cranfield, UK, Paper number 5379
Abstract:
Big data has become a major challenge in the 21st century, with research being carried out to classify, mine and extract knowledge from data obtained from disparate sources. Abundant data sources with non-standard structures further complicate the already arduous process of data integration. Currently, the major requirement is to understand the available data and detect data quality issues, and research is being conducted to establish data quality assessment methods. A further aim is to improve data quality and maturity so that the early onset of problems can be predicted and handled effectively. However, the literature highlights that comprehensive analysis of, and research into, data quality standards and assessment methods are still lacking. To address these challenges, this paper presents a structured framework that standardises the process of assessing data quality and uses an ontology to model the knowledge obtained from that assessment. The main steps of the framework are: (i) identify the user’s requirements; (ii) measure the quality of the data in terms of data quality issues, dimensions and their metrics, and visualise this information in a data quality assessment (DQA) report; and (iii) capture the knowledge from the DQA report using an ontology that models the DQA insights in a standard, reusable way. Following the proposed framework, an Excel-based tool is developed to measure the quality of data and identify emerging issues. An ontology created in Protégé provides a standard structure for modelling the data quality insights obtained from the assessment and is updated frequently to enrich the captured knowledge, reducing time and costs for future projects. An industrial case study in the context of Through-life Engineering Services, using operational data from high-value engineering assets, is employed to validate the proposed ontological framework and tool; the results show that they provide a well-structured guide that can effectively assess data quality and model knowledge.
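The three framework steps can be made concrete with a small sketch. The paper's own implementation is an Excel-based tool plus a Protégé ontology; the Python below is only an assumed, simplified analogue. The chosen dimensions (completeness, uniqueness, validity), the thresholds, the field names `asset_id` and `temp_c`, the helper names, and the predicate names such as `hasScore` are all hypothetical choices for illustration, and plain tuples stand in for ontology individuals.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple

Record = Dict[str, Any]                        # one row of (hypothetical) asset data
Metric = Callable[[List[Record], str], float]  # returns a score in [0, 1]


def _present(records: List[Record], attr: str) -> List[Any]:
    """Non-missing values of `attr`."""
    return [r[attr] for r in records if r.get(attr) is not None]


def completeness(records: List[Record], attr: str) -> float:
    """Share of records in which `attr` is present and not None."""
    return len(_present(records, attr)) / len(records) if records else 0.0


def uniqueness(records: List[Record], attr: str) -> float:
    """Share of distinct values among the non-missing values of `attr`."""
    values = _present(records, attr)
    return len(set(values)) / len(values) if values else 0.0


def validity(is_valid: Callable[[Any], bool]) -> Metric:
    """Build a metric: share of non-missing values that satisfy `is_valid`."""
    def metric(records: List[Record], attr: str) -> float:
        values = _present(records, attr)
        return sum(1 for v in values if is_valid(v)) / len(values) if values else 0.0
    return metric


@dataclass
class Requirement:
    """Step (i), assumed form: a user requirement as dimension, attribute, metric, threshold."""
    dimension: str
    attribute: str
    metric: Metric
    threshold: float


@dataclass
class Finding:
    """Step (ii): one row of the DQA report."""
    dimension: str
    attribute: str
    score: float
    issue: Optional[str]


def assess(records: List[Record], reqs: List[Requirement]) -> List[Finding]:
    """Score every requirement and flag an issue when the raw score misses its threshold."""
    report = []
    for req in reqs:
        raw = req.metric(records, req.attribute)
        issue = None if raw >= req.threshold else f"{req.dimension} below {req.threshold}"
        report.append(Finding(req.dimension, req.attribute, round(raw, 3), issue))
    return report


def to_triples(report: List[Finding], dataset: str) -> List[Tuple[str, str, str]]:
    """Step (iii): flatten the report into subject-predicate-object triples.
    The predicate names are hypothetical, not the paper's ontology vocabulary."""
    triples = []
    for i, f in enumerate(report):
        node = f"{dataset}/finding{i}"
        triples += [
            (node, "assesses", f"{dataset}/{f.attribute}"),
            (node, "hasDimension", f.dimension),
            (node, "hasScore", str(f.score)),
        ]
        if f.issue:
            triples.append((node, "hasIssue", f.issue))
    return triples


# Usage with made-up readings: one missing value, one duplicate ID, one out-of-range value.
records = [
    {"asset_id": "A1", "temp_c": 71.5},
    {"asset_id": "A2", "temp_c": None},
    {"asset_id": "A2", "temp_c": 640.0},
    {"asset_id": "A3", "temp_c": 68.2},
]
requirements = [
    Requirement("completeness", "temp_c", completeness, 0.9),
    Requirement("uniqueness", "asset_id", uniqueness, 1.0),
    Requirement("validity", "temp_c", validity(lambda t: -40 <= t <= 150), 0.95),
]
report = assess(records, requirements)
for triple in to_triples(report, "ds"):
    print(triple)
```

Keeping the triples as plain tuples avoids any third-party dependency; loading them into an ontology editor such as Protégé would require a proper RDF/OWL serialisation, which this sketch does not attempt.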