Integrating an incident dataset with a question and answering language model to assist hazard identification: comparison of an extractive and generative model

Date published

2024

Free to read from

2025-02-27

Publisher

SAGE

Type

Article

ISSN

1748-006X

Citation

Ricketts J, Guo W, Pelham J, Barry D. (2024) Integrating an incident dataset with a question and answering language model to assist hazard identification: comparison of an extractive and generative model. Proceedings of the Institution of Mechanical Engineers, Part O: Journal of Risk and Reliability, Available online 29 September 2024

Abstract

Robust hazard identification (HAZID) relies upon extensive knowledge of the system being analysed, its technical aspects, and how it will be used operationally. Typically, this knowledge is held by human participants who, drawing on their own experience, can answer hazard-related questions in natural language. However, several factors threaten this, such as high staff turnover, a poor capability for learning from incidents, or even insufficient information technology resources. Incident databases, by contrast, hold vast amounts of hazard information that can be transformed into a source of knowledge. To mitigate the aforementioned issues, this paper presents a Question and Answering (Q&A) Bidirectional Encoder Representations from Transformers (BERT) language model trained on aviation incidents and a unique Q&A dataset. The model can extract answers to typical HAZID questions from factual incident reports. Alongside this extractive approach, the paper also explores the use of a generative Large Language Model combined with an incident dataset. Both models proved useful additions to HAZID activities based upon the Structured What If Technique (SWIFT), answering safety-themed questions from a retrieved context of incident reports that semantically matched the query. For the purposes of HAZID, the generative option is suggested as preferable owing to its ease of implementation, lower resource requirements and quality of responses. Additionally, it is shown that organisations can train and create their own custom models for HAZID purposes. Future work may consider applying models that can hypothesise scenarios based upon incident reports, building further understanding of the relationships between causes, hazards and consequences.
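The retrieval step described in the abstract, in which a question is matched against incident reports and the best matches are handed to a model as context, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The paper retrieves reports that semantically match the query, whereas this sketch swaps in a simple bag-of-words cosine similarity so that it runs on the Python standard library alone. The function names, the prompt wording and the three incident reports are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lower-case alphanumeric tokens; a crude stand-in for a model tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-count vectors (0.0 if either is empty).
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, reports: list[str], k: int = 2) -> list[str]:
    # Rank reports by bag-of-words similarity to the query and keep the top k.
    # (Assumption: the paper uses semantic matching; this is a simplified swap-in.)
    q = Counter(tokenize(query))
    ranked = sorted(reports, key=lambda r: cosine(q, Counter(tokenize(r))), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    # Hypothetical prompt wording that grounds a generative model in the retrieved reports.
    bullets = "\n".join(f"- {c}" for c in context)
    return (
        "Answer the HAZID question using only the incident reports below.\n"
        f"Reports:\n{bullets}\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical incident reports, for illustration only.
reports = [
    "Aircraft experienced a bird strike on approach, damaging the left engine.",
    "Ground vehicle collided with a parked aircraft during de-icing operations.",
    "Crew reported smoke in the cabin caused by a faulty galley oven.",
]
question = "What happens if a vehicle collides with an aircraft during de-icing?"
context = retrieve(question, reports, k=1)
prompt = build_prompt(question, context)
print(prompt)
```

With these inputs, the de-icing report shares the most words with the question and is ranked first. For the extractive route, the same retrieved context would instead be passed, together with the question, to a Q&A model that selects an answer span from it, rather than being wrapped in a prompt for a generative model.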

Keywords

Natural language processing, hazard analysis, information retrieval, incident reporting, safety analysis, 4005 Civil Engineering, 4015 Maritime Engineering, 40 Engineering, 4017 Mechanical Engineering

Rights

Attribution 4.0 International
