A hybrid machine learning and text-mining approach for the automated generation of early warnings in construction project management.

Date

2017-05

Type

Thesis

Abstract

This thesis develops an early warning methodology for predicting project failure by analysing unstructured project documentation. Project management documents contain subtle aspects that directly affect or contribute to various Key Performance Indicators (KPIs). Extracting actionable outcomes as early warnings (EWs) from management documents (e.g. minutes and project reports) to prevent or minimise discontinuities such as delays, shortages or amendments is a challenging process. These EWs, if modelled properly, may inform project planners and managers in advance of any impending risks. At present, there are no suitable machine learning techniques to benchmark the identification of such EWs in construction management documents. Extracting semantically crucial information is a challenging task, one that arises substantially as teams communicate via various project management documents. Recognising the hidden signals in these documents without a human interpreter is difficult because the language used is highly ambiguous; once identified, such signals can provide decision support that helps optimise a project’s goals by pre-emptively warning teams. To address this research gap, this work develops a “weak signal” classification methodology for management documents using a two-tier machine learning model. The first-tier model exploits the capability of a probabilistic Naïve Bayes classifier to extract early warnings from construction management text data. In the first step, a database corpus is prepared via a qualitative analysis of expert questionnaire responses that indicate relationships between various words and their mappings to EW classes. The second-tier model uses a Hybrid Naïve Bayes classifier which evaluates real-world construction management documents to identify the probabilistic relationship of the words used against certain EW classes and compares them with the KPIs.
The work also reports on a supervised K-Nearest-Neighbour (KNN) TF-IDF methodology to cluster and model various “weak signals” based on their impact on the KPIs. The Hybrid Naïve Bayes classifier was trained on a set of documents labelled according to keyword categories indicated by experts. The overall accuracy obtained via a 5-fold cross-validation test was 68.5%, which improved to 71.5% for a class-reduced (6-class) KNN analysis. The Weak Signal analysis of the same dataset generated an overall accuracy of 64%. The results were further analysed with Jack-Knife resampling and showed consistent accuracies of 65.15%, 71.42% and 64.1% respectively.
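
As a rough illustration of the word-to-class probability idea described in the abstract, the following is a minimal sketch of a plain multinomial Naïve Bayes text classifier with add-one smoothing. It is not the thesis's Hybrid Naïve Bayes model; the function names, the toy sentences and the two class labels ("delay", "shortage") are hypothetical and chosen only for demonstration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def train_naive_bayes(labelled_docs):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labelled_docs:
        tokens = tokenize(text)
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary


def predict(text, model):
    """Return the class with the highest log-posterior score."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, n_docs in class_counts.items():
        # Log prior: share of training documents carrying this label.
        log_prob = math.log(n_docs / total_docs)
        # Add-one (Laplace) smoothing over the shared vocabulary.
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for token in tokenize(text):
            if token in vocabulary:  # words never seen in training are skipped
                log_prob += math.log((word_counts[label][token] + 1) / denom)
        scores[label] = log_prob
    return max(scores, key=scores.get)


# Hypothetical toy corpus; real input would be labelled project documents.
TRAINING = [
    ("Concrete delivery delayed again, programme slipping", "delay"),
    ("Contractor reports delay to steel erection", "delay"),
    ("Shortage of skilled labour on site", "shortage"),
    ("Material shortage: rebar stock is low", "shortage"),
]

model = train_naive_bayes(TRAINING)
label = predict("Steel delivery delayed by supplier", model)
print(label)
```

Working in log space avoids numerical underflow when many word probabilities are multiplied, and the smoothing term keeps a word that is absent from one class from driving that class's probability to zero.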

Keywords

Risk management, unstructured data, construction project documents, text mining, early warning signal, data mining, artificial intelligence, machine learning, Naive Bayes, TF-IDF methodology, key performance indicators (KPI), K Nearest Neighbour

Rights

© Cranfield University, 2017. All rights reserved. No part of this publication may be reproduced without the written permission of the copyright holder.
