A hybrid machine learning and text-mining approach for the automated generation of early warnings in construction project management.

dc.contributor.advisor: Makatsoris, Charalampos (Harris)
dc.contributor.author: Alsubaey, Mohammed Hajer
dc.date.accessioned: 2022-05-03T18:06:10Z
dc.date.available: 2022-05-03T18:06:10Z
dc.date.issued: 2017-05
dc.description.abstract: The thesis develops an early warning methodology for predicting project failure by analysing unstructured project documentation. Project management documents contain subtle cues that directly affect, or contribute to, various Key Performance Indicators (KPIs). Extracting actionable outcomes as early warnings (EWs) from management documents (e.g. meeting minutes and project reports), so as to prevent or minimise discontinuities such as delays, shortages or amendments, is a challenging process. If modelled properly, these EWs can inform project planners and managers of impending risks in advance. At present, there are no suitable machine learning techniques to benchmark the identification of such EWs in construction management documents. Extracting semantically crucial information is challenging, all the more so because teams communicate through many different kinds of project management documents. Recognising hidden signals in these documents without a human interpreter is difficult because of the highly ambiguous language used; once recognised, however, such signals can provide decision support that optimises a project’s goals by warning teams pre-emptively. To address this research gap, this work develops a “weak signal” classification methodology for management documents based on a two-tier machine learning model. The first-tier model exploits a probabilistic Naïve Bayes classifier to extract early warnings from construction management text data. In the first step, a database corpus is prepared through a qualitative analysis of expert questionnaire responses that indicate relationships between words and their mappings to EW classes. The second-tier model uses a Hybrid Naïve Bayes classifier that evaluates real-world construction management documents to identify the probabilistic relationships between the words used and particular EW classes, and compares these with the KPIs.
The work also reports on a supervised K-Nearest-Neighbour (KNN) TF-IDF methodology to cluster and model various “weak signals” according to their impact on the KPIs. The Hybrid Naïve Bayes classifier was trained on a set of documents labelled according to keyword categories guided and indicated by experts. The overall accuracy obtained in a 5-fold cross-validation test was 68.5%, which improved to 71.5% for a class-reduced (6-class) KNN analysis. The weak signal analysis of the same dataset achieved an overall accuracy of 64%. The results were further analysed with jackknife resampling, which gave consistent accuracies of 65.15%, 71.42% and 64.1%, respectively. [en_UK]
dc.description.coursename: PhD in Manufacturing [en_UK]
dc.identifier.uri: http://dspace.lib.cranfield.ac.uk/handle/1826/17846
dc.language.iso: en [en_UK]
dc.rights: © Cranfield University, 2017. All rights reserved. No part of this publication may be reproduced without the written permission of the copyright holder.
dc.subject: Risk management [en_UK]
dc.subject: unstructured data [en_UK]
dc.subject: construction project documents [en_UK]
dc.subject: text mining [en_UK]
dc.subject: early warning signal [en_UK]
dc.subject: data mining [en_UK]
dc.subject: artificial intelligence [en_UK]
dc.subject: machine learning Naive Bayes [en_UK]
dc.subject: TF-IDF methodology [en_UK]
dc.subject: key performance indicators (KPI) [en_UK]
dc.subject: K Nearest Neighbour [en_UK]
dc.title: A hybrid machine learning and text-mining approach for the automated generation of early warnings in construction project management. [en_UK]
dc.type: Thesis [en_UK]
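The abstract's first-tier idea, scoring the words of a document against EW classes with a probabilistic Naïve Bayes model, can be illustrated with a minimal sketch. This is a plain multinomial Naïve Bayes with add-one (Laplace) smoothing written with Python's standard library only; it is not the thesis's Hybrid Naïve Bayes. The EW class names (delay, shortage, amendment) are borrowed from the discontinuities the abstract lists, while the training sentences, the `tokenize` helper and the `NaiveBayesEW` class are hypothetical stand-ins for the expert-derived corpus and the author's implementation.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case text and split it into alphabetic word tokens (hypothetical tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesEW:
    """Hypothetical multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)          # number of documents per EW class
        self.word_counts = defaultdict(Counter)      # word frequencies per EW class
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.n_docs = len(labels)
        return self

    def log_scores(self, doc):
        """Unnormalised log P(class | doc) for every EW class."""
        tokens = [t for t in tokenize(doc) if t in self.vocab]  # drop unseen words
        scores = {}
        for label, n_label in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_label / self.n_docs)            # log prior
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)   # smoothed log likelihood
            scores[label] = score
        return scores

    def predict(self, doc):
        """Return the EW class with the highest posterior score."""
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)


# Hypothetical toy sentences standing in for the expert-derived training corpus.
train_docs = [
    "supplier delivery late steel not on site",
    "programme slipped two weeks due to late drawings",
    "shortage of labour on site this week",
    "concrete shortage reported by supplier",
    "client requested design change to facade",
    "variation order issued for revised layout",
]
train_labels = ["delay", "delay", "shortage", "shortage", "amendment", "amendment"]

model = NaiveBayesEW().fit(train_docs, train_labels)
print(model.predict("drawings late again, programme at risk"))
```

Working in log space avoids numerical underflow when long documents multiply many small word probabilities together; words never seen in training are simply skipped rather than assigned a probability.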
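The supervised KNN TF-IDF step mentioned in the abstract can be sketched in the same spirit: each document becomes an L2-normalised TF-IDF vector, and a new document is assigned the majority EW class among its k most cosine-similar training documents. The smoothed IDF formula (the same form scikit-learn uses with `smooth_idf=True`), the default k=3, the helper names and the toy corpus are all assumptions for illustration, not the thesis's configuration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def fit_idf(docs):
    """Smoothed IDF (an assumed variant): log((1 + N) / (1 + df)) + 1."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    n = len(docs)
    return {w: math.log((1 + n) / (1 + d)) + 1 for w, d in df.items()}


def tfidf(doc, idf):
    """Sparse TF-IDF vector (raw count x IDF), L2-normalised; unseen words ignored."""
    tf = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else vec


def cosine(a, b):
    """Cosine similarity of two L2-normalised sparse vectors (their dot product)."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def knn_predict(query, train_vecs, train_labels, idf, k=3):
    """Majority EW class among the k most similar training documents.
    Ties in the vote go to the class whose member ranked nearest."""
    q = tfidf(query, idf)
    ranked = sorted(range(len(train_vecs)),
                    key=lambda i: cosine(q, train_vecs[i]), reverse=True)
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy corpus labelled with illustrative EW classes.
train_docs = [
    "supplier delivery late steel not on site",
    "programme slipped two weeks due to late drawings",
    "shortage of labour on site this week",
    "concrete shortage reported by supplier",
    "client requested design change to facade",
    "variation order issued for revised layout",
]
train_labels = ["delay", "delay", "shortage", "shortage", "amendment", "amendment"]

idf = fit_idf(train_docs)
train_vecs = [tfidf(d, idf) for d in train_docs]
print(knn_predict("late drawings delayed the programme", train_vecs, train_labels, idf))
```

Because every vector is normalised once up front, cosine similarity reduces to a sparse dot product, which keeps the neighbour search cheap for short management documents.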
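The abstract reports accuracies from 5-fold cross-validation and from jackknife resampling. The harness below is a hedged sketch of how such figures are commonly computed: every sample is predicted exactly once by a model trained on the remaining folds, and the jackknife is treated here as leave-one-out re-testing (k equal to the number of samples), which is an assumption about the thesis's procedure. The function names, the toy 1-nearest-neighbour classifier and the numeric data are hypothetical and exist only to exercise the harness; folds are contiguous and unshuffled for determinism, whereas a real evaluation would normally shuffle or stratify them.

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_val_accuracy(X, y, fit_predict, k=5):
    """Pooled accuracy: each sample is predicted once by a model trained on the other folds."""
    correct = 0
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        preds = fit_predict([X[i] for i in train_idx], [y[i] for i in train_idx],
                            [X[i] for i in test_idx])
        correct += sum(p == y[i] for p, i in zip(preds, test_idx))
    return correct / len(X)


def jackknife_accuracy(X, y, fit_predict):
    """Leave-one-out (jackknife-style) accuracy: k-fold with k equal to n."""
    return cross_val_accuracy(X, y, fit_predict, k=len(X))


def nearest_neighbour(train_X, train_y, test_X):
    """Hypothetical toy 1-NN on scalar features, used only to exercise the harness."""
    return [train_y[min(range(len(train_X)), key=lambda j: abs(train_X[j] - x))]
            for x in test_X]


# Hypothetical data; the last point is a deliberately hard "a" sitting near the "b" cluster.
X = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9, 9.1, 8.8, 9.0, 4.0]
y = ["a", "a", "a", "b", "b", "b", "c", "c", "c", "a"]

print(cross_val_accuracy(X, y, nearest_neighbour, k=5))  # 0.9
print(jackknife_accuracy(X, y, nearest_neighbour))       # 0.9
```

Passing the classifier in as a `fit_predict` callable keeps the evaluation independent of the model, so the same harness could wrap either of the text classifiers discussed in the abstract.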

Files

Original bundle: Alsubeay_M_2017.pdf (2.61 MB, Adobe Portable Document Format)
License bundle: license.txt (1.63 KB, Item-specific license agreed upon to submission)