Understanding Insider Threats Using Natural Language Processing




Insider threats are security incidents committed not by outsiders, such as malicious hackers or advanced persistent threat groups, but by an organisation's employees or other trusted individuals. These attacks are often more impactful than incidents committed by outsiders. Insiders may have valid security credentials, knowledge of the organisation they work for (such as who its competitors are), and knowledge of the security controls in place and potentially how to bypass them. This activity can be unintentional, such as an employee leaving a laptop on public transport, or malicious, when an insider purposefully chooses to attack for some gain, such as selling IP to a competitor. When an outsider chooses to attack, they may leave digital breadcrumbs as they perform various stages of the cyber kill-chain. These breadcrumbs can allow organisations to detect and respond to an incident by flagging suspicious behaviour or access. By comparison, an insider may be able to continue their attack for years before being caught. Insider threat activity can therefore be considered co-spatial and co-temporal with legitimate activity; an insider conducts their attack during their work or very soon after leaving their job. There are three fundamental approaches to controlling the risk of malicious insider threats: organisational, technical, and psychological. More recently, insider threat models have attempted to combine all these factors into a single framework or model. However, one issue with these models is their static nature; models cannot adapt as insider threat changes. For example, during the COVID-19 pandemic, many organisations had to support remote working, increasing the risk of attacks. This work attempts to address this flaw directly.
Instead of attempting to supplant existing practices in these three domains, this work supports them, providing new techniques for exploring an insider threat attack so that it can be better understood through the lens of strategic and tactical decision making. A dynamic, custom insider threat model can be constructed by leveraging natural language processing techniques, a type of machine learning applied to text, and a large corpus (body of documents) of news articles describing insider threat incidents. This model can then be applied to a new, previously unseen corpus of witness reports to offer an overview of the attack. The core technique this work uses is topic modelling, which uses word association to identify key themes across a document, similar to grounded theory approaches. By identifying themes across many different insider threat incidents, the core attributes of insider threat are recognised, such as methodologies, motivations, information about the insider's role in an organisation, or the weakness they exploited. These topics can be further enriched by identifying temporal, causal and narrative clues to place events on a graph and create a timeline or causal chain. The final output of this process is a collection of visualisations of the incident; these visualisations aim to support the investigator as they ask critical questions about an incident, such as "What was the motivation of the insider?", "What assets did they target and how?", "Were there any security controls in place?" and "Did they bypass those?", allowing for the full exploration of the attack. Informed organisations can make changes using the answers to these questions combined with existing controls, policies, and procedures. The work presented in this thesis has many implications for both insider threat specifically and the broader domains of sociology and cyber security.
Primarily, this work introduces a new approach to incident response, supporting the reflection stage of incident response. This work represents a proof of concept for NLP to be used in this way; given its technical nature, it could be developed into an implementable and deployable piece of software, generating further impact. While some training would be required, this could offer a new tool for handling insider threat within an organisation. Aside from this direct impact in the insider threat domain, the methods developed and designed during this work will have a broader impact on cyber security, mainly due to their interdisciplinary nature within social science. Witness reports or organic narratives can be mapped automatically to an existing framework, rather than asking a witness to adapt their narrative to a framework directly. Reports can then be collected on a large scale and analysed. These techniques provide a holistic view of an attack, considering many aspects of an insider threat attack by using reports already collected after an incident to create a better understanding of insider threat, which in turn leads to more techniques for prevention and detection.
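
To illustrate the topic modelling step described in the abstract, the following is a minimal sketch only. The abstract does not state which topic modelling algorithm the thesis uses; latent Dirichlet allocation (LDA) fitted by collapsed Gibbs sampling is assumed here. The three short documents, the stopword list, and all function names and parameter values are hypothetical placeholders invented for the example, not data or code from the thesis.

```python
import random
from collections import Counter

# Hypothetical stopword list and toy corpus (illustrative only).
STOPWORDS = {"the", "a", "to", "of", "and", "an", "on", "their", "was"}
RAW_DOCS = [
    "the employee copied design files to a usb drive and sold them to a competitor",
    "an administrator used valid credentials to bypass the access controls",
    "the contractor sold customer files to a competitor after leaving their job",
]

def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [w for w in text.lower().split() if w not in STOPWORDS]

def lda_gibbs(docs, num_topics=2, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; returns per-document and per-topic counts."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]          # tokens per topic in each doc
    topic_word = [Counter() for _ in range(num_topics)]   # word counts per topic
    topic_total = [0] * num_topics                        # tokens per topic overall
    assignments = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic in proportion to the conditional posterior.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word

def top_words(topic_word, n=5):
    """Return the n most frequent words with a non-zero count in each topic."""
    return [[w for w, c in tw.most_common(n) if c > 0] for tw in topic_word]

docs = [tokenize(text) for text in RAW_DOCS]
doc_topic, topic_word = lda_gibbs(docs)
for k, words in enumerate(top_words(topic_word)):
    print(f"topic {k}: {', '.join(words)}")
```

With a realistic corpus, the highest-count words in each topic would be read as candidate themes (for example, a method or a motivation), and the per-document topic counts in `doc_topic` indicate which themes dominate a given report. A corpus this small will not yield meaningful topics; it only shows the mechanics.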


© Cranfield University 2021. All rights reserved. No part of this publication may be reproduced without the written permission of the copyright owner.



Cyber security, Insider threat, Natural language processing, Organic narratives, Topic modelling



© Cranfield University, 2015. All rights reserved. No part of this publication may be reproduced without the written permission of the copyright holder.