Web robot detection using supervised learning algorithms

Date

2020-06

Publisher

Cranfield University

Department

SATM

Type

Thesis or dissertation

Abstract

Web robots, or Web crawlers, have become the main source of Web traffic. Some bots, such as search engine crawlers, are benign. Others can carry out DDoS attacks and pose a serious threat to websites. This project aims to develop an offline system that can effectively detect malicious Web robots. Such a system would help to clean network traffic and would also improve the network security of IoT systems and services. A comprehensive literature review covering 2010 to 2019 was conducted to identify the research gap. The key contributions of the research are: 1) a systematic methodology for the Web robot detection problem, based on a log file from an industrial company; 2) a feature engineering approach that overcomes the curse of dimensionality; and 3) a substantial improvement in the accuracy of offline Web robot detection, achieved through a holistic study of three types of machine learning techniques on real industrial data. Three classifiers were developed in Python on the TensorFlow 2.0 platform to distinguish Web robots from human visitors. They are based on a Keras Sequential model, a random forest and an SVM. Experimental results suggested that the random forest obtained the best performance in both accuracy and training time...[cont.]
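This record does not describe which features were extracted from the industrial log file. The following Python sketch is for illustration only. It shows one common way to turn raw access-log lines into a compact, fixed-length feature vector per session, which is the kind of input that a random forest, an SVM or a Keras Sequential model would consume. It makes three assumptions. First, the logs use the Apache "combined" format. Second, requests are grouped by (IP address, user agent) with no inactivity timeout. Third, the features are hypothetical (`robots_txt`, `head_ratio`, `error_ratio`, `image_ratio`, `empty_referrer_ratio`). They are drawn from general robot-detection practice and are not the thesis's actual feature set.

```python
import re
from collections import defaultdict

# Apache/Nginx "combined" log format. ASSUMPTION: the industrial log format
# used in the thesis is not described in this record.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".ico", ".svg")

# Hypothetical per-session features, chosen for illustration only.
FEATURE_NAMES = (
    "n_requests",
    "robots_txt",
    "head_ratio",
    "error_ratio",
    "image_ratio",
    "empty_referrer_ratio",
)


def session_features(lines):
    """Group requests by (IP, user agent) and compute one feature dict per group."""
    sessions = defaultdict(list)
    for line in lines:
        match = LOG_RE.match(line)
        if match is None:
            continue  # skip malformed or non-matching lines
        sessions[(match["ip"], match["agent"])].append(match)

    features = {}
    for key, reqs in sessions.items():
        n = len(reqs)
        features[key] = {
            "n_requests": n,
            # Crawlers that honour the robots exclusion protocol fetch /robots.txt.
            "robots_txt": int(any(r["path"] == "/robots.txt" for r in reqs)),
            "head_ratio": sum(r["method"] == "HEAD" for r in reqs) / n,
            # Share of 4xx (client error) responses.
            "error_ratio": sum(r["status"].startswith("4") for r in reqs) / n,
            "image_ratio": sum(r["path"].lower().endswith(IMAGE_EXTENSIONS) for r in reqs) / n,
            "empty_referrer_ratio": sum(r["referrer"] in ("", "-") for r in reqs) / n,
        }
    return features


def to_matrix(features):
    """Return sorted session keys and a feature matrix in FEATURE_NAMES order."""
    keys = sorted(features)
    return keys, [[features[k][name] for name in FEATURE_NAMES] for k in keys]


if __name__ == "__main__":
    sample = [
        '66.249.66.1 - - [10/Jun/2020:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 68 "-" "Googlebot/2.1"',
        '66.249.66.1 - - [10/Jun/2020:10:00:01 +0000] "HEAD /page HTTP/1.1" 404 0 "-" "Googlebot/2.1"',
        '10.0.0.5 - - [10/Jun/2020:10:00:02 +0000] "GET /index.html HTTP/1.1" 200 512 "https://example.com/" "Mozilla/5.0"',
        '10.0.0.5 - - [10/Jun/2020:10:00:03 +0000] "GET /logo.png HTTP/1.1" 200 900 "https://example.com/index.html" "Mozilla/5.0"',
    ]
    keys, X = to_matrix(session_features(sample))
    for key, row in zip(keys, X):
        print(key, row)
    # ('10.0.0.5', 'Mozilla/5.0') [2, 0, 0.0, 0.0, 0.5, 0.0]
    # ('66.249.66.1', 'Googlebot/2.1') [2, 1, 0.5, 0.5, 0.0, 1.0]
```

Each row of the resulting matrix could be paired with a robot/human label and passed to any of the three classifier types named above. A production pipeline would usually also split a visitor's requests into separate sessions after a period of inactivity (30 minutes is a common choice). It would also select or reduce features to keep the dimensionality manageable.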

Keywords

Web robot, Web crawler, Random forest, Sequential model, SVM, Feature importance, TensorFlow 2.0

Rights

© Cranfield University, 2020. All rights reserved. No part of this publication may be reproduced without the written permission of the copyright holder.