Web robot detection using supervised learning algorithms
Abstract
Web robots, or Web crawlers, have become the main source of Web traffic. While some bots serve legitimate purposes, such as search engine indexing, others can mount DDoS attacks and pose a serious threat to websites. This project aims to develop an offline system that can effectively detect malicious web robots, which benefits both network traffic cleaning and the security of IoT systems and services. A comprehensive literature review covering the years 2010-2019 was conducted to identify the research gap. The key contributions of the research are: 1) it provided a systematic methodology for addressing the web robot detection problem based on log files from an industrial company; 2) it provided a feature engineering approach that overcomes the curse of dimensionality; 3) it made substantial progress in the accuracy of offline web robot detection through a holistic study of three types of machine learning techniques on real industrial data. Three algorithms, based on a Keras sequential model, random forest, and SVM, were implemented in Python on the TensorFlow 2.0 platform to distinguish web robots from human visitors. Experimental results suggested that random forest obtained the best performance in accuracy and training time...[cont.]
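As a minimal illustrative sketch (not the project's actual pipeline or feature set), the snippet below shows how two of the mentioned classifier families, a random forest and a small Keras sequential model on TensorFlow 2.x, could be trained to separate robot from human sessions. The feature columns and labels are placeholders standing in for session-level attributes derived from server log files.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import tensorflow as tf

# Hypothetical session-level features (placeholders), e.g. request rate,
# fraction of HTTP error responses, fraction of image requests, robots.txt hit.
rng = np.random.default_rng(0)
X = rng.random((1000, 4))
y = rng.integers(0, 2, size=1000)  # placeholder labels: 1 = robot, 0 = human

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Random forest classifier
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)
print("Random forest accuracy:", accuracy_score(y_test, rf.predict(X_test)))

# Small Keras sequential model (TensorFlow 2.x)
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(X.shape[1],)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X_train, y_train, epochs=5, batch_size=32, verbose=0)
_, nn_accuracy = model.evaluate(X_test, y_test, verbose=0)
print("Keras sequential model accuracy:", nn_accuracy)

With real log-derived features, the same structure allows a like-for-like comparison of accuracy and training time across classifiers, which is the kind of comparison the abstract reports.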