Modelling sign language with encoder-only transformers and human pose estimation keypoint data

dc.contributor.author: Woods, Luke T.
dc.contributor.author: Rana, Zeeshan A.
dc.date.accessioned: 2023-05-05T08:59:07Z
dc.date.available: 2023-05-05T08:59:07Z
dc.date.issued: 2023-05-01
dc.description.abstract: We present a study on modelling American Sign Language (ASL) with encoder-only transformers and human pose estimation keypoint data. Using an enhanced version of the publicly available Word-level ASL (WLASL) dataset, and a novel normalisation technique based on signer body size, we show the impact model architecture has on accurately classifying sets of 10, 50, 100, and 300 isolated, dynamic signs using two-dimensional keypoint coordinates only. We demonstrate the importance of running and reporting results from repeated experiments to describe and evaluate model performance. We include descriptions of the algorithms used to normalise the data and generate the train, validation, and test data splits. We report top-1, top-5, and top-10 accuracy results, evaluated with two separate model checkpoint metrics based on validation accuracy and loss. We find models with fewer than 100k learnable parameters can achieve high accuracy on reduced vocabulary datasets, paving the way for lightweight consumer hardware to perform tasks that are traditionally resource-intensive, requiring expensive, high-end equipment. We achieve top-1, top-5, and top-10 accuracies of 97%, 100%, and 100%, respectively, on a vocabulary size of 10 signs; 87%, 97%, and 98% on 50 signs; 83%, 96%, and 97% on 100 signs; and 71%, 90%, and 94% on 300 signs, thereby setting a new benchmark for this task.
dc.identifier.citation: Woods LT, Rana ZA. (2023) Modelling sign language with encoder-only transformers and human pose estimation keypoint data. Mathematics, Volume 11, Issue 9, May 2023, Article number 2129
dc.identifier.issn: 2227-7390
dc.identifier.uri: https://doi.org/10.3390/math11092129
dc.identifier.uri: https://dspace.lib.cranfield.ac.uk/handle/1826/19615
dc.language.iso: en
dc.publisher: MDPI
dc.rights: Attribution 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject: sign language recognition
dc.subject: human pose estimation
dc.subject: classification
dc.subject: computer vision
dc.subject: deep learning
dc.subject: machine learning
dc.subject: supervised learning
dc.title: Modelling sign language with encoder-only transformers and human pose estimation keypoint data
dc.type: Article
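
The abstract reports top-1, top-5, and top-10 accuracy. As a reading aid, the following is a minimal sketch of how top-k accuracy is commonly computed from per-class scores. It is not taken from the article: the function name `top_k_accuracy` and the example scores and labels are hypothetical, chosen only to illustrate the metric.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]], labels: Sequence[int], k: int
) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("at least one sample is required")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Hypothetical scores for 4 samples over 5 sign classes (illustrative values only).
example_scores = [
    [0.70, 0.10, 0.10, 0.05, 0.05],  # true class 0 is ranked 1st
    [0.20, 0.50, 0.15, 0.10, 0.05],  # true class 0 is ranked 2nd
    [0.05, 0.10, 0.15, 0.30, 0.40],  # true class 2 is ranked 3rd
    [0.40, 0.30, 0.15, 0.10, 0.05],  # true class 4 is ranked 5th
]
example_labels = [0, 0, 2, 4]

for k in (1, 3, 5):
    print(f"top-{k} accuracy: {top_k_accuracy(example_scores, example_labels, k):.2f}")
```

Top-k accuracy can only stay the same or rise as k grows, which matches the pattern of the figures quoted in the abstract for each vocabulary size.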

Files

Original bundle
Name: sign_language_with_encoder-only_transformers-2023.pdf
Size: 1.55 MB
Format: Adobe Portable Document Format