CERES :: Browsing by Author "Bessant, Conrad"

Now showing 1 - 20 of 20

Open Access
Analysis of MRSA Staphylococcal Chromosome cassette mecA status from next generation sequence data
(Cranfield University, 2012-03) Goulden, Matthew G; Larcombe, Lee D; Clark, Taane; Bessant, Conrad
Sequencing libraries prepared on an Illumina next-generation sequencing (NGS) platform for 10 isolates of Staphylococcus aureus were analysed. After extensive pre-processing to address library quality issues, the status of the staphylococcal cassette chromosome mec (SCCmec), and of its mecA gene conferring resistance to meticillin, was determined for each isolate. All mecA-positive isolates encoded canonical mecA; none encoded the variant mecA identified in strain LGA251.
Open Access
Bioinformatics solutions for confident identification and targeted quantification of proteins using tandem mass spectrometry
(Cranfield University, 2009-10) Cham, Jennifer A.; Bessant, Conrad; Regan, Stephen
Proteins are the structural supports, signal messengers and molecular workhorses that underpin living processes in every cell. Understanding when and where proteins are expressed, and their structures and functions, is the realm of proteomics. Mass spectrometry (MS) is a powerful method for identifying and quantifying proteins; however, it produces very large datasets, so researchers rely on computational approaches to transform raw data into protein information. This project develops new bioinformatics solutions to support the next generation of proteomic MS research. Part I introduces the state of the art in proteomic bioinformatics in industry and academia. The business history and funding mechanisms of the field are examined, both to fill a notable gap in the management research literature and to explain events at the sponsor, GlaxoSmithKline. This analysis reveals that public funding of proteomic science has yet to come to fruition and that only high-tech, niche bioinformatics businesses can succeed in the current climate. Next, a comprehensive review of repositories for proteomic MS data is performed, both to locate sources of datasets for the research in this project and to provide a novel summary for the community. Part II addresses the issue of false positive protein identifications produced by automated analysis with a proteomics pipeline. The work shows that selecting a suitable decoy database design yields a statistically significant improvement in identification accuracy. Part III describes the development of computational resources for selecting multiple reaction monitoring (MRM) assays for quantifying proteins using MS. A tool for transition design, MRMaid (pronounced 'mermaid'), and a database of previously published transitions, MRMaid-DB, are developed; these save practitioners time and leverage existing resources for superior transition selection.
By improving the quality of identifications, and providing support for quantitative approaches, this project brings the field a small step closer to achieving the goal of systems biology.
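The decoy database approach in Part II rests on a simple estimate: hits against decoy sequences approximate the number of false target hits above a score threshold. The sketch below illustrates that generic target-decoy calculation. It assumes a concatenated target-decoy search in which each peptide-spectrum match is reduced to a hypothetical (score, is_decoy) pair; the function names and toy scores are illustrative, and this is not the thesis's pipeline or its decoy design.

```python
def decoy_fdr(matches, threshold):
    """Estimate the FDR at a score threshold as decoy hits / target hits.

    `matches` is a list of (score, is_decoy) pairs from a concatenated
    target-decoy search (hypothetical input format).
    """
    targets = sum(1 for score, is_decoy in matches if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in matches if score >= threshold and is_decoy)
    if targets == 0:
        # No target hits: FDR is zero only if there are no decoy hits either.
        return 0.0 if decoys == 0 else float("inf")
    return decoys / targets


def score_threshold_for_fdr(matches, max_fdr):
    """Return the lowest observed score whose estimated FDR is <= max_fdr, or None."""
    for threshold in sorted({score for score, _ in matches}):
        if decoy_fdr(matches, threshold) <= max_fdr:
            return threshold
    return None


# Toy example: scores with a decoy flag.
matches = [(95, False), (90, False), (88, False), (80, True),
           (75, False), (70, True), (60, True)]
print(score_threshold_for_fdr(matches, 0.3))  # 75
```

Choosing the lowest qualifying threshold keeps as many identifications as possible at the requested error rate; stricter decoy designs change how well the decoy count tracks the true false positives.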
Open Access
Building bioinformatics solutions for biomarker identification
(Cranfield University, 2008-08) Oakley, Darren; Bessant, Conrad
This thesis describes the design, implementation and application of bioinformatics systems to aid work in biomarker discovery and diagnostic test development. The aim was to develop a flexible data storage and analysis platform capable of housing and working with data from a variety of modern biomarker analysis techniques. To achieve this aim, several tools were developed: a database schema, drawing on ideas from systems biology, designed to be flexible enough to house information about experiments on targets such as genes, proteins and metabolites; an API allowing easy programmatic interaction with the database; and multivariate data analysis routines for investigating data imported into the database. Together, this toolset was named XPA (Cross Platform Analysis). XPA was tested by using it to house and analyse data from two medical studies: one using quantitative PCR (qPCR) to observe gene expression changes in prostate cancer, and the other using surface-enhanced laser desorption/ionisation mass spectrometry (SELDI MS) to generate protein profiles from patients with pre-eclampsia. In both studies, XPA was used to develop multivariate classification models using partial least squares discriminant analysis (PLS-DA) and support vector machines (SVMs), with the aim of evaluating the acquired data for potential diagnostic use. The results showed the benefit of a tool such as XPA to the field of biomarker discovery.
Open Access
Data analysis tools for safe drinking water production
(Cranfield University, Cranfield University at Silsoe, 2006-11-08T17:00:01Z) Cauchi, Michael; Setford, S.; Bessant, Conrad
Providing safe, high-quality drinking water is essential for a high quality of life. However, water resources in Europe are threatened by various sources of contamination. This has led to the development of concepts and technologies to underpin the provision of safe, high-quality drinking water, and to the formation of the Artificial Recharge Demonstration project (ARTDEMO). Within ARTDEMO, the overall aim of this thesis was to develop a real-time automated water monitoring system capable of using data from various complementary sources to determine the amounts of inorganic and organic pollutants. The application of multivariate calibration to differential pulse anodic stripping voltammograms and to fluorescence spectra (emission and excitation-emission matrix) is presented. Quantitative determination was achieved for cadmium, lead and copper acquired on carbon-ink screen-printed electrodes, for arsenic and mercury acquired on gold-ink screen-printed electrodes, and for anthracene, phenanthrene and naphthalene. The statistically inspired modification of partial least squares (SIMPLS) algorithm was shown to be the better modelling tool in terms of root mean square error of prediction (RMSEP), in conjunction with data pre-treatment involving range scaling, filtering and weighting of variables. The percentage recoveries of cadmium, lead and copper in a certified reference material determined by graphite furnace atomic absorption spectrometry (GF-AAS) and by multivariate calibration are in good agreement. The development of a prototype application on a personal digital assistant (PDA) device is described. This makes at-line analysis possible at potential contamination sites where an immediate response is required, providing quantitative screening of target metal ions.
The application imports the acquired voltammograms, standardises them against the laboratory-acquired voltammograms (using piecewise direct standardisation), and predicts the concentrations of the target metal ions using previously trained SIMPLS models. This work represents significant progress in the development of analytical techniques for water quality determination, in line with the ARTDEMO project's aim of maintaining a high quality of drinking water.
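RMSEP, the criterion used above to compare calibration models, is the root mean square of the differences between reference and predicted concentrations over an independent set of test samples: RMSEP = sqrt(Σ(y_i - ŷ_i)² / n). A minimal sketch of that formula follows; the function name and the example concentrations are illustrative only and do not come from the thesis.

```python
import math


def rmsep(reference, predicted):
    """Root mean square error of prediction over an independent test set."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(y - y_hat) ** 2 for y, y_hat in zip(reference, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Illustrative reference vs predicted concentrations (made-up values).
print(round(rmsep([10.0, 20.0, 30.0], [11.0, 19.0, 33.0]), 3))  # 1.915
```

Because the errors are squared before averaging, a single large prediction error (here the 3-unit miss) dominates the result, which is why RMSEP is sensitive to outlying test samples.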
Open Access
Depth or breadth: towards a contingency model of innovation strategy in the automotive sector
(Cranfield University, 2010-09) Rosenberg, Mike; Bessant, Conrad
The thesis explores the strategic choices made by automotive manufacturers in developing and deploying technology that is discontinuous and potentially disruptive. It studies the deployment of seat belts, airbags, hybrid vehicles and fuel cell electric vehicles, drawing on product deployment histories, patents and the opinions of industry experts. The thesis identifies two fundamental strategies, called depth and breadth, and shows how the different manufacturers' approaches to these four technologies are arrayed along a continuum between these two choices. The thesis contributes to the theory of the technology-based firm, which focuses on the management of scale, scope, time and space, by operationalising the idea of scope in terms of depth and breadth. It also explicitly links the theory to the literature on coevolution and dynamic capabilities, and adds to the understanding of the co-evolutionary dynamics at play in the automotive industry by applying the idea of technological pathways to the technologies under study. This discussion yields some potentially useful insights for practitioners. The thesis also reviews the literature on potential changes to automotive powertrain technology and extends it, using the theory of the technology-based firm, the environmental literature and the non-market strategy lens to develop an unbiased view of the state of development of fuel cell and hybrid technology. Finally, the thesis provides a rigorous review of the use of patents in management science over the last 50 years. In line with new ideas about the importance of visualisation techniques in data-intensive scientific enquiry, it also makes one of the first attempts in the academic literature to study patents using a patent mapping tool to make sense of the large amounts of available data.
Open Access
Design of a field-portable low power personal data logger - A hardware perspective
(Cranfield University, 2008-01) Pitts, David G.; Bessant, Conrad
There are a vast number of field-portable data loggers currently on the market. They differ greatly in capability and complexity and are in many cases application- or function-specific. A survey was undertaken to identify market trends and future developments, system hardware specifications and the technologies employed. Comparing system specifications revealed a strong correlation between system performance and power consumption: high-performance systems tend to be power hungry, and are typically larger and heavier than their lower-performance counterparts. The aim of this project was to design the core of an advanced, flexible, low-power portable data acquisition system, a 'personal' data logger (PDL), suitable for medical or athletic performance monitoring. The pocket-sized target system should offer high performance (sampling at rates from once a day up to 20,000 samples per second) with low-power operation, and should be able to measure both analogue and digital signals. The data must be stored on a high-capacity non-volatile memory card, with USB and RS-232 ports provided for data upload and system configuration. With the design specification defined, low-power design techniques and the various battery and power supply options were investigated. A survey of system components was carried out, and suitable low-power parts were identified and selected for the design. After the project schematics had been checked, the circuit board was designed, manufactured and carefully assembled, ready for function and performance testing. The test results indicated that the project met the design specification, demonstrating its potential for use in a small portable personal data logger. Further work would be required to refine the power supply and power management systems; to add an interface board housing a real-time clock, analogue signal conditioning, and input and output connectors; and to develop embedded system software.
Open Access
Development and evaluation of statistical approaches in proteomic biomarker discovery
(Cranfield University, 2011-11) Patel, Amit; Bessant, Conrad
A biomarker is a characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes or pharmacological responses to a therapeutic intervention. The aim of this project was to identify potential biomarker candidates from experimental data comparing samples that display divergent physiological traits. Chapter 1 introduces the topic and the aims of the project. The primary aim was to identify the best statistical analysis methods, and data pre- and post-treatment options, for identifying potential biomarkers in proteomic datasets. The product of this work was a statistical analysis pipeline for identifying potential biomarker candidates from proteomic experimental data. Proteomic data often suffer from missing values, so methods for dealing with these were also evaluated. Chapter 2 outlines the datasets used and presents an overview of the “Biomarker Hunter” pipeline software created in this project. Chapter 3 evaluates appropriate univariate statistical methods for biomarker identification and presents the results of applying them. Chapter 4 evaluates options for data pre- and post-processing. Chapter 5 proposes the use of missing value imputation and offers a novel clustering algorithm for dealing with missing values. The software pipeline also offers multivariate statistical methods, which are evaluated in Chapter 6. Chapter 7 provides business context for both biomarker discovery and the statistical analysis software available for proteomic biomarker discovery. As well as providing a software pipeline for biomarker identification, the project aimed to propose a strategy for the statistical analysis of proteomic experimental data.
Strong conclusions about the ideal statistical approach could only be drawn if a list of actual, validated biomarkers were available. No such list was available, so a strategy was instead proposed based on the published literature and the author's interpretation of the results of this study. For data pre-processing, this strategy involves not averaging technical replicates and using total abundance normalisation to reduce technical variation. A novel clustering algorithm is suggested to reduce the number of missing values prior to applying existing methods of missing value imputation. Following statistical analysis, multiple testing correction methods should be applied to reduce the number of false positives.
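Two steps of the suggested strategy are easy to make concrete. The sketch below assumes the simplest form of total abundance normalisation (dividing each protein's abundance by the sample total) and uses the Benjamini-Hochberg procedure as one widely used multiple testing correction. The abstract does not state which correction or normalisation variant the pipeline uses, and the function names and protein identifiers are hypothetical.

```python
def total_abundance_normalise(sample):
    """Divide each protein's abundance by the sample's total abundance."""
    total = sum(sample.values())
    if total == 0:
        raise ValueError("sample has zero total abundance")
    return {protein: value / total for protein, value in sample.items()}


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values are
    # non-decreasing in the raw p-values (and capped at 1).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(total_abundance_normalise({"P1": 2.0, "P2": 6.0}))  # {'P1': 0.25, 'P2': 0.75}
print([round(q, 4) for q in benjamini_hochberg([0.01, 0.04, 0.03, 0.5])])
# [0.04, 0.0533, 0.0533, 0.5]
```

Benjamini-Hochberg controls the false discovery rate rather than the family-wise error rate, so it is less conservative than a Bonferroni correction when many proteins are tested at once.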
Open Access
Development and optimisation of chemometric techniques for the evaluation of meat freshness
(Cranfield University, 2013) Chatzimichali, Eleni Anthippi; Bessant, Conrad
Muscle foods such as meat, fish and poultry are an integral part of the human diet. Over time, such foods succumb to spoilage resulting from various intrinsic and extrinsic factors, the most significant of which is microbial activity. Spoilage changes the organoleptic properties of meat, rendering it unacceptable to the consumer, and may ultimately result in the food becoming toxic. Spoilage is therefore of major commercial and public health interest. This thesis describes the development and application of a novel suite of software tools designed to support new instrumental approaches for the accurate, rapid and inexpensive evaluation of meat freshness. A pipeline was built for the analysis of highly heterogeneous data obtained by a diverse range of high-throughput techniques across four three-class case studies. As a first step, PCA was applied for dimensionality reduction, feature extraction and exploratory analysis. PLS-DA and SVMs were employed as classifiers, and classification ensembles were implemented as a means of improving classification accuracy. Rigorous validation and evaluation methods based on bootstrapping and permutation testing were applied to ensure that the performance metrics are representative of real-world application, and to ascertain the statistical significance of the results. This was made possible by the development of an advanced optimisation approach, which reduced the computational demands of SVM tuning by up to approximately 90-fold. The functionality of the pipeline was further enhanced by exploiting GPA and CPCA as data fusion techniques, to evaluate whether better classification accuracy is achieved with integrated rather than standalone datasets. SVM ensembles proved to be the most powerful and accurate classification method, since they produced consistently higher prediction rates ( ) than PLS-DA. Among the analytical techniques, HPLC was established as the most diagnostic method for the assessment of meat freshness, with a of 80%.
Of the two data fusion techniques, CPCA outperformed GPA. However, CPCA exceeded standalone HPLC in only a minority of cases, with an overall of 82%.
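Permutation testing, used above to check statistical significance, asks how often a classifier reaches the observed accuracy when the class labels are randomly shuffled. The sketch below shows the idea with a deliberately simple one-dimensional nearest-class-mean classifier under leave-one-out validation, a hypothetical stand-in for the PLS-DA and SVM models in the thesis; the function names and data are toy assumptions.

```python
import random


def loo_nearest_mean_accuracy(features, labels):
    """Leave-one-out accuracy of a 1-D nearest-class-mean classifier.

    A toy stand-in for the PLS-DA and SVM models discussed above.
    """
    correct = 0
    for i, (x, y) in enumerate(zip(features, labels)):
        groups = {}
        for j, (xj, yj) in enumerate(zip(features, labels)):
            if j != i:  # hold out sample i
                groups.setdefault(yj, []).append(xj)
        means = {cls: sum(vals) / len(vals) for cls, vals in groups.items()}
        predicted = min(means, key=lambda cls: abs(x - means[cls]))
        correct += predicted == y
    return correct / len(features)


def permutation_test(features, labels, evaluate, n_permutations=1000, seed=0):
    """Return (observed accuracy, permutation p-value)."""
    rng = random.Random(seed)
    observed = evaluate(features, labels)
    shuffled = list(labels)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any link between features and labels
        if evaluate(features, shuffled) >= observed:
            at_least_as_good += 1
    # The +1 terms count the observed labelling and keep p above zero.
    return observed, (at_least_as_good + 1) / (n_permutations + 1)


features = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1]
labels = ["fresh"] * 4 + ["spoiled"] * 4
accuracy, p_value = permutation_test(features, labels, loo_nearest_mean_accuracy,
                                     n_permutations=200)
print(accuracy, p_value)
```

A small p-value indicates that the observed accuracy is unlikely to arise by chance on these data; with only eight samples, very few label arrangements separate perfectly, so the test has limited resolution.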
Open Access
Development of a database and its use in the Investigation of Interferences in SRM assay design
(Cranfield University, 2013-04) Dokpesi, Oshiobugie; Bessant, Conrad
Selected reaction monitoring (SRM) is a form of mass spectrometry that offers high throughput together with high selectivity and specificity. Performing SRM experiments requires the development of assays to aid peptide identification. This is a time-consuming and expensive process, so researchers have developed bioinformatics solutions for SRM assay design. The accuracy of these methods is already quite high, and the next step is to optimise the process by tackling the problem of interference. Because different analytes may produce the same signals in an SRM experiment and thus interfere with each other, various solutions to this problem are being developed. This thesis describes the development of an SRM transition database to store peptide and transition data, software to populate the database, and software to retrieve data from it. Finally, the database was tested with MRMaid transitions for the human proteome, mined from the PRIDE database, and the results were analysed to investigate the transition interference issue. The database currently contains data for 20,220 proteins and approximately 870,000 tryptic peptides from the human proteome.
Open Access
Development of a database with web-based user interface for taqman assay design
(Cranfield University, 2007-01) Simecek, Nikol; Bessant, Conrad
TaqMan RT-PCR (reverse transcription-polymerase chain reaction) is a technique used to measure relative gene expression in a biological sample and is one of the core technologies used by the Molecular Pathology and Toxicology (MPT) Group at GlaxoSmithKline. Conducting TaqMan experiments is a complex process that involves designing a TaqMan assay specific to the gene of interest. A wealth of data has been generated during assay design, but no systems are currently available for readily sharing these data within the MPT group. A central data repository is needed so that data associated with assay design can be organised efficiently and accessed rapidly. Because experiments are conducted within limited timeframes and resources are often limited, such a repository would be of great benefit to the MPT group. This thesis describes the development of a database to house data associated with TaqMan assay design, software to populate the database with minimal user interaction, and a web-based CGI application through which members of the MPT group can query and submit data to the database. Finally, the output from testing the software is presented and discussed.
Open Access
Development of an automated identification system for nano-crystal encoded microspheres in flow cytometry
(Cranfield University, 2008-08) Clarke, Colin; Bessant, Conrad
Quantum dot encoded microspheres (QDEMs) offer much potential for bead-based identification of a variety of biomolecules via flow cytometry (FCM). To date, classifying QDEM subpopulations from FCM data has required significant instrument modification or multiparameter gating. It is unclear whether current data analysis approaches can handle the increased multiplexing capacity offered by these novel encoding schemes. In this thesis, the drawbacks of currently available data analysis techniques are demonstrated and novel classification methods are proposed to overcome these limitations. A commercially available 20-code QDEM library, with fluorescent emissions at 4 distinct wavelengths and 4 different intensity levels, was analysed using flow cytometry. Multiparameter gating (MPG), a readily available method for classifying subpopulations in FCM, was evaluated. A support vector machine (SVM) and two types of artificial neural network (ANN), a multilayer perceptron (MLP) and a probabilistic radial basis function (PRBF) network, were also considered. For the supervised models, rigorous parameter selection using cross-validation (CV) was used to construct the optimum models, and independent test set validation was also carried out. As a further test, the classifiers were externally validated using multiplexed QDEM solutions. MPG performed poorly (average misclassification (MC) rate = 9.7%): it was a time-consuming process requiring fine adjustment of the gates, it produced multiple classifications for single events, and its performance is likely to decrease as multiplexing capacity increases. The SVM performed best in independent test validation, with 96.33% accuracy on the independent test set (MLP = 96.12%, PRBF = 94.38%). The SVM also outperformed MPG and both ANNs on the external validation set, with an average MC rate of 2.9%, compared with 6.1% for the MLP and 7.5% for the PRBF.
Assuming that the external test solutions were homogeneous, the variance between classified results should be minimal; hence, the variance of correct classifications (CCs) was used as an additional indicator of classifier performance. The SVM showed the lowest variance for each of the external validation solutions (average σ² = 31479), some 50% lower than that of MPG. Finally, a user-friendly software system was developed to allow FCM practitioners to construct and evaluate multiclass SVMs in the laboratory. SVMs are a promising classifier for QDEMs: they can be trained rapidly and can make classifications in real time using standard FCM instrumentation. It is hoped that this work will advance SAT for bioanalytical applications.
Open Access
Development of medical point-of-care applications for renal medicine and tuberculosis based on electronic nose technology
(Cranfield University, 2004) Fend, Reinhard; Woodman, Anthony C.; Bessant, Conrad
INTRODUCTION: Current clinical diagnostics are based on biochemical, immunological or microbiological methods. However, these methods are operator-dependent, time-consuming and expensive, and require special skills; they are therefore not suitable for point-of-care testing. Recent developments in gas-sensing technology and pattern recognition methods make electronic nose technology an interesting alternative for medical point-of-care devices. METHODS: We applied a gas sensor array based on 14 conducting polymers to monitor haemodialysis in vitro and to detect pulmonary tuberculosis in both culture and sputum. RESULTS and DISCUSSION: The electronic nose is able to distinguish between control blood and “uraemic” blood. Furthermore, the gas sensor array is not only capable of discriminating pre- from post-dialysis blood (97% accuracy) but can also follow the volatile shift occurring during a single haemodialysis session. The electronic nose can be used for both dialysate-side and blood-side monitoring of haemodialysis. The pattern observed for pre- and post-dialysis blood might reflect the health status of the patients and could therefore be related to long-term outcome. The gas sensor array was also able to discriminate between Mycobacterium spp. and other lung pathogens such as Pseudomonas aeruginosa. More importantly, the gas sensor array was capable of resolving different Mycobacterium spp., such as Mycobacterium tuberculosis, M. scrofulaceum and M. avium, in both liquid culture and spiked sputum samples. The detection limit for M. tuberculosis in both sputum and liquid culture is 1 × 10⁴ mycobacteria ml⁻¹, which partially fulfils the requirement set by the WHO. The gas sensor array was able to detect culture-proven TB with a sensitivity of 89% and a specificity of 91%. CONCLUSIONS: This study has shown the potential of an electronic nose as a point-of-care device in these areas.
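Sensitivity and specificity, as reported for TB detection above, are the fractions of truly positive and truly negative samples that a classifier labels correctly. The sketch below computes both from paired true and predicted labels; the function name, the label strings and the toy data are hypothetical and are not the study's results.

```python
def sensitivity_specificity(true_labels, predicted_labels, positive="TB"):
    """Return (sensitivity, specificity) treating `positive` as the disease class."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == positive:
            if pred == positive:
                tp += 1  # diseased, correctly flagged
            else:
                fn += 1  # diseased, missed
        elif pred == positive:
            fp += 1  # healthy, wrongly flagged
        else:
            tn += 1  # healthy, correctly cleared
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels: 10 positive and 10 negative samples, one error in each group.
truth = ["TB"] * 10 + ["control"] * 10
predicted = ["TB"] * 9 + ["control"] + ["TB"] + ["control"] * 9
print(sensitivity_specificity(truth, predicted))  # (0.9, 0.9)
```

Reporting both figures matters for screening: a device can reach high sensitivity simply by flagging most samples as positive, which specificity would then expose.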
Open Access
Evaluation of Wireless Sensor Networks Technologies
(Cranfield University, 2008-09) Salan Padillo, Ignacio; Bessant, Conrad
Wireless sensor networks are a new technology that has emerged from developments in ultra-low-power microcontrollers and sophisticated low-cost wireless data devices. Their small size and low power consumption allow a number of independent ‘nodes’ (known as Motes) to be distributed in the field, all capable of ad hoc networking and multihop message transmission. New routing algorithms allow remote data to be passed reliably through the network to a final control point, within the constraints of low-power RF transmission in a congested 2.4 GHz radio spectrum. Wireless sensor network nodes are suitable for applications requiring long-term autonomous operation away from mains power supplies, such as environmental or health monitoring. To achieve this, sophisticated power management techniques must be used, with the units remaining ‘asleep’ in ultra-low-power mode for long periods of time. The aim of the research described in this thesis is first to review the area and then to evaluate one of the current hardware platforms, together with the popular software used with it, TinyOS. The hardware platform used is the TmoteSky, which originates from the University of California, Berkeley. Practical work has been carried out in different scenarios. Using Java tools running on a PC and customised applications running on the Motes, data have been captured together with information showing the topology configuration, adaptive routing of the network and radio link quality. The results show that the technology is promising for distributed data acquisition applications, although time-critical monitoring systems will require new power management schemes and networking protocols to reduce latency.
Open Access
Extraction of genetic network from microarray data using Bayesian framework
(2007-04) Kumuthini, Judit; Bessant, Conrad; Setford, S.
The aim of the work described in this thesis was to develop novel methods for extracting gene regulatory networks (GRNs) from gene expression data, and to use these methods to capture previously unknown relationships between genes in specific biological applications. This was accomplished by applying Bayesian networks (BNs), using minimum description length (MDL) and taboo search for parameter and structure learning respectively, to three large-scale microarray datasets from Saccharomyces cerevisiae, Escherichia coli and human stem cells. The application of BNs to model the well-characterised yeast cell cycle demonstrated the efficacy of the techniques employed. Using the cDNA microarray data from the yeast cell cycle project of Spellman et al. (1998), this study succeeded in extracting many biologically plausible genetic relationships, which were supported by evidence from publicly available genome and literature databases. Two novel knowledge extraction techniques were applied: Target Node (TN) analysis and learning through simulation. It was further demonstrated that adding prior knowledge can improve on the network structure extracted purely from experimental data. The second part of this thesis demonstrated how the BN approach could be adapted to a dataset of very high dimensionality, specifically data from a 54,634-probe array used to monitor human adipose tissue. The genetic networks extracted included the insulin receptor (IR) and fatty acid binding protein (FABP) families, which play key roles in fatty acid uptake, transport and metabolism. In the final part of this thesis, genome-wide GRNs of a prokaryotic expression system were extracted from novel oligo cDNA microarray data from E. coli K-12 to identify metabolic stress-responsive genes during recombinant protein production.
Also, detailed analysis of known metabolic stress related genes and the genes that are directly or indirectly associated in the GRN were used to establish possible markers for host system exhaustion. In conclusion, the BN methods developed proved to be a powerful and effective means of extracting GRNs in a variety of applications.
Open Access
The functional role of methylated short tandem repeats in early mouse development
(Cranfield University, 2011-08) Deakin, Greg; Bessant, Conrad
Short tandem repeats, or microsatellites, are ubiquitous in all genomes that have been explored. Like other sequences, the DNA in microsatellites carries epigenetic marks in the form of DNA methylation. Regulation of DNA methylation, and changes in its pattern, are critical for establishing unique cell states throughout mammalian development. DNA methylation is extensively reprogrammed during the early phases of mammalian development to establish unique developmental patterning. Whether microsatellites are also reprogrammed with developmental patterns is unknown. In this thesis, we assessed the characteristics of di- and trinucleotide microsatellites in the NCBIM37 Mus musculus assembly and observed a marked difference in the quantity and length of microsatellites with different motifs, not explained by any known mechanism. Secondly, we assessed the quantities of di-, tri- and tetranucleotide microsatellites in experimentally determined methylomes of Mus musculus at various stages of development. Our results indicate that at least one tetranucleotide microsatellite motif, and more tentatively a trinucleotide microsatellite motif, follow a pattern of methylation consistent with reprogramming. Finally, we show that the genes containing these specific microsatellites in the NCBIM37 genome have strong links to known developmental processes.
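The abstract does not say how microsatellites were identified in the NCBIM37 assembly. As an illustration only, perfect tandem repeats of a given motif length can be located with a back-referencing regular expression. The function name, the minimum-repeat parameter and the rule of skipping motifs that are repeats of a shorter unit are all assumptions of this sketch; dedicated repeat-finding tools also handle imperfect repeats.

```python
import re


def find_microsatellites(sequence, motif_length, min_repeats):
    """Find perfect tandem repeats of `motif_length`-mer motifs.

    Returns (start, motif, repeat_count) tuples for non-overlapping matches,
    skipping motifs that are themselves repeats of a shorter unit (e.g. "AA").
    """
    if motif_length < 1 or min_repeats < 2:
        raise ValueError("motif_length must be >= 1 and min_repeats >= 2")
    # A motif of ACGT bases followed by at least (min_repeats - 1) copies of itself.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_length, min_repeats - 1))
    results = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        is_compound = any(
            motif == motif[:d] * (motif_length // d)
            for d in range(1, motif_length)
            if motif_length % d == 0
        )
        if not is_compound:
            results.append((match.start(), motif, len(match.group(0)) // motif_length))
    return results


print(find_microsatellites("GGCACACACATT", 2, 3))  # [(2, 'CA', 4)]
```

Running the same function with motif lengths 2, 3 and 4 gives separate di-, tri- and tetranucleotide counts, which is the kind of per-motif tally the thesis compares across the assembly and methylomes.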
Open Access
Machine learning for predicting the risk of osteoporosis from patient attributes, health and lifestyle history
(Cranfield University, 2004) Tate, Geoffrey W.; Bessant, Conrad
The most widely used method for diagnosing osteoporosis is to determine bone mineral density (BMD) by bone densitometry. At present, mass screening is not considered an option because of resource constraints. This project investigates whether artificial neural networks (ANNs) or Bayesian networks (BNs), using a patient's health and lifestyle history (risk factors, used as a generic term for inputs), can be used to develop a preliminary screening system to determine whether a patient is at particular risk of osteoporosis and hence in need of a scan. Two databases were used: one containing 486 records (29 risk factors) of patients examined with a GE Linear Peripheral Densitometer (PIXI), and the other containing 4,980 records (33 risk factors) of patients examined with dual-energy X-ray absorptiometry (DEXA). BNs tend to outperform ANNs, particularly with smaller learning sets. The best result was 84% accuracy (sensitivity 0.89 and specificity 0.80) with PIXI and a BN. In general, however, the sensitivity achieved with ANNs was 0.65 for PIXI and 0.80 for DEXA, and the corresponding values with BNs were 0.72 and 0.81. With ANNs, the same diagnostic performance could be achieved with fewer risk factors (PDQ from 29 to 4 and DEXA from 33 to 5), but with BNs a reduction in the number of risk factors was accompanied by a reduction in performance. The results also indicate that:
- For positive patients, the more severely affected by the disease, the more accurately they are diagnosed.
- The lack of continuous values in the DEXA data results in poor diagnosis of negative patients.
- Classifications based on BMD predictions and on pattern recognition give similar results.
- Reasoning with BNs can provide an indication of how a particular risk factor state contributes to a patient's risk of osteoporosis.
Open Access
Multivariate analysis methods for veterinary diagnostics using SIFT-MS
(Cranfield University, 2010) Spooner, Andrew; Bessant, Conrad
Selected ion flow tube mass spectrometry (SIFT-MS) is an analytical method for investigating volatile organic compounds (VOCs). It produces ion counts at mass-to-charge (m/z) ratios over a range of 10-200. Current data analysis involves sifting through the spectra files one at a time, looking for peaks of interest; this is time-consuming and requires expert knowledge. This thesis proposes, implements and demonstrates a novel approach to the analysis of SIFT-MS data using multivariate techniques similar to those used to analyse electronic nose and gas chromatography mass spectrometry (GC-MS) data. The methodology was developed using a set of laboratory-prepared samples belonging to two groups, which contained different VOCs found in biological samples. The methodology first removes the m/z peaks associated with the precursor ions; principal component analysis (PCA) and partial least squares discriminant analysis (PLS-DA) were then evaluated for biomarker discovery and sample classification. Both methods produced excellent results, identifying the volatiles in the mixtures and classifying samples with 100% accuracy. The methodology was then tested on a variety of samples. Ammonia was identified as a possible marker of bovine TB (Mycobacterium bovis) infection in serum samples taken from wild badgers, with a discrimination accuracy of 67% ± 6%. The number of samples needed to build the best-performing model from this dataset was empirically shown to be 120. The approach was also effective in discriminating serum samples taken from cattle before and after the introduction of bovine TB (Mycobacterium bovis) bacteria in a clinical trial (85% accuracy). A similar dataset relating to infection by Mannheimia haemolytica failed to produce models that performed as well as the others; this is suspected to be due to poor experimental design.
Finally, discrimination accuracies of 88% were achieved for urine samples collected from cattle in herds infected with Mycobacterium paratuberculosis, and of 90% for urine samples collected in the same bovine TB trial as above. The novel multivariate approach to SIFT-MS data analysis has been shown to be effective with a number of datasets, but it is sensitive to experimental design. Recommendations are made on the considerations required when analysing data with this method.
Open Access
Optimisation of machine learning methods for cancer detection using vibrational spectroscopy
(Cranfield University, 2011-01) Sattlecker, Martine; Bessant, Conrad; Stone, Nicholas
Early cancer detection drastically improves the chances of cure; methods are therefore required that allow early detection and screening to be performed quickly, reliably and inexpensively. Vibrational spectroscopy is a promising method with all of these characteristics. To take the next step towards developing this technology into a clinical diagnostic tool, classification and imaging methods are required for automated diagnosis based on spectral data. In this study, Raman spectra of axillary lymph node tissue from breast cancer patients were used to develop a diagnostic model, and different classification methods were investigated for this purpose. A support vector machine (SVM) proved to be the best choice of classification method, classifying 100% of the unseen test set correctly. The resulting diagnostic models were thoroughly tested for robustness to the spectral corruptions expected to occur during routine clinical analysis, and were shown to be sufficiently robust for future routine diagnostic application. Because SVMs proved to be powerful classifiers for Raman data, they were also investigated for infrared spectroscopic data. Since a single SVM was found to be incapable of reliably predicting breast cancer pathology from tissue calcifications measured by infrared micro-spectroscopy, an SVM ensemble system was implemented. The resulting multi-class SVM ensemble predicted the pathology of the unseen test set with an accuracy of 88.9%; by comparison, a single SVM assessed on the same unseen test set achieved 66.7% accuracy. In addition, the ensemble system was extended to analyse complete infrared maps obtained from breast tissue specimens. The resulting imaging method successfully detected and staged calcification in infrared maps.
Furthermore, this imaging approach revealed new insights into the calcification process in malignant development, which was not previously well understood.
Open Access
A study of FT-IR spectroscopy for the identification and classification of haematological malignancies
(Cranfield University, 2009-06) Babrah, Jaspreet; Stone, Nicholas; Bessant, Conrad
The aim of the work presented in this thesis was to explore the use of FT-IR spectroscopy as a complementary clinical tool for haematological laboratory analysis. FT-IR spectra were measured from air-dried and frozen cell lines derived from lymphoma and from lymphoid and myeloid leukaemias, and from normal and chronic lymphocytic leukaemia blood samples. Multivariate statistical analysis was used to extract the spectral information with the greatest discriminative power. Principal-component-fed linear discriminant spectral models were tested using leave-one-out cross-validation. A preliminary unfiltered classification model using 50 frozen and air-dried samples correctly classified 54% of 18,556 spectra. Performance improved with the three-cell-line group datasets, with 71% of 19,903 spectra correctly classified. Furthermore, using the frozen spectra improved the performance of the three-cell-line group classification model considerably: 73.3% of 9,920 spectra were correctly classified in the frozen datasets, whereas only 41.5% of 9,983 spectra were correctly classified in the air-dried datasets. Optimisation of the spectral models by selection of principal components, application of Savitzky-Golay filters, and selection of spectra using standard deviation and absorbance filter tools was investigated. The first 25 significant PCs, a zeroth-derivative Savitzky-Golay filter and the absorbance filter tool, applied to the frozen five-cell-line spectral dataset, were shown to be the optimal parameters for constructing a classification model. When tested with leave-one-batch-out cross-validation, the five-cell-line model correctly classified 90% of the spectra. Blood component classification models tested with leave-one-batch-out cross-validation performed well. The whole blood model correctly classified 70% of 1,736 spectra measured on 22 samples.
The plasma model correctly classified 80.6% of 331 spectra and the buffy coat model correctly classified 99.5% of 1,438 spectra. This demonstrates that the buffy coat (containing white blood cells) holds the key biochemical information for discriminating between the pathologies of the blood samples. Partial least squares analysis was demonstrated as a method to support whole blood count tests through real-time prediction of cellular constituents. These findings demonstrate the potential of FT-IR spectroscopy as a clinical tool, although more work is needed before it can be applied in clinical practice.
Open Access
Vibrational spectroscopy for the rapid and early diagnosis of leukaemias and lymphomas
(Cranfield University, 2013-11) Jackson, Olivia; Stone, Nicholas; Bessant, Conrad; Rye, Adam; Lush, Richard; McCarthy, Keith
This thesis aimed to investigate vibrational spectroscopies for the identification of biochemical markers of leukaemias and lymphomas. In a preliminary study using the blood proteins albumin, fibrinogen and globulin, drop coating deposition Raman spectroscopy was explored and extended for use with Fourier transform infrared spectroscopy for leukaemia blood sample analysis. Because it requires low sample volumes and minimal preparation, it was identified as a potential alternative to blood centrifugation for obtaining the buffy coat for analysis. These studies showed that it was capable of detecting low levels of protein from small, highly concentrated droplets. This method, alongside cytospin centrifugation, was therefore used for the spectroscopic analysis of different blood fractions. Owing to the low number of lymphoma samples obtained, only a feasibility study is outlined in this thesis. Samples were collected from leukaemia patients and healthy volunteers. Infrared and Raman spectra were measured from whole blood and buffy coat samples cytospun onto slides, and from whole blood and plasma prepared by drop coating deposition. Multivariate statistical analysis was employed to extract key spectral differences between the pathologies and to develop classification models for diagnosing chronic lymphocytic leukaemia in previously treated and untreated patient groups. Principal component analysis followed by linear discriminant analysis was employed to identify the largest variances in the data, and leave-one-sample-out cross-validation was used to evaluate the performance of the spectral models, measured on different blood components, in diagnosing leukaemia. The buffy coat infrared model correctly classified 59% of the spectra, and the blood droplet Raman model 62%. Combining the treated and untreated groups improved classification to 83% for buffy coat infrared and 71% for blood droplet Raman.
These findings highlight the potential of drop coating deposition spectroscopy of whole blood for leukaemia diagnosis, although further work is required to achieve a clinically validated method.
