I work largely in data science, with two aims: fundamental research into algorithms and techniques, and analysis of other people's data. The latter tends to drive my interest in furthering my fundamental understanding, so my current research can be quite diverse depending on who I am working with. A mainstay on the fundamental side, however, is matrix and tensor decomposition, two topics that are very different in character but closely related.

I teach the Data Science course in the Centre for Doctoral Training in Topology, and co-run projects in the CDT and fourth year projects in Theoretical Physics.

I currently work on the following research topics:

- Fundamental tensor decompositions, and applications to text analysis and Raman spectroscopy.
- Student data analytics.
- Analysis of Raman spectroscopy data.

### More Information

Here is a selection of topics of current interest to me; all of the work described is ongoing. The list changes often, depending on the people I am talking to.

#### Tensor Decomposition and Text Analysis

My first work in text analysis involved the development of supervised Probabilistic Latent Semantic Analysis (sPLSA) and of classification algorithms built on it. Although not an advanced technique, it can easily achieve accuracy on par with a Support Vector Machine when an appropriate classifier is used, whilst also providing a deep and rich view into how the process works. PLSA can be seen as a matrix factorisation and is based only upon word probabilities; to include dependencies between words, one needs to move to a tensor-based method. A recent development in the world of tensor decompositions is the Tensor Train (known in physics as a Matrix Product State, and one example of a tensor network), which allows a high-order tensor to be represented and manipulated efficiently. This is my current focus in this area.
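To make the Tensor Train idea concrete, the following is a minimal sketch of the standard TT-SVD construction using NumPy. It is an illustration only, not the method used in the research described above; the function names `tt_svd` and `tt_to_full`, the `max_rank` parameter, and the synthetic test tensor are all assumptions introduced for the example.

```python
import numpy as np


def tt_svd(tensor, max_rank):
    """Split `tensor` into 3-way TT cores using sequential truncated SVDs.

    Illustrative sketch: every internal rank is capped at `max_rank`.
    """
    dims = tensor.shape
    cores = []
    rank_prev = 1
    # Unfold so that (incoming rank, first mode) indexes the rows.
    mat = tensor.reshape(rank_prev * dims[0], -1)
    for k in range(len(dims) - 1):
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        rank = min(max_rank, len(s))
        # The left singular vectors become the k-th core.
        cores.append(u[:, :rank].reshape(rank_prev, dims[k], rank))
        # Carry the remainder forward and fold in the next mode.
        mat = (s[:rank, None] * vt[:rank]).reshape(rank * dims[k + 1], -1)
        rank_prev = rank
    cores.append(mat.reshape(rank_prev, dims[-1], 1))
    return cores


def tt_to_full(cores):
    """Contract a list of TT cores back into the full tensor."""
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))
    # Drop the dummy boundary ranks of size one.
    return full.reshape(full.shape[1:-1])


# Synthetic 4 x 5 x 6 tensor built from random cores with TT ranks (2, 2).
rng = np.random.default_rng(0)
true_cores = [rng.standard_normal(s) for s in [(1, 4, 2), (2, 5, 2), (2, 6, 1)]]
tensor = tt_to_full(true_cores)

cores = tt_svd(tensor, max_rank=2)
error = np.linalg.norm(tt_to_full(cores) - tensor) / np.linalg.norm(tensor)
n_params = sum(core.size for core in cores)
print(f"relative error: {error:.2e}")
print(f"parameters: {n_params} in TT form vs {tensor.size} in full form")
```

Each core is a 3-way tensor, and the storage cost grows with the sum of the core sizes rather than with the product of the mode dimensions, which is where the efficiency comes from when the ranks are small.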

I am also interested in the fundamental question regarding the maximum and typical rank of tensors. Some results exist, some of surprising generality and others of surprising specificity, but most are rather difficult to understand. I am working on providing an elementary understanding of the maximum and typical rank of 3-way tensors. This is of particular value to the Tensor Train method, which is built up from 3-way tensors.
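As a small illustration of what "typical rank" means, consider the textbook case of real 2 x 2 x 2 tensors. This is a sketch of a well-known elementary example, not of the results referred to above. It relies on the standard fact that, for a generic real 2 x 2 x 2 tensor with frontal slices A and B, the rank is 2 when A⁻¹B has two distinct real eigenvalues and 3 when its eigenvalues are complex. The function name `rank_2x2x2_generic` and the choice of Gaussian entries are assumptions made for the example.

```python
import numpy as np


def rank_2x2x2_generic(tensor):
    """Rank of a generic real 2x2x2 tensor from its two frontal slices.

    Assumes the first slice is invertible and the discriminant is non-zero,
    both of which hold with probability one for continuous random entries.
    """
    a, b = tensor[:, :, 0], tensor[:, :, 1]
    m = np.linalg.solve(a, b)  # equals inv(a) @ b
    # Discriminant of the characteristic polynomial of the 2x2 matrix m.
    disc = np.trace(m) ** 2 - 4.0 * np.linalg.det(m)
    return 2 if disc > 0 else 3


rng = np.random.default_rng(0)
n_samples = 20000
ranks = np.array(
    [rank_2x2x2_generic(rng.standard_normal((2, 2, 2))) for _ in range(n_samples)]
)
frac_rank2 = np.mean(ranks == 2)
print(f"fraction with rank 2: {frac_rank2:.3f}")
print(f"fraction with rank 3: {1 - frac_rank2:.3f}")
```

Both ranks occur with clearly non-zero frequency, so over the reals a 2 x 2 x 2 tensor has two typical ranks, 2 and 3, rather than a single generic rank. This is one of the ways in which tensors differ from matrices, where a random matrix has full rank with probability one.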

#### Statistical Analysis of Student Data

I am interested in applying statistics to student data. Here, a clear understanding of how the conclusions are reached is absolutely essential, and therefore I am focusing on a statistical approach. I am using non-standard parametric distributions to describe student data, coupled with parametric and non-parametric tests, to provide insights into where gaps arise between different social groups. It is difficult to draw simple conclusions from student data: often simple human-level stories are insufficient to describe even the simplest feature in the data. However, by letting the data drive the analysis, some ideas can be abstracted.

#### Analysis of Raman Spectroscopy Data

Raman spectroscopy is an excellent playground for data analysis techniques. The data are multiway, high-dimensional, outlier-rich, and noisy, whilst the underlying physics is well understood, so it provides a very clean arena for inventive techniques. I am interested in regularised regression, factor analysis, cluster analysis, and classification, all applied to Raman data and the complexities inherent in it. I collaborate with an industrial partner, which helps me to understand the questions of current interest, whilst keeping in mind that a simple output is often required if the techniques are actually to be used.