Research

My statistical research is in the areas of computer experiments, computer model calibration, uncertainty quantification and statistical methods for large datasets, a.k.a. “big data”. Most of the applied problems that have motivated my statistical research have so far originated from climate/environmental science. However, I am also very interested in leveraging mathematical models and observational data to understand environmental impacts on health.

Current Research
Efficient Proposal Mechanisms for Bayesian Regression Tree Models: In this project we are developing new ways of proposing changes to tree structures in Bayesian regression tree models. When regression tree approaches are applied to computer experiment problems, the typically small error variance can lead to poor mixing, resulting in less-than-nominal coverage of credible intervals for posterior predictions. Our new proposal mechanisms appear to alleviate this problem (to be submitted).
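For context, these proposals enter through a Metropolis-Hastings step over tree space. With the leaf parameters integrated out, a move from the current tree T to a proposed tree T′ is accepted with probability (written here in its generic form, not the specific mechanisms developed in this project)

```latex
\alpha(T \to T') \;=\; \min\left\{ 1,\;
  \frac{p(y \mid T')\, p(T')\, q(T \mid T')}
       {p(y \mid T)\, p(T)\, q(T' \mid T)} \right\},
```

where p(y | T) is the marginal likelihood of the data under tree T, p(T) is the tree prior and q is the proposal distribution. A very small error variance makes p(y | T) sharply peaked, so naive local moves are rarely accepted and the chain mixes poorly, which is the pathology described above.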

Utilization of a nonlinear inverse procedure to infer CO2 emissions using limited downwind observations: In this applied paper we combine a simulator of the time evolution of a CO2 plume with column-integrated observations produced by a Fourier Transform Spectrometer (FTS) sensor to infer the emission rate. (in preparation)

Figure: Downstream evolution of a plume of CO2 emitted from a smokestack, as simulated by HIGRAD.

Parallel MCMC sampler for Bayesian Additive Regression Trees: This is a very interesting and fun project we have been working on, namely a scalable statistical model for big data that makes no inferential sacrifices. It leverages the basic statistical ideas of sufficiency and data reduction in a creative way (a small sketch of this idea follows below), and we also make some technical developments to prove the algorithm’s scalability. (submitted, under revision)
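The flavour of the data-reduction idea can be illustrated with a small, single-machine sketch (the function name and the toy split below are my own illustration; a real distributed sampler would keep each shard on its own process and communicate only these small vectors):

```python
import numpy as np

def leaf_suff_stats(x_shard, r_shard, split_value):
    """Per-shard sufficient statistics (count and residual sum) for the two
    leaves created by a single split: all this shard needs to contribute."""
    left = x_shard < split_value
    return np.array([left.sum(), r_shard[left].sum(),
                     (~left).sum(), r_shard[~left].sum()])

# Toy data partitioned into 8 "worker" shards.
rng = np.random.default_rng(0)
x, r = rng.uniform(size=10_000), rng.normal(size=10_000)
shards = zip(np.array_split(x, 8), np.array_split(r, 8))

# Reduction step: sum the tiny per-shard vectors rather than moving the raw data.
n_left, sum_left, n_right, sum_right = sum(
    leaf_suff_stats(xs, rs, split_value=0.5) for xs, rs in shards)
print(n_left, sum_left, n_right, sum_right)
```

In the simplest conjugate Gaussian setting (error variance given), these four numbers are all that is needed to evaluate the acceptance ratio for the proposed split, so the communication cost does not grow with the sample size.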

Dream Project: A recent project involves writing a review paper about quantifying the contribution of outdoor-sourced environmental tobacco smoke to residential indoor air quality and its health impacts in modern cities. The idea in this initial work is to review the current state of scientific knowledge on exposure levels, health impacts and remedies.

Non-Parametric Bayesian Calibration:  In another project, we have developed a non-parametric Bayesian calibration model that scales to challenging calibration problems. The idea is to learn locally-adaptive bases while retaining efficient MCMC sampling of the posterior distribution. We demonstrate our method by calibrating the Community Ice Sheet Model. (submitted)

Calibration and the EnKF: We have also done some comparative work pitting the usual GP approach to calibration against the ensemble Kalman filter (EnKF) approach that is popular in the data assimilation world. This work is motivated by applied collaborations involving ice-sheet modeling (submitted) and remote CO2 emission-rate estimation (in progress); a small sketch of the generic EnKF update follows below.
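To make the comparison concrete, here is a minimal sketch of the kind of stochastic ensemble Kalman update used for parameter estimation. This is the generic, textbook form with made-up toy data, not the implementation used in our studies:

```python
import numpy as np

def enkf_update(theta, sim_obs, d, R, rng):
    """One stochastic EnKF analysis step for parameter estimation.
    theta:   (n_ens, n_param) ensemble of parameter values
    sim_obs: (n_ens, n_obs)   simulator output at each ensemble member
    d:       (n_obs,)         observed data
    R:       (n_obs, n_obs)   observation-error covariance
    """
    th_c = theta - theta.mean(axis=0)
    y_c = sim_obs - sim_obs.mean(axis=0)
    n_ens = theta.shape[0]
    C_ty = th_c.T @ y_c / (n_ens - 1)          # parameter/output cross-covariance
    C_yy = y_c.T @ y_c / (n_ens - 1)           # output covariance
    K = C_ty @ np.linalg.inv(C_yy + R)         # Kalman gain
    d_pert = d + rng.multivariate_normal(np.zeros(len(d)), R, size=n_ens)
    return theta + (d_pert - sim_obs) @ K.T    # shift ensemble toward the data

# Toy example: calibrate the slope of a linear "simulator" y = theta * x.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 5)
d = 2.0 * x + rng.normal(scale=0.05, size=5)   # synthetic observations, true slope 2
theta = rng.normal(1.0, 1.0, size=(200, 1))    # prior ensemble for the slope
updated = enkf_update(theta, theta * x, d, 0.05**2 * np.eye(5), rng)
print(theta.mean(), "->", updated.mean())      # ensemble mean moves toward 2
```

Unlike the GP approach, this basic update is linear in the ensemble statistics and carries no explicit discrepancy term, one of the differences that makes such a comparison interesting.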

Figure: Snapshot of simulated flow for an idealized ice sheet using CISM.

Figure: HIGRAD simulation of a CO2 plume emitted from a source (stack) with fixed emission rate and variable wind forcing; the stack is located at the bottom-left corner of the plot.

Applied Motivations: I have been lucky enough to be involved with some great collaborative research that motivates many of my own statistical developments. Two recent collaborations involved investigating the calibration of the CISM ice-sheet model and performing statistical inversion of a complex CO2 emission-rate problem for international treaty verification.

Figure: Solar wind plasma interacting with the Earth’s magnetosphere, visible as the Aurora Borealis. Photo taken from the Space Shuttle. Work at NCAR involved calibrating a space weather model investigating such behaviours. Image from http://www.geo.mtu.edu/weather/aurora/images/space.

Ph.D. work
During my Ph.D. I developed new methods for model calibration experiments. Model calibration is an interesting statistical approach that enables scientists or practitioners to investigate a real-world phenomenon by combining a simulator of the phenomenon with observational data that has been collected. The idea, broadly speaking, is to leverage the simulator to understand the phenomenon without requiring extensive observations (which may be expensive, difficult, dangerous or even impossible to collect). And, just to make things a little more challenging, such simulators can usually only be sparsely sampled themselves due to, for example, their computational cost. If that wasn’t enough, it is usually unrealistic to expect the simulator to be a perfect representation of reality, which gives rise to the notion of model bias or discrepancy. The statistical approach of model calibration attempts to account for these various sources of uncertainty while estimating model parameters, constructing predictions and uncertainty bounds, and so on.
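For readers unfamiliar with the setup, the standard formulation (along the lines of the well-known Kennedy-O'Hagan framework) ties these pieces together as

```latex
y(\mathbf{x}) \;=\; \eta(\mathbf{x}, \boldsymbol{\theta}) + \delta(\mathbf{x}) + \varepsilon,
\qquad \varepsilon \sim N(0, \sigma_y^2),
```

where y(x) is a field observation at input x, η(x, θ) is the simulator run at the unknown calibration parameter θ, δ(x) is the model discrepancy (bias) and ε is observation error; Gaussian process priors on η and δ are the usual way to handle the sparsely sampled simulator and the unknown bias.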

The first development in my thesis was a practical approach for calibrating large and non-stationary computer model output, a common feature of the datasets that motivated our work. The second development deals with incorporating derivative information from the computer model into a calibration experiment. Many computer models are governed by differential equations, and including this derivative information can be helpful, particularly in small-n situations. The final development extends the derivative-based methodology to allow for possible bias in the computer model, where the main concern is whether this bias is identifiable. Our results indicated some modest improvements over the previous approach under some experimental conditions. It is a challenging problem, and one that I hope to return to in the next year or so. If some of this sounds interesting, you can see my thesis here.

M.Sc. work
It was during my masters that I was introduced to the modern area of statistics known as computer experiments, or uncertainty quantification. In my project I worked on constructing optimal designs for a typical statistical model in a non-typical setting: the case where the design space is non-convex. This situation arises in many environmental problems where geography places constraints on the variable of interest, such as contaminants in waterways and streams. The method we came up with was interesting but also computationally expensive; if you like, you can read about it here.

Previously…
In a previous life I was a CS undergrad. In those days I worked under the supervision of Dr. Thomas Wolf on various problems related to GoTools, a program specializing in solving life & death problems in the game of Go. I represented the university twice at the ACM Programming Contest, won the Brock programming contest once, and organized an HPC meeting with representatives of Brock University, SHARCNET and AMD. In my upper years I became very interested in statistics, and, as they say, the rest is history.