The University of Adelaide
March 2018

Search the School of Mathematical Sciences


People matching "Statistical data mining"

Associate Professor Gary Glonek
Associate Professor in Statistics

More about Gary Glonek...
Associate Professor Inge Koch
Associate Professor in Statistics

More about Inge Koch...
Professor Matthew Roughan
Professor of Applied Mathematics

More about Matthew Roughan...
Professor Patty Solomon
Professor of Statistical Bioinformatics

More about Patty Solomon...
Dr Simon Tuke
Lecturer in Statistics

More about Simon Tuke...

Courses matching "Statistical data mining"

Advanced statistical inference

We begin with modern and classical statistical inference and cover cumulants, the cumulant generating function, natural exponential family models, minimal sufficient statistics, completeness, and generalised linear models. We then consider conditional and marginal inference including the concept of ancillary statistics, marginal likelihood and conditional inference. Chapter 2 is about model choice, in particular Akaike's Information Criterion (AIC), Network Information Criterion (NIC), and cross-validation (CV). We will explore the theoretical basis of AIC via model misspecification and the Kullback-Leibler distance. Chapter 3 is devoted to bootstrap methods for assessing statistical accuracy; we will focus on bootstrap estimation and confidence intervals, and consider the jackknife and its relationship to the bootstrap. Chapter 4 is on the analysis of missing data; we will study the different types of missingness and the Expectation-Maximisation (EM) algorithm in particular. Chapter 5 is about survival analysis, and we will cover the Kaplan-Meier estimator, parametric survival models, and the semi-parametric proportional hazards model.
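As an illustration of the Kaplan-Meier estimator mentioned for Chapter 5, here is a minimal sketch of the product-limit calculation. The function name `kaplan_meier` and the toy data are hypothetical and are not taken from the course materials; exact fractions are used so the arithmetic is easy to follow.

```python
from fractions import Fraction


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with an observed event.

    times  -- observed follow-up times
    events -- 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = Fraction(1)
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Product-limit step: multiply by the fraction surviving past t.
            survival *= Fraction(n_at_risk - deaths, n_at_risk)
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set after t.
        n_at_risk -= removed
    return curve


# Hypothetical toy data: six subjects, two of them censored (event = 0).
times = [2, 3, 3, 5, 7, 8]
events = [1, 1, 0, 1, 0, 1]
km_curve = kaplan_meier(times, events)
for t, s in km_curve:
    print(t, s)
```

Subjects censored at a tied time are treated as still at risk at that time, which is the usual convention.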

More about this course...

Analysis of multivariable and high dimensional data

Multivariate data are analysed with three aims: 1. to understand the structure in the data and summarise the data in simpler ways; 2. to understand the relationship of one part of the data to another part; and 3. to make decisions or draw inferences based on the data. The statistical analyses of multivariate data extend those of univariate data, and in doing so require more advanced mathematical theory and computational techniques. The course begins with a discussion of the three classical methods, Principal Component Analysis, Canonical Correlation Analysis and Discriminant Analysis, which correspond to the aims above. We also learn about Cluster Analysis, Factor Analysis and newer methods including Independent Component Analysis. For most real data the underlying distribution is not known, but if the assumptions of multivariate normality of the data hold, extra properties can be derived. Our treatment combines ideas, theoretical properties and a strong computational component for each of the different methods we discuss. For the computational part (with Matlab) we make use of real data and learn to use simulations to assess the performance of different methods in practice.
Topics covered:
1. Introduction to multivariate data, the multivariate normal distribution
2. Principal Component Analysis, theory and practice
3. Canonical Correlation Analysis, theory and practice
4. Discriminant Analysis, Fisher's LDA, linear and quadratic DA
5. Cluster Analysis: hierarchical and k-means methods
6. Factor Analysis and latent variables
7. Independent Component Analysis, including an introduction to Information Theory
The course will be based on my forthcoming monograph Analysis of Multivariate and High-Dimensional Data - Theory and Practice, to be published by Cambridge University Press.

More about this course...

Mathematical epidemiology: Stochastic models and their statistical calibration

Mathematical models are increasingly used to inform governmental policy-makers on issues that threaten human health or which have an adverse impact on the economy. It is this real-world success combined with the wide variety of interesting mathematical problems which arise that makes mathematical epidemiology one of the most exciting topics in applied mathematics. During the summer school, you will be introduced to mathematical epidemiology and some fundamental theory required for studying and parametrising stochastic models of infection dynamics, which will provide an ideal basis for addressing key research questions in this area; several such questions will be introduced and explored in this course.
Topics:
An introduction to mathematical epidemiology
Discrete-time and continuous-time discrete-state stochastic infection models
Numerical methods for studying stochastic infection models: EXPOKIT, transforms and their inversion
Methods for simulating stochastic infection models: classical (Gillespie) algorithm, more efficient exact and approximate algorithms
Methods for parameterising stochastic infection models: frequentist approaches, Bayesian approaches, approximate Bayesian computation
Optimal observation of stochastic infection models
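To give a flavour of the classical (Gillespie) algorithm listed in the topics, the following is a minimal sketch for a stochastic SIR model. The SIR model itself, the function name `gillespie_sir` and the parameter values are assumptions chosen for illustration; the course description does not specify which infection model is used.

```python
import random


def gillespie_sir(s, i, r, beta, gamma, t_max, rng):
    """Simulate one path of a continuous-time stochastic SIR model.

    Returns a list of (time, S, I, R) tuples, one per event.
    beta  -- transmission rate (assumed frequency-dependent mixing)
    gamma -- recovery rate
    """
    n = s + i + r
    t = 0.0
    path = [(t, s, i, r)]
    while i > 0:
        rate_infection = beta * s * i / n
        rate_recovery = gamma * i
        total = rate_infection + rate_recovery
        # Time to the next event is exponential with the total rate.
        t += rng.expovariate(total)
        if t > t_max:
            break
        # Choose which event occurs, with probability proportional to its rate.
        if rng.random() * total < rate_infection:
            s, i = s - 1, i + 1
        else:
            i, r = i - 1, r + 1
        path.append((t, s, i, r))
    return path


# Hypothetical parameters: 100 people, one initial infective.
path = gillespie_sir(99, 1, 0, beta=0.3, gamma=0.1, t_max=50.0,
                     rng=random.Random(1))
print("events simulated:", len(path) - 1)
print("final state (t, S, I, R):", path[-1])
```

Passing the random number generator in explicitly keeps runs reproducible, which helps when comparing this exact method against faster approximate ones.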

More about this course...

Statistical Analysis and Modelling 1

This is a first course in Statistics for mathematically inclined students. It will address the key principles underlying commonly used statistical methods such as confidence intervals, hypothesis tests, inference for means and proportions, and linear regression. It will develop a deeper mathematical understanding of these ideas, many of which will be familiar from studies in secondary school. The application of basic and more advanced statistical methods will be illustrated on a range of problems from areas such as medicine, science, technology, government, commerce and manufacturing. The use of the statistical package SPSS will be developed through a sequence of computer practicals. Topics covered will include: basic probability and random variables, fundamental distributions, inference for means and proportions, comparison of independent and paired samples, simple linear regression, diagnostics and model checking, multiple linear regression, simple factorial models, models with factors and continuous predictors.

More about this course...

Statistical Modelling and Inference

Statistical methods are important to all areas that rely on data including science, technology, government and commerce. To deal with the complex problems that arise in practice requires a sound understanding of fundamental statistical principles together with a range of suitable modelling techniques. Computing using a high level statistical package is also an essential element of modern statistical practice. This course provides an introduction to the principles of statistical inference and the development of linear statistical models with the statistical package R. Topics covered are: point estimates, unbiasedness, mean-squared error, confidence intervals, tests of hypotheses, power calculations, derivation of one and two-sample procedures; simple linear regression, regression diagnostics, prediction; linear models, analysis of variance (ANOVA), multiple regression, factorial experiments, analysis of covariance models, model building; likelihood-based methods for estimation and testing, goodness-of-fit tests; sample surveys, population means, totals and proportions, simple random samples, stratified random samples.

More about this course...

Statistical Modelling III

One of the key requirements of an applied statistician is the ability to formulate appropriate statistical models and then apply them to data in order to answer the questions of interest. Most often, such models can be seen as relating a response variable to one or more explanatory variables. For example, in a medical experiment we may seek to evaluate a new treatment by relating patient outcome to treatment received while allowing for background variables such as age, sex and disease severity. In this course, a rigorous discussion of the linear model is given and various extensions are developed. There is a strong practical emphasis and the statistical package R is used extensively. Topics covered are: the linear model, least squares estimation, generalised least squares estimation, properties of estimators, the Gauss-Markov theorem; geometry of least squares, subspace formulation of linear models, orthogonal projections; regression models, factorial experiments, analysis of covariance and model formulae; regression diagnostics, residuals, influence diagnostics, transformations, Box-Cox models, model selection and model building strategies; models with complex error structure, split-plot experiments; logistic regression models.

More about this course...

Statistical Practice I

Statistical ideas and methods are essential tools in virtually all areas that rely on data to make decisions and reach conclusions. This includes diverse fields such as medicine, science, technology, government, commerce and manufacturing. In broad terms, statistics is about getting information from data. This includes both the important question of how to obtain suitable data for a given purpose and also how best to extract the information, often in the presence of random variability. This course provides an introduction to the contemporary application of statistics to a wide range of real world situations. It has a strong practical focus using the statistical package SPSS to analyse real data. Topics covered are: organisation, description and presentation of data; design of experiments and surveys; random variables, probability distributions, the binomial distribution and the normal distribution; statistical inference, tests of significance, confidence intervals; inference for means and proportions, one-sample tests, two independent samples, paired data, t-tests, contingency tables; analysis of variance; linear regression, least squares estimation, residuals and transformations, inference for regression coefficients, prediction.

More about this course...

Statistical Practice I (Life Sciences)

Statistical ideas and methods are essential tools in virtually all areas that rely on data to make decisions and reach conclusions. This includes diverse fields such as science, technology, government, commerce, manufacturing and the life sciences. In broad terms, statistics is about getting information from data. This includes both the important question of how to obtain suitable data for a given purpose and also how best to extract the information, often in the presence of random variability. This course provides an introduction to the contemporary application of statistics to a range of real world situations. It has a strong practical focus using the statistical package SPSS to analyse real data relevant to the life sciences. Topics covered are: organisation, description and presentation of data in the life sciences; design of experiments and surveys; random variables, probability distributions, the binomial distribution and the normal distribution; statistical inference, tests of significance, confidence intervals; inference for means and proportions, one-sample tests, two independent samples, paired data, t-tests, contingency tables; analysis of variance; linear regression, least squares estimation, residuals and transformations, inference for regression coefficients, prediction.

More about this course...

Statistical Practice I (Life Sciences) (Pre-Vet)

Statistical ideas and methods are essential tools in virtually all areas that rely on data to make decisions and reach conclusions. This includes diverse fields such as science, technology, government, commerce, manufacturing and the life sciences. In broad terms, statistics is about getting information from data. This includes both the important question of how to obtain suitable data for a given purpose and also how best to extract the information, often in the presence of random variability. This course provides an introduction to the contemporary application of statistics to a range of real world situations. It has a strong practical focus using the statistical package SPSS to analyse real data relevant to the life sciences. Topics covered are: organisation, description and presentation of data in the life sciences; design of experiments and surveys; random variables, probability distributions, the binomial distribution and the normal distribution; statistical inference, tests of significance, confidence intervals; inference for means and proportions, one-sample tests, two independent samples, paired data, t-tests, contingency tables; analysis of variance; linear regression, least squares estimation, residuals and transformations, inference for regression coefficients, prediction.

More about this course...

Events matching "Statistical data mining"

Mathematics of underground mining.
15:10 Fri 12 May, 2006 :: G08 Mathematics Building University of Adelaide :: Prof. Hyam Rubinstein

Underground mining infrastructure involves an interesting range of optimisation problems with geometric constraints. In particular, ramps, drives and tunnels have gradient within a certain prescribed range and turning circles (curvature) are also bounded. Finally obstacles have to be avoided, such as faults, ore bodies themselves and old workings. A group of mathematicians and engineers at Uni of Melb and Uni of SA have been working on this problem for a number of years. I will summarise what we have found and the challenges of working in the mining industry.
Watching evolution in real time; problems and potential research areas.
15:10 Fri 26 May, 2006 :: G08. Mathematics Building University of Adelaide :: Prof Alan Cooper (Federation Fellow)

Recent studies (1) have indicated problems with our ability to use the genetic distances between species to estimate the time since their divergence (so-called molecular clocks). An exponential decay curve has been detected in comparisons of closely related taxa in mammal and bird groups, and rough approximations suggest that molecular clock calculations may be problematic for the recent past (e.g. <1 million years). Unfortunately, this period encompasses a number of key evolutionary events where estimates of timing are critical, such as modern human evolutionary history, the domestication of animals and plants, and most issues involved in conservation biology. A solution (formulated at UA) will be briefly outlined. A second area of active interest is the recent suggestion (2) that mitochondrial DNA diversity does not track population size in several groups, in contrast to standard thinking. This finding has been interpreted as showing that mtDNA may not be evolving neutrally, as has long been assumed.
Large ancient DNA datasets provide a means to examine these issues, by revealing evolutionary processes in real time (3). The data also provide a rich area for mathematical investigation as temporal information provides information about several parameters that are unknown in serial coalescent calculations (4).
  1. Ho SYW et al. Time dependency of molecular rate estimates and systematic overestimation of recent divergence times. Mol. Biol. Evol. 22, 1561-1568 (2005);
    Penny D. Nature 436, 183-184 (2005).
  2. Bazin E et al. Population size does not influence mitochondrial genetic diversity in animals. Science 312, 570 (2006);
    Eyre-Walker A. Size does not matter for mitochondrial DNA. Science 312, 537 (2006).
  3. Shapiro B et al. Rise and fall of the Beringian steppe bison. Science 306, 1561-1565 (2004);
    Chan et al. Bayesian estimation of the timing and severity of a population bottleneck from ancient DNA. PLoS Genetics 2, e59 (2006).
  4. Drummond et al. Measurably evolving populations. Trends in Ecol. Evol. 18, 481-488 (2003);
    Drummond et al. Bayesian coalescent inference of past population dynamics from molecular sequences. Mol. Biol. Evol. 22, 1185-1192 (2005).
A Bivariate Zero-inflated Poisson Regression Model and application to some Dental Epidemiological data
14:10 Fri 27 Oct, 2006 :: G08 Mathematics Building University of Adelaide :: University Prof Sudhir Paul

Data in the form of paired (pre-treatment, post-treatment) counts arise in the study of the effects of several treatments after accounting for possible covariate effects. An example of such a data set comes from a dental epidemiological study in Belo Horizonte (the Belo Horizonte caries prevention study) which evaluated various programmes for reducing caries. These data may also show more pairs of zeros than can be accounted for by a simpler model, such as a bivariate Poisson regression model. In such situations we propose to use a zero-inflated bivariate Poisson regression (ZIBPR) model for the paired (pre-treatment, post-treatment) count data. We develop an EM algorithm to obtain maximum likelihood estimates of the parameters of the ZIBPR model. Further, we obtain the exact Fisher information matrix of the maximum likelihood estimates of the parameters of the ZIBPR model and develop a procedure for testing treatment effects. The procedure to detect treatment effects based on the ZIBPR model is compared, in terms of size, by simulations, with an earlier procedure using a zero-inflated Poisson regression (ZIPR) model of the post-treatment count with the pre-treatment count treated as a covariate. The procedure based on the ZIBPR model holds its level most effectively. A further simulation study indicates that the procedure based on the ZIBPR model has good power. We then compare our analysis of the decayed, missing and filled teeth (DMFT) index data from the caries prevention study, based on the ZIBPR model, with the analysis using a zero-inflated Poisson regression model in which the pre-treatment DMFT index is taken to be a covariate.
Statistical convergence of sequences of complex numbers with application to Fourier series
15:10 Tue 27 Mar, 2007 :: G08 Mathematics Building University of Adelaide :: Prof. Ferenc Móricz

The concept of statistical convergence was introduced by Henry Fast and Hugo Steinhaus in 1951. But in fact, it was Antoni Zygmund who first proved theorems on the statistical convergence of Fourier series, using the term "almost convergence". A sequence $\{x_k : k=1,2,\ldots\}$ of complex numbers is said to be statistically convergent to $\xi$ if for every $\varepsilon >0$ we have $$\lim_{n\to \infty} n^{-1} |\{1\le k\le n: |x_k-\xi| > \varepsilon\}| = 0.$$ We present the basic properties of statistical convergence, and extend it to multiple sequences. We also discuss the convergence behavior of Fourier series.
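The definition above can be explored numerically. The sketch below uses an example sequence chosen here for illustration (it does not come from the talk): x_k = 1 when k is a perfect square and 0 otherwise. This sequence does not converge in the ordinary sense, yet the proportion of indices where it differs from 0 by more than ε shrinks like 1/√n, so it is statistically convergent to 0.

```python
import math


def exceed_density(x, xi, eps, n):
    """Proportion of indices k in 1..n with |x(k) - xi| > eps."""
    count = sum(1 for k in range(1, n + 1) if abs(x(k) - xi) > eps)
    return count / n


def indicator_square(k):
    """Example sequence: 1 if k is a perfect square, else 0."""
    root = math.isqrt(k)
    return 1.0 if root * root == k else 0.0


# The density n^{-1} |{k <= n : |x_k - 0| > 0.5}| for increasing n.
densities = {
    n: exceed_density(indicator_square, 0.0, 0.5, n)
    for n in (100, 10_000, 1_000_000)
}
for n, d in densities.items():
    print(n, d)
```

A finite computation can only suggest the limit, not prove it; here the count of squares up to n is exactly ⌊√n⌋, which gives the limit directly.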
Likelihood inference for a problem in particle physics
15:10 Fri 27 Jul, 2007 :: G04 Napier Building University of Adelaide :: Prof. Anthony Davison

The Large Hadron Collider (LHC), a particle accelerator located at CERN, near Geneva, is (currently!) expected to start operation in early 2008. It is located in an underground tunnel 27km in circumference, and when fully operational, will be the world's largest and highest energy particle accelerator. It is hoped that it will provide evidence for the existence of the Higgs boson, the last remaining particle of the so-called Standard Model of particle physics. The quantity of data that will be generated by the LHC is roughly equivalent to that of the European telecommunications network, but this will be boiled down to just a few numbers. After a brief introduction, this talk will outline elements of the statistical problem of detecting the presence of a particle, and then sketch how higher order likelihood asymptotics may be used for signal detection in this context. The work is joint with Nicola Sartori, of the Università Ca' Foscari, in Venice.
Regression: a backwards step?
13:10 Fri 7 Sep, 2007 :: Maths G08 :: Dr Gary Glonek

Most students of high school mathematics will have encountered the technique of fitting a line to data by least squares. Those who have taken a university statistics course will also have heard this method referred to as regression. However, it is not obvious from common dictionary definitions, for example "reversion to an earlier or less advanced state or form", why this should be the case. In this talk, the mathematical phenomenon that gave regression its name will be explained and will be shown to have implications in some unexpected contexts.
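The abstract does not name the phenomenon, but the term is usually traced to regression toward the mean. As a hedged illustration under that assumption, the simulation below draws standardised pairs with correlation `rho` (a value picked arbitrarily here) and shows that the least squares slope is close to `rho`, so predictions lie closer to the mean than the predictor does.

```python
import random


def least_squares_slope(xs, ys):
    """Slope of the least squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(0)
rho = 0.5  # assumed correlation between the two measurements
xs = [rng.gauss(0, 1) for _ in range(20_000)]
# y has unit variance and correlation rho with x by construction.
ys = [rho * x + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1) for x in xs]

slope = least_squares_slope(xs, ys)
print("fitted slope:", round(slope, 3))
```

With both variables on the same scale, a slope below 1 means that an observation one unit above the mean in x is predicted to be less than one unit above the mean in y.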
Statistical Critique of the Intergovernmental Panel on Climate Change's work on Climate Change.
18:00 Wed 17 Oct, 2007 :: Union Hall University of Adelaide :: Mr Dennis Trewin

Climate change is one of the most important issues facing us today. Many governments have introduced or are developing appropriate policy interventions to (a) reduce the growth of greenhouse gas emissions in order to mitigate future climate change, or (b) adapt to future climate change. This important work deserves a high quality statistical data base but there are statistical shortcomings in the work of the Intergovernmental Panel on Climate Change (IPCC). There has been very little involvement of qualified statisticians in the very important work of the IPCC, which appears to be scientifically meritorious in most other ways. Mr Trewin will explain these shortcomings and outline his views on likely future climate change, taking into account the statistical deficiencies. His conclusions suggest climate change is still an important issue that needs to be addressed but the range of likely outcomes is a lot lower than has been suggested by the IPCC. This presentation will be based on an invited paper presented at the OECD World Forum.
Moderated Statistical Tests for Digital Gene Expression Technologies
15:10 Fri 19 Oct, 2007 :: G04 Napier Building University of Adelaide :: Dr Gordon Smyth :: Walter and Eliza Hall Institute of Medical Research in Melbourne, Australia

Digital gene expression (DGE) technologies measure gene expression by counting sequence tags. They are sensitive technologies for measuring gene expression on a genomic scale, without the need for prior knowledge of the genome sequence. As the cost of DNA sequencing decreases, the number of DGE datasets is expected to grow dramatically. Various tests of differential expression have been proposed for replicated DGE data using over-dispersed binomial or Poisson models for the counts, but none of these are usable when the number of replicates is very small. We develop tests using the negative binomial distribution to model overdispersion relative to the Poisson, and use conditional weighted likelihood to moderate the level of overdispersion across genes. A heuristic empirical Bayes algorithm is developed which is applicable to very general likelihood estimation contexts. Not only is our strategy applicable even with the smallest number of replicates, but it also proves to be more powerful than previous strategies when more replicates are available. The methodology is applicable to other counting technologies, such as proteomic spectral counts.
Global and Local stationary modelling in finance: Theory and empirical evidence
14:10 Thu 10 Apr, 2008 :: G04 Napier Building University of Adelaide :: Prof. Dominique Guégan :: Université Paris 1 Panthéon-Sorbonne

Modelling real data sets with second-order stochastic processes requires that the data satisfy the second-order stationarity condition. This stationarity condition concerns the unconditional moments of the process. It is in that context that most of the models developed since the sixties have been studied; we refer to the ARMA processes (Brockwell and Davis, 1988), the ARCH, GARCH and EGARCH models (Engle, 1982, Bollerslev, 1986, Nelson, 1990), the SETAR process (Lim and Tong, 1980 and Tong, 1990), the bilinear model (Granger and Andersen, 1978, Guégan, 1994), the EXPAR model (Haggan and Ozaki, 1980), the long memory process (Granger and Joyeux, 1980, Hosking, 1981, Gray, Zang and Woodward, 1989, Beran, 1994, Giraitis and Leipus, 1995, Guégan, 2000), the switching process (Hamilton, 1988). For all these models, we get an invertible causal solution under specific conditions on the parameters, so that point forecasts and forecast intervals are available.

Thus, the stationarity assumption is the basis for a general asymptotic theory for identification, estimation and forecasting. It guarantees that the increase of the sample size leads to more and more information of the same kind which is basic for an asymptotic theory to make sense.

Non-stationarity modelling also has a long tradition in econometrics. It is based on the conditional moments of the data generating process. It appears mainly in the heteroscedastic and volatility models, like the GARCH and related models, and stochastic volatility processes (Ghysels, Harvey and Renault 1997). This non-stationarity also appears in a different way in structural change models like the switching models (Hamilton, 1988), the stopbreak model (Diebold and Inoue, 2001, Breidt and Hsu, 2002, Granger and Hyung, 2004) and the SETAR models, for instance. It can also be observed in linear models with time varying coefficients (Nicholls and Quinn, 1982, Tsay, 1987).

Thus, using stationary unconditional moments suggests a global stationarity for the model, but using non-stationary unconditional moments or non-stationary conditional moments, or assuming the existence of states, suggests that this global stationarity fails and that we only observe a locally stationary behavior.

The growing evidence of instability in the stochastic behavior of stocks, of exchange rates and of some economic data sets such as growth rates, characterized by volatility or by jumps in the variance or in the levels of the prices, calls for a discussion of the assumption of global stationarity and its consequences for modelling, particularly for forecasting. Several questions arise from these remarks.

1. What kinds of non-stationarity affect the major financial and economic data sets? How to detect them?

2. Local and global stationarities: How are they defined?

3. What is the impact of evidence of non-stationarity on the statistics computed from the globally non-stationary data sets?

4. How can we analyze data sets in the non-stationary global framework? Does the asymptotic theory work in a non-stationary framework?

5. What kind of models create local stationarity instead of global stationarity? How can we use them to develop a modelling and a forecasting strategy?

These questions have begun to be discussed in some papers in the economic literature. For some of these questions the answers are known; for others, very little work exists. In this talk I will discuss all these problems and will propose two new strategies and modelling approaches to solve them. Several interesting topics in empirical finance awaiting future research will also be discussed.

Elliptic equation for diffusion-advection flows
15:10 Fri 15 Aug, 2008 :: G03 Napier Building University of Adelaide :: Prof. Pavel Bedrikovetsky :: Australian School of Petroleum Science, University of Adelaide.

The standard diffusion equation is obtained by Einstein's method and its generalisation, Fokker-Planck-Kolmogorov-Feller theory. The time between jumps in Einstein's derivation is constant.

We discuss random walks with a residence time distribution, which occur for flows of solutes and suspensions/colloids in porous media, CO2 sequestration in coal mines, and several processes in chemical, petroleum and environmental engineering. The rigorous application of Einstein's method results in a new equation, containing the time and the mixed dispersion terms expressing the dispersion of the particle time steps.

Usually, adding the second time derivative requires additional initial data. For the equation derived, the condition that the solution remains bounded as time tends to infinity provides uniqueness of the solution of the Cauchy problem.

The solution of the pulse injection problem describing a common tracer injection experiment is studied in greater detail. The new theory predicts a delay of the maximum of the tracer, compared to the velocity of the flow, while its forward "tail" contains many more particles than in the solution of the classical parabolic (advection-dispersion) equation. This is in agreement with the experimental observations and predictions of the direct simulation.

Probabilistic models of human cognition
15:10 Fri 29 Aug, 2008 :: G03 Napier Building University of Adelaide :: Dr Daniel Navarro :: School of Psychology, University of Adelaide

Over the last 15 years a fairly substantial psychological literature has developed in which human reasoning and decision-making is viewed as the solution to a variety of statistical problems posed by the environments in which we operate. In this talk, I briefly outline the general approach to cognitive modelling that is adopted in this literature, which relies heavily on Bayesian statistics, and introduce a little of the current research in this field. In particular, I will discuss work by myself and others on the statistical basis of how people make simple inductive leaps and generalisations, and the links between these generalisations and how people acquire word meanings and learn new concepts. If time permits, the extensions of the work in which complex concepts may be characterised with the aid of nonparametric Bayesian tools such as Dirichlet processes will be briefly mentioned.
Oceanographic Research at the South Australian Research and Development Institute: opportunities for collaborative research
15:10 Fri 21 Nov, 2008 :: Napier G04 :: Associate Prof John Middleton :: South Australian Research and Development Institute

Increasing threats to S.A.'s fisheries and marine environment have underlined the increasing need for soundly based research into the ocean circulation and ecosystems (phyto/zooplankton) of the shelf and gulfs. With the support of Marine Innovation SA, the Oceanography Program has, within 2 years, grown to include 6 FTEs and a budget of over $4.8M. The program currently leads two major research projects, both of which involve numerical and applied mathematical modelling of oceanic flow and ecosystems as well as statistical techniques for the analysis of data. The first is the implementation of the Southern Australian Integrated Marine Observing System (SAIMOS) that is providing data to understand the dynamics of shelf boundary currents, monitor for climate change and understand the phyto/zooplankton ecosystems that underpin SA's wild fisheries and aquaculture. SAIMOS involves the use of ship-based sampling, the deployment of underwater marine moorings, underwater gliders, HF Ocean RADAR, acoustic tracking of tagged fish and Autonomous Underwater Vehicles.

The second major project involves measuring and modelling the ocean circulation and biological systems within Spencer Gulf and the impact on prawn larval dispersal and on the sustainability of existing and proposed aquaculture sites. The discussion will focus on opportunities for collaborative research with both faculty and students in this exciting growth area of S.A. science.

Key Predistribution in Grid-Based Wireless Sensor Networks
15:10 Fri 12 Dec, 2008 :: Napier G03 :: Dr Maura Paterson :: Information Security Group at Royal Holloway, University of London.

Wireless sensors are small, battery-powered devices that are deployed to measure quantities such as temperature within a given region, then form a wireless network to transmit and process the data they collect. We discuss the problem of distributing symmetric cryptographic keys to the nodes of a wireless sensor network in the case where the sensors are arranged in a square or hexagonal grid, and we propose a key predistribution scheme for such networks that is based on Costas arrays. We introduce more general structures known as distinct-difference configurations, and show that they provide a flexible choice of parameters in our scheme, leading to more efficient performance than that achieved by prior schemes from the literature.
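For readers unfamiliar with Costas arrays, the sketch below checks the defining property (all displacement vectors between pairs of dots are distinct) and builds one example with the standard Welch construction. It illustrates only the combinatorial object; it is not the key predistribution scheme from the talk, and the function names are hypothetical.

```python
def is_costas(perm):
    """Return True if perm (row of the dot in each column) is a Costas array.

    perm must be a permutation of 1..n, and the vectors
    (column difference, row difference) between dots must all be distinct.
    """
    n = len(perm)
    if sorted(perm) != list(range(1, n + 1)):
        return False
    seen = set()
    for i in range(n):
        for j in range(i + 1, n):
            vec = (j - i, perm[j] - perm[i])
            if vec in seen:
                return False
            seen.add(vec)
    return True


def welch_costas(p, g):
    """Welch construction: for a prime p and a primitive root g modulo p,
    column i (i = 1..p-1) has its dot in row g**i mod p."""
    return [pow(g, i, p) for i in range(1, p)]


# 3 is a primitive root modulo 7, giving a Costas array of order 6.
example = welch_costas(7, 3)
print(example, is_costas(example))

# The identity permutation repeats the vector (1, 1), so it is not Costas.
print(is_costas([1, 2, 3, 4]))
```

The distinct-difference property means that any shift of the pattern overlaps the original in at most one dot, which is the feature that makes such configurations attractive for placing shared keys on a grid.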
From histograms to multivariate polynomial histograms and shape estimation
12:10 Thu 19 Mar, 2009 :: Napier 210 :: A/Prof Inge Koch

Histograms are convenient and easy-to-use tools for estimating the shape of data, but they have serious problems which are magnified for multivariate data. We combine classic histograms with shape estimation by polynomials. The new relatives, `polynomial histograms', have surprisingly nice mathematical properties, which we will explore in this talk. We also show how they can be used for real data of 10-20 dimensions to analyse and understand the shape of these data.
Multi-scale tools for interpreting cell biology data
15:10 Fri 17 Apr, 2009 :: Napier LG29 :: Dr Matthew Simpson :: University of Melbourne

Trajectory data from observations of a random walk process are often used to characterize macroscopic transport coefficients and to infer motility mechanisms in cell biology. New continuum equations describing the average moments of the position of an individual agent in a population of interacting agents are derived and validated. Unlike standard noninteracting random walks, the new moment equations explicitly represent the interactions between agents as they are coupled to the macroscopic agent density. Key issues associated with the validity of the new continuum equations and the interpretation of experimental data will be explored.
Statistical analysis for harmonized development of systemic organs in human fetuses
11:00 Thu 17 Sep, 2009 :: School Board Room :: Prof Kanta Naito :: Shimane University

The growth of human babies has been studied extensively, but many aspects of the development of the human fetus remain unclear. The aim of this research is to investigate the development of the systemic organs of human fetuses, based on a data set of measurements of fetal bodies and organs. Specifically, this talk is concerned with giving a mathematical understanding of the harmonized development of the organs of human fetuses. A method for evaluating such harmony is proposed, using the maximal dilatation that appears in the theory of quasi-conformal mappings.
Contemporary frontiers in statistics
15:10 Mon 28 Sep, 2009 :: Badger Labs G31 Macbeth Lecture :: Prof. Peter Hall :: University of Melbourne

The availability of powerful computing equipment has had a dramatic impact on statistical methods and thinking, changing forever the way data are analysed. New data types, larger quantities of data, and new classes of research problem are all motivating new statistical methods. We shall give examples of each of these issues, and discuss the current and future directions of frontier problems in statistics.
Exploratory experimentation and computation
15:10 Fri 16 Apr, 2010 :: Napier LG29 :: Prof Jonathan Borwein :: University of Newcastle

The mathematical research community is facing a great challenge to re-evaluate the role of proof in light of the growing power of current computer systems, of modern mathematical computing packages, and of the growing capacity to data-mine on the Internet. Add to that the enormous complexity of many modern capstone results such as the Poincare conjecture, Fermat's last theorem, and the Classification of finite simple groups. As the need and prospects for inductive mathematics blossom, the requirement to ensure the role of proof is properly founded remains undiminished. I shall look at the philosophical context with examples and then offer some of five bench-marking examples of the opportunities and challenges we face.
Estimation of sparse Bayesian networks using a score-based approach
15:10 Fri 30 Apr, 2010 :: School Board Room :: Dr Jessica Kasza :: University of Copenhagen

The estimation of Bayesian networks given high-dimensional data sets, with more variables than there are observations, has been the focus of much recent research. These structures provide a flexible framework for the representation of the conditional independence relationships of a set of variables, and can be particularly useful in the estimation of genetic regulatory networks given gene expression data.

In this talk, I will discuss some new research on learning sparse networks, that is, networks with many conditional independence restrictions, using a score-based approach. In the case of genetic regulatory networks, such sparsity reflects the view that each gene is regulated by relatively few other genes. The presented approach allows prior information about the overall sparsity of the underlying structure to be included in the analysis, as well as the incorporation of prior knowledge about the connectivity of individual nodes within the network.

Interpolation of complex data using spatio-temporal compressive sensing
13:00 Fri 28 May, 2010 :: Santos Lecture Theatre :: A/Prof Matthew Roughan :: School of Mathematical Sciences, University of Adelaide

Many complex datasets suffer from missing data, and interpolating these missing elements is a key task in data analysis. Moreover, it is often the case that we see only a linear combination of the desired measurements, not the measurements themselves. For instance, in network management, it is easy to count the traffic on a link, but harder to measure the end-to-end flows. Additionally, typical interpolation algorithms treat the spatial and temporal components of data separately, but many real datasets have strong spatio-temporal structure that we would like to exploit in reconstructing the missing data. In this talk I will describe a novel reconstruction algorithm that exploits concepts from the growing area of compressive sensing to solve all of these problems and more. The approach works so well on Internet traffic matrices that we can obtain a reasonable reconstruction with as much as 98% of the original data missing.
A variance constraining ensemble Kalman filter: how to improve forecast using climatic data of unobserved variables
15:10 Fri 28 May, 2010 :: Santos Lecture Theatre :: A/Prof Georg Gottwald :: The University of Sydney

Data assimilation aims to solve one of the fundamental problems of numerical weather prediction: estimating the optimal state of the atmosphere given a numerical model of the dynamics, and sparse, noisy observations of the system. A standard tool in attacking this filtering problem is the Kalman filter.

We consider the problem when only partial observations are available. In particular we consider the situation where the observational space consists of variables which are directly observable with known observational error, and of variables of which only their climatic variance and mean are given. We derive the corresponding Kalman filter in a variational setting.

We analyze the variance constraining Kalman filter (VCKF) for a simple linear toy model and determine its range of optimal performance. We explore the variance constraining Kalman filter in an ensemble transform setting for the Lorenz-96 system, and show that incorporating the information on the variance of some unobservable variables can improve the skill and also increase the stability of the data assimilation procedure.

Using methods from dynamical systems theory we then consider systems where the unobserved variables evolve deterministically but chaotically on a fast time scale.

This is joint work with Lewis Mitchell and Sebastian Reich.
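
As a reminder of the standard tool named in the abstract, here is a minimal sketch of one forecast/analysis cycle of the textbook scalar Kalman filter. It is not the variance constraining filter of the talk. The function name `kalman_step` and the parameters `a`, `q`, `h`, `r` are illustrative assumptions.

```python
def kalman_step(mean, var, obs, a=1.0, q=0.0, h=1.0, r=1.0):
    """One forecast/analysis cycle of a scalar Kalman filter.

    Model:       x_k = a * x_{k-1} + noise with variance q
    Observation: y_k = h * x_k     + noise with variance r
    """
    # Forecast step: propagate the mean and variance through the model.
    mean_f = a * mean
    var_f = a * a * var + q
    # Analysis step: weight forecast and observation by their uncertainties.
    gain = var_f * h / (h * h * var_f + r)
    mean_a = mean_f + gain * (obs - h * mean_f)
    var_a = (1.0 - gain * h) * var_f
    return mean_a, var_a


# Prior N(0, 1), observation 2.0 with unit error variance.
print(kalman_step(0.0, 1.0, 2.0))
```

With equal prior and observation variances the gain is 0.5, so the analysis mean lies halfway between the forecast and the observation.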

The mathematics of theoretical inference in cognitive psychology
15:10 Fri 11 Jun, 2010 :: Napier LG24 :: Prof John Dunn :: University of Adelaide

The aim of psychology in general, and of cognitive psychology in particular, is to construct theoretical accounts of mental processes based on observed changes in performance on one or more cognitive tasks. The fundamental problem faced by the researcher is that these mental processes are not directly observable but must be inferred from changes in performance between different experimental conditions. This inference is further complicated by the fact that performance measures may only be monotonically related to the underlying psychological constructs. State-trace analysis provides an approach to this problem which has gained increasing interest in recent years. In this talk, I explain state-trace analysis and discuss the set of mathematical issues that flow from it. Principal among these are the challenges of statistical inference and an unexpected connection to the mathematics of oriented matroids.
Meteorological drivers of extreme bushfire events in southern Australia
15:10 Fri 2 Jul, 2010 :: Benham Lecture Theatre :: Prof Graham Mills :: Centre for Australian Weather and Climate Research, Melbourne

Bushfires occur regularly during summer in southern Australia, but only a few of these fires become iconic due to their effects, either in terms of loss of life or economic and social cost. Such events include Black Friday (1939), the Hobart fires (1967), Ash Wednesday (1983), the Canberra bushfires (2003), and most recently Black Saturday in February 2009. In most of these events the weather of the day was statistically extreme in terms of heat, (low) humidity, and wind speed, and in terms of antecedent drought. There are a number of reasons for conducting post-event analyses of the meteorology of these events. One is to identify any meteorological circulation systems or dynamic processes occurring on those days that might not be widely or hitherto recognised, to document these, and to develop new forecast or guidance products. The understanding and prediction of such features can be used in the short term to assist in effective management of fires and the safety of firefighters and in the medium range to assist preparedness for the onset of extreme conditions. The results of such studies can also be applied to simulations of future climates to assess the likely changes in frequency of the most extreme fire weather events, and their documentary records provide a resource that can be used for advanced training purposes. In addition, particularly for events further in the past, revisiting these events using reanalysis data sets and contemporary NWP models can also provide insights unavailable at the time of the events. Over the past few years the Bushfire CRC's Fire Weather and Fire Danger project in CAWCR has studied the mesoscale meteorology of a number of major fire events, including the days of Ash Wednesday 1983, the Dandenong Ranges fire in January 1997, the Canberra fires and the Alpine breakout fires in January 2003, the Lower Eyre Peninsula fires in January 2005 and the Boorabbin fire in December 2007-January 2008. 
Various aspects of these studies are described below, including the structures of dry cold frontal wind changes, the particular character of the cold fronts associated with the most damaging fires in southeastern Australia, and some aspects of how the vertical temperature and humidity structure of the atmosphere may affect the fire weather at the surface. These studies reveal much about these major events, but also suggest future research directions, and some of these will be discussed.
Mathematica Seminar
15:10 Wed 28 Jul, 2010 :: Engineering Annex 314 :: Kim Schriefer :: Wolfram Research

The Mathematica Seminars 2010 offer an opportunity to experience the applicability, ease-of-use, as well as the advancements of Mathematica 7 in education and academic research. These seminars will highlight the latest directions in technical computing with Mathematica, and the impact this technology has across a wide range of academic fields, from maths, physics and biology to finance, economics and business. Those not yet familiar with Mathematica will gain an overview of the system and discover the breadth of applications it can address, while experts will get firsthand experience with recent advances in Mathematica like parallel computing, digital image processing, point-and-click palettes, built-in curated data, as well as courseware examples.
A spatial-temporal point process model for fine resolution multisite rainfall data from Roma, Italy
14:10 Thu 19 Aug, 2010 :: Napier G04 :: A/Prof Paul Cowpertwait :: Auckland University of Technology

A point process rainfall model is further developed that has storm origins occurring in space-time according to a Poisson process. Each storm origin has a random radius so that storms occur as circular regions in two-dimensional space, where the storm radii are taken to be independent exponential random variables. Storm origins are of random type z, where z follows a continuous probability distribution. Cell origins occur in a further spatial Poisson process and have arrival times that follow a Neyman-Scott point process. Cell origins have random radii so that cells form discs in two-dimensional space. Statistical properties up to third order are derived and used to fit the model to 10 min series taken from 23 sites across the Roma region, Italy. Distributional properties of the observed annual maxima are compared to equivalent values sampled from series that are simulated using the fitted model. The results indicate that the model will be of use in urban drainage projects for the Roma region.
Simultaneous confidence band and hypothesis test in generalised varying-coefficient models
15:05 Fri 10 Sep, 2010 :: Napier LG28 :: Prof Wenyang Zhang :: University of Bath

Generalised varying-coefficient (GVC) models are an important class of models, and there is a considerable literature addressing them. However, most of the existing literature is devoted to estimation procedures. In this talk, I will systematically investigate statistical inference for GVC models, including confidence bands and hypothesis tests. I will show the asymptotic distribution of the maximum discrepancy between the estimated functional coefficient and the true functional coefficient. I will compare different approaches to the construction of confidence bands and hypothesis tests. Finally, the proposed inference methods are used to analyse data on contraceptive use in China, which leads to some interesting findings.
Principal Component Analysis Revisited
15:10 Fri 15 Oct, 2010 :: Napier G04 :: Assoc. Prof Inge Koch :: University of Adelaide

Since the beginning of the 20th century, Principal Component Analysis (PCA) has been an important tool in the analysis of multivariate data. The principal components summarise data in fewer than the original number of variables without losing essential information, and thus allow a split of the data into signal and noise components. PCA is a linear method, based on elegant mathematical theory. The increasing complexity of data together with the emergence of fast computers in the later parts of the 20th century has led to a renaissance of PCA. The growing numbers of variables (in particular, high-dimensional low sample size problems), non-Gaussian data, and functional data (where the data are curves) are posing exciting challenges to statisticians, and have resulted in new research which extends the classical theory. I begin with the classical PCA methodology and illustrate the challenges presented by the complex data that we are now able to collect. The main part of the talk focuses on extensions of PCA: the duality of PCA and the Principal Coordinates of Multidimensional Scaling, Sparse PCA, and consistency results relating to principal components, as the dimension grows. We will also look at newer developments such as Principal Component Regression and Supervised PCA, nonlinear PCA and Functional PCA.
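
For readers new to the classical method, the first principal component is the leading eigenvector of the sample covariance matrix. The sketch below finds it by power iteration using only the standard library. The function names are illustrative assumptions, and the iteration assumes the starting vector is not orthogonal to the leading eigenvector and that the data are not constant.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix (divisor n - 1) of a list of observations."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(d)]
            for a in range(d)]


def first_principal_component(rows, iterations=500):
    """Leading eigenvector and eigenvalue of the covariance, by power iteration."""
    cov = covariance_matrix(rows)
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the variance along the component.
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d))
                     for a in range(d))
    return v, eigenvalue


data = [(2.0, 0.0), (-2.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
direction, variance = first_principal_component(data)
print(direction, variance)
```

The eigenvalue returned is the variance of the data projected onto the component, which is the "signal" part of the signal and noise split described above.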
Statistical physics and behavioral adaptation to Creation's main stimuli: sex and food
15:10 Fri 29 Oct, 2010 :: E10 B17 Suite 1 :: Prof Laurent Seuront :: Flinders University and South Australian Research and Development Institute

Animals typically search for food and mates, while avoiding predators. This is particularly critical for keystone organisms such as intertidal gastropods and copepods (i.e. millimeter-scale crustaceans) as they typically rely on non-visual senses for detecting, identifying and locating mates in their two- and three-dimensional environments. Here, using stochastic methods derived from the field of nonlinear physics, we provide new insights into the nature (i.e. innate vs. acquired) of the motion behavior of gastropods and copepods, and demonstrate how changes in their behavioral properties can be used to identify the trade-offs between foraging for food or sex. The gastropod Littorina littorea moves according to fractional Brownian motion while foraging for food (in accordance with the fractal nature of food distributions), and switches to Brownian motion while foraging for sex. In contrast, the swimming behavior of the copepod Temora longicornis belongs to the class of multifractal random walks (MRW; i.e. a form of anomalous diffusion), characterized by a nonlinear moment scaling function for distance versus time. This clearly differs from the traditional Brownian and fractional Brownian walks expected or previously detected in animal behaviors. The divergence between MRW and Levy flight and walk is also discussed, and it is shown how copepod anomalous diffusion is enhanced by the presence and concentration of conspecific water-borne signals, and dramatically increases male-female encounter rates.
Classification for high-dimensional data
15:10 Fri 1 Apr, 2011 :: Conference Room Level 7 Ingkarni Wardli :: Associate Prof Inge Koch :: The University of Adelaide

For two-class classification problems Fisher's discriminant rule performs well in many scenarios provided the dimension, d, is much smaller than the sample size, n. As the dimension increases, Fisher's rule may no longer be adequate, and can perform as poorly as random guessing. In this talk we look at new ways of overcoming this poor performance for high-dimensional data by suitably modifying Fisher's rule, and in particular we describe the 'Features Annealed Independence Rule (FAIR)' of Fan and Fan (2008) and a rule based on canonical correlation analysis. I describe some theoretical developments, and also show analyses of data which illustrate the performance of these modified rules.
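
One way to modify Fisher's rule when d is large is to ignore correlations between variables and use only a subset of them. The sketch below is a simplified illustration in that spirit: it ranks variables by the absolute two-sample t-statistic, keeps a fixed number, and classifies with a diagonal (independence) rule. It is not the FAIR procedure as published; in particular, the fixed `n_features` argument and the function names are assumptions made here for illustration, and every variable is assumed to have non-zero variance in both classes.

```python
from statistics import mean, variance


def fit_independence_rule(class1, class2, n_features):
    """Rank variables by |t| and keep what the diagonal rule needs."""
    d = len(class1[0])
    n1, n2 = len(class1), len(class2)
    stats = []
    for j in range(d):
        x1 = [row[j] for row in class1]
        x2 = [row[j] for row in class2]
        m1, m2 = mean(x1), mean(x2)
        v1, v2 = variance(x1), variance(x2)
        t = (m1 - m2) / ((v1 / n1 + v2 / n2) ** 0.5)
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        stats.append((abs(t), j, m1, m2, pooled))
    stats.sort(reverse=True)
    return [(j, m1, m2, pooled) for _, j, m1, m2, pooled in stats[:n_features]]


def classify(model, x):
    """Assign x to class 1 or 2 using only the selected variables."""
    score = sum((x[j] - (m1 + m2) / 2) * (m1 - m2) / pooled
                for j, m1, m2, pooled in model)
    return 1 if score > 0 else 2


class1 = [[5.0, 0.1], [6.0, -0.2], [7.0, 0.3]]
class2 = [[1.0, 0.0], [2.0, 0.2], [3.0, -0.1]]
model = fit_independence_rule(class1, class2, n_features=1)
print(classify(model, [5.5, 0.0]), classify(model, [1.5, 0.0]))
```

Because no covariance matrix is inverted, the rule remains well defined when d exceeds n, which is where Fisher's rule breaks down.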
Comparison of Spectral and Wavelet Estimation of the Dynamic Linear System of a Wave Energy Device
12:10 Mon 2 May, 2011 :: 5.57 Ingkarni Wardli :: Mohd Aftar :: University of Adelaide

Renewable energy is one of the major issues of our time. The consequences of relying on fossil and nuclear energy, along with their limited supply, have prompted researchers and industry to look for renewable sources of energy such as hydro, wind and wave energy. In this seminar, I will talk about spectral and wavelet estimation of a linear dynamical system of motion for a heaving buoy wave energy device. The spectral estimates were based on the Fourier transform, while the wavelet estimate was based on the wavelet transform. Comparison of two spectral estimates with a wavelet estimate of the amplitude response operator (ARO) for the dynamical system of the wave energy device shows that the wavelet estimate of the ARO is much better for data both with and without noise.
On parameter estimation in population models
15:10 Fri 6 May, 2011 :: 715 Ingkarni Wardli :: Dr Joshua Ross :: The University of Adelaide

Essential to applying a mathematical model to a real-world application is calibrating the model to data. Methods for calibrating population models often become computationally infeasible when the population size (more generally the size of the state space) becomes large, or other complexities such as time-dependent transition rates, or sampling error, are present. Here we will discuss the use of diffusion approximations to perform estimation in several scenarios, with successively reduced assumptions: (i) under the assumption of stationarity (the process has been evolving for a very long time with constant parameter values); (ii) transient dynamics (the assumption of stationarity is invalid, and thus only constant parameter values may be assumed); and, (iii) time-inhomogeneous chains (the parameters may vary with time) and accounting for observation error (a sample of the true state is observed).
When statistics meets bioinformatics
12:10 Wed 11 May, 2011 :: Napier 210 :: Prof Patty Solomon :: School of Mathematical Sciences

Bioinformatics is a new field of research which encompasses mathematics, computer science, biology, medicine and the physical sciences. It has arisen from the need to handle and analyse the vast amounts of data being generated by the new genomics technologies. The interface of these disciplines used to be information-poor, but is now information-mega-rich, and statistics plays a central role in processing this information and making it intelligible. In this talk, I will describe a published bioinformatics study which claimed to have developed a simple test for the early detection of ovarian cancer from a blood sample. The US Food and Drug Administration was on the verge of approving the test kits for market in 2004 when demonstrated flaws in the study design and analysis led to its withdrawal. We are still waiting for an effective early biomarker test for ovarian cancer.
Change detection in rainfall time series for Perth, Western Australia
12:10 Mon 16 May, 2011 :: 5.57 Ingkarni Wardli :: Farah Mohd Isa :: University of Adelaide

There have been numerous reports that the rainfall in south Western Australia, particularly around Perth, has undergone a step-change decrease, which is typically attributed to climate change. Four statistical tests are used to assess the empirical evidence for this claim on time series from five meteorological stations, all of which exceed 50 years in length. The tests used in this study are: the CUSUM; Bayesian Change Point analysis; the consecutive t-test; and Hotelling’s T²-statistic. Results from the multivariate Hotelling’s T² analysis are compared with those from the three univariate analyses. The issue of multiple comparisons is discussed. A summary of the empirical evidence for the claimed step change in the Perth area is given.
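
To illustrate the first of the four tests, the sketch below computes the basic CUSUM path of a series (cumulative sums of deviations from the overall mean) and takes its largest excursion as the estimated change point. This is a generic textbook form, not necessarily the variant used in the study, and it gives no significance level. The function name and the toy series are assumptions for illustration, not rainfall data.

```python
def cusum_change_point(series):
    """Return (k, cusum): k is the index of the last value before the change."""
    n = len(series)
    overall_mean = sum(series) / n
    cusum = []
    running = 0.0
    for x in series:
        running += x - overall_mean
        cusum.append(running)
    # The final entry is always zero, so it is excluded from the search.
    k = max(range(n - 1), key=lambda i: abs(cusum[i]))
    return k, cusum


toy = [10, 10, 10, 10, 6, 6, 6, 6]
k, path = cusum_change_point(toy)
print(k, path)
```

For a series with a single downward step the path rises to a peak at the last observation before the step and then falls back to zero.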
Statistical challenges in molecular phylogenetics
15:10 Fri 20 May, 2011 :: Mawson Lab G19 lecture theatre :: Dr Barbara Holland :: University of Tasmania

This talk will give an introduction to the ways that mathematics and statistics are used in the inference of evolutionary (phylogenetic) trees. Taking a model-based approach to estimating the relationships between species has proven to be enormously effective; however, some tricky statistical challenges remain. The increasingly plentiful amount of DNA sequence data is a boon, but it is also throwing a spotlight on some of the shortcomings of current best practice, particularly in how we (1) assess the reliability of our phylogenetic estimates, and (2) choose appropriate models. This talk will aim to give a general introduction to this area of research and will also highlight some results from two of my recent PhD students.
Statistical modelling in economic forecasting: semi-parametrically spatio-temporal approach
12:10 Mon 23 May, 2011 :: 5.57 Ingkarni Wardli :: Dawlah Alsulami :: University of Adelaide

How to model spatio-temporal variation of housing prices is an important and challenging problem, as it is of vital importance for both investors and policy makers to assess any movement in housing prices. In this seminar I will talk about the proposed model to estimate any movement in housing prices and measure the risk more accurately.
Permeability of heterogeneous porous media - experiments, mathematics and computations
15:10 Fri 27 May, 2011 :: B.21 Ingkarni Wardli :: Prof Patrick Selvadurai :: Department of Civil Engineering and Applied Mechanics, McGill University

Permeability is a key parameter important to a variety of applications in geological engineering and in the environmental geosciences. The conventional definition of Darcy flow enables the estimation of permeability at different levels of detail. This lecture will focus on the measurement of surface permeability characteristics of a large cuboidal block of Indiana Limestone, using a surface permeameter. The paper discusses the theoretical developments, the solution of the resulting triple integral equations and associated computational treatments that enable the mapping of the near surface permeability of the cuboidal region. This data combined with a kriging procedure is used to develop results for the permeability distribution at the interior of the cuboidal region. Upon verification of the absence of dominant pathways for fluid flow through the cuboidal region, estimates are obtained for the "Effective Permeability" of the cuboid using estimates proposed by Wiener, Landau and Lifschitz, King, Matheron, Journel et al., Dagan and others. The results of these estimates are compared with the geometric mean, derived from the computational estimates.
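
The simplest of the estimates listed are the Wiener bounds, which bracket the effective permeability between the harmonic and arithmetic means of the local values; the geometric mean always lies between the two. A minimal sketch using Python's standard library follows. The function name and the sample values are hypothetical and are not measurements from the Indiana Limestone block.

```python
from statistics import fmean, geometric_mean, harmonic_mean


def permeability_means(values):
    """Return (harmonic, geometric, arithmetic) means of positive permeabilities."""
    return harmonic_mean(values), geometric_mean(values), fmean(values)


# Hypothetical local permeabilities, in arbitrary consistent units.
lower, geometric, upper = permeability_means([1.0, 4.0])
print(lower, geometric, upper)
```

The harmonic mean corresponds to flow through layers in series and the arithmetic mean to flow through layers in parallel, which is why they act as lower and upper bounds.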
Optimal experimental design for stochastic population models
15:00 Wed 1 Jun, 2011 :: 7.15 Ingkarni Wardli :: Dr Dan Pagendam :: CSIRO, Brisbane

Markov population processes are popular models for studying a wide range of phenomena including the spread of disease, the evolution of chemical reactions and the movements of organisms in population networks (metapopulations). Our ability to use these models effectively can be limited by our knowledge about parameters, such as disease transmission and recovery rates in an epidemic. Recently, there has been interest in devising optimal experimental designs for stochastic models, so that practitioners can collect data in a manner that maximises the precision of maximum likelihood estimates of the parameters for these models. I will discuss some recent work on optimal design for a variety of population models, beginning with some simple one-parameter models where the optimal design can be obtained analytically and moving on to more complicated multi-parameter models in epidemiology that involve latent states and non-exponentially distributed infectious periods. For these more complex models, the optimal design must be arrived at using computational methods and we rely on a Gaussian diffusion approximation to obtain analytical expressions for Fisher's information matrix, which is at the heart of most optimality criteria in experimental design. I will outline a simple cross-entropy algorithm that can be used for obtaining optimal designs for these models. We will also explore the improvements in experimental efficiency when using the optimal design over some simpler designs, such as the design where observations are spaced equidistantly in time.
Inference and optimal design for percolation and general random graph models (Part I)
09:30 Wed 8 Jun, 2011 :: 7.15 Ingkarni Wardli :: Dr Andrei Bejan :: The University of Cambridge

The problem of optimal arrangement of nodes of a random weighted graph is discussed in this workshop. The nodes of graphs under study are fixed, but their edges are random and established according to the so called edge-probability function. This function is assumed to depend on the weights attributed to the pairs of graph nodes (or distances between them) and a statistical parameter. It is the purpose of experimentation to make inference on the statistical parameter and thus to extract as much information about it as possible. We also distinguish between two different experimentation scenarios: progressive and instructive designs.

We adopt a utility-based Bayesian framework to tackle the optimal design problem for random graphs of this kind. Simulation based optimisation methods, mainly Monte Carlo and Markov Chain Monte Carlo, are used to obtain the solution. We study optimal design problem for the inference based on partial observations of random graphs by employing data augmentation technique. We prove that the infinitely growing or diminishing node configurations asymptotically represent the worst node arrangements. We also obtain the exact solution to the optimal design problem for proximity (geometric) graphs and numerical solution for graphs with threshold edge-probability functions.

We consider inference and optimal design problems for finite clusters from bond percolation on the integer lattice $\mathbb{Z}^d$ and derive a range of both numerical and analytical results for these graphs. We introduce inner-outer plots by deleting some of the lattice nodes and show that the 'mostly populated' designs are not necessarily optimal in the case of incomplete observations under both progressive and instructive design scenarios. Some of the obtained results may generalise to other lattices.

Inference and optimal design for percolation and general random graph models (Part II)
10:50 Wed 8 Jun, 2011 :: 7.15 Ingkarni Wardli :: Dr Andrei Bejan :: The University of Cambridge

The problem of optimal arrangement of nodes of a random weighted graph is discussed in this workshop. The nodes of graphs under study are fixed, but their edges are random and established according to the so called edge-probability function. This function is assumed to depend on the weights attributed to the pairs of graph nodes (or distances between them) and a statistical parameter. It is the purpose of experimentation to make inference on the statistical parameter and thus to extract as much information about it as possible. We also distinguish between two different experimentation scenarios: progressive and instructive designs.

We adopt a utility-based Bayesian framework to tackle the optimal design problem for random graphs of this kind. Simulation based optimisation methods, mainly Monte Carlo and Markov Chain Monte Carlo, are used to obtain the solution. We study optimal design problem for the inference based on partial observations of random graphs by employing data augmentation technique. We prove that the infinitely growing or diminishing node configurations asymptotically represent the worst node arrangements. We also obtain the exact solution to the optimal design problem for proximity (geometric) graphs and numerical solution for graphs with threshold edge-probability functions.

We consider inference and optimal design problems for finite clusters from bond percolation on the integer lattice $\mathbb{Z}^d$ and derive a range of both numerical and analytical results for these graphs. We introduce inner-outer plots by deleting some of the lattice nodes and show that the 'mostly populated' designs are not necessarily optimal in the case of incomplete observations under both progressive and instructive design scenarios. Some of the obtained results may generalise to other lattices.

Quantitative proteomics: data analysis and statistical challenges
10:10 Thu 30 Jun, 2011 :: 7.15 Ingkarni Wardli :: Dr Peter Hoffmann :: Adelaide Proteomics Centre

Introduction to functional data analysis with applications to proteomics data
11:10 Thu 30 Jun, 2011 :: 7.15 Ingkarni Wardli :: A/Prof Inge Koch :: School of Mathematical Sciences

Object oriented data analysis
14:10 Thu 30 Jun, 2011 :: 7.15 Ingkarni Wardli :: Prof Steve Marron :: The University of North Carolina at Chapel Hill

Object Oriented Data Analysis is the statistical analysis of populations of complex objects. In the special case of Functional Data Analysis, these data objects are curves, where standard Euclidean approaches, such as principal components analysis, have been very successful. Recent developments in medical image analysis motivate the statistical analysis of populations of more complex data objects which are elements of mildly non-Euclidean spaces, such as Lie Groups and Symmetric Spaces, or of strongly non-Euclidean spaces, such as spaces of tree-structured data objects. These new contexts for Object Oriented Data Analysis create several potentially large new interfaces between mathematics and statistics. Even in situations where Euclidean analysis makes sense, there are statistical challenges because of the High Dimension Low Sample Size problem, which motivates a new type of asymptotics leading to non-standard mathematical statistics.
Object oriented data analysis of tree-structured data objects
15:10 Fri 1 Jul, 2011 :: 7.15 Ingkarni Wardli :: Prof Steve Marron :: The University of North Carolina at Chapel Hill

The field of Object Oriented Data Analysis has made a lot of progress on the statistical analysis of the variation in populations of complex objects. A particularly challenging example of this type is populations of tree-structured objects. Deep challenges arise, which involve a marriage of ideas from statistics, geometry, and numerical analysis, because the space of trees is strongly non-Euclidean in nature. These challenges, together with three completely different approaches to addressing them, are illustrated using a real data example, where each data point is the tree of blood arteries in one person's brain.
Modelling computer network topologies through optimisation
12:10 Mon 1 Aug, 2011 :: 5.57 Ingkarni Wardli :: Mr Rhys Bowden :: University of Adelaide

The core of the Internet is made up of many different computers (called routers) in many different interconnected networks, owned and operated by many different organisations. A popular and important field of study in the past has been "network topology": for instance, understanding which routers are connected to which other routers, or which networks are connected to which other networks; that is, studying and modelling the connection structure of the Internet. Previous study in this area has been plagued by unreliable or flawed experimental data and debate over appropriate models to use. The Internet Topology Zoo is a new source of network data created from the information that network operators make public. In order to better understand this body of network information we would like the ability to randomly generate network topologies resembling those in the zoo. Leveraging previous wisdom on networks produced as a result of optimisation processes, we propose a simple objective function based on possible economic constraints. By changing the relative costs in the objective function we can change the form of the resulting networks, and we compare these optimised networks to a variety of networks found in the Internet Topology Zoo.
Spectra alignment/matching for the classification of cancer and control patients
12:10 Mon 8 Aug, 2011 :: 5.57 Ingkarni Wardli :: Mr Tyman Stanford :: University of Adelaide

Proteomic time-of-flight mass spectrometry produces a spectrum based on the peptides (chains of amino acids) in each patient’s serum sample. The spectra contain data points for an x-axis (peptide weight) and a y-axis (peptide frequency/count/intensity). Our end goal is to differentiate cancer patients (and cancer sub-types) from control patients using these spectra. Before we can do this, peaks in these data must be found and peptides common to different spectra must be identified. The data are noisy because of biotechnological variation and calibration error; data points for different peptide weights may in fact be the same peptide. An algorithm is needed to find common peptides between spectra, as performing alignment ‘by hand’ is almost infeasible. We borrow methods suggested in the literature on metabolomic gas chromatography-mass spectrometry and extend them for our purposes. In this talk I will go over the basic tenets of what we hope to achieve and the process towards this.
Dealing with the GC-content bias in second-generation DNA sequence data
15:10 Fri 12 Aug, 2011 :: Horace Lamb :: Prof Terry Speed :: Walter and Eliza Hall Institute

The field of genomics is currently dealing with an explosion of data from so-called second-generation DNA sequencing machines. This is creating many challenges and opportunities for statisticians interested in the area. In this talk I will outline the technology and the data flood, and move on to one particular problem where the technology is used: copy-number analysis. There we find a novel bias, which, if not dealt with properly, can dominate the signal of interest. I will describe how we think about and summarize it, and go on to identify a plausible source of this bias, leading up to a way of removing it. Our approach makes use of the total variation metric on discrete measures, but apart from this, is largely descriptive.
Laplace's equation on multiply-connected domains
12:10 Mon 29 Aug, 2011 :: 5.57 Ingkarni Wardli :: Mr Hayden Tronnolone :: University of Adelaide

Various physical processes take place on multiply-connected domains (domains with some number of 'holes'), such as the stirring of a fluid with paddles or the extrusion of material from a die. These systems may be described by partial differential equations (PDEs). However, standard numerical methods for solving PDEs are not well-suited to such examples: finite difference methods are difficult to implement on multiply-connected domains, especially when the boundaries are irregular or moving, while finite element methods are computationally expensive. In this talk I will describe a fast and accurate numerical method for solving certain PDEs on two-dimensional multiply-connected domains, considering Laplace's equation as an example. This method takes advantage of complex variable techniques which allow the solution to be found with spectral accuracy provided the boundary data is smooth. Other advantages over traditional numerical methods will also be discussed.
Alignment of time course gene expression data sets using Hidden Markov Models
12:10 Mon 5 Sep, 2011 :: 5.57 Ingkarni Wardli :: Mr Sean Robinson :: University of Adelaide

Time course microarray experiments allow for insight into biological processes by measuring gene expression over a time period of interest. This project is concerned with time course data from a microarray experiment conducted on a particular variety of grapevine over the development of the grape berries at a number of different vineyards in South Australia. The aim of the project is to construct a methodology for combining the data from the different vineyards in order to obtain more precise estimates of the underlying behaviour of the genes over the development process. A major issue in doing so is that the rate of development of the grape berries is different at different vineyards. Hidden Markov models (HMMs) are a well established methodology for modelling time series data in a number of domains and have been previously used for gene expression analysis. Modelling the grapevine data presents a unique modelling issue, namely the alignment of the expression profiles needed to combine the data from different vineyards. In this seminar, I will describe our problem, review HMMs, present an extension to HMMs and show some preliminary results modelling the grapevine data.
Statistical analysis of metagenomic data from the microbial community involved in industrial bioleaching
12:10 Mon 19 Sep, 2011 :: 5.57 Ingkarni Wardli :: Ms Susana Soto-Rojo :: University of Adelaide

In the last two decades heap bioleaching has become established as a successful commercial option for recovering copper from low-grade secondary sulfide ores. Genetics-based approaches have recently been employed in the task of characterising mineral processing bacteria. Data analysis is a key issue, and the implementation of adequate mathematical and statistical tools is therefore of fundamental importance for drawing reliable conclusions. In this talk I will give an account of two specific problems that we have been working on. The first concerns experimental design, and the second concerns modelling the composition and activity of the microbial consortium.
Can statisticians do better than random guessing?
12:10 Tue 20 Sep, 2011 :: Napier 210 :: A/Prof Inge Koch :: School of Mathematical Sciences

In the finance or credit risk area, a bank may want to assess whether a client is going to default, or be able to meet the repayments. In the assessment of benign or malignant tumours, a correct diagnosis is required. In these and similar examples, we make decisions based on data. The classical t-tests provide a tool for making such decisions. However, many modern data sets have more variables than observations, and the classical rules may not be any better than random guessing. We consider Fisher's rule for classifying data into two groups, and show that it can break down for high-dimensional data. We then look at ways of overcoming some of the weaknesses of the classical rules, and I show how these "post-modern" rules perform in practice.
Estimating transmission parameters for the swine flu pandemic
15:10 Fri 23 Sep, 2011 :: 7.15 Ingkarni Wardli :: Dr Kathryn Glass :: Australian National University

Following the onset of a new strain of influenza with pandemic potential, policy makers need specific advice on how fast the disease is spreading, who is at risk, and what interventions are appropriate for slowing transmission. Mathematical models play a key role in comparing interventions and identifying the best response, but models are only as good as the data that inform them. In the early stages of the 2009 swine flu outbreak, many researchers estimated transmission parameters - particularly the reproduction number - from outbreak data. These estimates varied, and were often biased by data collection methods, by misclassification of imported cases, or by early stochasticity in case numbers. I will discuss a number of the pitfalls in achieving good quality parameter estimates from early outbreak data, and outline how best to avoid them. One of the early indications from swine flu data was that children were disproportionately responsible for disease spread. I will introduce a new method for estimating age-specific transmission parameters from both outbreak and seroprevalence data. This approach allows us to take account of empirical data on human contact patterns, and highlights the need to allow for asymmetric mixing matrices in modelling disease transmission between age groups. Applied to swine flu data from a number of different countries, it presents a consistent picture of higher transmission from children.
Statistical analysis of school-based student performance data
12:10 Mon 10 Oct, 2011 :: 5.57 Ingkarni Wardli :: Ms Jessica Tan :: University of Adelaide

Join me in the journey of being a statistician for 15 minutes of your day (if you are not already one) and experience the task of data cleaning without having to get your own hands dirty. Most of you may have sat the Basic Skills Tests when at school or know someone who currently has to do the NAPLAN (National Assessment Program - Literacy and Numeracy) tests. Tests like these assess student progress and can be used to accurately measure school performance. In trying to answer the research question: "what conclusions about student progress and school performance can be drawn from NAPLAN data or data of a similar nature, using mathematical and statistical modelling and analysis techniques?", I have uncovered some interesting results about the data in my initial data analysis which I shall explain in this talk.
Statistical modelling for some problems in bioinformatics
11:10 Fri 14 Oct, 2011 :: B.17 Ingkarni Wardli :: Professor Geoff McLachlan :: The University of Queensland

In this talk we consider some statistical analyses of data arising in bioinformatics. The problems include the detection of differential expression in microarray gene-expression data, the clustering of time-course gene-expression data and, lastly, the analysis of modern-day cytometric data. Extensions are considered to the procedures proposed for these three problems in McLachlan et al. (Bioinformatics, 2006), Ng et al. (Bioinformatics, 2006), and Pyne et al. (PNAS, 2009), respectively. The latter references are available at
On the role of mixture distributions in the modelling of heterogeneous data
15:10 Fri 14 Oct, 2011 :: 7.15 Ingkarni Wardli :: Prof Geoff McLachlan :: University of Queensland

We consider the role that finite mixture distributions have played in the modelling of heterogeneous data, in particular for clustering continuous data via mixtures of normal distributions. A very brief history is given starting with the seminal papers by Day and Wolfe in the sixties before the appearance of the EM algorithm. It was the publication in 1977 of the latter algorithm by Dempster, Laird, and Rubin that greatly stimulated interest in the use of finite mixture distributions to model heterogeneous data. This is because the fitting of mixture models by maximum likelihood is a classic example of a problem that is simplified considerably by the EM's conceptual unification of maximum likelihood estimation from data that can be viewed as being incomplete. In recent times there has been a proliferation of applications in which the number of experimental units n is comparatively small but the underlying dimension p is extremely large as, for example, in microarray-based genomics and other high-throughput experimental approaches. Hence there has been increasing attention given not only in bioinformatics and machine learning, but also in mainstream statistics, to the analysis of complex data in this situation where n is small relative to p. The latter part of the talk shall focus on the modelling of such high-dimensional data using mixture distributions.
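The way EM simplifies maximum likelihood fitting of a normal mixture can be illustrated with a minimal sketch. It assumes a univariate two-component mixture and synthetic data; the function names, starting values and iteration count are illustrative choices, not taken from the talk, and the code has no safeguards against degenerate components.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def em_two_normals(data, n_iter=50):
    """Fit a two-component univariate normal mixture by EM (illustrative)."""
    n = len(data)
    overall_mean = sum(data) / n
    # Crude starting values (an assumption): means at the data extremes,
    # common sd equal to the overall sd, equal mixing weights.
    weight = 0.5
    mean1, mean2 = min(data), max(data)
    sd1 = sd2 = math.sqrt(sum((x - overall_mean) ** 2 for x in data) / n)
    for _ in range(n_iter):
        # E-step: posterior probability that each point came from component 1.
        resp = []
        for x in data:
            a = weight * normal_pdf(x, mean1, sd1)
            b = (1.0 - weight) * normal_pdf(x, mean2, sd2)
            resp.append(a / (a + b))
        # M-step: weighted maximum likelihood updates of the parameters.
        n1 = sum(resp)
        n2 = n - n1
        weight = n1 / n
        mean1 = sum(r * x for r, x in zip(resp, data)) / n1
        mean2 = sum((1.0 - r) * x for r, x in zip(resp, data)) / n2
        sd1 = math.sqrt(sum(r * (x - mean1) ** 2 for r, x in zip(resp, data)) / n1)
        sd2 = math.sqrt(sum((1.0 - r) * (x - mean2) ** 2 for r, x in zip(resp, data)) / n2)
    return weight, (mean1, sd1), (mean2, sd2)


# Synthetic data: 300 draws from N(0, 1) and 300 draws from N(5, 1).
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(300)] + [rng.gauss(5.0, 1.0) for _ in range(300)]
weight, (mean1, sd1), (mean2, sd2) = em_two_normals(data)
print(weight, mean1, sd1, mean2, sd2)
```

The component labels are the "incomplete" part of the data: the E-step fills them in with probabilities, after which the M-step reduces to ordinary weighted means and variances.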
Likelihood-free Bayesian inference: modelling drug resistance in Mycobacterium tuberculosis
15:10 Fri 21 Oct, 2011 :: 7.15 Ingkarni Wardli :: Dr Scott Sisson :: University of New South Wales

A central pillar of Bayesian statistical inference is Monte Carlo integration, which is based on obtaining random samples from the posterior distribution. There are a number of standard ways to obtain these samples, provided that the likelihood function can be numerically evaluated. In the last 10 years, there has been a substantial push to develop methods that permit Bayesian inference in the presence of computationally intractable likelihood functions. These methods, termed "likelihood-free" or approximate Bayesian computation (ABC), are now being applied extensively across many disciplines. In this talk, I'll present a brief, non-technical overview of the ideas behind likelihood-free methods. I'll motivate and illustrate these ideas through an analysis of the epidemiological fitness cost of drug resistance in Mycobacterium tuberculosis.
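The basic likelihood-free idea can be sketched with rejection ABC on a toy problem. This is an illustrative example only, not the tuberculosis model from the talk: it assumes a binomial experiment with a Uniform(0, 1) prior on the success probability, and the function name, tolerance and number of draws are hypothetical choices.

```python
import random


def abc_rejection(observed_successes, n_trials, n_draws=20000, tolerance=0, seed=1):
    """Rejection ABC for a binomial success probability (toy example).

    The likelihood is never evaluated: parameters are drawn from the prior,
    data are simulated, and a draw is kept only when the simulated summary
    is within `tolerance` of the observed one.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        p = rng.random()  # draw from the Uniform(0, 1) prior
        simulated = sum(rng.random() < p for _ in range(n_trials))
        if abs(simulated - observed_successes) <= tolerance:
            accepted.append(p)
    return accepted


samples = abc_rejection(observed_successes=7, n_trials=10)
posterior_mean = sum(samples) / len(samples)
print(len(samples), posterior_mean)
```

With a tolerance of zero the accepted draws are samples from the exact posterior, which here is Beta(8, 4) with mean 2/3; larger tolerances trade accuracy for a higher acceptance rate.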
Metric geometry in data analysis
13:10 Fri 11 Nov, 2011 :: B.19 Ingkarni Wardli :: Dr Facundo Memoli :: University of Adelaide

The problem of object matching under invariances can be studied using certain tools from metric geometry. The central idea is to regard objects as metric spaces (or metric measure spaces). The type of invariance that one wishes to have in the matching is encoded by the choice of the metrics with which one endows the objects. The standard example is matching objects in Euclidean space under rigid isometries: in this situation one would endow the objects with the Euclidean metric. More general scenarios are possible in which the desired invariance cannot be reflected by the preservation of an ambient space metric. Several ideas due to M. Gromov are useful for approaching this problem. The Gromov-Hausdorff distance is a natural candidate for doing this. However, this metric leads to very hard combinatorial optimization problems and it is difficult to relate to previously reported practical approaches to the problem of object matching. I will discuss different variations of these ideas, and in particular will show a construction of an L^p version of the Gromov-Hausdorff metric, called the Gromov-Wasserstein (GW) distance, which is based on mass transportation ideas. This new metric directly leads to quadratic optimization problems on continuous variables with linear constraints. As a consequence of establishing several lower bounds, several invariants of metric measure spaces turn out to be quantitatively stable in the GW sense. These invariants provide practical tools for the discrimination of shapes and connect the GW ideas to a number of pre-existing approaches.
Fluid flows in microstructured optical fibre fabrication
15:10 Fri 25 Nov, 2011 :: B.17 Ingkarni Wardli :: Mr Hayden Tronnolone :: University of Adelaide

Optical fibres are used extensively in modern telecommunications as they allow the transmission of information at high speeds. Microstructured optical fibres are a relatively new fibre design in which a waveguide for light is created by a series of air channels running along the length of the material. The flexibility of this design allows optical fibres to be created with adaptable (and previously unrealised) optical properties. However, the fluid flows that arise during fabrication can greatly distort the geometry, which can reduce the effectiveness of a fibre or render it useless. I will present an overview of the manufacturing process and highlight the difficulties. I will then focus on surface-tension driven deformation of the macroscopic version of the fibre extruded from a reservoir of molten glass, occurring during fabrication, which will be treated as a two-dimensional Stokes flow problem. I will outline two different complex-variable numerical techniques for solving this problem along with comparisons of the results, both to other models and to experimental data.
Spatial-point data sets and the Polya distribution
15:10 Fri 27 Apr, 2012 :: B.21 Ingkarni Wardli :: Dr Benjamin Binder :: The University of Adelaide

Spatial-point data sets, generated from a wide range of physical systems and mathematical models, can be analyzed by counting the number of objects in equally sized bins. We find that the bin counts are related to the Polya distribution. New indexes are developed which quantify whether or not a spatial data set is at its most evenly distributed state. Using three case studies (Lagrangian fluid particles in chaotic laminar flows, cellular automata agents in discrete models, and biological cells within colonies), we calculate the indexes and predict the spatial-state of the system.
Change detection in rainfall time series for Perth, Western Australia
12:10 Mon 14 May, 2012 :: 5.57 Ingkarni Wardli :: Ms Farah Mohd Isa :: University of Adelaide

There have been numerous reports that rainfall in south Western Australia, particularly around Perth, has undergone a step-change decrease, which is typically attributed to climate change. Four statistical tests are used to assess the empirical evidence for this claim on time series from five meteorological stations, all of which exceed 50 years in length. The tests used in this study are the CUSUM, Bayesian change point analysis, the consecutive t-test and Hotelling's T^2 statistic. Results from the multivariate Hotelling's T^2 analysis are compared with those from the three univariate analyses. The issue of multiple comparisons is discussed. A summary of the empirical evidence for the claimed step change in the Perth area is given.
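As a rough illustration of the first of these tests, the following sketch computes a basic CUSUM of deviations from the series mean and takes the index of the largest absolute cumulative sum as the candidate change point. The function name and the ten data values are made up for illustration; they are not rainfall data, and the talk may use a different CUSUM variant or significance procedure.

```python
def cusum_change_point(series):
    """Return the candidate change index and the CUSUM path.

    S_k is the cumulative sum of deviations from the overall mean. A step
    change in level shows up as a peak (or trough) in S_k, and the shift is
    taken to occur after the observation where |S_k| is largest.
    """
    mean = sum(series) / len(series)
    cusum = []
    running = 0.0
    for x in series:
        running += x - mean
        cusum.append(running)
    k = max(range(len(cusum)), key=lambda i: abs(cusum[i]))
    return k, cusum


# Synthetic series with a drop in level after the fifth value (index 4).
series = [10, 10, 10, 10, 10, 6, 6, 6, 6, 6]
change_index, path = cusum_change_point(series)
print(change_index)  # prints 4
```

In practice the size of the peak would be compared against a reference distribution, for example by permuting the series, before a step change is declared.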
Evaluation and comparison of the performance of Australian and New Zealand intensive care units
14:10 Fri 25 May, 2012 :: 7.15 Ingkarni Wardli :: Dr Jessica Kasza :: The University of Adelaide

Recently, the Australian Government has emphasised the need for monitoring and comparing the performance of Australian hospitals. Evaluating the performance of intensive care units (ICUs) is of particular importance, given that the most severe cases are treated in these units. Indeed, ICU performance can be thought of as a proxy for the overall performance of a hospital. We compare the performance of the ICUs contributing to the Australian and New Zealand Intensive Care Society (ANZICS) Adult Patient Database, the largest of its kind in the world, and identify those ICUs with unusual performance. It is well-known that there are many statistical issues that must be accounted for in the evaluation of healthcare provider performance. Indicators of performance must be appropriately selected and estimated, investigators must adequately adjust for casemix, statistical variation must be fully accounted for, and adjustment for multiple comparisons must be made. Our basis for dealing with these issues is the estimation of a hierarchical logistic model for the in-hospital death of each patient, with patients clustered within ICUs. Both patient- and ICU-level covariates are adjusted for, with a random intercept and random coefficient for the APACHE III severity score. Given that we expect most ICUs to have similar performance after adjustment for these covariates, we follow Ohlssen et al., JRSS A (2007), and estimate a null model that we expect the majority of ICUs to follow. This methodology allows us to rigorously account for the aforementioned statistical issues, and accurately identify those ICUs contributing to the ANZICS database that have comparatively unusual performance. This is joint work with Prof. Patty Solomon and Assoc. Prof. John Moran.
A brief introduction to Support Vector Machines
12:30 Mon 4 Jun, 2012 :: 5.57 Ingkarni Wardli :: Mr Tyman Stanford :: University of Adelaide

Support Vector Machines (SVMs) are used in a variety of contexts for a range of purposes including regression, feature selection and classification. To convey the basic principles of SVMs, this presentation will focus on the application of SVMs to classification. Classification (or discrimination), in a statistical sense, is supervised model creation for the purpose of assigning future observations to a group or class. An example might be assigning healthy or diseased labels to patients from p characteristics obtained from a blood sample. While SVMs are widely used, they are most successful when the data have one or more of the following properties:
The data are not consistent with a standard probability distribution.
The number of observations, n, used to create the model is less than the number of predictive features, p (the so-called small-n, big-p problem).
The decision boundary between the classes is likely to be non-linear in the feature space.
I will present a short overview of how SVMs are constructed, keeping in mind their purpose. As this presentation is part of a double post-grad seminar, I will keep it to a maximum of 15 minutes.
Comparison of spectral and wavelet estimators of transfer function for linear systems
12:10 Mon 18 Jun, 2012 :: B.21 Ingkarni Wardli :: Mr Mohd Aftar Abu Bakar :: University of Adelaide

We compare spectral and wavelet estimators of the response amplitude operator (RAO) of a linear system, with various input signals and added noise scenarios. The comparison is based on a model of a heaving buoy wave energy device (HBWED), which oscillates vertically as a single mode of vibration linear system. HBWEDs and other single degree of freedom wave energy devices such as the oscillating wave surge convertors (OWSC) are currently deployed in the ocean, making single degree of freedom wave energy devices important systems to both model and analyse in some detail. However, the results of the comparison relate to any linear system. It was found that the wavelet estimator of the RAO offers no advantage over the spectral estimators if both input and response time series data are noise free and long time series are available. If there is noise on only the response time series, only the wavelet estimator or the spectral estimator that uses the cross-spectrum of the input and response signals in the numerator should be used. For the case of noise on only the input time series, only the spectral estimator that uses the cross-spectrum in the denominator gives a sensible estimate of the RAO. If both the input and response signals are corrupted with noise, a modification to both the input and response spectrum estimates can provide a good estimator of the RAO. However, a combination of wavelet and spectral methods is introduced as an alternative RAO estimator. The conclusions apply for autoregressive emulators of sea surface elevation, impulse, and pseudorandom binary sequences (PRBS) inputs. However, a wavelet estimator is needed in the special case of a chirp input where the signal has a continuously varying frequency.
AFL Tipping isn't all about numbers and stats...or is it.....
12:10 Mon 6 Aug, 2012 :: B.21 Ingkarni Wardli :: Ms Jessica Tan :: University of Adelaide

The result of an AFL game is always unpredictable - we all know that. Hence why we discuss the weekend's upsets and the local tipping competition as part of the "water-cooler and weekend" conversation on a Monday morning. Different people use various weird and wonderful techniques or criteria to predict the winning team. With readily available data, I will investigate and compare various strategies and define a measure of the hardness of a round (full acknowledgements will be made in my presentation). Hopefully this will help me for next year's tipping competition...
Star Wars Vs The Lord of the Rings: A Survival Analysis
12:10 Mon 27 Aug, 2012 :: B.21 Ingkarni Wardli :: Mr Christopher Davies :: University of Adelaide

Ever wondered whether you are more likely to die in the Galactic Empire or Middle Earth? Well this is the postgraduate seminar for you! I'll be attempting to answer this question using survival analysis, the statistical method of choice for investigating time to event data. Spoiler Warning: This talk will contain references to the deaths of characters in the above movie sagas.
Principal Component Analysis (PCA)
12:30 Mon 3 Sep, 2012 :: B.21 Ingkarni Wardli :: Mr Lyron Winderbaum :: University of Adelaide

Principal Component Analysis (PCA) has become something of a buzzword recently in a number of disciplines, including gene expression analysis and facial recognition. It is a classical, and fundamentally simple, concept that has been around since the early 1900s. Its recent popularity is largely due to the need for dimension reduction techniques in analysing the high-dimensional data that have become more common in the last decade, and to the availability of the computing power needed to implement them. I will explain the concept, prove a result, and give a couple of examples. The talk should be accessible to all disciplines, as it (should?) only assume first-year linear algebra, the concept of a random variable, and covariance.
Optimal Experimental Design: What Is It?
12:10 Mon 15 Oct, 2012 :: B.21 Ingkarni Wardli :: Mr David Price :: University of Adelaide

Optimal designs are a class of experimental designs that are optimal with respect to some statistical criterion. That answers the question, right? But what do I mean by 'optimal', and which 'statistical criterion' should you use? In this talk I will answer all these questions, and provide an overly simple example to demonstrate how optimal design works. I will then give a brief explanation of how I will use this methodology, and what chickens have to do with it.
Epidemic models in socially structured populations: when are simple models too simple?
14:00 Thu 25 Oct, 2012 :: 5.56 Ingkarni Wardli :: Dr Lorenzo Pellis :: The University of Warwick

Both age and household structure are recognised as important heterogeneities affecting epidemic spread of infectious pathogens, and many models exist nowadays that include either or both forms of heterogeneity. However, different models may fit aggregate epidemic data equally well and nevertheless lead to different predictions of public health interest. I will here present an overview of stochastic epidemic models with increasing complexity in their social structure, focusing in particular on household models. For these models, I will present recent results about the definition and computation of the basic reproduction number R0 and its relationship with other threshold parameters. Finally, I will use these results to compare models with no, either or both age and household structure, with the aim of quantifying the conditions under which each form of heterogeneity is relevant and therefore providing some criteria that can be used to guide model design for real-time predictions.
Spatiotemporally Autoregressive Partially Linear Models with Application to the Housing Price Indexes of the United States
12:10 Mon 12 Nov, 2012 :: B.21 Ingkarni Wardli :: Ms Dawlah Alsulami :: University of Adelaide

We propose a Spatiotemporal Autoregressive Partially Linear Regression (STARPLR) model for data observed irregularly over space and regularly in time. The model is capable of capturing possible nonlinearity and nonstationarity in space by allowing the coefficients to depend on location. We suggest a two-step procedure to estimate both the coefficients and the unknown function, which is readily implemented and can be computed even for large spatio-temporal data sets. As an illustration, we apply our model to analyse the 51 States' House Price Indexes (HPIs) in the USA.
12:10 Mon 13 May, 2013 :: B.19 Ingkarni Wardli :: Lyron Winderbaum :: University of Adelaide

Colour is a powerful tool in presenting data, but it can be tricky to choose just the right colours to represent your data honestly. Do the colours used in your heatmap overemphasise the differences between particular values over others? Does your choice of colours overemphasise one value over others when they should be represented as equal? All these questions are fundamentally based in how we perceive colour. There has been a lot of research into how we perceive colour in the past century, with some interesting results. I will explain how a 'standard observer' was found empirically and used to develop an absolute reference standard for colour in 1931. I will also explain how, although the common Red-Green-Blue representation of colour is useful and intuitive, distances between colours in this space do not reflect our perception of the difference between colours, and how alternative, perceptually focused colourspaces were introduced in 1976. I will go on to explain how these results can be used to provide simple mechanisms for choosing colours that satisfy particular properties, such as being equally different from each other, being linearly more different in sequence, or maintaining such properties when transferred to greyscale or viewed by a colourblind person.
Progress in the prediction of buoyancy-affected turbulence
15:10 Fri 17 May, 2013 :: B.18 Ingkarni Wardli :: Dr Daniel Chung :: University of Melbourne

Buoyancy-affected turbulence represents a significant challenge to our understanding, yet it dominates many important flows that occur in the ocean and atmosphere. The presentation will highlight some recent progress in the characterisation, modelling and prediction of buoyancy-affected turbulence using direct and large-eddy simulations, along with implications for the characterisation of mixing in the ocean and the low-cloud feedback in the atmosphere. Specifically, direct numerical simulation data of stratified turbulence will be employed to highlight the importance of boundaries in the characterisation of turbulent mixing in the ocean. Then, a subgrid-scale model that captures the anisotropic character of stratified mixing will be developed for large-eddy simulation of buoyancy-affected turbulence. Finally, the subgrid-scale model is utilised to perform a systematic large-eddy simulation investigation of the archetypal low-cloud regimes, from which the link between the lower-tropospheric stability criterion and the cloud fraction is interpreted.
14:10 Mon 20 May, 2013 :: 7.15 Ingkarni Wardli :: A/Prof. Robb Muirhead :: School of Mathematical Sciences

This is a lighthearted (some would say content-free) talk about coincidences, those surprising concurrences of events that are often perceived as meaningfully related, with no apparent causal connection. Time permitting, it will touch on topics like:
Patterns in data and the dangers of looking for patterns, unspecified ahead of time, and trying to "explain" them; e.g. post hoc subgroup analyses, cancer clusters, conspiracy theories ...
Matching problems; e.g. the birthday problem and extensions
People who win a lottery more than once -- how surprised should we really be? What's the question we should be asking?
When you become familiar with a new word, and see it again soon afterwards, how surprised should you be?
Caution: This is a shortened version of a talk that was originally prepared for a group of non-mathematicians and non-statisticians, so it's mostly non-technical. It probably does not contain anything you don't already know -- it will be an amazing coincidence if it does!
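The birthday problem mentioned in the list above is easy to check numerically. The following is a minimal sketch assuming 365 equally likely birthdays and independent people; the function name is chosen here for illustration.

```python
def birthday_collision_probability(people, days=365):
    """Probability that at least two of `people` share a birthday,
    assuming `days` equally likely and independent birthdays."""
    p_all_distinct = 1.0
    for k in range(people):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    for n in (10, 23, 50):
        print(n, round(birthday_collision_probability(n), 4))
```

Under these assumptions, 23 people are enough for a shared birthday to be more likely than not, which is the usual illustration of why such coincidences are less surprising than they feel.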
Fire-Atmosphere Models
12:10 Mon 29 Jul, 2013 :: B.19 Ingkarni Wardli :: Mika Peace :: University of Adelaide

Fire behaviour models are increasingly being used to assist in planning and operational decisions for bush fires and fuel reduction burns. Rate of spread (ROS) of the fire front is a key output of such models. The ROS value is typically calculated from a formula which has been derived from empirical data, using very simple meteorological inputs. We have used a coupled fire-atmosphere model to simulate real bushfire events. The results show that complex interactions between a fire and the atmosphere can have a significant influence on fire spread, thus highlighting the limitations of a model that uses simple meteorological inputs.
Privacy-Preserving Computation: Not just for secretive millionaires*
12:10 Mon 19 Aug, 2013 :: B.19 Ingkarni Wardli :: Wilko Henecka :: University of Adelaide

PPC enables parties to share information while preserving their data privacy. I will introduce the concept, show a common ingredient and illustrate its use in an example. *See Yao's Millionaires Problem.
Medical Decision Analysis
12:10 Mon 2 Sep, 2013 :: B.19 Ingkarni Wardli :: Eka Baker :: University of Adelaide

Doctors make life changing decisions every day based on clinical trial data. However, this data is often obtained from studies on healthy individuals or on patients with only the disease that a treatment is targeting. Outside of these studies, many patients will have other conditions that may affect the predicted benefit of receiving a certain treatment. I will talk about what clinical trials are, how to measure the benefit of treatments, and how having multiple conditions (comorbidities) will affect the benefit of treatments.
Random Wanderings on a Sphere...
11:10 Tue 17 Sep, 2013 :: Ingkarni Wardli Level 5 Room 5.57 :: A/Prof Robb Muirhead :: University of Adelaide

This will be a short talk (about 30 minutes) about the following problem. (Even if I tell you all I know about it, it won't take very long!) Imagine the earth is a unit sphere in 3 dimensions. You're standing at a fixed point, which we may as well take to be the North Pole. Suddenly you get moved to another point on the sphere by a random (uniform) orthogonal transformation. Where are you now? You're now at a point which is uniformly distributed on the surface of the sphere (so, since most of the earth's surface is water, you're probably drowning). But then you get moved again by the same orthogonal transformation. Where are you now? And what happens to your location if this happens repeatedly? I have only a partial answer to this question, for 2 and 3 transformations. (There's nothing special about 3 dimensions here; results hold for all dimensions which are at least 3.) I don't know of any statistical application for this! This work was motivated by a talk I heard, given by Tom Marzetta (Bell Labs) at a conference at MIT. Although I know virtually nothing about signal processing, I gather Marzetta was trying to encode signals using powers of random orthogonal matrices. After carrying out simulations, I think he decided it wasn't a good idea.
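The set-up can be explored by simulation. The sketch below draws one random orthogonal matrix by Gram-Schmidt orthonormalisation of independent Gaussian vectors (a standard way of sampling from the uniform, or Haar, distribution on the orthogonal group) and applies the same matrix repeatedly to the North Pole. The sampling method and all function names are assumptions for illustration, not taken from the talk.

```python
import math
import random


def random_orthogonal(d, rng):
    """Draw a d x d orthogonal matrix, returned as a list of orthonormal rows,
    by Gram-Schmidt on independent standard Gaussian vectors."""
    rows = []
    while len(rows) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for u in rows:
            # Remove the component of v along each earlier row.
            proj = sum(a * b for a, b in zip(u, v))
            v = [a - proj * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-12:  # redraw in the (measure-zero) degenerate case
            rows.append([a / norm for a in v])
    return rows


def apply(matrix, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in matrix]


def orbit(matrix, start, steps):
    """Positions after 1, 2, ..., steps applications of the same matrix."""
    points, x = [], list(start)
    for _ in range(steps):
        x = apply(matrix, x)
        points.append(x)
    return points


if __name__ == "__main__":
    rng = random.Random(2013)
    h = random_orthogonal(3, rng)
    for point in orbit(h, [0.0, 0.0, 1.0], 3):
        print([round(c, 3) for c in point])
```

Repeating this over many independent draws of the matrix and looking at the empirical distribution of the second and third points gives a feel for how the later positions differ from the uniform first one.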
A mathematician walks into a bar.....
12:10 Mon 30 Sep, 2013 :: B.19 Ingkarni Wardli :: Ben Rohrlach :: University of Adelaide

Man is, by his very nature, inquisitive. Our need to know has been the reason we've always evolved as a species. From discovering fire, to exploring the galaxy with those Vulcan guys in that documentary I saw, knowing the answer to a question has always driven humankind. Clearly then, I had to ask something. Something that by its very nature is a thing. A thing that, specifically, I had to know. That thing that I had to know was this: Do mathematicians get stupider the more they drink? Is this effect more pronounced than for normal (Gaussian) people? At the quiz night that AUMS just ran I managed to talk two tables into letting me record some key drinking statistics. I'll be using those statistics to introduce some different statistical tests commonly seen in most analyses you'll see in other fields. Oh, and I'll answer those questions I mentioned earlier too, hopefully. Let's do this thing.
Classification Using Censored Functional Data
15:10 Fri 18 Oct, 2013 :: B.18 Ingkarni Wardli :: A/Prof Aurore Delaigle :: University of Melbourne

We consider classification of functional data. This problem has received a lot of attention in the literature in the case where the curves are all observed on the same interval. A difficulty in applications is that the functional curves can be supported on quite different intervals, in which case standard methods of analysis cannot be used. We are interested in constructing classifiers for curves of this type. More precisely, we consider classification of functions supported on a compact interval, in cases where the training sample consists of functions observed on other intervals, which may differ among the training curves. We propose several methods, depending on whether or not the observable intervals overlap by a significant amount. In the case where these intervals differ a lot, our procedure involves extending the curves outside the interval where they were observed. We suggest a new nonparametric approach for doing this. We also introduce flexible ways of combining potential differences in shapes of the curves from different populations, and potential differences between the endpoints of the intervals where the curves from each population are observed.
All at sea with spectral analysis
11:10 Tue 19 Nov, 2013 :: Ingkarni Wardli Level 5 Room 5.56 :: A/Prof Andrew Metcalfe :: The University of Adelaide

The steady state response of a single degree of freedom damped linear system to a sinusoidal input is a sinusoidal function at the same frequency, but generally with a different amplitude and a phase shift. The analogous result for a random stationary input can be described in terms of input and response spectra and a transfer function description of the linear system. The practical use of this result is that the parameters of a linear system can be estimated from the input and response spectra, and the response spectrum can be predicted if the transfer function and input spectrum are known. I shall demonstrate these results with data from a small ship in the North Sea. The results from the sea trial raise the issue of non-linearity, and second order amplitude response functions are obtained using auto-regressive estimators. The possibility of using wavelets rather than spectra is considered in the context of single degree of freedom linear systems. Everybody welcome to attend. Please note the change of venue: we will be in room 5.56.
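The first two results can be sketched for the textbook mass-spring-damper system m x'' + c x' + k x = F(t), whose transfer function is H(w) = 1 / (k - m w^2 + i c w). The parametrisation and function names below are assumptions chosen for illustration; the ship data and the estimators used in the talk are not reproduced here.

```python
import cmath


def transfer_function(omega, m, c, k):
    """Complex frequency response H(omega) of m x'' + c x' + k x = F(t)."""
    return 1.0 / complex(k - m * omega ** 2, c * omega)


def steady_state(omega, m, c, k, force_amplitude=1.0):
    """Amplitude and phase shift of the steady-state response to
    F(t) = force_amplitude * cos(omega * t)."""
    h = transfer_function(omega, m, c, k)
    return force_amplitude * abs(h), cmath.phase(h)


def response_spectrum(input_spectrum, omegas, m, c, k):
    """Response spectrum for a stationary random input:
    S_out(omega) = |H(omega)|^2 * S_in(omega)."""
    return [abs(transfer_function(w, m, c, k)) ** 2 * s
            for w, s in zip(omegas, input_spectrum)]


if __name__ == "__main__":
    amplitude, phase = steady_state(omega=2.0, m=1.0, c=0.5, k=4.0)
    print(amplitude, phase)  # forcing at the undamped natural frequency
```

Read in the other direction, the same relation suggests estimating |H|^2 from the ratio of measured response and input spectra, which is the kind of system identification the abstract describes.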
The structuring role of chaotic stirring on pelagic ecosystems
11:10 Fri 28 Feb, 2014 :: B19 Ingkarni Wardli :: Dr Francesco d'Ovidio :: Universite Pierre et Marie Curie (Paris VI)

The open ocean upper layer is characterized by complex transport dynamics occurring over different spatiotemporal scales. At the scale of 10-100 km, which covers the so-called mesoscale and part of the submesoscale, in situ and remote sensing observations detect strong variability in physical and biogeochemical fields like sea surface temperature, salinity, and chlorophyll concentration. The calculation of Lyapunov exponents and other nonlinear diagnostics applied to the surface currents has shown that an important part of this tracer variability is due to chaotic stirring. Here I will extend this analysis to marine ecosystems. For primary producers, I will show that stable and unstable manifolds of hyperbolic points embedded in the surface velocity field are able to structure the phytoplanktonic community in fluid dynamical niches of dominant types, where competition can locally occur during bloom events. By using data from tagged whales, frigatebirds, and elephant seals, I will also show that chaotic stirring affects the behaviour of higher trophic levels. In perspective, these relations between transport structures and marine ecosystems can be the basis for a biodiversity index constructed from satellite information, and therefore able to monitor key aspects of marine biodiversity and its temporal variability at the global scale.
Viscoelastic fluids: mathematical challenges in determining their relaxation spectra
15:10 Mon 17 Mar, 2014 :: 5.58 Ingkarni Wardli :: Professor Russell Davies :: Cardiff University

Determining the relaxation spectrum of a viscoelastic fluid is a crucial step before a linear or nonlinear constitutive model can be applied. Information about the relaxation spectrum is obtained from simple flow experiments such as creep or oscillatory shear. However, the determination process involves the solution of one or more highly ill-posed inverse problems. The availability of only discrete data, the presence of noise in the data, as well as incomplete data, collectively make the problem very hard to solve. In this talk I will illustrate the mathematical challenges inherent in determining relaxation spectra, and also introduce the method of wavelet regularization which enables the representation of a continuous relaxation spectrum by a set of hyperbolic scaling functions.
Bayesian Indirect Inference
12:10 Mon 14 Apr, 2014 :: B.19 Ingkarni Wardli :: Brock Hermans :: University of Adelaide

Bayesian likelihood-free methods saw the resurgence of Bayesian statistics through the use of computer sampling techniques. Since the resurgence, attention has focused on so-called 'summary statistics', that is, ways of summarising data that allow for accurate inference to be performed. However, it is not uncommon to find data sets in which the summary statistic approach is not sufficient. In this talk, I will be summarising some of the likelihood-free methods most commonly used (don't worry if you've never seen any Bayesian analysis before), as well as looking at Bayesian Indirect Likelihood, a new way of implementing Bayesian analysis which combines new inference methods with some of the older computational algorithms.
Network-based approaches to classification and biomarker identification in metastatic melanoma
15:10 Fri 2 May, 2014 :: B.21 Ingkarni Wardli :: Associate Professor Jean Yee Hwa Yang :: The University of Sydney

Finding prognostic markers has been a central question in much of current research in medicine and biology. In the last decade, approaches to prognostic prediction within a genomics setting have been primarily based on changes in individual genes/proteins. Very recently, however, network-based approaches to prognostic prediction have begun to emerge which utilize interaction information between genes. This is based on the belief that large-scale molecular interaction networks are dynamic in nature and changes in these networks, rather than changes in individual genes/proteins, are often drivers of complex diseases such as cancer. In this talk, I use data from stage III melanoma patients provided by Prof. Mann from Melanoma Institute of Australia to discuss how network information can be utilized in gene expression analysis to aid in biological interpretation. Here, we explore a number of novel and previously published network-based prediction methods, which we will then compare to the common single-gene and gene-set methods with the aim of identifying more biologically interpretable biomarkers in the form of networks.
Stochastic models of evolution: Trees and beyond
15:10 Fri 16 May, 2014 :: B.18 Ingkarni Wardli :: Dr Barbara Holland :: The University of Tasmania

In the first part of the talk I will give a general introduction to phylogenetics, and discuss some of the mathematical and statistical issues that arise in trying to infer evolutionary trees. In particular, I will discuss how we model the evolution of DNA along a phylogenetic tree using a continuous time Markov process. In the second part of the talk I will discuss how to express the two-state continuous-time Markov model on phylogenetic trees in such a way that allows its extension to more general models. In this framework we can model convergence of species as well as divergence (speciation). I will discuss the identifiability (or otherwise) of the models that arise in some simple cases. Use of a statistical framework means that we can use established techniques such as the AIC or likelihood ratio tests to decide if datasets show evidence of convergent evolution.
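As a small illustration of the two-state continuous-time Markov model, the sketch below evaluates the transition probabilities along a branch of length t in closed form and includes the usual AIC formula for comparing fitted models. The rate names and function names are assumptions made here; the extended models with convergence discussed in the talk are not implemented.

```python
import math


def transition_matrix(t, rate_01, rate_10):
    """Transition probabilities P(t) for a two-state continuous-time Markov
    chain with rate matrix [[-rate_01, rate_01], [rate_10, -rate_10]]."""
    total = rate_01 + rate_10
    decay = math.exp(-total * t)
    pi0 = rate_10 / total  # stationary probability of state 0
    pi1 = rate_01 / total  # stationary probability of state 1
    return [
        [pi0 + pi1 * decay, pi1 * (1.0 - decay)],
        [pi0 * (1.0 - decay), pi1 + pi0 * decay],
    ]


def aic(log_likelihood, n_parameters):
    """Akaike's Information Criterion: smaller values indicate a better
    trade-off between fit and model complexity."""
    return 2.0 * n_parameters - 2.0 * log_likelihood


if __name__ == "__main__":
    for row in transition_matrix(0.5, rate_01=1.0, rate_10=2.0):
        print([round(p, 4) for p in row])
```

For short branches the matrix is close to the identity, and for long branches every row approaches the stationary distribution, so the state at the far end of the branch carries little information about the start.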
Group meeting
15:10 Fri 6 Jun, 2014 :: 5.58 Ingkarni Wardli :: Meng Cao and Trent Mattner :: University of Adelaide

Meng Cao :: Multiscale modelling couples patches of nonlinear wave-like simulations :: Abstract: The multiscale gap-tooth scheme is built from given microscale simulations of complicated physical processes to empower macroscale simulations. By coupling small patches of simulations over unsimulated physical gaps, large savings in computational time are possible. So far the gap-tooth scheme has been developed for dissipative systems, but wave systems are also of great interest. This article extends the gap-tooth scheme to the case of nonlinear microscale simulations of wave-like systems. Classic macroscale interpolation provides a generic coupling between patches that achieves arbitrarily high order consistency between the multiscale scheme and the underlying microscale dynamics. Eigen-analysis indicates that the resultant gap-tooth scheme empowers feasible computation of large scale simulations of wave-like dynamics with complicated underlying physics. As a pilot study, we implement numerical simulations of dam-breaking waves by the gap-tooth scheme. Comparison between a gap-tooth simulation, a microscale simulation over the whole domain, and some published experimental data on dam breaking demonstrates that the gap-tooth scheme feasibly computes large scale wave-like dynamics with computational savings.
Trent Mattner :: Coupled atmosphere-fire simulations of the Canberra 2003 bushfires using WRF-SFIRE :: Abstract: The Canberra fires of January 18, 2003 are notorious for the extreme fire behaviour and fire-atmosphere-topography interactions that occurred, including lee-slope fire channelling, pyrocumulonimbus development and tornado formation. In this talk, I will discuss coupled fire-weather simulations of the Canberra fires using WRF-SFIRE. In these simulations, a fire-behaviour model is used to dynamically predict the evolution of the fire front according to local atmospheric and topographic conditions, as well as the associated heat and moisture fluxes to the atmosphere. It is found that the predicted fire front and heat flux are not too bad, bearing in mind the complexity of the problem and the severe modelling assumptions made. However, the predicted moisture flux is too low, which has some impact on atmospheric dynamics.
All's Fair in Love and Statistics
12:35 Mon 28 Jul, 2014 :: B.19 Ingkarni Wardli :: Annie Conway :: University of Adelaide

Earlier this year an article was published about a "math genius" who found true love after scraping and analysing data from a dating site. In this talk I will be investigating the actual mathematics that he used, in particular methods for clustering categorical data, and whether or not the approach was successful.
Fast computation of eigenvalues and eigenfunctions on bounded plane domains
15:10 Fri 1 Aug, 2014 :: B.18 Ingkarni Wardli :: Professor Andrew Hassell :: Australian National University

I will describe a new method for numerically computing eigenfunctions and eigenvalues on certain plane domains, derived from the so-called "scaling method" of Vergini and Saraceno. It is based on properties of the Dirichlet-to-Neumann map on the domain, which relates a function f on the boundary of the domain to the normal derivative (at the boundary) of the eigenfunction with boundary data f. This is a topic of independent interest in pure mathematics. In my talk I will try to emphasize the interplay between theory and applications, which is very rich in this situation. This is joint work with numerical analyst Alex Barnett (Dartmouth).
Frequentist vs. Bayesian.
12:10 Mon 18 Aug, 2014 :: B.19 Ingkarni Wardli :: David Price :: University of Adelaide

Abstract: There are two frameworks in which we can do statistical analyses. Choosing one framework over the other can be* as controversial as choosing between team Jacob and... that other guy. In this talk, I aim to give a very very simple explanation of the main difference between frequentist and Bayesian methods. I'll probably flip a coin and show you a video too. * to people who really care.
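In the spirit of the coin flip, here is a minimal sketch of how the two frameworks might summarise the same data: the frequentist maximum likelihood estimate of the probability of heads, and the Bayesian posterior mean under a Beta prior. The Beta(1, 1) default prior and the function names are assumptions for illustration, not the speaker's example.

```python
def mle_heads_probability(heads, flips):
    """Frequentist point estimate: the observed proportion of heads."""
    return heads / flips


def posterior_mean(heads, flips, prior_a=1.0, prior_b=1.0):
    """Bayesian posterior mean of the heads probability.
    A Beta(prior_a, prior_b) prior updated with the data gives a
    Beta(prior_a + heads, prior_b + tails) posterior."""
    return (prior_a + heads) / (prior_a + prior_b + flips)


if __name__ == "__main__":
    print(mle_heads_probability(7, 10))  # 0.7
    print(posterior_mean(7, 10))         # 8/12, pulled towards the prior mean 0.5
```

With few flips the prior noticeably pulls the Bayesian estimate towards 0.5; as the number of flips grows the two answers converge.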
Testing Statistical Association between Genetic Pathways and Disease Susceptibility
12:10 Mon 1 Sep, 2014 :: B.19 Ingkarni Wardli :: Andy Pfieffer :: University of Adelaide

A major research area is the identification of genetic pathways associated with various diseases. However, a detailed comparison of methods that have been designed to ascertain the association between pathways and diseases has not been performed. I will give the necessary biological background behind Genome-Wide Association Studies (GWAS), and explain the shortfalls in traditional GWAS methodologies. I will then explore various methods that use information about genetic pathways in GWAS, and explain the challenges in comparing these methods.
Inferring absolute population and recruitment of southern rock lobster using only catch and effort data
12:35 Mon 22 Sep, 2014 :: B.19 Ingkarni Wardli :: John Feenstra :: University of Adelaide

Abundance estimates from a data-limited version of catch survey analysis are compared to those from a novel one-parameter deterministic method. Bias of both methods is explored using simulation testing based on a more complex data-rich stock assessment population dynamics fishery operating model, exploring the impact of both varying levels of observation error in data as well as model process error. Recruitment was consistently better estimated than legal size population, the latter being most sensitive to increasing observation errors. A hybrid of the data-limited methods is proposed as the most robust approach. A more statistically conventional error-in-variables approach may also be touched upon if time permits.
Optimally Chosen Quadratic Forms for Partitioning Multivariate Data
13:10 Tue 14 Oct, 2014 :: Ingkarni Wardli 715 Conference Room :: Assoc. Prof. Inge Koch :: School of Mathematical Sciences

Quadratic forms are commonly used in linear algebra. For d-dimensional vectors they have a matrix representation, Q(x) = x'Ax, for some symmetric matrix A. In statistics quadratic forms are defined for d-dimensional random vectors, and one of the best-known quadratic forms is the Mahalanobis distance of two random vectors. In this talk we want to partition a quadratic form Q(X) = X'MX, where X is a random vector, and M a symmetric matrix, that is, we want to find a d-dimensional random vector W such that Q(X) = W'W. This problem has many solutions. We are interested in a solution or partition W of X such that pairs of corresponding variables (X_j, W_j) are highly correlated and such that W is simpler than the given X. We will consider some natural candidates for W which turn out to be suboptimal in the sense of the above constraints, and we will then exhibit the optimal solution. Solutions of this type are useful in the well-known T-square statistic. We will see in examples what these solutions look like.
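To see why the problem has many solutions, consider the following sketch. It assumes M is positive definite, factorises M = L L' by a Cholesky decomposition, and takes W = L'X, so that W'W = X'L L'X = Q(X); rotating W by any orthogonal matrix gives another solution. This is only one of the natural candidates, not the optimal partition presented in the talk, and the function names are chosen here for illustration.

```python
import math


def cholesky(m):
    """Lower-triangular L with M = L L', for a symmetric positive definite M."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def quadratic_form(m, x):
    """Q(x) = x' M x."""
    n = len(x)
    return sum(x[i] * m[i][j] * x[j] for i in range(n) for j in range(n))


def partition(m, x):
    """One solution W = L' x of Q(x) = W'W, with M = L L'."""
    low = cholesky(m)
    n = len(x)
    return [sum(low[i][j] * x[i] for i in range(n)) for j in range(n)]


def rotate_2d(w, theta):
    """Rotate a 2-vector; the result is another valid partition of Q."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * w[0] - s * w[1], s * w[0] + c * w[1]]


if __name__ == "__main__":
    m = [[2.0, 1.0], [1.0, 3.0]]
    x = [1.0, -2.0]
    w = partition(m, x)
    print(quadratic_form(m, x), sum(v * v for v in w))
```

Because every rotation of W is again a solution, extra criteria such as the correlation between X_j and W_j are needed to single out a preferred partition, which is the question the abstract poses.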
Happiness and social information flow: Computational social science through data.
15:10 Fri 7 Nov, 2014 :: EM G06 (Engineering & Maths Bldg) :: Dr Lewis Mitchell :: University of Adelaide

The recent explosion in big data coming from online social networks has led to an increasing interest in bringing quantitative methods to bear on questions in social science. A recent high-profile example is the study of emotional contagion, which has led to significant challenges and controversy. This talk will focus on two issues related to emotional contagion, namely remote-sensing of population-level wellbeing and the problem of information flow across a social network. We discuss some of the challenges in working with massive online data sets, and present a simple tool for measuring large-scale happiness from such data. By combining over 10 million geolocated messages collected from Twitter with traditional census data we uncover geographies of happiness at the scale of states and cities, and discuss how these patterns may be related to traditional wellbeing measures and public health outcomes. Using tools from information theory we also study information flow between individuals and how this may relate to the concept of predictability for human behaviour.
Modelling segregation distortion in multi-parent crosses
15:00 Mon 17 Nov, 2014 :: 5.57 Ingkarni Wardli :: Rohan Shah (joint work with B. Emma Huang and Colin R. Cavanagh) :: The University of Queensland

Construction of high-density genetic maps has been made feasible by low-cost high-throughput genotyping technology; however, the process is still complicated by biological, statistical and computational issues. A major challenge is the presence of segregation distortion, which can be caused by selection, difference in fitness, or suppression of recombination due to introgressed segments from other species. Alien introgressions are common in major crop species, where they have often been used to introduce beneficial genes from wild relatives. Segregation distortion causes problems at many stages of the map construction process, including assignment to linkage groups and estimation of recombination fractions. This can result in incorrect ordering and estimation of map distances. While discarding markers will improve the resulting map, it may result in the loss of genomic regions under selection or containing beneficial genes (in the case of introgression). To correct for segregation distortion we model it explicitly in the estimation of recombination fractions. Previously proposed methods introduce additional parameters to model the distortion, with a corresponding increase in computing requirements. This poses difficulties for large, densely genotyped experimental populations. We propose a method imposing minimal additional computational burden which is suitable for high-density map construction in large multi-parent crosses. We demonstrate its use modelling the known Sr36 introgression in wheat for an eight-parent complex cross.
Boundary behaviour of Hitchin and hypo flows with left-invariant initial data
12:10 Fri 27 Feb, 2015 :: Ingkarni Wardli B20 :: Vicente Cortes :: University of Hamburg

Hitchin and hypo flows constitute a system of first order PDEs for the construction of Ricci-flat Riemannian metrics of special holonomy in dimensions 6, 7 and 8. Assuming that the initial geometric structure is left-invariant, we study whether the resulting Ricci-flat manifolds can be extended in a natural way to complete Ricci-flat manifolds. This talk is based on joint work with Florin Belgun, Marco Freibert and Oliver Goertsches, see arXiv:1405.1866 (math.DG).
Multivariate regression in quantitative finance: sparsity, structure, and robustness
15:10 Fri 1 May, 2015 :: Engineering North N132 :: A/Prof Mark Coates :: McGill University

Many quantitative hedge funds around the world strive to predict future equity and futures returns based on many sources of information, including historical returns and economic data. This leads to a multivariate regression problem. Compared to many regression problems, the signal-to-noise ratio is extremely low, and profits can be realized if even a small fraction of the future returns can be accurately predicted. The returns generally have heavy-tailed distributions, further complicating the regression procedure.

In this talk, I will describe how we can impose structure into the regression problem in order to make detection and estimation of the very weak signals feasible. Some of this structure consists of an assumption of sparsity; some of it involves identification of common factors to reduce the dimension of the problem. I will also describe how we can formulate alternative regression problems that lead to more robust solutions that better match the performance metrics of interest in the finance setting.

Medical Decision Making
12:10 Mon 11 May, 2015 :: Napier LG29 :: Eka Baker :: University of Adelaide

Practicing physicians make treatment decisions based on clinical trial data every day. This data is based on trials primarily conducted on healthy volunteers, or on those with only the disease in question. In reality, patients do have existing conditions that can affect the benefits and risks associated with receiving these treatments. In this talk, I will explain how we modified an already existing Markov model to show the progression of treatment of a single condition over time. I will then explain how we adapted this to a different condition, and then created a combined model, which demonstrated how both diseases and treatments progressed on the same patient over their lifetime.
Can mathematics help save energy in computing?
15:10 Fri 22 May, 2015 :: Engineering North N132 :: Prof Markus Hegland :: ANU


Recent development of computational hardware is characterised by two trends: 1. high levels of duplication of computational capabilities in multicore, parallel and GPU processing; and 2. substantially faster development of the speed of computational technology compared to communication technology.

A consequence of these two trends is that energy costs of modern computing devices from mobile phones to supercomputers are increasingly dominated by communication costs. In order to save energy one would thus need to reduce the amount of data movement within the computer. This can be achieved by recomputing results instead of communicating them. The resulting increase in computational redundancy may also be used to make the computations more robust against hardware faults. Paradoxically, by doing more (computations) we use less (energy).

This talk will first discuss for a simple example how a mathematical understanding can be applied to improve computational results using extrapolation. Then the problem of energy consumption in computational hardware will be considered. Finally some recent work will be discussed which shows how redundant computing is used to mitigate computational faults and thus to save energy.
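A classic instance of improving a computed result by extrapolation is Richardson extrapolation, sketched below for a central-difference derivative. Whether the talk's simple example is of this kind is an assumption; the functions here are purely illustrative.

```python
import math


def central_difference(f, x, h):
    """Second-order accurate estimate of f'(x); error is O(h^2)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def richardson(f, x, h):
    """Combine estimates at step sizes h and h/2 so that the leading
    O(h^2) error terms cancel, leaving an O(h^4) error."""
    coarse = central_difference(f, x, h)
    fine = central_difference(f, x, h / 2.0)
    return (4.0 * fine - coarse) / 3.0


if __name__ == "__main__":
    exact = math.cos(1.0)
    print(abs(central_difference(math.sin, 1.0, 0.1) - exact))
    print(abs(richardson(math.sin, 1.0, 0.1) - exact))
```

The extrapolated value costs two extra function evaluations but is several orders of magnitude more accurate here, a small example of trading additional computation for a better result.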

Group Meeting
15:10 Fri 29 May, 2015 :: EM 213 :: Dr Judy Bunder :: University of Adelaide

Talk: Patch dynamics for efficient exascale simulations. Abstract: Massive parallelisation has led to a dramatic increase in available computational power. However, data transfer speeds have failed to keep pace and are the major limiting factor in the development of exascale computing. New algorithms must be developed which minimise the transfer of data. Patch dynamics is a computational macroscale modelling scheme which provides a coarse macroscale solution of a problem defined on a fine microscale by dividing the domain into many nonoverlapping, coupled patches. Patch dynamics is readily adaptable to massive parallelisation as each processor core can evaluate the dynamics on one, or a few, patches. However, patch coupling conditions interpolate across the unevaluated parts of the domain between patches and require almost continuous data transfer. We propose a modified patch dynamics scheme which minimises data transfer by only reevaluating the patch coupling conditions at `mesoscale' time scales which are significantly larger than the microscale time of the microscale problem. We analyse and quantify the error arising from patch dynamics with mesoscale temporal coupling.
Complex Systems, Chaotic Dynamics and Infectious Diseases
15:10 Fri 5 Jun, 2015 :: Engineering North N132 :: Prof Michael Small :: UWA

In complex systems, the interconnections between the components of the system determine the dynamics. The system is described by a very large and random mathematical graph and it is the topological structure of that graph which is important for understanding the dynamical behaviour of the system. I will talk about two specific examples: (1) spread of infectious disease (where the connections between the agents in a population, rather than epidemic parameters, determine the endemic state); and (2) a transformation to represent a dynamical system as a graph (such that the "statistical mechanics" of the graph characterise the dynamics).
A relaxed introduction to resampling-based multiple testing
12:10 Mon 10 Aug, 2015 :: Benham Labs G10 :: Ngoc Vo :: University of Adelaide

P-values and false positives are two phrases that you commonly see thrown around in scientific literature. More often than not, experimenters and analysts are required to quote p-values as a measure of statistical significance: how strongly does your evidence support your hypothesis? But what happens when this "strong evidence" is just a coincidence? What happens if you have lots of these hypotheses (up to tens of thousands) to test all at the same time and most of your significant findings end up being just "coincidences"?
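One resampling-based answer is the single-step "maxT" permutation procedure, sketched below for a two-group comparison with one test per feature. Group labels are shuffled, the largest statistic over all features is recorded for each shuffle, and each observed statistic is compared against that distribution of maxima. The choice of this particular procedure, the absolute mean difference as test statistic and all names are assumptions for illustration; the abstract does not specify which methods the talk covers.

```python
import random


def mean(values):
    return sum(values) / len(values)


def abs_mean_differences(data, labels):
    """One statistic per feature: |mean of group 1 - mean of group 0|.
    `data` is a list of features, each holding one value per sample."""
    stats = []
    for feature in data:
        group0 = [v for v, lab in zip(feature, labels) if lab == 0]
        group1 = [v for v, lab in zip(feature, labels) if lab == 1]
        stats.append(abs(mean(group1) - mean(group0)))
    return stats


def max_t_adjusted_p(data, labels, n_perm=999, seed=0):
    """Single-step maxT adjusted p-values from label permutations."""
    rng = random.Random(seed)
    observed = abs_mean_differences(data, labels)
    exceed = [0] * len(data)
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Largest statistic across all features under this relabelling.
        max_stat = max(abs_mean_differences(data, shuffled))
        for j, t in enumerate(observed):
            if max_stat >= t:
                exceed[j] += 1
    # Count the observed labelling itself so that no p-value is exactly 0.
    return [(1 + e) / (n_perm + 1) for e in exceed]
```

Comparing against the maximum over all features controls the chance of any false positive across the whole family of tests, at the price of less power than testing each feature in isolation.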
Be careful not to impute something ridiculous!
12:20 Mon 24 Aug, 2015 :: Benham Labs G10 :: Sarah James :: University of Adelaide

When learning how to make inferences about data, we are given all of the information with no missing values. In reality data sets are often missing data, anywhere from 5% of the data to extreme cases such as 70% of the data. Instead of getting rid of the incomplete cases we can impute predictions for each missing value and make inferences on the resulting data set. But just how sensible are our predictions? In this talk, we will learn how to deal with missing data and talk about why we have to be careful with our predictions.
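A small sketch of why care is needed: filling every gap with the mean of the observed values (the simplest possible imputation, chosen here only for illustration) leaves the mean unchanged but artificially shrinks the spread of the data. Missing values are represented as `None`, which is an assumption of this example.

```python
import statistics


def mean_impute(values):
    """Replace each missing entry (None) by the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]


if __name__ == "__main__":
    raw = [1.0, None, 3.0, None, 5.0, 7.0]
    observed = [v for v in raw if v is not None]
    completed = mean_impute(raw)
    print(completed)
    # Sample variance before and after imputation.
    print(statistics.variance(observed), statistics.variance(completed))
```

Any inference that relies on the variance, such as a confidence interval or a test, will then look more certain than the observed data justify.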
Modelling Directionality in Stationary Geophysical Time Series
12:10 Mon 12 Oct, 2015 :: Benham Labs G10 :: Mohd Mahayaudin Mansor :: University of Adelaide

Many time series show directionality inasmuch as plots against time and against time-to-go are qualitatively different, and there is a range of statistical tests to quantify this effect. There are two strategies for allowing for directionality in time series models. Linear models are reversible if and only if the noise terms are Gaussian, so one strategy is to use linear models with non-Gaussian noise. The alternative is to use non-linear models. We investigate how non-Gaussian noise affects directionality in a first order autoregressive process AR(1) and compare this with a threshold autoregressive model with two thresholds. The findings are used to suggest possible improvements to an AR(9) model, identified by an AIC criterion, for the average yearly sunspot numbers from 1700 to 1900. The improvement is defined in terms of one-step-ahead forecast errors from 1901 to 2014.
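The effect of the noise distribution on directionality in an AR(1) process can be sketched as follows. The skewness of the first differences is used here as one simple directionality measure, chosen for illustration and not taken from the talk: for a time-reversible series the differences are symmetrically distributed, so their skewness is close to zero. The function names, the coefficient 0.7 and the centred exponential noise are all assumptions.

```python
import random


def simulate_ar1(phi, noise, n, burn_in=500):
    """Simulate x[t] = phi * x[t-1] + noise(), discarding a burn-in period."""
    x, series = 0.0, []
    for t in range(n + burn_in):
        x = phi * x + noise()
        if t >= burn_in:
            series.append(x)
    return series


def difference_skewness(series):
    """Sample skewness of the first differences x[t+1] - x[t]."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    mean = sum(diffs) / len(diffs)
    m2 = sum((d - mean) ** 2 for d in diffs) / len(diffs)
    m3 = sum((d - mean) ** 3 for d in diffs) / len(diffs)
    return m3 / m2 ** 1.5


if __name__ == "__main__":
    rng = random.Random(42)
    gaussian = simulate_ar1(0.7, lambda: rng.gauss(0.0, 1.0), 20_000)
    # Exponential noise shifted to have mean zero: skewed, hence non-Gaussian.
    skewed = simulate_ar1(0.7, lambda: rng.expovariate(1.0) - 1.0, 20_000)
    print("Gaussian noise:", round(difference_skewness(gaussian), 3))
    print("exponential noise:", round(difference_skewness(skewed), 3))
```

With Gaussian noise the statistic stays near zero, while skewed noise produces a clearly positive value: the series rises sharply and decays slowly, so it looks different when run backwards.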
Ocean dynamics of Gulf St Vincent: a numerical study
12:10 Mon 2 Nov, 2015 :: Benham Labs G10 :: Henry Ellis :: University of Adelaide

The aim of this research is to determine the physical dynamics of ocean circulation within Gulf St. Vincent, South Australia, and the exchange of momentum, nutrients, heat, salt and other water properties between the gulf and shelf via Investigator Strait and Backstairs Passage. The project aims to achieve this through the creation of high-resolution numerical models, combined with new and historical observations from a moored instrument package, satellite data, and shipboard surveys. The quasi-realistic high-resolution models are forced using boundary conditions generated by existing larger scale ROMS models, which in turn are forced at the boundary by a global model, creating a global to regional to local model network. Climatological forcing is done using European Centre for Medium-Range Weather Forecasts (ECMWF) data sets and is consistent over the regional and local models. A series of conceptual models are used to investigate the relative importance of separate physical processes in addition to fully forced quasi-realistic models. An outline of the research to be undertaken is given: • Connectivity of Gulf St. Vincent with shelf waters including seasonal variation due to wind and thermoclinic patterns; • The role of winter time cooling and formation of eddies in flushing the gulf; • The formation of a temperature front within the gulf during summer time; and • The connectivity and importance of nutrient rich, cool, water upwelling from the Bonney Coast with the gulf via Backstairs Passage during summer time.
Modelling Coverage in RNA Sequencing
09:00 Mon 9 Nov, 2015 :: Ingkarni Wardli 5.57 :: Arndt von Haeseler :: Max F Perutz Laboratories, University of Vienna

RNA sequencing (RNA-seq) is the method of choice for measuring the expression of RNAs in a cell population. In an RNA-seq experiment, sequencing the full length of larger RNA molecules requires fragmentation into smaller pieces to be compatible with limited read lengths of most deep-sequencing technologies. Unfortunately, the issue of non-uniform coverage across a genomic feature has been a concern in RNA-seq and is attributed to preferences for certain fragments in steps of library preparation and sequencing. However, the disparity between the observed non-uniformity of read coverage in RNA-seq data and the assumption of expected uniformity elicits a query on the read coverage profile one should expect across a transcript, if there are no biases in the sequencing protocol. We propose a simple model of unbiased fragmentation where we find that the expected coverage profile is not uniform and, in fact, depends on the ratio of fragment length to transcript length. To compare the non-uniformity proposed by our model with experimental data, we extended this simple model to incorporate empirical attributes matching that of the sequenced transcript in an RNA-seq experiment. In addition, we imposed an experimentally derived distribution on the frequency at which fragment lengths occur.

We used this model to compare our theoretical prediction with experimental data and with the uniform coverage model. If time permits, we will also discuss a potential application of our model.
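A toy calculation shows why even unbiased sampling need not give uniform coverage. The sketch assumes fragments of one fixed length whose start positions are equally likely along the transcript. This is a simplification chosen for illustration and not the fragmentation model of the talk, and the function name coverage_profile is hypothetical.

```python
def coverage_profile(transcript_length, fragment_length):
    """Count, for each position, the fragment placements that cover it.

    Assumes fixed-length fragments whose start positions are equally
    likely anywhere the fragment fits inside the transcript.
    """
    if not 0 < fragment_length <= transcript_length:
        raise ValueError("fragment must fit inside the transcript")
    last_start = transcript_length - fragment_length
    profile = []
    for pos in range(transcript_length):
        # A fragment starting at s covers pos when s <= pos <= s + f - 1.
        first = max(0, pos - fragment_length + 1)
        last = min(pos, last_start)
        profile.append(last - first + 1)
    return profile


if __name__ == "__main__":
    print(coverage_profile(10, 3))  # short fragments: flat middle, thin ends
    print(coverage_profile(6, 4))   # long fragments: ramps dominate
```

Positions near the two ends are covered by fewer placements than positions in the middle, and the share of the transcript taken up by these ramps grows with the ratio of fragment length to transcript length.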
Use of epidemic models in optimal decision making
15:00 Thu 19 Nov, 2015 :: Ingkarni Wardli 5.57 :: Tim Kinyanjui :: School of Mathematics, The University of Manchester

Epidemic models have proved useful in a number of applications in epidemiology. In this work, I will present two areas in which we have used modelling to make informed decisions. Firstly, we have used an age structured mathematical model to describe the transmission of Respiratory Syncytial Virus in a developed country setting and to explore different vaccination strategies. We found that delayed infant vaccination has significant potential in reducing the number of hospitalisations in the most vulnerable group and that most of the reduction is due to indirect protection. Our results also suggest that marked public health benefit could be achieved through an RSV vaccine delivered to age groups not seen as most at risk of severe disease. The second application is in the optimal design of studies aimed at collection of household-stratified infection data. A design decision involves making a trade-off between the number of households to enrol and the sampling frequency. Two commonly used study designs are considered: cross-sectional and cohort. The search for an optimal design uses Bayesian methods to explore the joint parameter-design space combined with Shannon entropy of the posteriors to estimate the amount of information for each design. We found that for the cross-sectional designs, the amount of information increases with the sampling intensity while the cohort design often exhibits a trade-off between the number of households sampled and the intensity of follow-up. Our results broadly support the choices made in existing data collection studies.
A fixed point theorem on noncompact manifolds
12:10 Fri 12 Feb, 2016 :: Ingkarni Wardli B21 :: Peter Hochs :: University of Adelaide / Radboud University

For an elliptic operator on a compact manifold acted on by a compact Lie group, the Atiyah-Segal-Singer fixed point formula expresses its equivariant index in terms of data on fixed point sets of group elements. This can for example be used to prove Weyl’s character formula. We extend the definition of the equivariant index to noncompact manifolds, and prove a generalisation of the Atiyah-Segal-Singer formula, for group elements with compact fixed point sets. In one example, this leads to a relation with characters of discrete series representations of semisimple Lie groups. (This is joint work with Hang Wang.)
How predictable are you? Information and happiness in social media.
12:10 Mon 21 Mar, 2016 :: Ingkarni Wardli Conference Room 715 :: Dr Lewis Mitchell :: School of Mathematical Sciences

The explosion of "Big Data" coming from online social networks and the like has opened up the new field of "computational social science", which applies a quantitative lens to problems traditionally in the domain of psychologists, anthropologists and social scientists. What does it mean to be influential? How do ideas propagate amongst populations? Is happiness contagious? For the first time, mathematicians, statisticians, and computer scientists can provide insight into these and other questions. Using data from social networks such as Facebook and Twitter, I will give an overview of recent research trends in computational social science, describe some of my own work using techniques like sentiment analysis and information theory in this realm, and explain how you can get involved with this highly rewarding research field as well.
Connecting within-host and between-host dynamics to understand how pathogens evolve
15:10 Fri 1 Apr, 2016 :: Engineering South S112 :: A/Prof Mark Tanaka :: University of New South Wales

Modern molecular technologies enable a detailed examination of the extent of genetic variation among isolates of bacteria and viruses. Mathematical models can help make inferences about pathogen evolution from such data. Because the evolution of pathogens ultimately occurs within hosts, it is influenced by dynamics within hosts including interactions between pathogens and hosts. Most models of pathogen evolution focus on either the within-host or the between-host level. Here I describe steps towards bridging the two scales. First, I present a model of influenza virus evolution that incorporates within-host dynamics to obtain the between-host rate of molecular substitution as a function of the mutation rate, the within-host reproduction number and other factors. Second, I discuss a model of viral evolution in which some hosts are immunocompromised, thereby extending opportunities for within-host virus evolution which then affects population-level evolution. Finally, I describe a model of Mycobacterium tuberculosis in which multi-drug resistance evolves within hosts and spreads by transmission between hosts.
Mathematical modelling of the immune response to influenza
15:00 Thu 12 May, 2016 :: Ingkarni Wardli B20 :: Ada Yan :: University of Melbourne

The immune response plays an important role in the resolution of primary influenza infection and prevention of subsequent infection in an individual. However, the relative roles of each component of the immune response in clearing infection, and the effects of interaction between components, are not well quantified.

We have constructed a model of the immune response to influenza based on data from viral interference experiments, where ferrets were exposed to two influenza strains within a short time period. The changes in viral kinetics of the second virus due to the first virus depend on the strains used as well as the interval between exposures, enabling inference of the timing of innate and adaptive immune response components and the role of cross-reactivity in resolving infection. Our model provides a mechanistic explanation for the observed variation in viruses' abilities to protect against subsequent infection at short inter-exposure intervals, either by delaying the second infection or inducing stochastic extinction of the second virus. It also explains the decrease in recovery time for the second infection when the two strains elicit cross-reactive cellular adaptive immune responses. To account for inter-subject as well as inter-virus variation, the model is formulated using a hierarchical framework. We will fit the model to experimental data using Markov Chain Monte Carlo methods; quantification of the model will enable a deeper understanding of the effects of potential new treatments.
Time series analysis of paleo-climate proxies (a mathematical perspective)
15:10 Fri 27 May, 2016 :: Engineering South S112 :: Dr Thomas Stemler :: University of Western Australia

In this talk I will present the work my colleagues from the School of Earth and Environment (UWA), the "trans disciplinary methods" group of the Potsdam Institute for Climate Impact Research, Germany, and I did to explain the dynamics of the Australian-South East Asian monsoon system during the last couple of thousand years. From a time series perspective paleo-climate proxy series are more or less the monsters moving under your bed that wake you up in the middle of the night. The data are clearly non-stationary and non-uniformly sampled in time, and the influence of stochastic forcing and the level of measurement noise are more or less unknown. Given these undesirable properties almost all traditional time series analysis methods fail. I will highlight two methods that allow us to draw useful conclusions from the data sets. The first one uses Gaussian kernel methods to reconstruct climate networks from multiple proxies. The coupling relationships in these networks change over time and therefore can be used to infer which areas of the monsoon system dominate the complex dynamics of the whole system. Secondly I will introduce the transformation cost time series method, which allows us to detect changes in the dynamics of a non-uniformly sampled time series. Unlike the frequently used interpolation approach, our new method does not corrupt the data and therefore avoids biases in any subsequent analysis. While I will again focus on paleo-climate proxies, the method can be used in other applied areas where regular sampling is not possible.
Student Performance Issues in First Year University Calculus
15:10 Fri 10 Jun, 2016 :: Engineering South S112 :: Dr Christine Mangelsdorf :: University of Melbourne

MAST10006 Calculus 2 is the largest subject in the School of Mathematics and Statistics at the University of Melbourne, accounting for about 2200 out of 7400 first year enrolments. Despite excellent and consistent feedback from students on lectures, tutorials and teaching materials, scaled failure rates in Calculus 2 averaged an unacceptably high 29.4% (with raw failure rates reaching 40%) by the end of 2014. To understand the issues behind the poor student performance, we studied the exam papers of students with grades of 40-49% over a three-year period. In this presentation, I will present data on areas of poor performance in the final exam, show samples of student work, and identify possible causes for their errors. Many of the performance issues are found to relate to basic weaknesses in the students’ secondary school mathematical skills that inhibit their ability to successfully complete Calculus 2. Since 2015, we have employed a number of approaches to support students’ learning that significantly improved student performance in assessment. I will discuss the changes made to assessment practices and extra support materials provided online and in person, that are driving the improvement.
Multi-scale modeling in biofluids and particle aggregation
15:10 Fri 17 Jun, 2016 :: Ingkarni Wardli B17 :: Dr Sarthok Sircar :: University of Adelaide

In today's seminar I will give two examples in mathematical biology which describe the multi-scale organization at two levels: the meso/micro level and the continuum/macro level. I will then detail suitable tools in statistical mechanics to link these different scales. The first problem arises in mathematical physiology: the swelling-de-swelling mechanism of mucus, an ionic gel. Mucus is packaged inside cells at high concentration (volume fraction) and when released into the extracellular environment, it expands in volume by two orders of magnitude in a matter of seconds. This rapid expansion is due to the rapid exchange of calcium and sodium that changes the cross-linked structure of the mucus polymers, thereby causing it to swell. Modeling this problem involves a two-phase, polymer/solvent mixture theory (in the continuum level description), together with the chemistry of the polymer, its nearest neighbor interaction and its binding with the dissolved ionic species (in the micro-scale description). The problem is posed as a free-boundary problem, with the boundary conditions derived from a combination of variational principle and perturbation analysis. The dynamics of neutral gels and the equilibrium-states of the ionic gels are analyzed. In the second example, we numerically study the adhesion fragmentation dynamics of rigid, round particle clusters subject to a homogeneous shear flow. At the macro level we describe the dynamics of the number density of these clusters. The description at the micro-scale includes (a) binding/unbinding of the bonds attached on the particle surface, (b) bond torsion, (c) surface potential due to ionic medium, and (d) flow hydrodynamics due to shear flow.
Probabilistic Meshless Methods for Bayesian Inverse Problems
15:10 Fri 5 Aug, 2016 :: Engineering South S112 :: Dr Chris Oates :: University of Technology Sydney

This talk deals with statistical inverse problems that involve partial differential equations (PDEs) with unknown parameters. Our goal is to account, in a rigorous way, for the impact of discretisation error that is introduced at each evaluation of the likelihood due to numerical solution of the PDE. In the context of meshless methods, the proposed, model-based approach to discretisation error encourages statistical inferences to be more conservative in the presence of significant solver error. In addition, (i) a principled learning-theoretic approach to minimise the impact of solver error is developed, and (ii) the challenge of non-linear PDEs is considered. The method is applied to parameter inference problems in which non-negligible solver error must be accounted for in order to draw valid statistical conclusions.
Mathematical modelling of social spreading processes
15:10 Fri 19 Aug, 2016 :: Napier G03 :: Prof Hans De Sterck :: Monash University

Social spreading processes are intriguing manifestations of how humans interact and shape each other's lives. There is great interest in improving our understanding of these processes, and the increasing availability of empirical information in the era of big data and online social networks, combined with mathematical and computational modelling techniques, offers compelling new ways to study these processes. I will first discuss mathematical models for the spread of political revolutions on social networks. The influence of online social networks and social media on the dynamics of the Arab Spring revolutions of 2011 is of particular interest in our work. I will describe a hierarchy of models, starting from agent-based models realized on empirical social networks, and ending up with population-level models that summarize the dynamical behaviour of the spreading process. We seek to understand quantitatively how political revolutions may be facilitated by the modern online social networks of social media. The second part of the talk will describe a population-level model for the social dynamics that cause cigarette smoking to spread in a population. Our model predicts that more individualistic societies will show faster adoption and cessation of smoking. Evidence from a newly composed century-long composite data set on smoking prevalence in 25 countries supports the model, with potential implications for public health interventions around the world. Throughout the talk, I will argue that important aspects of social spreading processes can be revealed and understood via quantitative mathematical and computational models matched to empirical data. This talk describes joint work with John Lang and Danny Abrams.
A principled experimental design approach to big data analysis
15:10 Fri 23 Sep, 2016 :: Napier G03 :: Prof Kerrie Mengersen :: Queensland University of Technology

Big Datasets are endemic, but they are often notoriously difficult to analyse because of their size, complexity, history and quality. The purpose of this paper is to open a discourse on the use of modern experimental design methods to analyse Big Data in order to answer particular questions of interest. By appeal to a range of examples, it is suggested that this perspective on Big Data modelling and analysis has wide generality and advantageous inferential and computational properties. In particular, the principled experimental design approach is shown to provide a flexible framework for analysis that, for certain classes of objectives and utility functions, delivers equivalent answers compared with analyses of the full dataset. It can also provide a formalised method for iterative parameter estimation, model checking, identification of data gaps and evaluation of data quality. Finally it has the potential to add value to other Big Data sampling algorithms, in particular divide-and-conquer strategies, by determining efficient sub-samples.
Measuring and mapping carbon dioxide from remote sensing satellite data
15:10 Fri 21 Oct, 2016 :: Napier G03 :: Prof Noel Cressie :: University of Wollongong

This talk is about environmental statistics for global remote sensing of atmospheric carbon dioxide, a leading greenhouse gas. An important compartment of the carbon cycle is atmospheric carbon dioxide (CO2), which (along with other gases) contributes to climate change through a greenhouse effect. There are a number of CO2 observational programs where measurements are made around the globe at a small number of ground-based locations at somewhat regular time intervals. In contrast, satellite-based programs are spatially global but give up some of the temporal richness. The most recent satellite launched to measure CO2 was NASA's Orbiting Carbon Observatory-2 (OCO-2), whose principal objective is to retrieve a geographical distribution of CO2 sources and sinks. OCO-2's measurement of column-averaged mole fraction, XCO2, is designed to achieve this, through a data-assimilation procedure that is statistical at its basis. Consequently, uncertainty quantification is key, starting with the spectral radiances from an individual sounding to borrowing of strength through spatial-statistical modelling.
Toroidal Soap Bubbles: Constant Mean Curvature Tori in S^3 and R^3
12:10 Fri 28 Oct, 2016 :: Ingkarni Wardli B18 :: Emma Carberry :: University of Sydney

Constant mean curvature (CMC) tori in S^3, R^3 or H^3 are in bijective correspondence with spectral curve data, consisting of a hyperelliptic curve, a line bundle on this curve and some additional data, which in particular determines the relevant space form. This point of view is particularly relevant for considering moduli-space questions, such as the prevalence of tori amongst CMC planes and whether tori can be deformed. I will address these questions for the spherical and Euclidean cases, using Whitham deformations.
Collective and aneural foraging in biological systems
15:10 Fri 3 Mar, 2017 :: Lower Napier LG14 :: Dr Jerome Buhl and Dr David Vogel :: The University of Adelaide

The field of collective behaviour uses concepts originally adapted from statistical physics to study how complex collective phenomena such as mass movement or swarm intelligence emerge from relatively simple interactions between individuals. Here we will focus on two applications of this framework. First we will have a look at new insights into the evolution of sociality brought by combining models of nutrition and social interactions to explore phenomena such as collective foraging decisions, emergence of social organisation and social immunity. Second, we will look at the networks built by slime molds in exploration and foraging contexts.
Fast approximate inference for arbitrarily large statistical models via message passing
15:10 Fri 17 Mar, 2017 :: Engineering South S111 :: Prof Matt Wand :: University of Technology Sydney

We explain how the notion of message passing can be used to streamline the algebra and computer coding for fast approximate inference in large Bayesian statistical models. In particular, this approach is amenable to handling arbitrarily large models of particular types once a set of primitive operations is established. The approach is founded upon a message passing formulation of mean field variational Bayes that utilizes factor graph representations of statistical models. The notion of factor graph fragments is introduced and is shown to facilitate compartmentalization of the required algebra and coding.
Algae meet the mathematics of multiplicative multifractals
12:10 Tue 2 May, 2017 :: Ingkarni Wardli Conference Room 715 :: Professor Tony Roberts :: School of Mathematical Sciences

There is much that is fragmented and rough in the world around us: clouds and landscapes are examples, as are algae. We need fractal geometry to encompass these. In practice we need multifractals: a composite of interwoven sets, each with its own fractal structure. Multiplicative multifractals have known properties. Optimising a fit between them and the data then empowers us to quantify subtle details of fractal geometry in applications, such as in algae distribution.
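A standard textbook example of a multiplicative multifractal is the binomial cascade, sketched below. It is offered as an illustration of the idea and is an assumption about the kind of model meant, not the construction used in the talk. The names binomial_cascade and scaling_exponent, and the weight p, are hypothetical.

```python
import math


def binomial_cascade(p, levels):
    """Split unit mass repeatedly: each interval passes a fraction p of its
    mass to its left half and 1 - p to its right half."""
    masses = [1.0]
    for _ in range(levels):
        masses = [m * w for m in masses for w in (p, 1.0 - p)]
    return masses


def scaling_exponent(p, q):
    """tau(q) such that the sum of mass**q over intervals of width eps
    scales as eps**tau(q) for the binomial cascade."""
    return -math.log2(p ** q + (1.0 - p) ** q)


if __name__ == "__main__":
    levels, p, q = 10, 0.3, 2.0
    masses = binomial_cascade(p, levels)
    eps = 2.0 ** -levels
    print("partition sum:", sum(m ** q for m in masses))
    print("eps ** tau(q):", eps ** scaling_exponent(p, q))
```

The two printed numbers agree, which is the sense in which such cascades have known properties: the whole family of scaling exponents is available in closed form, so fitting the weight p to data summarises the fractal structure in a single parameter.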
Constructing differential string structures
14:10 Wed 7 Jun, 2017 :: EM213 :: David Roberts :: University of Adelaide

String structures on a manifold are analogous to spin structures, except instead of lifting the structure group through the extension Spin(n)\to SO(n) of Lie groups, we need to lift through the extension String(n)\to Spin(n) of Lie *2-groups*. Such a thing exists if the first fractional Pontryagin class (1/2)p_1 vanishes in cohomology. A differential string structure also lifts connection data, but this is rather complicated, involving a number of locally defined differential forms satisfying cocycle-like conditions. This is an expansion of the geometric string structures of Stolz and Redden, which is, for a given connection A, merely a 3-form R on the frame bundle such that dR = tr(F^2) for F the curvature of A; in other words a trivialisation of the de Rham class of (1/2)p_1. I will present work in progress on a framework (and specific results) that allows explicit calculation of the differential string structure for a large class of homogeneous spaces, which also yields formulas for the Stolz-Redden form. I will comment on the application to verifying the refined Stolz conjecture for our particular class of homogeneous spaces. Joint work with Ray Vozzo.
In space there is no-one to hear you scream
12:10 Tue 12 Sep, 2017 :: Ingkarni Wardli 5.57 :: A/Prof Gary Glonek :: School of Mathematical Sciences

Modern data problems often involve data in very high dimensions. For example, gene expression profiles, used to develop cancer screening models, typically have at least 30,000 dimensions. When dealing with such data, it is natural to apply intuition from low dimensional cases. For example, in a sample of normal observations, a typical data point will be near the centre of the distribution with only a small number of points at the edges. In this talk, simple probability theory will be used to show that the geometry of data in high dimensional space is very different from what we can see in one and two-dimensional examples. We will show that the typical data point is at the edge of the distribution, a long way from its centre and even further from any other points.
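The claim in this abstract can be checked with a short simulation. The sketch assumes independent standard normal coordinates, and the sample size of 50 and the dimensions tried are arbitrary choices. The function names are hypothetical.

```python
import math
import random


def sample_points(n_points, dim, rng):
    """Draw n_points vectors with independent standard normal coordinates."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_points)]


def mean_norm(points):
    """Average distance of the points from the centre of the distribution."""
    origin = [0.0] * len(points[0])
    return sum(math.dist(p, origin) for p in points) / len(points)


def mean_nearest_neighbour_distance(points):
    """Average distance from each point to its closest other point."""
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)


if __name__ == "__main__":
    rng = random.Random(0)
    for dim in (1, 2, 100, 1000):
        pts = sample_points(50, dim, rng)
        print(dim, round(mean_norm(pts), 2),
              round(mean_nearest_neighbour_distance(pts), 2))
```

In one or two dimensions the points cluster near the centre and have close neighbours. In 1000 dimensions every point lies at a distance of roughly the square root of 1000 (about 32) from the centre, and its nearest neighbour is further away still.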
Understanding burn injuries and first aid treatment using simple mathematical models
15:10 Fri 13 Oct, 2017 :: Ingkarni Wardli B17 :: Prof Mat Simpson :: Queensland University of Technology

Scald burns from accidental exposure to hot liquids are the most common cause of burn injury in children. Over 2000 children are treated for accidental burn injuries in Australia each year. Despite the frequency of these injuries, basic questions about the physics of heat transfer in living tissues remain unanswered. For example, skin thickness varies with age and anatomical location, yet our understanding of how tissue damage from thermal injury is influenced by skin thickness is surprisingly limited. In this presentation we will consider a series of porcine experiments to study heat transfer in living tissues. We consider burning the living tissue, as well as applying various first aid treatment strategies to cool the living tissue after injury. By calibrating solutions of simple mathematical models to match the experimental data we provide insight into how thermal energy propagates through living tissues, as well as exploring different first aid strategies. We conclude by outlining some of our current work that aims to produce more realistic mathematical models.
The Markovian binary tree applied to demography and conservation biology
15:10 Fri 27 Oct, 2017 :: Ingkarni Wardli B17 :: Dr Sophie Hautphenne :: University of Melbourne

Markovian binary trees form a general and tractable class of continuous-time branching processes, which makes them well-suited for real-world applications. Thanks to their appealing probabilistic and computational features, these processes have proven to be an excellent modelling tool for applications in population biology. Typical performance measures of these models include the extinction probability of a population, the distribution of the population size at a given time, the total progeny size until extinction, and the asymptotic population composition. Besides giving an overview of the main performance measures and the techniques involved to compute them, we discuss recently developed statistical methods to estimate the model parameters, depending on the accuracy of the available data. We illustrate our results in human demography and in conservation biology.
Calculating optimal limits for transacting credit card customers
15:10 Fri 2 Mar, 2018 :: Horace Lamb 1022 :: Prof Peter Taylor :: University of Melbourne

Credit card users can roughly be divided into 'transactors', who pay off their balance each month, and 'revolvers', who maintain an outstanding balance, on which they pay substantial interest. In this talk, we focus on modelling the behaviour of an individual transactor customer. Our motivation is to calculate an optimal credit limit from the bank's point of view. This requires an expression for the expected outstanding balance at the end of a payment period. We establish a connection with the classical newsvendor model. Furthermore, we derive the Laplace transform of the outstanding balance, assuming that purchases are made according to a marked point process and that there is a simplified balance control policy which prevents all purchases in the rest of the payment period when the credit limit is exceeded. We then use the newsvendor model and our modified model to calculate bounds on the optimal credit limit for the more realistic balance control policy that accepts all purchases that do not exceed the limit. We illustrate our analysis using a compound Poisson process example and show that the optimal limit scales with the distribution of the purchasing process, while the probability of exceeding the optimal limit remains constant. Finally, we apply our model to some real credit card purchase data.
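The classical newsvendor model mentioned in this abstract has a well-known closed-form solution, sketched below for exponentially distributed demand. Reading the order quantity as a credit limit and demand as total purchases is an assumption made here for illustration. The cost values and function names are hypothetical, and the talk's own model is not reproduced.

```python
import math


def critical_fractile(underage_cost, overage_cost):
    """Newsvendor rule: choose q so that P(demand <= q) equals this ratio."""
    return underage_cost / (underage_cost + overage_cost)


def optimal_quantity_exponential(mean_demand, underage_cost, overage_cost):
    """Optimal quantity when demand is exponential with the given mean,
    i.e. the inverse of F(q) = 1 - exp(-q / mean) at the critical fractile."""
    fractile = critical_fractile(underage_cost, overage_cost)
    return -mean_demand * math.log(1.0 - fractile)


def exceedance_probability(quantity, mean_demand):
    """P(demand > quantity) for exponential demand."""
    return math.exp(-quantity / mean_demand)


if __name__ == "__main__":
    for mean in (100.0, 200.0):
        q = optimal_quantity_exponential(mean, underage_cost=3.0, overage_cost=1.0)
        print(mean, round(q, 2), round(exceedance_probability(q, mean), 4))
```

Doubling the mean demand doubles the optimal quantity, while the probability of exceeding it stays fixed by the cost ratio, which mirrors the scaling behaviour described in the abstract.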
Models, machine learning, and robotics: understanding biological networks
15:10 Fri 16 Mar, 2018 :: Horace Lamb 1022 :: Prof Steve Oliver :: University of Cambridge

The availability of complete genome sequences has enabled the construction of computer models of metabolic networks that may be used to predict the impact of genetic mutations on growth and survival. Both logical and constraint-based models of the metabolic network of the model eukaryote, the ale yeast Saccharomyces cerevisiae, have been available for some time and are continually being improved by the research community. While such models are very successful at predicting the impact of deleting single genes, the prediction of the impact of higher order genetic interactions is a greater challenge. Initial studies of limited gene sets provided encouraging results. However, the availability of comprehensive experimental data for the interactions between genes involved in metabolism demonstrated that, while the models were able to predict the general properties of the genetic interaction network, their ability to predict interactions between specific pairs of metabolic genes was poor. I will examine the reasons for this poor performance and demonstrate ways of improving the accuracy of the models by exploiting the techniques of machine learning and robotics. The utility of these metabolic models rests on the firm foundations of genome sequencing data. However, there are two major problems with these kinds of network models: they contain no dynamics, and they do not deal with the uncertain and incomplete nature of much biological data. To deal with these problems, we have developed the Flexible Nets (FNs) modelling formalism. FNs were inspired by Petri Nets and can deal with missing or uncertain data, incorporate both dynamics and regulation, and also have the potential for model predictive control of biotechnological processes.

News matching "Statistical data mining"

New Professor of Statistical Bioinformatics
Associate Professor Patty Solomon will take up the Chair of Statistical Bioinformatics within the School of Mathematical Sciences effective from 29th of October, 2007. Posted Mon 29 Oct 07.
ARC Grant successes
The School of Mathematical Sciences has again had outstanding success in the ARC Discovery and Linkage Projects schemes.
Congratulations to the following staff for their success in the Discovery Project scheme:
Prof Nigel Bean, Dr Josh Ross, Prof Phil Pollett, Prof Peter Taylor, New methods for improving active adaptive management in biological systems, $255,000 over 3 years;
Dr Josh Ross, New methods for integrating population structure and stochasticity into models of disease dynamics, $248,000 over 3 years;
A/Prof Matt Roughan, Dr Walter Willinger, Internet traffic-matrix synthesis, $290,000 over 3 years;
Prof Patricia Solomon, A/Prof John Moran, Statistical methods for the analysis of critical care data, with application to the Australian and New Zealand Intensive Care Database, $310,000 over 3 years;
Prof Mathai Varghese, Prof Peter Bouwknegt, Supersymmetric quantum field theory, topology and duality, $375,000 over 3 years;
Prof Peter Taylor, Prof Nigel Bean, Dr Sophie Hautphenne, Dr Mark Fackrell, Dr Malgorzata O'Reilly, Prof Guy Latouche, Advanced matrix-analytic methods with applications, $600,000 over 3 years.
Congratulations to the following staff for their success in the Linkage Project scheme:
Prof Simon Beecham, Prof Lee White, A/Prof John Boland, Prof Phil Howlett, Dr Yvonne Stokes, Mr John Wells, Paving the way: an experimental approach to the mathematical modelling and design of permeable pavements, $370,000 over 3 years;
Dr Amie Albrecht, Prof Phil Howlett, Dr Andrew Metcalfe, Dr Peter Pudney, Prof Roderick Smith, Saving energy on trains - demonstration, evaluation, integration, $540,000 over 3 years.
Posted Fri 29 Oct 10.

Publications matching "Statistical data mining"

Adaptively varying-coefficient spatiotemporal models
Lu, Zudi; Steinskog, D; Tjostheim, D; Yao, Q, Journal of the Royal Statistical Society Series B-Statistical Methodology 71 (859–880) 2009
CleanBGP: Verifying the consistency of BGP data
Flavel, Ashley; Maennel, Olaf; Chiera, Belinda; Roughan, Matthew; Bean, Nigel, International Network Management Workshop, Orlando, Florida 19/10/08
Energy balanced data gathering in WSNs with grid topologies
Chen, J; Shen, Hong; Tian, Hui, 7th International Conference on Grid and Cooperative Computing, China 24/10/08
Mining unexpected temporal associations: Applications in detecting adverse drug reactions
Jin, H; Chen, J; He, H; Williams, G; Kelman, C; O'Keefe, Christine, IEEE Transactions on Information Technology in Biomedicine 12 (488–500) 2008
Data fusion without data fusion: localization and tracking without sharing sensitive information
Roughan, Matthew; Arnold, Jonathan, Information, Decision and Control 2007, Adelaide, Australia 12/02/07
Optimal multilinear estimation of a random vector under constraints of causality and limited memory
Howlett, P; Torokhti, Anatoli; Pearce, Charles, Computational Statistics & Data Analysis 52 (869–878) 2007
Statistics in review; Part 1: graphics, data summary and linear models
Moran, John; Solomon, Patricia, Critical care and Resuscitation 9 (81–90) 2007
Experimental Design and Analysis of Microarray Data
Wilson, C; Tsykin, Anna; Wilkinson, Christopher; Abbott, C, chapter in Bioinformatics (Elsevier Ltd) 1–36, 2006
Is BGP update storm a sign of trouble: Observing the internet control and data planes during internet worms
Roughan, Matthew; Li, J; Bush, R; Mao, Z; Griffin, T, SPECTS 2006, Calgary, Canada 31/07/06
Statistical characteristics of rainstorms derived from weather radar images
Qin, J; Leonard, Michael; Kuczera, George; Thyer, M; Lambert, Martin; Metcalfe, Andrew, 30th Hydrology and Water Resources Symposium, Launceston, Tasmania 04/12/06
Watching data streams toward a multi-homed sink under routing changes introduced by a BGP beacon
Li, J; Bush, R; Mao, Z; Griffin, T; Roughan, Matthew; Stutzbach, D; Purpus, E, PAM2006, Adelaide, Australia 30/03/06
Data-recursive smoother formulae for partially observed discrete-time Markov chains
Elliott, Robert; Malcolm, William, Stochastic Analysis and Applications 24 (579–597) 2006
Optimal linear estimation and data fusion
Elliott, Robert; Van Der Hoek, John, IEEE Transactions on Automatic Control 51 (686–689) 2006
Secure distributed data-mining and its application to large-scale network measurements
Roughan, Matthew; Zhang, Y, Computer Communication Review 36 (7–14) 2006
Optimal estimation of a random signal from partially missed data
Torokhti, Anatoli; Howlett, P; Pearce, Charles, EUSIPCO 2006, Florence, Italy 04/09/06
Diversity sensitivity and multimodal Bayesian statistical analysis by relative entropy
Leipnik, R; Pearce, Charles, The ANZIAM Journal 47 (277–287) 2005
Optimal recursive estimation of raw data
Torokhti, Anatoli; Howlett, P; Pearce, Charles, Annals of Operations Research 133 (285–302) 2005
Impinging laminar jets at moderate Reynolds numbers and separation distances
Bergthorson, J; Sone, K; Mattner, Trent; Dimotakis, P; Goodwin, D; Meiron, D, Physical Review E. (Statistical, Nonlinear, and Soft Matter Physics) 72 (066307-1–066307-12) 2005
Class-of-service mapping for QoS: A statistical signature-based approach to IP traffic classification
Roughan, Matthew; Sen, S; Spatscheck, O; Duffield, N, ACM SIG COMM 2004, Taormina, Sicily, Italy 25/10/04
Combining routing and traffic data for detection of IP forwarding anomalies
Roughan, Matthew; Griffin, T; Mao, M; Greenberg, A; Freeman, B, Sigmetrics - Performance 2004, New York, USA 12/06/04
IP forwarding anomalies and improving their detection using multiple data sources
Roughan, Matthew; Griffin, T; Mao, M; Greenberg, A; Freeman, B, SIGCOMM 2004, Oregon, USA 30/08/04
Swift-Hohenberg model for magnetoconvection
Cox, Stephen; Matthews, P; Pollicott, S, Physical Review E. (Statistical, Nonlinear, and Soft Matter Physics) 69 (066314-1–066314-14) 2004
The data processing inequality and stochastic resonance
McDonnell, Mark; Stocks, N; Pearce, Charles; Abbott, Derek, Noise in Complex Systems and Stochastic Dynamics, Santa Fe, New Mexico, USA 01/06/03
Stochastic resonance and data processing inequality
McDonnell, Mark; Stocks, N; Pearce, Charles; Abbott, Derek, Electronics Letters 39 (1287–1288) 2003
The Oxford dictionary of statistical terms
Dodge, Y; Cox, D; Commenges, D; Solomon, Patricia; Wilson, S,
Resampling-based multiple testing for microarray data analysis (Invited discussion of paper by Ge, Dudoit and Speed)
Glonek, Garique; Solomon, Patricia, Test 12 (50–53) 2003
Higher-order statistical moments of wave-induced response of offshore structures via efficient sampling techniques
Najafian, G; Burrows, R; Tickell, R; Metcalfe, Andrew, International Offshore and Polar Engineering Conference 3 (465–470) 2002
Statistical modelling and prediction associated with the HIV/AIDS epidemic
Solomon, Patricia; Wilson, Susan, The Mathematical Scientist 26 (87–102) 2001
Best estimators of second degree for data analysis
Howlett, P; Pearce, Charles; Torokhti, Anatoli, ASMDA 2001, Compiegne, France 12/06/01
Optimal successive estimation of observed data
Torokhti, Anatoli; Howlett, P; Pearce, Charles, International Conference on Optimization: Techniques and Applications (5th: 2001), Hong Kong, China 15/12/01
Statistical analysis of medical data: New developments - Book review
Solomon, Patricia, Biometrics 57 (327–328) 2001
Meta-analysis, overviews and publication bias
Solomon, Patricia; Hutton, Jonathon, Statistical Methods in Medical Research 10 (245–250) 2001
Disease surveillance and data collection issues in epidemic modelling
Solomon, Patricia; Isham, V, Statistical Methods in Medical Research 9 (259–277) 2000
Disease surveillance and intervention studies in developing countries
Solomon, Patricia, Statistical Methods in Medical Research 9 (183–184) 2000

Advanced search options

You may be able to improve your search results by using the following syntax:

Query                      Matches the following
Asymptotic Equation        Anything with "Asymptotic" or "Equation".
+Asymptotic +Equation      Anything with "Asymptotic" and "Equation".
+Stokes -"Navier-Stokes"   Anything containing "Stokes" but not "Navier-Stokes".
Dynam*                     Anything containing "Dynamic", "Dynamical", "Dynamicist" etc.
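
The query rules above can be read as a small matching procedure. The following is a minimal sketch of one possible interpretation, not the site's actual search engine: the function names are hypothetical, and case-insensitive whole-word matching is an assumption that the page does not state.

```python
import re
import shlex


def term_pattern(term):
    """Compile one search term; a trailing * matches any word with that prefix."""
    if term.endswith("*"):
        body = re.escape(term[:-1]) + r"\w*"
    else:
        body = re.escape(term)
    # Case-insensitive whole-word matching is an assumption of this sketch.
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)


def matches(query, text):
    """Return True if text satisfies the query under the rules listed above."""
    required, excluded, optional = [], [], []
    # shlex.split keeps a quoted phrase such as "Navier-Stokes" as one token.
    for token in shlex.split(query):
        if token.startswith("+"):
            required.append(term_pattern(token[1:]))
        elif token.startswith("-"):
            excluded.append(term_pattern(token[1:]))
        else:
            optional.append(term_pattern(token))
    if any(p.search(text) for p in excluded):
        return False
    if not all(p.search(text) for p in required):
        return False
    # With no required terms, at least one plain term must appear.
    if optional and not required:
        return any(p.search(text) for p in optional)
    return True


print(matches("Asymptotic Equation", "An asymptotic expansion"))
print(matches('+Stokes -"Navier-Stokes"', "The Navier-Stokes equations"))
print(matches("Dynam*", "A dynamical system"))
```

Plain terms are combined with "or", terms prefixed with + must all be present, and a term prefixed with - rejects the text outright.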