
Courses matching "Data mining"
Analysis of multivariable and high dimensional data
Multivariate analysis of data is performed in order to
1. understand the structure in data and summarise the data in simpler ways;
2. understand the relationship of one part of the data to another part; and
3. make decisions or draw inferences based on data.
The statistical analyses of multivariate data extend those of univariate data, and in doing so require
more advanced mathematical theory and computational techniques. The course begins with a
discussion of the three classical methods, Principal Component Analysis, Canonical Correlation
Analysis and Discriminant Analysis, which correspond to the aims above. We also learn about
Cluster Analysis, Factor Analysis and newer methods including Independent Component Analysis.
For most real data the underlying distribution is not known, but if the assumptions of multivariate
normality of the data hold, extra properties can be derived. Our treatment combines ideas,
theoretical properties and a strong computational component for each of the different methods we
discuss. For the computational part (with Matlab) we make use of real data and learn the use
of simulations in order to assess the performance of different methods in practice.
Topics covered:
1. Introduction to multivariate data, the multivariate normal distribution
2. Principal Component Analysis, theory and practice
3. Canonical Correlation Analysis, theory and practice
4. Discriminant Analysis, Fisher's LDA, linear and quadratic DA
5. Cluster Analysis: hierarchical and k-means methods
6. Factor Analysis and latent variables
7. Independent Component Analysis including an Introduction to Information Theory
The course will be based on my forthcoming monograph
Analysis of Multivariate and High-Dimensional Data: Theory and Practice, to be published by
Cambridge University Press.
Events matching "Data mining" 
Mathematics of underground mining. 15:10 Fri 12 May, 2006 :: G08 Mathematics Building University of Adelaide :: Prof. Hyam Rubinstein
Underground mining infrastructure involves an
interesting range of optimisation problems with geometric
constraints. In particular, ramps, drives and tunnels must have gradients
within a certain prescribed range, and turning circles (curvature) are
also bounded. Finally, obstacles have to be avoided, such as faults,
ore bodies themselves and old workings. A group of mathematicians and
engineers at Uni of Melb and Uni of SA have been working on this
problem for a number of years. I will summarise what we have found and
the challenges of working in the mining industry. 

Watching evolution in real time; problems and potential research areas.
15:10 Fri 26 May, 2006 :: G08 Mathematics Building University of Adelaide :: Prof Alan Cooper (Federation Fellow)
Recent studies (1) have indicated problems with our
ability to use the genetic distances between species to estimate the
time since their divergence (so called molecular clocks). An
exponential decay curve has been detected in comparisons of closely
related taxa in mammal and bird groups, and rough approximations
suggest that molecular clock calculations may be problematic for the
recent past (eg <1 million years). Unfortunately, this period
encompasses a number of key evolutionary events where estimates of
timing are critical such as modern human evolutionary history, the
domestication of animals and plants, and most issues involved in
conservation biology. A solution (formulated at UA) will be briefly
outlined. A second area of active interest is the recent suggestion
(2) that mitochondrial DNA diversity does not track population size in
several groups, in contrast to standard thinking. This finding has
been interpreted as showing that mtDNA may not be evolving neutrally,
as has long been assumed.
Large ancient DNA datasets provide a means to examine these issues, by
revealing evolutionary processes in real time (3). The data also
provide a rich area for mathematical investigation as temporal
information provides information about several parameters that are
unknown in serial coalescent calculations (4).
References:
(1) Ho SYW et al. Time dependency of molecular rate estimates and
systematic overestimation of recent divergence
times. Mol. Biol. Evol. 22, 1561-1568 (2005);
Penny D, Nature 436, 183-184 (2005).
(2) Bazin E., et al. Population size does not influence mitochondrial
genetic diversity in animals. Science 312, 570 (2006);
Eyre-Walker A. Size does not matter for mitochondrial DNA,
Science 312, 537 (2006).
(3) Shapiro B, et al. Rise and fall of the Beringian steppe
bison. Science 306, 1561-1565 (2004);
Chan et al. Bayesian estimation of the timing and severity of a
population bottleneck from ancient DNA. PLoS Genetics 2, e59
(2006).
(4) Drummond et al. Measurably evolving populations, Trends in
Ecol. Evol. 18, 481-488 (2003);
Drummond et al. Bayesian coalescent inference of past population
dynamics from molecular sequences. Molecular Biology and Evolution
22, 1185-92 (2005).


A Bivariate Zero-inflated Poisson Regression Model and application to some Dental Epidemiological data 14:10 Fri 27 Oct, 2006 :: G08 Mathematics Building University of Adelaide :: University Prof Sudhir Paul
Data in the form of paired (pre-treatment, post-treatment) counts arise in the study of the effects of several treatments after accounting for possible covariate effects. An example of such a data set comes from a dental epidemiological study in Belo Horizonte (the Belo Horizonte caries prevention study) which evaluated various programmes for reducing caries. Also, these data may show more pairs of zeros than can be accounted for by a simpler model, such as a bivariate Poisson regression model. In such situations we propose to use a zero-inflated bivariate Poisson regression (ZIBPR) model for the paired (pre-treatment, post-treatment) count data. We develop an EM algorithm to obtain maximum likelihood estimates of the parameters of the ZIBPR model. Further, we obtain the exact Fisher information matrix of the maximum likelihood estimates of the parameters of the ZIBPR model and develop a procedure for testing treatment effects. The procedure to detect treatment effects based on the ZIBPR model is compared, in terms of size, by simulations, with an earlier procedure using a zero-inflated Poisson regression (ZIPR) model of the post-treatment count with the pre-treatment count treated as a covariate. The procedure based on the ZIBPR model holds level most effectively. A further simulation study indicates good power properties of the procedure based on the ZIBPR model. We then compare our analysis of the decayed, missing and filled teeth (DMFT) index data from the caries prevention study, based on the ZIBPR model, with the analysis using a zero-inflated Poisson regression model in which the pre-treatment DMFT index is taken to be a covariate.
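To make the idea of zero inflation concrete, the following is a minimal sketch of the univariate zero-inflated Poisson probability function. It is not the bivariate ZIBPR model of the talk; the function name `zip_pmf` and the parameter values are assumptions chosen for illustration only. With probability `pi` an observation is a structural zero, and otherwise it is drawn from a Poisson distribution with mean `lam`, so zeros are more frequent than a plain Poisson model allows.

```python
import math


def zip_pmf(k: int, lam: float, pi: float) -> float:
    """P(Y = k) under a zero-inflated Poisson model (illustrative sketch).

    With probability pi the count is a structural zero; with probability
    1 - pi it is drawn from a Poisson distribution with mean lam.
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    structural_zero = pi if k == 0 else 0.0
    return structural_zero + (1.0 - pi) * poisson


lam, pi = 2.0, 0.3                      # hypothetical parameter values
p_zero_plain = math.exp(-lam)           # P(Y = 0) for a plain Poisson
p_zero_zip = zip_pmf(0, lam, pi)        # P(Y = 0) with zero inflation
total = sum(zip_pmf(k, lam, pi) for k in range(60))

print(f"P(0) plain Poisson: {p_zero_plain:.4f}")
print(f"P(0) zero-inflated: {p_zero_zip:.4f}")
```

The probabilities still sum to one, but the mass at zero rises from exp(-lam) to pi + (1 - pi) exp(-lam).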

Likelihood inference for a problem in particle physics 15:10 Fri 27 Jul, 2007 :: G04 Napier Building University of Adelaide :: Prof. Anthony Davison
The Large Hadron Collider (LHC), a particle accelerator located at CERN, near Geneva, is (currently!) expected to start operation in early 2008. It is located in an underground tunnel 27 km in circumference, and when fully operational, will be the world's largest and highest energy particle accelerator. It is hoped that it will provide evidence for the existence of the Higgs boson, the last remaining particle of the so-called Standard Model of particle physics. The quantity of data that will be generated by the LHC is roughly equivalent to that of the European telecommunications network, but this will be boiled down to just a few numbers. After a brief introduction, this talk will outline elements of the statistical problem of detecting the presence of a particle, and then sketch how higher order likelihood asymptotics may be used for signal detection in this context. The work is joint with Nicola Sartori, of the Università Ca' Foscari, in Venice.

Regression: a backwards step? 13:10 Fri 7 Sep, 2007 :: Maths G08 :: Dr Gary Glonek
Most students of high school mathematics will have encountered the technique of fitting a line to data by least squares. Those who have taken a university statistics course will also have heard this method referred to as regression. However, it is not obvious from common dictionary definitions why this should be the case; one such definition, for example, is "reversion to an earlier or less advanced state or form". In this talk, the mathematical phenomenon that gave regression its name will be explained and will be shown to have implications in some unexpected contexts.
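The abstract does not name the phenomenon, but it is presumably regression toward the mean, the effect usually credited to Galton. The sketch below, under that assumption, simulates standardised pairs with a chosen correlation `rho` and fits a least squares line; the function name, seed and sample size are illustrative choices, not taken from the talk. The fitted slope comes out close to `rho`, so predictions sit closer to the mean than the inputs do.

```python
import random


def least_squares_slope(x, y):
    """Slope of the least squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(0)      # fixed seed so the run is reproducible
rho = 0.5                   # assumed correlation between x and y
n = 20000

# x and y both have mean 0 and variance 1, with correlation rho.
x = [rng.gauss(0, 1) for _ in range(n)]
y = [rho * a + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1) for a in x]

slope = least_squares_slope(x, y)
print(f"fitted slope: {slope:.3f} (correlation used: {rho})")
```

Because the slope is below one whenever the correlation is imperfect, an x value two standard deviations above average predicts a y value only about one standard deviation above average in this example.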


Statistical Critique of the Intergovernmental Panel on Climate Change's work on Climate Change. 18:00 Wed 17 Oct, 2007 :: Union Hall University of Adelaide :: Mr Dennis Trewin
Climate change is one of the most important issues facing us today. Many governments have introduced or are developing appropriate policy interventions to (a) reduce the growth of greenhouse gas emissions in order to mitigate future climate change, or (b) adapt to future climate change.
This important work deserves a high quality statistical data base, but there are statistical shortcomings in the work of the Intergovernmental Panel on Climate Change (IPCC). There has been very little involvement of qualified statisticians in the very important work of the IPCC, which appears to be scientifically meritorious in most other ways.
Mr Trewin will explain these shortcomings and outline his views on likely future climate change, taking into account the statistical deficiencies.
His conclusions suggest climate change is still an important issue that needs to be addressed but the range of likely outcomes is a lot lower than has been suggested by the IPCC.
This presentation will be based on an invited paper presented at the OECD World Forum.


Moderated Statistical Tests for Digital Gene Expression Technologies 15:10 Fri 19 Oct, 2007 :: G04 Napier Building University of Adelaide :: Dr Gordon Smyth :: Walter and Eliza Hall Institute of Medical Research in Melbourne, Australia
Digital gene expression (DGE) technologies measure gene expression by counting sequence tags. They are sensitive technologies for measuring gene expression on a genomic scale, without the need for prior knowledge of the genome sequence. As the cost of DNA sequencing decreases, the number of DGE datasets is expected to grow dramatically. Various tests of differential expression have been proposed for replicated DGE data using overdispersed binomial or Poisson models for the counts, but none of these is usable when the number of replicates is very small. We develop tests using the negative binomial distribution to model overdispersion relative to the Poisson, and use conditional weighted likelihood to moderate the level of overdispersion across genes. A heuristic empirical Bayes algorithm is developed which is applicable to very general likelihood estimation contexts. Not only is our strategy applicable even with the smallest number of replicates, but it also proves to be more powerful than previous strategies when more replicates are available. The methodology is applicable to other counting technologies, such as proteomic spectral counts.
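As an illustration of what "overdispersion relative to the Poisson" means, here is a small sketch using the common mean and dispersion parameterisation of the negative binomial, in which the variance is mu + phi * mu^2 rather than the Poisson value mu. The parameterisation, the function name `nb_pmf` and the numbers are assumptions for illustration; the moderated tests described in the talk are not reproduced here.

```python
import math


def nb_pmf(k: int, mu: float, phi: float) -> float:
    """Negative binomial P(Y = k) with mean mu and dispersion phi > 0.

    Uses size r = 1 / phi, so that Var(Y) = mu + phi * mu**2.
    Computed on the log scale to avoid overflow in the gamma functions.
    """
    r = 1.0 / phi
    log_p = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
             + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))
    return math.exp(log_p)


mu, phi = 5.0, 0.4                      # hypothetical mean and dispersion
ks = range(500)                         # far enough into the tail here
probs = [nb_pmf(k, mu, phi) for k in ks]

mean = sum(k * p for k, p in zip(ks, probs))
var = sum((k - mean) ** 2 * p for k, p in zip(ks, probs))

print(f"mean = {mean:.4f}, variance = {var:.4f}")
print(f"Poisson variance would be {mu:.4f}")
```

As phi shrinks toward zero the variance approaches mu and the model approaches the Poisson, which is why phi is a natural quantity to moderate across genes.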


Global and Local stationary modelling in finance: Theory and empirical evidence 14:10 Thu 10 Apr, 2008 :: G04 Napier Building University of Adelaide :: Prof. Dominique Guégan :: Université Paris 1 Panthéon-Sorbonne
Modelling real data sets with second order stochastic processes requires that the data sets satisfy the second order stationarity condition. This stationarity condition concerns the unconditional moments of the process. It is in that context that most of the models developed since the 1960s have been studied; we refer to the ARMA processes (Brockwell and Davis, 1988), the ARCH, GARCH and EGARCH models (Engle, 1982, Bollerslev, 1986, Nelson, 1990), the SETAR process (Lim and Tong, 1980 and Tong, 1990), the bilinear model (Granger and Andersen, 1978, Guégan, 1994), the EXPAR model (Haggan and Ozaki, 1980), the long memory process (Granger and Joyeux, 1980, Hosking, 1981, Gray, Zhang and Woodward, 1989, Beran, 1994, Giraitis and Leipus, 1995, Guégan, 2000), the switching process (Hamilton, 1988). For all these models, we get an invertible causal solution under specific conditions on the parameters, so that the forecast points and the forecast intervals are available.
Thus, the stationarity assumption is the basis for a general asymptotic theory for identification, estimation and forecasting. It guarantees that the increase of the sample size leads to more and more information of the same kind which is basic for an asymptotic theory to make sense.
Non-stationarity modelling also has a long tradition in econometrics. It is based on the conditional moments of the data generating process. It appears mainly in the heteroscedastic and volatility models, like the GARCH and related models, and stochastic volatility processes (Ghysels, Harvey and Renault 1997). This non-stationarity also appears in a different way with structural change models like the switching models (Hamilton, 1988), the stop-break model (Diebold and Inoue, 2001, Breidt and Hsu, 2002, Granger and Hyung, 2004) and the SETAR models, for instance. It can also be observed in linear models with time varying coefficients (Nicholls and Quinn, 1982, Tsay, 1987).
Thus, using stationary unconditional moments suggests a global stationarity for the model, but using non-stationary unconditional moments or non-stationary conditional moments, or assuming the existence of states, suggests that this global stationarity fails and that we only observe a locally stationary behavior.
The growing evidence of instability in the stochastic behavior of stocks, of exchange rates, and of some economic data sets such as growth rates, characterized by the existence of volatility or of jumps in the variance or in the levels of the prices, forces us to re-examine the assumption of global stationarity and its consequences for modelling, particularly for forecasting. Thus we can address several questions with respect to these remarks.
1. What kinds of non-stationarity affect the major financial and economic data sets? How can they be detected?
2. Local and global stationarities: How are they defined?
3. What is the impact of evidence of non-stationarity on the statistics computed from globally non-stationary data sets?
4. How can we analyze data sets in the globally non-stationary framework? Does the asymptotic theory work in a non-stationary framework?
5. What kind of models create local stationarity instead of global stationarity? How can we use them to develop a modelling and a forecasting strategy?
These questions have begun to be discussed in some papers in the economic literature. For some of these questions, the answers are known; for others, very few works exist. In this talk I will discuss all these problems and will propose two new strategies and models to solve them. Several interesting topics in empirical finance awaiting future research will also be discussed.


Elliptic equation for diffusion-advection flows 15:10 Fri 15 Aug, 2008 :: G03 Napier Building University of Adelaide :: Prof. Pavel Bedrikovetsky :: Australian School of Petroleum Science, University of Adelaide.
The standard diffusion equation is obtained by Einstein's method and its generalisation, Fokker-Planck-Kolmogorov-Feller theory. The time between jumps in Einstein's derivation is constant.
We discuss random walks with a residence time distribution, which occur for flows of solutes and suspensions/colloids in porous media, CO2 sequestration in coal mines, and several processes in chemical, petroleum and environmental engineering. The rigorous application of Einstein's method results in a new equation, containing the time and the mixed dispersion terms expressing the dispersion of the particle time steps.
Usually, adding the second time derivative requires additional initial data. For the equation derived, the condition that the solution remains bounded as time tends to infinity ensures uniqueness of the solution of the Cauchy problem.
The solution of the pulse injection problem describing a common tracer injection experiment is studied in greater detail. The new theory predicts a delay of the maximum of the tracer, compared to the velocity of the flow, while its forward "tail" contains many more particles than in the solution of the classical parabolic (advection-dispersion) equation. This is in agreement with the experimental observations and predictions of the direct simulation.


Oceanographic Research at the South Australian Research and Development Institute: opportunities for collaborative research 15:10 Fri 21 Nov, 2008 :: Napier G04 :: Associate Prof John Middleton :: South Australian Research and Development Institute
Increasing threats to S.A.'s fisheries and marine environment have underlined the increasing need for soundly based research into the ocean circulation and ecosystems (phyto/zooplankton) of the shelf and gulfs. With support of Marine Innovation SA, the Oceanography Program has, within 2 years, grown to include 6 FTEs and a budget of over $4.8M. The program currently leads two major research projects, both of which involve numerical and applied mathematical modelling of oceanic flow and ecosystems as well as statistical techniques for the analysis of data. The first is the implementation of the Southern Australian Integrated Marine Observing System (SAIMOS) that is providing data to understand the dynamics of shelf boundary currents, monitor for climate change and understand the phyto/zooplankton ecosystems that underpin SA's wild fisheries and aquaculture. SAIMOS involves the use of ship-based sampling, the deployment of underwater marine moorings, underwater gliders, HF Ocean RADAR, acoustic tracking of tagged fish and Autonomous Underwater vehicles.
The second major project involves measuring and modelling the ocean circulation and biological systems within Spencer Gulf and the impact on prawn larval dispersal and on the sustainability of existing and proposed aquaculture sites. The discussion will focus on opportunities for collaborative research with both faculty and students in this exciting growth area of S.A. science.


Key Predistribution in Grid-Based Wireless Sensor Networks 15:10 Fri 12 Dec, 2008 :: Napier G03 :: Dr Maura Paterson :: Information Security Group at Royal Holloway, University of London.
Wireless sensors are small, battery-powered devices that are deployed to
measure quantities such as temperature within a given region, then form
a wireless network to transmit and process the data they collect.
We discuss the problem of distributing symmetric cryptographic keys to
the nodes of a wireless sensor network in the case where the sensors are
arranged in a square or hexagonal grid, and we propose a key
predistribution scheme for such networks that is based on Costas arrays.
We introduce more general structures known as distinct-difference
configurations, and show that they provide a flexible choice of
parameters in our scheme, leading to more efficient performance than
that achieved by prior schemes from the literature. 

From histograms to multivariate polynomial histograms and shape estimation 12:10 Thu 19 Mar, 2009 :: Napier 210 :: A/Prof Inge Koch
Histograms are convenient and easy-to-use tools for estimating the shape of
data, but they have serious problems which are magnified for multivariate data.
We combine classic histograms with shape estimation by polynomials. The new
relatives, 'polynomial histograms', have surprisingly nice mathematical
properties, which we will explore in this talk. We also show how they can be
used for real data of 10-20 dimensions to analyse and understand the shape of
these data.


Multiscale tools for interpreting cell biology data 15:10 Fri 17 Apr, 2009 :: Napier LG29 :: Dr Matthew Simpson :: University of Melbourne
Trajectory data from observations of a random walk process are often used to characterize macroscopic transport coefficients and to infer motility mechanisms in cell biology. New continuum equations describing the average moments of the position of an individual agent in a population of interacting agents are derived and validated. Unlike standard non-interacting random walks, the new moment equations explicitly represent the interactions between agents as they are coupled to the macroscopic agent density. Key issues associated with the validity of the new continuum equations and the interpretation of experimental data will be explored.

Statistical analysis for harmonized development of systemic organs in human fetuses 11:00 Thu 17 Sep, 2009 :: School Board Room :: Prof Kanta Naito :: Shimane University
The growth of human babies has been studied extensively,
but many questions about the development of the human fetus remain
unanswered. The aim of this research is to investigate the development of
the systemic organs of human fetuses, based on a data set of measurements
of fetal bodies and organs. Specifically, this talk is concerned with giving
a mathematical understanding of the harmonized development of the organs of
human fetuses. A method for evaluating such harmony is proposed, using the
maximal dilatation that appears in the theory of quasiconformal mappings.

Contemporary frontiers in statistics 15:10 Mon 28 Sep, 2009 :: Badger Labs G31 Macbeth Lecture :: Prof. Peter Hall :: University of Melbourne
The availability of powerful computing equipment has had a dramatic impact on statistical methods and thinking, changing forever the way data are analysed. New data types, larger quantities of data, and new classes of research problem are all motivating new statistical methods. We shall give examples of each of these issues, and discuss the current and future directions of frontier problems in statistics. 

Exploratory experimentation and computation 15:10 Fri 16 Apr, 2010 :: Napier LG29 :: Prof Jonathan Borwein :: University of Newcastle
The mathematical research community is facing a great challenge to re-evaluate the role of proof in light of the growing power of current computer systems, of modern mathematical computing packages, and of the growing capacity to data-mine on the Internet. Add to that the enormous complexity of many modern capstone results such as the Poincaré conjecture, Fermat's last theorem, and the classification of finite simple groups. As the need and prospects for inductive mathematics blossom, the requirement to ensure the role of proof is properly founded remains undiminished. I shall look at the philosophical context with examples and then offer five benchmarking examples of the opportunities and challenges we face.

Estimation of sparse Bayesian networks using a score-based approach 15:10 Fri 30 Apr, 2010 :: School Board Room :: Dr Jessica Kasza :: University of Copenhagen
The estimation of Bayesian networks given high-dimensional data sets, with more variables than there are observations, has been the focus of much recent research. These structures provide a flexible framework for the representation of the conditional independence relationships of a set of variables, and can be particularly useful in the estimation of genetic regulatory networks given gene expression data.
In this talk, I will discuss some new research on learning sparse networks, that is, networks with many conditional independence restrictions, using a score-based approach. In the case of genetic regulatory networks, such sparsity reflects the view that each gene is regulated by relatively few other genes. The presented approach allows prior information about the overall sparsity of the underlying structure to be included in the analysis, as well as the incorporation of prior knowledge about the connectivity of individual nodes within the network.


Interpolation of complex data using spatiotemporal compressive sensing 13:00 Fri 28 May, 2010 :: Santos Lecture Theatre :: A/Prof Matthew Roughan :: School of Mathematical Sciences, University of Adelaide
Many complex datasets suffer from missing data, and interpolating these missing
elements is a key task in data analysis. Moreover, it is often the case that we
see only a linear combination of the desired measurements, not the measurements
themselves. For instance, in network management, it is easy to count the traffic
on a link, but harder to measure the end-to-end flows. Additionally, typical
interpolation algorithms treat either the spatial or the temporal
components of data separately, but many real datasets have strong
spatiotemporal structure that we would like to exploit in reconstructing the
missing data. In this talk I will describe a novel reconstruction algorithm that
exploits concepts from the growing area of compressive sensing to solve all of
these problems and more. The approach works so well on Internet traffic matrices
that we can obtain a reasonable reconstruction with as much as 98% of the
original data missing. 

A variance constraining ensemble Kalman filter: how to improve forecast using climatic data of unobserved variables 15:10 Fri 28 May, 2010 :: Santos Lecture Theatre :: A/Prof Georg Gottwald :: The University of Sydney
Data assimilation aims to solve one of the fundamental problems of numerical weather prediction: estimating the optimal state of the
atmosphere given a numerical model of the dynamics, and sparse, noisy
observations of the system. A standard tool in attacking this
filtering problem is the Kalman filter.
We consider the problem when only partial observations are available.
In particular we consider the situation where the observational space
consists of variables which are directly observable with known
observational error, and of variables of which only their climatic
variance and mean are given. We derive the corresponding Kalman
filter in a variational setting.
We analyze the variance constraining Kalman filter (VCKF) for
a simple linear toy model and determine its range of optimal
performance. We explore the variance constraining Kalman filter in an
ensemble transform setting for the Lorenz-96 system, and show that
incorporating the information on the variance on some unobservable
variables can improve the skill and also increase the stability of
the data assimilation procedure.
Using methods from dynamical systems theory we then consider systems where the
unobserved variables evolve deterministically but chaotically on a
fast time scale.
This is joint work with Lewis Mitchell and Sebastian Reich.
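For readers unfamiliar with the standard Kalman filter mentioned above, the following is a minimal sketch of its analysis (update) step for a single, directly observed scalar variable. It is the textbook update only, not the variance constraining filter of the talk; the function name and the numbers are assumptions for illustration.

```python
def kalman_update(mean_prior, var_prior, obs, var_obs):
    """One scalar Kalman analysis step for a directly observed variable.

    Combines a forecast (mean_prior, var_prior) with an observation
    (obs, var_obs) and returns the analysis mean and variance.
    """
    gain = var_prior / (var_prior + var_obs)          # Kalman gain in [0, 1]
    mean_post = mean_prior + gain * (obs - mean_prior)
    var_post = (1.0 - gain) * var_prior
    return mean_post, var_post


# Hypothetical forecast with variance 4 and an observation with variance 1.
mean_post, var_post = kalman_update(mean_prior=0.0, var_prior=4.0,
                                    obs=2.0, var_obs=1.0)
print(f"analysis mean = {mean_post:.3f}, analysis variance = {var_post:.3f}")
```

The analysis is pulled toward whichever of the forecast and the observation has the smaller variance, and the analysis variance is smaller than either input variance.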


Meteorological drivers of extreme bushfire events in southern Australia 15:10 Fri 2 Jul, 2010 :: Benham Lecture Theatre :: Prof Graham Mills :: Centre for Australian Weather and Climate Research, Melbourne
Bushfires occur regularly during summer in southern Australia, but only a few of these fires become iconic due to their effects, either in terms of loss of life or economic and social cost. Such events include Black Friday (1939), the Hobart fires (1967), Ash Wednesday (1983), the Canberra bushfires (2003), and most recently Black Saturday in February 2009. In most of these events the weather of the day was statistically extreme in terms of heat, (low) humidity, and wind speed, and in terms of antecedent drought. There are a number of reasons for conducting post-event analyses of the meteorology of these events. One is to identify any meteorological circulation systems or dynamic processes occurring on those days that might not be widely or hitherto recognised, to document these, and to develop new forecast or guidance products. The understanding and prediction of such features can be used in the short term to assist in effective management of fires and the safety of firefighters and in the medium range to assist preparedness for the onset of extreme conditions. The results of such studies can also be applied to simulations of future climates to assess the likely changes in frequency of the most extreme fire weather events, and their documentary records provide a resource that can be used for advanced training purposes. In addition, particularly for events further in the past, revisiting these events using reanalysis data sets and contemporary NWP models can also provide insights unavailable at the time of the events.
Over the past few years the Bushfire CRC's Fire Weather and Fire Danger project in CAWCR has studied the mesoscale meteorology of a number of major fire events, including the days of Ash Wednesday 1983, the Dandenong Ranges fire in January 1997, the Canberra fires and the Alpine breakout fires in January 2003, the Lower Eyre Peninsula fires in January 2005 and the Boorabbin fire in December 2007-January 2008. Various aspects of these studies are described below, including the structures of dry cold frontal wind changes, the particular character of the cold fronts associated with the most damaging fires in southeastern Australia, and some aspects of how the vertical temperature and humidity structure of the atmosphere may affect the fire weather at the surface.
These studies reveal much about these major events, but also suggest future research directions, and some of these will be discussed.


Mathematica Seminar 15:10 Wed 28 Jul, 2010 :: Engineering Annex 314 :: Kim Schriefer :: Wolfram Research
The Mathematica Seminars 2010 offer an opportunity to experience the applicability, ease of use, as well as the advancements of Mathematica 7 in education and academic research. These seminars will highlight the latest directions in technical computing with Mathematica, and the impact this technology has across a wide range of academic fields, from maths, physics and biology to finance, economics and business.
Those not yet familiar with Mathematica will gain an overview of the system and discover the breadth of applications it can address, while experts will get firsthand experience with recent advances in Mathematica like parallel computing, digital image processing, point-and-click palettes, built-in curated data, as well as courseware examples.

A spatial-temporal point process model for fine resolution multisite rainfall data from Roma, Italy 14:10 Thu 19 Aug, 2010 :: Napier G04 :: A/Prof Paul Cowpertwait :: Auckland University of Technology
A point process rainfall model is further developed that has storm origins occurring in space-time according to a Poisson process. Each storm origin has a random radius so that storms occur as circular regions in two-dimensional
space, where the storm radii are taken to be independent exponential random
variables. Storm origins are of random type z, where z follows a continuous
probability distribution. Cell origins occur in a further spatial Poisson
process and have arrival times that follow a Neyman-Scott point process. Cell
origins have random radii so that cells form discs in two-dimensional space.
Statistical properties up to third order are derived and used to fit the model
to 10 min series taken from 23 sites across the Roma region, Italy.
Distributional properties of the observed annual maxima are compared to
equivalent values sampled from series that are simulated using the fitted
model. The results indicate that the model will be of use in urban drainage
projects for the Roma region.


Simultaneous confidence band and hypothesis test in generalised varying-coefficient models 15:05 Fri 10 Sep, 2010 :: Napier LG28 :: Prof Wenyang Zhang :: University of Bath
Generalised varying-coefficient models (GVC) are very important
models. There is a considerable literature addressing these models.
However, most of the existing literature is devoted to the estimation
procedure. In this talk, I will systematically investigate statistical
inference for GVC, which includes confidence bands as well as hypothesis tests. I
will show the asymptotic distribution of the maximum discrepancy between the
estimated functional coefficient and the true functional coefficient. I will
compare different approaches for the construction of confidence bands and
hypothesis tests. Finally, the proposed statistical inference methods are used to
analyse data on contraceptive use in China, which leads to some
interesting findings.

Principal Component Analysis Revisited 15:10 Fri 15 Oct, 2010 :: Napier G04 :: Assoc. Prof Inge Koch :: University of Adelaide
Since the beginning of the 20th century, Principal Component Analysis (PCA) has been an important tool in the analysis of multivariate data. The principal components summarise data in fewer than the original number of variables without losing essential information, and thus allow a split of the data into signal and noise components. PCA is a linear method, based on elegant mathematical theory.
The increasing complexity of data together with the emergence of fast computers in the later parts of the 20th century has led to a renaissance of PCA. The growing numbers of variables (in particular, high-dimensional low sample size problems), non-Gaussian data, and functional data (where the data are curves) are posing exciting challenges to statisticians, and have resulted in new research which extends the classical theory.
I begin with the classical PCA methodology and illustrate the challenges presented by the complex data that we are now able to collect. The main part of the talk focuses on extensions of PCA: the duality of PCA and the Principal Coordinates of Multidimensional Scaling, Sparse PCA, and consistency results relating to principal components, as the dimension grows. We will also look at newer developments such as Principal Component Regression and Supervised PCA, nonlinear PCA and Functional PCA.
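For readers new to the classical method, a minimal sketch of PCA via the singular value decomposition of the centred data is given below. The simulated data set (one strong linear signal plus small noise) and the function name `pca` are assumptions made for illustration and are not taken from the talk.

```python
import numpy as np


def pca(data, n_components):
    """Classical PCA: scores, principal directions and component variances."""
    centred = data - data.mean(axis=0)
    # Rows of vt are the principal directions; the singular values give
    # the variance carried by each direction.
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    variances = s ** 2 / (data.shape[0] - 1)
    scores = centred @ vt[:n_components].T
    return scores, vt[:n_components], variances


rng = np.random.default_rng(1)
# Assumed toy data: a one-dimensional signal embedded in three variables.
signal = rng.standard_normal((100, 1)) @ np.array([[3.0, 2.0, 1.0]])
data = signal + 0.1 * rng.standard_normal((100, 3))

scores, directions, variances = pca(data, n_components=1)
print("share of variance in first component:", variances[0] / variances.sum())
```

Keeping only the leading scores is the "signal" part of the split described above; the discarded components play the role of noise.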


Classification for high-dimensional data 15:10 Fri 1 Apr, 2011 :: Conference Room Level 7 Ingkarni Wardli :: Associate Prof Inge Koch :: The University of Adelaide
For two-class classification problems Fisher's discriminant rule performs
well in many scenarios provided the dimension, d, is much smaller than the sample
size n. As the dimension increases, Fisher's rule may no longer be
adequate, and can perform as poorly as random guessing.
In this talk we look at new ways of overcoming this poor performance for
high-dimensional data by suitably modifying Fisher's rule, and in particular
we describe the 'Features Annealed Independence Rule (FAIR)' of Fan and Fan
(2008) and a rule based on canonical correlation analysis. I describe some
theoretical developments, and also show analyses of data which illustrate the
performance of these modified rules. 
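The dependence of Fisher's rule on the ratio of d to n can be explored with a small simulation. The sketch below is an assumed toy setting (two spherical normal classes differing in the first variable only) and uses a pseudo-inverse of the pooled covariance where the ordinary inverse does not exist; it is not the FAIR rule or any rule presented in the talk, and the function name and parameter values are hypothetical.

```python
import numpy as np


def fisher_rule_error(d, n_per_class=20, n_test=500, shift=1.0, seed=0):
    """Estimate the test error of Fisher's rule on simulated two-class data."""
    rng = np.random.default_rng(seed)
    mu = np.zeros(d)
    mu[0] = shift  # assumption: the classes differ in the first variable only

    def sample(n):
        x0 = rng.standard_normal((n, d)) - mu / 2
        x1 = rng.standard_normal((n, d)) + mu / 2
        return x0, x1

    x0, x1 = sample(n_per_class)
    m0, m1 = x0.mean(axis=0), x1.mean(axis=0)
    pooled = ((x0 - m0).T @ (x0 - m0) + (x1 - m1).T @ (x1 - m1)) / (2 * n_per_class - 2)
    # The pooled covariance is singular once d exceeds 2n - 2, so a
    # pseudo-inverse stands in for the inverse (an illustrative choice).
    w = np.linalg.pinv(pooled) @ (m1 - m0)
    threshold = w @ (m0 + m1) / 2

    t0, t1 = sample(n_test)
    errors = np.sum(t0 @ w > threshold) + np.sum(t1 @ w <= threshold)
    return float(errors) / (2 * n_test)


for d in (2, 30, 200):
    print(d, fisher_rule_error(d))
```

An error rate near 0.5 corresponds to random guessing; the exact values depend on the seed and the assumed setting.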

Comparison of Spectral and Wavelet Estimation of the Dynamic Linear System of a Wave Energy Device 12:10 Mon 2 May, 2011 :: 5.57 Ingkarni Wardli :: Mohd Aftar :: University of Adelaide
Renewable energy is one of today's major issues. The consequences of using fossil and nuclear energy, along with their limited supply, have prompted researchers and industry to look for renewable sources such as hydro, wind and wave energy. In this seminar, I will talk about the spectral estimation and wavelet estimation of a linear dynamical system of motion for a heaving buoy wave energy device. The spectral estimates were based on the Fourier transform, while the wavelet estimate was based on the wavelet transform. Comparisons of two spectral estimates with a wavelet estimate of the amplitude response operator (ARO) for the dynamical system of the wave energy device show that the wavelet estimate of the ARO is much better for data with and without noise. 

On parameter estimation in population models 15:10 Fri 6 May, 2011 :: 715 Ingkarni Wardli :: Dr Joshua Ross :: The University of Adelaide
Essential to applying a mathematical model to a real-world application is
calibrating the model to data. Methods for calibrating population models
often become computationally infeasible when the population size (more generally
the size of the state space) becomes large, or other complexities such as
time-dependent transition rates, or sampling error, are present. Here we
will discuss the use of diffusion approximations to perform estimation in several
scenarios, with successively reduced assumptions: (i) under the assumption
of stationarity (the process has been evolving for a very long time with
constant parameter values); (ii) transient dynamics (the assumption of stationarity
is invalid, and thus only constant parameter values may be assumed); and, (iii)
time-inhomogeneous chains (the parameters may vary with time) and accounting
for observation error (a sample of the true state is observed). 

When statistics meets bioinformatics 12:10 Wed 11 May, 2011 :: Napier 210 :: Prof Patty Solomon :: School of Mathematical Sciences
Bioinformatics is a new field of research which encompasses mathematics, computer science, biology, medicine and the physical sciences. It has arisen from the need to handle and analyse the vast amounts of data being generated by the new genomics technologies. The interface of these disciplines used to be information-poor, but is now information-megarich, and statistics plays a central role in processing this information and making it intelligible. In this talk, I will describe a published bioinformatics study which claimed to have developed a simple test for the early detection of ovarian cancer from a blood sample. The US Food and Drug Administration was on the verge of approving the test kits for market in 2004 when demonstrated flaws in the study design and analysis led to its withdrawal. We are still waiting for an effective early biomarker test for ovarian cancer. 

Statistical challenges in molecular phylogenetics 15:10 Fri 20 May, 2011 :: Mawson Lab G19 lecture theatre :: Dr Barbara Holland :: University of Tasmania
This talk will give an introduction to the ways that mathematics and statistics are used in the inference of evolutionary (phylogenetic) trees. Taking a model-based approach to estimating the relationships between species has proven to be enormously effective; however, some tricky statistical challenges remain. The increasingly plentiful amount of DNA sequence data is a boon, but it is also throwing a spotlight on some of the shortcomings of current best practice, particularly in how we (1) assess the reliability of our phylogenetic estimates, and (2) choose appropriate models. This talk will aim to give a general introduction to this area of research and will also highlight some results from two of my recent PhD students. 

Permeability of heterogeneous porous media - experiments, mathematics and computations 15:10 Fri 27 May, 2011 :: B.21 Ingkarni Wardli :: Prof Patrick Selvadurai :: Department of Civil Engineering and Applied Mechanics, McGill University
Permeability is a key parameter important to a variety of applications in geological engineering and in the environmental geosciences. The conventional definition of Darcy flow enables the estimation of permeability at different levels of detail. This lecture will focus on the measurement of surface permeability characteristics of a large cuboidal block of Indiana Limestone, using a surface permeameter. The paper discusses the theoretical developments, the solution of the resulting triple integral equations and associated computational treatments that enable the mapping of the near-surface permeability of the cuboidal region. These data, combined with a kriging procedure, are used to develop results for the permeability distribution in the interior of the cuboidal region. Upon verification of the absence of dominant pathways for fluid flow through the cuboidal region, estimates are obtained for the "Effective Permeability" of the cuboid using estimates proposed by Wiener, Landau and Lifschitz, King, Matheron, Journel et al., Dagan and others. The results of these estimates are compared with the geometric mean, derived from the computational estimates. 

Optimal experimental design for stochastic population models 15:00 Wed 1 Jun, 2011 :: 7.15 Ingkarni Wardli :: Dr Dan Pagendam :: CSIRO, Brisbane
Markov population processes are popular models for studying a wide range of
phenomena including the spread of disease, the evolution of chemical reactions
and the movements of organisms in population networks (metapopulations). Our
ability to use these models effectively can be limited by our knowledge about
parameters, such as disease transmission and recovery rates in an epidemic.
Recently, there has been interest in devising optimal experimental designs for
stochastic models, so that practitioners can collect data in a manner that
maximises the precision of maximum likelihood estimates of the parameters for
these models. I will discuss some recent work on optimal design for a variety
of population models, beginning with some simple one-parameter models where the
optimal design can be obtained analytically and moving on to more complicated
multi-parameter models in epidemiology that involve latent states and
non-exponentially distributed infectious periods. For these more complex
models, the optimal design must be arrived at using computational methods and we
rely on a Gaussian diffusion approximation to obtain analytical expressions for
Fisher's information matrix, which is at the heart of most optimality criteria
in experimental design. I will outline a simple cross-entropy algorithm that
can be used for obtaining optimal designs for these models. We will also
explore the improvements in experimental efficiency when using the optimal
design over some simpler designs, such as the design where observations are
spaced equidistantly in time. 

Inference and optimal design for percolation and general random graph models (Part I) 09:30 Wed 8 Jun, 2011 :: 7.15 Ingkarni Wardli :: Dr Andrei Bejan :: The University of Cambridge
The problem of optimal arrangement of nodes of a random weighted graph
is discussed in this workshop. The nodes of graphs under study are fixed, but
their edges are random and established according to the so-called
edge-probability function. This function is assumed to depend on the weights
attributed to the pairs of graph nodes (or distances between them) and a
statistical parameter. It is the purpose of experimentation to make inference on
the statistical parameter and thus to extract as much information about it as
possible. We also distinguish between two different experimentation scenarios:
progressive and instructive designs.
We adopt a utility-based Bayesian framework to tackle the optimal design problem
for random graphs of this kind. Simulation-based optimisation methods, mainly
Monte Carlo and Markov Chain Monte Carlo, are used to obtain the solution. We
study the optimal design problem for inference based on partial observations of
random graphs by employing a data augmentation technique. We prove that the
infinitely growing or diminishing node configurations asymptotically represent
the worst node arrangements. We also obtain the exact solution to the optimal
design problem for proximity (geometric) graphs and a numerical solution for
graphs with threshold edge-probability functions.
We consider inference and optimal design problems for finite clusters from bond
percolation on the integer lattice $\mathbb{Z}^d$ and derive a range of both
numerical and analytical results for these graphs. We introduce inner-outer
plots by deleting some of the lattice nodes and show that the "mostly populated"
designs are not necessarily optimal in the case of incomplete observations under
both progressive and instructive design scenarios. Some of the obtained results
may generalise to other lattices. 

Inference and optimal design for percolation and general random graph models (Part II) 10:50 Wed 8 Jun, 2011 :: 7.15 Ingkarni Wardli :: Dr Andrei Bejan :: The University of Cambridge
The problem of optimal arrangement of nodes of a random weighted graph
is discussed in this workshop. The nodes of graphs under study are fixed, but
their edges are random and established according to the so-called
edge-probability function. This function is assumed to depend on the weights
attributed to the pairs of graph nodes (or distances between them) and a
statistical parameter. It is the purpose of experimentation to make inference on
the statistical parameter and thus to extract as much information about it as
possible. We also distinguish between two different experimentation scenarios:
progressive and instructive designs.
We adopt a utility-based Bayesian framework to tackle the optimal design problem
for random graphs of this kind. Simulation-based optimisation methods, mainly
Monte Carlo and Markov Chain Monte Carlo, are used to obtain the solution. We
study the optimal design problem for inference based on partial observations of
random graphs by employing a data augmentation technique. We prove that the
infinitely growing or diminishing node configurations asymptotically represent
the worst node arrangements. We also obtain the exact solution to the optimal
design problem for proximity (geometric) graphs and a numerical solution for
graphs with threshold edge-probability functions.
We consider inference and optimal design problems for finite clusters from bond
percolation on the integer lattice $\mathbb{Z}^d$ and derive a range of both
numerical and analytical results for these graphs. We introduce inner-outer
plots by deleting some of the lattice nodes and show that the "mostly populated"
designs are not necessarily optimal in the case of incomplete observations under
both progressive and instructive design scenarios. Some of the obtained results
may generalise to other lattices. 

Quantitative proteomics: data analysis and statistical challenges 10:10 Thu 30 Jun, 2011 :: 7.15 Ingkarni Wardli :: Dr Peter Hoffmann :: Adelaide Proteomics Centre


Introduction to functional data analysis with applications to proteomics data 11:10 Thu 30 Jun, 2011 :: 7.15 Ingkarni Wardli :: A/Prof Inge Koch :: School of Mathematical Sciences


Object oriented data analysis 14:10 Thu 30 Jun, 2011 :: 7.15 Ingkarni Wardli :: Prof Steve Marron :: The University of North Carolina at Chapel Hill
Object Oriented Data Analysis is the statistical analysis of populations of complex objects. In the special case of Functional Data Analysis, these data objects are curves, where standard Euclidean approaches, such as principal components analysis, have been very successful. Recent developments in medical image analysis motivate the statistical analysis of populations of more complex data objects which are elements of mildly non-Euclidean spaces, such as Lie Groups and Symmetric Spaces, or of strongly non-Euclidean spaces, such as spaces of tree-structured data objects. These new contexts for Object Oriented Data Analysis create several potentially large new interfaces between mathematics and statistics. Even in situations where Euclidean analysis makes sense, there are statistical challenges because of the High Dimension Low Sample Size problem, which motivates a new type of asymptotics leading to non-standard mathematical statistics. 

Object oriented data analysis of tree-structured data objects 15:10 Fri 1 Jul, 2011 :: 7.15 Ingkarni Wardli :: Prof Steve Marron :: The University of North Carolina at Chapel Hill
The field of Object Oriented Data Analysis has made a lot of
progress on the statistical analysis of the variation in populations
of complex objects. A particularly challenging example of this type
is populations of tree-structured objects. Deep challenges arise,
which involve a marriage of ideas from statistics, geometry, and
numerical analysis, because the space of trees is strongly
non-Euclidean in nature. These challenges, together with three
completely different approaches to addressing them, are illustrated
using a real data example, where each data point is the tree of blood
arteries in one person's brain. 

Modelling computer network topologies through optimisation 12:10 Mon 1 Aug, 2011 :: 5.57 Ingkarni Wardli :: Mr Rhys Bowden :: University of Adelaide
The core of the Internet is made up of many different computers (called routers) in many different interconnected networks, owned and operated by many different organisations. A popular and important field of study in the past has been "network topology": for instance, understanding which routers are connected to which other routers, or which networks are connected to which other networks; that is, studying and modelling the connection structure of the Internet. Previous study in this area has been plagued by unreliable or flawed experimental data and debate over appropriate models to use. The Internet Topology Zoo is a new source of network data created from the information that network operators make public. In order to better understand this body of network information we would like the ability to randomly generate network topologies resembling those in the zoo. Leveraging previous wisdom on networks produced as a result of optimisation processes, we propose a simple objective function based on possible economic constraints. By changing the relative costs in the objective function we can change the form of the resulting networks, and we compare these optimised networks to a variety of networks found in the Internet Topology Zoo. 

Spectra alignment/matching for the classification of cancer and control patients 12:10 Mon 8 Aug, 2011 :: 5.57 Ingkarni Wardli :: Mr Tyman Stanford :: University of Adelaide
Proteomic time-of-flight mass spectrometry produces a spectrum based on the peptides (chains of amino acids) in each patient's serum sample. The spectra contain data points for an x-axis (peptide weight) and a y-axis (peptide frequency/count/intensity). It is our end goal to differentiate cancer (and subtypes) and control patients using these spectra. Before we can do this, peaks in these data must be found and peptides common to different spectra must be identified. The data are noisy because of biotechnological variation and calibration error; data points for different peptide weights may in fact be the same peptide. An algorithm needs to be employed to find common peptides between spectra, as performing alignment "by hand" is almost infeasible. We borrow methods suggested in the literature on metabolomic gas chromatography-mass spectrometry and extend the methods for our purposes. In this talk I will go over the basic tenets of what we hope to achieve and the process towards this.


Dealing with the GC-content bias in second-generation DNA sequence data 15:10 Fri 12 Aug, 2011 :: Horace Lamb :: Prof Terry Speed :: Walter and Eliza Hall Institute
The field of genomics is currently dealing with an explosion of data from so-called
second-generation DNA sequencing machines. This is creating many challenges and
opportunities for statisticians interested in the area.
In this talk I will outline the technology and the data flood, and move on to one particular
problem where the technology is used: copy-number analysis.
There we find a novel bias, which, if not dealt with properly, can dominate the signal of
interest. I will describe how we think about and summarize it, and go on to identify a
plausible source of this bias, leading up to a way of removing it.
Our approach makes use of the total variation metric on discrete measures, but apart from
this, is largely descriptive. 

Laplace's equation on multiply-connected domains 12:10 Mon 29 Aug, 2011 :: 5.57 Ingkarni Wardli :: Mr Hayden Tronnolone :: University of Adelaide
Various physical processes take place on multiply-connected domains
(domains with some number of 'holes'), such as the stirring of a fluid
with paddles or the extrusion of material from a die. These systems may
be described by partial differential equations (PDEs). However, standard
numerical methods for solving PDEs are not well-suited to such examples:
finite difference methods are difficult to implement on
multiply-connected domains, especially when the boundaries are irregular
or moving, while finite element methods are computationally expensive.
In this talk I will describe a fast and accurate numerical method for
solving certain PDEs on two-dimensional multiply-connected domains,
considering Laplace's equation as an example. This method takes
advantage of complex variable techniques which allow the solution to be
found with spectral accuracy provided the boundary data is smooth. Other
advantages over traditional numerical methods will also be discussed. 

Alignment of time course gene expression data sets using Hidden Markov Models 12:10 Mon 5 Sep, 2011 :: 5.57 Ingkarni Wardli :: Mr Sean Robinson :: University of Adelaide
Time course microarray experiments allow for insight into biological processes by measuring gene expression over a time period of interest. This project is concerned with time course data from a microarray experiment conducted on a particular variety of grapevine over the development of the grape berries at a number of different vineyards in South Australia. The aim of the project is to construct a methodology for combining the data from the different vineyards in order to obtain more precise estimates of the underlying behaviour of the genes over the development process. A major issue in doing so is that the rate of development of the grape berries is different at different vineyards.
Hidden Markov models (HMMs) are a well established methodology for modelling time series data in a number of domains and have been previously used for gene expression analysis. Modelling the grapevine data presents a unique modelling issue, namely the alignment of the expression profiles needed to combine the data from different vineyards. In this seminar, I will describe our problem, review HMMs, present an extension to HMMs and show some preliminary results modelling the grapevine data. 

Statistical analysis of metagenomic data from the microbial community involved in industrial bioleaching 12:10 Mon 19 Sep, 2011 :: 5.57 Ingkarni Wardli :: Ms Susana Soto-Rojo :: University of Adelaide
In the last two decades heap bioleaching has become established as a successful commercial option for recovering copper from low-grade secondary sulfide ores. Genetics-based approaches have recently been employed in the task of characterizing mineral processing bacteria. Data analysis is a key issue and thus the implementation of adequate mathematical and statistical tools is of fundamental importance to draw reliable conclusions. In this talk I will give an account of two specific problems that we have been working on: the first concerns experimental design and the second the modeling of the composition and activity of the microbial consortium. 

Can statisticians do better than random guessing? 12:10 Tue 20 Sep, 2011 :: Napier 210 :: A/Prof Inge Koch :: School of Mathematical Sciences
In the finance or credit risk area, a bank may want to assess whether a client is going to default, or be able to meet the repayments. In the assessment of benign or malignant tumours, a correct diagnosis is required. In these and similar examples, we make decisions based on data. The classical t-tests provide a tool for making such decisions. However, many modern data sets have more variables than observations, and the classical rules may not be any better than random guessing. We consider Fisher's rule for classifying data into two groups, and show that it can break down for high-dimensional data. We then look at ways of overcoming some of the weaknesses of the classical rules, and I show how these "postmodern" rules perform in practice. 

Estimating transmission parameters for the swine flu pandemic 15:10 Fri 23 Sep, 2011 :: 7.15 Ingkarni Wardli :: Dr Kathryn Glass :: Australian National University
Following the onset of a new strain of influenza with pandemic potential, policy makers need specific advice on how fast the disease is spreading, who is at risk, and what interventions are appropriate for slowing transmission. Mathematical models play a key role in comparing interventions and identifying the best response, but models are only as good as the data that inform them. In the early stages of the 2009 swine flu outbreak, many researchers estimated transmission parameters, particularly the reproduction number, from outbreak data. These estimates varied, and were often biased by data collection methods, misclassification of imported cases or as a result of early stochasticity in case numbers. I will discuss a number of the pitfalls in achieving good quality parameter estimates from early outbreak data, and outline how best to avoid them.
One of the early indications from swine flu data was that children were disproportionately responsible for disease spread. I will introduce a new method for estimating age-specific transmission parameters from both outbreak and seroprevalence data. This approach allows us to take account of empirical data on human contact patterns, and highlights the need to allow for asymmetric mixing matrices in modelling disease transmission between age groups. Applied to swine flu data from a number of different countries, it presents a consistent picture of higher transmission from children. 

Statistical analysis of school-based student performance data 12:10 Mon 10 Oct, 2011 :: 5.57 Ingkarni Wardli :: Ms Jessica Tan :: University of Adelaide
Join me in the journey of being a statistician for 15 minutes of your day (if you are not already one) and experience the task of data cleaning without having to get your own hands dirty. Most of you may have sat the Basic Skills Tests when at school or know someone who currently has to do the NAPLAN (National Assessment Program - Literacy and Numeracy) tests. Tests like these assess student progress and can be used to accurately measure school performance. In trying to answer the research question: "what conclusions about student progress and school performance can be drawn from NAPLAN data or data of a similar nature, using mathematical and statistical modelling and analysis techniques?", I have uncovered some interesting results about the data in my initial data analysis which I shall explain in this talk. 

Statistical modelling for some problems in bioinformatics 11:10 Fri 14 Oct, 2011 :: B.17 Ingkarni Wardli :: Professor Geoff McLachlan :: The University of Queensland
In this talk we consider some statistical analyses of data arising in
bioinformatics. The problems include the detection of differential
expression in microarray gene-expression data, the clustering of
time-course gene-expression data and, lastly, the analysis of
modern-day cytometric data. Extensions are considered to the procedures
proposed for these three problems in McLachlan et al. (Bioinformatics, 2006),
Ng et al. (Bioinformatics, 2006), and Pyne et al. (PNAS, 2009), respectively.
The latter references are available at http://www.maths.uq.edu.au/~gjm/. 

On the role of mixture distributions in the modelling of heterogeneous data 15:10 Fri 14 Oct, 2011 :: 7.15 Ingkarni Wardli :: Prof Geoff McLachlan :: University of Queensland
We consider the role that finite mixture distributions have played in the modelling of heterogeneous data, in particular for clustering continuous data via mixtures of normal distributions. A very brief history is given starting with the seminal papers by Day and Wolfe in the sixties before the appearance of the EM algorithm. It was the publication in 1977 of the latter algorithm by Dempster, Laird, and Rubin that greatly stimulated interest in the use of finite mixture distributions to model heterogeneous data. This is because the fitting of mixture models by maximum likelihood is a classic example of a problem that is simplified considerably by the EM's conceptual unification of maximum likelihood estimation from data that can be viewed as being incomplete. In recent times there has been a proliferation of applications in which the number of experimental units n is comparatively small but the underlying dimension p is extremely large as, for example, in microarray-based genomics and other high-throughput experimental approaches. Hence there has been increasing attention given not only in bioinformatics and machine learning, but also in mainstream statistics, to the analysis of complex data in this situation where n is small relative to p. The latter part of the talk shall focus on the modelling of such high-dimensional data using mixture distributions. 
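To make the role of the EM algorithm concrete, here is a minimal sketch of EM for a two-component univariate normal mixture, treating the unobserved component labels as the "incomplete" part of the data. The simulated data, starting values, iteration count and function names are illustrative assumptions and not material from the talk; the high-dimensional case discussed at the end of the abstract needs considerably more care.

```python
import math
import random


def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def em_two_normals(data, n_iter=50):
    """Fit a two-component normal mixture by EM; returns (weight, means, sds)."""
    # Assumed starting values: equal weights, means at the data extremes.
    weight, means, sds = 0.5, [min(data), max(data)], [1.0, 1.0]
    n = len(data)
    for _ in range(n_iter):
        # E-step: posterior probability that each point came from component 0.
        resp = []
        for x in data:
            a = weight * normal_pdf(x, means[0], sds[0])
            b = (1 - weight) * normal_pdf(x, means[1], sds[1])
            resp.append(a / (a + b))
        # M-step: weighted maximum likelihood updates of all parameters.
        n0 = sum(resp)
        n1 = n - n0
        weight = n0 / n
        means = [sum(r * x for r, x in zip(resp, data)) / n0,
                 sum((1 - r) * x for r, x in zip(resp, data)) / n1]
        sds = [math.sqrt(sum(r * (x - means[0]) ** 2 for r, x in zip(resp, data)) / n0),
               math.sqrt(sum((1 - r) * (x - means[1]) ** 2 for r, x in zip(resp, data)) / n1)]
    return weight, means, sds


rng = random.Random(0)
# Assumed toy data: two well-separated groups of 200 observations each.
data = [rng.gauss(-2.0, 1.0) for _ in range(200)] + [rng.gauss(3.0, 1.0) for _ in range(200)]
weight, means, sds = em_two_normals(data)
print(weight, means, sds)
```

Each point's responsibility from the E-step can be read as a soft cluster membership, which is how mixtures of normals are used for clustering.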

Metric geometry in data analysis 13:10 Fri 11 Nov, 2011 :: B.19 Ingkarni Wardli :: Dr Facundo Memoli :: University of Adelaide
The problem of object matching under invariances can be
studied using certain tools from metric geometry. The central idea is
to regard objects as metric spaces (or metric measure spaces). The type of
invariance that one wishes to have in the matching is encoded by the
choice of the metrics with which one endows the objects. The standard
example is matching objects in Euclidean space under rigid isometries:
in this situation one would endow the objects with the Euclidean metric. More
general scenarios are possible in which the desired invariance cannot
be reflected by the preservation of an ambient space metric. Several
ideas due to M. Gromov are useful for approaching this problem. The
Gromov-Hausdorff distance is a natural candidate for doing this.
However, this metric leads to very hard combinatorial optimization
problems and it is difficult to relate to previously reported
practical approaches to the problem of object matching. I will discuss
different variations of these ideas, and in particular will show a
construction of an L^p version of the Gromov-Hausdorff metric, called
the Gromov-Wasserstein distance, which is based on mass transportation
ideas. This new metric directly leads to quadratic optimization
problems on continuous variables with linear constraints.
As a consequence of establishing several lower bounds, it turns out
that several invariants of metric measure spaces are
quantitatively stable in the GW sense. These invariants provide
practical tools for the discrimination of shapes and connect the GW
ideas to a number of pre-existing approaches. 
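For orientation, the Gromov-Wasserstein distance between two metric measure spaces is usually written as below. The notation ($d_X$, $\mu_X$, the coupling $\mu$ and the set $\mathcal{M}(\mu_X,\mu_Y)$) is an assumption taken from the standard formulation rather than from the abstract, so it should be read as a sketch of the definition, not as the speaker's exact statement.

```latex
% Metric measure spaces (X, d_X, \mu_X) and (Y, d_Y, \mu_Y);
% \mathcal{M}(\mu_X, \mu_Y) is the set of couplings of \mu_X and \mu_Y.
\[
  D_p(X, Y) \;=\; \frac{1}{2}\,
  \inf_{\mu \in \mathcal{M}(\mu_X, \mu_Y)}
  \left(
    \iint \bigl| d_X(x, x') - d_Y(y, y') \bigr|^{p}
    \, \mu(dx \times dy)\, \mu(dx' \times dy')
  \right)^{1/p}.
\]
% For finite spaces a coupling is a matrix (\mu_{ij}) with
% \mu_{ij} \ge 0, \quad \sum_j \mu_{ij} = \mu_X(x_i), \quad \sum_i \mu_{ij} = \mu_Y(y_j),
% and the quantity under the p-th root becomes
% \sum_{i,j,k,l} | d_X(x_i, x_k) - d_Y(y_j, y_l) |^{p} \, \mu_{ij}\, \mu_{kl}.
```

In the finite case the objective is quadratic in the continuous variables $\mu_{ij}$ and the constraints are linear, which matches the description of the optimization problem above.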

Fluid flows in microstructured optical fibre fabrication 15:10 Fri 25 Nov, 2011 :: B.17 Ingkarni Wardli :: Mr Hayden Tronnolone :: University of Adelaide
Optical fibres are used extensively in modern telecommunications as they allow the transmission of information at high speeds. Microstructured optical fibres are a relatively new fibre design in which a waveguide for light is created by a series of air channels running along the length of the material. The flexibility of this design allows optical fibres to be created with adaptable (and previously unrealised) optical properties. However, the fluid flows that arise during fabrication can greatly distort the geometry, which can reduce the effectiveness of a fibre or render it useless. I will present an overview of the manufacturing process and highlight the difficulties. I will then focus on surface-tension driven deformation of the macroscopic version of the fibre extruded from a reservoir of molten glass, occurring during fabrication, which will be treated as a two-dimensional Stokes flow problem. I will outline two different complex-variable numerical techniques for solving this problem along with comparisons of the results, both to other models and to experimental data.


Spatialpoint data sets and the Polya distribution 15:10 Fri 27 Apr, 2012 :: B.21 Ingkarni Wardli :: Dr Benjamin Binder :: The University of Adelaide
Media...Spatialpoint data sets, generated from a wide range of
physical systems and mathematical
models, can be analyzed by counting the number of objects in equally
sized bins. We find that the bin
counts are related to the Polya distribution. New indexes are
developed which quantify whether or not a
spatial data set is at its most evenly distributed state. Using three
case studies (Lagrangian fluid particles in chaotic laminar
flows, cellular automata agents in discrete models, and biological
cells within colonies),
we calculate the indexes and predict the spatial state of the system.
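
The bin-counting step described above can be sketched in a few lines of Python. This is only an illustration: the abstract does not define the new indexes, so the variance-to-mean ratio used here is a standard dispersion statistic swapped in as a stand-in, and the function names `bin_counts` and `dispersion_index` are hypothetical. It assumes one-dimensional positions on an interval of known length.

```python
def bin_counts(positions, n_bins, length=1.0):
    """Count how many points fall in each of n_bins equally sized bins on [0, length]."""
    counts = [0] * n_bins
    for p in positions:
        # Clamp so that a point exactly at the right edge lands in the last bin.
        index = min(int(p / length * n_bins), n_bins - 1)
        counts[index] += 1
    return counts


def dispersion_index(counts):
    """Variance-to-mean ratio of the bin counts (a stand-in, not the talk's index).

    0 means every bin holds the same number of points (most evenly distributed);
    larger values indicate clustering.
    """
    mean = sum(counts) / len(counts)
    variance = sum((c - mean) ** 2 for c in counts) / len(counts)
    return variance / mean


# Toy positions, made up for illustration only.
evenly_spread = [(i + 0.5) / 10 for i in range(10)]
clustered = [0.05] * 10

even_index = dispersion_index(bin_counts(evenly_spread, 10))
clustered_index = dispersion_index(bin_counts(clustered, 10))
```

With these toy inputs the evenly spread points give an index of zero, while the clustered points give a much larger value.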

A brief introduction to Support Vector Machines 12:30 Mon 4 Jun, 2012 :: 5.57 Ingkarni Wardli :: Mr Tyman Stanford :: University of Adelaide
Support Vector Machines (SVMs) are used in a variety of contexts for a range of purposes including regression, feature selection and classification. To convey the basic principles of SVMs, this presentation will focus on the application of SVMs to classification. Classification (or discrimination), in a statistical sense, is supervised model creation for the purpose of assigning future observations to a group or class. An example might be assigning healthy or diseased labels to patients from p characteristics obtained from a blood sample.
While SVMs are widely used, they are most successful when the data have one or more of the following properties:
The data are not consistent with a standard probability distribution.
The number of observations, n, used to create the model is less than the number of predictive features, p. (The so-called small-n, big-p problem.)
The decision boundary between the classes is likely to be nonlinear in the feature space.
I will present a short overview of how SVMs are constructed, keeping in mind their purpose. As this presentation is part of a double postgrad seminar, I will keep it to a maximum of 15 minutes.
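
As a rough illustration of SVM classification, the sketch below fits a linear soft-margin SVM by stochastic subgradient descent on the regularised hinge loss. The abstract does not say how the SVMs in the talk are constructed, so the training method, the parameter values and the toy data are all assumptions. A linear SVM also cannot capture the nonlinear decision boundaries mentioned above, which would need a kernel.

```python
import random


def train_linear_svm(points, labels, lam=0.01, eta=0.01, epochs=1000, seed=0):
    """Minimise lam/2 * |w|^2 + hinge loss with constant-step subgradient descent.

    labels must be +1 or -1. Returns the weight vector w and the offset b.
    """
    rng = random.Random(seed)
    w = [0.0] * len(points[0])
    b = 0.0
    order = list(range(len(points)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = points[i], labels[i]
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            # Shrink w: gradient step on the penalty term.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                # The point violates the margin: gradient step on the hinge loss.
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
                b += eta * y
    return w, b


def predict(w, b, x):
    """Assign a class label from the side of the hyperplane on which x falls."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Toy, linearly separable data invented for illustration ("healthy" = -1, "diseased" = +1).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (3, 3), (3, 4), (4, 3), (4, 4)]
labels = [-1, -1, -1, -1, 1, 1, 1, 1]
w, b = train_linear_svm(points, labels)
```

A future observation is then classified with `predict(w, b, x)`.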


Comparison of spectral and wavelet estimators of transfer function for linear systems 12:10 Mon 18 Jun, 2012 :: B.21 Ingkarni Wardli :: Mr Mohd Aftar Abu Bakar :: University of Adelaide
We compare spectral and wavelet estimators of the response amplitude operator (RAO) of a linear system, with various input signals and added noise scenarios. The comparison is based on a model of a heaving buoy wave energy device (HBWED), which oscillates vertically as a single mode of vibration linear system.
HBWEDs and other single degree of freedom wave energy devices such as the oscillating wave surge converters (OWSC) are currently deployed in the ocean, making single degree of freedom wave energy devices important systems to both model and analyse in some detail. However, the results of the comparison relate to any linear system.
It was found that the wavelet estimator of the RAO offers no advantage over the spectral estimators if both input and response time series data are noise free and long time series are available. If there is noise on only the response time series, only the wavelet estimator or the spectral estimator that uses the cross-spectrum of the input and response signals in the numerator should be used. For the case of noise on only the input time series, only the spectral estimator that uses the cross-spectrum in the denominator gives a sensible estimate of the RAO. If both the input and response signals are corrupted with noise, a modification to both the input and response spectrum estimates can provide a good estimator of the RAO. However, a combination of wavelet and spectral methods is introduced as an alternative RAO estimator.
The conclusions apply for autoregressive emulators of sea surface elevation, impulse, and pseudorandom binary sequences (PRBS) inputs. However, a wavelet estimator is needed in the special case of a chirp input where the signal has a continuously varying frequency. 
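
For readers unfamiliar with the two spectral estimators contrasted above, they are commonly written as follows. The notation is an assumption, not taken from the talk: $x$ is the input, $y$ the response, $\hat S_{xx}$ and $\hat S_{yy}$ are estimated auto-spectra, and $\hat S_{xy}$, $\hat S_{yx}$ are estimated cross-spectra (conjugation conventions vary between texts).

```latex
% Cross-spectrum in the numerator: unaffected, on average, by noise on the response only
\hat{H}_1(\omega) = \frac{\hat{S}_{xy}(\omega)}{\hat{S}_{xx}(\omega)}

% Cross-spectrum in the denominator: unaffected, on average, by noise on the input only
\hat{H}_2(\omega) = \frac{\hat{S}_{yy}(\omega)}{\hat{S}_{yx}(\omega)}

% The RAO is the amplitude of the transfer function
\widehat{\mathrm{RAO}}(\omega) = \bigl| \hat{H}(\omega) \bigr|
```

The reasoning is that noise uncorrelated with the other signal inflates an auto-spectrum but averages out of the cross-spectrum, so each estimator avoids the auto-spectrum of the noisy signal.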

AFL Tipping isn't all about numbers and stats...or is it..... 12:10 Mon 6 Aug, 2012 :: B.21 Ingkarni Wardli :: Ms Jessica Tan :: University of Adelaide
The result of an AFL game is always unpredictable; we all know that. That is why we discuss the weekend's upsets and the local tipping competition as part of the "watercooler and weekend" conversation on a Monday morning. Different people use various weird and wonderful techniques or criteria to predict the winning team. With readily available data, I will investigate and compare various strategies and define a measure of the hardness of a round (full acknowledgements will be made in my presentation). Hopefully this will help me for next year's tipping competition...

Star Wars Vs The Lord of the Rings: A Survival Analysis 12:10 Mon 27 Aug, 2012 :: B.21 Ingkarni Wardli :: Mr Christopher Davies :: University of Adelaide
Ever wondered whether you are more likely to die in the Galactic Empire or Middle Earth? Well this is the postgraduate seminar for you!
I'll be attempting to answer this question using survival analysis, the statistical method of choice for investigating time-to-event data.
Spoiler Warning: This talk will contain references to the deaths of characters in the above movie sagas. 
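
To give a flavour of time-to-event analysis, here is a minimal sketch of the Kaplan-Meier estimator of a survival curve. The abstract does not say which survival methods the talk uses, so this is simply the standard textbook estimator, and the toy times below are invented (no real character deaths are encoded).

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function.

    times:  observed time for each subject.
    events: True if the death was observed at that time, False if the subject
            was censored (still alive when last seen).
    Returns a list of (time, estimated probability of surviving beyond time).
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group together all subjects sharing the same observed time.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Toy data: four subjects, the second one censored at time 2.
curve = kaplan_meier([1, 2, 3, 4], [True, False, True, True])
```

Censored subjects never trigger a drop in the curve, but they do reduce the number at risk for later times, which is what distinguishes this estimator from a naive proportion of survivors.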

Principal Component Analysis (PCA) 12:30 Mon 3 Sep, 2012 :: B.21 Ingkarni Wardli :: Mr Lyron Winderbaum :: University of Adelaide
Principal Component Analysis (PCA) has become something of a buzzword recently in a number of disciplines including gene expression and facial recognition. It is a classical, and fundamentally simple, concept that has been around since the early 1900s. Its recent popularity is largely due to the need for dimension reduction techniques in analyzing high dimensional data, which has become more common in the last decade, and the availability of computing power to implement them. I will explain the concept, prove a result, and give a couple of examples. The talk should be accessible to all disciplines as it (should?) only assume first-year linear algebra, the concept of a random variable, and covariance.
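
As a small illustration of the concept, the sketch below finds the first principal component of a data set as the leading eigenvector of the sample covariance matrix, using power iteration. This is one simple way to compute it and not necessarily how the talk presents PCA. The function name and toy data are hypothetical, and the code assumes the data are not all identical and that the starting vector is not orthogonal to the leading eigenvector.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, variance) of the first principal component.

    rows: list of observations, each a sequence of d numbers.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (divisor n - 1).
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iterations):
        # Power iteration: repeatedly apply the covariance matrix and renormalise.
        v = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    # Variance of the data projected onto v (the leading eigenvalue).
    variance = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    return v, variance


# Toy data lying exactly on the line y = 2x, so one component explains everything.
direction, variance = first_principal_component([(1, 2), (2, 4), (3, 6), (4, 8)])
```

For this toy data the direction is proportional to (1, 2) and the component's variance equals the total variance of the two variables, which is the dimension reduction idea in its simplest form.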


Epidemic models in socially structured populations: when are simple models too simple? 14:00 Thu 25 Oct, 2012 :: 5.56 Ingkarni Wardli :: Dr Lorenzo Pellis :: The University of Warwick
Both age and household structure are recognised as important heterogeneities affecting epidemic spread of infectious pathogens, and many models exist nowadays that include either or both forms of heterogeneity. However, different models may fit aggregate epidemic data equally well and nevertheless lead to different predictions of public health interest. I will here present an overview of stochastic epidemic models with increasing complexity in their social structure, focusing in particular on household models. For these models, I will present recent results about the definition and computation of the basic reproduction number R0 and its relationship with other threshold parameters. Finally, I will use these results to compare models with no, either or both age and household structure, with the aim of quantifying the conditions under which each form of heterogeneity is relevant and therefore providing some criteria that can be used to guide model design for real-time predictions.

Spatiotemporally Autoregressive Partially Linear Models with Application to the Housing Price Indexes of the United States 12:10 Mon 12 Nov, 2012 :: B.21 Ingkarni Wardli :: Ms Dawlah Alsulami :: University of Adelaide
We propose a Spatiotemporal Autoregressive Partially Linear Regression (STARPLR) model for data observed irregularly over space and regularly in time. The model is capable of capturing possible nonlinearity and nonstationarity in space by allowing the coefficients to depend on location. We suggest a two-step procedure to estimate both the coefficients and the unknown function, which is readily implemented and can be computed even for large spatiotemporal data sets. As an illustration, we apply our model to analyze the 51 States' House Price Indexes (HPIs) in the USA.

Colour 12:10 Mon 13 May, 2013 :: B.19 Ingkarni Wardli :: Lyron Winderbaum :: University of Adelaide
Colour is a powerful tool in presenting data, but it can be tricky to choose just the right colours to represent your data honestly. Do the colours used in your heatmap overemphasise the differences between particular values over others? Does your choice of colours overemphasise one value when several should be represented as equal? All these questions are fundamentally based in how we perceive colour. There has been a lot of research into how we perceive colour in the past century, with some interesting results. I will explain how a `standard observer' was found empirically and used to develop an absolute reference standard for colour in 1931. I will also explain how, although the common Red-Green-Blue representation of colour is useful and intuitive, distances between colours in this space do not reflect our perception of difference between colours, and how alternative, perceptually focused colour spaces were introduced in 1976. I will go on to explain how these results can be used to provide simple mechanisms by which to choose colours that satisfy particular properties such as being equally different from each other, or being linearly more different in sequence, or maintaining such properties when transferred to greyscale, or for a colourblind person.
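
The mismatch between RGB distance and perceived difference can be illustrated with a short sketch. It converts colours to CIE L*a*b* (CIELAB), one of the colour spaces standardised by the CIE in 1976, and compares Euclidean distances there. The abstract does not name the spaces it covers, so the choice of CIELAB, the assumption that RGB values are sRGB with a D65 white point, and the function names are all assumptions made for illustration.

```python
import math

D65_WHITE = (0.95047, 1.0, 1.08883)  # reference white assumed for sRGB


def srgb_to_linear(c):
    """Undo the sRGB transfer curve for one channel value in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def srgb_to_lab(rgb):
    """Convert an sRGB triple (each channel in [0, 1]) to CIE L*a*b*."""
    r, g, b = (srgb_to_linear(c) for c in rgb)
    # Linear RGB to CIE 1931 XYZ.
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b

    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = (f(v / w) for v, w in zip((x, y, z), D65_WHITE))
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def distance(p, q):
    """Euclidean distance between two colour triples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


black, red, green = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)
rgb_gap_red = distance(black, red)
rgb_gap_green = distance(black, green)
lab_gap_red = distance(srgb_to_lab(black), srgb_to_lab(red))
lab_gap_green = distance(srgb_to_lab(black), srgb_to_lab(green))
```

Pure red and pure green are the same distance from black in RGB, yet the CIELAB distance for green comes out noticeably larger than for red, reflecting that green appears much lighter.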

Progress in the prediction of buoyancy-affected turbulence 15:10 Fri 17 May, 2013 :: B.18 Ingkarni Wardli :: Dr Daniel Chung :: University of Melbourne
Buoyancy-affected turbulence represents a significant challenge to our
understanding, yet it dominates many important flows that occur in the
ocean and atmosphere. The presentation will highlight some recent progress
in the characterisation, modelling and prediction of buoyancy-affected
turbulence using direct and large-eddy simulations, along with implications
for the characterisation of mixing in the ocean and the low-cloud feedback
in the atmosphere. Specifically, direct numerical simulation data of
stratified turbulence will be employed to highlight the importance of
boundaries in the characterisation of turbulent mixing in the ocean. Then,
a subgrid-scale model that captures the anisotropic character of stratified
mixing will be developed for large-eddy simulation of buoyancy-affected
turbulence. Finally, the subgrid-scale model is utilised to perform a
systematic large-eddy simulation investigation of the archetypal low-cloud
regimes, from which the link between the lower-tropospheric stability
criterion and the cloud fraction is interpreted.

Coincidences 14:10 Mon 20 May, 2013 :: 7.15 Ingkarni Wardli :: A/Prof. Robb Muirhead :: School of Mathematical Sciences
This is a lighthearted (some would say content-free) talk about coincidences, those surprising concurrences of events that are often perceived as meaningfully related, with no apparent causal connection. Time permitting, it will touch on topics like:
Patterns in data and the dangers of looking for patterns, unspecified ahead of time, and trying to "explain" them; e.g. post hoc subgroup analyses, cancer clusters, conspiracy theories ...
Matching problems; e.g. the birthday problem and extensions
People who win a lottery more than once: how surprised should we really be? What's the question we should be asking?
When you become familiar with a new word, and see it again soon afterwards, how surprised should you be?
Caution: This is a shortened version of a talk that was originally prepared for a group of non-mathematicians and non-statisticians, so it's mostly non-technical. It probably does not contain anything you don't already know; it will be an amazing coincidence if it does!
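
The birthday problem mentioned in the list above is easy to compute directly. The sketch below assumes, as the classical version does, that birthdays are independent and uniformly spread over 365 days; the function name is hypothetical.

```python
def shared_birthday_probability(people, days=365):
    """Probability that at least two of `people` share a birthday."""
    if people > days:
        return 1.0  # pigeonhole principle
    p_all_distinct = 1.0
    for k in range(people):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


p22 = shared_birthday_probability(22)
p23 = shared_birthday_probability(23)
```

With 23 people the probability is about 0.507, so a shared birthday is more likely than not, which is why such "coincidences" are less surprising than they feel.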

Fire-Atmosphere Models 12:10 Mon 29 Jul, 2013 :: B.19 Ingkarni Wardli :: Mika Peace :: University of Adelaide
Fire behaviour models are increasingly being used to assist in planning and operational decisions for bush fires and fuel reduction burns. Rate of spread (ROS) of the fire front is a key output of such models. The ROS value is typically calculated from a formula which has been derived from empirical data, using very simple meteorological inputs. We have used a coupled fire-atmosphere model to simulate real bushfire events. The results show that complex interactions between a fire and the atmosphere can have a significant influence on fire spread, thus highlighting the limitations of a model that uses simple meteorological inputs.

Privacy-Preserving Computation: Not just for secretive millionaires* 12:10 Mon 19 Aug, 2013 :: B.19 Ingkarni Wardli :: Wilko Henecka :: University of Adelaide
PPC enables parties to share information while preserving their data privacy.
I will introduce the concept, show a common ingredient and illustrate its use in an example.
*See Yao's Millionaires' Problem.

Medical Decision Analysis 12:10 Mon 2 Sep, 2013 :: B.19 Ingkarni Wardli :: Eka Baker :: University of Adelaide
Doctors make life-changing decisions every day based on clinical trial data. However, this data is often obtained from studies on healthy individuals or on patients with only the disease that a treatment is targeting. Outside of these studies, many patients will have other conditions that may affect the predicted benefit of receiving a certain treatment. I will talk about what clinical trials are, how to measure the benefit of treatments, and how having multiple conditions (comorbidities) will affect the benefit of treatments.

Classification Using Censored Functional Data 15:10 Fri 18 Oct, 2013 :: B.18 Ingkarni Wardli :: A/Prof Aurore Delaigle :: University of Melbourne
We consider classification of functional data. This problem has received a lot of attention in the literature in the case where the curves are all observed on the same interval. A difficulty in applications is that the functional curves can be supported on quite different intervals, in which case standard methods of analysis cannot be used. We are interested in constructing classifiers for curves of this type. More precisely, we consider classification of functions supported on a compact interval, in cases where the training sample consists of functions observed on other intervals, which may differ among the training curves.
We propose several methods, depending on whether or not the observable intervals
overlap by a significant amount. In the case where these intervals differ a lot, our procedure involves extending the curves outside the interval where they were observed. We suggest a new nonparametric approach for doing this.
We also introduce flexible ways of combining potential differences in shapes of the curves from different populations, and potential differences between the endpoints of
the intervals where the curves from each population are observed. 

All at sea with spectral analysis 11:10 Tue 19 Nov, 2013 :: Ingkarni Wardli Level 5 Room 5.56 :: A/Prof Andrew Metcalfe :: The University of Adelaide
The steady state response of a single degree of freedom damped linear system to a sinusoidal input is a sinusoidal function at the same frequency, but generally with a different amplitude and a phase shift. The analogous result for a random stationary input can be described in terms of input and response spectra and a transfer function description of the linear system.
The practical use of this result is that the parameters of a linear system can be estimated from the input and response spectra, and the response spectrum can be predicted if the transfer function and input spectrum are known.
I shall demonstrate these results with data from a small ship in the North Sea. The results from the sea trial raise the issue of nonlinearity, and second order amplitude response functions are obtained using autoregressive estimators.
The possibility of using wavelets rather than spectra is considered in the context of single degree of freedom linear systems.
Everybody welcome to attend.
Please note the change of venue: we will be in room 5.56
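
The steady state result described in this abstract can be sketched numerically. The example assumes the textbook mass-spring-damper form m x'' + c x' + k x = f(t) for the single degree of freedom system, which the abstract does not spell out. The parameter values and function names are hypothetical.

```python
import cmath
import math


def frequency_response(omega, mass, damping, stiffness):
    """Complex transfer function H(omega) of m x'' + c x' + k x = f(t)."""
    return 1.0 / (stiffness - mass * omega ** 2 + 1j * damping * omega)


def gain_and_phase(omega, mass, damping, stiffness):
    """Amplitude ratio and phase shift of the steady state sinusoidal response."""
    h = frequency_response(omega, mass, damping, stiffness)
    return abs(h), cmath.phase(h)


def response_spectrum(input_spectrum, omega, mass, damping, stiffness):
    """Random stationary input: response spectrum = |H|^2 times input spectrum."""
    gain, _ = gain_and_phase(omega, mass, damping, stiffness)
    return gain ** 2 * input_spectrum


# Hypothetical system: m = 1, c = 0.2, k = 4, so the natural frequency is 2 rad/s.
static_gain, static_phase = gain_and_phase(0.0, 1.0, 0.2, 4.0)
resonant_gain, resonant_phase = gain_and_phase(2.0, 1.0, 0.2, 4.0)
```

At zero frequency the response follows the input with gain 1/k and no phase shift, while at the natural frequency the gain is much larger and the response lags the input by a quarter cycle.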

The structuring role of chaotic stirring on pelagic ecosystems 11:10 Fri 28 Feb, 2014 :: B19 Ingkarni Wardli :: Dr Francesco d'Ovidio :: Universite Pierre et Marie Curie (Paris VI)
The open ocean upper layer is characterized by complex transport dynamics occurring over different spatiotemporal scales. At the scale of 10-100 km, which covers the so-called mesoscale and part of the submesoscale, in situ and remote sensing observations detect strong variability in physical and biogeochemical fields like sea surface temperature, salinity, and chlorophyll concentration. Calculations of Lyapunov exponents and other nonlinear diagnostics applied to the surface currents have shown that an important part of this tracer variability is due to chaotic stirring. Here I will extend this analysis to marine ecosystems. For primary producers, I will show that stable and unstable manifolds of hyperbolic points embedded in the surface velocity field are able to structure the phytoplanktonic community in fluid dynamical niches of dominant types, where competition can locally occur during bloom events. By using data from tagged whales, frigatebirds, and elephant seals, I will also show that chaotic stirring affects the behaviour of higher trophic levels. Looking ahead, these relations between transport structures and marine ecosystems could form the basis of a biodiversity index constructed from satellite information, and therefore able to monitor key aspects of marine biodiversity and its temporal variability at the global scale.

Viscoelastic fluids: mathematical challenges in determining their relaxation spectra 15:10 Mon 17 Mar, 2014 :: 5.58 Ingkarni Wardli :: Professor Russell Davies :: Cardiff University
Determining the relaxation spectrum of a viscoelastic fluid is a crucial step before a linear or nonlinear constitutive model can be applied. Information about the relaxation spectrum is obtained from simple flow experiments such as creep or oscillatory shear. However, the determination process involves the solution of one or more highly ill-posed inverse problems. The availability of only discrete data, the presence of noise in the data, as well as incomplete data, collectively make the problem very hard to solve.
In this talk I will illustrate the mathematical challenges inherent in determining relaxation spectra, and also introduce the method of wavelet regularization which enables the representation of a continuous relaxation spectrum by a set of hyperbolic scaling functions.


Bayesian Indirect Inference 12:10 Mon 14 Apr, 2014 :: B.19 Ingkarni Wardli :: Brock Hermans :: University of Adelaide
Bayesian likelihood-free methods emerged from the resurgence of Bayesian statistics through the use of computer sampling techniques. Since the resurgence, attention has focused on so-called 'summary statistics', that is, ways of summarising data that allow for accurate inference to be performed. However, it is not uncommon to find data sets in which the summary statistic approach is not sufficient.
In this talk, I will be summarising some of the most commonly used likelihood-free methods (don't worry if you've never seen any Bayesian analysis before), as well as looking at Bayesian Indirect Likelihood, a new way of implementing Bayesian analysis which combines new inference methods with some of the older computational algorithms.
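
For those who have never seen a likelihood-free method, the sketch below shows rejection sampling in the style of approximate Bayesian computation (ABC), probably the simplest method of this kind. It is not Bayesian Indirect Likelihood, and the abstract does not say which methods the talk covers. The model (normal data with unknown mean), the uniform prior, the tolerance and the toy data are all assumptions chosen for illustration.

```python
import random


def mean(values):
    """Sample mean, used here as the summary statistic."""
    return sum(values) / len(values)


def abc_rejection(observed, prior_draw, simulate, summary, tolerance, draws, rng):
    """Keep prior draws whose simulated summary lands close to the observed one."""
    target = summary(observed)
    accepted = []
    for _ in range(draws):
        theta = prior_draw(rng)
        simulated = simulate(theta, len(observed), rng)
        # No likelihood is evaluated: we only compare summaries of simulated data.
        if abs(summary(simulated) - target) <= tolerance:
            accepted.append(theta)
    return accepted


rng = random.Random(1)
# Toy "observed" data with sample mean 2.0, invented for illustration.
observed = [2.0 + 0.1 * (i - 9.5) for i in range(20)]
posterior_sample = abc_rejection(
    observed,
    prior_draw=lambda r: r.uniform(-5.0, 5.0),
    simulate=lambda theta, n, r: [r.gauss(theta, 1.0) for _ in range(n)],
    summary=mean,
    tolerance=0.2,
    draws=5000,
    rng=rng,
)
posterior_mean = mean(posterior_sample)
```

The accepted values approximate the posterior distribution of the unknown mean. How good the approximation is depends on the tolerance and on how informative the summary statistic is, which is exactly the weakness the abstract alludes to.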

Network-based approaches to classification and biomarker identification in metastatic melanoma 15:10 Fri 2 May, 2014 :: B.21 Ingkarni Wardli :: Associate Professor Jean Yee Hwa Yang :: The University of Sydney
Finding prognostic markers has been a central question in much of current research in medicine and biology. In the last decade, approaches to prognostic prediction within a genomics setting have been primarily based on changes in individual genes/proteins. Very recently, however, network-based approaches to prognostic prediction have begun to emerge which utilize interaction information between genes. This is based on the belief that large-scale molecular interaction networks are dynamic in nature and changes in these networks, rather than changes in individual genes/proteins, are often drivers of complex diseases such as cancer.
In this talk, I use data from stage III melanoma patients provided by Prof. Mann from Melanoma Institute of Australia to discuss how network information can be utilized in the analysis of gene expression data to aid in biological interpretation. Here, we explore a number of novel and previously published network-based prediction methods, which we will then compare to the common single-gene and gene-set methods with the aim of identifying more biologically interpretable biomarkers in the form of networks.

Group meeting 15:10 Fri 6 Jun, 2014 :: 5.58 Ingkarni Wardli :: Meng Cao and Trent Mattner :: University of Adelaide
Meng Cao :: Multiscale modelling couples patches of nonlinear wave-like simulations ::
Abstract:
The multiscale gap-tooth scheme is built from given microscale simulations of complicated physical processes to empower macroscale simulations. By coupling small patches of simulations over unsimulated physical gaps, large savings in computational time are possible. So far the gap-tooth scheme has been developed for dissipative systems, but wave systems are also of great interest. This article extends the gap-tooth scheme to the case of nonlinear microscale simulations of wave-like systems. Classic macroscale interpolation provides a generic coupling between patches that achieves arbitrarily high order consistency between the multiscale scheme and the underlying microscale dynamics. Eigenanalysis indicates that the resultant gap-tooth scheme empowers feasible computation of large scale simulations of wave-like dynamics with complicated underlying physics. As a pilot study, we implement numerical simulations of dam-breaking waves by the gap-tooth scheme. Comparison between a gap-tooth simulation, a microscale simulation over the whole domain, and some published experimental data on dam breaking, demonstrates that the gap-tooth scheme feasibly computes large scale wave-like dynamics with computational savings.
Trent Mattner :: Coupled atmosphere-fire simulations of the Canberra 2003 bushfires using WRF-SFire :: Abstract:
The Canberra fires of January 18, 2003 are notorious for the extreme fire behaviour and fire-atmosphere-topography interactions that occurred, including lee-slope fire channelling, pyrocumulonimbus development and tornado formation. In this talk, I will discuss coupled fire-weather simulations of the Canberra fires using WRF-SFire. In these simulations, a fire-behaviour model is used to dynamically predict the evolution of the fire front according to local atmospheric and topographic conditions, as well as the associated heat and moisture fluxes to the atmosphere. It is found that the predicted fire front and heat flux are not too bad, bearing in mind the complexity of the problem and the severe modelling assumptions made. However, the predicted moisture flux is too low, which has some impact on atmospheric dynamics.

All's Fair in Love and Statistics 12:35 Mon 28 Jul, 2014 :: B.19 Ingkarni Wardli :: Annie Conway :: University of Adelaide
Earlier this year Wired.com published an article about a "math genius" who found true love after scraping and analysing data from a dating site. In this talk I will be investigating the actual mathematics that he used, in particular methods for clustering categorical data, and whether or not the approach was successful.

Fast computation of eigenvalues and eigenfunctions on bounded plane domains 15:10 Fri 1 Aug, 2014 :: B.18 Ingkarni Wardli :: Professor Andrew Hassell :: Australian National University
I will describe a new method for numerically computing eigenfunctions and eigenvalues on certain plane domains, derived from the so-called "scaling method" of Vergini and Saraceno. It is based on properties of the Dirichlet-to-Neumann map on the domain, which relates a function f on the boundary of the domain to the normal derivative (at the boundary) of the eigenfunction with boundary data f. This is a topic of independent interest in pure mathematics. In my talk I will try to emphasize the interplay between theory and applications, which is very rich in this situation. This is joint work with numerical analyst Alex Barnett (Dartmouth).

Inferring absolute population and recruitment of southern rock lobster using only catch and effort data 12:35 Mon 22 Sep, 2014 :: B.19 Ingkarni Wardli :: John Feenstra :: University of Adelaide
Abundance estimates from a data-limited version of catch survey analysis are compared to those from a novel one-parameter deterministic method. Bias of both methods is explored using simulation testing based on a more complex data-rich stock assessment population dynamics fishery operating model, exploring the impact of both varying levels of observation error in data as well as model process error. Recruitment was consistently better estimated than legal-size population, the latter being most sensitive to increasing observation errors. A hybrid of the data-limited methods is proposed as the most robust approach. A more statistically conventional errors-in-variables approach may also be touched upon if time permits.

Optimally Chosen Quadratic Forms for Partitioning Multivariate Data 13:10 Tue 14 Oct, 2014 :: Ingkarni Wardli 715 Conference Room :: Assoc. Prof. Inge Koch :: School of Mathematical Sciences
Quadratic forms are commonly used in linear algebra. For d-dimensional vectors they have a matrix representation, Q(x) = x'Ax, for some symmetric matrix A. In statistics quadratic forms are defined for d-dimensional random vectors, and one of the best-known quadratic forms is the Mahalanobis distance of two random vectors.
In this talk we want to partition a quadratic form Q(X) = X'MX, where X is a random vector, and M a symmetric matrix, that is, we want to find a d-dimensional random vector W such that Q(X) = W'W. This problem has many solutions. We are interested in a solution or partition W of X such that pairs of corresponding variables (X_j, W_j) are highly correlated and such that W is simpler than the given X.
We will consider some natural candidates for W which turn out to be suboptimal in the sense of the above constraints, and we will then exhibit the optimal solution. Solutions of this type are useful in the well-known T-square statistic. We will see in examples what these solutions look like.
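
To make the partition problem concrete, the sketch below builds one of the many solutions W with Q(x) = W'W, using the Cholesky factorisation M = LL' and setting W = L'x. This is offered only as an example of a valid partition, not as the optimal solution of the talk, and it assumes M is positive definite (some such condition is needed, since W'W can never be negative). The function names are hypothetical.

```python
import math


def quadratic_form(x, m):
    """Evaluate Q(x) = x' M x for a vector x and square matrix m."""
    d = len(x)
    return sum(x[i] * m[i][j] * x[j] for i in range(d) for j in range(d))


def cholesky(m):
    """Lower-triangular L with L L' = m, for symmetric positive definite m."""
    d = len(m)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def partition(x, m):
    """Return W = L' x, so that W'W = x' L L' x = x' M x."""
    low = cholesky(m)
    d = len(x)
    return [sum(low[i][j] * x[i] for i in range(j, d)) for j in range(d)]


m = [[4.0, 2.0], [2.0, 3.0]]
x = [1.0, 2.0]
w = partition(x, m)
q = quadratic_form(x, m)
sum_of_squares = sum(wj * wj for wj in w)
```

Here Q(x) is 24 and the squared entries of W, 16 and 8, add up to the same value. Any orthogonal rotation of W gives another valid partition, which is why a further criterion such as the correlation between X_j and W_j is needed to single one out.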

Happiness and social information flow: Computational social science through data. 15:10 Fri 7 Nov, 2014 :: EM G06 (Engineering & Maths Bldg) :: Dr Lewis Mitchell :: University of Adelaide
The recent explosion in big data coming from online social networks has led to an increasing interest in bringing quantitative methods to bear on questions in social science. A recent high-profile example is the study of emotional contagion, which has led to significant challenges and controversy. This talk will focus on two issues related to emotional contagion, namely remote-sensing of population-level wellbeing and the problem of information flow across a social network. We discuss some of the challenges in working with massive online data sets, and present a simple tool for measuring large-scale happiness from such data. By combining over 10 million geolocated messages collected from Twitter with traditional census data we uncover geographies of happiness at the scale of states and cities, and discuss how these patterns may be related to traditional wellbeing measures and public health outcomes. Using tools from information theory we also study information flow between individuals and how this may relate to the concept of predictability for human behaviour.

Boundary behaviour of Hitchin and hypo flows with left-invariant initial data 12:10 Fri 27 Feb, 2015 :: Ingkarni Wardli B20 :: Vicente Cortes :: University of Hamburg
Hitchin and hypo flows constitute a system of first order PDEs for the construction of
Ricci-flat Riemannian metrics of special holonomy in dimensions 6, 7 and 8.
Assuming that the initial geometric structure is left-invariant, we study whether the resulting Ricci-flat manifolds can be extended in a natural way to complete Ricci-flat manifolds. This talk is based on joint work with Florin Belgun, Marco Freibert and Oliver Goertsches, see arXiv:1405.1866 (math.DG).

Multivariate regression in quantitative finance: sparsity, structure, and robustness 15:10 Fri 1 May, 2015 :: Engineering North N132 :: A/Prof Mark Coates :: McGill University
Many quantitative hedge funds around the world strive to predict future equity and futures returns based on many sources of information, including historical returns and economic data. This leads to a multivariate regression problem. Compared to many regression problems, the signal-to-noise ratio is extremely low, and profits can be realized if even a small fraction of the future returns can be accurately predicted. The returns generally have heavy-tailed distributions, further complicating the regression procedure.
In this talk, I will describe how we can impose structure into the regression problem in order to make detection and estimation of the very weak signals feasible. Some of this structure consists of an assumption of sparsity; some of it involves identification of common factors to reduce the dimension of the problem. I will also describe how we can formulate alternative regression problems that lead to more robust solutions that better match the performance metrics of interest in the finance setting. 

Medical Decision Making 12:10 Mon 11 May, 2015 :: Napier LG29 :: Eka Baker :: University of Adelaide
Practicing physicians make treatment decisions based on clinical trial data every day. This data is based on trials primarily conducted on healthy volunteers, or on those with only the disease in question. In reality, patients do have existing conditions that can affect the benefits and risks associated with receiving these treatments.
In this talk, I will explain how we modified an already existing Markov model to show the progression of treatment of a single condition over time. I will then explain how we adapted this to a different condition, and then created a combined model, which demonstrated how both diseases and treatments progressed on the same patient over their lifetime. 

Can mathematics help save energy in computing? 15:10 Fri 22 May, 2015 :: Engineering North N132 :: Prof Markus Hegland :: ANU
Recent development of computational hardware is characterised by two trends:
1. High levels of duplication of computational capabilities in multicore, parallel and GPU processing, and
2. Substantially faster development of the speed of computational technology compared to communication technology.
A consequence of these two trends is that energy costs of modern computing devices from mobile phones to
supercomputers are increasingly dominated by communication costs. In order to save energy one would thus
need to reduce the amount of data movement within the computer. This can be achieved by recomputing results
instead of communicating them. The resulting increase in computational redundancy may also be used to make
the computations more robust against hardware faults. Paradoxically, by doing more (computations) we do
use less (energy).
This talk will first discuss for a simple example how a mathematical understanding can be applied to improve
computational results using extrapolation. Then the problem of energy consumption in computational hardware
will be considered. Finally some recent work will be discussed which shows how redundant computing is used
to mitigate computational faults and thus to save energy.


Group Meeting 15:10 Fri 29 May, 2015 :: EM 213 :: Dr Judy Bunder :: University of Adelaide
Talk : Patch dynamics for efficient exascale simulations
Abstract
Massive parallelisation has led to a dramatic increase in available computational power.
However, data transfer speeds have failed to keep pace and are the major limiting factor in the development of exascale computing. New algorithms must be developed which minimise the transfer of data. Patch dynamics is a computational macroscale modelling scheme which provides a coarse macroscale solution of a problem defined on a fine microscale by dividing the domain into many non-overlapping, coupled patches. Patch dynamics is readily adaptable to massive parallelisation as each processor core can evaluate the dynamics on one, or a few, patches. However, patch coupling conditions interpolate across the unevaluated parts of the domain between patches and require almost continuous data transfer. We propose a modified patch dynamics scheme which minimises data transfer by only re-evaluating the patch coupling conditions at `mesoscale' time scales which are significantly larger than the microscale time of the microscale problem. We analyse and quantify the error arising from patch dynamics with mesoscale temporal coupling.

Be careful not to impute something ridiculous! 12:20 Mon 24 Aug, 2015 :: Benham Labs G10 :: Sarah James :: University of Adelaide
When learning how to make inferences about data, we are given all of the information with no missing values. In reality data sets are often missing data, anywhere from 5% of the data to extreme cases such as 70% of the data. Instead of getting rid of the incomplete cases we can impute predictions for each missing value and make inferences on the resulting data set. But just how sensible are our predictions? In this talk, we will learn how to deal with missing data and talk about why we have to be careful with our predictions.
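As a rough illustration of why imputed values need care, the sketch below fills missing entries with the observed mean and compares the spread of the data before and after. It is not taken from the talk: the function names `mean_impute` and `spread_before_and_after`, the standard normal data, and the assumption that values go missing completely at random are all hypothetical choices made for the example.

```python
import random
import statistics


def mean_impute(values):
    """Replace each missing entry (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]


def spread_before_and_after(n=2000, missing_rate=0.7, seed=1):
    """Return the standard deviation of the complete data and of the imputed data."""
    rng = random.Random(seed)
    complete = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Assumption: values go missing completely at random.
    with_gaps = [None if rng.random() < missing_rate else v for v in complete]
    imputed = mean_impute(with_gaps)
    return statistics.pstdev(complete), statistics.pstdev(imputed)


sd_complete, sd_imputed = spread_before_and_after()
print(f"sd of complete data:      {sd_complete:.2f}")
print(f"sd after mean imputation: {sd_imputed:.2f}")
```

With 70% of the values replaced by a single number, the imputed data set looks far less variable than the data really are, so any inference that relies on the spread would be overconfident.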

Ocean dynamics of Gulf St Vincent: a numerical study 12:10 Mon 2 Nov, 2015 :: Benham Labs G10 :: Henry Ellis :: University of Adelaide
The aim of this research is to determine the physical dynamics of ocean circulation within Gulf St. Vincent, South Australia, and the exchange of momentum, nutrients, heat, salt and other water properties between the gulf and shelf via Investigator Strait and Backstairs Passage. The project aims to achieve this through the creation of high-resolution numerical models, combined with new and historical observations from a moored instrument package, satellite data, and shipboard surveys.
The quasi-realistic high-resolution models are forced using boundary conditions generated by existing larger scale ROMS models, which in turn are forced at the boundary by a global model, creating a global to regional to local model network. Climatological forcing is done using European Centre for Medium-Range Weather Forecasts (ECMWF) data sets and is consistent over the regional and local models. A series of conceptual models are used to investigate the relative importance of separate physical processes in addition to fully forced quasi-realistic models.
An outline of the research to be undertaken is given:
• Connectivity of Gulf St. Vincent with shelf waters including seasonal variation due to wind and thermoclinic patterns;
• The role of winter time cooling and formation of eddies in flushing the gulf;
• The formation of a temperature front within the gulf during summer time; and
• The connectivity and importance of nutrient rich, cool, water upwelling from the Bonney Coast with the gulf via Backstairs Passage during summer time.

Modelling Coverage in RNA Sequencing 09:00 Mon 9 Nov, 2015 :: Ingkarni Wardli 5.57 :: Arndt von Haeseler :: Max F Perutz Laboratories, University of Vienna
RNA sequencing (RNA-seq) is the method of choice for measuring the expression of RNAs in a cell population. In an RNA-seq experiment, sequencing the full length of larger RNA molecules requires fragmentation into smaller pieces to be compatible with limited read lengths of most deep-sequencing technologies. Unfortunately, the issue of non-uniform coverage across a genomic feature has been a concern in RNA-seq and is attributed to preferences for certain fragments in steps of library preparation and sequencing. However, the disparity between the observed non-uniformity of read coverage in RNA-seq data and the assumption of expected uniformity elicits a query on the read coverage profile one should expect across a transcript, if there are no biases in the sequencing protocol. We propose a simple model of unbiased fragmentation where we find that the expected coverage profile is not uniform and, in fact, depends on the ratio of fragment length to transcript length. To compare the non-uniformity proposed by our model with experimental data, we extended this simple model to incorporate empirical attributes matching that of the sequenced transcript in an RNA-seq experiment. In addition, we imposed an experimentally derived distribution on the frequency at which fragment lengths occur.
We used this model to compare our theoretical prediction with experimental data and with the uniform coverage model. If time permits, we will also discuss a potential application of our model. 

Use of epidemic models in optimal decision making 15:00 Thu 19 Nov, 2015 :: Ingkarni Wardli 5.57 :: Tim Kinyanjui :: School of Mathematics, The University of Manchester
Epidemic models have proved useful in a number of applications in epidemiology. In this work, I will present two areas in which we have used modelling to make informed decisions. Firstly, we have used an age-structured mathematical model to describe the transmission of Respiratory Syncytial Virus (RSV) in a developed country setting and to explore different vaccination strategies. We found that delayed infant vaccination has significant potential in reducing the number of hospitalisations in the most vulnerable group and that most of the reduction is due to indirect protection. This also suggests that marked public health benefit could be achieved through RSV vaccine delivered to age groups not seen as most at risk of severe disease. The second application is in the optimal design of studies aimed at collection of household-stratified infection data. A design decision involves making a trade-off between the number of households to enrol and the sampling frequency. Two commonly used study designs are considered: cross-sectional and cohort. The search for an optimal design uses Bayesian methods to explore the joint parameter-design space combined with Shannon entropy of the posteriors to estimate the amount of information for each design. We found that for the cross-sectional designs, the amount of information increases with the sampling intensity while the cohort design often exhibits a trade-off between the number of households sampled and the intensity of follow-up. Our results broadly support the choices made in existing data collection studies.

A fixed point theorem on noncompact manifolds 12:10 Fri 12 Feb, 2016 :: Ingkarni Wardli B21 :: Peter Hochs :: University of Adelaide / Radboud University
For an elliptic operator on a compact manifold acted on by a compact Lie group, the Atiyah-Segal-Singer fixed point formula expresses its equivariant index in terms of data on fixed point sets of group elements. This can for example be used to prove Weyl's character formula. We extend the definition of the equivariant index to noncompact manifolds, and prove a generalisation of the Atiyah-Segal-Singer formula, for group elements with compact fixed point sets. In one example, this leads to a relation with characters of discrete series representations of semisimple Lie groups. (This is joint work with Hang Wang.)

How predictable are you? Information and happiness in social media. 12:10 Mon 21 Mar, 2016 :: Ingkarni Wardli Conference Room 715 :: Dr Lewis Mitchell :: School of Mathematical Sciences
The explosion of ``Big Data'' coming from online social networks and the like has opened up the new field of ``computational social science'', which applies a quantitative lens to problems traditionally in the domain of psychologists, anthropologists and social scientists. What does it mean to be influential? How do ideas propagate amongst populations? Is happiness contagious? For the first time, mathematicians, statisticians, and computer scientists can provide insight into these and other questions. Using data from social networks such as Facebook and Twitter, I will give an overview of recent research trends in computational social science, describe some of my own work using techniques like sentiment analysis and information theory in this realm, and explain how you can get involved with this highly rewarding research field as well.


Connecting within-host and between-host dynamics to understand how pathogens evolve 15:10 Fri 1 Apr, 2016 :: Engineering South S112 :: A/Prof Mark Tanaka :: University of New South Wales
Modern molecular technologies enable a detailed examination of the extent of genetic variation among isolates of bacteria and viruses. Mathematical models can help make inferences about pathogen evolution from such data. Because the evolution of pathogens ultimately occurs within hosts, it is influenced by dynamics within hosts including interactions between pathogens and hosts. Most models of pathogen evolution focus on either the within-host or the between-host level. Here I describe steps towards bridging the two scales. First, I present a model of influenza virus evolution that incorporates within-host dynamics to obtain the between-host rate of molecular substitution as a function of the mutation rate, the within-host reproduction number and other factors. Second, I discuss a model of viral evolution in which some hosts are immunocompromised, thereby extending opportunities for within-host virus evolution which then affects population-level evolution. Finally, I describe a model of Mycobacterium tuberculosis in which multidrug resistance evolves within hosts and spreads by transmission between hosts.

Mathematical modelling of the immune response to influenza 15:00 Thu 12 May, 2016 :: Ingkarni Wardli B20 :: Ada Yan :: University of Melbourne
The immune response plays an important role in the resolution of primary influenza infection and prevention of subsequent infection in an individual. However, the relative roles of each component of the immune response in clearing infection, and the effects of interaction between components, are not well quantified.
We have constructed a model of the immune response to influenza based on data from viral interference experiments, where ferrets were exposed to two influenza strains within a short time period. The changes in viral kinetics of the second virus due to the first virus depend on the strains used as well as the interval between exposures, enabling inference of the timing of innate and adaptive immune response components and the role of cross-reactivity in resolving infection. Our model provides a mechanistic explanation for the observed variation in viruses' abilities to protect against subsequent infection at short inter-exposure intervals, either by delaying the second infection or inducing stochastic extinction of the second virus. It also explains the decrease in recovery time for the second infection when the two strains elicit cross-reactive cellular adaptive immune responses. To account for inter-subject as well as inter-virus variation, the model is formulated using a hierarchical framework. We will fit the model to experimental data using Markov Chain Monte Carlo methods; quantification of the model will enable a deeper understanding of the effects of potential new treatments.


Time series analysis of paleoclimate proxies (a mathematical perspective) 15:10 Fri 27 May, 2016 :: Engineering South S112 :: Dr Thomas Stemler :: University of Western Australia
In this talk I will present the work my colleagues from the School of
Earth and Environment (UWA), the "trans disciplinary methods" group of
the Potsdam Institute for Climate Impact Research, Germany, and I did to
explain the dynamics of the Australian-South East Asian monsoon system
during the last couple of thousand years.
From a time series perspective paleoclimate proxy series are more or
less the monsters moving under your bed that wake you up in the middle
of the night. The data is clearly non-stationary and non-uniformly sampled in
time, and the influence of stochastic forcing and the level of measurement
noise are more or less unknown. Given these undesirable properties
almost all traditional time series analysis methods fail.
I will highlight two methods that allow us to draw useful conclusions
from the data sets. The first one uses Gaussian kernel methods to
reconstruct climate networks from multiple proxies. The coupling
relationships in these networks change over time and therefore can be
used to infer which areas of the monsoon system dominate the complex
dynamics of the whole system. Secondly I will introduce the
transformation cost time series method, which allows us to detect
changes in the dynamics of a non-uniformly sampled time series. Unlike the
frequently used interpolation approach, our new method does not corrupt
the data and therefore avoids biases in any subsequent analysis. While
I will again focus on paleoclimate proxies, the method can be used in
other applied areas, where regular sampling is not possible.


Student Performance Issues in First Year University Calculus 15:10 Fri 10 Jun, 2016 :: Engineering South S112 :: Dr Christine Mangelsdorf :: University of Melbourne
MAST10006 Calculus 2 is the largest subject in the School of Mathematics and Statistics at the University of Melbourne, accounting for about 2200 out of 7400 first year enrolments. Despite excellent and consistent feedback from students on lectures, tutorials and teaching materials, scaled failure rates in Calculus 2 averaged an unacceptably high 29.4% (with raw failure rates reaching 40%) by the end of 2014. To understand the issues behind the poor student performance, we studied the exam papers of students with grades of 40-49% over a three-year period. In this presentation, I will present data on areas of poor performance in the final exam, show samples of student work, and identify possible causes for their errors. Many of the performance issues are found to relate to basic weaknesses in the students' secondary school mathematical skills that inhibit their ability to successfully complete Calculus 2. Since 2015, we have employed a number of approaches to support students' learning that significantly improved student performance in assessment. I will discuss the changes made to assessment practices and extra support materials provided online and in person, that are driving the improvement.

Mathematical modelling of social spreading processes 15:10 Fri 19 Aug, 2016 :: Napier G03 :: Prof Hans De Sterck :: Monash University
Social spreading processes are intriguing manifestations of how humans interact and shape each others' lives. There is great interest in improving our understanding of these processes, and the increasing availability of empirical information in the era of big data and online social networks, combined with mathematical and computational modelling techniques, offers compelling new ways to study these processes.
I will first discuss mathematical models for the spread of political revolutions on social networks. The influence of online social networks and social media on the dynamics of the Arab Spring revolutions of 2011 is of particular interest in our work. I will describe a hierarchy of models, starting from agent-based models realized on empirical social networks, and ending up with population-level models that summarize the dynamical behaviour of the spreading process. We seek to understand quantitatively how political revolutions may be facilitated by the modern online social networks of social media.
The second part of the talk will describe a population-level model for the social dynamics that cause cigarette smoking to spread in a population. Our model predicts that more individualistic societies will show faster adoption and cessation of smoking. Evidence from a newly composed century-long composite data set on smoking prevalence in 25 countries supports the model, with potential implications for public health interventions around the world.
Throughout the talk, I will argue that important aspects of social spreading processes can be revealed and understood via quantitative mathematical and computational models matched to empirical data.
This talk describes joint work with John Lang and Danny Abrams. 

A principled experimental design approach to big data analysis 15:10 Fri 23 Sep, 2016 :: Napier G03 :: Prof Kerrie Mengersen :: Queensland University of Technology
Big Datasets are endemic, but they are often notoriously difficult to analyse because of their size, complexity, history and quality. The purpose of this paper is to open a discourse on the use of modern experimental design methods to analyse Big Data in order to answer particular questions of interest. By appeal to a range of examples, it is suggested that this perspective on Big Data modelling and analysis has wide generality and advantageous inferential and computational properties. In particular, the principled experimental design approach is shown to provide a flexible framework for analysis that, for certain classes of objectives and utility functions, delivers equivalent answers compared with analyses of the full dataset. It can also provide a formalised method for iterative parameter estimation, model checking, identification of data gaps and evaluation of data quality. Finally it has the potential to add value to other Big Data sampling algorithms, in particular divide-and-conquer strategies, by determining efficient subsamples.

Measuring and mapping carbon dioxide from remote sensing satellite data 15:10 Fri 21 Oct, 2016 :: Napier G03 :: Prof Noel Cressie :: University of Wollongong
This talk is about environmental statistics for global remote sensing of atmospheric carbon dioxide, a leading greenhouse gas. An important compartment of the carbon cycle is the atmosphere, where carbon dioxide (CO2) and other gases contribute to climate change through a greenhouse effect. There are a number of CO2 observational programs where measurements are made around the globe at a small number of ground-based locations at somewhat regular time intervals. In contrast, satellite-based programs are spatially global but give up some of the temporal richness. The most recent satellite launched to measure CO2 was NASA's Orbiting Carbon Observatory-2 (OCO-2), whose principal objective is to retrieve a geographical distribution of CO2 sources and sinks. OCO-2's measurement of column-averaged mole fraction, XCO2, is designed to achieve this, through a data-assimilation procedure that is statistical at its basis. Consequently, uncertainty quantification is key, starting with the spectral radiances from an individual sounding to borrowing of strength through spatial-statistical modelling.

Toroidal Soap Bubbles: Constant Mean Curvature Tori in S^3 and R^3 12:10 Fri 28 Oct, 2016 :: Ingkarni Wardli B18 :: Emma Carberry :: University of Sydney
Constant mean curvature (CMC) tori in S^3, R^3 or H^3 are in bijective correspondence with spectral curve data, consisting of a hyperelliptic curve, a line bundle on this curve and some additional data, which in particular determines the relevant space form. This point of view is particularly relevant for considering moduli-space questions, such as the prevalence of tori amongst CMC planes and whether tori can be deformed. I will address these questions for the spherical and Euclidean cases, using Whitham deformations.


Algae meet the mathematics of multiplicative multifractals 12:10 Tue 2 May, 2017 :: Ingkarni Wardli Conference Room 715 :: Professor Tony Roberts :: School of Mathematical Sciences
There is much that is fragmented and rough in the world around us: clouds and landscapes are examples, as are algae.
We need fractal geometry to encompass these.
In practice we need multifractals: a composite of interwoven sets, each with their own fractal structure.
Multiplicative multifractals have known properties.
Optimising a fit between them and the data then empowers us to quantify subtle details of fractal geometry in applications, such as in algae distribution. 

Constructing differential string structures 14:10 Wed 7 Jun, 2017 :: EM213 :: David Roberts :: University of Adelaide
String structures on a manifold are analogous to spin structures, except instead of lifting the structure group through the extension Spin(n)\to SO(n) of Lie groups, we need to lift through the extension String(n)\to Spin(n) of Lie *2-groups*. Such a thing exists if the first fractional Pontryagin class (1/2)p_1 vanishes in cohomology. A differential string structure also lifts connection data, but this is rather complicated, involving a number of locally defined differential forms satisfying cocycle-like conditions. This is an expansion of the geometric string structures of Stolz and Redden, which is, for a given connection A, merely a 3-form R on the frame bundle such that dR = tr(F^2) for F the curvature of A; in other words a trivialisation of the de Rham class of (1/2)p_1. I will present work in progress on a framework (and specific results) that allows explicit calculation of the differential string structure for a large class of homogeneous spaces, which also yields formulas for the Stolz-Redden form. I will comment on the application to verifying the refined Stolz conjecture for our particular class of homogeneous spaces. Joint work with Ray Vozzo.

In space there is no-one to hear you scream 12:10 Tue 12 Sep, 2017 :: Ingkarni Wardli 5.57 :: A/Prof Gary Glonek :: School of Mathematical Sciences
Modern data problems often involve data in very high dimensions. For example, gene expression profiles, used to develop cancer screening models, typically have at least 30,000 dimensions. When dealing with such data, it is natural to apply intuition from low dimensional cases. For example, in a sample of normal observations, a typical data point will be near the centre of the distribution with only a small number of points at the edges.
In this talk, simple probability theory will be used to show that the geometry of data in high dimensional space is very different from what we can see in one and two-dimensional examples. We will show that the typical data point is at the edge of the distribution, a long way from its centre and even further from any other points.
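The claim in this abstract can be checked numerically with a small simulation. The sketch below is not from the talk: it assumes independent standard normal coordinates, and the function name `typical_distances` and the sample size of 100 points are hypothetical choices. For each dimension it reports the mean distance of a point from the centre and the mean distance from a point to its nearest neighbour.

```python
import math
import random


def typical_distances(dim, n_points=100, seed=0):
    """Mean distance to the centre and mean nearest-neighbour distance."""
    rng = random.Random(seed)
    # Assumption: each coordinate is an independent standard normal draw.
    points = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_points)]
    origin = [0.0] * dim
    to_centre = [math.dist(p, origin) for p in points]
    to_nearest = [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    return sum(to_centre) / n_points, sum(to_nearest) / n_points


for dim in (1, 2, 100, 1000):
    centre, nearest = typical_distances(dim)
    print(f"dim={dim:5d}  mean distance to centre={centre:6.2f}  "
          f"mean nearest-neighbour distance={nearest:6.2f}")
```

In one dimension the nearest neighbour is much closer than the centre. In 1000 dimensions the distance to the centre is close to the square root of the dimension, and the nearest neighbour is further away still.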

Understanding burn injuries and first aid treatment using simple mathematical models 15:10 Fri 13 Oct, 2017 :: Ingkarni Wardli B17 :: Prof Mat Simpson :: Queensland University of Technology
Scald burns from accidental exposure to hot liquids are the most common cause of burn injury in children. Over 2000 children are treated for accidental burn injuries in Australia each year. Despite the frequency of these injuries, basic questions about the physics of heat transfer in living tissues remain unanswered. For example, skin thickness varies with age and anatomical location, yet our understanding of how tissue damage from thermal injury is influenced by skin thickness is surprisingly limited. In this presentation we will consider a series of porcine experiments to study heat transfer in living tissues. We consider burning the living tissue, as well as applying various first aid treatment strategies to cool the living tissue after injury. By calibrating solutions of simple mathematical models to match the experimental data we provide insight into how thermal energy propagates through living tissues, as well as exploring different first aid strategies. We conclude by outlining some of our current work that aims to produce more realistic mathematical models. 

The Markovian binary tree applied to demography and conservation biology 15:10 Fri 27 Oct, 2017 :: Ingkarni Wardli B17 :: Dr Sophie Hautphenne :: University of Melbourne
Markovian binary trees form a general and tractable class of continuous-time branching processes, which makes them well-suited for real-world applications. Thanks to their appealing probabilistic and computational features, these processes have proven to be an excellent modelling tool for applications in population biology. Typical performance measures of these models include the extinction probability of a population, the distribution of the population size at a given time, the total progeny size until extinction, and the asymptotic population composition. Besides giving an overview of the main performance measures and the techniques involved to compute them, we discuss recently developed statistical methods to estimate the model parameters, depending on the accuracy of the available data. We illustrate our results in human demography and in conservation biology.

Calculating optimal limits for transacting credit card customers 15:10 Fri 2 Mar, 2018 :: Horace Lamb 1022 :: Prof Peter Taylor :: University of Melbourne
Credit card users can roughly be divided into `transactors', who pay off their balance each month, and `revolvers', who maintain an outstanding balance, on which they pay substantial interest.
In this talk, we focus on modelling the behaviour of an individual transactor customer. Our motivation is to calculate an optimal credit limit from the bank's point of view. This requires an expression for the expected outstanding balance at the end of a payment period.
We establish a connection with the classical newsvendor model. Furthermore, we derive the Laplace transform of the outstanding balance, assuming that purchases are made according to a marked point process and that there is a simplified balance control policy which prevents all purchases in the rest of the payment period when the credit limit is exceeded. We then use the newsvendor model and our modified model to calculate bounds on the optimal credit limit for the more realistic balance control policy that accepts all purchases that do not exceed the limit.
We illustrate our analysis using a compound Poisson process example and show that the optimal limit scales with the distribution of the purchasing process, while the probability of exceeding the optimal limit remains constant.
Finally, we apply our model to some real credit card purchase data. 
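For readers unfamiliar with the classical newsvendor model mentioned in this abstract, the sketch below shows its textbook form only, not the credit-limit model of the talk. It assumes exponentially distributed demand, and the function names and the cost values (3 and 1) are hypothetical. The optimal quantity is the demand quantile at the critical ratio of underage cost to total cost.

```python
import math


def newsvendor_quantity(mean_demand, underage_cost, overage_cost):
    """Optimal quantity for exponential demand: the critical-ratio quantile."""
    ratio = underage_cost / (underage_cost + overage_cost)
    # Quantile function of the exponential distribution with the given mean.
    return -mean_demand * math.log(1.0 - ratio)


def exceed_probability(quantity, mean_demand):
    """Probability that exponential demand exceeds the given quantity."""
    return math.exp(-quantity / mean_demand)


for mean_demand in (100.0, 200.0):
    q = newsvendor_quantity(mean_demand, underage_cost=3.0, overage_cost=1.0)
    p = exceed_probability(q, mean_demand)
    print(f"mean demand={mean_demand:6.1f}  optimal quantity={q:7.2f}  "
          f"P(demand > quantity)={p:.2f}")
```

In this simplified setting, doubling the mean demand doubles the optimal quantity while the probability of exceeding it stays at one minus the critical ratio, which mirrors the scaling behaviour described in the abstract.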

Models, machine learning, and robotics: understanding biological networks 15:10 Fri 16 Mar, 2018 :: Horace Lamb 1022 :: Prof Steve Oliver :: University of Cambridge
The availability of complete genome sequences has enabled the construction of computer models of metabolic networks that may be used to predict the impact of genetic mutations on growth and survival. Both logical and constraint-based models of the metabolic network of the model eukaryote, the ale yeast Saccharomyces cerevisiae, have been available for some time and are continually being improved by the research community. While such models are very successful at predicting the impact of deleting single genes, the prediction of the impact of higher order genetic interactions is a greater challenge. Initial studies of limited gene sets provided encouraging results. However, the availability of comprehensive experimental data for the interactions between genes involved in metabolism demonstrated that, while the models were able to predict the general properties of the genetic interaction network, their ability to predict interactions between specific pairs of metabolic genes was poor. I will examine the reasons for this poor performance and demonstrate ways of improving the accuracy of the models by exploiting the techniques of machine learning and robotics.
The utility of these metabolic models rests on the firm foundations of genome sequencing data. However, there are two major problems with these kinds of network models: they have no dynamics, and they do not deal with the uncertain and incomplete nature of much biological data. To deal with these problems, we have developed the Flexible Nets (FNs) modelling formalism. FNs were inspired by Petri Nets and can deal with missing or uncertain data, incorporate both dynamics and regulation, and also have the potential for model predictive control of biotechnological processes.


Quantifying language change 15:10 Fri 1 Jun, 2018 :: Horace Lamb 1022 :: A/Prof Eduardo Altmann :: University of Sydney
Mathematical methods to study natural language are increasingly important because of the ubiquity of textual data on the Internet. In this talk I will discuss mathematical models and statistical methods to quantify the variability of language, with a focus on two problems: (i) How has the vocabulary of languages changed over the last centuries? (ii) How do the languages of scientific disciplines relate to each other, and how have they evolved in the last decades? One of the main challenges of these analyses stems from universal properties of word frequencies, which show high temporal variability and have fat-tailed distributions. The latter feature dramatically affects the statistical properties of entropy-based estimators, which motivates us to compare vocabularies using a generalized Jensen-Shannon divergence (obtained from entropies of order alpha).
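A minimal sketch of a Jensen-Shannon divergence between two word-frequency distributions is given below. The abstract does not say which order-alpha entropy is used, so the form H_alpha(p) = (sum_i p_i^alpha - 1) / (1 - alpha) is an assumption, as are the function names and the toy texts. Setting alpha = 1 gives the usual Shannon-based divergence in nats.

```python
import math
from collections import Counter


def frequencies(tokens):
    """Relative word frequencies of a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def entropy(p, alpha=1.0):
    """Entropy of order alpha; alpha = 1 is the Shannon entropy in nats."""
    if alpha == 1.0:
        return -sum(x * math.log(x) for x in p.values() if x > 0)
    # Assumed generalisation: (sum_i p_i^alpha - 1) / (1 - alpha).
    return (sum(x ** alpha for x in p.values() if x > 0) - 1.0) / (1.0 - alpha)


def js_divergence(p, q, alpha=1.0):
    """Entropy of the equal-weight mixture minus the mean of the two entropies."""
    words = set(p) | set(q)
    mixture = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in words}
    return entropy(mixture, alpha) - 0.5 * entropy(p, alpha) - 0.5 * entropy(q, alpha)


old_text = "the cat sat on the mat".split()
new_text = "the dog sat on the sofa".split()
p, q = frequencies(old_text), frequencies(new_text)
for alpha in (0.5, 1.0, 2.0):
    print(f"alpha={alpha}: divergence={js_divergence(p, q, alpha):.4f}")
```

Varying alpha changes how much weight the rare words in the tail receive relative to the frequent ones, which is one reason the choice of order matters for fat-tailed word frequencies.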

Quantifying language change 15:10 Fri 1 Jun, 2018 :: Napier 208 :: A/Prof Eduardo Altmann :: University of Sydney
Mathematical methods to study natural language are increasingly important because of the ubiquity of textual data on the Internet. In this talk I will discuss mathematical models and statistical methods to quantify the variability of language, with a focus on two problems: (i) How has the vocabulary of languages changed over the last centuries? (ii) How do the languages of scientific disciplines relate to each other, and how have they evolved in the last decades? One of the main challenges of these analyses stems from universal properties of word frequencies, which show high temporal variability and have fat-tailed distributions. The latter feature dramatically affects the statistical properties of entropy-based estimators, which motivates us to compare vocabularies using a generalized Jensen-Shannon divergence (obtained from entropies of order alpha).

Projected Particle Filters 15:10 Fri 24 Aug, 2018 :: Lower Napier LG15 :: Dr John Maclean :: University of Adelaide
Scientific advances owe equally to models and data, and both will remain relevant and key to further understanding. Observations drive model development, and model development often drives data acquisition. It is therefore particularly prudent to have these two sides of the scientific coin work in concert. This is a mathematical and statistical question: how to combine the output of model investigations and observational data. The area dedicated to studying and developing the best approaches to this issue is called Data Assimilation (DA). Perhaps the most crucial modern-day application of DA is numerical weather prediction, but it is also used in GPS systems and studies of atmospheric conditions on other planets.
I will take the probabilistic or Bayesian approach to DA. At a particular time at which data are available, the question of data assimilation is how to approximate the posterior or analysis distribution, which is found by conditioning the "forecast distribution" on the data. A key method under this umbrella is the particle filter, which approximates the forecast and posterior distributions with an ensemble of weighted particles.
The talk will focus on a contribution to particle filtering made from a dynamical systems point of view. I will introduce a framework for Particle Filtering, PF-AUS, in which only the components of data corresponding to the unstable and neutral modes of the forecast model are assimilated.
The particle filter is well suited to nonlinear forecast models and non-Gaussian forecast distributions, but would normally require exponentially more computational effort as the dimension of the DA problem increases. The PF-AUS implementation is shown to correspond to assimilating observations of a lower dimension, equal to the number of Lyapunov exponents. The dimension of the observations is crucial in the computational cost of the particle filter, and this approach is a framework to drastically lower that cost while preserving as much relevant information as possible, in that the unstable and neutral modes correspond to the most uncertain model predictions.
Particle filters are an active area of research in both the DA and the statistical communities, and there are many competing algorithms. One nice feature of PF-AUS is that it is not exactly an algorithm but rather a framework for filtering: any particle filter can be applied in the PF-AUS framework.
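
The forecast-then-condition cycle described above can be illustrated with a generic bootstrap particle filter for a scalar state. This is a minimal sketch of the standard method, not the projected framework from the talk; the direct observation of the state with Gaussian noise, the resampling threshold, and all names are assumptions made for illustration.

```python
import math
import random

def particle_filter_step(particles, weights, observation, forecast, obs_std, rng):
    """One forecast/analysis cycle of a bootstrap particle filter (scalar state).

    Assumes the state is observed directly with Gaussian noise of std obs_std.
    """
    n = len(particles)
    # Forecast: push every particle through the (possibly nonlinear) model.
    forecast_particles = [forecast(x, rng) for x in particles]
    # Analysis: multiply each weight by the likelihood of the observation.
    log_lik = [-0.5 * ((observation - x) / obs_std) ** 2 for x in forecast_particles]
    shift = max(log_lik)  # subtract the maximum to avoid underflow
    unnorm = [w * math.exp(l - shift) for w, l in zip(weights, log_lik)]
    total = sum(unnorm)
    new_weights = [u / total for u in unnorm]
    # Resample when the effective sample size drops below half the ensemble.
    ess = 1.0 / sum(w * w for w in new_weights)
    if ess < 0.5 * n:
        forecast_particles = rng.choices(forecast_particles, weights=new_weights, k=n)
        new_weights = [1.0 / n] * n
    return forecast_particles, new_weights

def weighted_mean(particles, weights):
    return sum(x * w for x, w in zip(particles, weights))

# Toy example: damped linear model with additive noise (hypothetical numbers).
rng = random.Random(0)
n = 2000
particles = [rng.gauss(0.0, 1.0) for _ in range(n)]
weights = [1.0 / n] * n
model = lambda x, r: 0.9 * x + r.gauss(0.0, 0.5)

particles, weights = particle_filter_step(particles, weights, 1.0, model, 0.5, rng)
print("posterior mean estimate:", weighted_mean(particles, weights))
```

In this one-dimensional toy problem a few thousand particles suffice. In high dimensions the weights tend to collapse onto a single particle, which is the cost problem that reducing the dimension of the assimilated observations is meant to ease.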

Topological Data Analysis 15:10 Fri 31 Aug, 2018 :: Napier 208 :: Dr Vanessa Robins :: Australian National University
Topological Data Analysis has grown out of work focussed on deriving qualitative and yet quantifiable information about the shape of data. The underlying assumption is that knowledge of shape, that is, the way the data are distributed, permits high-level reasoning and modelling of the processes that created the data. The 0th order aspect of shape is the number of pieces: "connected components" to a topologist; "clustering" to a statistician. Higher-order topological aspects of shape are holes, quantified as "non-bounding cycles" in homology theory. These signal the existence of some type of constraint on the data-generating process.
Homology lends itself naturally to computer implementation, but its naive application is not robust to noise. This inspired the development of persistent homology: an algebraic topological tool that measures changes in the topology of a growing sequence of spaces (a filtration). Persistent homology provides invariants called barcodes or persistence diagrams, which are sets of intervals recording the birth and death parameter values of each homology class in the filtration. It captures information about the shape of data over a range of length scales, and enables the identification of "noisy" topological structure.
Statistical analysis of persistent homology has been challenging because the raw information (the persistence diagrams) is provided as sets of intervals rather than functions. Various approaches to converting persistence diagrams to functional forms have been developed recently, and have found application to data ranging from the distribution of galaxies, to porous materials, and cancer detection.
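
For the 0th order case (connected components), a barcode can be computed with a union-find pass over the pairwise distances sorted in increasing order. The sketch below assumes a filtration in which two points are joined once the scale parameter reaches their Euclidean distance; the function name and the sample points are hypothetical, and higher-order homology needs a dedicated library.

```python
import math
from itertools import combinations

def h0_barcode(points):
    """0-dimensional persistence intervals (birth, death) of a point cloud.

    Every point is born as its own component at scale 0; a component dies
    at the scale where it merges into another one.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    bars = []
    for dist, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj          # two components merge at this scale
            bars.append((0.0, dist))
    bars.append((0.0, math.inf))     # one component never dies
    return bars

# Two well-separated pairs of points.
cloud = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
for birth, death in h0_barcode(cloud):
    print(birth, death)
```

Short intervals correspond to "noisy" structure, while the intervals that persist over a wide range of scales (here two of them) indicate the number of clusters.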
News matching "Data mining" 
ARC Grant successes The School of Mathematical Sciences has again had outstanding success in the ARC Discovery and Linkage Projects schemes.
Congratulations to the following staff for their success in the Discovery Project scheme:
Prof Nigel Bean, Dr Josh Ross, Prof Phil Pollett, Prof Peter Taylor, New methods for improving active adaptive management in biological systems, $255,000 over 3 years;
Dr Josh Ross, New methods for integrating population structure and stochasticity into models of disease dynamics, $248,000 over 3 years;
A/Prof Matt Roughan, Dr Walter Willinger, Internet traffic-matrix synthesis, $290,000 over 3 years;
Prof Patricia Solomon, A/Prof John Moran, Statistical methods for the analysis of critical care data, with application to the Australian and New Zealand Intensive Care Database, $310,000 over 3 years;
Prof Mathai Varghese, Prof Peter Bouwknegt, Supersymmetric quantum field theory, topology and duality, $375,000 over 3 years;
Prof Peter Taylor, Prof Nigel Bean, Dr Sophie Hautphenne, Dr Mark Fackrell, Dr Malgorzata O'Reilly, Prof Guy Latouche, Advanced matrix-analytic methods with applications, $600,000 over 3 years.
Congratulations to the following staff for their success in the Linkage Project scheme:
Prof Simon Beecham, Prof Lee White, A/Prof John Boland, Prof Phil Howlett, Dr Yvonne Stokes, Mr John Wells, Paving the way: an experimental approach to the mathematical modelling and design of permeable pavements, $370,000 over 3 years;
Dr Amie Albrecht, Prof Phil Howlett, Dr Andrew Metcalfe, Dr Peter Pudney, Prof Roderick Smith, Saving energy on trains - demonstration, evaluation, integration, $540,000 over 3 years.
Posted Fri 29 Oct 10. 
Publications matching "Data mining"

CleanBGP: Verifying the consistency of BGP data. Flavel, Ashley; Maennel, Olaf; Chiera, Belinda; Roughan, Matthew; Bean, Nigel, International Network Management Workshop, Orlando, Florida 19/10/08
Energy balanced data gathering in WSNs with grid topologies. Chen, J; Shen, Hong; Tian, Hui, 7th International Conference on Grid and Cooperative Computing, China 24/10/08
Mining unexpected temporal associations: Applications in detecting adverse drug reactions. Jin, H; Chen, J; He, H; Williams, G; Kelman, C; O'Keefe, Christine, IEEE Transactions on Information Technology in Biomedicine 12 (488–500) 2008
Data fusion without data fusion: localization and tracking without sharing sensitive information. Roughan, Matthew; Arnold, Jonathan, Information, Decision and Control 2007, Adelaide, Australia 12/02/07
Optimal multilinear estimation of a random vector under constraints of causality and limited memory. Howlett, P; Torokhti, Anatoli; Pearce, Charles, Computational Statistics & Data Analysis 52 (869–878) 2007
Statistics in review; Part 1: graphics, data summary and linear models. Moran, John; Solomon, Patricia, Critical care and Resuscitation 9 (81–90) 2007
Experimental Design and Analysis of Microarray Data. Wilson, C; Tsykin, Anna; Wilkinson, Christopher; Abbott, C, chapter in Bioinformatics (Elsevier Ltd) 1–36, 2006
Is BGP update storm a sign of trouble: Observing the internet control and data planes during internet worms. Roughan, Matthew; Li, J; Bush, R; Mao, Z; Griffin, T, SPECTS 2006, Calgary, Canada 31/07/06
Watching data streams toward a multihomed sink under routing changes introduced by a BGP beacon. Li, J; Bush, R; Mao, Z; Griffin, T; Roughan, Matthew; Stutzbach, D; Purpus, E, PAM2006, Adelaide, Australia 30/03/06
Data-recursive smoother formulae for partially observed discrete-time Markov chains. Elliott, Robert; Malcolm, William, Stochastic Analysis and Applications 24 (579–597) 2006
Optimal linear estimation and data fusion. Elliott, Robert; Van Der Hoek, John, IEEE Transactions on Automatic Control 51 (686–689) 2006
Secure distributed data-mining and its application to large-scale network measurements. Roughan, Matthew; Zhang, Y, Computer Communication Review 36 (7–14) 2006
Optimal estimation of a random signal from partially missed data. Torokhti, Anatoli; Howlett, P; Pearce, Charles, EUSIPCO 2006, Florence, Italy 04/09/06
Optimal recursive estimation of raw data. Torokhti, Anatoli; Howlett, P; Pearce, Charles, Annals of Operations Research 133 (285–302) 2005
Combining routing and traffic data for detection of IP forwarding anomalies. Roughan, Matthew; Griffin, T; Mao, M; Greenberg, A; Freeman, B, Sigmetrics - Performance 2004, New York, USA 12/06/04
IP forwarding anomalies and improving their detection using multiple data sources. Roughan, Matthew; Griffin, T; Mao, M; Greenberg, A; Freeman, B, SIGCOMM 2004, Oregon, USA 30/08/04
The data processing inequality and stochastic resonance. McDonnell, Mark; Stocks, N; Pearce, Charles; Abbott, Derek, Noise in Complex Systems and Stochastic Dynamics, Santa Fe, New Mexico, USA 01/06/03
Stochastic resonance and data processing inequality. McDonnell, Mark; Stocks, N; Pearce, Charles; Abbott, Derek, Electronics Letters 39 (1287–1288) 2003
Resampling-based multiple testing for microarray data analysis (Invited discussion of paper by Ge, Dudoit and Speed). Glonek, Garique; Solomon, Patricia, Test 12 (50–53) 2003
Best estimators of second degree for data analysis. Howlett, P; Pearce, Charles; Torokhti, Anatoli, ASMDA 2001, Compiegne, France 12/06/01
Optimal successive estimation of observed data. Torokhti, Anatoli; Howlett, P; Pearce, Charles, International Conference on Optimization: Techniques and Applications (5th: 2001), Hong Kong, China 15/12/01
Statistical analysis of medical data: New developments - Book review. Solomon, Patricia, Biometrics 57 (327–328) 2001
Disease surveillance and data collection issues in epidemic modelling. Solomon, Patricia; Isham, V, Statistical Methods in Medical Research 9 (259–277) 2000