Survey sampling and Markov Chain Monte Carlo algorithms
My thesis is about speeding up Markov Chain Monte Carlo (MCMC) algorithms for large data sets. The main idea for speeding up the computations is to use a subsample of the data to compute an estimate of the true likelihood (the likelihood from all data observations). This estimate replaces the true likelihood in the MCMC algorithm.
In any given iteration of the MCMC algorithm, the true log-likelihood can be viewed as a finite population total: it is a sum of one log-likelihood contribution per observation. This allows us to explore survey sampling techniques for efficient estimation of the log-likelihood. The smaller the subsample, the greater the speedup in computing time. It is therefore crucial to design efficient estimators and sampling schemes that allow us to work with a very small fraction of the data.
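To make the finite population view concrete, here is a minimal sketch in Python. It assumes a toy Gaussian model with unit variance and uses simple random sampling without replacement with the standard expansion estimator. The model, the function names and the subsample size are illustrative assumptions only, not the estimators or sampling schemes developed in the thesis.

```python
import numpy as np


def loglik_contributions(theta, y):
    # Per-observation log-likelihood contributions for an assumed toy model:
    # y_i ~ Normal(theta, 1). Each contribution is one "unit" in the population.
    return -0.5 * np.log(2.0 * np.pi) - 0.5 * (y - theta) ** 2


def full_loglik(theta, y):
    # The true log-likelihood is the population total of all n contributions.
    return np.sum(loglik_contributions(theta, y))


def srs_loglik_estimate(theta, y, m, rng):
    # Simple random sampling without replacement of m out of n units,
    # followed by the expansion estimator (n / m) * (sample total).
    n = y.shape[0]
    idx = rng.choice(n, size=m, replace=False)
    return (n / m) * np.sum(loglik_contributions(theta, y[idx]))


rng = np.random.default_rng(0)
y = rng.normal(loc=1.0, scale=1.0, size=100_000)  # simulated data, illustration only

exact = full_loglik(1.0, y)                         # touches all 100,000 observations
estimate = srs_loglik_estimate(1.0, y, 1_000, rng)  # touches 1% of the data
print(exact, estimate)
```

The expansion estimator is unbiased for the log-likelihood under this sampling design, and its variance grows as the subsample shrinks. This is the trade-off between speed and precision that motivates looking for more efficient estimators and sampling schemes.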
In this talk I will give an overview of this idea and how it is implemented in my research. I will present the methods in such a way that prior knowledge of MCMC algorithms is not required. The focus will be entirely on the survey sampling aspects of the problem.
I look forward to interacting with the audience, as I know that many of you are experts in this field.