Browsing by FOR 2008 "010405 Statistical Theory"
Now showing 1 - 11 of 11
Conference Publication: Connecting Experimental and Theoretical Perspectives
This paper presents a case study of a group of pre-service teachers (aged 21-52) as they work in a domain of stochastic abstraction to reason about "experimental" and "theoretical" perspectives. I am particularly interested in investigating whether pre-service teachers could construct a bidirectional link between the data-centric and modelling perspectives on distribution, similar to the tentative model I introduced elsewhere for coordinating the two perspectives on distribution. In this study, we have seen echoes of these ideas in relation to experimental and theoretical probabilities. The results show students' movement between probabilities at the micro level and the shape of histograms at the macro level.
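The experimental/theoretical link described in this abstract can be illustrated with a short simulation; the sketch below uses an arbitrary, assumed probability distribution (not data from the study) to show micro-level probabilities driving the macro-level shape of the empirical frequencies as the number of trials grows.

```python
# Illustrative sketch only (assumed die weights, not the paper's data):
# "experimental" frequencies approach the "theoretical" probabilities
# as the number of trials increases.
import numpy as np

rng = np.random.default_rng(0)
theoretical = np.array([0.1, 0.1, 0.2, 0.2, 0.2, 0.2])  # assumed micro-level probabilities

for n in (30, 300, 3000):
    rolls = rng.choice(6, size=n, p=theoretical)
    experimental = np.bincount(rolls, minlength=6) / n   # macro-level "shape"
    print(n, np.round(experimental, 3))
```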
Conference Publication: The Coordination of Distributional Thinking (International Association for Statistical Education (IASE) & International Statistical Institute (ISI), 2007)
My aim is to trace the thinking-in-change (Noss & Hoyles, 1996) during the coordination of two epistemologically distinct faces of distribution. By coordination here, I refer to the connection between a data-centric perspective on distribution, which identifies distribution as an aggregated set of actual outputs, and a modelling perspective on distribution, which views distribution as a set of possible outcomes and associated probabilities. The coordination requires that the learner connect, in both directions, the data that form a distribution of results with the modelling distribution. This dual connection is, I believe, at the heart of informal inference.
Conference Publication: Discriminating "Signal" and "Noise" in Computer-Generated Data (International Group for the Psychology of Mathematics Education (IGPME), 2010); Pratt, David
This paper presents a case study of a group of students (aged 14-15) as they use a computer-based domain of stochastic abstraction to begin to view spread, or noise, as dispersion from the signal. The results show that carefully designed computer tools, in which probability distribution is used as a generator of data, can facilitate the discrimination of signal and noise. This computational affordance of distribution is seen as related to classical statistical methods that aim to separate main effect from random error. In this study, we have seen how signal and noise can be recognised by students as an aspect of distribution. Students' discussion of computer-generated data and their sketches of the distribution express the idea that more variation is centred close to the signal, and less variation is located further away from it.
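A minimal sketch of the underlying idea, using assumed values rather than the study's microworld: when a probability distribution is used as a generator of data, most of the dispersion ("noise") sits close to the signal and progressively less sits further away.

```python
# A minimal sketch (assumed signal and spread, not the study's microworld):
# data generated from a distribution centred on a "signal", with noise as
# dispersion from that signal.
import numpy as np

rng = np.random.default_rng(1)
signal = 10.0                        # assumed true value the generator is aimed at
data = rng.normal(loc=signal, scale=1.5, size=1000)

noise = data - signal                # dispersion from the signal
within_1sd = np.mean(np.abs(noise) < 1.5)
within_3sd = np.mean(np.abs(noise) < 4.5)
print(f"signal estimate: {data.mean():.2f}")
print(f"share within 1 sd of signal: {within_1sd:.2f}, within 3 sd: {within_3sd:.2f}")
```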
Conference Publication: The Emergence of Distribution From Causal Roots
Our premise, in line with a constructivist approach, is that thinking about distribution, and about stochastic phenomena in general, must develop from resources already established. Our prior research has suggested that, given appropriate tools to think with, meanings for distribution might emerge out of knowledge about causality. In this study, based on the second author's ongoing doctoral research, we consider the relationship between the design of a microworld, in which students can control attempts to throw a ball into a basket, and the emergence of meanings for distribution. We suggest that the notion of statistical error, or noise, is a rich idea for helping students to bridge their deterministic and stochastic worlds.
Conference Publication: Harnessing the Causal to Illuminate the Stochastic
This study builds on prior work, which identified that students aged 11 years had sound intuitions for short-term randomness but had few tools for articulating patterns in longer-term randomness. That previous work did, however, identify the construction of new causal meanings for distribution when the students interacted with a computer-based microworld. Through a design research methodology, we are building new microworlds that aspire to capture how students might use knowledge about the deterministic to explain probability distribution as an emergent phenomenon. In this paper, we report on some insights gained from early iterations and show how we have embodied these ideas in a new microworld, not yet tested with students.
Journal Article (Open Access): Inference for Reaction Networks Using the Linear Noise Approximation (Wiley-Blackwell Publishing Ltd, 2014); Fearnhead, Paul; Giagos, Vasileios; Sherlock, Chris
We consider inference for the reaction rates in discretely observed networks such as those found in models for systems biology, population ecology, and epidemics. Most such networks are neither slow enough nor small enough for inference via the true state-dependent Markov jump process to be feasible. Typically, inference is conducted by approximating the dynamics through an ordinary differential equation (ODE) or a stochastic differential equation (SDE). The former ignores the stochasticity in the true model and can lead to inaccurate inferences. The latter is more accurate but is harder to implement, as the transition density of the SDE model is generally unknown. The linear noise approximation (LNA) arises from a first-order Taylor expansion of the approximating SDE about a deterministic solution and can be viewed as a compromise between the ODE and SDE models. It is a stochastic model, but discrete-time transition probabilities for the LNA are available through the solution of a series of ordinary differential equations. We describe how a restarting LNA can be efficiently used to perform inference for a general class of reaction networks; evaluate the accuracy of such an approach; and show how and when this approach is either statistically or computationally more efficient than ODE or SDE methods. We apply the LNA to analyze Google Flu Trends data from the North and South Islands of New Zealand, and are able to obtain more accurate short-term forecasts of new flu cases than another recently proposed method, although at a greater computational cost.
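As a rough illustration of the restarting LNA idea (not the paper's implementation or application), the sketch below applies it to a toy immigration-death network with assumed rates and observations: between observations, the deterministic mean and the LNA variance are propagated by ODEs, and each inter-observation transition contributes a Gaussian term to the likelihood.

```python
# Hedged sketch of a restarting LNA likelihood for a toy immigration-death
# network (constant birth rate lam, per-capita death rate mu); the model,
# data and parameter values are illustrative assumptions.
import numpy as np
from scipy.integrate import solve_ivp
from scipy.stats import norm

def lna_loglik(times, obs, lam, mu):
    """Gaussian log-likelihood of discrete observations under a restarting LNA."""
    loglik = 0.0
    for t0, t1, y0, y1 in zip(times[:-1], times[1:], obs[:-1], obs[1:]):
        # ODEs for the deterministic mean eta and the LNA variance v,
        # restarted at the last observation (eta = y0, v = 0).
        def rhs(t, z):
            eta, v = z
            deta = lam - mu * eta                 # macroscopic rate equation
            dv = -2.0 * mu * v + lam + mu * eta   # linear noise variance equation
            return [deta, dv]
        sol = solve_ivp(rhs, (t0, t1), [y0, 0.0], t_eval=[t1])
        eta1, v1 = sol.y[:, -1]
        loglik += norm.logpdf(y1, loc=eta1, scale=np.sqrt(v1))
    return loglik

# Example: evaluate the approximate likelihood at assumed parameter values.
times = np.arange(0.0, 6.0)
obs = np.array([20.0, 24.0, 27.0, 29.0, 31.0, 30.0])
print(lna_loglik(times, obs, lam=6.0, mu=0.2))
```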
Conference Publication: Making connections between the two perspectives on distribution
My premise, in line with a constructivist approach and Pratt's (1998) research, is that thinking about distribution must develop from causal meanings already established. The results of the third iteration of a design research study indicate support for my conjecture that it is possible to design an environment in which students' well-established causal meanings can be exploited to coordinate the emergent data-centric and modelling perspectives on distribution (Prodromou & Pratt, 2006). In this study, I report on the fourth iteration, which investigates how and whether students bridge the two perspectives on distribution.
Journal Article: Model-averaged confidence intervals for factorial experiments
We consider the coverage rate of model-averaged confidence intervals for the treatment means in a factorial experiment, when we use a normal linear model in the analysis. Model-averaging provides a useful compromise between using the full model (containing all main effects and interactions) and a "best model" obtained by some model-selection process. Use of the full model guarantees perfect coverage, whereas use of a best model is known to lead to narrow intervals with poor coverage. Model-averaging allows us to achieve good coverage using intervals that are also narrower than those from the full model. We compare four information criteria that might be used for model-averaging in this setting: AIC, AICc, AIC*c and BIC. In this setting, if the full model is 'truth', all the criteria will have perfect coverage rates asymptotically. We use simulation to assess the coverage rates and interval widths likely to be achieved by a confidence interval with a nominal coverage of 95%. Our results suggest that AIC performs best in terms of coverage rate; across a wide range of scenarios and replication levels, it consistently provides coverage rates within 1.5 percentage points of the nominal level, while also leading to reductions in interval width of up to 30% compared to the full model. AICc performed worst overall, with a coverage rate that was up to 5.2 percentage points too low. We recommend that model-averaging become standard practice when summarising the results of a factorial experiment in terms of the treatment means, and that AIC be used to perform the model-averaging.
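A simplified sketch of the model-averaging step (not the paper's exact interval construction; the data, the candidate-model set and the AIC formula up to an additive constant are assumptions): Akaike weights are formed from the candidate models' AIC values and used to average the fitted treatment means.

```python
# Simplified sketch (assumed data and candidate models): AIC weights and a
# model-averaged estimate of the cell means in a 2x2 factorial layout.
import numpy as np
from itertools import product

rng = np.random.default_rng(2)
levels = list(product([0, 1], [0, 1]))               # assumed 2x2 design, 5 replicates per cell
y = np.concatenate([rng.normal(10 + 2*a + b + 0.5*a*b, 1.0, size=5) for a, b in levels])
A = np.repeat([a for a, b in levels], 5)
B = np.repeat([b for a, b in levels], 5)

def fit(X, y):
    """Least-squares fit; returns fitted values and AIC (up to a constant)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, k = X.shape
    aic = n * np.log(resid @ resid / n) + 2 * (k + 1)
    return X @ beta, aic

one = np.ones_like(y, dtype=float)
candidates = {                                       # main-effects model vs full model
    "A+B": np.column_stack([one, A, B]),
    "A*B": np.column_stack([one, A, B, A * B]),
}
fits, aics = zip(*(fit(X, y) for X in candidates.values()))
aics = np.array(aics)
weights = np.exp(-(aics - aics.min()) / 2)
weights /= weights.sum()

# Model-averaged estimate of each cell mean = AIC-weighted combination of fits.
averaged = sum(w * f for w, f in zip(weights, fits))
print(dict(zip(candidates, np.round(weights, 3))))
print(np.round([averaged[(A == a) & (B == b)].mean() for a, b in levels], 2))
```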
Journal Article: A penalized likelihood approach to pooling estimates of covariance components from analyses by parts
Estimates of covariance matrices for numerous traits are commonly obtained by pooling results from a series of analyses of subsets of traits. A penalized maximum-likelihood approach is proposed to combine estimates from part analyses while constraining the resulting overall matrices to be positive definite. In addition, this provides scope for 'improving' estimates of individual matrices by applying a penalty to the likelihood aimed at borrowing strength from their phenotypic counterpart. A simulation study is presented showing that the new method performs well, yielding unpenalized estimates closer to the results of multivariate analyses considering all traits than various other techniques in use. In particular, combining results for all sources of variation simultaneously minimizes deviations in phenotypic estimates if sampling covariances can be approximated. A mild penalty shrinking estimates of individual covariance matrices towards their sum, or estimates of canonical eigenvalues towards their mean, proved advantageous in most cases. The method proposed is flexible, computationally undemanding, and provides combined estimates with good sampling properties; it is thus recommended as an alternative to current methods for pooling.
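The eigenvalue-shrinkage idea mentioned in this abstract can be illustrated with a minimal, hypothetical example (this is not the penalized ML algorithm itself; the matrix and shrinkage factor are assumed): pulling the eigenvalues of a pooled estimate towards their mean turns an indefinite matrix assembled from part-analyses into a positive definite one.

```python
# Minimal illustration (assumed values): shrinking the eigenvalues of a pooled
# covariance estimate towards their mean so that an indefinite matrix assembled
# from part-analyses becomes positive definite.
import numpy as np

pooled = np.array([[2.0, 1.8, 0.3],
                   [1.8, 1.0, 0.9],
                   [0.3, 0.9, 1.5]])               # assembled from part-analyses; not PD

vals, vecs = np.linalg.eigh(pooled)
print("original eigenvalues:", np.round(vals, 3))   # one is negative

rho = 0.4                                           # shrinkage intensity (tuning choice)
shrunk_vals = (1 - rho) * vals + rho * vals.mean()  # pull eigenvalues towards their mean
regularised = vecs @ np.diag(shrunk_vals) @ vecs.T

print("shrunk eigenvalues:  ", np.round(shrunk_vals, 3))
print("positive definite:", np.all(np.linalg.eigvalsh(regularised) > 0))
```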
Conference Publication: Pooling Estimates of Covariance Components Using a Penalized Maximum Likelihood Approach
Estimates of large genetic covariance matrices are commonly obtained by pooling results from a series of analyses of small subsets of traits. Procedures available to pool the part-estimates differ in their efficacy in accounting for unequal accuracies of estimates and sampling correlations, and in ensuring that pooled matrices are within the parameter space. We propose a maximum likelihood (ML) approach to combine estimates, treating sets from individual part-analyses as matrices of mean squares and cross-products from independent families. This facilitates simultaneous pooling of estimates for all sources of variation considered, readily allows for weighted estimation or a given structure of the pooled matrices, and provides a framework for regularized estimation by penalizing the likelihood. A simulation study is presented, comparing the quality of combined estimates for several procedures, including truncation or shrinkage of either canonical or individual matrix eigenvalues, iterative summation of expanded part matrices, and the ML approach, considering a range of penalties. Shrinking eigenvalues of individual matrices towards their mean reduced losses in the pooled estimates, but substantially increased proportional losses in their phenotypic counterparts and thus yielded estimates differing most from the corresponding full multivariate analyses of all traits. Assuming a simple pseudo-pedigree structure when combining estimates for all random effects simultaneously using ML allowed sampling correlations between estimates of different components from the same part-analysis to be approximated well enough to yield pooled matrices closest to the full multivariate results, with little change in phenotypic components. Imposing a mild penalty to shrink matrices for random effects towards their sum proved highly advantageous, markedly reducing losses in estimates and more than compensating for the reduction in efficiency of using the data that is inherent in analyses by parts. Penalized ML provides a flexible alternative to current methods for pooling estimates from part-analyses, with good sampling properties, and should be adopted more widely.
Conference Publication: Reasoning About Sampling in the Context of Making Informal Statistical Inferences (International Collaboration for Research on Statistical Reasoning, Thinking and Learning (SRTL), 2011)
This research study examined how senior secondary school students develop understanding of the core statistical concepts of "sample" and "sampling" when making statistical inferences, and how students build interconnections between these concepts. This was observed as students engaged in making interval estimates of a population parameter within a computer-simulated environment. Activities involved sampling and estimating across three different sample-size situations, followed by a reflection stage in which the estimates were compared. Results of the four stages of the students' activities are presented. Discussion of the results will be shared at the SRTL-7 forum.
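For a sense of the sampling behaviour the students were exploring, here is an illustrative sketch (assumed population and sample sizes, not the study's computer-simulated environment) showing interval estimates of a population mean narrowing as the sample size increases.

```python
# Illustrative sketch (assumed population and sample sizes): interval estimates
# of a population mean from samples of increasing size.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
population = rng.normal(loc=50, scale=12, size=100_000)   # assumed population

for n in (10, 50, 200):
    sample = rng.choice(population, size=n, replace=False)
    se = sample.std(ddof=1) / np.sqrt(n)
    lo, hi = stats.t.interval(0.95, df=n - 1, loc=sample.mean(), scale=se)
    print(f"n={n:3d}  estimate={sample.mean():5.1f}  95% interval=({lo:.1f}, {hi:.1f})")
```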