June 2nd - Dr. Alireza Aghasi, Oregon State University (EECS)
May 19th - Dr. Michael Sohn, University of Rochester
Optimal Approaches to the Compositionality and High-Sparsity of Microbiome Data
Over the past decade, the US National Institutes of Health has invested more than a billion dollars in human microbiome research, underscoring the crucial role the microbiome plays in our health and well-being. Despite these substantial efforts, many groundbreaking discoveries have proven difficult to translate into clinical practice. One major hurdle is the difficulty of reproducing findings about specific microbes (or taxa) linked to various diseases. A key factor contributing to this irreproducibility is the improper treatment of the compositional and sparse nature of microbiome data during statistical analysis. In this presentation, I will first outline the unique characteristics of microbiome data and illustrate the consequences of ignoring them. I will then introduce an optimal normalization method that addresses compositional effects, built on the minimal assumption required to extract meaningful absolute information from relative data. Additionally, I will discuss why treating all zeros as missing values tends to lead to better outcomes than the common practice of replacing zeros with a small constant (e.g., 0.5), and then present a multiple imputation method specifically designed for high-dimensional, sparse compositional data. The optimality and validity of the proposed methods, along with their beneficial effects on downstream analyses, are demonstrated through extensive simulation studies. I will also illustrate their application to real microbiome datasets.
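The compositional effect described above can be seen in a toy calculation. The sketch below uses made-up abundances for four hypothetical taxa and is not the speaker's normalization or imputation method. Doubling only one taxon's absolute abundance lowers the relative abundance of every other taxon. The sketch also applies the common practice of replacing zeros with a constant before a centered log-ratio (clr) transform, showing that the transformed value of a zero-count taxon depends on that arbitrary constant.

```python
import numpy as np

# Hypothetical absolute abundances for four taxa (A, B, C, D) in one sample.
absolute_before = np.array([100.0, 200.0, 300.0, 400.0])
# Only taxon A changes: its absolute abundance doubles.
absolute_after = absolute_before.copy()
absolute_after[0] *= 2

def relative(x):
    """Convert abundances to proportions (what sequencing effectively reports)."""
    return x / x.sum()

rel_before = relative(absolute_before)
rel_after = relative(absolute_after)
# B, C and D did not change in absolute terms, yet their proportions all drop.
print(np.round(rel_before, 3))
print(np.round(rel_after, 3))

def clr_with_pseudocount(counts, pseudo=0.5):
    """Replace zero counts with a small constant, then apply the centered log-ratio."""
    x = np.where(counts == 0, pseudo, counts).astype(float)
    logx = np.log(x)
    return logx - logx.mean()

counts = np.array([0, 5, 20, 75])
# The clr value assigned to the zero-count taxon depends on the arbitrary constant.
print(np.round(clr_with_pseudocount(counts, 0.5), 3))
print(np.round(clr_with_pseudocount(counts, 0.01), 3))
```

Recovering the fact that only taxon A changed, using the relative data alone, is the kind of problem a normalization method has to address; the sketch makes no attempt to do so.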
May 12th - Dr. Jun Chen, Mayo Clinic
Microbiome Association Analysis - Methods, Tools and Study Design
The human microbiome has become an integral component of modern genomics research. The central theme of human microbiome research is to decipher the complex interplay between environmental factors, microbial communities, and host biology. To unravel these interactions, robust and powerful statistical tools are essential. In this talk, I will introduce two new computationally efficient approaches, D-MANOVA and LinDA, for microbiome association analyses. D-MANOVA enables community-level testing to establish an overall association, while LinDA pinpoints the specific taxa driving the overall association. Building upon these tools, I will present a simulation-based power analysis framework to guide the design of microbiome studies.
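Distance-based community-level tests start from a between-sample distance matrix. The sketch below is a hedged stand-in, not D-MANOVA itself, whose own statistic and p-value calibration are not reproduced here. It computes the classical distance-based (PERMANOVA-style) pseudo-F statistic on Bray-Curtis dissimilarities and calibrates it by permuting group labels. The simulated counts, group sizes, and the three "enriched" taxa are all hypothetical.

```python
import numpy as np

def bray_curtis(X):
    """Pairwise Bray-Curtis dissimilarities between rows (samples) of a count matrix."""
    num = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)
    den = (X[:, None, :] + X[None, :, :]).sum(axis=2)
    return num / den

def gower_centered(D):
    """Gower-centered matrix G = -1/2 * J (D**2) J, with J = I - 11'/n."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * J @ (D ** 2) @ J

def pseudo_f(G, groups):
    """Distance-based pseudo-F: between-group vs within-group dispersion."""
    labels, idx = np.unique(groups, return_inverse=True)
    n, k = len(groups), len(labels)
    X = np.eye(k)[idx]                    # one-hot group indicator matrix
    H = X @ np.linalg.pinv(X)             # projection onto group means
    R = np.eye(n) - H
    return (np.trace(H @ G) / (k - 1)) / (np.trace(R @ G) / (n - k))

def permutation_test(D, groups, n_perm=999, seed=0):
    """Observed pseudo-F and a permutation p-value obtained by shuffling labels."""
    rng = np.random.default_rng(seed)
    G = gower_centered(D)
    f_obs = pseudo_f(G, groups)
    f_perm = np.array([pseudo_f(G, rng.permutation(groups)) for _ in range(n_perm)])
    p_value = (1 + np.sum(f_perm >= f_obs)) / (n_perm + 1)
    return f_obs, p_value

# Hypothetical data: 20 samples per group, 30 taxa, 5000 reads per sample.
rng = np.random.default_rng(42)
n_taxa = 30
base = rng.dirichlet(np.ones(n_taxa))
shifted = base.copy()
shifted[:3] *= 4                          # three taxa made more abundant in group 1
shifted /= shifted.sum()
counts = np.vstack([rng.multinomial(5000, base, size=20),
                    rng.multinomial(5000, shifted, size=20)])
groups = np.repeat([0, 1], 20)

f_obs, p_value = permutation_test(bray_curtis(counts.astype(float)), groups)
print(f"pseudo-F = {f_obs:.2f}, permutation p-value = {p_value:.3f}")
```

A test of this kind only says whether community composition differs overall. Identifying which taxa drive the difference is a separate, taxon-level question, the role the abstract assigns to LinDA.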
May 5th - Dr. Deepak Kadetotad, Starkey Hearing
Research vs. Reality - Challenges in Real-Time ML-Based Fixed-Point Processing in Hearing Aids
With the proliferation of machine learning in the hearing aid industry, many research papers try to achieve state-of-the-art results on metrics like Signal-to-Noise Ratio improvement (SNRi) and Short-Time Objective Intelligibility (STOI), but end up missing the forest for the trees by neglecting crucial constraints of hearing aids (HAs) that tend to leave the research unusable by industry. The talk aims to highlight certain key constraints of HAs that generalize to other low-power embedded devices. The goal is to inspire a parsimonious integration of these constraints into the design of networks, system architectures, and algorithms, thereby facilitating the real-world impact of your research.
May 1st - Dr. Xiaohui Chang, Oregon State University (College of Business)
Dynamic Methods for Calibrating Numerical Model Outputs
Numerical air quality models are pivotal for forecasting and assessing air pollution, but their outputs may be systematically biased. In this study, we propose several hierarchical dynamic models to calibrate large-scale numerical model outputs using supplementary data sources. At deeper levels of our models, we employ stochastic integro-differential equations to characterize the dynamic evolution of spatial random effects. To accelerate computation, we adopt techniques such as the ensemble Kalman smoother and variational Bayes. For statistical inference, we apply a conditional simulation technique to quantify the uncertainty of parameter estimates and calibration results. We demonstrate our approach by calibrating real-world PM2.5 outputs from the Community Multiscale Air Quality (CMAQ) system for China’s Beijing-Tianjin-Hebei region. Results show that our models produce more accurate calibrations than competing methods across different performance metrics while achieving higher computational efficiency.
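The ensemble Kalman smoother mentioned above is built from the same linear-Gaussian update as the ensemble Kalman filter. The sketch below shows only a stochastic (perturbed-observation) ensemble Kalman analysis step on a hypothetical one-dimensional field with a linear observation operator. It is not the authors' hierarchical calibration model, and a smoother would additionally apply this kind of update to the states at earlier time steps. The field, bias, and noise levels are made up.

```python
import numpy as np

def enkf_analysis(ensemble, y_obs, H, R, rng):
    """Stochastic (perturbed-observation) ensemble Kalman analysis step.

    ensemble: (n_members, n_state) forecast ensemble
    y_obs:    (n_obs,) observation vector
    H:        (n_obs, n_state) linear observation operator
    R:        (n_obs, n_obs) observation-error covariance
    """
    n_members = ensemble.shape[0]
    anomalies = ensemble - ensemble.mean(axis=0)
    P = anomalies.T @ anomalies / (n_members - 1)      # sample forecast covariance
    S = H @ P @ H.T + R                                  # innovation covariance
    K = np.linalg.solve(S, H @ P).T                      # Kalman gain P H^T S^{-1}
    # Each member assimilates its own noisy copy of the observations.
    noise = rng.multivariate_normal(np.zeros(len(y_obs)), R, size=n_members)
    innovations = (y_obs + noise) - ensemble @ H.T
    return ensemble + innovations @ K.T

rng = np.random.default_rng(0)
n_state, n_members = 50, 200
truth = np.sin(np.linspace(0, 3 * np.pi, n_state))
# Hypothetical biased forecast: true field plus a constant offset and ensemble noise.
forecast = truth + 0.5 + rng.normal(scale=1.0, size=(n_members, n_state))

obs_idx = np.arange(0, n_state, 5)                       # observe every 5th grid cell
H = np.eye(n_state)[obs_idx]
R = 0.01 * np.eye(len(obs_idx))
y_obs = truth[obs_idx] + rng.normal(scale=0.1, size=len(obs_idx))

analysis = enkf_analysis(forecast, y_obs, H, R, rng)
print("forecast error at observed cells:", np.abs(forecast.mean(0) - truth)[obs_idx].mean())
print("analysis error at observed cells:", np.abs(analysis.mean(0) - truth)[obs_idx].mean())
```

The update needs only ensemble statistics and a solve of size equal to the number of observations, rather than the full state covariance. That is one reason ensemble methods are attractive for accelerating large-scale problems.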
April 21st - Dr. Ting Zhang, University of Georgia
High-quantile regression for tail-dependent time series
Quantile regression is a popular and powerful method for studying the effect of regressors on quantiles of a response distribution. However, existing results on quantile regression were mainly developed for cases in which the quantile level is fixed, and the data are often assumed to be independent. Motivated by recent applications, we consider the situation where (i) the quantile level is not fixed and can grow with the sample size to capture tail phenomena, and (ii) the data are no longer independent but are collected as a time series that can exhibit serial dependence in both tail and non-tail regions. To study the asymptotic theory for high-quantile regression estimators in the time series setting, we introduce a tail adversarial stability condition, which had not previously been described, and show that it leads to an interpretable and convenient framework for obtaining limit theorems for time series that exhibit serial dependence in the tail region but are not necessarily strongly mixing. Numerical experiments illustrate the effect of tail dependence on high-quantile regression estimators, for which simply ignoring the tail dependence may yield misleading p-values.
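The check-loss formulation behind quantile regression can be illustrated with an intercept-only model. This sketch uses i.i.d. simulated Student-t data, so it does not reproduce the serially dependent setting or the tail adversarial stability condition of the talk. Minimizing the summed check loss ρ_τ(u) = u(τ − 1{u < 0}) over a constant recovers an empirical τ-quantile. With τ = 0.99 and n = 2000, only about n(1 − τ) = 20 observations lie above the estimate, which hints at why limit theory at high quantile levels needs separate treatment.

```python
import numpy as np

def check_loss(u, tau):
    """Quantile (check) loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0).astype(float))

rng = np.random.default_rng(0)
# Heavy-tailed i.i.d. sample; a serially dependent time series is NOT simulated here.
y = rng.standard_t(df=3, size=2000)
tau = 0.99

# Row i of the matrix holds residuals y_j - y_i for the candidate constant c = y_i.
objective = check_loss(y[None, :] - y[:, None], tau).sum(axis=1)
c_hat = y[objective.argmin()]

print(c_hat, np.quantile(y, tau))
print("observations above the estimate:", int((y > c_hat).sum()))
```

Evaluating the objective only at data points is enough here because it is piecewise linear in the constant, so a minimizer is always attained at an observation.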
April 14th - Dr. Lynn LaMotte, Louisiana State University
ANOVA at 100: Its Influence, Quandaries, and a Resolution
R. A. Fisher formalized Analysis of Variance (ANOVA) in Statistical Methods for Research Workers, first published in 1925. Within a decade, it became widely taught, accepted, and applied in agriculture, economics, engineering, social sciences, and other scientific disciplines. ANOVA spawned terms like additive, interaction, fixed, and random factor effects. It was the primary motivation to broaden linear models from full-rank multiple regression models to more general overparameterized non-full-rank models, which required dealing with the notion of estimability. Its influence on the broad subject of linear models, on the teaching of statistical methods, and on the practice of statistics has been immense. In the first lecture, I intend to relate some of the developments of ANOVA over time, along with a few of the personalities responsible. This will include the formulation of ANOVA in balanced models and the complications that result from even the slightest unbalancedness. The second lecture will deal mainly with the question of how (or whether) to construct a test statistic for lower-level factor effects (e.g., main effects) in models that permit higher-level (e.g., interaction) effects. Such tests are done routinely, and mostly without controversy, in balanced models. Why has this seemed difficult and controversial in unbalanced models?
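What follows is a standard textbook illustration of the trouble that unbalancedness causes, not the resolution proposed in the lecture. With equal cell counts, the sequential (Type I) sum of squares for factor A is the same whether A enters the model before or after factor B. With unequal counts the two differ, so the order of entry changes the test. The sketch fits main-effects models for a hypothetical 2×2 layout by ordinary least squares; the cell sizes and effects are made up.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def sequential_ss(a, b, y, a_first=True):
    """Type I sums of squares for two-level factors A and B (main effects only)."""
    one = np.ones_like(y)
    xa = (a == 1).astype(float)   # indicator coding for level 1 of A
    xb = (b == 1).astype(float)   # indicator coding for level 1 of B
    r0 = rss(np.column_stack([one]), y)
    first = xa if a_first else xb
    r1 = rss(np.column_stack([one, first]), y)
    r2 = rss(np.column_stack([one, xa, xb]), y)
    return r0 - r1, r1 - r2       # (SS for first factor, SS for second given first)

rng = np.random.default_rng(1)

def simulate(cell_sizes):
    """Generate data for a 2x2 layout with the given number of observations per cell."""
    a, b = [], []
    for (i, j), n in cell_sizes.items():
        a += [i] * n
        b += [j] * n
    a, b = np.array(a), np.array(b)
    y = 1.0 + 2.0 * a + 1.0 * b + rng.normal(size=a.size)
    return a, b, y

balanced = {(0, 0): 5, (0, 1): 5, (1, 0): 5, (1, 1): 5}
unbalanced = {(0, 0): 8, (0, 1): 2, (1, 0): 3, (1, 1): 7}

results = {}
for name, cells in [("balanced", balanced), ("unbalanced", unbalanced)]:
    a, b, y = simulate(cells)
    ss_a_first, _ = sequential_ss(a, b, y, a_first=True)
    _, ss_a_after_b = sequential_ss(a, b, y, a_first=False)
    results[name] = (ss_a_first, ss_a_after_b)
    print(f"{name}: SS(A) entered first = {ss_a_first:.3f}, SS(A | B) = {ss_a_after_b:.3f}")
```

In the balanced layout the centered indicator columns are orthogonal, which is why the two sums of squares agree. Once that orthogonality is lost, one has to choose which adjusted sum of squares answers the question of interest, a choice closely related to the question the second lecture raises.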
March 3rd - Dr. Yang Chen, University of Michigan
Statistical Machine Learning Enabled Scientific Discovery – Case Studies in Space Weather Forecasting
In recent years, machine learning approaches have become increasingly popular in space weather forecasting, driven by growing capabilities for collecting and sharing satellite data. Studies have shown that machine learning approaches can offer significantly higher accuracy in forecasting extreme space weather events such as solar flares and geomagnetic storms. However, the “proven” success of data-driven approaches raises various concerns among scientists: the interpretability, generalizability, stability, disregard of known physical principles, and reproducibility of fitted machine learning models are often at the center of debate. Statistical principles provide valuable insights into the roots of the successes and failures of data-driven prediction models and are thus of utmost importance. However, most classical statistical models suffer from overly simple model specifications and stringent regularity conditions that do not hold in practice. We have shown through multiple studies that by embedding machine learning components into rigorous statistical models, we can leverage the strengths of (i) highly interpretable statistical models with calibrated uncertainty estimates, (ii) the flexibility of deep learning models, and (iii) modern computational infrastructure. In this talk, I will discuss a few case studies in which we propose novel statistical machine learning approaches for space weather forecasting.
February 24th - Dr. James Molyneux, Swyfft and Oregon State University (Statistics adjunct professor; Food Science & Technology courtesy graduate faculty)
There and Back Again: Tales of a Former Academic
A former academic leaves the Academy and heads off into the wild west of industry. Here, we’ll share tales of how the transition from tenure-track researcher to data scientist was accomplished, discuss the trade-offs between the two career paths, and learn that there are still opportunities to solve stimulating problems outside of the Ivory Tower. We’ll offer tips (or at least bad advice) for folks interested in data science roles and biased opinions about what data science curricula might look like within the confines of a statistics department. Plus, as many interesting stories as can be told without breaking any non-disclosure agreements.
February 20th - Department Event
Info will be emailed to department members
February 19th - Department Event
Info will be emailed to department members
February 12th - Department Event
Info will be emailed to department members
February 10th - Department Event
Info will be emailed to department members
February 6th - Department Event
Info will be emailed to department members
February 3rd - Department Event
Info will be emailed to department members
January 13th - Dr. Michael Dumelle, US Environmental Protection Agency
Introducing spatial statistics using the spmodel R package
In scientific disciplines like ecology and environmental science, spatial data (i.e., data that are distributed in space) are common. For spatial data, statistical models incorporating spatial dependence (i.e., spatial models) tend to be more realistic than statistical models ignoring spatial dependence (i.e., nonspatial models). Recent software advances in R's spatial data ecosystem have made spatial models much more accessible to practitioners. Here we focus on the spmodel R package (https://usepa.github.io/spmodel/), which fits, summarizes, and makes predictions for a variety of spatial models. We discuss three reasons why spmodel is an effective tool for introducing (and teaching) spatial statistics: first, spmodel uses a syntactic structure similar to that of familiar base R functions like lm() and glm(); second, spmodel provides a wide breadth of options that give users a high degree of control over the model being fit; and third, spmodel is compatible with other modern R packages like sf, broom, and emmeans.