
Interdisciplinary


International Bayesian statistics and data science conference comes to Oregon

By OSU College of Science news

Stan 2020, a Bayesian statistics and data science conference, will take place on August 11-14, 2020 at Oregon State University.

The 5th Stan Conference will take place at Oregon State University on August 11-14, 2020. The four-day conference will include two days of tutorials followed by an exciting scientific program comprising talks, posters, open discussions and statistical modeling.

Registration for Stan 2020 is now open. Researchers, students and professionals are encouraged to register for the conference, which includes all tutorials. The conference is also soliciting session proposals, contributed talks and posters. Deadlines and other information are available on the conference website.

Stan is free, open-source software that has had an extensive and far-reaching impact on Bayesian computation across a broad range of applied statistics and data science problems.
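To illustrate the kind of Bayesian computation Stan automates, here is a plain-Python conjugate Beta-Binomial update (an illustrative sketch only, not Stan code; Stan's value is handling models with no such closed-form posterior):

```python
# Illustrative Bayesian update: Beta prior + Binomial data has a
# closed-form Beta posterior. Stan generalizes this kind of inference
# to models where no closed form exists, via MCMC sampling.

# Prior: theta ~ Beta(1, 1), i.e. uniform on [0, 1].
alpha_prior, beta_prior = 1.0, 1.0

# Hypothetical data: 7 successes in 10 trials.
successes, trials = 7, 10

# Conjugacy: posterior is Beta(alpha + successes, beta + failures).
alpha_post = alpha_prior + successes
beta_post = beta_prior + (trials - successes)

posterior_mean = alpha_post / (alpha_post + beta_post)
print(posterior_mean)  # 8 / 12, about 0.667
```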

The conference typically draws 300 attendees from academia, industry and government agencies. The conference offers a great opportunity for students and other participants to learn about Bayesian computation. Previous Stan Conferences were held at Columbia University, New York, and Cambridge University, U.K., among other places.

Plenary speakers at Stan 2020 are Elizabeth Wolkovich from the University of British Columbia and Adrian Raftery, a member of the National Academy of Sciences, from the University of Washington, Seattle.

Debashis Mondal, associate professor in the Department of Statistics at OSU, is a co-organizer of Stan 2020. The other organizers of Stan 2020 are Susana Marquez, The Rockefeller Foundation; Eric J. Ward, Northwest Fisheries Science Center (NOAA); Yi Zhang, Metrum Research Group; and Daniel Lee, Generable.



Making green energy safer for wildlife with statistics

By Srila Nayak


Associate Professor of Statistics Lisa Madsen and statisticians from the United States Geological Survey (USGS) have come together to develop methodology for estimating the total mortality of bats, birds and other small creatures at wind farms and solar facilities. The Endangered Species Act requires that wind farms pay particular attention to endangered or threatened species such as golden eagles, brown pelicans, whooping cranes, condors and Indiana bats, which are killed when they accidentally collide with turbine blades.

“We want to keep track of our natural resources. We don’t want to end up depleting them, because we can’t tell we are taking too much.”

Monitoring fatalities at wind energy facilities can help government agencies, such as the U.S. Fish & Wildlife Service and the Bureau of Land Management, make better decisions about species management. Developing statistically accurate fatality prediction and estimation tools and monitoring protocols can also help agencies ensure that renewable energy facility developers design operations to minimize impacts to wildlife, thus reducing environmental damage. “Fundamentally, what people want to know is ‘how many?’ This idea of keeping count and our desire to know ‘how many’ are important for conservation,” Madsen said. “We want to keep track of our natural resources. We don’t want to end up depleting them, because we can’t tell we are taking too much.”

How many? The missing bats and birds

Madsen’s collaborators, Manuela Huso and Dan Dalthorp, from the USGS Forest and Rangeland Ecosystem Science Center in Corvallis are contributing new statistical models, estimators and software tools to improve bird and bat fatality estimates at solar and wind power facilities. Huso initiated the research 10 years ago to come up with improved models and methods of estimating the count of carcasses. Dalthorp joined her shortly thereafter; Madsen began collaborating with the USGS team in a more substantial capacity during her sabbatical two years ago.

Last year, the team, together with collaborators from the consulting firm Western EcoSystems Technology, Inc., the data science lab DAPPER Stats, the Swiss Ornithological Institute and Duke University, developed a software package called GenEst (a generalized estimator of mortality): a suite of statistical models and software tools designed to estimate the total number of creatures arriving in an area during a specific time period when their detection probability is unknown but estimable. GenEst can also be used more generally to estimate the size of open populations with imperfect detection.

However, as Madsen’s research on fatalities at wind farms shows, estimating an accurate count is anything but a straightforward process. In the case of wildlife fatalities from collisions with wind turbines or solar panels, carcasses invariably go missing, carried off by scavengers or falling in areas inaccessible to searchers. Simple counts of carcasses found at wind farms therefore do not reflect the actual number of fatalities.

Madsen and her colleagues have developed complex statistical tools that estimate the actual number of carcasses when they are undetectable for any reason by taking into account a host of predictor variables such as searcher efficiency, variations in plot sizes and location of inaccessible areas.

Madsen developed a model to use data from field trials to estimate searcher efficiency. This model is incorporated into the larger GenEst model framework. “My collaborators are working on other aspects of the problem: getting a count of missing carcasses by estimating the amount of time a carcass is likely to stay before getting carried away by a predator. It is a highly involved project, where we put all the pieces of the puzzle together along with the uncertainty associated with all of these aspects,” explained Madsen.
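The core idea of combining those pieces, adjusting a raw carcass count by an overall detection probability built from searcher efficiency, carcass persistence and search coverage, can be sketched in a few lines of Python. This is a toy illustration with hypothetical numbers, not the GenEst methodology itself, which models these components and their uncertainty jointly:

```python
# Toy detection-adjusted mortality estimate (Horvitz-Thompson style).
# All numbers are hypothetical; real analyses (e.g. GenEst) estimate
# these components jointly, with uncertainty, from field-trial data.

# Searcher-efficiency trial: planted carcasses found vs. placed.
trial_found, trial_placed = 18, 30
p_detect = trial_found / trial_placed      # 0.6

# Carcass persistence: fraction still present when a search occurs.
p_persist = 0.5                            # assumed for illustration

# Fraction of the fatality area actually searched.
coverage = 0.8                             # assumed for illustration

observed_carcasses = 24

# Overall probability that a fatality ends up in the observed count.
g = p_detect * p_persist * coverage        # 0.24

# Each found carcass "represents" 1 / g fatalities.
estimated_mortality = observed_carcasses / g
print(estimated_mortality)                 # 24 / 0.24 = 100.0
```

The practical point is that the raw count (24) badly understates true mortality; the detection-adjusted estimate is roughly four times larger under these assumed detection rates.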

“I think that non-statisticians could benefit from learning some statistical principles such as the concept of uncertainty, collecting useful data, and applying appropriate data analysis tools in a given situation.”

The software package, created by the team, will be utilized by government agencies as well as Western EcoSystems Technology, Inc., which has already begun to implement the software to assist their clients. The project has also attracted attention from environmental and government agencies in Canada, South Africa, Portugal and Scotland among others. In addition, the USGS statisticians have conducted workshops demonstrating how to use the software to estimate animal mortality at wind and solar energy facilities. “The methodology is generally applicable to any situation where you want to count something where the detection is not perfect,” said Madsen.

The path to ecological statistics

After graduating from the University of Oregon with a master’s degree in mathematics, Madsen taught mathematics in a community college in New York. She wanted to get a doctorate in math education because she enjoyed teaching the subject. But she quickly discovered it wasn’t an ideal academic match for her. In the meantime, her husband suggested she try a statistics course. Madsen enjoyed the experience and switched to the Ph.D. program in statistics at Cornell University.

She also obtained a minor in natural resources at Cornell, which inspired her to apply statistics to ecological problems. In recent years, Madsen has also worked on numerical models of geological data to estimate the risk of environmental disasters such as leaking oil wells and other phenomena.

Madsen excels at teaching courses on statistical methods to non-statistics students at the graduate and undergraduate levels. She enjoys helping her students develop a statistical mindset as they learn about extending statistical methods to different disciplines.

“I think that non-statisticians could benefit from learning some statistical principles such as the concept of uncertainty, collecting useful data, and applying appropriate data analysis tools in a given situation,” Madsen remarked.


Synergies unleashed to tackle human health and disease

By Debbie Farris

The mysteries of human health and disease are as numerous as they are elusive. They pose complex problems that demand complex solutions. As science becomes increasingly interdisciplinary, the edges blurring and blending faster than we can name those evolutions, the challenges of human health require that we examine them from multiple perspectives, from biohealth, bioinformatics and biochemistry to chemistry, mathematics and biology.

In the 21st century, human health and disease require that we as scientists working in the life, physical and mathematical sciences collaborate. That we put our heads together, step outside the traditional academic boundaries to ignite new thinking and spur innovative solutions to address the most pressing problems in human health.

The proliferation of data is transforming the scientific landscape. Scientists are grappling with how to analyze and integrate data quickly across disciplines. With the mounting need for better, faster ways to harness vast amounts of information, mathematical and statistical researchers make for natural partners who are well trained to manage and interpret data to deepen understanding of the scale of health issues. This approach enables scientists to test more theories and manage more data to develop a greater, more sophisticated understanding of human health.

This fall the National Science Foundation’s Division of Mathematical Sciences and the National Institutes of Health’s National Library of Medicine launched a Joint Initiative on Generalizable Data Science Methods for Biomedical Research to support the development of innovative and transformative mathematical and statistical approaches to address data-driven biomedical and health challenges.

OSU researchers are harnessing the power of global collaborations to deepen understanding of and to address our most important concerns in human health.

The chemistry behind aging

Biophysicist Elisar Barbar and team discovered that the intrinsically disordered state of the protein ASCIZ, a key transcription factor in cells, plays a major role in regulating production of the protein LC8, a hub protein regulating over 100 other proteins critical to a wide range of life processes, from viral infection to tumor suppression to cell death. Her work on intrinsically disordered proteins, a hot frontier of biochemical and medical research today, has far-reaching implications due to their critical role in a vast array of cellular functions.

Colleagues Afua Nyarko and Viviana Perez are studying the chemistry behind the biological processes and the synthesis of biologically active molecules. Nyarko studies protein interactions and their role in the formation of tumors. She is one of a handful of scientists worldwide studying proteins from a structural biology perspective, where detailed information on the structure of specific amino acids can reveal how tumor suppressor proteins inhibit specific growth-promoting proteins.

Perez studies the biological processes of aging, specifically the protein aggregation in neurodegenerative diseases and protein misfolding. She discovered a new function for the compound rapamycin that, with its unusual properties, may help address neurologic damage.

Barbar and Nyarko’s work uses nuclear magnetic resonance to describe molecular structures of proteins. They also focus on protein informatics, from the analysis of experimental mass-spectrometry evidence for proteins to the integration and curation of large-scale data warehouses of protein sequence and functional annotation.

Genetics and bioinformatics

Our bioinformatics researchers are working on groundbreaking developments at the nexus of data science and human health. David Hendrix developed a neural network program that illuminates connections between mutant genetic material and disease. His team used deep learning to decipher which ribonucleic acids (RNA) have the potential to encode proteins, an important step toward better understanding RNA, one of life’s fundamental, essential molecules. Unlocking the mysteries of RNA means knowing its connections to human health and disease.

Hendrix compares it to a tool similar to calculus or linear algebra, but one used to learn biological patterns. Deep learning is helping his team manage vast amounts of data and learn new biological rules that distinguish the function of these types of molecules. He recently teamed up with the Barbar group to develop an algorithm that will predict new proteins that interact with LC8. This validates the importance of LC8 in many systems and opens up new interactions to study, underscoring the power of big data to guide new experiments.

David Koslicki recently discovered that the blood of patients with schizophrenia features genetic material from more types of microorganisms than the blood of people without the debilitating mental illness. His team performed whole-blood transcriptome analyses on 192 people, including healthy people and people with schizophrenia, bipolar disorder and Lou Gehrig’s disease. The findings showed that microbiota in the blood are similar to those in the mouth and gut, suggesting some permeability from those sites into the bloodstream.

Koslicki and his collaborators received an NIH grant to build a biomedical translator, a software system that connects various distributed databases of biomedical knowledge and that can “reason” over these data sources to answer relevant biomedical questions. This is one example of how mathematical and computational sciences are syncing with biomedical research to accelerate translation for the scientific community.

Fighting disease

Microbiologist Bruce Geller scored a monumental win against antibiotic resistance. He crafted a compound known as a PPMO that genetically neutralizes a pathogen’s ability to thwart antibiotics. His team designed and tested PPMOs against Klebsiella pneumoniae, an opportunistic pathogen that’s difficult to kill and resistant to many antibiotics. A platform technology, PPMOs can be quickly designed or modified to kill nearly any bacterium. Because they are not found in nature, bacteria have not developed resistance to them. PPMOs may be highly effective therapeutics.

Geller expects that the wave of the future will be molecular medicine, a broad field that draws on physical, chemical, biological, bioinformatics and medical techniques to describe molecular structures and mechanisms, identify molecular and genetic errors of disease and develop interventions. OSU scientists are combining these experimental and mathematical tools to develop anti-viral drugs.

Microbiologist Thomas Sharpton made a key advance toward understanding which of the trillions of gut microbes may play important roles in how humans and other mammals evolve. His global team created a new algorithm and software to taxonomize and clarify key microbial clades, or groups of microbes that appear frequently across mammalian species. A Western lifestyle tends to reduce microbial diversity so knowing which clades have been evolutionarily conserved opens up potential health interventions.


Bridging computer science and statistics to optimize results from "Big Data"

The Spring 2017 Milne Lecture on big data

The spring 2017 Milne Lecture features Michael I. Jordan, the Pehong Chen Distinguished Professor in the Department of Electrical Engineering and Computer Science and the Department of Statistics at the University of California, Berkeley. He will discuss “On Computational Thinking, Inferential Thinking and Data Science."

Professor Michael I. Jordan

Hosted by the Department of Statistics, the spring Milne Lecture will be held on Tuesday, May 16 at 4 p.m. in the Learning and Innovation Center, Room 128. The Milne Lecture in Mathematics, Statistics and Computer Science is a collaborative series of distinguished lectures launched in 1981 to honor William Edmond Milne, founding chair of the Mathematics Department and a pioneer in numerical analysis.

In his lecture, Jordan will discuss how the rapid growth in the size and scope of datasets in science and technology has created a need for novel foundational perspectives on data analysis that blend the inferential and computational sciences. That classical perspectives from these fields are not adequate to address emerging problems in "Big Data" is apparent from their sharply divergent nature at an elementary level. In computer science, for example, the growth of the number of data points is a source of "complexity" that must be tamed via algorithms or hardware, whereas in statistics the growth of the number of data points is a source of "simplicity" in that inferences are generally stronger and asymptotic results can be invoked.
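The divergence described above can be seen in a toy experiment: as the sample size n grows, the computational work of processing the data grows with n, while the statistical error of a simple estimate tends to shrink on the order of 1/sqrt(n). A minimal Python illustration (an assumed example, not from the lecture):

```python
# More data is a statistical "simplicity" (error shrinks roughly like
# 1/sqrt(n)) but a computational "complexity" (work grows with n).
import random

random.seed(0)  # fixed seed for reproducibility
for n in (100, 10_000, 1_000_000):
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]  # O(n) work
    estimate = sum(sample) / n  # true mean is 0.0
    # |estimate| is the estimation error; it tends to shrink as n grows.
    print(n, abs(estimate))
```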

On a formal level, the gap is made evident by the lack of a role for computational concepts such as "runtime" in core statistical theory and the lack of a role for statistical concepts such as "risk" in core computational theory. Jordan will present several research vignettes aimed at bridging computation and statistics, including the problem of inference under privacy and communication constraints, and methods for trading off the speed and accuracy of inference.

Michael I. Jordan is the Pehong Chen Distinguished Professor in the Department of Electrical Engineering and Computer Science and the Department of Statistics at the University of California, Berkeley. He received his master’s degree in mathematics from Arizona State University and earned his Ph.D. in cognitive science in 1985 from the University of California, San Diego. He was a professor at the Massachusetts Institute of Technology from 1988 to 1998. His research interests bridge the computational, statistical, cognitive and biological sciences, and have focused in recent years on Bayesian nonparametric analysis, probabilistic graphical models, spectral methods, kernel machines and applications to problems in distributed computing systems, natural language processing, signal processing and statistical genetics.

Professor Jordan is a member of the National Academy of Sciences, a member of the National Academy of Engineering and a member of the American Academy of Arts and Sciences. He is a Fellow of the American Association for the Advancement of Science. He has been named a Neyman Lecturer and a Medallion Lecturer by the Institute of Mathematical Statistics. He received the International Joint Conference on Artificial Intelligence Research Excellence Award in 2016, the David E. Rumelhart Prize in 2015 and the Association for Computing Machinery (ACM)/Association for the Advancement of Artificial Intelligence (AAAI) Allen Newell Award in 2009. He is a Fellow of the AAAI, the ACM, the American Statistical Association, the Cognitive Science Society, the Institute of Electrical and Electronics Engineers, the Institute of Mathematical Statistics, the International Society for Bayesian Analysis and the Society for Industrial and Applied Mathematics.

Support for the Milne Lectures comes from a generous gift from the Milne family as well as support from the College of Science’s Departments of Mathematics and Statistics, the College of Engineering’s School of Electrical Engineering and Computer Science and the Center for Genome Research and Biocomputing at OSU.
