Data, AI and Robotics

Bird flying next to windmills

Making green energy safer for wildlife with statistics

By Srila Nayak

Wind turbines and swan in the dutch province of Flevoland

Associate Professor of statistics Lisa Madsen and statisticians from the United States Geological Survey (USGS) have come together to develop methodology to estimate the total mortality of bats, birds and other small creatures on wind farms and solar facilities. The Endangered Species Act requires that wind farms pay particular attention to endangered or threatened species such as golden eagles, brown pelicans, whooping cranes, condors and Indiana bats, which are killed when they accidentally collide with turbine blades.

Monitoring fatalities at wind energy facilities can help government agencies, such as the U.S. Fish & Wildlife Service and the Bureau of Land Management, make better decisions about species management. Developing statistically accurate fatality prediction and estimation tools and monitoring protocols can also help agencies ensure that renewable energy facilities developers design operations to minimize the impact to wildlife, thus reducing environmental damage. “Fundamentally, what people want to know is ‘how many?’. This idea of keeping count and our desire to know ‘how many’ are important for conservation,” Madsen said. “We want to keep track of our natural resources. We don’t want to end up depleting them, because we can’t tell we are taking too much.”

How many? The missing bats and birds

Madsen’s collaborators, Manuela Huso and Dan Dalthorp, from the USGS Forest and Rangeland Ecosystem Science Center in Corvallis, are contributing new statistical models, estimators and software tools to improve bird and bat fatality estimates at solar and wind power facilities. Huso initiated the research 10 years ago to come up with improved models and methods of estimating the count of carcasses. Dalthorp joined her shortly thereafter; Madsen began collaborating with the USGS team in a more substantial capacity during her sabbatical two years ago.

Last year, the team, along with collaborators from the consulting firm Western EcoSystems Technology, Inc., the data science lab DAPPER Stats, the Swiss Ornithological Institute and Duke University, developed a software package called GenEst (a generalized estimator of mortality): a suite of statistical models and software tools specifically designed for estimating the total number of creatures arriving in an area during a specific time period when their detection probability is unknown but estimable. The same approach can also be used more generally to estimate the size of open populations with imperfect detection probabilities.

However, as Madsen’s research on fatalities at wind farms shows, estimating an accurate count is anything but a straightforward process. In the case of wildlife fatalities due to collision with wind turbines or solar panels, carcasses invariably go missing: they are carried away by scavengers or fall in areas inaccessible to searchers. Therefore, simple counts of carcasses found at wind farms do not reflect the actual number of fatalities.

Madsen and her colleagues have developed complex statistical tools that estimate the actual number of carcasses when they are undetectable for any reason by taking into account a host of predictor variables such as searcher efficiency, variations in plot sizes and location of inaccessible areas.

Madsen developed a model to use data from field trials to estimate searcher efficiency. This model is incorporated into the larger GenEst model framework. “My collaborators are working on other aspects of the problem: getting a count of missing carcasses by estimating the amount of time a carcass is likely to stay before getting carried away by a predator. It is a highly involved project, where we put all the pieces of the puzzle together along with the uncertainty associated with all of these aspects,” explained Madsen.
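The idea of correcting a raw carcass count for imperfect detection can be illustrated with a deliberately simple sketch. It is not the GenEst model: it assumes, for illustration only, that the overall detection probability is the product of searcher efficiency, the probability that a carcass persists until the next search, and the fraction of the fall zone that is searched, and it ignores the uncertainty that the real methodology accounts for. All function names and numbers are hypothetical.

```python
def searcher_efficiency_from_trial(carcasses_placed, carcasses_found):
    """Estimate searcher efficiency as the share of trial carcasses recovered."""
    if carcasses_placed <= 0 or not 0 <= carcasses_found <= carcasses_placed:
        raise ValueError("need 0 <= found <= placed and placed > 0")
    return carcasses_found / carcasses_placed


def detection_probability(searcher_efficiency, persistence_probability, fraction_searched):
    """Overall chance a carcass is counted, assuming the three factors multiply."""
    for p in (searcher_efficiency, persistence_probability, fraction_searched):
        if not 0.0 < p <= 1.0:
            raise ValueError("each probability must be in (0, 1]")
    return searcher_efficiency * persistence_probability * fraction_searched


def estimate_total_fatalities(carcasses_counted, overall_detection):
    """Scale the observed count up by the inverse of the detection probability."""
    return carcasses_counted / overall_detection


# Hypothetical inputs: a field trial with 50 placed carcasses, 35 recovered.
efficiency = searcher_efficiency_from_trial(50, 35)
# Assumed persistence probability (0.6) and searched fraction (0.8).
g = detection_probability(efficiency, 0.6, 0.8)
# Searchers counted 12 carcasses during monitoring.
estimated_total = estimate_total_fatalities(12, g)
```

With these made-up inputs, only about a third of carcasses would be expected to be counted, so 12 observed carcasses correspond to an estimate of roughly 36 fatalities. A full analysis would also report the uncertainty around that figure.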

The software package, created by the team, will be utilized by government agencies as well as Western EcoSystems Technology, Inc., which has already begun to implement the software to assist their clients. The project has also attracted attention from environmental and government agencies in Canada, South Africa, Portugal and Scotland among others. In addition, the USGS statisticians have conducted workshops demonstrating how to use the software to estimate animal mortality at wind and solar energy facilities. “The methodology is generally applicable to any situation where you want to count something where the detection is not perfect,” said Madsen.

The path to ecological statistics

After graduating from the University of Oregon with a master’s degree in mathematics, Madsen taught mathematics in a community college in New York. She wanted to get a doctorate in math education because she enjoyed teaching the subject. But she quickly discovered it wasn’t an ideal academic match for her. In the meantime, her husband suggested she try a statistics course. Madsen enjoyed the experience and switched to the Ph.D. program in statistics at Cornell University.

She also obtained a minor in natural resources at Cornell, which inspired her to apply statistics to ecological problems. In recent years, Madsen has also worked on numerical models of geological data to estimate the risk of environmental disasters such as leaking oil wells and other phenomena.

Madsen excels at teaching courses on statistical methods to non-statistics students at the graduate and undergraduate levels. She enjoys helping her students develop a statistical mindset as they learn about extending statistical methods to different disciplines.

“I think that non-statisticians could benefit from learning some statistical principles such as the concept of uncertainty, collecting useful data, and applying appropriate data analysis tools in a given situation,” Madsen remarked.

Spiral icon above lit-up cityscape

Synergies unleashed to tackle human health and disease

By Debbie Farris

The mysteries of human health and disease are as numerous as they are elusive. They pose complex problems that demand complex solutions. As science becomes increasingly interdisciplinary, the edges blurring and blending faster than we can name those evolutions, the challenges of human health require that we examine them from multiple perspectives, from biohealth, bioinformatics and biochemistry to chemistry, mathematics and biology.

In the 21st century, human health and disease require that we as scientists working in the life, physical and mathematical sciences collaborate. That we put our heads together, step outside the traditional academic boundaries to ignite new thinking and spur innovative solutions to address the most pressing problems in human health.

The proliferation of data is transforming the scientific landscape. Scientists are grappling with how to analyze and integrate data quickly across disciplines. With the mounting need for better, faster ways to harness vast amounts of information, mathematical and statistical researchers make for natural partners who are well trained to manage and interpret data to deepen understanding of the scale of health issues. This approach enables scientists to test more theories and manage more data to develop a greater, more sophisticated understanding of human health.

This fall the National Science Foundation’s Division of Mathematical Sciences and the National Institutes of Health’s National Library of Medicine launched a Joint Initiative on Generalizable Data Science Methods for Biomedical Research to support the development of innovative and transformative mathematical and statistical approaches to address data-driven biomedical and health challenges.

OSU researchers are harnessing the power of global collaborations to deepen understanding of and to address our most important concerns in human health.

The chemistry behind aging

Biophysicist Elisar Barbar and team discovered that the intrinsically disordered state of the protein ASCIZ, a key transcription factor in cells, plays a major role in regulating production of the protein LC8, a hub protein regulating over 100 other proteins critical to a wide range of life processes from viral infection to tumor suppression to cell death. Her work on intrinsically disordered proteins, a hot frontier in biochemical and medical research today, has far-reaching implications because of their critical role in a vast array of cellular functions.

Colleagues Afua Nyarko and Viviana Perez are studying the chemistry behind the biological processes and the synthesis of biologically active molecules. Nyarko studies protein interactions and their role in the formation of tumors. She is one of a handful of scientists worldwide studying proteins from a structural biology perspective, where detailed information on the structure of specific amino acids can reveal how tumor suppressor proteins inhibit specific growth-promoting proteins.

Perez studies the biological processes of aging, specifically the protein aggregation in neurodegenerative diseases and protein misfolding. She discovered a new function for the compound rapamycin that, with its unusual properties, may help address neurologic damage.

Barbar and Nyarko’s work uses nuclear magnetic resonance to describe molecular structures of proteins. They also focus on protein informatics, from the analysis of experimental mass-spectrometry evidence for proteins to the integration and curation of large-scale data warehouses of protein sequence and functional annotation.

Genetics and bioinformatics

Our bioinformatics researchers are working on groundbreaking developments at the nexus of data science and human health. David Hendrix developed a neural network program that illuminates connections between mutant genetic material and disease. His team used deep learning to decipher which ribonucleic acids (RNA) have the potential to encode proteins, an important step toward better understanding RNA, one of life’s fundamental, essential molecules. Unlocking the mysteries of RNA means knowing its connections to human health and disease.

Hendrix compares it to a tool similar to calculus or linear algebra, but one used to learn biological patterns. Deep learning is helping his team manage vast amounts of data and learn new biological rules that distinguish the function of these types of molecules. He recently teamed up with the Barbar group to develop an algorithm that will predict new proteins that interact with LC8. This validates the importance of LC8 in many systems and opens up new interactions to study, underscoring the power of big data to guide new experiments.

David Koslicki recently discovered that the blood of patients with schizophrenia features genetic material from more types of microorganisms than the blood of people without the debilitating mental illness. His team performed whole-blood transcriptome analyses on 192 people, including healthy people and people with schizophrenia, bipolar disorder and Lou Gehrig’s disease. The findings showed that microbiota in the blood are similar to ones in the mouth and gut. There appears to be some permeability there into the bloodstream.

Koslicki and his collaborators received an NIH grant to build a biomedical translator, a software system that connects various distributed databases of biomedical knowledge and that can “reason” over these data sources to answer relevant biomedical questions. This is one example of how mathematical and computational sciences are syncing with biomedical research to accelerate translation for the scientific community.

Fighting disease

Microbiologist Bruce Geller scored a monumental win against antibiotic resistance. He crafted a compound known as a PPMO that genetically neutralizes a pathogen’s ability to thwart antibiotics. His team designed and tested PPMOs against Klebsiella pneumoniae, an opportunistic pathogen that’s difficult to kill and resistant to many antibiotics. A platform technology, PPMOs can be quickly designed or modified to kill nearly any bacterium. They are not found in nature, so bacteria have not developed resistance to them. PPMOs may be highly effective therapeutics.

Geller expects that the wave of the future will be molecular medicine, a broad field that draws on physical, chemical, biological, bioinformatics and medical techniques to describe molecular structures and mechanisms, identify molecular and genetic errors of disease and develop interventions. OSU scientists are combining these experimental and mathematical tools to develop anti-viral drugs.

Microbiologist Thomas Sharpton made a key advance toward understanding which of the trillions of gut microbes may play important roles in how humans and other mammals evolve. His global team created a new algorithm and software to taxonomize and clarify key microbial clades, or groups of microbes that appear frequently across mammalian species. A Western lifestyle tends to reduce microbial diversity so knowing which clades have been evolutionarily conserved opens up potential health interventions.

aerial view of citizens walking through busy intersection in Japan

Cities’ population, transportation patterns affect how flu epidemics play out

By Steve Lundeberg

Flu epidemics in cities

The more people a city has and the more organized its residents’ movement patterns, the longer its flu season is apt to last, according to population biologist Benjamin Dalziel.

The findings, published today in Science, are an important step toward predicting outbreak trends for a viral infection that each year in the United States sickens millions of people, sends hundreds of thousands to the hospital and kills tens of thousands.

Dalziel, the corresponding author of the new study, worked with an international collaboration to analyze weekly flu incidence data from 603 cities of varying size and “structure” – that is, patterns people follow in where they live and work.

The other factor the researchers looked at was the role a key weather metric – specific humidity – played in flu epidemics.

Flu is transmitted by virus-bearing moisture droplets that people exhale, cough out or sneeze out, creating a “cloud of risk” that emanates from an infected person and is breathed in by those around him or her.

“As specific humidity decreases, the virus remains viable in the air for longer, effectively expanding that cloud,” Dalziel said. “However, if an infected person is right beside you, it matters less what the specific humidity is.”

Which is where city size and structure come in – if there are lots of people, and transportation patterns frequently draw them together, it helps flu viruses find new hosts even when climatic conditions aren’t at their most favorable for transmission.

James Molyneux standing in front of Kidder Hall

Statistician who helped create new data science curriculum for California high schools joins OSU

By Srila Nayak

James Molyneux, assistant professor in statistics

The College of Science welcomes James Molyneux, who joined the Department of Statistics as an assistant professor in Fall 2018. Molyneux joined the department from UCLA, where he completed his dissertation on earthquake forecasting models based on statistical and computational methods.

In his new role, Molyneux teaches a wide variety of undergraduate and graduate courses, including online courses, to both statistics students and those from majors in engineering and biological sciences in the areas of data analytics, statistical methods and theory.

In addition to research on statistical seismology, Molyneux brings deep expertise in statistics pedagogy and education to OSU. As a doctoral student, he collaborated with his professors, high school educators, and other graduate students to create a project on statistics education funded by the National Science Foundation. The result is an innovative Introduction to Data Science (IDS) curriculum, which introduces high school students to data and statistics.

Part of a math-science partnership grant between UCLA and the Los Angeles Unified School District, IDS has been designated as a core math course and has been implemented in 14 southern California high school districts with further plans of scaling it to other school districts in the United States and even abroad.

A revolutionary approach to 21st-century mathematical learning, the year-long course engages students with real data, introducing statistical, computational and graphical tools for reasoning about the world.

Molyneux is excited about exploring the possibility of introducing Oregon high school students to data, statistics and coding through IDS. He had an opportunity to introduce the IDS program to the Oregon Department of Education during a Math Pathways seminar in December.

“There has been a lot of interest in changing how mathematics is taught in high schools in Oregon,” observed Molyneux. “What if we didn’t make every student learn calculus, and introduced them to data science instead?”

In recent times, educators have begun to question the longstanding tradition of high school mathematics curriculum, whose mainstays have been the much-feared algebra II and calculus courses, arguing in favor of multiple math pathways towards graduation and college, which would include new courses in data science, statistics and programming.

“I think students find a lot of utility and value in being shown how to type instructions to a computer in a coding language, hit enter and have something happen based on what they are writing. It teaches them a lot of fundamental statistical ideas and how to be a good citizen by learning to evaluate data critically and detect misrepresented graphs and data,” said Molyneux.

Molyneux eagerly looks forward to utilizing his experiences and background in statistics pedagogy in the classroom. His teaching is also guided by his own experience of transformation.

An indifferent student of mathematics during his undergraduate years at California State University, Fullerton, Molyneux saw his academic interests undergo a radical metamorphosis when he crossed paths with a brilliant teacher in a calculus II class.

“I had barely passed calculus I, but my teacher — Kathy Lewis — changed everything for me. That’s when I thought for the first time math is for me.” The realization prompted him to add a major in mathematics along with his major in economics, which eventually led to several classes in statistics and a Ph.D. in statistics.

“Having come from a place where I did not initially like math, I really want to expose people to this field I fell in love with and why they may like it too. You can do powerful things with statistics; it has real-life applications. Enabling students to find meaning in statistics has a lot of value for me,” said Molyneux.

He is excited to collaborate with statistics colleagues and others on campus on several new projects. Some of these include creating software for hydrologists and joining forces with OSU’s Center for Genomics Research and Biotechnology to fashion a data science program for students from rural communities in Oregon, which will impart skills in data analytics and statistical applications in natural resources and agriculture.

“I am delighted to be here. The department has a data analytics program which is growing fast and it’s very fulfilling to be a part of it. The statistics faculty are incredible teachers and researchers. It has definitely been a highlight to get to do statistics with so many talented people,” said Molyneux.

A native of La Habra, California, Molyneux enjoys hiking, cycling and discovering the restaurants in Salem, Ore., where he resides. He harbors a dream to reach the summits of all the mountains in Oregon.

Thomas Sharpton with colleague looking at samples in lab

From scientific ideas to innovative solutions in the marketplace

Innovation Days

The College of Science is launching a transformative program to support and strengthen innovation and entrepreneurship that will enable us to better identify, validate, and develop the commercial impact of basic research. Innovation Days will bring together faculty, faculty research assistants and research associates to discuss and learn about moving basic research ideas and discoveries from the lab to commercial applications and practical solutions.

Co-hosted by the College of Science and the Office of Commercialization and Corporate Development (OCCD), Innovation Days will host its first session on January 7, 2019, from 2:30 to 5 p.m., followed by a reception from 5 to 6 p.m. The deadline to register is December 14, 2018. Additional sessions will follow on February 4, April 1 and April 29.

Innovation Days is designed to build awareness and engagement with experts who will help advance and propel the OSU innovation enterprise. Workshop participants will learn about resources to:

  • Leverage basic research and research funding opportunities toward application
  • Increase the impact of basic research through patents and commercialization
  • Validate broader impacts of research projects to enhance proposal success
  • Connect with local innovation ecosystem and identify pathways to translate research to application
  • Create opportunities with industry
  • Integrate invention disclosures, patent applications, and company formation into day-to-day work to advance your career

Facilitators represent and support the many pathways available to successfully transfer technology and commercialize scientific research. The workshop series includes: Berry Treat, director of OCCD, who will provide an overview of his office and how it supports the research to industry pathway; Joe Christison, senior intellectual property and licensing manager at OCCD, who will introduce participants to technology transfer at OSU; Katie Pettinger, commercialization catalyst at OSU Advantage Accelerator, who will discuss startup support available to OSU researchers; chemistry professor Rich Carter, who will share his success story as an inventor; and Chris Stoner, senior industry contracts manager, OCCD, who will discuss the development of appropriate and effective research agreements with companies.

track ripped up from earthquake

Research finds quakes can systematically trigger other ones on opposite side of Earth

By Steve Lundeberg

Big earthquakes trigger smaller earthquakes

New research shows that a big earthquake can not only cause other quakes, but large ones, and on the opposite side of the Earth.

The findings, published Aug. 2 in Nature Scientific Reports, are an important step toward improved short-term earthquake forecasting and risk assessment.

Statistician Debashis Mondal, collaborating with Robert O'Malley and Michael Behrenfeld of the College of Agricultural Sciences and Chris Goldfinger of the College of Earth, Ocean and Atmospheric Sciences, looked at 44 years of seismic data and found clear evidence that temblors of magnitude 6.5 or larger trigger other quakes of magnitude 5.0 or larger.

It had been thought that aftershocks (smaller-magnitude quakes that occur in the same region as the initial quake as the surrounding crust adjusts after the fault perturbation) and smaller earthquakes at great distances were the main global effects of very large earthquakes.

But the OSU analysis of seismic data from 1973 through 2016 – an analysis that excluded data from aftershock zones – using larger time windows than in previous studies, provided discernible evidence that in the three days following one large quake, other earthquakes were more likely to occur.

“The test cases showed a clearly detectable increase over background rates,” said the study’s corresponding author, Robert O’Malley. “Earthquakes are part of a cycle of tectonic stress buildup and release. As fault zones near the end of this seismic cycle, tipping points may be reached and triggering can occur.”
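The comparison described above, event rates in a short window after a large quake versus the background rate, can be sketched roughly as follows. This is only an illustration under stated assumptions, not the study's method: it ignores location entirely (the study excluded aftershock zones), it counts overlapping windows more than once, and the catalog is a tiny invented list of (time in days, magnitude) pairs. The function name and defaults are hypothetical, with thresholds taken from the magnitudes and three-day window mentioned in the article.

```python
def triggered_rate_ratio(catalog, trigger_mag=6.5, target_mag=5.0, window_days=3.0):
    """Ratio of the post-trigger event rate to the background event rate.

    catalog: iterable of (time_in_days, magnitude) tuples.
    """
    events = list(catalog)
    trigger_times = [t for t, m in events if m >= trigger_mag]
    target_times = [t for t, m in events if m >= target_mag]
    if not trigger_times:
        raise ValueError("catalog contains no trigger-sized events")

    all_times = [t for t, _ in events]
    span_days = max(all_times) - min(all_times)
    if span_days <= 0:
        raise ValueError("catalog must span a positive length of time")

    # Background rate: target-sized events per day over the whole catalog.
    background_rate = len(target_times) / span_days

    # Count target-sized events strictly after each trigger, within the window.
    in_window = sum(
        1
        for t0 in trigger_times
        for t in target_times
        if t0 < t <= t0 + window_days
    )
    window_rate = in_window / (len(trigger_times) * window_days)
    return window_rate / background_rate


# Invented toy catalog, for illustration only.
toy_catalog = [
    (0.0, 5.1), (10.0, 6.8), (11.0, 5.3), (12.0, 5.6),
    (40.0, 5.0), (60.0, 7.0), (61.5, 5.2), (100.0, 5.4),
]
ratio = triggered_rate_ratio(toy_catalog)
```

A ratio above 1 means target-sized events are more frequent in the windows following large quakes than over the catalog as a whole. Judging whether such an increase is statistically detectable requires far more data and care than this sketch shows.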

The higher the magnitude, the more likely a quake is to trigger another quake. Higher-magnitude quakes, which have been happening with more frequency in recent years, also seem to be triggered more often than lower-magnitude ones.

picture of Microbiomes

Statistical innovations help decode the human microbiome

Gut Microbiota

The human microbiome—the vast collection of microorganisms living in and on the bodies of humans—can lead us to a better understanding of human health and disease, not to mention accelerate the development of therapeutic drugs. However, the vastness and complexity of microbiome data require advances in statistical methodology and software for an accurate analysis of host-microbiome interactions. Statistics faculty Yuan Jiang, Duo Jiang and Thomas Sharpton are developing novel statistical methods to bridge the gap between the human microbiome and microbiome-based healthcare.

They were awarded a prestigious four-year $770K grant by the National Institute of General Medical Sciences (NIGMS), one of the U.S. National Institutes of Health (NIH). Yuan Jiang, associate professor of statistics, is the lead researcher and principal investigator on the project, “Network-based statistical methods to decode interactions within microbiomes.” Duo Jiang, assistant professor of statistics, and Thomas Sharpton, assistant professor of microbiology and statistics, are co-investigators on this grant.

This project will advance scientific understanding of the functions and operations of microbiomes by developing statistical methods and models to study biological interactions between microbes or between microbes and their host.

“The new statistical methodologies will leverage recent advances in graphical models and high dimensional statistics to tackle unmet analytical challenges encountered in the analysis of modern microbiome data,” said Duo Jiang.

Interest in the role of the microbiome in human health and disease has increased rapidly within the last decade. However, available tools and technologies do not adequately capture the full scope and complexities of microbial interactions within a community. For example, a correlation type analysis employed to model microbial interactions cannot filter out misleading co-occurrence patterns in a community: two microbes that independently interact with a third but not with one another may appear to correlate.

“The currently used statistical models fail to account for specific properties of microbiome data, including its heterogeneous compositional count nature, the complex environmental context, and its evolutionary structure,” Yuan Jiang explained.

“Additionally, existing algorithms are often not scalable to the huge size of microbiome data. Therefore, new statistical methods and algorithms need to be developed to better answer the scientific questions.”

The NIGMS grant will help Jiang and his team pioneer new statistical methods “built on conditional dependencies that disentangle biological interactions from marginal correlations to produce mechanistically and evolutionarily relevant network models of how microbes interact with one another and their host.”
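The difference between marginal correlation and conditional dependence can be seen in a small simulation. The sketch below is not the team's method: it uses Gaussian noise rather than compositional count data, and a textbook three-variable partial correlation rather than the network models the project proposes. Two hypothetical microbes each track a third "driver" microbe but never interact with each other. All variable names and parameters are assumptions made for illustration.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(0)  # fixed seed so the run is repeatable
n_samples = 5000

# The driver influences both microbes; the microbes do not influence each other.
driver = [rng.gauss(0, 1) for _ in range(n_samples)]
microbe_a = [d + rng.gauss(0, 0.5) for d in driver]
microbe_b = [d + rng.gauss(0, 0.5) for d in driver]

marginal = pearson(microbe_a, microbe_b)
conditional = partial_correlation(microbe_a, microbe_b, driver)
```

In this setup the marginal correlation between the two microbes is strong (about 0.8 in expectation), while the partial correlation given the driver is close to zero, which is the misleading co-occurrence pattern that a correlation-only analysis cannot filter out.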

The methods and software produced by this project will “transform the discovery of how these microbes interact with one another and influence or respond to human physiology.” A broader understanding of microbiomes and their role in disease etiology will open the doors to engineer and utilize microbiomes important to human health to develop new drugs, therapeutic probiotics and clinical diagnostics.

The grant will support graduate research assistants (GRAs). Two GRAs from statistics and one GRA from microbiology will be a part of this interdisciplinary collaboration. “Such a form provides students with opportunities for experiential learning in diverse scientific areas (e.g., statistics, computer science, microbiology, evolution, and genetics) as well as experience in teamwork and interdisciplinary research,” said Yuan Jiang.

Sastry Pantula shaking hands with Mukherjee president

Oregon State statisticians in Hyderabad, India

Sastry Pantula at the 2017 International Indian Statistical Association (IISA) Conference

Statisticians from Oregon State University are in Hyderabad, India for the 2017 International Indian Statistical Association (IISA) Conference, December 27-30, 2017. The conference will take place at the Hyderabad International Convention Center. The theme of the conference is "Statistics and Data Science for Better Life, Society and Science."

Professor Sastry Pantula is a panelist in a discussion on Women in Statistics and Data Science. He is also a speaker in a special panel, entitled "Are Statisticians Prepared for the Data Science challenge?-A Career Development Panel." Pantula is a member of 2017 IISA's International Advisory Committee.

Assistant Professor Sharmodeep Bhattacharyya will present his research at a session on "Estimation and Inference in Networks and Graphical Models." Bhattacharyya will also chair a session on "Probability, Random Matrices, Big Data."

The 2016 IISA conference was hosted by the Department of Statistics at Oregon State at the Learning Innovation Center on campus, August 18-21. Emphasizing the theme of "Statistical and Data Sciences: A Key to Healthy People, Planet and Prosperity," the conference was attended by 200 statisticians from the United States and other parts of the world.

aerial shot of Baltimore cityscape

Statistics researchers at JSM 2017

Joint Statistical Meetings 2017 hosted in Baltimore

Faculty and students from the Department of Statistics are participating in the Joint Statistical Meetings (JSM) 2017 in Baltimore, July 29 - August 3. JSM is one of the largest statistical conferences in the world, hosting more than 6,000 statisticians from academia, industry and government and featuring more than 600 research sessions and poster presentations.

Topics at JSM 2017 range from statistical applications to methodology and theory to the expanding boundaries of statistics, such as analytics and data science.

College of Science Dean Sastry Pantula, who is also a Professor of Statistics, will be the luncheon speaker and present a talk, "Strengths, Opportunities and Challenges in the era of BIG Data: An Asian Statistician Perspective." He will discuss the strengths Asian statisticians bring to the profession, opportunities that exist for Asian statisticians in the era of BIG data across all sectors, how universities and professional societies can help build future leaders in statistics, the needs and challenges Asian statisticians face, and how Asian statisticians can strive for excellence, enhance diversity and foster harmony in the profession.

Pantula is delivering a talk during the Pre-Conference Workshop, which is part of a continuing education course, "Preparing Statisticians for Leadership: How to See the Big Picture and Have More Influence, Part 2." The course, which is being held on the morning of Sunday, July 30, addresses what leadership is and how statisticians can improve and demonstrate leadership to affect their organizations. It features leaders from all sectors of statistics speaking about their personal journeys and offering guidance on personal leadership development with a focus on the larger organizational/business view and influence.

Pantula is also participating in a panel discussion of current and former Deans and Provosts of Arts and Sciences who are statisticians. Panelists will share their perspectives and experiences about how to advance the mission of statistics departments in the current university environment. The panel is on Thursday, August 3.

All OSU Statistics alumni are welcome at a special reception on Tuesday, August 1 from 5:30-7:00 p.m. at the Hilton Hotel on Pratt Street in Tubman A. The event offers the perfect occasion to reconnect with other alumni, OSU faculty, students and friends. We look forward to catching up with alumni to hear about their accomplishments and successes!

Below is a complete list of our faculty and students who are presenting talks and posters at JSM 2017. Many of our faculty and alumni are also representing OSU on various committees, presenting papers on which they are co-authors and participating in other ways at the conference, but are not listed below.

student working on math homework holding calculator

Students experience a summer of state-of-the-art data science research

By Srila Nayak

Research Experiences for Undergraduates (REU)

Two new Research Experiences for Undergraduates (REU) in the field of statistics are training students in cutting-edge and advanced data analytics and computational skills essential to interdisciplinary research across the fields of statistics, microbiology and quantitative sciences.

New grant trains students in data analytics

The Department of Statistics was awarded its first Research Experiences for Undergraduates (REU) grant this year. The National Science Foundation’s REU program supports comprehensive, hands-on research experience for undergraduate students in the STEM fields, and awards funds to initiate and conduct projects that engage a number of students in research.

Statistics faculty Yanming Di, Lan Xue, Thomas Sharpton (PI; joint appointment in microbiology), Duo Jiang and Yuan Jiang received the NSF REU grant from the American Statistical Association (ASA). The $380K project is for 2016-2018, to support three REU sites per year, for a total of nine REU sites across the country. The grant funds 10 weeks of research and training for four undergraduate students at each site. Each student receives a stipend of $8,000.

The overall objective of the ASA-supported REU program is to promote undergraduate research experiences in statistics and to prepare students for graduate study in statistics. According to ASA, "The students will see how statistics has an impact on fields such as engineering, atmospheric science, health care, and all kinds of public policy."

Aaron Huang, Ellen Kulinsky, Betsy Hensel and Shelby Taylor standing with a sheep and llama

Statistics REU students (from l to r) Aaron Huang, Ellen Kulinsky, Betsy Hensel and Shelby Taylor enjoy the outdoors in Corvallis.

Oregon State University was chosen by ASA as one of three REU sites this year and is currently hosting students from across the country from June 19 until August 25. The four REU participants were selected from more than a hundred applicants in a highly competitive process.

The REU students are Shelby Taylor from Brigham Young University, Aaron Huang from the University of Washington, Ellen Kulinsky from the University of California-Berkeley and Elizabeth Hensel from the University of Virginia.

"The students are stellar and they are doing a fantastic job with the research. They were selected for this transdisciplinary REU because of their prior knowledge and preparation in both statistics and biology. We are very lucky to have them here," said Sharpton.

The students are gaining exposure to the entire data analysis process as it relates to biological research. They analyze DNA sequence data and use statistical methods to determine how the types of bacteria that live in the gut, known as the gut microbiome, relate to human health, lifestyle and environmental conditions.

Specifically, the students are analyzing data from a large, crowd-sourced citizen science project led by American Gut, which collects human samples, ranging from saliva to stool, along with questionnaire responses about individual lifestyles and diet. Working closely with an interdisciplinary team of faculty, each student will conduct a complete data analysis, which includes data quality control, the application of statistical and bioinformatics techniques, and data visualization. Faculty members will train students in all of the techniques and skills that they need to complete the project.

Students will analyze the data to gather information on how the gut microbiome varies across individuals and its association with a variety of health and lifestyle factors. Some of the REU projects explore how body mass index affects the microbiome; the relationship between age and the microbiome; and the associations between the gut microbiome and food and alcohol consumption.

REU students acquire knowledge and expertise in statistics and biology through intensive subject lectures. Hands-on experiential learning projects in bioinformatics and biostatistics give them ample opportunities to apply their theoretical and conceptual learning to design experiments and deduce results from complex data sets. The data-driven REU will prepare students to capitalize on the growing professional opportunities in data analytics.

While Sharpton and Jiang are extensively involved in guiding student research projects, the REU is a deeply collaborative process in which the other faculty are serving in important mentorship roles and providing expertise in theoretical and computational subject areas. Di, for example, has been teaching students computer coding and programming. Students are also learning R, LaTeX and Matlab as a part of their statistical and biological research.

"The REU has provided our faculty an opportunity to work with ambitious and talented undergraduate students. It has also exposed students to cutting-edge microbiome science. Through their research they are seeking answers to novel questions on human health and the microbiome, a new area of study about which much remains to be done," said Sharpton. He hopes to bring a similar opportunity to OSU students in the future.

This REU in the area of microbiome informatics research is a pivotal part of the Oregon State Microbiome Initiative (OMBI), and is slated to advance education and research in the statistical, biological and computational sciences at OSU. OMBI launched this spring.

The students' research and learning experiences for the REU project are generously supported by the Department of Statistics, the College of Science and the Center for Genome Research and Biocomputing (CGRB). At CGRB, the students have conducted bioinformatics research and analysis with the aid of its biocomputing cyberinfrastructure.

More information is available online.

Photo above: College of Science Dean Sastry Pantula with (l to r) Statistics REU students Ellen Kulinsky, Aaron Huang, Betsy Hensel and Shelby Taylor

The Summer Institute of Statistics targets talented, underrepresented students

Javier Rojo with RUSIS group in front of the Memorial Union

Professor Javier Rojo (far right), who joined the Department of Statistics in January 2017, has moved his award-winning REU site, the Research for Undergraduates Summer Institute of Statistics, to Oregon State (RUSIS@OSU) from the University of Nevada, Reno. In 2003, as a professor at Rice University, Rojo started the country’s first Research Experiences for Undergraduates (REU) program in the field of statistics, which has been extremely successful in recruiting, training and guiding underrepresented minority and economically disadvantaged students towards advanced degrees in mathematics and statistics.

The Institute conducts a 10-week intensive summer program for the study of statistics and its applications for a cohort of 12-15 students every year. This summer 12 students, chosen from a pool of 70 applicants, are working on statistical research projects at OSU from June 19 to August 24. The REU cohort is 50 percent female, and 10 of the 12 students are underrepresented minorities. They hail from institutions such as the University of Texas, El Paso; University of Arizona; Occidental College; California State University, Bakersfield; Fresno State University; Texas State University, San Marcos; Duke; and Harvard, among other places.

"One of the benefits of transferring the REU program to OSU is the name recognition that will attract talented students. It is also a great recruitment tool and may inspire students to apply to the Statistics Graduate Programs at OSU," said Rojo.

There is promising data that REUs have a positive impact on graduate recruitment in host institutions. According to Rojo, approximately 10 students who received their Ph.D.s at Rice University had been RUSIS participants during the years the program was housed at Rice.

RUSIS students eating lunch at McMenamins

Current RUSIS students represent a mix of majors including engineering, computer science, mathematics and social sciences such as psychology. Under Rojo's guidance, they are pursuing ambitious and exciting research projects: studying data to measure the impact of the Clean Air Act on environmental pollution, investigating studies on the impact of obesity on the environment from a statistical standpoint, and using probabilistic and statistical components to model data for better financial investment decisions, among various other projects.

Research projects at RUSIS, Rojo points out, involve heavy computation. The students are undergoing valuable computational training and learning various programming languages and tools such as R, LaTeX, Matlab and Mathematica, taught by graduate students in the Department of Mathematics. Students also take a four-week course on statistics and probability that brings them up to speed with statistics.

At the end of the program, the students are expected to produce a technical report in LaTeX and present a research talk to a scientific advisory committee comprising experts from Georgia Institute of Technology, University of Arizona, University of Michigan, Rice University and the University of Texas, El Paso.

Owing to the paucity of statistics undergraduate programs in the country, statistics majors have made up fewer than one percent of RUSIS participants, Rojo says. Most participants come from fields such as biology, business and computer science. But he notes that nearly 30 RUSIS alumni have gone on to earn a Ph.D. in statistics or biostatistics.

"My main objective is to encourage students to obtain a Ph.D. in statistics if they have the opportunity to do so," said Rojo.

The program has been variously supported and funded by the NSF and the National Security Agency (NSA) for the last 15 years. Owing to Rojo’s sustained efforts and leadership, the American Mathematical Society (AMS) selected his REU program for its award “Mathematics Programs That Make a Difference” in 2014.

The AMS award citation states, "As the first REU in Statistics, RUSIS has served as a model program for others to emulate, both by encouraging undergraduates to pursue graduate studies in the mathematical sciences and by increasing the numbers of underrepresented minorities and women in mathematics and statistics."

Under Rojo’s leadership, the program has taken phenomenal strides: After 10 years, the REU program reported that 85% of the undergraduates who attended the Summer Institute were admitted to Ph.D. programs around the country. Roughly 61% of the students came from underrepresented populations, and 53% of the participants have been female.

These impressive results were achieved through “intensive statistics courses, close supervision of research projects and visits to various research institutes and agencies in the area,” according to Rojo, who is responsible for the students’ computational training and research projects.

Read more: Internationally renowned statistician joins faculty
