How big data is flipping the scientific method on its head

Science, especially the life sciences, is generating more and more data each year, and sophisticated statistical methods are required to help make sense of it. With so much data available, there's been a shift that flies in the face of the traditional scientific method, in which a researcher picks a hypothesis and then designs an experiment to test it.

“Science is hard and increasingly complex, and it's not always easy or practical to generate a hypothesis ahead of time,” says Dr. Lucy Gao, who joined UBC’s Department of Statistics in 2022. “Think about it: could you really come up with a compelling hypothesis before collecting any experimental data? As data becomes easier and easier to collect, it becomes more and more tempting to pick a hypothesis after we have experimental data, rather than before.”

This new paradigm is challenging. Classical statistical testing does not account for the fact that researchers look at their data before choosing a hypothesis to explore, so classical tests wind up overstating the strength of the evidence for that hypothesis. Dr. Gao helps researchers navigate this problem by developing statistical methods that answer these scientific questions while retaining the same statistical guarantees as classical hypothesis tests.
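The overstatement can be illustrated with a small simulation. What follows is a minimal sketch and not Dr. Gao's method: the function names, the sample sizes and the choice of splitting the data at its median are all assumptions made for illustration. Every data set is drawn from a single distribution, so there is no real difference between groups to find. When the groups are fixed before looking at the data, a classical test raises a false alarm about as often as its significance level allows. When the groups are chosen by looking at the data first, the same test raises a false alarm almost every time.

```python
import math
import random
import statistics


def two_sided_p_value(a, b):
    """Two-sided p-value for a difference in means (normal approximation)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.mean(a) - statistics.mean(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(choose_groups_from_data, n=100, trials=2000, alpha=0.05, seed=0):
    """Fraction of null data sets in which the test reports a difference.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Null scenario: every observation comes from the same distribution.
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if choose_groups_from_data:
            # Hypothesis picked after seeing the data: "low" half vs "high" half.
            x.sort()
        # Otherwise the groups are fixed in advance: first half vs second half.
        group_a, group_b = x[: n // 2], x[n // 2:]
        if two_sided_p_value(group_a, group_b) < alpha:
            rejections += 1
    return rejections / trials


if __name__ == "__main__":
    print("groups fixed in advance:", false_positive_rate(False))
    print("groups chosen from data:", false_positive_rate(True))
```

Splitting at the median is a deliberately crude stand-in for data-driven steps such as clustering cells before comparing them. The point of the sketch is only that the test's usual guarantee holds in the first case and fails in the second.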

One of the scientific areas she’s been working in recently involves data collected with a new technology called single-cell RNA-sequencing. It enables researchers to assess gene activity at the single-cell level, and understand how genes behave across thousands of cells in a tissue sample. Given that each person carries between 20,000 and 25,000 protein-coding genes in a genome of around three billion nucleotide pairs, that’s a lot of numbers to wrangle.

“It’s not like back in the days when you picked one pet gene that you liked and studied it for your entire career,” Dr. Gao says.

Telling a story with data

There are a couple of stages to exploring a data set and gathering the information needed to tell a story. The first is wading through masses of numbers, often measurements such as the heights, weights and ages of a population. Visualizations are produced, summaries are tabulated and averages are calculated.

“It can seem like the simplest part of the process, but surprisingly, that's the stage that takes the longest because it's hard to fit an appropriate statistical model and be confident in the result when you don't understand your data very well. It's very easy to inadvertently ask a silly question or fit a model that involves some assumptions that make no sense.”

Dr. Gao explains the importance of statistical modelling by offering the example of news reports on a study showing that eating chocolate is linked to having a happier life.

“When you see a headline like that you ask ‘Why are they so confident?’ Usually, the researchers have fit some kind of a model, then perform some kind of statistical test to determine how much evidence there is to support that model. That's the deeper inquiry stage after you've looked at these summaries and visualizations where you see if you can put a quantitative number on how much evidence there is to support a particular hypothesis.”

Part of the challenge is working with researchers from other disciplines who use different terminologies to explain the scientific question they’re asking.

“A huge part of collaborative research between statisticians and scientists is just trying to meet halfway. We need to formulate the scientific question into a statistical question to be able to answer it using statistics. How do we get there?”

Getting here

Dr. Gao didn’t plan to make a career of biostatistics. She was a good math student in high school, but wasn’t sure what she could do with her talent. She asked her parents, who suggested studying financial mathematics.

“They said, that seems like a nice thing to do if you're strong at math and you're looking to have a job at the end of it. I didn't know anything about finance, but I thought I'll give it a try and enrolled in that major at the University of Victoria.”

But she ran into macroeconomics, which bored her to tears. Luckily, one of the requirements for a financial mathematics degree was introductory statistics.

“It was just like a ball rolling downhill. I switched majors and was fortunate enough to start working on some research with faculty during my undergrad. I really enjoyed solving probability puzzles and got further into the discipline. I met scientists and learned about the problems that can be solved with statistics, which allowed me to learn about different scientific areas. I get to learn about the cool science that people are doing.”

Related Links:

Dr. Lucy Gao's faculty profile at UBC Statistics
Dr. Lucy Gao's research website