How big data is flipping the scientific method on its head

December 5, 2022

Dr. Lucy Gao

Science, especially the life sciences, is generating more and more data each year, and sophisticated statistical methods are required to help make sense of it. With so much data available, there's been a shift that flies in the face of the traditional scientific method, in which a researcher picks a hypothesis and then designs an experiment to test it.

“Science is hard and increasingly complex, and it's not always easy or practical to generate a hypothesis ahead of time,” says Dr. Lucy Gao, who joined UBC’s Department of Statistics in 2022. “Think about it: could you really come up with a compelling hypothesis before collecting any experimental data? As data becomes easier and easier to collect, it becomes more and more tempting to pick a hypothesis after we have experimental data, rather than before.”

This new paradigm is challenging. Classical statistical tests assume the hypothesis was fixed before the data were seen; they do not account for researchers looking at their data in order to choose a hypothesis to explore, and so they overstate the strength of evidence supporting that hypothesis. Dr. Gao helps researchers through this problem by developing statistical methods that answer scientific questions chosen after seeing the data while keeping the same statistical guarantees as classical hypothesis tests.
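The overstatement described above can be seen in a small simulation. The sketch below is illustrative only and is not Dr. Gao's method: it assumes null data drawn from a single normal population, uses a median split as a stand-in for "choosing a hypothesis after looking at the data", and uses a normal approximation for the p-value. All function names are hypothetical.

```python
import math
import random
import statistics


def two_sample_p_value(a, b):
    """Two-sided p-value for a difference in means (Welch statistic, normal approximation)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.mean(a) - statistics.mean(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))


def rejection_rates(n_experiments=500, n=100, alpha=0.05, seed=0):
    """Share of null experiments declared 'significant' under each way of forming groups."""
    rng = random.Random(seed)
    naive = planned = 0
    for _ in range(n_experiments):
        # Null data: every observation comes from the same N(0, 1) population,
        # so any "significant" group difference is a false positive.
        data = [rng.gauss(0.0, 1.0) for _ in range(n)]

        # Hypothesis chosen after looking: split at the median, then test that split.
        ordered = sorted(data)
        low, high = ordered[: n // 2], ordered[n // 2:]
        naive += two_sample_p_value(low, high) < alpha

        # Hypothesis fixed in advance: groups defined without looking at the values.
        first, second = data[: n // 2], data[n // 2:]
        planned += two_sample_p_value(first, second) < alpha
    return naive / n_experiments, planned / n_experiments


naive_rate, planned_rate = rejection_rates()
print(f"false positive rate, groups chosen from the data: {naive_rate:.2f}")
print(f"false positive rate, groups fixed in advance:     {planned_rate:.2f}")
```

Under these assumptions the data-driven split should be flagged as significant in nearly every experiment, while the pre-specified split should be flagged at roughly the nominal 5 per cent rate, even though there is no real group difference in either case.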

One of the scientific areas she’s been working in recently involves data collected with a new technology called single-cell RNA-sequencing. It enables researchers to assess gene activity at the single-cell level, and understand how genes behave across thousands of cells in a tissue sample. Given that each person carries between 20,000 and 25,000 protein-coding genes, within a genome of around three billion nucleotide base pairs, that’s a lot of numbers to wrangle.

“It’s not like back in the days when you picked one pet gene that you liked and studied it for your entire career,” Dr. Gao says.

Telling a story with data

There are a couple of stages to exploring data sets to gather the information with which to tell a story. First is wading through masses of numbers, often measurements of some attribute, such as the heights, weights and ages of a population. Visualizations are produced, summaries tabulated and averages calculated.

“It can seem like the simplest part of the process, but surprisingly, that's the stage that takes the longest because it's hard to fit an appropriate statistical model and be confident in the result when you don't understand your data very well. It's very easy to inadvertently ask a silly question or fit a model that involves some assumptions that make no sense.”

Dr. Gao explains the importance of statistical modelling by offering the example of news reports on a study showing that eating chocolate is linked to having a happier life.

“When you see a headline like that you ask ‘Why are they so confident?’ Usually, the researchers have fit some kind of a model, then performed some kind of statistical test to determine how much evidence there is to support that model. That's the deeper inquiry stage after you've looked at these summaries and visualizations where you see if you can put a quantitative number on how much evidence there is to support a particular hypothesis.”
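As a rough illustration of putting a number on the evidence, the sketch below runs a permutation test on simulated data. It is one possible test, not the method used in any particular chocolate study: the happiness scores, group sizes and the function name are all hypothetical, and the hypothesis is assumed to have been fixed before the data were seen.

```python
import random
import statistics


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffle the group labels: if the grouping is irrelevant, relabelled
        # data should produce differences as large as the observed one fairly often.
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        extreme += diff >= observed
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Simulated (not real) happiness scores for two hypothetical groups of 40 people.
data_rng = random.Random(1)
chocolate = [data_rng.gauss(6.5, 1.5) for _ in range(40)]
no_chocolate = [data_rng.gauss(6.0, 1.5) for _ in range(40)]

p_value = permutation_p_value(chocolate, no_chocolate)
print(f"permutation p-value: {p_value:.3f}")
```

A small p-value here would mean that a difference this large rarely arises when group labels are assigned at random. It says nothing about why the groups differ, which is where the choice of model and its assumptions matter.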

Part of the challenge is working with researchers from other disciplines who use different terminologies to explain the scientific question they’re asking.

“A huge part of collaborative research between statisticians and scientists is just trying to meet halfway. We need to formulate the scientific question into a statistical question to be able to answer it using statistics. How do we get there?”

Getting here

Dr. Gao didn’t plan to make a career of biostatistics. She was a good math student in high school, but wasn’t sure what she could do with her talent. She asked her parents, who suggested studying financial mathematics.

“They said, that seems like a nice thing to do if you're strong at math and you're looking to have a job at the end of it. I didn't know anything about finance, but I thought I'll give it a try and enrolled in that major at the University of Victoria.”

But she ran into macroeconomics, which bored her to tears. Luckily, one of the requirements for a financial mathematics degree was introductory statistics.

“It was just like a ball rolling downhill. I switched majors and was fortunate enough to start working on some research with faculty during my undergrad. I really enjoyed solving probability puzzles and got further into the discipline. I met scientists and learned about the problems that can be solved with statistics, which allowed me to learn about different scientific areas. I get to learn about the cool science that people are doing.”

Related Links:

Dr. Lucy Gao's faculty profile at UBC Statistics
Dr. Lucy Gao's research website


Musqueam First Nation land acknowledgement

We honour xʷməθkʷəy̓əm (Musqueam) on whose ancestral, unceded territory UBC Vancouver is situated. UBC Science is committed to building meaningful relationships with Indigenous peoples so we can advance Reconciliation and ensure traditional ways of knowing enrich our teaching and research.

Learn more: Musqueam First Nation

Faculty of Science

Office of the Dean, Earth Sciences Building
2178–2207 Main Mall
Vancouver, BC Canada
V6T 1Z4