Science Progress | Where science, technology, and progressive policy meet
GENOMICS AND THE SCIENTIFIC METHOD

Fishing for Funding

Big Data is Changing Science. Are Funding Agencies Keeping Up?

A row of DNA sequencing machines. DNA can now be transformed into data, emailed around the world, and analyzed anywhere remarkably cheaply. This has big ramifications for the way genomic science is undertaken and funded. (Source: Wikimedia Commons)

Dr. Paul Flicek was in his laboratory at the European Bioinformatics Institute in Hinxton, England, one Monday morning when something strange happened. All weekend, data in the form of DNA sequences had been flowing between his institution and collaborators at the National Center for Biotechnology Information in Bethesda, MD. But a little after 9 a.m., everything stopped.

After all, who transmits such quantities of data? A shipment of information that colossal can hardly escape notice. “We never found out for sure, but to this day we assume that administrators who monitor Internet traffic somewhere came into work Monday morning, were struck by the amount of data going through the routers, and shut things down,” Flicek said.

Genomics gave new meaning to the phrase “big data.” One person’s genome, for instance, consists of 3 billion base pairs. Storing each sequenced pair requires about two bits of computer storage, which puts the whole genome at about 6 billion bits, or roughly 0.75 gigabytes of data. A modern machine can sequence more than 500 billion base pairs in a week or just over. That is about 167 human genomes and 125 gigabytes, the equivalent of roughly 31,500 standard song files or 100 movie files. The research bottleneck used to be collecting data. Now, the greatest challenge is making sense of it.
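
For readers who want to check the storage arithmetic, here is a minimal Python sketch. It assumes that one gigabyte means one billion bytes, and it treats the number of bits per base pair as an adjustable parameter, since storage formats differ. The function and constant names are illustrative and are not taken from any sequencing software. The totals scale directly with that parameter: at two bits per base pair a 3-billion-base-pair genome comes to 0.75 gigabytes, and at four bits it comes to 1.5.

```python
def storage_gigabytes(base_pairs: int, bits_per_base: int = 2) -> float:
    """Return the storage needed for a sequence, in gigabytes (1 GB = 10**9 bytes)."""
    total_bits = base_pairs * bits_per_base
    return total_bits / 8 / 1e9  # 8 bits per byte, 10**9 bytes per gigabyte


GENOME_BASE_PAIRS = 3_000_000_000    # one human genome, as stated in the article
WEEKLY_BASE_PAIRS = 500_000_000_000  # one machine-week of sequencing, as stated

# How many genome-sized sequences fit into one machine-week of output.
genomes_per_week = WEEKLY_BASE_PAIRS / GENOME_BASE_PAIRS

# Compare two assumed encodings: 2 bits (four letters, no extras) and 4 bits.
for bits in (2, 4):
    per_genome = storage_gigabytes(GENOME_BASE_PAIRS, bits)
    per_week = storage_gigabytes(WEEKLY_BASE_PAIRS, bits)
    print(f"{bits} bits per base pair: {per_genome:.2f} GB per genome, "
          f"{per_week:.0f} GB per machine-week")

print(f"Genomes per machine-week: {genomes_per_week:.0f}")
```

The per-base cost is a parameter rather than a constant because the estimate is only as good as that assumption; real files also carry quality scores and metadata, which this sketch ignores.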

Things weren’t always like this. When genomics first began carving its niche in the scientific world, the path to gene discovery was quite different. Researchers pored over recent literature, homed in on a handful of genes that sounded promising, and then, armed with these candidates, designed experiments to test the correlation between them and the traits they were suspected to underlie. Call it a paragon of the traditional scientific method: Ask a question, conduct background reading, formulate a specific hypothesis, test it with an experiment, and draw a conclusion.

Over the past 10 or so years, however, meta-reviews tracing the success of candidate gene methods across the literature have reached dismal conclusions. Of hundreds of published papers using this approach, only a tiny fraction of results (6 of 166, to be exact) could be consistently replicated. Gradually, candidate gene techniques gave way to tactics that scan the entire genome without any conjecture about the role of any particular gene.

Flicek’s story involved a massive undertaking called the 1000 Genomes Project, an international effort to catalogue a wide array of human genetic variation by inspecting the full genetic makeup of—you guessed it—1,000 people. What are the scientists looking for? In projects like this, often they won’t know until they find it. What is the hypothesis? In a word: vague. The human genome is laden with diversity, both among and within populations.

This way of tackling a scientific problem marks a significant shift from the time-honored scientific method. Those in the field might call it data-driven research, to be contrasted with standard hypothesis-driven science. Critics are more likely to make charges of fishing expeditions. However couched, the change in approach is this: Instead of designing an experiment to test a defined, preconceived hypothesis, researchers first amass large banks of information and then wade through them with the aid of powerful computers to unearth biologically pertinent findings.

For the maneuver to be mathematically robust, data sets must be big. Accordingly, the emergence of the method was fostered by the coupling of biology and computer science that enabled mammoth data production and storage. The approach plays a central role in disciplines ending with “-omics,” which by definition seek to characterize biology in a big way. (Some familiar examples include genomics, which deals with the complete DNA sequences of organisms; proteomics, or the large-scale study of all the proteins; and metabolomics, which involves all small molecules generated in metabolism.)

Shifting research paradigms raise questions of who in the scientific community is adapting, how rapidly, and how they are interacting with their more traditional counterparts. Specifically, have those controlling the purse strings caught up with what is happening at the research bench?

A recent article in Nature lends some unique insight. Author Kendall Powell takes readers behind the scenes inside a funding committee of the American Cancer Society, which has funded 44 Nobel Prize laureates, as committee members deliberate through multiple rounds of scrutiny and elimination. Their discussions and decisions shed light on the cherished criteria that filter the haves of research funding from the have-nots. One proposal was cut for allegedly committing a very telling blunder:

Another outstanding application … runs into trouble because of a lack of scientific details. … [the primary reviewer] can’t see how the applicant will filter the genes that are pulled from the proposed screen. The problem with this particular fishing expedition, says the second reviewer, is that “he didn’t explain how he would sort through all the fish”. This proposal, too, is knocked out of the competitive range.

To understand the reviewers’ reasoning, it helps to take a look at the recent history of research funding and the relevant pressures that have developed. Over the last decade, many government science agencies have faced stagnant budgets that at best have kept up with inflation despite increasing numbers of competitive applications. At the National Institutes of Health, or NIH, the largest public funding source for biomedical research in the United States, just under 20 percent of grant applications were funded in 2009, compared with 32 percent in 2000.

The result is a notoriously grueling application process. Researchers typically begin writing a grant months before the deadline, and the entire pipeline of peer review can take up to a year. Subject to scrutiny are the researcher’s background, the equipment and facilities needed, the time required, and most importantly the projected overall impact of the scientific outcome. Innovative, well-thought-out work with a clear expected output is a must. In 2002 the National Science Foundation, which funds approximately 20 percent of all federally supported basic science research in universities, announced that proposals must demonstrate broader impacts on society in order to be seriously considered. Committee members have become increasingly nitpicky, writes Powell, with reviewers “looking for any excuse not to fund a project.”

A prime choice for such an excuse is the fishing accusation, many researchers gripe. In her blog, one scientist observes that she and her colleagues have all received the fishing remark at some point in their proposal reviews, and it was always intended as derogatory. “This kind of hedge trimming suggests that only the safest, most predictable work should be done,” she writes, “and any exploratory tangents should be lopped off early.” She continued in an email to me, “It’s a problem of overabundance of caution.”

Dr. Tim Birkhead, a professor of behavioral ecology at the University of Sheffield, voices similar concerns in an article printed in the Times Higher Education. “The scientific research councils seem to be obsessed by hypothesis testing. Many times I have heard it said by referees rejecting a proposal: ‘But there was no hypothesis.’” The problem with this model, he says, is crippling risk aversion. When scientists “basically have to know what they are going to find before putting in a research application,” research becomes “trivially confirmatory and inherently unlikely to discover anything truly novel.”

Still, reluctance to support fishing is not necessarily an assault on big data. “The success of fishing depends on how good your lure is,” explains Dr. Peter Good, a program director at the National Human Genome Research Institute who manages portfolios of grants involving genomic technology development. To get funded, “you have to lay out your ideas – technology-driven or hypothesis-driven – demonstrate what you’re doing is significant, better than anything else out there, and show reviewers you know what you’re doing.”

Dr. Elizabeth Pisani takes that argument one step further in her article, “Has the internet changed science?” What goes on in the laboratory has never been as neat as what gets written in the scientific paper, she points out. The paper follows a template that frames research as a linear story, aligning with the steps of the classic scientific method. Yet the findings that become published are frequently not the ones that were initially pursued. As information accumulates and trends can be detected, researchers can come up with new, increasingly refined hypotheses. Thus, drawing a sharp distinction between data-driven and hypothesis-driven methods, much less presenting this divide as new, is misleading. The two are not conflicting, but complementary. As Peter Good says, “Data-driven really means hypothesis-generating.” It would be silly for a committee to bias (intentionally) against one or the other side of the same coin.

Here is another way to view it: Hypothesis-driven and data-driven research do not represent two opposing and nonoverlapping camps of inquiry but rather a continuum defined by how specific the initial idea is. Data-driven research falls on one end of that continuum, with a more flexible starting hypothesis. In genomics lingo, that would mean the difference between “we predict a genetic basis underlying this trait” and “we predict that specific gene X is implicated in this trait.”

Different fields have varied traditions about where they fall on that spectrum. Genomics is one where data-driven methods have now been in play for a while, meaning geneticists who sit on review committees are less likely to have a knee-jerk “But where is the hypothesis?” reaction to grant proposals. But departments are integrating, or ignoring, big data at unequal rates. Things get tricky when a committee comprises researchers from diverse backgrounds who subscribe to distinct conventions of how research ought to be conducted. “Issues can crop up when you send a grant to a study section with no geneticists, and they say, ‘this is a fishing expedition,’” says Dr. Matthew State, an associate professor of genetics at Yale University School of Medicine. “You then say ‘Right!’ and have to explain that empirically, it works here.” Often, however, no side is clearly right or wrong. Chalk it up to a clash of scientific cultures.

The scientific method is not becoming obsolete, but it is evolving to exploit the power of modern information technology. Meanwhile, funding agencies are dealing with changing burdens of their own. Difficult decisions must be made, and disagreement is to be expected. There will likely never be a perfect system that satisfies everyone. Here’s to the goal that the laboratory and funding worlds evolve in a way that is as synchronized and symbiotic as possible.

Ilana Yurkiewicz holds a B.S. from Yale University and was a staff writer at The News & Observer. Currently a clinical research assistant at Walter Reed Army Medical Center, she will matriculate at Harvard Medical School in the fall.
