Modularity (1)

On modularity, and intelligibility of artifacts not designed by humans (1)

or

The problem with modules (1) (wonkish)

Let’s start by recalling this scene from 3017, in which Sally Fowler and Rod Blaine discuss an alien spacecraft.

“Most of the probe’s internal equipment was junk, fused and melted clutters of plastic blocks, remains of integrated circuitry, odd strips of conducting and semiconducting materials jumbled together in no rational order. There was no trace of the shroud lines, no gear for reeling them in, no apertures in the thirty-two projections at one end of the probe. If the shrouds were all one molecule it might explain why they were missing; they would have come apart, changed chemically, when Blaine’s cannon cut them. But how had they controlled the sail? Could the shrouds somehow be made to contract and relax, like a muscle?

An odd idea, but some of the intact mechanisms were just as odd. There was no standardization of parts in the probe. Two widgets intended to do almost the same job could be subtly different or wildly different. Braces and mountings seemed hand carved. The probe was as much a sculpture as a machine.

Blaine read that, shook his head, and called Sally. Presently she joined him in his cabin.

“Yes, I wrote that,” she said. “It seems to be true. Every nut and bolt in that probe was designed separately […] But that’s not all. You know how redundancy works?”

“In machines? Two gilkickies to do one job. In case one fails.”

“Well, it seems that the Moties work it both ways.”

“Moties?”

She shrugged. “We had to call them something. The Mote engineers made two widgets do one job, all right, but the second widget does two other jobs, and some of the supports are also bimetallic thermostats and thermoelectric generators all in one. Rod, I barely understand the words. Modules: human engineers work in modules, don’t they?”

“For a complicated job, of course they do.”

 “The Moties don’t. It’s all one piece, everything working on everything else.”

 Larry Niven and Jerry Pournelle, The Mote in God’s Eye, Pocket Books, Simon and Schuster, New York, New York (1974)

If the biggest cliché in systems biology is that biological systems are robust, the second biggest is probably that they are composed of modules.

To start, I need to acknowledge that Sally was right. To build systems, human engineers do work in modules, be those built of mechanical parts, electronic circuitry, or lines of code.

For researchers who wish to understand biological function, or who wish to design and engineer new functions, it is a fair question to ask in which cases the concept of modules aids that understanding.

Some of the current belief that biological function arises from the assembled actions of modules traces to an influential paper. This paper (“From molecular to modular cell biology”, by Lee Hartwell, John Hopfield, Stan Leibler, and Andrew Murray, 1999) stated: “We argue here for the recognition of functional ‘modules’ as a critical level of biological organization. Modules are composed of many types of molecule. They have discrete functions that arise from interactions among their components (proteins, DNA, RNA and small molecules), but these functions cannot easily be predicted by studying the properties of the isolated components. We believe that general ‘design principles’ — profoundly shaped by the constraints of evolution — govern the structure and function of modules.”

In subsequent posts, I will advance the argument that this belief in modularity has not worked out very well. Specifically, I will argue that, just as with robustness, at the level of the cell, many assertions that processes and phenomena in biology are modular are not, for reasonable meanings of the word module, true. I will also argue that, at the level of the cell, operation based on that belief has not led to greatly increased understanding, and in fact might have kept human researchers from describing biological systems more accurately. And finally, I will assert that operation based on the belief that cellular systems are modular has not led to an improved ability to engineer these systems.

For all three arguments, part of the reason will be that biological systems evolved, as opposed to being designed and built by humans, or even by other sentient beings.

Reference

Hartwell, L. H., Hopfield, J. J., Leibler, S. and Murray, A. W. (1999). From molecular to modular cell biology. Nature 402(6761 Suppl), C47-C52.

The motivation for this post is a comment by Rodney Rothstein at a recent meeting.

We were listening to a talk by Molly McQuilken, a PhD student in Amy Gladfelter’s lab. McQuilken described her work, which involved a fairly cheap (albeit purpose-built) polarization microscope (from Rudolf Oldenbourg and his group at the Woods Hole MBL (Abrahamsson et al. 2015)) used to image pixels full of signal coming from fusions of yeast septin proteins to GFP. By this means, if the majority of the GFP moieties in the [rigid] septin-GFP fusions in a pixel are oriented in parallel, she can see that this is so and determine the orientation of the fusion proteins in the pixel. And thus, in each pixel, observe and start to understand the concerted actions of these molecules and their interactions with the rest of the cytoskeletal machinery, as (in S. cerevisiae) the gap connecting the mother and the bud forms an ever narrowing hourglass.
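
For readers who have not encountered polarized fluorescence imaging, a toy sketch of the per-pixel arithmetic may help. It assumes intensities recorded at polarizer angles of 0°, 45°, 90°, and 135°; it is my own illustration of the principle, not the Gladfelter or Oldenbourg labs’ analysis code.

```python
import numpy as np

def orientation_and_polarization(I0, I45, I90, I135):
    """Per-pixel azimuth and degree of linear polarization from four
    polarized-fluorescence frames (toy illustration, not the published pipeline).

    I0, I45, I90, I135 : intensities at polarizer angles 0, 45, 90, 135 degrees.
    """
    I0, I45, I90, I135 = (np.asarray(a, dtype=float) for a in (I0, I45, I90, I135))
    total = I0 + I45 + I90 + I135
    # Stokes-like components for linearly polarized emission
    s1 = I0 - I90
    s2 = I45 - I135
    # Azimuth of the mean dipole orientation in the image plane, in degrees
    azimuth = 0.5 * np.degrees(np.arctan2(s2, s1)) % 180.0
    # 0 = randomly oriented fluorophores in the pixel, 1 = perfectly aligned
    polarization = np.sqrt(s1**2 + s2**2) / np.where(total > 0, total, np.nan) * 2.0
    return azimuth, polarization

# Example: a pixel whose rigid GFP dipoles are mostly aligned near 45 degrees
az, p = orientation_and_polarization(I0=100, I45=180, I90=100, I135=20)
print(az, p)   # azimuth ~45 degrees, polarization well above 0
```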

This work has been going well since at least 2011 (DeMay et al. 2011). Among other things, its current progress illustrates how marvelously empowered graduate students are these days. Lurking under the work (enabling that power) are extraordinary tools — the GFP, the recombinant methods that allow design and construction of the rigid septin-GFP fusions, and the polarization microscope. But the main lesson that shone through the talk was the wonderfulness of this relatively small science, and the serene confidence that McQuilken inspired in her listeners that continued work: a) by this person, b) in this lab, c) working with these methods, d) in S. cerevisiae — was going to reveal new knowledge about these still poorly understood proteins. Information that will be directly relevant to understanding, for example, how the otherwise difficult-to-gain-purchase-on mammalian orthologs of these proteins work (imagine that one needed to figure out how the SEPT7 product directs microtubules into filopodia when axons branch (Xie et al. 2007)). That being an example of the kind of process that will certainly prove to be dysfunctional in some already known human neurodegenerative diseases. The kind of process that might one day be targeted by a drug therapy.

This rational experimentation in yeast to find out how septins work provides a textbook example of how the NIH, and within it, the NIGMS, keeps faith with the public.  By enabling young scientists to study basic biology, the work the NIH supports helps generate a stock of ideas and truths that will provide many of the hypotheses that power more applied research in mammalian organisms, and that work will enable development and validation of therapeutic interventions that reduce morbidity and mortality in humans.

Notably, this kind of work can be done and insights can be gained by motivated individual graduate students working in a well-understood, simple and highly tractable experimental organism.

Rodney mentioned something to the effect that it was a shame that work of this type – this marvelous, high-tech-but-accessible, tool-dependent, graduate-student-powered, 21st century small science – was no longer going on in E. coli.

And then imagined the counterfactual in which work on K-12 (but let’s also imagine work on other coli strains, related species, and their phages) had been sustained at the levels of the 1960s-1970s-1980s. And imagined that a much larger number of processes either exclusive to prokaryotes or that operate very differently in prokaryotes were under study by small labs armed with 21st century methods, in the same way yeast septation is now. And then imagined that big genetics meetings featured talks on this kind of work by graduate students who were interacting with other graduate students working on other now poorly understood processes basic to the life of the same organism. The assertion was (and is) that the antibiotic pipeline would not now be dry.

Restated, the semi-collective decision… made in the mysterious ways those decisions get made… by funding agencies and by scientists… this decision to scale down this sort of small lab work in E. coli was premature. It meant that questions in prokaryotes of complexity equivalent to those being pursued by students in small labs working on yeast are not being pursued in bacteria. This retreat from E. coli was at least 30 years premature. And humankind is now paying the price for that mistake.

I don’t expect very many readers to accept the above as a bare assertion, and there are some arguments against it that bear discussion, but to flesh out the argument and consider possible counterarguments I’ll first need to lay some groundwork.

One set of ideas I’ll need to introduce (and support) will be the shooting-fish-in-a-barrel criticisms of individual assertions advanced by soi disant systems biologists about how biological systems operate. There are a number of these and it will take some time to lay them out.

Then, to illustrate the lost opportunity, to make the case that 25 years of application of current methods to E. coli would likely have generated ideas that would already have led to better antibiotics, so that human morbidity and mortality due to infectious disease might be significantly lower than it is now, I will need to introduce and support a second set of ideas. These ideas go back to the 1990s, that time when so many highly technological approaches to understanding biology seemed possible. Starting from that time, I will need to give examples of real discoveries made and insights had, things that the techno-optimists and systems biologists did well and got right.

After that, I hope to be able to support a more sophisticated discussion of the different kinds of experimentation that lead to knowledge of function, and of whether the current allocation of money and attention to the different methods is optimal for scientific knowledge or for human felicity.

And then to return to the counterfactual world in which a 21st century level of understanding of dozens of processes specific to prokaryotes would be giving the industrial biotech and pharma structure a larger stock of ideas, and in which industry would be delivering new classes of antibiotics as one subset of the new small-molecule therapeutics that it knows well how to make.

It will likely take some months to introduce these topics.

References

Rothstein and Rothstein lab. https://systemsbiology.columbia.edu/faculty/rodney-rothstein, http://www.rothsteinlab.com

Mcquilken.   https://www.researchgate.net/profile/Molly_Mcquilken

Gladfelter lab.  http://www.dartmouth.edu/~gladfelterlab/

Abrahamsson, S., McQuilken, M., Mehta, S. B., Verma, A., Larsch, J., Ilic, R., Heintzmann, R., Bargmann, C. I., Gladfelter, A. S., and Oldenbourg R. (2015) MultiFocus Polarization Microscope (MF-PolScope) for 3D polarization imaging of up to 25 focal planes simultaneously.  Opt Express. 2015 Mar 23;23(6):7734-7754. PMID: 25837112

Oldenbourg lab.  http://www.mbl.edu/bell/current-faculty/oldenbourg-lab/

DeMay, B. S., Noda, N., Gladfelter, A. S., and Oldenbourg, R.  (2011)  Rapid and quantitative imaging of excitation polarized fluorescence reveals ordered septin dynamics in live yeast.  Biophys J. 2011 Aug 17;101(4):985-994. PMID: 21843491

Xie, Y., Vessey, J. P., Konecna, A., Dahm, R., Macchi, P., and Kiebler, M. A (2007). The GTP-binding protein Septin 7 is critical for dendrite branching and dendritic-spine morphology.  Curr Biol. 17(20):1746-1751. PMID: 17935997

 

One topic that will merit multiple attempts to address is the limitations to the utility of ideas and metaphors from engineering for understanding the function of biological systems. My starting stance on this is that, when exploring the unknown (for example, the quantitative function of intracellular signaling systems), importing ideas from other spheres and attempting to use them as metaphors qualifies as among the first and best approaches researchers can take.

Moreover, any first use of a term as a metaphor, to compare truths from one domain to knowledge in another, is by definition a creative act, one that can lead to new understanding.

But to move from possibility to reality, to actually generate new knowledge, the metaphor-wielder must actually test the idea, not merely assert it as truth.

When people became enthusiastic about the application of ideas from engineering to understand biology (Hartwell et al. 1999, Brent 2000), I think most of us imagined that the ideas would be tested, to see if they led to new knowledge, or not (Brent 2004). I doubt anyone imagined that such concepts might be imperfectly understood, then asserted, then re-asserted until meaning was lost.

As a first example of how use of metaphors can fail to generate knowledge, I’ll start with the term robust. And note that a single post can’t begin to convey all of the lack of enlightenment spread by its unquestioned repetition.

Colloquially, robust means something like rugged. In biology, its use is old, and broadly consistent with its colloquial meaning. For example, it sometimes stands in opposition to gracile. Our Australopithecus great-great-grandparents are said to be gracile, while our Paranthropus great-aunts and great-uncles were robust, more heavily built, and incidentally able to crush roots and seed grains with their teeth.

For systems biology, I suspect the term robustness came in from control theory. Here, by the 1990s, the term had a fairly precise technical meaning: a control system was said to be robust if its stability (return to starting values), and some level of its performance, were guaranteed in the face of changes or lack of certainty in some system variables within defined ranges (see for example Zhou and Doyle 1997). Note at this point how different this particular meaning is from either the colloquial sense of the term or the pre-existing biological meaning.
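
A toy numerical illustration of that technical sense may help (this is my own sketch, not an example from Zhou and Doyle): wrap a process whose gain is uncertain in integral feedback, and return to the set point is guaranteed for every gain within a stated positive range. That guarantee, under bounded uncertainty in a system variable, is what a control theorist means by robust.

```python
def settles(plant_gain, setpoint=1.0, ki=0.5, dt=0.01, t_end=50.0, tol=1e-3):
    """Integral feedback around a first-order process with uncertain gain.

    Simulates x' = plant_gain * u - x with controller u' = ki * (setpoint - x),
    and reports whether x has returned to within tol of the setpoint by t_end.
    """
    x, u = 0.0, 0.0
    for _ in range(int(t_end / dt)):
        error = setpoint - x
        u += ki * error * dt            # integral action accumulates the error
        x += (plant_gain * u - x) * dt  # the "plant", with its uncertain gain
    return abs(setpoint - x) < tol

# Robust in the control-theory sense: stability and return to the set point are
# guaranteed for every plant gain in the uncertainty range, not just the nominal one.
for gain in [0.5, 1.0, 2.0, 4.0]:
    print(gain, settles(gain))   # True for each gain in this range
```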

In the early crossover uses in biology in the 1990s we see the term robust being used precisely, for example by Leibler and his coworkers (Barkai and Leibler 1997, Alon et al. 1999). At this point there is nothing of the metaphorical at all. When these authors use the term robust, they mean by it what control theorists mean by it.

But then the other connotations of the term began to take over, and the metaphor began to take flight. We see early flapping of wings in a review by Hiroaki Kitano (2004), in which robustness “is one of the fundamental and ubiquitously observed systems-level phenomena that cannot be understood by looking at the individual components. A system must be robust to function in unpredictable environments using unreliable components.” Not all the ways biological systems can function in unpredictable environments involve an engineer’s conception of robust control (for example, some biological systems do not return to a baseline, but deviate afterwards from where they began by continuing to develop, or by physically moving their entire fleshy envelopes to more predictable environments).

But a greater problem is with the widely shared assertion captured in the 2004 Kitano review: that there is a thing called robustness, ubiquitous to biological systems, and that this robustness enables them to function with unreliable components. An ability to function with unreliable parts was not part of the original definition of robust control. It crept in during, or at least around, the time of the term’s importation into biology.

But a still more important problem, I believe, is how far this assertion of robustness deviates from the colloquial meaning of the word. The result is that the word, in pushing an interesting idea and asserting a statement about the natural world, ignores known truths.

Consider one particularly important, and yet unreliable, biological “component”, DNA. Consider how deeply unrobust a cell’s chromosomal DNA is. It’s easily broken by the tiniest shear forces, yet it needs to be unwound at frightening speeds, replicated, and stuffed into descendant cells. Its sequence needs to be accurate, but one of its bases, cytosine, is so vulnerable to deamination that it made evolutionary sense in many mammals to generate the male gametes outside the body in cells a few degrees cooler. And, in haploid cells in G1, a single unrepaired double strand break, as might be caused by a single cosmic ray, is death. There is nothing at all here that matches any natural language meaning of the term robust.

For DNA, the idea that the system has a general property, that it is robust, in the colloquial sense of being rugged, is, all things considered, false.

Moreover, for DNA, the idea that there is a characteristic robustness, an “emergent” property of the biological system, in the not-actually-from-control-theory sense allowing “functioning with unreliable components” – that idea also seems mostly false. Rather, to protect the fragile DNA molecule, to ensure its accurate replication and segregation into descendant cells, and to protect the soma from the consequences of fragility, there are dozens of known systems. Repair systems. Proofreading systems. Error-detecting systems. Freeze-until-you’ve-corrected-the-error systems. Suicide-if-continued-nonrepair systems. These known systems use known molecular entities to carry out processes. These processes and entities have names. Repair. UvrA. Proofreading. DnaQ. APC/C. Checkpoints. p53. Apoptosis.

It is from study of these processes and named entities, rather than any wishful assertions based on understanding of control theory or other disciplines, that all epistemically useful statements of how cells can be “robust” or resistant or resilient to DNA damage have so far come.

References

Alon U, Surette MG, Barkai N, Leibler S (1999) Robustness in bacterial chemotaxis. Nature 397: 168–171 PMID: 9923680

Barkai N, Leibler S (1997) Robustness in simple biochemical networks. Nature 387: 913–917  PMID: 9202124

Brent, R. Genomic Biology, Cell, 2000, 100, 169-183.  PMID: 10647941

Brent, R. A partnership between biology and engineering. Nature Biotechnology 2004, 22, 1211-1214.  PMID: 15470452

Carlson, J. M. and Doyle, J. (1999).  Highly optimized tolerance: a mechanism for power laws in designed systems. Phys. Rev. E 60, 1412–1427.

Carlson, J. M. and Doyle, J. Complexity and robustness. Proc. Natl Acad. Sci. USA 99 (Suppl 1), 2538–2545 (2002).  PMID: 11875207

Hartwell LH, Hopfield JJ, Leibler S, Murray AW. (1999).  From molecular to modular cell biology. Nature. 1999 402(6761 Suppl: C47-52. PMID: 10591225

Kitano H. (2004) Biological robustness. Nat Rev Genet. 2004 Nov;5(11):826-837.

Zhou, K. and Doyle, J.  (1997) Essentials of robust control. Prentice Hall.   ISBN 0-13-525833-2

 

I just mentioned an unexpected negative consequence of scientific optimism about new methods – an instance in which optimism led scientists to race to generate, journals to race to publish, and companies to race to capitalize on buggy data, and how those developments precluded later generation of better data.

I raised this example because one aspect of 2016 is a wave of seemingly starry-eyed optimism about CRISPR/Cas9. This is a case where a technical improvement really is huge and has already sparked a wave of subsequent technical invention that is far from peaking. At such times, we might be tempted to imagine that the technology will solve every problem.

One of the things it’s overall not going to solve, at least in the way now being hyped, is reporter genes.

As Doudna and Charpentier envisioned, CRISPR/Cas9 provides unparalleled ability to fuse coding sequences of genes to useful tags, including affinity purification tags (Savic et al. 2015), degradation tags (e.g. Park et al. 2014), and fluorescent protein (XFP) tags (Paix et al. 2014). With strains and cell lines containing XFP-tagged proteins in hand, researchers can quantify protein by measuring fluorescence at the wavelength emitted by the XFP.
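
The measurement itself is simple in principle. A minimal sketch of how one might turn an XFP image into a per-cell number (my own illustration, with a hand-drawn mask standing in for real cell segmentation):

```python
import numpy as np

def mean_cell_fluorescence(image, cell_mask):
    """Background-subtracted mean XFP intensity for one cell.

    image     : 2D array of pixel intensities at the XFP emission wavelength
    cell_mask : boolean 2D array, True for pixels inside the segmented cell
    """
    image = np.asarray(image, dtype=float)
    cell_mask = np.asarray(cell_mask, dtype=bool)
    background = np.median(image[~cell_mask])   # crude background estimate
    return image[cell_mask].mean() - background

# Tiny synthetic example: a 5x5 field with a brighter 2x2 "cell" in one corner
img = np.full((5, 5), 10.0)
img[:2, :2] = 110.0
mask = np.zeros((5, 5), dtype=bool)
mask[:2, :2] = True
print(mean_cell_fluorescence(img, mask))   # ~100, i.e. signal above background
```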

Since the 1960s, scientists have, by various means, including means that predate recombinant DNA itself, fused different promoters to the coding sequences of genes whose protein products are easily quantified; that is, they have constructed reporter genes. Starting with the early constructions in the bacterial chromosome or on live or defective bacteriophages, such as the φ80 trp-lac fusions in E. coli (Miller et al. 1970), scientists have used expression from reporters in qualitative genetic experiments to understand how mRNA synthesis from the fused promoter is regulated and to isolate enlightening mutants. A lot of biological experimentation still depends on reporter genes.
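
The classic quantification for lacZ reporters like those trp-lac fusions is the β-galactosidase assay, reported in Miller units; the arithmetic looks like this (the formula is the standard one; the numbers below are invented for illustration):

```python
def miller_units(od420, od550, od600, minutes, ml_culture):
    """beta-galactosidase activity, in Miller units, for a lacZ reporter assay.

    od420      : absorbance of the reaction (o-nitrophenol product plus cell debris)
    od550      : absorbance correcting for light scattering by cell debris
    od600      : cell density of the assayed culture
    minutes    : reaction time
    ml_culture : volume of culture used in the assay
    """
    return 1000.0 * (od420 - 1.75 * od550) / (minutes * ml_culture * od600)

# Invented example numbers: a moderately expressed promoter-lacZ fusion
print(miller_units(od420=0.9, od550=0.02, od600=0.4, minutes=20, ml_culture=0.1))
```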

As with the use of large scale protein interaction data in the ’90s and ’00s to mobilize investment capital, I wouldn’t be surprised if some of the attention to using wholesale gene fusion via CRISPR/Cas9 to undertake comprehensive surveys of gene expression were due to this particular application being driven, somewhere, by a desire to raise money for some company.

But even far from Sand Hill Road, the same people who are agog with the possibility of whole-genome protein expression tagging by CRISPR/Cas9 may not have considered how important it is to get reporter genes whose expression very precisely mirrors that of the promoter they are studying.

These whole-genome, stock-price-related issues are on my mind because my lab is trying to use reporter genes in one particular, limited application: to help sort out that part of variation in gene expression that is due to differences in the signal from different signaling systems that reaches various promoters. To measure this variation, we need to minimize other sources of experimental error that we might falsely interpret as meaningful variation. The way to handle this is for the constructions to be identical in all but the differences among those promoters. Operationally, this means same protein, same sequences governing ribosome binding, same 3′ UTR, 5′ UTR as similar as one can get it, same number of gene copies, and – this is an important part – all assayed genes parked in otherwise-genetically-identical organisms at the same chromosomal sites. Otherwise we know (Pesce et al. 2016) that the amount of variation in reporter expression can be affected, for example by colliding RNA polymerases coming in from the opposite strand (Saeki and Svejstrup 2009), increasing gene expression noise.

Put more simply, to understand the causes and consequences of variation in gene expression, we need to find, control for, and eventually extirpate confounding contributions to measurement of this variation.
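
To make that concrete, here is a sketch of the bookkeeping we have in mind (my own illustration, with made-up numbers and hypothetical promoter names, not our lab’s analysis code): given replicate measurements from reporter strains that are identical except for the promoter, partition the observed variance into a between-promoter component and a residual component that lumps together everything we failed to control.

```python
import numpy as np

def partition_variance(measurements):
    """Crude one-way variance partition for reporter expression data.

    measurements : dict mapping promoter name -> 1D array of replicate values
    Returns (between-promoter variance, within-promoter variance). The within
    term lumps together measurement error and any uncontrolled construct-to-
    construct or site-to-site differences.
    """
    groups = [np.asarray(v, dtype=float) for v in measurements.values()]
    grand_mean = np.concatenate(groups).mean()
    n_total = sum(len(g) for g in groups)
    between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups) / n_total
    within = sum(((g - g.mean()) ** 2).sum() for g in groups) / n_total
    return between, within

# Made-up example: three promoters, four replicate cultures each
data = {
    "pPRM1": [95, 102, 98, 101],
    "pFUS1": [310, 295, 305, 290],
    "pFIG1": [40, 45, 43, 42],
}
b, w = partition_variance(data)
print(b, w)   # here the between-promoter variance dwarfs the residual
```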

And yet, a prediction. There will be a lot of gene expression data generated from collections of CRISPR/Cas9-generated fusion proteins that fuse fluorescent reporter genes to different coding sequences. These proteins will be expressed from sites scattered all over the relevant genomes. The promoters will be different, as will the 5′ untranslated mRNAs, the 3′ UTRs, and the protein coding sequences. mRNAs and proteins will be transcribed, translated and degraded at different rates. There will be variation in the signal measured from each promoter due to other polymerase II transcription complexes blundering through the same coding sequences. Studies using these gene fusions will be hyped by the people who carry them out, and hyped by the journals that publish them. Results will include important qualitative conclusions. Results will also contain quantitative data, which will be used by certain kinds of computational biologists to generate many hundreds or even thousands of publications that few will further appreciate. And, as with protein interactions, once published, these badly controlled gene fusion studies will make it more difficult to carry out more careful work.

I don’t see a moral to this, except maybe the old one that the pursuit of the new, the shiny, the purportedly transformative sometimes acts to make more difficult the pursuit of longer term questions. Scholars were probably complaining about the pursuit of the shiny and its distorting effects in the 800s (Francoise 1996, Al-Khalili 2011). So that part is not historically unique.

On the other hand, as we saw last year in the rise and fall of the myth-fulfilling, TED-talking Elizabeth Holmes (Carreyrou, 2015), the robustness of the contemporary connection between a good transformative story and the ability to access capital to start companies is historically unique.

But this is the institutional landscape in which we now do our science.

References

Al-Khalili, J.  (2011) The House of Wisdom: How Arabic Science Saved Ancient Knowledge and Gave Us the Renaissance, New York: Penguin Press, ISBN 9781594202797

Carreyrou, J. (2015) Hot startup Theranos has struggled with its blood-test technology.  Wall Street Journal, 15 October 2015.

Francoise, M.  The Scientific Institutions in the Medieval Near East, pp. 985–1007 in Rashed, Roshdi; Morelon, Régis (1996). Encyclopedia of the History of Arabic Science: v.3 Technology, alchemy and life sciences. Routledge. ISBN 9780415020633.

Miller, J. H., W. S. Reznikoff, A. E. Silverstone, K. Ippen, E. R. Signer and J. R. Beckwith (1970) Fusions of the lac and trp Regions of the Escherichia coli Chromosome. J Bacteriol 104: 1273-1279.   PMID: 16559103

Paix A, Wang Y, Smith HE, Lee CY, Calidas D, Lu T, Smith J, Schmidt H, Krause MW, Seydoux G.  (2014).  Scalable and versatile genome editing using linear DNAs with microhomology to Cas9 Sites in Caenorhabditis elegans.  Genetics. 198(4):1347-1356.  PMID: 2524945

Park A, Won ST, Pentecost M, Bartkowski W, Lee B (2014).  CRISPR/Cas9 allows efficient and complete knock-in of a destabilization domain-tagged essential protein in a human cell line, allowing rapid knockdown of protein function.  PLoS One. 2014 Apr 17;9(4):e95101.  PMID: 24743236

Saeki H, Svejstrup JQ (2009).  Stability, flexibility, and dynamic interactions of colliding RNA polymerase II elongation complexes.  Mol Cell. 35(2):191-205.

Savic D, Partridge EC, Newberry KM, Smith SB, Meadows SK, Roberts BS, Mackiewicz M, Mendenhall EM, Myers RM.  (2015).  CETCh-seq: CRISPR epitope tagging ChIP-seq of DNA-binding proteins.  Genome Res. 2015 Oct;25(10):1581-1589. PMID: 26355004

Pesce, C. G., Zdraljevic, S., Peria, W., Rockwell, D., Yu, R. C., Colman-Lerner, A., and Brent, R. Cell-to-cell variability in the yeast pheromone response: high throughput screen identifies genes with different effects on transmitted signal and response.  Still unpublished.

 

 

One of the many attributes of the 1990s that no living adult human may see again was the intensity of the optimism. The optimism extended to many aspects of the human experience, and of course to many things in science. This optimism extended to what high-throughput (then called “Functional Genomic”) methods could do for an understanding of biological function.

What could have been apparent, but wasn’t, was the fact that a first use of more primitive high throughput methods would be to generate boluses of buggy data. And that said boluses would then be published in brand-name journals. And that the effect of publishing that data would be to diminish the ability to gain the money needed to carry out more careful work and to reduce the chance that such work would be published in brand-name journals, or even published at all.

In this post, I am going to recall a particular instance of optimism from the 1990s. I am mentioning this because the scientific community is still in a wave of optimism based on CRISPR/Cas9. Here, of course, the power of CRISPR/Cas9 is not overstated; the optimism is justified. But CRISPR/Cas9 methods are now close to helping generate one kind of buggy high throughput functional data. Because that seems imminent, I’d like to mention a case of optimism-aided loss of promise from the recent past.

In this case, the locus of optimism was information about protein interactions. Development of two-hybrid methods allowed identification of protein-protein interactions. Development of reliable strains and reporter genes, tests for good baits, the availability of good prey libraries, and the high throughput provided by interaction mating allowed widespread use of two-hybrid methods to generate immediately useful experimental findings: identification of previously unknown partner proteins, and identification as partners of already-known proteins that were not known to interact.

This success led to optimism that knowledge of the pattern of interactions among genome-encoded proteins — not just of their identity — could also give insight into gene function (Finley and Brent, 1994). And that one should scale up interaction mating to try to detect as many interactions as possible, scaling up to all the interactions among proteins encoded by entire genomes. Beginning with Saccharomyces cerevisiae, but extending to other model organisms targeted by the genome sequencing efforts as those sequences became available. It took some years for the idea that information about protein function might inhere in the patterns of interaction to gain traction, but that idea finally gained some acceptance during the early ’00s.

By which time some of the initial emphasis on caution and care in generating the data, and in thinking about what the data could tell, had evaporated.

One waypoint from this period was the publication, in late 2003, in Science, of an incomplete map of protein interactions among genome encoded proteins in Drosophila (Giot et al. 2003).  It covered about a third of the proteins encoded by the Drosophila genome.  The work was performed in ways that ensured that it failed to detect many interactions; that is, it was performed using methods that resulted in many false negative interactions.

Much of the work in the Science paper was done by a [now defunct] publicly traded biotech company, Curagen. It is tempting to speculate that the decision to publish might have been influenced by a desire to influence the stock price. However, the publication of the incomplete work did not seem to coincide with any change in the overall downward trend of the firm’s stock price from its high of $113 / share in March 2000.

[Figure: Curagen stock price chart]

So, leaving this speculation to one side… sadly, after the publication of the incomplete interaction map in Science, neither Science, Nature, nor Cell ever published a more complete Drosophila interaction map.

This history suggests that the first publication may have precluded subsequent high profile publication of more complete work. It’s also consistent with a corollary idea, that perception that there could not be a second high profile publication diminished some of the enthusiasm for careful subsequent work.

As it happens, a second, more careful interaction map was published later, in a much lower profile journal (Formstecher et al. 2005).

There was almost no overlap between these interaction maps.
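
To make “overlap” concrete, and to show why two incomplete maps should be expected to share little, here is a toy calculation (protein names, numbers, and sensitivities all invented) that treats each map as a set of unordered protein pairs sampled, with many false negatives, from the same underlying interactome.

```python
from itertools import combinations
import random

random.seed(0)

# Invented "true" interactome: 5,000 pairs among 1,000 hypothetical proteins
proteins = [f"P{i}" for i in range(1000)]
all_pairs = list(combinations(proteins, 2))          # 499,500 possible pairs
true_interactions = set(random.sample(all_pairs, 5000))

def screen(truth, sensitivity):
    """One large-scale screen: detects each true interaction independently
    with probability `sensitivity` (false positives ignored in this toy)."""
    return {pair for pair in truth if random.random() < sensitivity}

map_a = screen(true_interactions, sensitivity=0.25)
map_b = screen(true_interactions, sensitivity=0.25)

shared = map_a & map_b
print(len(map_a), len(map_b), len(shared))
# With 25% sensitivity in each screen, only ~0.25 * 0.25 = ~6% of the true
# interactions appear in both maps, even with identical methods and no false positives.
```

Independent sampling alone, before any differences in baits, vectors, or strains are considered, predicts that most detected interactions will appear in only one of the two maps.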

The lack of overlap was anticipated by direct experiment, and the reasons it might arise were identified and articulated (Stanyon et al., 2004).

Those articulated lessons were not heeded.

Notably, both groups expressed the hope that their work would be a basis for future understanding.

In fact, the optimism-imbued buggy protein interaction data did become a basis for future work. It became grist for 1000s of quasi-mathematical computational biological studies of protein-protein interactions, networks, and other esoterica.

However, more than ten years later, it’s hard to point to strong and correct findings from these computational studies, and it is possible to identify assertions from them that are, as they say, more false than true, and which I hope to make subjects of later posts. For now, let me simply say that the large scale buggy protein interaction data generated during the ’00s did launch future computational work and published papers, but it probably did not contribute much to new scientific understanding.

Might the contribution of this data to new knowledge have been greater if circumstances had allowed Drosophila protein interaction data to have been generated more carefully?

References

Finley, R. L. Jr., and Brent, R. (1994).  Interaction mating reveals binary and ternary connections between Drosophila cell cycle regulators.  Proc Natl Acad Sci U S A. 1994 Dec 20;91(26):12980-12984. PMID: 7809159

Giot L, Bader JS, Brouwer C, Chaudhuri A, Kuang B, Li Y, Hao YL, Ooi CE, Godwin B, Vitols E, Vijayadamodar G, Pochart P, Machineni H, Welsh M, Kong Y, Zerhusen B, Malcolm R, Varrone Z, Collis A, Minto M, Burgess S, McDaniel L, Stimpson E, Spriggs F, Williams J, Neurath K, Ioime N, Agee M, Voss E, Furtak K, Renzulli R, Aanensen N, Carrolla S, Bickelhaupt E, Lazovatsky Y, DaSilva A, Zhong J, Stanyon CA, Finley RL Jr, White KP, Braverman M, Jarvie T, Gold S, Leach M, Knight J, Shimkets RA, McKenna MP, Chant J, Rothberg JM.  (2003).  A protein interaction map of Drosophila melanogaster.  Science. 302(5651): 1727-1736.  PMID: 14605208

Stanyon CA, Liu G, Mangiola BA, Patel N, Giot L, Kuang B, Zhang H, Zhong J, Finley RL Jr.  (2004).  A Drosophila protein-interaction map centered on cell-cycle regulators.  Genome Biol. 2004;5(12):R96.  PMID:15575970

Formstecher E, Aresta S, Collura V, Hamburger A, Meil A, Trehin A, Reverdy C, Betin V, Maire S, Brun C, Jacq B, Arpin M, Bellaiche Y, Bellusci S, Benaroch P, Bornens M, Chanet R, Chavrier P, Delattre O, Doye V, Fehon R, Faye G, Galli T, Girault JA, Goud B, de Gunzburg J, Johannes L, Junier MP, Mirouse V, Mukherjee A, Papadopoulo D, Perez F, Plessis A, Rossé C, Saule S, Stoppa-Lyonnet D, Vincent A, White M, Legrain P, Wojcik J, Camonis J, Daviet L.  (2005).  Protein interaction mapping: a Drosophila case study.  Genome Res. 2005 Mar;15(3):376-384.  PMID:15710747

 

Nova Mundani Systematis Hypotyposis

Yet another proposal for organization of the solar system. Here, the Moon orbits the Earth; Mercury, Venus, Mars, Jupiter and Saturn orbit the Sun; and the Sun orbits the Earth and Moon. From De mundi ætherei recentioribus phænomenis, Tycho Brahe (Uraniborg, 1610), Q.8.14, p.189, Magdalen College Library, Oxford University, Oxford, UK.

This blog supplements my published work and academic website with review and commentary on current topics in contemporary biology. One broad set of topics I hope to address concerns the kinds of knowledge a 21st century understanding of cellular and organismic biological function will need to consist of. A second is the epistemic tools (e.g., particular molecular methods and paradigmatic experimental designs) that researchers would use to gain such knowledge. A third is occasional essays on scientific or policy directions on which I have expertise but which are not relevant to the first two goals. I may add other topics in future.

In 2016, some of the last decade’s ideas about how blogging might permit new kinds of journalism and exchange of ideas now seem naive (DeLong, 2008; van der Werff, 2015). However, I’m impressed by the continuing contribution of this medium to some academic discourses. In biology, these include the use of blogs to explain how research works, to extend the reach of science journalism, and to identify scientific misconduct (including the now closed science-fraud.org). They also include timely positive contributions to discussion of the scientific literature (Rosie Redfield, Derek Lowe) during a historical epoch in which (for some topics I care about fiercely) the reliability and quality of this literature is very, very low. But I am most impressed by the use of blogs by contemporary economists to articulate and test ideas more quickly than could be done by exchanges in the published literature.

For this experiment in blog writing, my biggest inspiration has been Lior Pachter’s site on computational biology (whose layout I have imitated, at least to start). But I am also impressed by work in this medium by a number of macroeconomists, including Simon Wren-Lewis, Larry Summers, Paul Krugman, and most particularly Brad DeLong.

Policy: All comments must include a valid email address in the appropriate box (however, people who comment are welcome to use pseudonyms). I will reject comments I find inappropriate.  This blog in 2016 and succeeding years is Copyright Roger Brent.  Visitors are free to link to any of the content and to reproduce it so long as they reference it properly (publication citation, url, etc.)

References

DeLong http://www.bradford-delong.com/2008/11/homesteading-th.html and http://www.bradford-delong.com

Krugman http://krugman.blogs.nytimes.com

Lowe http://blogs.sciencemag.org/pipeline/

Pachter https://liorpachter.wordpress.com/

Redfield http://rrresearch.fieldofscience.com/

Summers http://larrysummers.com/category/blog/

van der Werff http://www.vox.com/2015/8/6/9099357/internet-dead-end

Wren-Lewis https://mainlymacro.blogspot.com