Half the DNA on the NYC Subway matches no known organism

The results of a massive new DNA sequencing project on the New York City subway have just been published. And yup, there are a lot of bacteria on the subway, though we know most of them are harmless. What's really important, though, is what we don't know about them.

The PathoMap project, which involved sampling turnstiles, benches, and keypads at 466 stations, found 15,152 life-forms in total, half of which were bacterial. The Wall Street Journal has created a fun, interactive microbial map of the subway out of the data, showing where on the lines the bacteria "associated with" everything from mozzarella cheese to staph infections were found.

But "associated with" is a pretty fuzzy term that runs up against the limits of science. In the past few years, genetic sequencing has become vastly more powerful and cheaper, making metagenomic analyses possible. This means we can take all of the DNA in an environmental sample (human, plant, bacteria, cockroach, whatever) and sequence the hell out of it.

The problem, though, is that our genetic libraries are still incomplete. For example, if I don't know what the DNA sequences of a cockroach look like, how can I know the DNA sequence I've found belongs to a cockroach? That's why half the DNA found in the project matched no known organism.
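To make the library problem concrete, here is a toy sketch in Python. It is not the PathoMap pipeline: the sequences, the organism names, the `classify` helper, and the 50 percent threshold are all made up for illustration, and real tools search vastly larger databases with much smarter matching. The idea is the same, though. A read gets a name only if something like it is already in the library.

```python
# Toy illustration only: invented sequences and an invented reference library.
# Real metagenomic pipelines use far larger databases and better matching.

def kmers(seq, k=8):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify(read, library, k=8, min_shared=0.5):
    """Label a read with the best-matching library entry, or 'unknown'.

    min_shared is a hypothetical cutoff: the fraction of the read's k-mers
    that must also appear in a reference for the match to count.
    """
    read_kmers = kmers(read, k)
    if not read_kmers:
        return "unknown"
    best_name, best_frac = "unknown", 0.0
    for name, ref in library.items():
        frac = len(read_kmers & kmers(ref, k)) / len(read_kmers)
        if frac > best_frac:
            best_name, best_frac = name, frac
    return best_name if best_frac >= min_shared else "unknown"


# Hypothetical reference library. Note that there is no cockroach in it.
LIBRARY = {
    "bacterium_A": "ATGGCGTACGTTAGCCTAGGATCCGATTACGGA",
    "human": "TTGACCGGTAAACCGGTTTAAGGCCTTAAGCGT",
}

reads = {
    "read_1": "GCGTACGTTAGCCTAGG",   # taken from bacterium_A
    "read_2": "CCGGTAAACCGGTTTAAG",  # taken from human
    "read_3": "CACACTGTGTCAGTCAGA",  # from something not in the library
}

labels = {read_id: classify(seq, LIBRARY) for read_id, seq in reads.items()}
for read_id, label in labels.items():
    print(read_id, "->", label)
```

In this sketch, `read_1` and `read_2` come back with names, while `read_3` comes back as `unknown`. It may well be real DNA from a real organism, but nothing in the library looks like it.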

This is especially true when it comes to bacteria that are being discovered for the first time in these new metagenomic analyses. And what does "associated with," when it comes to bacteria, really mean? Maybe we found a certain bacterium on cheese once, but maybe we never sampled its true native habitat?