Evolving Thoughts: April 16

Saturday, April 22, 2006

Races, geography, and genetic clusters

Every so often, a correspondent asks me about something I haven't really given much thought to. Having had the question raised, I am attuned to things I might otherwise have passed over. You know how that happens?

One person asked me for my view on human races. My default view was Richard Lewontin's - that race is a social construct [RealAudio stream], and that the biological difference between "races" is less than within-"race" difference. But I hadn't given it much more thought than that. So I was rather surprised to read this on the Gene Expression blog, with which I normally find myself in agreement:

represent each individual's genome as a point in a space of extremely high dimension, and define a race as a set of points whose distance from each other is less than some radius. These clusters map onto intuitive self-identified race with a very high degree of accuracy.

The idea of mapping genetic differences as coordinates in a space of genetic alleles in a genome is old. It was originally proposed by Sewall Wright, as a "space of genetic recombination", which became the "adaptive landscape". Clustering genomes in a genomic space is a way of identifying groups that share ancestry, although a similar approach in taxonomy known as phenetics failed because of several problems - the clusters were often not conserved when the variables changed, and the division was fundamentally arbitrary. Some proposed a 95% rule, others a 70% rule, and so on.
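The arbitrariness problem is easy to see in miniature. What follows is a toy sketch only, not the method of Tang et al. or of any phenetics package: the six eight-locus "genotypes" are invented, the distance is a plain Hamming count, and I have assumed single-linkage grouping (two individuals share a cluster if a chain of sufficiently close pairs connects them), which is looser than the "every pair within a radius" definition in the quote. The names `hamming`, `cluster_count` and `TOY_GENOTYPES` are mine.

```python
from itertools import combinations


def hamming(a, b):
    """Number of loci at which two genotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def cluster_count(genotypes, radius):
    """Count single-linkage clusters for a given radius.

    Two individuals end up in the same cluster if they are linked by a
    chain of pairs whose Hamming distance is <= radius (union-find).
    """
    parent = list(range(len(genotypes)))

    def find(i):
        # Follow parent links to the cluster representative,
        # compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(genotypes)), 2):
        if hamming(genotypes[i], genotypes[j]) <= radius:
            parent[find(i)] = find(j)

    return len({find(i) for i in range(len(genotypes))})


# Invented data: six individuals scored at eight biallelic loci.
TOY_GENOTYPES = [
    (0, 0, 0, 0, 0, 0, 0, 0),
    (1, 0, 0, 0, 0, 0, 0, 0),
    (1, 1, 1, 0, 0, 0, 0, 0),
    (1, 1, 1, 1, 0, 0, 0, 0),
    (1, 1, 1, 1, 1, 1, 1, 0),
    (1, 1, 1, 1, 1, 1, 1, 1),
]

counts = {r: cluster_count(TOY_GENOTYPES, r) for r in range(4)}
print(counts)  # {0: 6, 1: 3, 2: 2, 3: 1}
```

Same six individuals, same data, and you get six, three, two or one "group" depending on nothing but where you set the dial. Real data sets are vastly bigger and the statistics more sophisticated, but the choice of threshold still has to come from somewhere.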

The paper linked to in the quote, by Tang et al. 2004, has 12 authors from a number of reputable institutions, though, and it cannot be dismissed. The context is the medical problem of identifying genetic diseases by race. Some diseases, such as sickle cell anemia, are strongly "racial" in distribution, largely because the diseases come from a particular geographic region. And other authors, particularly Luigi Cavalli-Sforza and Marcus Feldman at Stanford, have argued that genes and cultural ethnicities, such as language, often covary. So perhaps there is something to race.

Racial classifications such as those found in the United States derive from the writings of the 18th century biologist Johann Blumenbach in 1775, in which he assigned both physiological and psychological characters to five "races": Caucasian, Mongoloid, Malay, American, and Ethiopian (which became known as "Negroes"). In other countries, this scheme was usually not so tightly woven into the social fabric. In Australia, we have what is known as racism, but no great attachment to these categories, for example. Racism is pure xenophobia, irrespective of racial classification. In the historical context of American society, "Hispanic" is added, although 42% of the Latino social group failed to self-identify on the 2000 census due to their admixture of American, African and European ancestry (see this paper, to which we shall return).

The paper by Tang et al. takes several alleles and maps them onto a genome space and finds that the self-reported racial groups match the genetic clusters very well. Others have criticised this sort of work as being statistically questionable and subject to all kinds of artifacts - I'm not competent to discuss this, but I'll link to the papers for those who are.

What I find worrisome here is that it is another example of lumping versus splitting. Begin with a set category structure, and you can find covariance. Try to derive a covariance out of the data, though, and you get a much higher degree of differentiation. Perhaps we find covariance between alleles and races because we set out to.

That the human species has geographic variation is not at issue. It clearly does, as Feldman's colleagues have argued. It just doesn't support the standard racial typology. Alleles had to evolve and spread somewhere. But they do spread, too. The human species (convention makes me want to type "human race") is massively interbreeding. A friend, Marc Buhler, did his PhD on an allele that shows gene flow from the Vikings to the Ashkenazi Jews in the Middle Ages. And more recently, geneticist Alan Templeton has argued that there were at least three major "out of Africa" migrations, and at least one "back to Africa" event.

So while ancestry is a useful way to classify species (because species are isolated gene pools, most of the time), it is rarely a good way to classify populations within species. There are haplotype groups in many species, including humans, and in some species, such as the California seal, this shows geography fairly well. But not in humans. We move about too much.

So, do I think there are races in biology as well as culture? No. Nothing I have seen indicates that humans nicely group into distinct populations of fewer than the 54 found by Feldman's group (probably a lot more - for instance, Papua New Guinea is not represented in their sample set). And this leads us to the paper by the Human Race and Ethnicity Working Group (rare to see a paper that doesn't list all the authors). They rightly observe that while there are continental differences in genetics, there is no hard division, and genetic variation doesn't match up with cultural differences per se. There is a genetic substructure to the human population, but it isn't racial.

"Race" is a difficult term. It was invented to account for within-species groupings in the early days of modern taxonomy. Other terms for it are "sub-species" (which doesn't mean a group that is less valuable or advanced than the species as a whole, but just a part of the species), "breed", "variant", "cultivar" (in botany), and so forth. Blumenbach's value-laden scheme, however, makes it a matter of social valuation. And this is magnified by the social inequalities of so-named races. No biologist would identify Hispanics as a race under any circumstances, for example. They are identified purely in terms of social factors, like migration, social status and political influence.

Within-species groupings are, evolutionarily speaking, ephemeral. Ten thousand years ago, almost none of the non-African races existed. Ten thousand years from now, almost none of the modern races will continue to exist, I warrant. And Africa is so genetically diverse (being the source of all genetic variation that hasn't evolved in the past 60,000 years) that one cannot fairly call "African" (or Ethiopian) a single group. While it is true that some alleles, like the sickle cell allele that confers a measure of resistance to malarial complications, have geographic origins, they do not mark out races. Ironically, this was pointed out in terms of physiological, or as we would now call it, phenotypic, traits by a contemporary critic of Blumenbach's, Buffon. The more things change...

Philosophy of biology blog gone

The Philosophy of Biology blog is dead. Mike Sprague, who was the principal, is leaving the field and couldn't find anyone to take over the blog at FSU. So I have removed it from my blogroll.

Thursday, April 20, 2006

When is a species worth conserving?

It's not often one gets to see a dustup between taxonomists in the media. After all, taxonomy is such a civilised discipline: usually nobody gets killed and hospitalisation is rare. But here, in the Fort Wayne News Sentinel, is a piece on a match between Rob Roy Ramey and Tim King, both field biologists, over the status of a mouse.

What's at issue is its status under the US Endangered Species Act, which focuses on conservation in terms of species rather than ecosystems, biodiversity, or viability of broader ecological systems. Ramey denied that a previously listed endangered species, Preble's meadow jumping mouse (that's some mouse, if it can jump entire meadows!), was in fact a species at all, but rather a subspecific population. King reanalysed the matter and decided that yes, it was a species. And what is more, it was intimated, according to the article, that Ramey was "politically tainted" (which is code for a Republican shill).

I'll get to the politics in a minute, but the taxonomic issues are of interest to me. The article talks about the "lumpers versus splitters" dispute in taxonomy, and wrongly attributes the recognition of this to Darwin (it predates him enormously, probably going back as long as naturalists have been describing species). A lumper blends variation into single taxa, while a splitter finely discriminates variants into their own taxa. There are lumpers and splitters at all levels of taxonomy, from superkingdoms down to subspecies, but the usual argy bargy is about species. Transitional forms have always been a problem. One anecdote about a student of Agassiz, Nathaniel Shaler, tells of him stomping on transitional shells in the late nineteenth century and exclaiming "That's the way to treat a damned transitional form!". One major problem of the nature of living things is that, well, they vary enormously and over gradients (morphoclines). Drawing the boundaries is tough, and sometimes a matter of convention.

Ramey disputes that the studies are really at odds. He says the differences come down to a question of how one interprets the data and where one chooses to draw lines between species. King draws them extremely finely; Ramey, less so. "What species concept you apply determines how you allocate your resources," Ramey told me. "We have so many things listed and too few resources to get the job done. We could go down the road of saying that every local population segment is a listable subspecies. But can we afford it, and will we be shortchanging arguably more important species? We can end up saving lots of little fish in each creek, and lose those creatures that are really unique."

And here is the problem. It's a matter of triage, due to limited economic resources. But focus on the "species concept" question. There is only one species concept - what is at issue here is the definitions and associated techniques and criteria for identifying a species. With the rise of DNA-based identification techniques, including the much-touted and much-criticised "DNA barcode", it is the case that the genes used either overgroup (lump) or undergroup (split), depending entirely on the evolutionary genetic history of the organisms. So the issue is really which serves the purposes best. In older days, species identification was based on phenotype, or in the older terminology, morphology - body shape, skeleton form, organs, and so on. This was convenient, but despite the mythology you'll find in some texts, it wasn't the whole story - naturalists before the modern period knew very well that there was variation of form, and that the identification keys were just conveniences. But some over-reliance on these techniques, hallowed by time and authority, led to bitter disputes. We see the same things happening today, only based on choices of molecular data rather than phenotype.

But what purposes are served by identifying species in conservation biology? It would seem that species is the focus because there is something objective about species, and so it validates the choices made for conserving biodiversity. If the diagnosis of species is assay-relative, that is, if it depends on what you use as the identification, the choice of assay is crucial. And that choice can be made to serve nonscientific purposes as well as scientific ones.

There is a movement, apparently successful (of which it seems Ramey is a part) to have the ESA rewritten. A movement known as the National Endangered Species Act Reform Coalition has successfully lobbied Congress to allow more political interference and economic considerations in the conservation process. The coalition includes such bodies as

American Farm Bureau Federation
American Forest & Paper Association
American Public Power Association
Colorado River Energy Distributors Association
Edison Electric Institute
Mid-West Electric Consumers Association
National Association of Counties
National Association of Home Builders
The National Grange
National Marine Manufacturers Association
National Rural Electric Cooperative Association
National Water Resources Association
Northwest Horticultural Council
Tri-State Generation and Transmission Association

all of which have vested interests in the outcome. If the Secretary of the Interior can overrule the Fish and Wildlife Service, based on whatever "scientific information" (the new wording of the revised act HR3824, the "Threatened and Endangered Species Recovery Act of 2005," which has passed the House of Representatives in the US, and is now before the Senate for ratification), economic interests will come to predominate.

Now this is not entirely a bad thing, for two reasons. First, it is entirely true as Ramey said that limited resources should be used to best conserve ecosystems, and second, that without the full support of the local communities and businesses, conservation is as doomed in the US as it is in the Congo. What worries me is that the tenor of the change indicates that this is not the motivation, but that this is a smokescreen for the undercutting of science that has been seen elsewhere by the present administration. All you have to do is change what is used to identify the species, and you can lump or split to serve political purposes.

Something like the Preble's mouse issue happened once before. The Red Wolf had been listed in 1967 as endangered under an earlier version of the ESA, but it turned out that as numbers declined, they hybridised with the more abundant coyote. A major debate followed, in which it transpired that there were no apparently unique alleles in the Red Wolf, and that it might not be a proper species. The Biological Species Concept was employed by those who argued that it was not a species, since it freely hybridised, so species concepts played a major role in that discussion too.

Part of the problem lies, I believe, in the focus on species. And that is a fundamental problem of the use of various, often arbitrarily chosen, measures of biodiversity. What really matters about biodiversity is the viability of entire ecosystems. At best, individual species are surrogates for that property. But nobody seems to be able to identify what that property really is. The problem is certainly with the ESA and equivalent legislation around the world, but the solution is hard to find. Some think there is no solution - biodiversity is just what we want to conserve in each particular instance. I think there is something worth pursuing here, and I have put in a grant application to follow it up. If it comes through, I'll certainly have more to say about this.

But it's nice to see my favourite species of biologists in the news. Taxonomy matters...

Wednesday, April 19, 2006

What it is like to make a bat

Sorry, philosophy fans, I'm not riffing off Nagel here. Instead, let me link you to that rat bastard Paul Mhgfs, who has another of his wonderfully written pieces on development and evolution in "How to make a bat" on Pharyngula - the second best science blog after the Journal of Irreproducible Results.

Myliz, whose name is unaccountably impossible to spell, makes a number of points about how bats evolved flight. In the process he notes that it is a mistake to think that

the genome encodes a blueprint of morphology. It doesn't; what it contains is a description of interacting agents that work together in a process to produce a complex result. Changes in genes and regulatory elements can essentially produce changes in rules of development, rather than crudely specifying blocks of morphology.

Amen, brother! Mahrz also notes that the growth of finger bones is relatively easy to achieve - just keep early cells in bone growth, regulated by the Bmp2 gene, dividing for longer. Of such variation, easy to generate by additional copies or underregulation of the gene, is natural selection able to make new morphologies. If you happen to be laboring under the misapprehension (or in Bushese, misundertaking) that genes are instructions, go read this.

I wish I could write like him. The bastard. In fact, I wish I were him, only I'd change my name.

Tuesday, April 18, 2006

No, I'm not going to do it

OK, I did it, but I'm not going to tell you what I got. Go do it yourself. Just don't tell me what your result is. It's bad enough all those Science Blog folk have to tell us...

[Hint: I'm part of the 15%]

Sunday, April 16, 2006

What is an abstraction?

Abstract objects are difficult to specify. In the Stanford Encyclopedia of Philosophy entry on abstract objects, Gideon Rosen notes that the distinction before the 20th century was primarily about words, and the general and particular distinction of the nominalist debate. In modern philosophy following Frege's theory of logic, abstractions become objects. Basically an abstract object is something that exists in virtue of being an objective property of universals. A universal, as I have discussed before, is something common to all actual objects ("particulars"). Nominalists think that universals exist only in the head. So how can there be abstract objects at all?

We aren't here concerned with any abstract object, but with the abstract objects of science, and in particular of biology. What numbers may be can be left for philosophy of mathematics. Nor are we concerned that some abstract objects can be family-resemblance based. In fact, in biology we might expect that many classes of organisms would be family resemblance classes; they are families after all. Phylogenetic classes are families - and some abstract objects of biology, such as "homology", are due to family descendance. Homologies are caused by being descended from the "same" parts of ancestors. The problem lies in finding out what the sameness is.

One of the ways indicated by the Rosen article for explaining abstract objects is called by David Lewis the "way of abstraction", and it is this that is relevant here. We identify something that is common to more than one organism or biological phenomenon, and eliminate that which is unique to some subset (including each individual particular), and we call that the class. Another way to identify universals is the Non Spatiality Criterion, which has been axiomatised by Edward Zalta. According to Zalta, an abstract object is something that does not exist in space or time. But biological entities do exist in space and time, and any class, such as a taxon or homology is spatially bound - it exists in a given area, at a particular period in history. It is, as philosophers say, time and space indexed.

So homologies are concrete things. But the generalisations we make are in our heads, so they are also concrete things. It is only in our descriptions, that is, in our language, that they lack time indices. The confusion this sometimes causes leads people to think that form is an explanatory abstract object. The classical form versus function argument that one finds popping up again and again is based on this. Form is abstract. It explains nothing except the form itself. It is only that it suits us to say that something common to all of the class is had by a particular object in biology, and so it shares all the concomitant properties of the form. Here I include such "forms" as equations like the Lotka-Volterra cycle, or the Hardy-Weinberg equilibrium. Exploring the properties of these abstract objects helps us, to be sure, but it is only that we understand these mathematical forms and can make inferences about the form (that the cycle is reaching its downturn for the prey species, say rabbits, and so, if the L-V equation holds true, the predator species, say foxes, will also peak soon). But if the foxes eat voles as well as rabbits, then the L-V equation won't hold true, and nothing about the abstract model will help us find that out. What explains the actual dynamics are the physical causes - that foxes eat rabbits, which means they have the energy budgets to reproduce. The equation, or in other contexts, the morphology of the organism, is only a sketch of the real explanation, without the messy but necessary details.
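For anyone who wants to poke at the "form" itself, here is a minimal sketch of the textbook two-species Lotka-Volterra model, dx/dt = a*x - b*x*y for the prey and dy/dt = -c*y + d*x*y for the predators. Everything specific in it is an assumption for illustration: the parameter values and starting numbers are arbitrary, the function names are mine, and I have used an ordinary fourth-order Runge-Kutta step to integrate it.

```python
import math


def lotka_volterra(x, y, a=1.0, b=0.1, c=1.5, d=0.075):
    """Rates of change for prey x ("rabbits") and predators y ("foxes").

    The parameter values are arbitrary illustrative choices.
    """
    return a * x - b * x * y, -c * y + d * x * y


def rk4_step(x, y, dt):
    """Advance the system by one classical Runge-Kutta (RK4) step."""
    k1x, k1y = lotka_volterra(x, y)
    k2x, k2y = lotka_volterra(x + 0.5 * dt * k1x, y + 0.5 * dt * k1y)
    k3x, k3y = lotka_volterra(x + 0.5 * dt * k2x, y + 0.5 * dt * k2y)
    k4x, k4y = lotka_volterra(x + dt * k3x, y + dt * k3y)
    return (x + dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6,
            y + dt * (k1y + 2 * k2y + 2 * k3y + k4y) / 6)


def simulate(x0=10.0, y0=5.0, dt=0.001, steps=15000):
    """Return the prey and predator trajectories as two lists."""
    xs, ys = [x0], [y0]
    for _ in range(steps):
        x, y = rk4_step(xs[-1], ys[-1], dt)
        xs.append(x)
        ys.append(y)
    return xs, ys


def first_peak_time(series, dt):
    """Time of the first local maximum in a series, or None if absent."""
    for i in range(1, len(series) - 1):
        if series[i - 1] < series[i] >= series[i + 1]:
            return i * dt
    return None


def conserved(x, y, a=1.0, b=0.1, c=1.5, d=0.075):
    """Quantity the exact model keeps constant along any trajectory."""
    return d * x - c * math.log(x) + b * y - a * math.log(y)


DT = 0.001
prey, predators = simulate(dt=DT)
prey_peak = first_peak_time(prey, DT)
predator_peak = first_peak_time(predators, DT)
print("first prey peak at t =", prey_peak)
print("first predator peak at t =", predator_peak)
```

Run it and the predator peak trails the prey peak, which is exactly the sort of inference about the form I mean. Notice what is not in there: voles, weather, disease, anything at all about what foxes actually do. The sketch will keep cycling serenely whether or not the real foxes oblige.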

As a side note, it is for this reason that I prefer the clumsy but specific nomenclature of cladistics. When one talks about a homology, it is unclear what the referent is, but if you talk about an autapomorphy or a symplesiomorphy it is precisely clear. They are still abstractions, but a given apomorphy is not - it's a leg length or a number of bristles, or whatever. Homologies often include homoplasies (convergently evolved traits, like a bat's and a bird's wing). As Gould notes, E. Ray Lankester, who coined "homoplasy", also suggested "homogeny" for homologies that were still the feature being compared (like the bones of the forelimbs of bats and birds), and it would have resolved a lot of confusion had that term been adopted.

Evolving representations

So, carrying on the line of argument from the Abstract and Concrete in Biology post, let us ask the obvious question: Why do scientific abstractions, or models, represent the world at all?

One way to approach this philosophically is to follow Russell and say that they denote. That is, the variables of the model have a true interpretation. But this doesn't help us here - it may be that they do, but why do they? Quine suggested that there is no answer to this really - whatever the variables are of our best scientific theories, those are what we must say exist. But, apart from a cryptic comment, he doesn't tell us how this comes to be, either.

The answer often given is one I have some sympathy for. That is, scientific theories evolve by a process very like natural selection. This view goes back to T. H. Huxley, who noted that scientific theories are engaged in a struggle for survival, and that the fittest survive, but it begs a question. We are using the model of natural selection to explain how theories evolve - but natural selection is itself a theory. Isn't this circular?

There is another problem with this view - one noted by philosopher Kim Sterelny. Selection can leave a population (in this case of scientists holding a particular view) stranded on a suboptimal fitness peak, unable to attain a better peak nearby (or distant). So if science is a selection process, why do we have warrant to think it has found an optimal ontology? Isn't this reason to think that science is basically just finding something that, to use a term of Herbert Simon's, satisfices rather than optimises our epistemic commitments? That way lies social constructionism.

Let's look at each of these in turn. First the petitio claim. Natural selection is basically a model, yes. As a model it implies that in any interpretation in which the conditions required for its outcomes hold, and in which nothing countervails, the results will be necessarily selective. It's a theory when applied to various cases of biology (and here of conceptual evolution), but the model is a fact of logic. Sometimes this is misleadingly said to imply that natural selection is a tautology, as if that were a bad thing. It is a tautology - something that is necessarily true. But it is only true when the conditions apply, and that relies on empirical observation. So while the bare model is tautological and therefore not informative (in one way - it certainly came as a surprise to those who developed it, from Adam Smith to Darwin and on), the application is not tautological. And we can see this by considering cases where it fails, such as random drift in cases of no selective pressure. Remember this - it will become important shortly.
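To make the "fact of logic" point concrete, here is a minimal worked sketch under the simplest assumptions I can think of: two variant types, A and B, that copy themselves asexually, with A at frequency p and constant fitnesses w_A and w_B. The symbols and the set-up are illustrative assumptions of mine, not anything taken from Smith or Darwin.

```latex
\begin{align*}
\bar{w} &= p\,w_A + (1-p)\,w_B \\
p' &= \frac{p\,w_A}{\bar{w}} \\
p' - p &= \frac{p\,(w_A - \bar{w})}{\bar{w}}
        = \frac{p\,(1-p)\,(w_A - w_B)}{\bar{w}}
\end{align*}
```

Given the premises, the sign of the change follows of necessity: whenever both types are present and w_A is greater than w_B, p goes up. Whether any real population meets the premises is the empirical part. And if w_A equals w_B, the expression is zero, so the model predicts no change at all, and any change actually observed has to come from something else, such as drift in a finite population.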

So if we got a natural selection model by trial and error, or some other selection-like process, it doesn't matter. The origins of something don't affect its veracity. To claim that it does is to commit the Genetic Fallacy. If God handed it down on the third tablet of commandments, it would still be true. So we can dispose of the circularity objection.

Sterelny's objection is rather more serious. But there is a way around it, and it has to do with the adaptive landscape itself, as I discussed before on the work of Sergey Gavrilets. Any reasonable account of evolution in biology, and I'm going to say also of concepts and science, must understand that the number of variables in the adaptive landscape is very large. When this occurs, at a certain point, there are likely to be regions of interconnected ridges in fitness space that are not too much less fit than the peaks (Gavrilets calls these "large components"). Since a population of evolving entities will scatter around the peak, some will be able to drift at random to other peaks, even though they are all of high fitness (that is, strongly, but not too strongly, selected). This is like Wright's genetic drift, but doesn't require reasons like small population size to traverse the fitness valleys. In short, there are usually pathways from here to there.
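A toy version of the idea can be sketched as a percolation problem. To be clear about what is assumed: this is not Gavrilets's model, just a caricature in its spirit. Genotypes are strings of n two-allele loci, each genotype is independently "viable" (high fitness) with probability p_viable, neighbours differ by a single mutation, and we ask what fraction of the viable genotypes sit in the largest connected network. The function name, the parameters and the values 12, 0.5 and 0.05 are all mine.

```python
import random
from collections import deque


def largest_component_fraction(n_loci, p_viable, seed=0):
    """Fraction of viable genotypes lying in the largest connected set.

    Genotypes are the integers 0 .. 2**n_loci - 1, read as bit strings;
    two genotypes are neighbours if they differ at exactly one locus.
    """
    rng = random.Random(seed)
    n_genotypes = 1 << n_loci
    viable = [rng.random() < p_viable for _ in range(n_genotypes)]
    n_viable = sum(viable)
    if n_viable == 0:
        return 0.0

    seen = [False] * n_genotypes
    largest = 0
    for start in range(n_genotypes):
        if not viable[start] or seen[start]:
            continue
        # Breadth-first search over viable one-mutation neighbours.
        seen[start] = True
        queue = deque([start])
        size = 0
        while queue:
            genotype = queue.popleft()
            size += 1
            for locus in range(n_loci):
                neighbour = genotype ^ (1 << locus)  # flip one locus
                if viable[neighbour] and not seen[neighbour]:
                    seen[neighbour] = True
                    queue.append(neighbour)
        largest = max(largest, size)
    return largest / n_viable


dense = largest_component_fraction(12, 0.5)
sparse = largest_component_fraction(12, 0.05)
print("p = 0.5: ", dense)
print("p = 0.05:", sparse)
```

With twelve loci and half the genotypes viable, nearly every viable genotype can be reached from nearly every other by single steps that never leave the viable set. Make viable genotypes rare enough relative to the number of loci and the network falls apart into little islands. The more dimensions there are, the less rare "rare enough" has to be before things connect up, which is the point about pathways from here to there.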

If this is true, and I see no reason to doubt it in biology or scientific evolution, then we can expect that low-fitness theories will have been eliminated, and their ontologies with them. So we have a reason to think that the "surviving" ontologies are likely to denote, even if there is some slop or contradiction in elements of the competing theories.

Given the interconnectedness of scientific ideas - when geology and astronomy as well as chemistry and physics are all used in evolutionary hypotheses, for example - it is unlikely that this consilience is just an accident. So we can warrantedly believe that the variables of our best attested theories denote.

But that "best attested" is the kicker. Not all theories or explanations have the same degree of attestation. When we try to develop a model of striped animals, for instance, we are not faced with nice cleanly demarcated classes. We have to construct and test them, and there is a lag between testing and attestation. That is, we have to find out what are the best classes of things to explain, and this is a matter of iterations of hypothesis, testing, and refining.

So it follows that we will have less attested generalisations in our best models, and they are to be checked against both evidence (in terms of being part of the best available model) and conceptual difficulty. We seek to find those generalisations that best cover the terrain, but sometimes the terrain itself is hard to isolate out. This is where I think conceptual analysis plays a role.

When generalisations cause ambiguity, they are immediately suspect. One cannot eliminate ambiguity, and in fact it may be one way that science is able to escape from local high fitness peaks and move on. But something that has been a long standing problem, and shows no evidence of being refined despite attempts to do so, ought to be a prime candidate for revision, disambiguation, or elimination. Of course, it is important that we eliminate the scientific concepts rather than the philosophical ones (at least, for science - philosophical ambiguities are also subject to revision, in philosophy) and so it's important that the concepts are actually scientific. To this end, using intuitions or older literature won't suffice. Nor has that been the usual philosophical practice - Mill for instance used the best scientific ideas of his day when discussing classification in the System of Logic, and so on for Russell, Carnap, and other analytic philosophers. They have flocked, as it were, around the centres of epistemic activity, not the arid and largely meaningless ideas in the air of common discourse, or that homonym for it, "introspective intuitions".

Next, a discussion about what makes some idea an abstraction, and where it is...