






Genetic polymorphism and selection


The role of polymorphism in evolution has long been controversial. But because it essentially follows neutral evolution, it serves as a reference against which natural selection can be studied. Ecologists working in conservation biology also use it to reconstruct the past history of species.

1. Mutations, random drift and neutral evolution

Polymorphism consists of mutations that escape the DNA repair systems over successive cell divisions. Their rate of appearance is therefore a biological variable. In humans and chimpanzees, it is $\mu \approx 10^{-8}$ mutations per nucleotide per generation. Male mammals produce a considerable amount of sperm, so there are many more cell divisions in the male germ line (all the cells from stem cells to gametes) than in the female germ line: 380 versus 23 at age 30 (about 16 times more), and even more as men age (840 versus 23 at age 50, about 36 times more). In these species, mutations therefore arise mainly in the male line, and their number depends on the age of the father. Each birth produces about 100 new mutations per genome. Only a small part of the genome is coding (translated into protein), so 99% of them have no effect on survival or fertility. They are called neutral. A new allele can be neutral, harmful or advantageous.
Neutral mutations are the most studied, because they make it possible to write predictive models for exploring population history. Their distribution also serves as a null hypothesis. The distribution of deleterious or beneficial mutations is interpreted by comparison with it.

One might think that, in a genome containing only neutral alleles, random fluctuations in allele frequencies would cancel each other out. Allelic diversity H would then remain stable in the long term. This impression is false: diversity is gradually eroded. The phenomenon closely resembles the loss of family names, a slow but significant process in human isolates such as remote villages. When a family has no son, it does not pass on its surname. The same surname may be carried by related families, but the smaller the population, the greater the probability that names are lost. This is obviously not due to any biological property of the Y chromosome, which is transmitted with male births. Chance alone explains it. The property reflects the fact that a daughter generation is formed from a parental population by a draw with replacement: each gene drawn is "put back", so the same parental gene can be drawn several times.

Like Y chromosomes, some genes of the parental generation leave no copies and are absent from the daughter population. Suppose the genes of the progeny are drawn at random in a population of constant size. The number of copies left by each parental gene then follows a Poisson distribution with parameter 1. The probability that a gene is never drawn is therefore $q(0) = e^{-1} \approx 0.368$. These undrawn genes (more than a third) disappear without offspring. Their absence is compensated by parental genes that, by chance, have left more descendants. If this were not the case, ancestral lineages would remain parallel without ever meeting. The merging of ancestral lineages as one goes back in time is the same phenomenon as the loss of diversity as one comes forward to the present. It is also the same as what is called consanguinity.
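The $e^{-1}$ figure can be checked numerically. The sketch below assumes a Wright-Fisher population of constant size, in which every offspring gene picks its parental gene uniformly at random (a draw with replacement). The function name and the population size are illustrative choices, not taken from the article.

```python
import math
import random


def fraction_without_offspring(n_genes: int, rng: random.Random) -> float:
    """Draw n_genes offspring genes with replacement from n_genes parental genes
    and return the fraction of parental genes that were never drawn."""
    drawn = {rng.randrange(n_genes) for _ in range(n_genes)}
    return 1 - len(drawn) / n_genes


rng = random.Random(1)
n = 100_000
observed = fraction_without_offspring(n, rng)
exact = (1 - 1 / n) ** n     # probability that a given parent is never drawn
poisson = math.exp(-1)       # Poisson(1) limit: q(0) = e^-1
print(f"simulated {observed:.4f}  exact {exact:.4f}  Poisson {poisson:.4f}")
```

For a finite population the exact probability is $(1 - 1/N)^N$. At this size it is already indistinguishable from the Poisson limit.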

The diversity measure of formula (1) has a useful property: its expected value depends on the size of the sample from which it is estimated. Suppose a daughter generation $t+1$ is formed by drawing $n$ genes from a parental generation $t$. The daughter generation then shows a loss of variation equal to $1/n$, according to the formula $E(H_s) = H_t(1 - 1/n)$. This holds even if the daughter population is larger than the parental population, since the draw is made with replacement. However, the larger the population, the less diversity is eroded. It suffices that the population be finite, as all real populations are. By convention, geneticists write this loss of variation as $1/N_e$, where $N_e$ is called the effective number of chromosomes [1]. Thus, from generation 1 to generation 2:

$$E(H_2) = H_1\left(1 - \frac{1}{N_e}\right) \tag{3}$$
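A minimal simulation illustrates the sampling property behind equation (3). It assumes that the diversity of formula (1) is the usual gene diversity $H = 1 - \sum_i p_i^2$; formula (1) is not reproduced in this section. A daughter generation of $n = N_e$ genes is drawn with replacement from a parental generation. Its average diversity is then compared with $H_1(1 - 1/N_e)$. The allele counts and $N_e = 20$ are arbitrary.

```python
import random
from collections import Counter


def gene_diversity(genes) -> float:
    """Gene diversity H = 1 - sum(p_i^2) over the allele frequencies of a list of genes."""
    n = len(genes)
    return 1 - sum((c / n) ** 2 for c in Counter(genes).values())


def mean_diversity_after_one_generation(parents, n_offspring, reps, rng):
    """Average H over `reps` daughter generations drawn with replacement from `parents`."""
    total = 0.0
    for _ in range(reps):
        daughters = rng.choices(parents, k=n_offspring)
        total += gene_diversity(daughters)
    return total / reps


rng = random.Random(42)
parents = ["A"] * 50 + ["B"] * 30 + ["C"] * 20   # H1 = 1 - (0.25 + 0.09 + 0.04) = 0.62
h1 = gene_diversity(parents)
ne = 20                                           # number of chromosomes drawn
observed = mean_diversity_after_one_generation(parents, ne, 20_000, rng)
predicted = h1 * (1 - 1 / ne)                     # equation (3)
print(f"H1 = {h1:.3f}, simulated E(H2) = {observed:.4f}, predicted = {predicted:.4f}")
```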

The effective size is almost always much smaller than the actual number of chromosomes in the population, for reasons that will be discussed later. For example, the effective number of chromosomes in the past of the human lineage is estimated to have been of the order of 10,000. It can be shown that, without mutations, the population would become monomorphic after a time $T$ whose expectation is:

$$E(T) = 2N_e \tag{4}$$
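A small simulation can illustrate equation (4). It again assumes a Wright-Fisher population without mutation, in which every gene starts as its own allele. The standard coalescent approximation for the time until all $N_e$ genes descend from a single ancestor is $2N_e(1 - 1/N_e)$, which is approximately $2N_e$ for large $N_e$. The small population below ($N_e = 20$, chosen only to keep the run fast) should come close to it.

```python
import random


def time_to_monomorphism(n_genes: int, rng: random.Random) -> int:
    """Generations until all n_genes descend from a single ancestral gene.

    Each gene starts as its own allele (no mutation); every generation the
    daughter genes are drawn with replacement from the parental genes.
    """
    genes = list(range(n_genes))          # every gene initially distinct
    t = 0
    while len(set(genes)) > 1:
        genes = rng.choices(genes, k=n_genes)
        t += 1
    return t


rng = random.Random(7)
ne, reps = 20, 3000
mean_t = sum(time_to_monomorphism(ne, rng) for _ in range(reps)) / reps
print(f"mean T = {mean_t:.1f} generations; 2*Ne*(1 - 1/Ne) = {2 * ne * (1 - 1 / ne):.0f}")
```

With the order of magnitude quoted above for the human lineage ($N_e \approx 10{,}000$ chromosomes), equation (4) gives an expected 20,000 generations.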

This has two consequences. First, the polymorphism of a species is always "recent" on the time scale of the species' existence. It depends on mutations that have restored polymorphism despite the erosion of diversity that accompanies the drift of allele frequencies. Second, the level of polymorphism results from a balance between two opposing processes: mutation creates variation and drift erodes it. This is the neutral mutation-drift balance.
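A classical textbook recursion, used here as an assumption, makes the mutation-drift balance concrete. Under an infinite-alleles model, every mutation creates a new allele. The probability $F$ that two genes are identical then obeys $F' = [1/N_e + (1 - 1/N_e)F](1-\mu)^2$. The equilibrium diversity $H = 1 - F$ is close to $\theta/(1+\theta)$, with $\theta = 2N_e\mu$, the parameter defined in the next paragraph. The sketch starts from a monomorphic population and shows mutation restoring diversity up to that equilibrium. The parameter values are illustrative.

```python
def iterate_diversity(ne: float, mu: float, generations: int, h0: float = 0.0) -> float:
    """Iterate F' = [1/Ne + (1 - 1/Ne) F] (1 - mu)^2, starting from diversity h0 = 1 - F."""
    f = 1.0 - h0
    keep = (1.0 - mu) ** 2           # neither of the two genes mutates this generation
    for _ in range(generations):
        f = (1.0 / ne + (1.0 - 1.0 / ne) * f) * keep
    return 1.0 - f


def equilibrium_diversity(ne: float, mu: float) -> float:
    """Exact fixed point of the recursion above."""
    keep = (1.0 - mu) ** 2
    a = keep / ne                    # both genes copy the same parental gene, no mutation
    b = keep * (1.0 - 1.0 / ne)      # they copy different parental genes, no mutation
    return 1.0 - a / (1.0 - b)


ne, mu = 1_000, 5e-4
theta = 2 * ne * mu                  # theta = 2 Ne mu = 1 here
print(f"after 500 generations: H = {iterate_diversity(ne, mu, 500):.3f}")
print(f"equilibrium H = {equilibrium_diversity(ne, mu):.4f}; theta/(1+theta) = {theta / (1 + theta):.4f}")
```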

The disappearance of polymorphism over time can also be described in the opposite direction. Going back in time, two genes of the same locus (homologous genes occupying the same position on the chromosome) always have a most recent common ancestor. This is what John Kingman called the coalescence process. The ancestor is not the same for different loci: sexual reproduction multiplies the number of ancestors, and therefore also the number of common ancestors of genes. Let $q = 1/N_e$ be the probability that two genes have a common ancestor in the previous generation. If $q$ remains constant over time, the time $t$ back to that ancestor follows an exponential distribution with density $f(t) = q\,e^{-qt}$. The expected age of the ancestor is therefore $1/q = N_e$ generations. The two genes will be genetically identical if no mutation has occurred since then. A single mutation on either of the two lineages leading from the ancestor to the two genes is enough to make them different alleles. The two lineages have a total expected length of $2N_e$ generations. The expected number of nucleotide differences between the two genes is therefore $\theta = N_e \times 2\mu$, where $\mu$ is the neutral mutation rate. This parameter, $\theta = 2N_e\mu$, is a fundamental parameter of population genetics.
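The reasoning above can be sketched as a small simulation with discrete generations. The two lineages find their common ancestor with probability $q = 1/N_e$ per generation. This gives a geometric waiting time, the discrete counterpart of the exponential law. Each lineage mutates with probability $\mu$ per generation. The helper names are illustrative, and so are the values $N_e = 1000$ and $\mu = 5 \times 10^{-4}$, chosen so that $\theta = 1$.

```python
import math
import random


def geometric(p: float, rng: random.Random) -> int:
    """Number of Bernoulli(p) trials up to and including the first success (>= 1)."""
    u = 1.0 - rng.random()                           # uniform on (0, 1]
    return max(1, math.ceil(math.log(u) / math.log(1.0 - p)))


def pairwise_sample(ne: int, mu: float, rng: random.Random):
    """Coalescence time of two genes and the number of mutations separating them."""
    t = geometric(1.0 / ne, rng)    # common ancestor found with probability 1/Ne per generation
    branch = 2 * t                  # two lineages, each t generations long
    n_mut = 0
    pos = geometric(mu, rng)        # position of the next mutation along the two branches
    while pos <= branch:
        n_mut += 1
        pos += geometric(mu, rng)
    return t, n_mut


rng = random.Random(3)
ne, mu, reps = 1_000, 5e-4, 20_000
samples = [pairwise_sample(ne, mu, rng) for _ in range(reps)]
mean_t = sum(t for t, _ in samples) / reps
mean_diff = sum(k for _, k in samples) / reps
print(f"mean T = {mean_t:.0f} (Ne = {ne}); mean differences = {mean_diff:.3f} (theta = {2 * ne * mu})")
```

Take the values quoted in the text for humans ($N_e \approx 10{,}000$, $\mu \approx 10^{-8}$ per nucleotide). The same reasoning then gives $\theta = 2 \times 10^{4} \times 10^{-8} = 2 \times 10^{-4}$ expected differences per nucleotide between two chromosomes.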

Figure 1. Some models of population structure. Populations are represented by ellipses. Solid arrows represent migration rates between populations, and dashed arrows represent differentiation between populations due to random drift. In the star phylogeny and founder-effect models, divergence increases continuously over time. In the other models, random drift within populations is balanced by the homogenization of populations through migration. Two observations indicate whether the history of a given species more or less resembles one of these scenarios: the patterns of genetic variation and the FST values between pairs of populations.

The neutral evolution of natural populations is very important in conservation biology, because it allows the history of species to be reconstructed. Geneticists have long known that the differentiation produced by random genetic drift lets them infer models of population differentiation and of the spatial structure of species (Figure 1). During the second half of the 20th century, FST was the indicator most commonly used to study the subdivision of a population into subpopulations. It is defined by the formula:

$$F_{ST} = 1 - \frac{H_S}{H_T} \tag{5}$$

where $H_S$ is the average diversity of the subpopulations and $H_T$ is the diversity of the total population [2].
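A minimal sketch of formula (5) for a single locus follows. It makes two assumptions: $H = 1 - \sum_i p_i^2$ is the diversity measure, and all subpopulations carry equal weight (the article does not specify the weighting). $H_T$ is computed from the pooled allele frequencies.

```python
def gene_diversity(freqs):
    """H = 1 - sum(p_i^2) for a list of allele frequencies."""
    return 1 - sum(p * p for p in freqs)


def fst(subpop_freqs):
    """F_ST = 1 - H_S / H_T for equally weighted subpopulations.

    subpop_freqs: one list of allele frequencies per subpopulation (same allele order).
    """
    k = len(subpop_freqs)
    h_s = sum(gene_diversity(f) for f in subpop_freqs) / k      # mean within-subpopulation H
    pooled = [sum(col) / k for col in zip(*subpop_freqs)]        # total-population frequencies
    h_t = gene_diversity(pooled)
    return 1 - h_s / h_t


print(round(fst([[0.9, 0.1], [0.1, 0.9]]), 3))   # strongly differentiated: 0.64
print(round(fst([[0.5, 0.5], [0.5, 0.5]]), 3))   # identical subpopulations: 0.0
```

The formula is undefined when the total population is monomorphic ($H_T = 0$). Real analyses also average over many loci and correct for sample size, which this sketch does not do.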

Figure 2. Coalescence and demographic changes. The genealogy of a sample of 6 genes (white circles) from the same locus is shown for three populations with different demographic histories. The grey frame represents population size: constant size (left), recent expansion after a bottleneck (centre) and old expansion (right). Time (in generations) runs back toward the past at the top of the diagram. Mutations are shown in two forms. Black circles mark mutations on "internal" branches, shared by several genes in the sample. Black stars mark mutations on "external" branches, unique in the sample because they lead to a single gene. The age distribution of the sample's common ancestors differs strongly between these conditions. So does the ratio of external (terminal) to internal branch lengths in the genealogy. For a mutation rate that is constant over time, this produces different relative proportions of external and internal mutations. This ratio of rare to frequent mutations in the sample is one of the indicators used to reconstruct the history of the population. Other sets of polymorphism indicators (including H and π, seen in the text) also have this property.

The 21st century is the age of genomic data analysis. Coalescent theory [3] was developed independently by Kingman, Hudson and Tajima in 1982-83. It makes it possible not only to study population structure but also to determine whether populations have remained stable or have undergone demographic changes (Figure 2).
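For the constant-size case (left panel of Figure 2), the expected branch lengths have a simple form under the Kingman coalescent. Time is measured in units of $N_e$ generations. With $k$ lineages, the next coalescence takes an exponential time of rate $k(k-1)/2$. The expected external length is 2, and the expected total length is $2\sum_{i=1}^{n-1} 1/i$. The sketch below checks this by simulation for $n = 6$. The bottleneck and expansion panels would require time-varying coalescence rates, which are not modelled here.

```python
import random


def branch_lengths(n: int, rng: random.Random):
    """Simulate one Kingman coalescent genealogy for n sampled genes.

    Time is in units of Ne generations (each pair coalesces at rate 1).
    Returns (total external branch length, total internal branch length).
    """
    lineages = [True] * n                 # True: lineage still ends at a sampled gene (external)
    external = internal = 0.0
    while len(lineages) > 1:
        k = len(lineages)
        wait = rng.expovariate(k * (k - 1) / 2)   # time until the next coalescence
        n_ext = sum(lineages)
        external += wait * n_ext
        internal += wait * (k - n_ext)
        i, j = sorted(rng.sample(range(k), 2), reverse=True)
        lineages.pop(i)                   # remove the higher index first
        lineages.pop(j)
        lineages.append(False)            # the merged lineage is an internal branch
    return external, internal


rng = random.Random(11)
n, reps = 6, 50_000
sims = [branch_lengths(n, rng) for _ in range(reps)]
mean_ext = sum(e for e, _ in sims) / reps
mean_int = sum(b for _, b in sims) / reps
expected_total = 2 * sum(1 / i for i in range(1, n))
print(f"external {mean_ext:.2f} (theory 2), internal {mean_int:.2f} (theory {expected_total - 2:.2f})")
```

Under constant size, external branches thus account for about 2 of the roughly 4.57 expected units of total length. A recent expansion lengthens the external branches relative to the internal ones, and this is the contrast that Figure 2 depicts.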

2. Neutral model and biodiversity management

Figures 1 and 2 illustrate how population history shapes genetic variation profiles. Spatial structuring, colonization, migration and changes in population size all leave a specific signature in the molecular polymorphism of species, which allows ecologists to trace that history. During the Quaternary, the current geological period, the world's climates changed cyclically. This caused periodic shifts of coastlines, north-south shifts of glaciers and of biological communities, and periods of wet or dry climate at all latitudes. These climatic changes produced population movements, declines, expansions and invasions, which reveal how species responded to changes in their environment. Population biologists systematically document them before undertaking any initiative to manage natural populations. Most applications of population genetics today are in conservation biology.

3. Harmful mutations

Because genes code for proteins, most mutations in coding regions modify the protein sequence (about 3/4 of mutations, a proportion that varies with the composition of the sequence). In the human lineage, about 40% of these changes are deleterious: they are missing from the changes that have accumulated in the human genome since its separation from the chimpanzee lineage. If a mutation were neutral, it would have a $1/N_e$ chance of one day replacing all the other genes present at its locus. In a population of effective size $N_e$, the other genes together make up a proportion $1 - 1/N_e$, and each of them also has a $1/N_e$ chance of replacing all the others. But a mutation can be harmful and affect the health or fertility of the individuals who carry it. Its frequency may then fluctuate by random drift for a few generations before selection eliminates it (forty generations on average in Drosophila). All members of a species carry deleterious mutations; you and I do. These mutations are almost always in the heterozygous state, meaning the two homologous chromosomes carry different alleles at the locus. A mutation with a frequency of, say, 1/1000 has about a thousand times fewer copies in the homozygous state (two identical alleles) than in the heterozygous state. It is therefore the slight disadvantage of heterozygotes, rather than the often much greater disadvantage of homozygotes, that eliminates the mutation. The effects of deleterious mutations at different loci are cumulative, so the mutation load becomes a quantitative trait like any other. Its individual effects may be undetectable, yet added together they are effective over the long term in continually purging the genome. This explains why proteins remain functional and why harmful mutations remain at low frequency.
Harmful mutations are probably one of the factors that explain the maintenance of genetic recombination. Recombination makes it possible both to bring harmful mutations together so that they are eliminated jointly, and to limit the consequences of their elimination on adjacent regions of the chromosomes.
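A short sketch can check two numbers in this section. The first is the $1/N_e$ fixation probability of a new neutral mutation, simulated in a Wright-Fisher population of $N_e$ chromosomes ($N_e = 20$ for speed). The second is the "thousand times fewer" copies in homozygotes. This part assumes Hardy-Weinberg proportions, under which homozygotes contribute $2q^2$ and heterozygotes $2pq$ mutant copies, a ratio of $q/p$. Function names and parameter values are illustrative.

```python
import random


def fixation_probability(n_genes: int, reps: int, rng: random.Random) -> float:
    """Fraction of runs in which a single new neutral mutation reaches frequency 1
    in a Wright-Fisher population of n_genes chromosomes."""
    fixed = 0
    for _ in range(reps):
        copies = 1                                   # one new mutant copy
        while 0 < copies < n_genes:
            p = copies / n_genes
            # each gene of the next generation copies a mutant parent with probability p
            copies = sum(rng.random() < p for _ in range(n_genes))
        fixed += copies == n_genes
    return fixed / reps


def hom_to_het_copy_ratio(q: float) -> float:
    """Mutant copies in homozygotes / copies in heterozygotes under
    Hardy-Weinberg proportions: 2*q*q / (2*p*q) = q / p."""
    p = 1.0 - q
    return (2 * q * q) / (2 * p * q)


rng = random.Random(5)
p_fix = fixation_probability(20, 10_000, rng)
print(p_fix)                                   # close to 1/Ne = 0.05
print(1 / hom_to_het_copy_ratio(1 / 1000))     # about 999: roughly a thousand times fewer copies
```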

4. Advantageous mutations

What about the 60% of protein-altering mutations that are not deleterious? Like mutations in other regions of the genome, they can be "neutral": they have no effect on health or fertility in a particular environment and within a particular communication system of a species. Their frequency fluctuates randomly in natural populations. But if conditions change, they can become advantageous. They then come under the natural selection and sexual selection described by Darwin. They also come under selection in the original sense of the word, the selection practised by humans on their domestic species. Selected polymorphisms are of two types: transient polymorphism and balanced polymorphism.

Figure 3. Selective sweep. Recombination makes it possible to decouple the evolution of adjacent regions of the genome. Without selection, neutral polymorphism would reach comparable levels along the whole chromosome through a very slow equilibrium process. Sometimes an advantageous mutation arises in a region and rises to a frequency of 1. In that case the process is very fast: it sweeps away the neutral polymorphism of that region, but not that of adjacent regions. Compare the level of neutral polymorphism in the swept regions with that in the neutral regions. The contrast shows that selection has indeed acted in the former. It also rules out the circular reasoning that "what is adapted is what we see". This example shows two contiguous swept regions on the X chromosome of Drosophila simulans. They made it possible to identify two gene complexes (so-called "selfish" genes). These complexes act together to distort, to their own benefit, the Mendelian proportions in the offspring of these flies (see ref [6]). In these two zones (SR1 and SR2), selection has eliminated neutral polymorphism. Labels in the figure: coding regions; neutral polymorphism (π); deficit of neutral polymorphism; position of genes on the chromosome.
In transient polymorphism, an advantageous mutation gradually "fixes" itself by eliminating the alternative alleles, and it may reach a frequency of 1. Examples include insecticide-resistance genes in mosquitoes, antibiotic resistance in bacteria and antimalarial drug resistance in the malaria parasite. These mutations would probably not have been advantageous under natural conditions. In the environmental context imposed by medicine, however, these alleles increase in frequency. The same holds for the three alleles that regulate the expression of lactase, the enzyme that digests milk sugar (lactose). Other mammals can digest it only as newborns, but humans carrying these alleles can also digest it as adults. These mutations became beneficial in populations that raised livestock. Our hunter-gatherer ancestors, by contrast, had occasion as adults to digest only fruit sugar (sucrose). In all these cases of transient polymorphism, the locus under selection is "betrayed" by a signature in the genome. The rapid rise in its frequency makes the adjacent neutral variation on the chromosome disappear. This is a selective sweep, which makes it possible to assert that the fixation of an allele is due to selection and not to random drift (Figure 3, [4]).
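A deterministic sketch shows why a sweep is fast. It assumes haploid selection with a constant advantage $s$, so that the mutant's frequency changes as $p' = p(1+s)/(1+ps)$. It also ignores drift while the mutant is rare. The time to go from $1/N$ to $1 - 1/N$ is then about $2\ln(N-1)/\ln(1+s)$ generations. This is far shorter than the $2N_e$ generations of equation (4). The values $s = 0.05$ and $N = 10{,}000$ are illustrative.

```python
import math


def sweep_duration(n_genes: int, s: float) -> int:
    """Generations for an advantageous allele to rise from 1/N to 1 - 1/N under
    deterministic haploid selection, p' = p(1 + s) / (1 + p*s); drift is ignored."""
    p, t = 1.0 / n_genes, 0
    while p < 1.0 - 1.0 / n_genes:
        p = p * (1 + s) / (1 + p * s)
        t += 1
    return t


ne, s = 10_000, 0.05
t_sweep = sweep_duration(ne, s)
# the odds p/(1-p) are multiplied by (1 + s) each generation
approx = 2 * math.log(ne - 1) / math.log(1 + s)
print(f"sweep: {t_sweep} generations (closed form ~{approx:.0f}); neutral E(T) = 2*Ne = {2 * ne}")
```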

Figure 4. Balanced polymorphism. When natural selection maintains the coexistence of two alleles, their sequences diverge more and more. Eventually they accumulate more mutations between them than the neutral model predicts for the rest of the genome. This is the case for two alleles of the Drosophila tan gene (coordinate 0). These alleles have maintained the coexistence of females with light and with black abdomens for about three million years. These colour patterns are involved in communication between males and females during mating. Yet neither can eliminate the other, probably because their selective advantage decreases when they become too frequent. Dashed line: expected divergence between chromosomes; blue: divergence actually observed (see ref [7]). Labels in the figure: divergence; black female; light female.
Balanced polymorphism refers to situations in which two alleles coexist because each is favoured under certain conditions, but neither can prevail over the other at all times and in all places. One example is when the selective advantage of a genotype increases as its frequency decreases, so that rare genotypes are favoured. This is called frequency-dependent selection. Such balanced polymorphisms are frequent in cases of sexual selection (Figure 4, [5]).
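A toy deterministic model illustrates frequency-dependent selection. In this assumed model, each allele's fitness decreases as its own frequency increases ($w_1 = 1 - sp$, $w_2 = 1 - s(1-p)$). Whatever the starting frequency, neither allele is eliminated, and the population settles at the symmetric equilibrium $p = 1/2$. The fitness functions and $s = 0.2$ are illustrative.

```python
def next_frequency(p: float, s: float) -> float:
    """One generation of negative frequency-dependent selection on allele 1."""
    w1 = 1 - s * p            # fitness of allele 1 falls as its frequency p rises
    w2 = 1 - s * (1 - p)      # fitness of allele 2 falls as its frequency 1 - p rises
    mean_w = p * w1 + (1 - p) * w2
    return p * w1 / mean_w


def run(p0: float, s: float, generations: int) -> float:
    """Iterate next_frequency from initial frequency p0."""
    p = p0
    for _ in range(generations):
        p = next_frequency(p, s)
    return p


for start in (0.05, 0.5, 0.95):
    print(start, round(run(start, 0.2, 500), 6))   # every start converges to 0.5
```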

5. Is polymorphism useful?

From the 1930s to the 1960s, geneticists studying natural populations discovered more and more polymorphisms in nature. They wanted to assess their extent and their potential usefulness for evolution. The debate set two groups of researchers against each other. The first held that genetic diversity conferred an advantage in itself and that selection maintained it at high levels. The second held that selection produced a fairly homogeneous wild-type phenotype (the set of observable characteristics of an individual), the remaining variants being rather harmful. Neither side was right. The Frenchman Gustave Malécot had already demonstrated in the 1950s that neutral polymorphism was a consequence of Mendel's laws in a population of finite size [6]. In 1966, extremely high levels of molecular polymorphism were discovered, which could not be explained by natural selection alone [7]. This discovery finally allowed the Japanese geneticists Kimura and Ohta to put forward the neutral theory [8]. It was then realized that the alternative to Darwin's theory of natural selection was not the fixity of species, as Darwin's opponents believed. It was instead the continuous genetic change predicted by the neutral model, similar to the random walk of a diffusion process in physics. This view was definitively accepted in the 1980s. However, the effective population sizes measured in all species are very small compared with the sizes of their breeding populations. This indicates that forces erode genetic diversity much more than neutral models predict. Part of this erosion, still poorly estimated, is due to natural selection. Selection eliminates harmful mutations and fixes advantageous variants, and thereby increases the effects of drift on neutral variation.
Although selected polymorphisms are extremely important for the future of species, they certainly represent only a small fraction of all polymorphisms.

Neutral molecular polymorphism provides the basic theory: the reference model against which selection and population history are studied. Paradoxically, the molecular signatures of natural selection are now sought in the genome using neutral theory.

Selective forces maintain the recombination system, Mendel's laws [9] and the genetic mixing brought about by sexual reproduction. These mechanisms in turn maintain polymorphism, and the existence of such forces argues that polymorphism has a short-term advantage in natural populations.


