Molecular Homology.
In the 1950's and 1960's the
notion of homology began to be
extended to the
molecular level, to the comparison of nucleotide se-
quences in the DNA's of different organisms,
and to
the comparison of amino acid sequences in proteins
from
different organisms. If the work to establish mo-
lecular homology is great, so too are the conceptual
rewards.
One of the most serious handicaps of classical
genetics and evolutionary
studies is, as we have noted,
the lack of a simple relation between the
genotype
and phenotype of an organism. A consequence of this
complexity is that if two classes of organisms exhibit
a certain degree of
phenotypic difference, one cannot
usually determine the extent of genotypic
difference
between the two classes. Thus, from classical genetics
we
can usually know neither the extent of genotypic
change underlying the
observed phenotypic alterations
in evolution, nor, therefore, the rate at
which the
genotype changes in evolution.
These difficulties are partially overcome by consid-
ering nucleotide sequences in DNA and amino acid
sequences in
proteins. Since the nucleotide sequence
of the DNA is the genotype,
comparison of nucleotide
sequences in different organisms is the most
direct
means of assessing the extent of genetic change in
evolution,
and genetic homology between species.
However, such an assessment is not as
straightforward
as one might have hoped. In the first place, direct
analyses of truly long sequences of bases in DNA are
not currently
available. Estimations of similarity of
nucleotide sequences between DNA's
utilize indirect
techniques which only establish approximate ho-
mology, not identity of sequence. These
techniques
will be described later.
Even were it possible to obtain quite detailed nu-
cleotide sequences for the DNA of two organisms, say
bacteria,
estimation of the extent of genotypic differ-
ence between the two would remain difficult. The
concept of the
extent of genotypic difference is ambig-
uous.
The ambiguity resides in the dual reference of
“genotype”
to the actual physical structure of the
DNA, the sequence of bases, and
also to the DNA as
the carrier of genetic information. If one is
referring
to the physical genotype, then the extent of difference
between two genotypes is simply the number of ho-
mologous loci at which the nucleotides differ. Alter-
ation in the informational genotype is related in
rather
complex ways to alteration in the physical genotype.
Amino
acids are coded for by triplets of nucleotides;
most amino acids are coded
for by two or more codons.
Thus, some nucleotide substitutions change a
codon to
a second codon for the same amino acid. Such a substitution
alters the physical genotype,
but leaves the infor-
mational genotype
unaltered. Physical genotypes
different at many loci can thus correspond to the same
informational
genotype. Conversely, nearly identical physical genotypes
can correspond to radically different
informational geno-
types. This possibility is
a consequence of the fact that
codons are triplets, and an amino acid
sequence is
specified by a sequence of triplets in which the nucleotides
are “read” from a specific starting
point, three
at a time. A deletion of a single nucleotide can cause
a
“reading frame shift” in which all codons downstream
from the deletion are misread and a large number of
incorrect amino acids
are incorporated into the pro-
tein. A small
change in the physical genotype yields
a large change in the informational
genotype. If one
is concerned with the extent and rate of alteration
of
the informational genotype in evolution, one must view
with caution
data derived from estimates of physical
homology of the DNA's of various
organisms.
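The distinction drawn above between the physical and the informational genotype can be illustrated with a minimal sketch. It assumes the standard genetic code and uses a hypothetical 15-base sequence invented for the illustration; the names `translate` and `count_differences` are likewise introduced here and do not come from the literature cited. The sketch compares one variant carrying several synonymous substitutions with another carrying a single-base deletion.

```python
from itertools import product

# Standard genetic code for DNA codons in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Read the sequence three bases at a time, starting from its first base."""
    usable = len(dna) - len(dna) % 3  # ignore an incomplete final codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def count_differences(a, b):
    """Number of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical sequences, not taken from any real gene.
original = "ATGGCTGAAAAGCTG"    # Met-Ala-Glu-Lys-Leu
silent = "ATGGCCGAGAAACTC"      # four third-position substitutions
frameshift = "ATGCTGAAAAGCTG"   # one base (the G opening the second codon) deleted

# Physical difference is large, informational difference is nil.
print(count_differences(original, silent),
      translate(original), translate(silent))

# Physical difference is one deleted base, yet the downstream codons are misread.
print(translate(original), translate(frameshift))
```

In this illustration the synonymous variant differs from the original at four of fifteen bases yet specifies the same five amino acids, whereas the single deletion alters every amino acid after the first.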
Indirect physical techniques to study the extent of
base sequence homology
of the physical genotype de-
pend upon the DNA's
duplex structure whose comple-
mentary
strands may be separated and caused to re-
combine. Since single-stranded DNA components from
different origins
may also be induced to form “hybrid”
structures, a
means is afforded by which to assess ge-
netic
relationships among organisms. It can be shown
that duplex formation
between single strands derived
from DNA of the same or closely related
species
occurs readily, but fails to occur if the strands are
derived
from very different organisms.
Results of such studies (Bolton, p. 77) indicate that
phenotypically similar
animals have very similar DNA
base sequences. Furthermore, “...
the similarities and
differences in polynucleotide sequences
quantitatively
indicate the extent of the taxonomic category to which
the systematist refers. Thus, among the primates, a
superfamily distinction
means that about one-quarter
of the polynucleotide sequences are different,
half are
different for subordinal separation, and about three-
quarters for ordinal
distinction.” Bolton also notes that
“the
quantitative similarities in polynucleotide se-
quences among vertebrates can be related to the time
at which
the lines of organisms in the present diverged
from one another in the
geologic past according to
the paleontologist's judgment.”
Bolton's figure shows
a linear decrease in the logarithm of DNA
similarity
with time.
While Bolton's data gives a good indication of the
rate of alteration of a
physical genome in evolution,
it remains difficult to relate the results to
the extent
and rate of change of the informational genotype in
evolution.
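A linear decrease in the logarithm of similarity with time corresponds to an exponential decay, S(t) = exp(-kt). The following sketch shows what such a relation would imply for the fractions quoted above. It rests on assumptions not stated in the source: that similarity equals one minus the fraction of sequences reported as different, that similarity begins at 1, and that the relation described for vertebrates may be applied to the primate figures. The rate constant k is hypothetical, so only ratios of times are meaningful.

```python
import math

# Fractions of polynucleotide sequences reported as different in the quoted passage.
FRACTION_DIFFERENT = {"superfamily": 0.25, "suborder": 0.50, "order": 0.75}


def relative_divergence_time(fraction_different, rate=1.0):
    """Time t solving S = exp(-rate * t), where S = 1 - fraction_different.

    Assumes similarity starts at 1 and that log-similarity falls linearly
    with time. `rate` is a hypothetical constant; with rate=1.0 the result
    is in arbitrary units.
    """
    similarity = 1.0 - fraction_different
    return -math.log(similarity) / rate


times = {rank: relative_divergence_time(f) for rank, f in FRACTION_DIFFERENT.items()}
baseline = times["superfamily"]
for rank, t in times.items():
    print(f"{rank}: {t / baseline:.2f} x the superfamily divergence time")
```

Under these assumptions a subordinal separation would be roughly 2.4 times as old as a superfamily separation, and an ordinal separation exactly twice as old as a subordinal one. These ratios follow from the assumed model, not from Bolton's measurements.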
A conclusion reached by Britten and co-workers
(1968, p. 529) is that many
nucleotide sequences occur
repeatedly in the DNA of higher organisms, there
being
many DNA families, each with many nearly identical
copies of one
sequence. The existence of these homolo-
gous
DNA sequences renders the relation between the
physical and informational
genotype even more com-
plex, for the functional
significance of the redundant
DNA is not known. Britten's data also casts
doubt on
Bolton's conclusion about DNA homology among spe-
cies, for Bolton probably measured only highly re-
dundant DNA sequences.
In contrast to changes in nucleotide sequences which
may occur without
alteration of the informational
genotype, changes in amino acid sequence
are evi-
dence, by definition, of alteration of
the informational
genotype. With the exception of substitutions of nu-
cleotides which do not change the amino
acid specified,
substitution of a single nucleotide results in the substi-
tution of a single amino acid at a
locus in the poly-
peptide. Since the
assignment of codons to amino acids
is now fairly well established, it is
possible to say
which amino acid substitutions can occur by substi-
tution of a single nucleotide in a
codon. Some amino
acid substitutions cannot be made by altering a
single
nucleotide, but would require the simultaneous alter-
ation of two or three nucleotides; or
else, since nucleo-
tide substitutions must
usually occur one at a time,
intermediate proteins with an amino acid different
from both the first and final forms must have existed.
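Given the codon assignments, the least number of nucleotide substitutions needed to pass from one amino acid to another can be computed directly. The following is a minimal sketch assuming the standard genetic code; the function name `min_nucleotide_changes` is introduced here for illustration and is not drawn from the works cited.

```python
from itertools import product

# Standard genetic code for DNA codons in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Group codons by the amino acid they specify (one-letter symbols).
CODONS_FOR = {}
for codon, aa in CODON_TABLE.items():
    CODONS_FOR.setdefault(aa, []).append(codon)


def min_nucleotide_changes(aa_from, aa_to):
    """Fewest base substitutions turning some codon for aa_from into some codon for aa_to."""
    return min(
        sum(x != y for x, y in zip(c1, c2))
        for c1 in CODONS_FOR[aa_from]
        for c2 in CODONS_FOR[aa_to]
    )


print(min_nucleotide_changes("G", "A"))  # glycine -> alanine
print(min_nucleotide_changes("W", "M"))  # tryptophan -> methionine
print(min_nucleotide_changes("M", "Y"))  # methionine -> tyrosine
```

With the standard code these three cases require one, two, and three substitutions respectively, so the second and third could arise one nucleotide at a time only by way of intermediate amino acids.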
Partial or complete sequences of amino acids have
now been worked out for
several sets of homologous
proteins in different organisms, for example,
hemo-
globin and cytochrome C (Fitch and
Margoliash, 1967).
By utilizing arguments about minimal possible
changes
causing sequences of amino acid substitutions, coupled
with
assumptions about nonreversal of changes, it is
possible to arrange
contemporary proteins into pre-
sumptive
branching phylogenetic sequences (ibid.).
Evidence supporting the deduced branching phylo-
genetic relations can be sought in the
fossil record. The
form of argument utilized is closely similar to
that
noted by E. O. Wilson in 1965 for deducing consistent
possible
phylogenies based on gross phenotypes of
contemporary organisms. Utilizing
such techniques on
cytochrome C, Fitch and Margoliash (1967) have pro-
duced a phylogenetic tree linking fungi,
yeasts, nema-
todes, fish, birds, and mammals,
which is very similar
to phylogenetic trees proposed by classical
zoologists.
The number of amino acid substitutions, coupled with
time
estimates derived from the paleontological record,
can give an estimate of
the rate of change of the
informational genotype. It will be of
particular interest
to compare the rates for proteins performing
diverse
functions, for the rate must depend in part upon the
strictness of selective constraints on workable amino
acid sequences.
The extent of homology in amino acid sequence for
some proteins is enormous;
neurohypophysial peptide
hormones hardly differ from man to shark
(Acher,
1969). Other proteins exhibit far less homology, differing
at many loci in many different ways.
The occur-
rence of such proteins, all
performing the same function
in different animals, has led some biologists to
suppose
that some amino acid substitutions do not
affect protein function and are
therefore not subject
to selection. By random drift, large numbers of
such
substitutions are claimed to accumulate, so that these
homologous
proteins differ at many loci but continue
to function.
Amino acid sequence homology is also utilized to
help establish possible
common evolutionary ancestry
for different proteins. For example, the
alpha, beta,
delta, and gamma chains of hemoglobin have long
identical
sequences (Fitch and Margoliash, 1967). This
argues strongly that the four
protein chains were de-
rived from some single
gene, perhaps by its endorepli-
cation
to form the sort of redundant DNA of which
Britten has spoken, followed by the further evolution of
the four genes.
Clearly, the extension of the concept of homology
to the molecular level
promises to be exceptionally
rewarding.