Cladogram
A cladogram (from
Generating a cladogram
Molecular versus morphological data
The characteristics used to create a cladogram can be roughly categorized as either morphological (synapsid skull, warm-blooded, notochord, unicellular, etc.) or molecular (DNA, RNA, or other genetic information).[7] Prior to the advent of DNA sequencing, cladistic analysis primarily used morphological data. Behavioral data (for animals) may also be used.[8]
As
Plesiomorphies and synapomorphies
Researchers must decide which character states are "ancestral" (
Homoplasies
A homoplasy is a character state that is shared by two or more taxa due to some cause other than common ancestry.[11] The two main types of homoplasy are convergence (evolution of the "same" character in at least two distinct lineages) and reversion (the return to an ancestral character state). Characters that are obviously homoplastic, such as white fur in different lineages of Arctic mammals, should not be included in a phylogenetic analysis, as they do not contribute anything to our understanding of relationships. However, homoplasy is often not evident from inspection of the character itself (as in DNA sequence data, for example), and is then detected by its incongruence (unparsimonious distribution) on a most-parsimonious cladogram. Note that characters that are homoplastic may still contain phylogenetic signal.[12]
A well-known example of homoplasy due to convergent evolution is the character "presence of wings". Although the wings of birds, bats, and insects serve the same function, each evolved independently, as can be seen from their anatomy. If a bird, a bat, and a winged insect were scored for the character "presence of wings", a homoplasy would be introduced into the dataset, and this could confound the analysis, possibly resulting in a false hypothesis of relationships. A homoplasy is recognizable in the first place only because other characters imply a pattern of relationships that reveals its homoplastic distribution.
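The way a homoplastic character shows up as extra changes on a tree can be illustrated with a small step-counting sketch. The example below uses Fitch's parsimony algorithm, which is not named in the text above and is chosen here only as one common way to count changes; the six taxa, the tree topology, and the function name `fitch_steps` are illustrative assumptions rather than data from any cited analysis.

```python
def fitch_steps(tree, states):
    """Count state changes for one character on a rooted binary tree.

    tree   -- a taxon name (leaf) or a 2-tuple of subtrees
    states -- dict mapping each taxon name to its character state
    Returns (possible_states_at_this_node, number_of_changes_below_it).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_steps = fitch_steps(left, states)
    right_set, right_steps = fitch_steps(right, states)
    shared = left_set & right_set
    if shared:
        # The two subtrees agree on at least one state: no extra change.
        return shared, left_steps + right_steps
    # No state in common: one change is required at this node.
    return left_set | right_set, left_steps + right_steps + 1


# Hypothetical tree, assumed to be supported by other characters.
tree = (("butterfly", "silverfish"),
        (("bird", "crocodile"), ("bat", "mouse")))

# Character "presence of wings": 1 = present, 0 = absent.
wings = {"butterfly": 1, "silverfish": 0, "bird": 1,
         "crocodile": 0, "bat": 1, "mouse": 0}

_, observed_steps = fitch_steps(tree, wings)
minimum_steps = len(set(wings.values())) - 1  # a binary character needs 1
extra_steps = observed_steps - minimum_steps

print(observed_steps, extra_steps)
```

On this assumed tree the character requires three changes where a single change would suffice if wings had one origin, so the two extra steps are the signature of homoplasy.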
What is not a cladogram
A cladogram is the diagrammatic result of an analysis that groups taxa on the basis of synapomorphies alone. Many other phylogenetic algorithms treat data somewhat differently, and result in phylogenetic trees that look like cladograms but are not cladograms. For example, phenetic algorithms, such as UPGMA and Neighbor-Joining, group by overall similarity, and treat both synapomorphies and symplesiomorphies as evidence of grouping. The resulting diagrams are phenograms, not cladograms. Similarly, model-based methods (Maximum Likelihood or Bayesian approaches), which take into account both branching order and "branch length", count both synapomorphies and autapomorphies as evidence for or against grouping. The diagrams resulting from those sorts of analysis are not cladograms, either.[13]
Cladogram selection
There are several
In general, cladogram generation algorithms must be implemented as computer programs, although some algorithms can be performed manually when the data sets are modest (for example, just a few species and a couple of characteristics).
Some algorithms are useful only when the characteristic data are molecular (DNA, RNA); other algorithms are useful only when the characteristic data are morphological. Other algorithms can be used when the characteristic data includes both molecular and morphological data.
Algorithms for cladograms or other types of phylogenetic trees include
Biologists sometimes use the term parsimony for a specific kind of cladogram generation algorithm and sometimes as an umbrella term for all phylogenetic algorithms.[15]
Algorithms that perform optimization tasks (such as building cladograms) can be sensitive to the order in which the input data (the list of species and their characteristics) is presented. Inputting the data in various orders can cause the same algorithm to produce different "best" cladograms. In these situations, the user should input the data in various orders and compare the results.
Using different algorithms on a single data set can sometimes yield different "best" cladograms, because each algorithm may have a unique definition of what is "best".
Because of the astronomical number of possible cladograms, algorithms cannot guarantee that the solution is the overall best solution. A nonoptimal cladogram will be selected if the program settles on a local minimum rather than the desired global minimum.[16] To help solve this problem, many cladogram algorithms use a simulated annealing approach to increase the likelihood that the selected cladogram is the optimal one.[17]
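To give a sense of how quickly the search space grows, the sketch below counts the distinct rooted, fully bifurcating trees for a given number of labelled taxa. It relies on the standard combinatorial result that this count is the product of the odd numbers up to 2n − 3; that formula is not stated in the text above, and the function name `rooted_tree_count` is an illustrative assumption.

```python
def rooted_tree_count(n_taxa):
    """Number of rooted, fully bifurcating trees on n_taxa labelled tips.

    Uses the product 1 * 3 * 5 * ... * (2 * n_taxa - 3).
    """
    if n_taxa < 1:
        raise ValueError("n_taxa must be at least 1")
    count = 1
    for odd in range(3, 2 * n_taxa - 2, 2):
        count *= odd
    return count


for n in (3, 4, 10, 20):
    print(n, rooted_tree_count(n))
```

Ten taxa already allow 34,459,425 rooted trees, which is why exhaustive evaluation is impractical for all but small data sets and heuristic searches are used instead.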
The basal position is the direction of the base (or root) of a rooted phylogenetic tree or cladogram. A basal clade is the earliest clade (of a given taxonomic rank[a]) to branch within a larger clade.
Statistics
Incongruence length difference test (or partition homogeneity test)
The incongruence length difference test (ILD) measures how far combining different datasets (e.g. morphological and molecular, plastid and nuclear genes) results in a longer tree. It is calculated by first finding the tree length of each partition separately and summing these lengths. Replicates are then made by randomly reassembling the characters into partitions of the same sizes as the originals, and the tree lengths of each set of random partitions are summed in the same way. A p value of 0.01 is obtained for 100 replicates if 99 replicates have longer combined tree lengths.
Measuring homoplasy
Some measures attempt to quantify the amount of homoplasy in a dataset with reference to a tree,[18] though it is not always clear precisely which property they capture.[19]
Consistency index
The consistency index (CI) measures the consistency of a tree to a set of data – a measure of the minimum amount of homoplasy implied by the tree.[20] It is calculated by counting the minimum number of changes in a dataset and dividing it by the actual number of changes needed for the cladogram.[20] A consistency index can also be calculated for an individual character i, denoted ci.
Besides reflecting the amount of homoplasy, the metric also reflects the number of taxa in the dataset,[21] (to a lesser extent) the number of characters in a dataset,[22] the degree to which each character carries phylogenetic information,[23] and the fashion in which additive characters are coded, rendering it unfit for purpose.[24]
ci ranges from 1 down to 1/⌊n.taxa/2⌋ in binary characters with an even state distribution; its minimum value is larger when states are not evenly spread.[23][18] In general, for a binary or non-binary character with n.states states, ci ranges from 1 down to (n.states − 1)/(n.taxa − ⌈n.taxa/n.states⌉).[23]
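The consistency index calculation described above can be sketched in a few lines. The per-character change counts in this example are made up for illustration, and the function names are assumptions rather than part of any published software.

```python
def consistency_index(min_changes, observed_changes):
    """Ensemble CI: total minimum changes divided by total observed changes.

    Both arguments are sequences with one entry per character.
    """
    return sum(min_changes) / sum(observed_changes)


def character_ci(min_change, observed_change):
    """Consistency index ci for a single character."""
    return min_change / observed_change


# Hypothetical data set of three characters on one cladogram.
min_changes = [1, 1, 2]        # fewest changes each character could need
observed_changes = [1, 3, 2]   # changes each character needs on this tree

ci = consistency_index(min_changes, observed_changes)
per_character = [character_ci(m, s)
                 for m, s in zip(min_changes, observed_changes)]

print(round(ci, 3), [round(c, 3) for c in per_character])
```

Here only the second character is homoplastic: it needs three changes on the tree instead of one, which lowers its ci to one third and pulls the ensemble CI down to two thirds.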
Retention index
The retention index (RI) was proposed as an improvement of the CI "for certain applications".[25] This metric also purports to measure the amount of homoplasy, but additionally measures how well synapomorphies explain the tree. It is calculated by taking (the maximum number of changes on a tree minus the number of changes on the tree) and dividing by (the maximum number of changes on the tree minus the minimum number of changes in the dataset).
The rescaled consistency index (RC) is obtained by multiplying the CI by the RI; in effect this stretches the range of the CI such that its minimum theoretically attainable value is rescaled to 0, with its maximum remaining at 1.[18][25] The homoplasy index (HI) is simply 1 − CI.
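The relationships between these indices can be sketched together as follows. The symbols m, s, and g (minimum, observed, and maximum numbers of changes) and the example values are illustrative assumptions; the formulas follow the verbal definitions given above.

```python
def homoplasy_measures(min_changes, observed_changes, max_changes):
    """Return CI, RI, RC and HI from three tree-length totals.

    min_changes      -- m, minimum number of changes in the dataset
    observed_changes -- s, number of changes on the tree being evaluated
    max_changes      -- g, maximum number of changes on any tree
    """
    m, s, g = min_changes, observed_changes, max_changes
    if g == m:
        raise ValueError("RI is undefined when maximum equals minimum")
    ci = m / s
    ri = (g - s) / (g - m)
    return {"CI": ci, "RI": ri, "RC": ci * ri, "HI": 1 - ci}


# Hypothetical totals for a small dataset.
result = homoplasy_measures(min_changes=4, observed_changes=6, max_changes=10)
print({name: round(value, 3) for name, value in result.items()})
```

A tree that needs only the minimum number of changes gives CI, RI, and RC of 1, while a tree that needs the maximum number gives RI and RC of 0, which is the rescaling described above.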
Homoplasy excess ratio
The homoplasy excess ratio (HER) measures the amount of homoplasy observed on a tree relative to the maximum amount of homoplasy that could theoretically be present: 1 − (observed homoplasy excess) / (maximum homoplasy excess).[22] A value of 1 indicates no homoplasy; 0 represents as much homoplasy as there would be in a fully random dataset; negative values indicate more homoplasy still, and tend to occur only in contrived examples.[22] The HER has been presented as the best measure of homoplasy currently available.[18][26]
References
- .
- S2CID 89032582.
- S2CID 54988538.
- ^ PMID 11146143.
- (PDF) from the original on 2017-09-21.
- ISBN 978-0-8014-3675-8.[page needed]
- ISBN 978-3-7643-6257-7.[page needed]
- .
- ISBN 978-0-87893-282-5.[page needed]
- ^ Hennig, Willi (1966). Phylogenetic Systematics. University of Illinois Press.
- ISBN 978-0-19-512235-0.
- S2CID 85905559.
- S2CID 85725091.
- ^ Kitching, Ian (1998). Cladistics: The Theory and Practice of Parsimony Analysis. Oxford University Press. ISBN 978-0-19-850138-1.[page needed]
- S2CID 4350103.
- ^ Foley, Peter (1993). Cladistics: A Practical Course in Systematics. Oxford Univ. Press. p. 66. ISBN 978-0-19-857766-9.
- S2CID 85720264.
- ^ ISBN 9780126180305.
- ISBN 9780126180305.
- ^ JSTOR 2412407.
- .
- ^ JSTOR 2992286.
- ^ S2CID 53320612.
- PMID 28564338.
- ^ S2CID 84287895.
- PMID 25451518.