Alphabetical order is a system whereby
When applied to strings or
To determine which of two strings of characters comes first when arranging in alphabetical order, their first letters are compared. If they differ, then the string whose first letter comes earlier in the alphabet comes before the other string. If the first letters are the same, then the second letters are compared, and so on. If a position is reached where one string has no more letters to compare while the other does, then the first (shorter) string is deemed to come first in alphabetical order.
The result of placing a set of words or strings in alphabetical order is that all of the strings beginning with the same letter are grouped together; within that grouping all words beginning with the same two-letter sequence are grouped together; and so on. The system thus tends to maximize the number of common initial letters between adjacent words.
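The comparison rule described above can be sketched in a few lines of Python. This is a minimal illustration, not a complete collation routine: it assumes the strings contain only basic Latin letters and that case is ignored, and the function name `comes_first` is invented for the example.

```python
def comes_first(a: str, b: str) -> bool:
    """Return True if a sorts strictly before b, comparing letter by letter."""
    for x, y in zip(a.lower(), b.lower()):
        if x != y:
            # The first position where the letters differ decides the order.
            return x < y
    # One string is a prefix of the other (or they are equal):
    # the shorter string comes first.
    return len(a) < len(b)


print(comes_first("As", "Aster"))      # shorter string wins on a shared prefix
print(comes_first("Ataman", "Attack")) # decided by the third letter
```

Python's built-in string comparison applies the same rule to code points, so `sorted(words, key=str.lower)` gives the same order for such input.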
Alphabetical order was first used in the 1st millennium
The Bible is dated to the 6th–7th centuries BCE. In the Book of Jeremiah, the prophet utilizes the Atbash substitution cipher, based on alphabetical order. Similarly, biblical authors used acrostics based on the (ordered) Hebrew alphabet.
The first effective use of alphabetical order as a cataloging device among scholars may have been in ancient Alexandria,
In the 1st century BC, Roman writer
Alphabetical order as an aid to consultation started to enter the mainstream of
Arrangement in alphabetical order can be seen as a force for democratising access to information, as it does not require extensive prior knowledge to find what is needed.
Ordering in the Latin script
Basic order and examples
The standard order of the modern ISO basic Latin alphabet is:
- A-B-C-D-E-F-G-H-I-J-K-L-M-N-O-P-Q-R-S-T-U-V-W-X-Y-Z
An example of straightforward alphabetical ordering follows:
- As; Aster; Astrolabe; Astronomy; Astrophysics; At; Ataman; Attack; Baa
- Barnacle; Be; Been; Benefit; Bent
The above words are ordered alphabetically. As comes before Aster because they begin with the same two letters and As has no more letters after that whereas Aster does. The next three words come after Aster because their fourth letter (the first one that differs) is r, which comes after e (the fourth letter of Aster) in the alphabet. Those words themselves are ordered based on their sixth letters (l, n and p respectively). Then comes At, which differs from the preceding words in the second letter (t comes after s). Ataman comes after At for the same reason that Aster came after As. Attack follows Ataman based on comparison of their third letters, and Baa comes after all of the others because it has a different first letter.
Treatment of multiword strings
When some of the strings being ordered consist of more than one word, i.e., they contain
- Oak; Oak Hill; Oak Ridge; Oakley Park; Oakley River
- where all strings beginning with the separate word Oak precede all those beginning Oakley, because Oak precedes Oakley in alphabetical order.
In the second approach, strings are alphabetized as if they had no spaces, giving the sequence:
- Oak; Oak Hill; Oakley Park; Oakley River; Oak Ridge
- where Oak Ridge now comes after the Oakley strings, as it would if it were written "Oakridge".
The second approach is the one usually taken in dictionaries, and it is thus often called dictionary order by publishers. The first approach has often been used in book indexes, although each publisher traditionally set its own standards for which approach to use therein; there was no ISO standard for book indexes (ISO 999) before 1975.
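The two approaches can be expressed as two different sort keys. The sketch below is illustrative only: the function names are invented for the example, and it assumes that words are separated by single spaces and that case is ignored.

```python
names = ["Oakley River", "Oak Ridge", "Oak", "Oakley Park", "Oak Hill"]


def word_by_word_key(s):
    # First approach: compare whole words, so "Oak" precedes "Oakley".
    return [word.lower() for word in s.split()]


def letter_by_letter_key(s):
    # Second approach: ignore spaces, as in many dictionaries.
    return s.lower().replace(" ", "")


word_by_word = sorted(names, key=word_by_word_key)
letter_by_letter = sorted(names, key=letter_by_letter_key)

print(word_by_word)
print(letter_by_letter)
```

With the word-by-word key, Oak Ridge stays with the other Oak entries. With the letter-by-letter key, it moves after the Oakley entries, as in the second sequence above.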
In English, modified letters (such as those with diacritics) are treated the same as the base letter for alphabetical ordering purposes. For example, rôle comes between rock and rose, as if it were written role. However, languages that use such letters systematically generally have their own ordering rules. See § Language-specific conventions below.
Ordering by surname
In most cultures where
Ordering by surname is frequently encountered in academic contexts. Within a single multi-author paper, ordering the authors alphabetically by surname, rather than by other methods such as reverse seniority or subjective degree of contribution to the paper, is seen as a way of "acknowledg[ing] similar contributions" or "avoid[ing] disharmony in collaborating groups". The practice in certain fields of ordering citations in bibliographies by the surnames of their authors has been found to create bias in favour of authors with surnames which appear earlier in the alphabet, while this effect does not appear in fields in which bibliographies are ordered chronologically.
The and other common words
If a phrase begins with a very common word (such as "the", "a" or "an", called articles in grammar), that word is sometimes ignored or moved to the end of the phrase, but this is not always the case. For example, the book "The Shining" might be treated as "Shining", or "Shining, The" and therefore before the book title "Summer of Sam". However, it may also be treated as simply "The Shining" and after "Summer of Sam". Similarly, "A Wrinkle in Time" might be treated as "Wrinkle in Time", "Wrinkle in Time, A", or "A Wrinkle in Time". All three alphabetization methods are fairly easy to create by algorithm, but many programs rely on simple lexicographic ordering instead.
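As a rough sketch of how these treatments might be automated, the following Python helpers either drop a leading article or move it to the end of the title. The helper names and the `ARTICLES` tuple are assumptions made for illustration. Only English articles are handled.

```python
ARTICLES = ("the", "a", "an")  # English articles only (assumption)


def strip_article(title):
    """Drop a leading article: 'The Shining' -> 'Shining'."""
    first, _, rest = title.partition(" ")
    if rest and first.lower() in ARTICLES:
        return rest
    return title


def move_article(title):
    """Move a leading article to the end: 'The Shining' -> 'Shining, The'."""
    first, _, rest = title.partition(" ")
    if rest and first.lower() in ARTICLES:
        return f"{rest}, {first}"
    return title


titles = ["The Shining", "Summer of Sam", "A Wrinkle in Time"]

# Sorting with the article ignored versus sorting the titles exactly as written.
ignoring_articles = sorted(titles, key=lambda t: strip_article(t).lower())
as_written = sorted(titles, key=str.lower)

print(ignoring_articles)
print(as_written)
```

The displayed title is left unchanged in both cases. Only the key used for comparison differs, which keeps the sorting rule separate from the presentation.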
The prefixes M and Mc in Irish and Scottish surnames are abbreviations for Mac and are sometimes alphabetized as if the spelling is Mac in full. Thus McKinley might be listed before Mackintosh (as it would be if it had been spelled out as "MacKinley"). Since the advent of computer-sorted lists, this type of alphabetization is less frequently encountered, though it is still used in British telephone directories.
The prefix St or St. is an abbreviation of "Saint", and is traditionally alphabetized as if the spelling is Saint in full. Thus in a gazetteer St John's might be listed before Salem (as it would be if it had been spelled out as "Saint John's"). Since the advent of computer-sorted lists, this type of alphabetization is less frequently encountered, though it is still sometimes used.
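A computer-sorted list can reproduce these traditional conventions by expanding the abbreviation before comparing. The sketch below is only one possible approach: the function name `expand_prefix` is invented, only the spellings "Mc", "St" and "St." are handled, and a real directory would need a longer list of rules and exceptions.

```python
import re


def expand_prefix(name):
    """Build a sort key in which 'Mc' and 'St'/'St.' are spelled out in full."""
    name = re.sub(r"^Mc", "Mac", name)         # McKinley  -> MacKinley
    name = re.sub(r"^St\.? ", "Saint ", name)  # St John's -> Saint John's
    return name.lower()


surnames = ["Mackintosh", "McKinley"]
places = ["Salem", "St John's"]

print(sorted(surnames, key=expand_prefix))  # traditional order
print(sorted(surnames, key=str.lower))      # plain computer-sorted order
print(sorted(places, key=expand_prefix))
print(sorted(places, key=str.lower))
```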
Special rules may need to be adopted to sort strings which vary only by whether two letters are joined by a ligature.
Treatment of numerals
When some of the strings contain
In the case of monarchs and popes, although their numbers are in Roman numerals and resemble letters, they are normally arranged in numerical order: so, for example, even though V comes after I, the Danish king Christian IX comes after his predecessor Christian VIII.
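One way to obtain this numerical order in software is to convert the trailing Roman numeral to an integer and sort on the name and number together. This is a minimal sketch that assumes every entry ends in a valid Roman numeral. The names `roman_to_int` and `regnal_key` are invented for the example.

```python
ROMAN = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral):
    """Convert a Roman numeral such as 'IX' to an integer."""
    total = 0
    # Pair each symbol with its successor; a smaller value before a
    # larger one is subtracted (IV = 4, IX = 9).
    for ch, nxt in zip(numeral, numeral[1:] + " "):
        value = ROMAN[ch]
        total += -value if ROMAN.get(nxt, 0) > value else value
    return total


def regnal_key(name):
    """Sort key: (name without numeral, numeric value of the numeral)."""
    base, _, numeral = name.rpartition(" ")
    return (base.lower(), roman_to_int(numeral))


kings = ["Christian X", "Christian IX", "Christian VIII"]

print(sorted(kings))                  # letter order puts IX before VIII
print(sorted(kings, key=regnal_key))  # numerical order
```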
Languages which use an
In a few cases, such as
Alphabetization rules applied in various languages are listed below.
- In Roman Numerals. When the abjadiyya is used in numbering, a distinct, abstracted way of writing the letters must be used in order to distinguish those letters from the first letter of the sentence as well as from numbers. For example, to distinguish the Alef "ا" from the Hindi numeral one "١", which looks identical, a small oval loop extends clockwise from the letter's bottom, followed by a short tail. Although these characters are rarely used digitally, they are encoded in Unicode as Arabic Mathematical Alphabetic Symbols, in the range U+1EE00 to U+1EEFF. There is also a less common, phonetically based order, the Sawti Alphabet, which runs from the deep throat sound haa to the lip-most meem. This order was devised by Al-faraheedi.
- In Azerbaijani, there are eight additional letters to the standard Latin alphabet. Five of them are vowels: i, ı, ö, ü, ə and three are consonants: ç, ş, ğ. The alphabet is the same as the Turkish, with the same sounds written with the same letters, except for three additional letters: q, x and ə for sounds that do not exist in Turkish. Although all the "Turkish letters" are collated in their "normal" alphabetical order like in Turkish, the three extra letters are collated arbitrarily after letters whose sounds approach theirs. So, q is collated just after k, x (pronounced like a German ch) is collated just after h and ə (pronounced roughly like an English short a) is collated just after e.
- In Breton, there is no "c", "q", "x" but there are the digraphs "ch" and "c'h", which are collated between "b" and "d". For example: « buzhugenn, chug, c'hoar, daeraouenn » (earthworm, juice, sister, teardrop).
- In the Danish and Norwegian alphabets, the same extra vowels as in Swedish (see below) are also present but in a different order and with different glyphs (..., X, Y, Z, Æ, Ø, Å). Also, "Aa" collates as an equivalent to "Å". The Danish alphabet has traditionally seen "W" as a variant of "V", but today "W" is considered a separate letter.
- In Ĳ) was formerly to be collated as Y (or sometimes as a separate letter: Y < IJ < Z), but is currently mostly collated as 2 letters (II < IJ < IK). Exceptions are phone directories; IJ is always collated as Y here because in many Dutch family names Y is used where modern spelling would require IJ. Note that a word starting with ij that is written with a capital I is also written with a capital J, for example, the town IJmuiden, the river IJssel and the country IJsland (Iceland).
- In ŭ (u with breve), are counted as separate letters and collated separately (c, ĉ, d, e, f, g, ĝ, h, ĥ, i, j, ĵ ... s, ŝ, t, u, ŭ, v, z).
- In Estonian alphabet, which otherwise does not differ from the basic Latin alphabet.
- In Filipino (Tagalog) and other Philippine languages, the letter Ng is treated as a separate letter. It is pronounced as in sing, ping-pong, etc. By itself, it is pronounced nang, but in general Filipino orthography, it is spelled as if it were two separate letters (n and g). Also, letter derivatives (such as Ñ) immediately follow the base letter. Filipino also is written with diacritics, but their use is very rare (except the tilde).
- The Finnish alphabet and collating rules are the same as those of Swedish.
- For French, the last accent in a given word determines the order. For example, in French, the following four words would be sorted this way: cote < côte < coté < côté.
- The Hungarian vowels have accents, umlauts, and double accents, while consonants are written with single, double (digraph) or triple (trigraph) characters. In collating, accented vowels are equivalent to their non-accented counterparts, and double and triple characters follow their single originals. Hungarian alphabetic order is: A=Á, B, C, Cs, D, Dz, Dzs, E=É, F, G, Gy, H, I=Í, J, K, L, Ly, M, N, Ny, O=Ó, Ö=Ő, P, Q, R, S, Sz, T, Ty, U=Ú, Ü=Ű, V, W, X, Y, Z, Zs. (Before 1984, dz and dzs were not considered single letters for collation, but two letters each, d+z and d+zs instead.) This means that, for example, nádcukor should precede nádcsomó (even though s normally precedes u), since c precedes cs in the collation. Difference in vowel length should only be taken into consideration if the two words are otherwise identical (e.g. egér, éger). Spaces and hyphens within phrases are ignored in collation. Ch also occurs as a digraph in certain words, but it is not considered a grapheme in its own right for collation.
- A particular feature of Hungarian collation is that contracted forms of double di- and trigraphs (such as ggy from gy + gy or ddzs from dzs + dzs) should be collated as if they were written in full (independently of the fact of the contraction and the elements of the di- or trigraphs). For example, kaszinó should precede kassza (even though the fourth character z would normally come after s in the alphabet), because the fourth "character" (grapheme) of the word kassza is considered a second sz (decomposing ssz into sz + sz), which does follow i (in kaszinó).
- In Þ, Æ, Ö.
- voice-onset time, then the affricates, fricatives, liquids, and nasals:
- A, AU, E, I, O, U, B, F, P, V, D, J, T, TH, G, C, K, Q, CH, X, S, Z, L, Y, W, H, M, N
- In Lithuanian, specifically Lithuanian letters go after their Latin originals. Another change is that Y comes just before J: ... G, H, I, Į, Y, J, K...
- In Polish, specifically Polish letters derived from the Latin alphabet are collated after their originals: A, Ą, B, C, Ć, D, E, Ę, ..., L, Ł, M, N, Ń, O, Ó, P, ..., S, Ś, T, ..., Z, Ź, Ż. The digraphs for collation purposes are treated as if they were two separate letters.
- In Portuguese, the collating order is just like in English: A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z. Digraphs and letters with diacritics are not included in the alphabet.
- In Romanian, special characters derived from the Latin alphabet are collated after their originals: A, Ă, Â, ..., I, Î, ..., S, Ș, T, Ț, ..., Z.
- In Serbo-Croatian and other related South Slavic languages, the five accented characters and three conjoined characters are sorted after the originals: ..., C, Č, Ć, D, DŽ, Đ, E, ..., L, LJ, M, N, NJ, O, ..., S, Š, T, ..., Z, Ž.
- RAE adopted the more conventional usage, and now LL is collated between LK and LM, and CH between CG and CI. The six characters with diacritics Á, É, Í, Ó, Ú, Ü are treated as the original letters A, E, I, O, U, for example: radio, ráfaga, rana, rápido, rastrillo. The only Spanish-specific collating question is Ñ (eñe) as a different letter collated after N.
- In the Svenska Akademiens ordlista (2006) "W" was considered a separate letter.
- In the Turkish alphabet there are 6 additional letters: ç, ğ, ı, ö, ş, and ü (but no q, w, and x). They are collated with ç after c, ğ after g, ı before i, ö after o, ş after s, and ü after u. Originally, when the alphabet was introduced in 1928, ı was collated after i, but the order was changed later so that letters having shapes containing dots, cedillas or other adorning marks always follow the letters with corresponding bare shapes. Note that in Turkish orthography the letter I is the majuscule of dotless ı, whereas İ is the majuscule of dotted i.
- In many
- In Volapük ä, ö and ü are counted as separate letters and collated separately (a, ä, b ... o, ö, p ... u, ü, v) while q and w are absent.
- In Welsh the digraphs CH, DD, FF, NG, LL, PH, RH, and TH are treated as single letters, and each is listed after the first character of the pair (except for NG which is listed after G), producing the order A, B, C, CH, D, DD, E, F, FF, G, NG, H, and so on. It can sometimes happen, however, that word compounding results in the juxtaposition of two letters which do not form a digraph. An example is the word LLONGYFARCH (composed from LLON + GYFARCH). This results in such an ordering as, for example, LAWR, LWCUS, LLONG, LLOM, LLONGYFARCH (NG is a digraph in LLONG, but not in LLONGYFARCH). The letter combination R+H (as distinct from the digraph RH) may similarly arise by juxtaposition in compounds, although this tends not to produce any pairs in which misidentification could affect the ordering. For the other potentially confusing letter combinations that may occur – namely, D+D and L+L – a hyphen is used in the spelling (e.g. AD-DAL, CHWIL-LYS).
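Conventions of this kind, in which digraphs count as single letters, can be handled by splitting each word into letters of the language's alphabet before comparing. The following Python sketch does this for the Welsh case just described. It is an illustration under stated assumptions, not a standard algorithm: the alphabet beyond H is filled in from the usual modern Welsh alphabet, and the "|" marker is a hypothetical convention invented here to flag a compound boundary such as LLON + GYFARCH, which a program cannot detect from the spelling alone. A hyphen is treated the same way, following the AD-DAL example.

```python
import re

# Welsh alphabet in collation order; each digraph is one letter.
WELSH_ALPHABET = [
    "A", "B", "C", "CH", "D", "DD", "E", "F", "FF", "G", "NG", "H",
    "I", "J", "L", "LL", "M", "N", "O", "P", "PH", "R", "RH", "S",
    "T", "TH", "U", "W", "Y",
]
RANK = {letter: i for i, letter in enumerate(WELSH_ALPHABET)}
UNKNOWN = len(WELSH_ALPHABET)  # letters outside the alphabet sort last


def welsh_key(word):
    """Return the list of alphabet positions for the letters of a word."""
    ranks = []
    # "-" and the hypothetical "|" marker both prevent the letters on
    # either side from being read as a digraph.
    for part in re.split(r"[|-]", word.upper()):
        i = 0
        while i < len(part):
            pair = part[i:i + 2]
            if len(pair) == 2 and pair in RANK:
                ranks.append(RANK[pair])  # digraph counts as one letter
                i += 2
            else:
                ranks.append(RANK.get(part[i], UNKNOWN))
                i += 1
    return ranks


words = ["LLOM", "LLON|GYFARCH", "LWCUS", "LLONG", "LAWR"]
welsh_sorted = sorted(words, key=welsh_key)
print(welsh_sorted)
```

Without the boundary marker, the greedy matching would read NG in LLONGYFARCH as a digraph and place the word before LLOM, which is why some external knowledge of the compound is needed.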
The principle behind alphabetical ordering can still be applied in languages that do not strictly speaking use an alphabet – for example, they may be written using a syllabary or abugida – provided the symbols used have an established ordering.
Some computer applications use a version of alphabetical order that can be achieved using a very simple
A rhyming dictionary is based on sorting words in alphabetical order starting from the last to the first letter of the word.
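This reverse ordering amounts to sorting on each word spelled backwards. A minimal Python sketch, with an invented word list and helper name, and assuming case is ignored:

```python
def rhyme_key(word):
    # Compare words from the last letter to the first.
    return word.lower()[::-1]


words = ["nation", "cat", "station", "hat", "motion"]
by_ending = sorted(words, key=rhyme_key)
print(by_ending)
```

Words with the same ending end up adjacent, just as ordinary alphabetical order groups words with the same beginning.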