Windows-1252
Windows-1252 or CP-1252 (
It is the most-used single-byte character encoding in the world. As of December 2023, 1.3%[2] of all websites declare ISO 8859-1, which all modern browsers treat as Windows-1252 (as required by the HTML5 standard[3]), and a further 0.3% of all websites declare Windows-1252 itself,[2][4] for a total of 1.6% (but only 14 of the top 1000 websites[5]).
Depending on the country or language, use on websites can be much higher than the global average. In 2023, the combined share of ISO-8859-1 and CP-1252 declarations was 3.8% for Brazil[6] and 3.2% for Germany.[7][8]
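The practical effect of treating an ISO 8859-1 declaration as Windows-1252 shows up in bytes 0x80 to 0x9F. The following is a minimal sketch using Python's built-in `cp1252` and `latin-1` codecs; the sample bytes are an illustrative assumption, not taken from any cited page.

```python
# Bytes 0x93 and 0x94 are curly quotes in Windows-1252,
# but C1 control codes in ISO 8859-1 (Python's "latin-1").
raw = b"\x93caf\xe9\x94"

as_cp1252 = raw.decode("cp1252")    # what a browser effectively does
as_latin1 = raw.decode("latin-1")   # strict ISO 8859-1 interpretation

print(repr(as_cp1252))
print(repr(as_latin1))
```

Byte 0xE9 decodes to the same character (é) under both encodings; only the two bytes in the 0x80 to 0x9F range differ.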
Details
This character encoding is a
Starting in the 1990s, many
Historically, the phrase "ANSI Code Page" was used in Windows to refer to non-DOS encodings; the intention was that most of these would be
In LaTeX packages, CP-1252 is referred to as "ansinew".
IBM uses code page 1252 (CCSID 1252 and euro sign extended CCSID 5348) for Windows-1252.[14][15][16]
It is called "WE8MSWIN1252" by Oracle.[17]
Codepage layout
The following table shows Windows-1252. Differences from
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F | |
0_ | NUL | SOH | STX | ETX | EOT | ENQ | ACK | BEL | BS | HT | LF | VT | FF | CR | SO | SI |
1_ | DLE | DC1 | DC2 | DC3 | DC4 | NAK | SYN | ETB | CAN | EM | SUB | ESC | FS | GS | RS | US |
2_ | SP | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / |
3_ | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
4_ | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
5_ | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
6_ | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
7_ | p | q | r | s | t | u | v | w | x | y | z | { | | | } | ~ | DEL |
8_ | € 20AC | | ‚ 201A | ƒ 0192 | „ 201E | … 2026 | † 2020 | ‡ 2021 | ˆ 02C6 | ‰ 2030 | Š 0160 | ‹ 2039 | Œ 0152 | | Ž 017D | |
9_ | | ‘ 2018 | ’ 2019 | “ 201C | ” 201D | • 2022 | – 2013 | — 2014 | ˜ 02DC | ™ 2122 | š 0161 | › 203A | œ 0153 | | ž 017E | Ÿ 0178 |
A_ | NBSP | ¡ | ¢ | £ | ¤ | ¥ | ¦ | § | ¨ | © | ª | « | ¬ | SHY | ® | ¯ |
B_ | ° | ± | ² | ³ | ´ | µ | ¶ | · | ¸ | ¹ | º | » | ¼ | ½ | ¾ | ¿ |
C_ | À | Á | Â | Ã | Ä | Å | Æ | Ç | È | É | Ê | Ë | Ì | Í | Î | Ï |
D_ | Ð | Ñ | Ò | Ó | Ô | Õ | Ö | × | Ø | Ù | Ú | Û | Ü | Ý | Þ | ß |
E_ | à | á | â | ã | ä | å | æ | ç | è | é | ê | ë | ì | í | î | ï |
F_ | ð | ñ | ò | ó | ô | õ | ö | ÷ | ø | ù | ú | û | ü | ý | þ | ÿ |
According to the information on Microsoft's and the Unicode Consortium's websites, positions 81, 8D, 8F, 90, and 9D are unused; however, the Windows API MultiByteToWideChar maps these to the corresponding C1 control codes. The "best fit" mapping documents this behavior, too.[18]
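Python's built-in `cp1252` codec follows the stricter reading and rejects the five unused positions. The following sketch imitates the C1 fallback described above; the helper name `decode_like_best_fit` is hypothetical and this is not a reimplementation of MultiByteToWideChar, only an illustration of the mapping.

```python
# The five positions the text lists as unused in Windows-1252.
UNUSED = frozenset({0x81, 0x8D, 0x8F, 0x90, 0x9D})


def decode_like_best_fit(data: bytes) -> str:
    """Hypothetical helper: decode Windows-1252, mapping each unused
    byte to the C1 control code with the same numeric value."""
    chars = []
    for byte in data:
        if byte in UNUSED:
            chars.append(chr(byte))  # e.g. 0x81 -> U+0081
        else:
            chars.append(bytes([byte]).decode("cp1252"))
    return "".join(chars)


text = decode_like_best_fit(b"\x80\x81\x9d")
print([f"U+{ord(c):04X}" for c in text])

# Python's strict cp1252 codec raises on the same input.
try:
    b"\x81".decode("cp1252")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```

Decoding byte by byte keeps the sketch short; a custom error handler registered with `codecs.register_error` would be an alternative for large inputs.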
History
- The first version of code page 1252, used in Microsoft Windows 1.0, did not have positions D7 and F7 defined. All the characters in the range 80–9F were undefined too.
- In the second version, used in Microsoft Windows 2.0, positions D7, F7, 91, and 92 were defined.
- The third version, used since Microsoft Windows 3.1, had all the present-day positions defined except the euro sign and the Z with caron character pair.
- The final version, listed above, debuted in Microsoft Windows 98 and was ported to older versions of Windows with the euro symbol update.
OS/2 extensions
The
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F | |
0_ | NUL | SOH | STX | ETX | ˉ 02C9 | ˘ 02D8 | ˙ 02D9 | BEL | ˚ 02DA | HT | ˝ 02DD | ˛ 02DB | ˇ 02C7 | CR | SO | SI |
MSDOS extensions [rare]
There is a rarely used graphics-extended variant of code page 1252 in which codes 0x00 to 0x1F are assigned to box-drawing characters, as used in applications such as MSDOS Edit and Codeview. One application that used this code page was an Intel Corporation Install/Recovery disk image utility from mid-to-late 1995. These programs were written for Intel's P6 User Test Program machines (US example[29]). It was used exclusively in what was then Intel's EMEA region (Europe, Middle East and Africa). In time the programs were changed to use code page 850.
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F | |
0_ | ○ | ■ | ↑ | ↓ | → | ← | ║ | ═ | ╔ | ╗ | ╚ | ╝ | ░ | ▒ | ► | ◄ |
1_ | │ | ─ | ┌ | ┐ | └ | ┘ | ├ | ┤ | ┴ | ┬ | ♦ | ┼ | █ | ▄ | ▀ | ▬ |
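The two rows above can be read as a lookup table for bytes 0x00 to 0x1F. The following is a minimal sketch, not a published codec: the names `GRAPHICS_LOW` and `decode_graphics_1252` are hypothetical, and the sketch assumes that bytes from 0x20 upward follow standard Windows-1252 as implemented by Python's `cp1252` codec.

```python
# Rows 0_ and 1_ of the table above, in byte order 0x00..0x1F.
ROW_0 = "○■↑↓→←║═╔╗╚╝░▒►◄"
ROW_1 = "│─┌┐└┘├┤┴┬♦┼█▄▀▬"
GRAPHICS_LOW = ROW_0 + ROW_1


def decode_graphics_1252(data: bytes) -> str:
    """Hypothetical decoder: box-drawing glyphs for 0x00-0x1F,
    standard Windows-1252 (assumed) for everything else."""
    out = []
    for byte in data:
        if byte < 0x20:
            out.append(GRAPHICS_LOW[byte])
        else:
            out.append(bytes([byte]).decode("cp1252"))
    return "".join(out)


# Top edge of a double-line box: corner, horizontal bar, corner.
top_edge = decode_graphics_1252(bytes([0x08, 0x07, 0x09]))
print(top_edge)
```

Because the low range is reused for glyphs, a file in this variant cannot carry control characters such as line feeds as data; the sketch makes no attempt to distinguish the two uses.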
Palm OS variant
Each Palm OS device supports a single language and a single character encoding, depending on its locale.[30]
For languages such as English and French, Palm OS uses a custom character encoding based on Windows-1252. For Japanese, it instead uses a
Palm OS 3.1 introduced several changes to the character encoding to better align with Windows-1252:[31]
- The special Palm OS glyphs "shortcut stroke" (0x9D) and "command stroke" (0x9E) were copied to 0x16 and 0x17, to ensure they were in the range guaranteed to be consistent between locales.[31] Starting in Palm OS 3.3, 0x16 and 0x17 are the only code points for those characters,[32] leaving 0x9D and 0x9E undefined.[33]
- The
- The Euro sign was added at 0x80, replacing what was previously the numeric space.[32]
- The playing card suits were copied to the font Symbol 9,[31] although their original code points remain valid.[32][33]
The following is the variant of Windows-1252 used by Palm OS 3.3 onward for English and several other locales.[32] Python gives it the palmos label, describing it as the encoding for Palm OS 3.5.[34][35] Differences from Windows-1252 are shown with their Unicode code point.
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F | |
8_ | €[a] | | ‚ | ƒ | „ | …[b] | † | ‡ | ˆ | ‰ | Š | ‹ | Œ | ♦ 2666 | ♣ 2663 | ♥ 2665 |
9_ | ♠ 2660 | ‘ | ’ | “ | ” | • | – | — | ˜ | ™ | š | › | œ | [c] | [d] | Ÿ |
- ^ Prior to Palm OS 3.1, the character at code point 0x80 was U+2007 NUMERIC SPACE; starting in Palm OS 3.1, 0x80 is the Euro sign and 0x19 is U+2007 NUMERIC SPACE instead.[32]
- ^ Starting in Palm OS 3.1, this character is also duplicated at 0x18.[31][32]
- ^ Prior to Palm OS 3.3, this code point was the Palm OS-exclusive character "shortcut stroke"; starting in Palm OS 3.3, this code point is undefined.[31][32]
- ^ Prior to Palm OS 3.3, this code point was the Palm OS-exclusive character "command stroke"; starting in Palm OS 3.3, this code point is undefined.[31][32]
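The playing card suits in the table can be checked against the `palmos` codec that ships with Python, mentioned above. This is a small sketch; the variable names are illustrative, and it only exercises code points shown in the table.

```python
# Bytes 0x8D-0x90 hold the four card suits in the Palm OS variant.
suits = bytes([0x8D, 0x8E, 0x8F, 0x90])
palm_text = suits.decode("palmos")
print(palm_text)

# The euro sign sits at 0x80 in both encodings.
palm_euro = b"\x80".decode("palmos")
win_euro = b"\x80".decode("cp1252")

# Byte 0x8E differs: a club suit on Palm OS, Z with caron in Windows-1252.
palm_8e = b"\x8e".decode("palmos")
win_8e = b"\x8e".decode("cp1252")
print(palm_8e, win_8e)
```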
See also
- Latin script in Unicode
- Unicode
- Universal Coded Character Set
- UTF-8
- Western Latin character sets (computing)
- Windows-1250
- Windows code pages
- ISO/IEC JTC 1/SC 2
- Extended ASCII
References
- ^ Character Sets, Internet Assigned Numbers Authority (IANA), 2018-12-12
- ^ a b "Historical trends in the usage statistics of character encodings for websites, December 2023". w3techs.com. Retrieved 2023-12-01.
- ^ a b "Encoding". WHATWG. 27 January 2015. sec. 5.2 Names and labels. Archived from the original on 4 February 2015. Retrieved 4 February 2015.
- ^ "Frequently Asked Questions". w3techs.com.
- ^ "Usage Survey of Character Encodings broken down by Ranking". w3techs.com. Retrieved 2023-12-01.
- ^ "Distribution of Character Encodings among websites that use Brazil". w3techs.com. Retrieved 2023-12-01.
- ^ "Distribution of Character Encodings among websites that use .de". w3techs.com. Retrieved 2023-12-01.
- ^ "Distribution of Character Encodings among websites that use German". w3techs.com. Retrieved 2023-01-16.
- ^ Texin, Tex. "Comparing Characters in Windows-1252, ISO-8859-1, ISO-8859-15". I18nQA.com.
- ^ van Emden, Eva (28 January 2011). "How to make typographers' quotes in HTML". vancouvereditor.com. Retrieved 7 January 2024.
If you use typographers' quotes without specifying the right character encoding for your HTML file, some of your viewers are going to see question marks, boxes, or other crazy symbols instead of the beautiful curly quotes you intended them to see.
- ^ "Smart quotes in Word". Microsoft Support. Microsoft. Retrieved 7 January 2024.
- ^ "NetWare Web Search: Understanding Character Set Encodings". Novell Documentation. Novell.
if a document does not contain a CHARSET encoding value, the default encoding for HTML documents is ISO-8859-1, also known as Latin1. The default encoding for plain text documents is US-ASCII.
- ^ Wissink, Cathy (5 April 2002). "Unicode and Windows XP" (PDF). Microsoft. p. 1. Archived from the original (PDF) on 4 February 2015. Retrieved 4 February 2015.
- ^ "Code page 1252 information document". IBM. 30 September 1997. Archived from the original on 2016-03-03.
- ^ "CCSID 1252 information document". IBM. Archived from the original on 2016-03-26.
- ^ "CCSID 5348 information document". IBM. Archived from the original on 2014-11-29.
- ^ "Database Client Installation Guide". Oracle. Retrieved 2021-02-14.
- ^ a b "Unicode mappings of Windows-1252 with 'Best Fit'". Unicode. Archived from the original on 4 February 2015. Retrieved 4 February 2015.
- ^ Code Page 01252 (PDF), IBM, 1998, archived (PDF) from the original on 27 October 2023
- ^ Code Page (CPGID) 01252 (txt), IBM, 1998, archived from the original on 8 April 2023
- ^ International Components for Unicode (ICU), ibm-1252_P100-2000.ucm, 2002-12-03
- ^ International Components for Unicode (ICU), ibm-5348_P100-1997.ucm, 2002-12-03
- ^ "Code page 1004 information document". Archived from the original on 2015-06-25.
- ^ "CCSID 1004 information document". Archived from the original on 2016-03-26.
- ^ "Code Page 01004" (PDF). IBM. Archived from the original (PDF) on 2015-07-08. (version based on Windows 3.1 version of Windows-1252)
- ^ Code Page CPGID 01004 (pdf) (PDF), IBM
- ^ Code Page CPGID 01004 (txt), IBM
- ^ Borgendale, Ken (2001). "Codepage 1004 - Windows Extended". OS/2 codepages by number. Archived from the original on 2018-05-13. Retrieved 2018-05-13. (version based on current version of Windows-1252)
- S2CID 15711051. Archived from the original (PDF) on 2019-05-03.
- ^ a b "Chapter 13: Localized Applications". Palm OS Programmer's Companion (PDF). Palm Computing Platform. March 16, 2000. p. 321.
- ^ a b c d e f g "Appendix B: Compatibility Guide". Palm OS SDK Reference (PDF). Palm Computing Platform. March 16, 2000. pp. 1181–1182.
- ^ a b c d e f g h i Walleij, Linus. "Palm Pilot Character Sets And Unicode Mappings". GNU Recode. Datorföreningen vid Lunds Universitet och Lunds Tekniska Högskola. Retrieved 10 October 2023.
- ^ a b c Parker, Greg. "Palm OS Built-in Fonts". Sealie Software. Retrieved 10 October 2023.
- ^ "codecs — Codec registry and base classes (§ Text Encodings)". The Python Standard Library—Python 3.9.4 Documentation. Python Software Foundation.
- ^ a b Mullender, Sjoerd (13 July 2002). "Python Character Mapping Codec for Palm OS 3.5". CPython source tree. Python Software Foundation. Retrieved 9 December 2021.
External links
- Microsoft's code charts for Windows-1252 ("Code Page 1252 Windows Latin 1 (ANSI)")
- Unicode mapping table and code page definition with best fit mappings for Windows-1252