Japanese language and computers
Adapting computers to the Japanese language raises many issues, some unique to Japanese and others common to languages that have a very large number of characters. The number of characters needed to write English is quite small, so a single byte (2^8 = 256 possible values) is enough to encode each English character. Japanese, however, has far more than 256 characters and cannot be encoded using a single byte per character. It is instead encoded using two or more bytes per character, in a so-called "double byte" or "multi-byte" encoding. The problems that arise relate to transliteration and romanization, character encoding, and input of Japanese text.
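As a rough illustration of the byte counts involved, the following Python sketch encodes one English letter and one kanji using Python's standard codecs. The sample characters, the helper name `byte_lengths`, and the choice of the two encodings are assumptions made for this example only.

```python
# Compare how many bytes a single character occupies in two encodings.
SAMPLES = ["A", "日"]                 # illustrative characters only
ENCODINGS = ["shift_jis", "utf-8"]    # codecs shipped with Python


def byte_lengths(char: str) -> dict[str, int]:
    """Return the encoded length in bytes of `char` for each encoding."""
    return {enc: len(char.encode(enc)) for enc in ENCODINGS}


for ch in SAMPLES:
    # Print the code point rather than the glyph so the output is ASCII-only.
    print(f"U+{ord(ch):04X}", byte_lengths(ch))
```

The letter fits in one byte in both encodings, while the kanji needs two bytes in Shift JIS and three in UTF-8.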
Character encodings
There are several standard methods to
Until 2000s, most Japanese
The first encoding to become widely used was
The development of kanji encodings was the beginning of the split. Shift JIS supports kanji and was developed to be completely backward compatible with JIS X 0201, and is therefore used in much embedded electronic equipment. However, Shift JIS has the unfortunate property that it often breaks any parser (software that reads the coded text) that is not specifically designed to handle it.
For example, some Shift JIS characters have 0x5C as their second byte. This is the ASCII code for the backslash ("\"), which is used as an escape character in many programming languages.
構 | わ | な | い
---|---|---|---
8d 5c | 82 ed | 82 c8 | 82 a2
A parser lacking support for Shift JIS will recognize 0x5C 0x82 as an invalid escape sequence and remove the backslash.[3] The remaining bytes then pair up differently, so the phrase turns into mojibake.
高 | 墲 | ネ | い
---|---|---|---
8d 82 | ed 82 | c8 | 82 a2
This can happen, for example, in the C programming language when string literals contain Shift JIS text. It does not happen in HTML, because the ASCII range 0x00–0x3F (which includes ", %, & and some other escape characters and string separators) never appears as the second byte of a Shift JIS character, and the backslash is not an escape character there. It can, however, happen in JavaScript, which can be embedded in HTML pages.
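The failure described above can be reproduced with a short Python sketch. The function `strip_unknown_escapes` is a hypothetical stand-in for a parser that is unaware of Shift JIS; it is not part of any real compiler or library, and real parsers may behave differently in detail.

```python
# Sketch: what a backslash-stripping parser does to Shift JIS bytes.
original = "構わない"
encoded = original.encode("shift_jis")      # 8d 5c 82 ed 82 c8 82 a2


def strip_unknown_escapes(data: bytes) -> bytes:
    """Hypothetical naive parser: drop every 0x5C byte, as if it
    introduced an unknown escape sequence."""
    return data.replace(b"\x5c", b"")


damaged = strip_unknown_escapes(encoded)    # 8d 82 ed 82 c8 82 a2

# cp932 is Microsoft's variant of Shift JIS. errors="replace" keeps the
# decode from raising if a byte pair does not map to a character.
garbled = damaged.decode("cp932", errors="replace")

print(encoded.hex(" "))
print(damaged.hex(" "))
print("text survived:", garbled == original)
```

Losing the single 0x5C byte shifts every later byte boundary, which is why the whole phrase is corrupted rather than just its first character.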
In
Unicode was intended to solve all encoding problems across all languages. The UTF-8 encoding used to encode Unicode in web pages does not have the disadvantages that Shift JIS has. Unicode is supported by international software, and it eliminates the need for gaiji. Some issues remain, however. For Japanese, the kanji characters have been unified with Chinese; that is, a character considered to be the same in both Japanese and Chinese is given a single number, even if its appearance actually differs somewhat, with the precise appearance left to the use of a locale-appropriate font. This process, called Han unification, has caused controversy.[citation needed] The earlier encodings used in Japan, the Taiwan area, mainland China and Korea each handled only one language, whereas Unicode is meant to handle all of them. The handling of kanji/Chinese characters has, however, been designed by a committee composed of representatives from all four countries/areas.[citation needed]
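One disadvantage that UTF-8 avoids is the stray ASCII-range byte inside a multi-byte character: in UTF-8, every byte of a non-ASCII character has its high bit set. The Python sketch below checks this for the phrase 構わない. The helper name `ascii_range_bytes` is an assumption made for this illustration.

```python
# Find bytes in the ASCII range (below 0x80) inside encoded Japanese text.
def ascii_range_bytes(text: str, encoding: str) -> list[int]:
    """Return the bytes below 0x80 found in the encoded form of `text`."""
    return [b for b in text.encode(encoding) if b < 0x80]


phrase = "構わない"

# Shift JIS: the second byte of the first character is 0x5C (backslash).
print([hex(b) for b in ascii_range_bytes(phrase, "shift_jis")])

# UTF-8: no byte of a non-ASCII character falls in the ASCII range.
print([hex(b) for b in ascii_range_bytes(phrase, "utf-8")])
```

A parser that only looks for ASCII delimiters or escape characters can therefore pass UTF-8 text through without misreading part of a kanji as a backslash.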
Text input
Written Japanese uses several different scripts:
There are two main systems for the
Direction of text
Japanese can be written in two directions. Yokogaki style writes left-to-right, top-to-bottom, as with English. Tategaki style writes first top-to-bottom, and then moves right-to-left.
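The tategaki ordering can be sketched in Python as a simple grid layout. This is a simplified illustration only: the function name `to_tategaki` and the fixed column height are assumptions, and real vertical typesetting also involves rotated punctuation, line-breaking rules and other details that are ignored here.

```python
# Lay out text in tategaki order: each column runs top to bottom,
# and successive columns are placed from right to left.
def to_tategaki(text: str, rows: int) -> list[str]:
    """Return the horizontal lines of a grid that shows `text` vertically.

    `rows` is the number of characters per column (an assumed fixed height).
    """
    if rows <= 0:
        raise ValueError("rows must be positive")
    # Cut the text into columns of `rows` characters each.
    columns = [text[i:i + rows] for i in range(0, len(text), rows)]
    # Pad the last column with ideographic spaces so all columns are equal.
    columns = [col.ljust(rows, "\u3000") for col in columns]
    # The first column goes on the far right, so read the columns in reverse.
    return ["".join(col[r] for col in reversed(columns)) for r in range(rows)]


lines = to_tategaki("あいうえおか", 3)
# lines == ["えあ", "おい", "かう"]
```

Reading the result down the right-hand column and then down the left-hand column gives back the original character order.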
To compete with Ichitaro, Microsoft released several updates to early Japanese versions of Microsoft Word, such as the Word 5.0 Power Up Kit and Word 98, that included support for downward text.[5][6]
QuarkXPress was the most popular DTP software in Japan in the 1990s, even though it had a long development cycle. However, because it lacked support for downward text, it was surpassed by Adobe InDesign, which gained strong support for downward text through several updates.[7][8]
At present, [...] "writing-mode", which can render tategaki when given the value "vertical-rl" (i.e. top to bottom, right to left). Word processors and DTP
See also
- Japanese writing system
- Japanese language
- Chinese input methods for computers
- CJK characters
- Korean language and computers
- Vietnamese language and computers
References
- [1] "【やじうまWatch】 ウェブサイトにおける文字コードの割合、UTF-8が90%超え。Shift_JISやEUC-JPは? - INTERNET Watch". INTERNET Watch. 2017-10-17. Retrieved 2019-05-11.
- [2] "文字コードについて". ASH Corporation. 2002. Retrieved 2019-05-14.
- [3] "Shift_JIS文字を含むソースコードをgccでコンパイル後、警告メッセージが表示される". Novell. 2006-02-10. Retrieved 2019-05-14.
- [4] 兵ちゃん (2016-02-18). "住基ネット統一文字コードによる外字の統一について". Archived from the original on 2020-08-02. Retrieved 2019-05-14.
- [5] "ASCII EXPRESS : マイクロソフトが「Access」と「Word 5.0 Power Up Kit」を発売". ASCII. 18 (1). 1994.
- [6] "Microsoft Office 97 Powered by Word 98 製品情報". Microsoft. 2001-08-01. Archived from the original on 2001-08-01. Retrieved 2019-05-14.
- [7] エディット-U. "DTPって何よ(4) [編集って何よ]". Retrieved 2019-05-14.
- [8] "アンチQuarkユーザーが気になるQuarkXPress 8の機能トップ10(3) 縦書きの組版が面倒だったけどどうなのよ?". MyNavi News. 2008-07-04. Retrieved 2019-05-14.