Line wrap and word wrap
Line breaking, also known as word wrapping, is breaking a section of text into lines so that it will fit into the available width of a page, window or other display area. In text display, line wrap is continuing on a new line when a line is full, so that each line fits into the viewable window, allowing text to be read from top to bottom without any horizontal scrolling.
Soft and hard returns
A soft return or soft wrap is the break resulting from line wrap or word wrap (whether automatic or manual), whereas a hard return or hard wrap is an intentional break, creating a new paragraph. With a hard return, paragraph-break formatting can (and should) be applied (either indenting or vertical whitespace). Soft wrapping allows line lengths to adjust automatically with adjustments to the width of the user's window or margin settings, and is a standard feature of all modern text editors, word processors, and email clients. Manual soft breaks are unnecessary when word wrap is done automatically, so hitting the "Enter" key usually produces a hard return.
Alternatively, "soft return" can mean an intentional, stored line break that is not a paragraph break. For example, it is common to print postal addresses in a multiple-line format, but the several lines are understood to be a single paragraph. Line breaks are needed to divide the words of the address into lines of the appropriate length.
In the contemporary
In text-oriented markup languages, a soft return is typically offered as a markup tag. For example, in HTML there is a <br> tag that has the same purpose as the soft return in word processors described above.
Unicode
The Unicode Line Breaking Algorithm determines a set of positions, known as break opportunities, that are appropriate places in which to begin a new line. The actual line break positions are picked from among the break opportunities by the higher level software that calls the algorithm, not by the algorithm itself, because only the higher level software knows about the width of the display the text is displayed on and the width of the glyphs that make up the displayed text.[1]
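The division of labour described above can be sketched in Python. This is only an illustration under stated assumptions: `simple_opportunities` is a hypothetical stand-in that allows a break after each run of spaces (it is not an implementation of the Unicode algorithm), `measure` is whatever width function the calling software supplies (here `len`, as on a monospaced display), and `choose_breaks` and `split_at` are made-up names for the caller's side of the work.

```python
def simple_opportunities(text):
    """Hypothetical stand-in for the real algorithm: a break is allowed
    at every offset that directly follows a run of spaces."""
    return [i for i in range(1, len(text))
            if text[i - 1] == " " and text[i] != " "]


def choose_breaks(text, opportunities, measure, max_width):
    """Caller side: greedily pick actual break offsets from the permitted
    ones, using a width function only the caller knows."""
    breaks, start, last_fit = [], 0, None
    for pos in list(opportunities) + [len(text)]:
        # Trailing spaces are not counted against the line width.
        too_wide = measure(text[start:pos].rstrip()) > max_width
        if too_wide and last_fit is not None:
            breaks.append(last_fit)
            start = last_fit
        last_fit = pos
    return breaks


def split_at(text, breaks):
    """Cut the text at the chosen offsets, dropping trailing spaces."""
    bounds = [0] + breaks + [len(text)]
    return [text[a:b].rstrip() for a, b in zip(bounds, bounds[1:])]


sample = "aaa bbb ccc ddd"
chosen = choose_breaks(sample, simple_opportunities(sample), len, 7)
print(split_at(sample, chosen))  # ['aaa bbb', 'ccc ddd']
```

Swapping `len` for a function that measures rendered glyph widths would change which opportunities are chosen without touching the code that finds them.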
The Unicode character set provides a line separator character as well as a paragraph separator to represent the semantics of the soft return and hard return.
- U+2028 LINE SEPARATOR
  - may be used to represent this semantic unambiguously
- U+2029 PARAGRAPH SEPARATOR
  - may be used to represent this semantic unambiguously
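As a minimal sketch of how software might consume these two characters, the Python snippet below splits a string into paragraphs at U+2029 and then into lines at U+2028. The function name `split_paragraphs` and the sample address are invented for illustration.

```python
LINE_SEPARATOR = "\u2028"       # soft return: new line, same paragraph
PARAGRAPH_SEPARATOR = "\u2029"  # hard return: new paragraph


def split_paragraphs(text):
    """Return a list of paragraphs, each given as a list of its lines."""
    return [
        paragraph.split(LINE_SEPARATOR)
        for paragraph in text.split(PARAGRAPH_SEPARATOR)
    ]


# A postal address is one paragraph made of several stored lines.
address = "Jane Doe\u2028123 Example Street\u2028Springfield"
document = address + PARAGRAPH_SEPARATOR + "A second paragraph."
print(split_paragraphs(document))
```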
Word boundaries, hyphenation, and hard spaces
The soft returns are usually placed after the ends of complete words, or after the punctuation that follows complete words. However, word wrap may also occur following a hyphen inside a word.
A word without hyphens can be made wrappable by having soft hyphens in it. When the word isn't wrapped (i.e., isn't broken across lines), the soft hyphen isn't visible. But if the word is wrapped across lines, this is done at the soft hyphen, at which point it is shown as a visible hyphen on the top line where the word is broken. (In the rare case of a word that is meant to be wrappable by breaking it across lines but without making a hyphen ever appear, a zero-width space is put at the permitted breaking point(s) in the word.)
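The behaviour of the soft hyphen (U+00AD) and the zero-width space (U+200B) can be sketched as follows. This is a simplified illustration, not how any particular renderer works: `break_word` and `visible` are hypothetical names, widths are counted in characters, and only a single break per word is attempted.

```python
SOFT_HYPHEN = "\u00ad"
ZERO_WIDTH_SPACE = "\u200b"


def visible(text):
    """Drop the invisible break markers from text that is not broken there."""
    return text.replace(SOFT_HYPHEN, "").replace(ZERO_WIDTH_SPACE, "")


def break_word(word, width):
    """Split `word` at the last permitted break whose first piece fits in
    `width` characters. Returns (head, tail); tail is "" when the word is
    left unbroken."""
    if len(visible(word)) <= width:
        return visible(word), ""
    best = None
    for i, ch in enumerate(word):
        if ch not in (SOFT_HYPHEN, ZERO_WIDTH_SPACE):
            continue
        # A soft hyphen becomes a visible hyphen; a zero-width space does not.
        head = visible(word[:i]) + ("-" if ch == SOFT_HYPHEN else "")
        if len(head) <= width:
            best = (head, visible(word[i + 1:]))
    # No permitted break fits: leave the word whole and let it overflow.
    return best if best is not None else (visible(word), "")


print(break_word("hy\u00adphen\u00adation", 20))  # ('hyphenation', '')
print(break_word("hy\u00adphen\u00adation", 7))   # ('hyphen-', 'ation')
print(break_word("super\u200bcalifragilistic", 8))  # ('super', 'califragilistic')
```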
Sometimes word wrap is undesirable between adjacent words. In such cases, word wrap can usually be blocked by using a hard space or non-breaking space between the words, instead of regular spaces.
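A small Python sketch of the effect: the hypothetical `wrap` function below treats only the regular space (U+0020) as a break opportunity, so two words joined by a non-breaking space (U+00A0) always stay on the same line. Widths are counted in characters, which is an assumption suited to monospaced output.

```python
NBSP = "\u00a0"  # non-breaking (hard) space


def wrap(text, width):
    """Wrap at regular spaces only; NBSP never ends a line."""
    lines, current = [], ""
    for word in text.split(" "):  # splits on U+0020, not on NBSP
        if not word:
            continue  # skip the empty pieces produced by repeated spaces
        candidate = word if not current else current + " " + word
        if current and len(candidate) > width:
            lines.append(current)
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


print(wrap("The total is 100 km", 16))          # ['The total is 100', 'km']
print(wrap("The total is 100" + NBSP + "km", 16))  # "100" and "km" move down together
```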
Word wrapping in text containing Chinese, Japanese, and Korean
In
Under certain circumstances, however, word wrapping is not desired. For instance,
- word wrapping might not be desired within personal names, and
- word wrapping might not be desired within any compound words (when the text is flush left but only in some styles).
Most existing word processors and typesetting software cannot handle either of the above scenarios.
Algorithm
Word wrapping is an optimization problem. Depending on what needs to be optimized for, different algorithms are used.
Minimum number of lines
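A simple way to do word wrapping is to use a greedy algorithm that puts as many words on a line as possible, then moves on to the next line and does the same until no words are left.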
    SpaceLeft := LineWidth
    for each Word in Text
        if (Width(Word) + SpaceWidth) > SpaceLeft
            insert line break before Word in Text
            SpaceLeft := LineWidth - Width(Word)
        else
            SpaceLeft := SpaceLeft - (Width(Word) + SpaceWidth)
Where LineWidth is the width of a line, SpaceLeft is the remaining width of space on the line to fill, SpaceWidth is the width of a single space character, Text is the input text to iterate over and Word is a word in this text.
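A runnable Python sketch of this greedy approach might look like the following. It is a slight variation on the pseudocode rather than a line-by-line translation: it tracks the width used so far instead of the space left, and it charges a space only between words, so a line may be filled to exactly `line_width`. The name `greedy_wrap` and the default of measuring width with `len` are assumptions made for the example.

```python
def greedy_wrap(text, line_width, width=len, space_width=1):
    """Put as many words as fit on each line, then start a new one."""
    lines, current, used = [], [], 0
    for word in text.split():
        # Width the line would have if this word were appended to it.
        if current:
            needed = used + space_width + width(word)
        else:
            needed = width(word)
        if current and needed > line_width:
            lines.append(" ".join(current))
            current, used = [word], width(word)
        else:
            # An overlong first word is kept on its own line and overflows.
            current.append(word)
            used = needed
    if current:
        lines.append(" ".join(current))
    return lines


print(greedy_wrap("aaa bb cc ddddd", 6))  # ['aaa bb', 'cc', 'ddddd']
```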
Minimum raggedness
A different algorithm, used in TeX, minimizes the sum of the squares of the lengths of the spaces at the end of lines to produce a more aesthetically pleasing result than the greedy algorithm, which does not always minimize squared space.
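The cost function described here can be minimized with dynamic programming. The Python sketch below is a simplified illustration only, not TeX's actual algorithm: it measures widths in characters, counts the trailing space on every line including the last, and uses the invented name `min_raggedness_wrap`.

```python
def min_raggedness_wrap(text, line_width):
    """Choose breaks minimizing the sum of squared trailing space per line."""
    words = text.split()
    n = len(words)
    # best[i] is the minimal cost of laying out words[i:]; best[n] is 0.
    best = [float("inf")] * n + [0]
    next_break = [n] * (n + 1)
    for i in range(n - 1, -1, -1):
        length = -1  # cancels the space counted before the first word
        for j in range(i, n):
            length += 1 + len(words[j])
            if length > line_width and j > i:
                break  # words[i..j] no longer fit on one line
            # A single overlong word sits alone on its line at zero cost.
            slack = max(line_width - length, 0)
            cost = slack ** 2 + best[j + 1]
            if cost < best[i]:
                best[i] = cost
                next_break[i] = j + 1
    lines, i = [], 0
    while i < n:
        j = next_break[i]
        lines.append(" ".join(words[i:j]))
        i = j
    return lines


print(min_raggedness_wrap("aaa bb cc ddddd", 6))  # ['aaa', 'bb cc', 'ddddd']
```

For this input and a width of 6, filling each line greedily gives "aaa bb", "cc", "ddddd" with a cost of 0 + 16 + 1 = 17, whereas the layout above costs 9 + 1 + 1 = 11.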
History
A primitive line-breaking feature was used in 1955 in a "page printer control unit" developed by Western Union. This system used relays rather than programmable digital computers, and therefore needed a simple algorithm that could be implemented without data buffers. In the Western Union system, each line was broken at the first space character to appear after the 58th character, or at the 70th character if no space character was found.[3]
The greedy algorithm for line-breaking predates the dynamic programming method outlined by Donald Knuth in an unpublished 1977 memo describing his TeX typesetting system[4] and later published in more detail by Knuth & Plass (1981).[5]
References
- ^ Heninger, Andy, ed. (2013-01-25). "Unicode Line Breaking Algorithm" (PDF). Technical Reports. Annex #14 (Proposed Update Unicode Standard): 2. Retrieved 10 March 2015.
WORD JOINER should be used if the intent is to merely prevent a line break
- ISBN 9781565922242.
- ^ Harris, Robert W. (January 1956), "Keyboard standardization", Western Union Technical Review, 10 (1): 37–42, archived from the original on 2015-08-03, retrieved 2013-04-07.
- ISBN 1-57586-010-4.
- S2CID 206508107
External links
Knuth's algorithm
- "Knuth & Plass line-breaking Revisited"
- "tex_wrap": "Implements TeX's algorithm for breaking paragraphs into lines." Reference: "Breaking Paragraphs into Lines", D.E. Knuth and M.F. Plass, chapter 3 of _Digital Typography_, CSLI Lecture Notes #78.
- Text::Reflow - Perl module for reflowing text files using Knuth's paragraphing algorithm. "The reflow algorithm tries to keep the lines the same length but also tries to break at punctuation, and avoid breaking within a proper name or after certain connectives ("a", "the", etc.). The result is a file with a more "ragged" right margin than is produced by fmt or Text::Wrap but it is easier to read since fewer phrases are broken across line breaks."
- adjusting the Knuth algorithm to recognize the "soft hyphen".
- Knuth's breaking algorithm. "The detailed description of the model and the algorithm can be found on the paper "Breaking Paragraphs into Lines" by Donald E. Knuth, published in the book "Digital Typography" (Stanford, California: Center for the Study of Language and Information, 1999), (CSLI Lecture Notes, no. 78.)"; part of Google Summer Of Code 2006
- "Bridging the Algorithm Gap: A Linear-time Functional Program for Paragraph Formatting" by Oege de Moor, Jeremy Gibbons, 1997
Other word-wrap links
- the reverse problem -- picking columns just wide enough to fit (wrapped) text (Archived version)
- "Knuth linebreaking elements for Formatting Objects" by Simon Pepping 2006. Extends the Knuth model to handle a few enhancements.
- "a Knuth–Plass-like linebreaking algorithm ... The *really* interesting thing is how Adobe's algorithm differs from the Knuth–Plass algorithm. It must differ, since Adobe has managed to patent its algorithm (6,510,441)."[1]
- "Line breaking" compares the algorithms of various time complexities.