Data conversion
Data conversion is the conversion of computer data from one format to another.
Data can be converted in many ways within a computing environment. The conversion may be seamless, as when upgrading to a newer version of a computer program. Alternatively, it may require a special conversion program, or a multi-stage process of "exporting" and "importing" through intermediate formats such as a tab-delimited or comma-separated text file. Some programs recognize several data file formats at the input stage and can also store their output in several formats; such a program can be used to convert between file formats. If the source or target format is not recognized, a third program may be available that converts to an intermediate format, which the first program can then reformat.
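As an illustration of an export/import style conversion between two delimited text formats, the following minimal sketch converts comma-separated text to tab-delimited text using Python's standard `csv` module. The function name `csv_to_tsv` and the sample data are hypothetical and serve only as an example.

```python
import csv
import io


def csv_to_tsv(csv_text: str) -> str:
    """Convert comma-separated text to tab-delimited text."""
    # Parse the source format; the reader handles quoted fields
    # that themselves contain commas.
    reader = csv.reader(io.StringIO(csv_text))

    # Write the same rows in the target format.
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


sample = 'name,city\n"Doe, Jane",Oslo\n'
converted = csv_to_tsv(sample)
```

Parsing the source with a real CSV reader, rather than replacing commas with tabs, keeps fields that contain the delimiter intact.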
Information basics
Before any data conversion is carried out, the user or application programmer should keep a few basics of computing and information theory in mind. These include:
- Information can easily be discarded by the computer, but adding information takes effort.
- The computer can add information only in a rule-based fashion.
- Upsampling the data or converting to a more feature-rich format does not add information; it merely makes room for that addition, which a human usually must do.
- Data stored in an electronic format can be quickly modified and analyzed.
For example, a
Automatic restoration of information that was lost through a lossy compression process would probably require important advances in artificial intelligence.
Because of these realities of computing and information theory, data conversion is often a complex and error-prone process that requires the help of experts.
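The first and third points in the list above can be illustrated with a small sketch. The helper names `upsample`, `downsample`, and `quantize` are hypothetical, and the sample values are arbitrary; the code only demonstrates the principle on lists of numbers.

```python
def upsample(samples, factor):
    """Repeat each sample `factor` times (rule-based, adds no information)."""
    return [s for s in samples for _ in range(factor)]


def downsample(samples, factor):
    """Keep every `factor`-th sample."""
    return samples[::factor]


def quantize(values):
    """Truncate each value to an integer, discarding the fractional part."""
    return [int(v) for v in values]


original = [3, 1, 4]
bigger = upsample(original, 2)       # more samples, same information
recovered = downsample(bigger, 2)    # the original is fully recoverable

# Two different inputs collapse to the same output: the difference
# between them cannot be restored from the result alone.
lossy = quantize([1.2, 1.7])
```

Because the upsampled data can be reduced back to the original exactly, the larger representation carries no extra information, whereas the quantized values cannot be turned back into the distinct inputs.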
Pivotal conversion
Data conversion can occur directly from one format to another, but many applications that convert between multiple formats use an intermediate representation through which any source format is converted to its target.[1] For example, it is possible to convert Cyrillic text from KOI8-R to Windows-1251 using a lookup table between the two encodings, but the modern approach is to convert the KOI8-R file to Unicode first and from that to Windows-1251. This approach is more manageable: rather than needing a lookup table for every possible pair of character encodings, an application needs only one lookup table per character set, which it uses to convert to and from Unicode. This scales the number of tables down from hundreds to a few tens.
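The KOI8-R to Windows-1251 case can be sketched in Python, whose `str` type holds Unicode text and therefore acts as the pivot. The function name `convert_encoding` and the sample word are hypothetical; `koi8_r` and `cp1251` are the names of Python's standard codecs for these encodings.

```python
def convert_encoding(data: bytes, source: str, target: str) -> bytes:
    """Convert bytes between two character encodings via Unicode."""
    text = data.decode(source)    # source encoding -> Unicode (the pivot)
    return text.encode(target)    # Unicode -> target encoding


koi8_bytes = "Привет".encode("koi8_r")
cp1251_bytes = convert_encoding(koi8_bytes, "koi8_r", "cp1251")
```

The same function handles any pair of encodings for which a codec exists, which is the practical benefit of a pivot: supporting a new encoding means adding one codec rather than one table per existing encoding.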
Pivotal conversion is similarly used in other areas. Office applications, when employed to convert between office file formats, use their internal, default file format as a pivot. For example, a
Lost and inexact data conversion
The objective of data conversion is to maintain all of the data, and as much of the embedded information as possible. This is possible only if the target format supports the same features and data structures present in the source file. Conversion of a word processing document to a plain text file necessarily involves loss of formatting information, because plain text does not support word processing constructs such as marking a word as boldface. For this reason, conversion to a format that does not support a feature important to the user is rarely carried out. It may nevertheless be necessary for interoperability, e.g. converting a file from a later version of Microsoft Word to an earlier one so that users who do not have the later version installed can open it.
Loss of information can be mitigated by approximation in the target format. There is no way of converting a character like ä to ASCII, since the ASCII standard lacks it, but the information may be retained by approximating the character as ae. Of course, this is not an optimal solution, and can impact operations like searching and copying; and if a language makes a distinction between ä and ae, then that approximation does involve loss of information.
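A minimal sketch of such an approximation follows. Only the ä to ae mapping comes from the text above; the other table entries, the `?` fallback, and the function name `to_ascii` are assumptions made for illustration.

```python
# Approximation table. Only "ä" -> "ae" is taken from the text;
# the remaining entries are assumed for the sake of the example.
APPROXIMATIONS = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def to_ascii(text: str) -> str:
    """Approximate non-ASCII characters; use "?" when no rule exists."""
    out = []
    for ch in text:
        if ch.isascii():
            out.append(ch)
        else:
            out.append(APPROXIMATIONS.get(ch, "?"))
    return "".join(out)


approximated = to_ascii("Bär")
unchanged = to_ascii("Baer")
```

Both inputs produce the same ASCII string, so the original spelling cannot be recovered from the output. This is the loss of information described above for languages that distinguish ä from ae.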
Data conversion can also suffer from inexactitude, the result of converting between formats that are conceptually different. The
Open vs. secret specifications
Successful data conversion requires thorough knowledge of the workings of both source and target formats. Where the specification of a format is unknown, reverse engineering is needed to carry out the conversion. Reverse engineering can closely approximate the original specification, but errors and missing features can still result.
Electronics
Data format conversion can also occur at the physical layer of an electronic communication system. For example, a signal can be converted between line codes such as non-return-to-zero (NRZ) and return-to-zero (RZ) when necessary.
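The following sketch models the conversion in software under simplifying assumptions: both codes are unipolar, and each bit period is represented by exactly two samples (first half, second half). Real conversions happen in hardware on continuous signals; the function names `nrz_to_rz` and `rz_to_nrz` are hypothetical.

```python
def nrz_to_rz(nrz_samples):
    """Convert unipolar NRZ to unipolar RZ (two samples per bit period)."""
    if len(nrz_samples) % 2:
        raise ValueError("expected two samples per bit period")
    rz = []
    for i in range(0, len(nrz_samples), 2):
        level = nrz_samples[i]
        # RZ carries the bit level in the first half of the period
        # and returns to zero in the second half.
        rz.extend([level, 0])
    return rz


def rz_to_nrz(rz_samples):
    """Convert unipolar RZ back to unipolar NRZ (two samples per bit period)."""
    if len(rz_samples) % 2:
        raise ValueError("expected two samples per bit period")
    nrz = []
    for i in range(0, len(rz_samples), 2):
        level = rz_samples[i]
        # NRZ holds the bit level for the whole period.
        nrz.extend([level, level])
    return nrz


nrz_signal = [1, 1, 0, 0, 1, 1]      # bits 1, 0, 1 in NRZ
rz_signal = nrz_to_rz(nrz_signal)
```

Under these assumptions the conversion is lossless in both directions, since each code represents the same bit sequence.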
See also
- Character encoding
- Comparison of programming languages (basic instructions)#Data conversions
- Data migration
- Data transformation
- Data wrangling
- Transcoding
- Distributed Data Management Architecture (DDM)
- Code conversion (computing)
- Source-to-source translation
- Presentation layer
References
- Manolescu (2006). Pattern Languages of Program Design 5. Upper Saddle River, NJ: Addison-Wesley. ISBN 978-0-321-32194-7.