Data reduction

Data reduction is the transformation of numerical or alphabetical digital information derived empirically or experimentally into a corrected, ordered, and simplified form. The purpose of data reduction can be two-fold: reduce the number of data records by eliminating invalid data, or produce summary data and statistics at different aggregation levels for various applications.^[1] Data reduction does not necessarily mean loss of information. For example, the body mass index reduces two dimensions (body mass and height) into a single measure, without losing the information relevant to its purpose.
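As a minimal sketch of this kind of reduction, the body mass index is conventionally computed as mass in kilograms divided by the square of height in metres. The function name and the sample values below are illustrative assumptions, not taken from the cited source.

```python
def body_mass_index(mass_kg: float, height_m: float) -> float:
    """Reduce two measurements (mass, height) to a single number."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return mass_kg / height_m ** 2


# Hypothetical sample values, chosen only for illustration.
bmi = body_mass_index(70.0, 1.75)
print(round(bmi, 2))
```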

When information is derived from instrument readings there may also be a transformation from […] measurement errors. Some idea of the nature of these errors is needed before the most likely value may be determined.

An example in […] kB/s. The on-board data reduction encompasses co-adding the raw frames for thirty minutes, reducing the bandwidth by a factor of 300. Furthermore, interesting targets are pre-selected and only the relevant pixels are processed, which is 6% of the total. This reduced data is then sent to Earth, where it is processed further.
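The two on-board steps described above (co-adding frames, then keeping only pre-selected pixels) can be sketched roughly as follows. This is a toy illustration, not the actual flight software: the frame cadence of one frame every 6 seconds is an assumption chosen so that thirty minutes yields 300 frames, and the frame size, pixel values, and target indices are hypothetical.

```python
from typing import List, Sequence

Frame = List[float]  # one flattened image; real frames are far larger


def co_add(frames: Sequence[Frame]) -> Frame:
    """Sum a sequence of equally sized frames pixel by pixel."""
    if not frames:
        raise ValueError("need at least one frame")
    width = len(frames[0])
    if any(len(f) != width for f in frames):
        raise ValueError("all frames must have the same size")
    return [sum(pixels) for pixels in zip(*frames)]


def select_pixels(frame: Frame, target_indices: Sequence[int]) -> Frame:
    """Keep only the pixels that belong to pre-selected targets."""
    return [frame[i] for i in target_indices]


# Assumption: one frame every 6 s, so 30 minutes gives 300 frames.
FRAMES_PER_WINDOW = (30 * 60) // 6

# Toy data: 300 frames of 100 pixels each, every pixel equal to 1.0.
frames = [[1.0] * 100 for _ in range(FRAMES_PER_WINDOW)]
summed = co_add(frames)

# Hypothetical target mask covering 6 of the 100 pixels (6%).
targets = [3, 17, 42, 58, 76, 91]
downlinked = select_pixels(summed, targets)

values_in = FRAMES_PER_WINDOW * 100
values_out = len(downlinked)
print(values_in, "raw values ->", values_out, "values sent")
```

Co-adding alone replaces 300 frames with one, which is where the factor of 300 comes from under the assumed cadence; the pixel selection then reduces the remaining frame further.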

Research has also been carried out on the use of data reduction in wearable (wireless) devices for health monitoring and diagnosis applications. For example, in the context of epilepsy diagnosis, data reduction has been used to increase the battery lifetime of a wearable EEG device by selecting and transmitting only the EEG data that is relevant for diagnosis and discarding background activity.^[2]

Types of Data Reduction

Dimensionality Reduction

When dimensionality increases, data becomes increasingly sparse, while density and distance between points, which are critical to clustering and outlier analysis, become less meaningful. Dimensionality reduction helps reduce noise in the data and allows for easier visualization, for example when 3-dimensional data is transformed into 2 dimensions to show hidden parts. One method of dimensionality reduction is the wavelet transform, in which data is transformed so as to preserve the relative distance between objects at different levels of resolution; it is often used for image compression.^[3]
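As a rough illustration of wavelet-based reduction, the sketch below applies one level of the Haar transform, chosen here only because it is the simplest wavelet; the source does not say which wavelet is meant. The function names, the sample signal, and the threshold are hypothetical.

```python
import math
from typing import List, Tuple


def haar_step(signal: List[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar transform.

    Returns (approximation, detail) coefficients, each half the
    length of the input. The input length must be even.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) / s for a, b in pairs]
    detail = [(a - b) / s for a, b in pairs]
    return approx, detail


def haar_inverse(approx: List[float], detail: List[float]) -> List[float]:
    """Invert haar_step; exact when no coefficients were discarded."""
    s = math.sqrt(2.0)
    out: List[float] = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out


signal = [4.0, 4.0, 8.0, 8.2, 1.0, 1.0, 5.0, 5.0]
approx, detail = haar_step(signal)

# Reduction step: zero out detail coefficients below a hypothetical threshold.
THRESHOLD = 0.5
kept_detail = [d if abs(d) >= THRESHOLD else 0.0 for d in detail]

restored = haar_inverse(approx, kept_detail)
max_error = max(abs(a - b) for a, b in zip(signal, restored))
print(len(signal), "values ->", len(approx), "stored coefficients")
print("largest reconstruction error:", round(max_error, 3))
```

With the orthonormal scaling used here, the transform leaves the Euclidean norm of the signal unchanged, so distances between signals are preserved before any coefficients are dropped; the reduction comes only from discarding small detail coefficients.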

Numerosity Reduction

This method of data reduction reduces the data volume by choosing alternate, smaller forms of data representation. Numerosity reduction can be split into two groups: parametric and non-parametric methods. Parametric methods (regression, for example) assume the data fits some model, estimate the model parameters, store only the parameters, and discard the data. One example is reducing the volume of data to be processed by selecting it on more specific criteria. Another example is a log-linear model, which obtains a value at a point in m-dimensional space as a product over appropriate marginal subspaces. Non-parametric methods do not assume a model; examples include histograms, clustering, and sampling.^[4]
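The difference between the two groups can be sketched with one example of each: a least-squares line fit, which keeps only two parameters, and an equal-width histogram, which keeps only bin counts. The helper names and the sample data are hypothetical and chosen for illustration.

```python
from typing import List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Parametric: ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def equal_width_histogram(values: Sequence[float], bins: int) -> List[int]:
    """Non-parametric: count values in `bins` equal-width buckets."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid a zero width for constant data
    counts = [0] * bins
    for v in values:
        # The maximum value falls into the last bucket.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


# Parametric: ten points on y = 2x + 1 are replaced by two numbers.
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
slope, intercept = fit_line(xs, ys)
print("stored parameters:", slope, intercept)

# Non-parametric: ten values are replaced by four bucket counts.
values = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
counts = equal_width_histogram(values, bins=4)
print("stored counts:", counts)
```

In the parametric case the original points can be discarded once the parameters are stored, at the cost of assuming the model is adequate; the histogram makes no such assumption but retains only the distribution of values across buckets.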

Statistical modelling

Data reduction can be obtained by assuming a statistical model for the data. Classical principles of data reduction include sufficiency, likelihood, conditionality and equivariance.^[5]
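Sufficiency, for instance, can be illustrated with a small sketch. Assuming a Bernoulli model (an assumption made here purely for illustration), the likelihood depends on the observations only through their count and their sum, so those two numbers can replace the full sample without changing any likelihood-based inference. The function names and data are hypothetical.

```python
import math
from typing import Sequence


def log_likelihood_full(data: Sequence[int], p: float) -> float:
    """Bernoulli log-likelihood computed from every observation."""
    return sum(math.log(p) if x == 1 else math.log(1.0 - p) for x in data)


def log_likelihood_reduced(n: int, successes: int, p: float) -> float:
    """The same log-likelihood computed from the reduced data (n, sum)."""
    return successes * math.log(p) + (n - successes) * math.log(1.0 - p)


# Hypothetical sample of 0/1 outcomes.
data = [1, 0, 0, 1, 1, 0, 1, 1]
n, successes = len(data), sum(data)

for p in (0.2, 0.5, 0.8):
    full = log_likelihood_full(data, p)
    reduced = log_likelihood_reduced(n, successes, p)
    print(p, math.isclose(full, reduced))
```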

References

[1] "Travel Time Data Collection Handbook" (PDF). Retrieved 6 December 2020.

[2] S2CID 24852887.

[3] Han, J.; Kamber, M.; Pei, J. (2011). "Data Mining: Concepts and Techniques (3rd ed.)" (PDF). Retrieved 6 December 2020.

[4] Han, J.; Kamber, M.; Pei, J. (2011). "Data Mining: Concepts and Techniques (3rd ed.)" (PDF). Retrieved 6 December 2020.

[5] OCLC 46538638.

Further reading

ISBN 0-471-10134-6.


