Noisy data
Noisy data are data that are corrupted or distorted, or that have a low signal-to-noise ratio. Improper procedures for subtracting the noise from data, or poorly documented ones, can lead to a false sense of accuracy or to false conclusions.
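The signal-to-noise ratio is commonly expressed in decibels as 10·log10(P_signal / P_noise), where each P is a mean power (the mean of the squared values). Below is a minimal Python sketch. It assumes a noise-free reference signal is available, which is rarely true of real measurements. The function name `snr_db` is hypothetical, and the residual `noisy - clean` is treated as the noise.

```python
import math

def snr_db(clean, noisy):
    """Estimate the SNR in decibels, treating (noisy - clean) as the noise."""
    if not clean or len(clean) != len(noisy):
        raise ValueError("expected two non-empty sequences of equal length")
    # Mean power of the noise-free reference signal.
    signal_power = sum(s * s for s in clean) / len(clean)
    # Mean power of the residual, i.e. whatever the measurement added.
    noise_power = sum((n - s) ** 2 for s, n in zip(clean, noisy)) / len(clean)
    if noise_power == 0:
        return math.inf  # identical sequences: no measurable noise
    if signal_power == 0:
        return -math.inf  # all-zero reference: nothing but noise
    return 10 * math.log10(signal_power / noise_power)

clean = [1.0, -1.0, 1.0, -1.0]
noisy = [1.1, -0.9, 0.9, -1.1]
print(snr_db(clean, noisy))  # about 20.0 (signal power 1, noise power 0.01)
```

Each tenfold increase in noise power lowers the SNR by 10 dB. A "low" SNR therefore means the noise power approaches or exceeds the signal power.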
Noisy data are data that contain a large amount of additional, meaningless information, called noise.[1] This includes data corruption, and the term is often used as a synonym for corrupt data.[1] It also includes any data that a user system cannot understand and interpret correctly. Many systems, for example, cannot use unstructured text. If not handled properly, noisy data can adversely affect the results of any data analysis and skew its conclusions. Statistical analysis is sometimes used to weed the noise out of noisy data.[1]
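One example of statistical weeding is the modified z-score. It flags outliers using the median and the median absolute deviation (MAD), so the outliers themselves barely influence the cutoff. This is only one of many possible techniques and is not a method prescribed by the cited source. The helper `mad_filter` and its default threshold of 3.5 (a common rule of thumb) are assumptions for illustration.

```python
import statistics

def mad_filter(values, threshold=3.5):
    """Split values into (kept, flagged) using the modified z-score.

    modified z = 0.6745 * (x - median) / MAD, where MAD is the median
    absolute deviation from the median. The 3.5 cutoff is a common rule
    of thumb, not a universal constant.
    """
    values = list(values)
    med = statistics.median(values)
    mad = statistics.median([abs(x - med) for x in values])
    if mad == 0:
        # Too many points sit exactly on the median for MAD to measure spread;
        # the score is undefined, so leave the data unfiltered.
        return values, []
    kept, flagged = [], []
    for x in values:
        score = 0.6745 * (x - med) / mad
        (flagged if abs(score) > threshold else kept).append(x)
    return kept, flagged

readings = [10.1, 9.9, 10.0, 10.2, 9.8, 42.0]
print(mad_filter(readings))  # ([10.1, 9.9, 10.0, 10.2, 9.8], [42.0])
```

The function returns the flagged points rather than silently dropping them. This keeps the filtering step documented and open to review.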
Sources of noise
Real-world measured data differ from the true values because multiple factors affect the measurement.[2]
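A simplified way to picture these factors is a measurement model in which each reading equals the true value plus a systematic offset (bias) plus random error. The Python sketch below uses hypothetical parameter values. It shows that averaging many readings suppresses the random part but leaves the bias untouched. Real measurements can involve many more factors than these two.

```python
import random
import statistics

def simulate_readings(true_value, bias, noise_sd, n, seed=0):
    """Hypothetical model: reading = true value + bias + Gaussian random error."""
    rng = random.Random(seed)
    return [true_value + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]

readings = simulate_readings(true_value=100.0, bias=0.5, noise_sd=2.0, n=10_000)
estimate = statistics.fmean(readings)
# Averaging shrinks the random part (to roughly noise_sd / sqrt(n) = 0.02 here),
# but the systematic offset of 0.5 stays in the estimate.
print(f"mean of readings: {estimate:.2f} (true value 100.00)")
```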
Improper filtering can add noise if the filtered signal is treated as if it were a directly measured signal. For example, convolution-type digital filters such as a moving average can have side effects such as lags or truncation of peaks, and differentiating digital filters amplify random noise in the original data.
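Both filtering side effects can be reproduced in a few lines of Python. The sketch assumes a trailing (causal) moving average and uses the first difference as the simplest differentiating filter. The helper names and test signals are hypothetical.

```python
import random
import statistics

def moving_average(xs, window):
    """Trailing moving average; output k averages inputs k .. k + window - 1."""
    return [sum(xs[k:k + window]) / window for k in range(len(xs) - window + 1)]

def first_difference(xs):
    """Simplest differentiating filter: y[i] = x[i + 1] - x[i]."""
    return [b - a for a, b in zip(xs, xs[1:])]

# 1) Lag and peak truncation: a triangular peak of height 5 at index 7.
peak = [0.0, 0.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 4.0, 3.0, 2.0, 1.0, 0.0, 0.0, 0.0]
window = 3
smoothed = moving_average(peak, window)
# Output k is aligned with the last input it uses, index k + window - 1.
k_max = max(range(len(smoothed)), key=smoothed.__getitem__)
print("smoothed peak height:", round(smoothed[k_max], 2))  # 4.33, not 5
print("smoothed peak index:", k_max + window - 1)          # 8, not 7

# 2) Noise amplification: a slow ramp (slope 0.01 per sample) plus small noise.
rng = random.Random(1)
noise_sd = 0.05
ramp = [0.01 * i + rng.gauss(0.0, noise_sd) for i in range(10_000)]
slopes = first_difference(ramp)
# Each difference contains two independent noise terms, so its noise standard
# deviation is about sqrt(2) * noise_sd, here roughly 7x the true slope.
print("mean slope:", round(statistics.fmean(slopes), 3))
print("noise sd in slopes / noise sd in data:",
      round(statistics.stdev(slopes) / noise_sd, 2))
```

With a window of 3, the triangular peak comes out lower (about 4.33 instead of 5) and one sample later. Aligning each output with the center of its window would remove the lag in this symmetric example but not the truncation. In the second part, the differences carry noise roughly seven times larger than the slope they are meant to measure. This happens even though the noise in the original ramp is small compared with its overall rise.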
Fraud: Individuals may deliberately skew data to push the results toward a desired conclusion. Data that look clean, with few outliers, reflect well on the person collecting them. There may therefore be an incentive to discard more points as outliers than is justified, or to make the data look smoother than they are.
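Over-aggressive outlier removal can inflate apparent precision, as the Python sketch below illustrates. The data are drawn from a normal distribution with standard deviation 1 and contain no genuine outliers. The helper `trim_by_sd` applies a simple mean ± k standard deviations rule chosen for illustration.

```python
import random
import statistics

def trim_by_sd(values, k):
    """Drop points more than k sample standard deviations from the mean."""
    mu = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [x for x in values if abs(x - mu) <= k * sd]

rng = random.Random(42)
data = [rng.gauss(0.0, 1.0) for _ in range(5_000)]  # no true outliers at all

for k in (3.0, 2.0, 1.0):
    kept = trim_by_sd(data, k)
    print(f"k={k}: kept {len(kept)} of {len(data)}, "
          f"apparent sd {statistics.stdev(kept):.2f} (true sd 1.00)")
```

For normally distributed data, trimming at one standard deviation discards roughly a third of perfectly valid observations. The spread then looks about half as large as it really is.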
References
- "What is noisy data? - Definition from WhatIs.com".
- "Noisy Data in Data Mining - Soft Computing and Intelligent Information Systems". sci2s.ugr.es.
- R. Y. Wang, V. C. Storey, C. P. Firth. "A Framework for Analysis of Data Quality Research". IEEE Transactions on Knowledge and Data Engineering 7 (1995): 623-640. doi:10.1109/69.404034.
- X. Zhu, X. Wu. "Class Noise vs. Attribute Noise: A Quantitative Study". Artificial Intelligence Review 22 (2004): 177-210. doi:10.1007/s10462-004-0751-8.