Data
In common
Data is
Data can be seen as the smallest units of factual information that can be used as a basis for calculation, reasoning, or discussion. Data can range from abstract ideas to concrete measurements, including, but not limited to, statistics. Thematically connected data presented in some relevant context can be viewed as information. Contextually connected pieces of information can then be described as data insights or intelligence. The stock of insights and intelligence that accumulates over time from synthesizing data into information can then be described as knowledge. Data has been described as "the new oil of the digital economy".[4][5] Data, as a general concept, refers to existing information or knowledge that is represented or coded in a form suitable for use or processing.
Advances in computing technologies have led to the advent of big data, which usually refers to very large quantities of data, often at the petabyte scale. Working with such large (and growing) datasets is difficult, or even impossible, using traditional data analysis methods and computing. (Theoretically speaking, infinite data would yield infinite information, which would render extracting insights or intelligence impossible.) In response, the relatively new field of data science uses machine learning and other artificial intelligence (AI) methods to apply analytic techniques efficiently to big data.
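One basic idea behind processing data that is too large to hold in memory is to compute results in a single pass over a stream, keeping only a small running state. The sketch below is a simple illustration of that idea, not a data-science or machine-learning method: it uses Welford's online algorithm to compute a mean and sample variance, and the generator standing in for a large data source is hypothetical.

```python
from typing import Iterable, Tuple


def streaming_mean_variance(values: Iterable[float]) -> Tuple[int, float, float]:
    """Return (count, mean, sample variance) in one pass (Welford's algorithm).

    Only three running numbers are kept, so memory use does not grow
    with the number of values in the stream.
    """
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    variance = m2 / (count - 1) if count > 1 else 0.0
    return count, mean, variance


# Hypothetical large source: values are generated lazily, never stored as a list.
count, mean, var = streaming_mean_variance(float(i) for i in range(1_000_000))
print(count, mean, var)
```

Because the function accepts any iterable, the same code works whether the values come from a list, a file read line by line, or a network stream.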
Etymology and terminology
The Latin word data is the plural of datum, "(thing) given", neuter past participle of dare, "to give".[6] The first English use of the word "data" is from the 1640s. The word "data" was first used to mean "transmissible and storable computer information" in 1946. The expression "data processing" was first used in 1954.[6]
When "data" is used more generally as a synonym for "information", it is treated as a
Meaning
Data,
Knowledge is the awareness of its environment that some entity possesses, whereas data merely communicate that knowledge. For example, the entry in a database specifying the height of Mount Everest is a datum that communicates a precisely measured value. This measurement may be included in a book along with other data on Mount Everest to describe the mountain in a manner useful for those who wish to decide on the best method to climb it. An awareness of the characteristics represented by this data is knowledge.
Data is often assumed to be the least abstract concept, information the next least, and knowledge the most abstract.[9] In this view, data becomes information by interpretation; e.g., the height of Mount Everest is generally considered "data", a book on Mount Everest's geological characteristics may be considered "information", and a climber's guidebook containing practical information on the best way to reach Mount Everest's peak may be considered "knowledge". This view, however, has also been argued to reverse the way data emerges from information, and information from knowledge.[10] "Information" itself bears a diversity of meanings, ranging from everyday usage to technical use. Generally speaking, the concept of information is closely related to notions of constraint, communication, control, data, form, instruction, knowledge, meaning, mental stimulus, pattern, perception, and representation. Beynon-Davies uses the concept of a sign to differentiate between data and information: data is a series of symbols, while information occurs when the symbols are used to refer to something.[11][12]
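The sign-based distinction can be illustrated with a toy model: a string of characters on its own is only a series of symbols, and it becomes information once it is tied to what it refers to. This is a hypothetical sketch, not Beynon-Davies's own formalism; the `Information` class and its fields are invented for illustration, and the value is the elevation of Mount Everest announced jointly by China and Nepal in 2020.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Information:
    """Hypothetical structure: symbols plus what they are taken to refer to."""
    symbols: str   # the raw data: a series of symbols
    referent: str  # what the symbols refer to
    unit: str      # how the symbols are to be read

    def describe(self) -> str:
        return f"{self.referent}: {self.symbols} {self.unit}"


# On their own, these characters are just data: a series of symbols.
raw = "8848.86"

# Used to refer to something, the same symbols become information.
everest_height = Information(symbols=raw, referent="Height of Mount Everest", unit="m")
print(everest_height.describe())
```

The symbols stored in `raw` are unchanged by the wrapping; only the added reference makes them meaningful, which is the point of the distinction.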
Before the development of computing devices and machines, people had to collect data manually and impose patterns on it. Since then, computing devices and machines have also been able to collect data. In the 2010s, computers were widely used in many fields to collect, sort, and process data, in disciplines ranging from
Mechanical computing devices are classified according to how they represent data. An analog computer represents a datum as a voltage, distance, position, or other physical quantity. A digital computer represents a piece of data as a sequence of symbols drawn from a fixed alphabet. The most common digital computers use a binary alphabet, that is, an alphabet of two characters typically denoted "0" and "1". More familiar representations, such as numbers or letters, are then constructed from the binary alphabet.
Some special forms of data are distinguished. A computer program is a collection of data that can be interpreted as instructions. Most computer languages make a distinction between programs and the other data on which programs operate, but in some languages, notably Lisp and similar languages, programs are essentially indistinguishable from other data. It is also useful to distinguish metadata, that is, a description of other data. A similar yet earlier term for metadata is "ancillary data." The prototypical example of metadata is the library catalog, which is a description of the contents of books.
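How numbers and letters are built from a binary alphabet can be shown in a few lines. This sketch assumes 8-bit fixed-width integers and the ASCII character encoding; real systems use many other number formats and encodings (such as UTF-8), and the helper names `to_bits` and `from_bits` are hypothetical.

```python
def to_bits(value: int, width: int = 8) -> str:
    """Render a non-negative integer as a fixed-width string over the alphabet {0, 1}."""
    if value < 0 or value >= 2 ** width:
        raise ValueError(f"{value} does not fit in {width} bits")
    return format(value, f"0{width}b")


def from_bits(bits: str) -> int:
    """Rebuild the integer from its binary digits."""
    return int(bits, 2)


# A number: 42 is stored as the symbol sequence 00101010.
print(to_bits(42))

# A letter: under ASCII, "A" has code 65, i.e. 01000001.
print(to_bits(ord("A")))

# Text is a sequence of such codes, one byte per ASCII character.
word_bits = [to_bits(b) for b in "Data".encode("ascii")]
print(" ".join(word_bits))

# Decoding reverses the process and recovers the original text.
print(bytes(from_bits(b) for b in word_bits).decode("ascii"))
```

The same eight symbols can stand for a number or a letter; which one they mean depends entirely on how the program chooses to interpret them.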
Data documents
Whenever data needs to be registered, data exists in the form of a data document. Kinds of data documents include:
Some of these data documents (data repositories, data studies, data sets, and software) are indexed in
Data collection
Gathering data can be accomplished through a primary source (the researcher is the first person to obtain the data) or a secondary source (the researcher obtains data that has already been collected by others, such as data disseminated in a scientific journal). Data analysis methodologies vary and include data triangulation and data percolation.[14] The latter offers a systematic method of collecting, classifying, and analyzing data from five possible angles of analysis, of which at least three should be used: qualitative and quantitative methods, literature reviews (including scholarly articles), interviews with experts, and computer simulation. Combining angles in this way aims to maximize the research's objectivity and to allow as complete an understanding as possible of the phenomena under investigation. The data is thereafter "percolated" using a series of pre-determined steps so as to extract the most relevant information.
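The "at least three of five angles" rule of data percolation can be expressed as a simple check on a study design. This is only a hypothetical sketch of that one rule: the angle labels are paraphrases of the list above, and the percolation steps themselves are not modeled because the text does not specify them.

```python
# The five angles of analysis listed for data percolation (paraphrased labels).
ANGLES = frozenset({
    "qualitative methods",
    "quantitative methods",
    "literature review",
    "expert interviews",
    "computer simulation",
})

MIN_ANGLES = 3  # the method calls for at least three of the five


def check_angles(used: set[str]) -> list[str]:
    """Return problems with a study design; an empty list means the rule is met."""
    problems = []
    unknown = used - ANGLES
    if unknown:
        problems.append(f"not one of the five angles: {sorted(unknown)}")
    valid = used & ANGLES
    if len(valid) < MIN_ANGLES:
        problems.append(f"only {len(valid)} angle(s) used; at least {MIN_ANGLES} required")
    return problems


design = {"quantitative methods", "literature review", "expert interviews"}
print(check_angles(design) or "design covers enough angles")
```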
Data longevity and accessibility
An important field in
Data accessibility. Another problem is that much scientific data is never published or deposited in data repositories such as databases. In one survey, data was requested from 516 studies published between 2 and 22 years earlier, but fewer than one in five of these studies were able or willing to provide the requested data. Overall, the likelihood of retrieving data dropped by 17% each year after publication.[15] Similarly, a survey of 100 datasets in Dryad found that more than half lacked the details needed to reproduce the research results from these studies.[16] These findings illustrate how poor access is to scientific data that is unpublished or lacks sufficient detail for reproduction.
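To see how quickly a 17% annual drop adds up, one can treat it as compounding: each year keeps 83% of the previous year's likelihood, so after n years the relative likelihood is 0.83 to the power n. This is a simplifying assumption for illustration; the cited survey's own statistical model may differ, and the function name is hypothetical.

```python
ANNUAL_DECLINE = 0.17  # 17% drop per year after publication, as reported above


def relative_likelihood(years: float, decline: float = ANNUAL_DECLINE) -> float:
    """Likelihood of retrieving data `years` after publication, relative to year 0.

    Assumes the decline compounds: each year keeps (1 - decline) of the
    previous year's value.
    """
    if years < 0:
        raise ValueError("years must be non-negative")
    return (1 - decline) ** years


for years in (1, 2, 5, 10, 20):
    print(f"{years:>2} years: {relative_likelihood(years):.1%} of the initial likelihood")
```

Under this assumption the likelihood falls below half of its starting value within four years.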
One proposed solution to the problem of reproducibility is to require FAIR data, that is, data that is Findable, Accessible, Interoperable, and Reusable. Data that fulfills these requirements can be reused in subsequent research, thereby advancing science and technology.[17]
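As a rough illustration, a repository might flag dataset records that lack the most basic metadata associated with each FAIR principle. The sketch below is a hypothetical toy: the FAIR principles are broader than one field each (they also cover machine-readable metadata and standardized access protocols), the `DatasetRecord` field names are invented, and the identifier shown is a placeholder, not a real DOI.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetRecord:
    """Hypothetical metadata record; field names are illustrative, not a standard schema."""
    title: str
    identifier: Optional[str] = None   # persistent identifier such as a DOI (Findable)
    access_url: Optional[str] = None   # where the data can be retrieved (Accessible)
    file_format: Optional[str] = None  # open, documented format (Interoperable)
    license: Optional[str] = None      # terms of reuse (Reusable)


def missing_fair_fields(record: DatasetRecord) -> dict[str, str]:
    """Map each FAIR principle to the field that is missing for it, if any."""
    checks = {
        "Findable": ("identifier", record.identifier),
        "Accessible": ("access_url", record.access_url),
        "Interoperable": ("file_format", record.file_format),
        "Reusable": ("license", record.license),
    }
    return {principle: field for principle, (field, value) in checks.items() if not value}


# Placeholder identifier for illustration only.
record = DatasetRecord(title="Example survey data", identifier="doi:10.xxxx/example",
                       file_format="CSV")
print(missing_fair_fields(record))  # {'Accessible': 'access_url', 'Reusable': 'license'}
```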
In other fields
Although data is also increasingly used in other fields, it has been suggested that the highly interpretive nature of these fields might be at odds with the ethos of data as "given". Peter Checkland introduced the term capta (from the Latin capere, "to take") to distinguish between an immense number of possible data and the subset of them to which attention is directed.[18] Johanna Drucker has argued that since the humanities affirm knowledge production as "situated, partial, and constitutive," using data may introduce counterproductive assumptions, for example that phenomena are discrete or observer-independent.[19] The term capta, which emphasizes the act of observation as constitutive, is offered as an alternative to data for visual representations in the humanities.
The term data-driven is a neologism applied to an activity that is primarily compelled by data over all other factors.
See also
- Biological data
- Computer data processing
- Computer memory
- Dark data
- Data (computer science)
- Data acquisition
- Data analysis
- Data bank
- Data cable
- Data curation
- Data domain
- Data element
- Data farming
- Data governance
- Data integrity
- Data maintenance
- Data management
- Data mining
- Data modeling
- Data point
- Data preservation
- Data protection
- Data publication
- Data remanence
- Data science
- Data set
- Data structure
- Data visualization
- Data warehouse
- Database
- Datasheet
- Data-driven programming
- Data-driven journalism
- Data-driven testing
- Data-driven learning
- Data-driven science
- Data-driven control system
- Data-driven marketing
- Digital privacy
- Environmental data rescue
- Fieldwork
- Information engineering
- Machine learning
- Open data
- Scientific data archiving
- Secondary data
- Statistics
References
- ISBN 978-92-64-025561.
- ^ "Statistical Language - What are Data?". Australian Bureau of Statistics. 2013-07-13. Archived from the original on 2019-04-19. Retrieved 2020-03-09.
- ^ "Data vs Information - Difference and Comparison | Diffen". www.diffen.com. Retrieved 2018-12-11.
- ^ Yonego, Joris Toonders (July 23, 2014). "Data Is the New Oil of the Digital Economy". Wired – via www.wired.com.
- ^ "Data is the new oil". July 16, 2018. Archived from the original on 2018-07-16.
- ^ a b "data | Origin and meaning of data by Online Etymology Dictionary". www.etymonline.com.
- ISBN 9781433832161.
- ^ "Joint Publication 2-0, Joint Intelligence" (PDF). Joint Chiefs of Staff, Joint Doctrine Publications. Department of Defense. 23 October 2013. pp. I-1. Archived from the original (PDF) on 18 July 2018. Retrieved July 17, 2018.
- ^ Akash Mitra (2011). "Classifying data for successful modeling". Archived from the original on 2017-11-07. Retrieved 2017-11-05.
- .
- ISBN 0-333-96390-3.
- ISBN 978-0-230-20368-6.
- ^ Sharon Daniel. The Database: An Aesthetics of Dignity.
- ISBN 978-3-319-15752-8
- S2CID 7799662.
- PMID 26556502.
- S2CID 247954952.
- ISBN 0-471-95820-4.
- ^ Johanna Drucker (2011). "Humanities Approaches to Graphical Display". Digital Humanities Quarterly. 005 (1).
External links
- Data is a singular noun (a detailed assessment)