Binary data
Binary data is data whose unit can take on only two possible states.
Binary data occurs in many different technical and scientific fields, where it goes by different names, including bit (binary digit) in computer science, truth value in mathematical logic and related domains, and binary variable in statistics.
Mathematical and combinatoric foundations
A collection of n bits may have ... states can always be superseded by allocating two, three, or four times more bits. So, using a small number of states other than 2 provides no advantage.
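As a rough illustration of how bits trade off against variables with more states, the following Python sketch enumerates the states of a small collection of bits and computes how many bits are needed to cover a given number of states. The helper names `bit_states` and `bits_needed` are hypothetical and chosen only for this example; the sketch relies on the standard fact that n two-state variables have 2**n joint states.

```python
from itertools import product

def bit_states(n):
    """Return every possible state of a collection of n bits as tuples of 0/1."""
    return list(product((0, 1), repeat=n))

def bits_needed(num_states):
    """Smallest number of bits whose states can cover num_states distinct values."""
    return (num_states - 1).bit_length()

print(len(bit_states(3)))   # 8 states for 3 bits
print(bits_needed(1))       # 0: a variable with one state carries no information
print(bits_needed(10))      # 4 bits cover one decimal digit
print(bits_needed(1000))    # 10 bits cover three decimal digits
```

Under these assumptions, a variable with up to 16 states never needs more than four bits, which is the sense in which allocating a few times more bits replaces a larger alphabet.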
![](http://upload.wikimedia.org/wikipedia/commons/thumb/9/9b/Hypercubeorder_binary.svg/220px-Hypercubeorder_binary.svg.png)
Moreover, Boolean algebra provides a convenient mathematical structure for collections of bits, with the semantics of a collection of propositional variables. Boolean algebra operations are known as "bitwise operations" in computer science. Boolean functions are also well studied theoretically and easy to implement, either in computer programs or with so-called logic gates in digital electronics. This contributes to the use of bits to represent many kinds of data, even data that is not originally binary.
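The correspondence between Boolean algebra and bitwise operations can be sketched in Python, where the operators `&`, `|`, `^`, and `~` apply AND, OR, XOR, and NOT to every bit position at once. The 4-bit width and the values of `a` and `b` are arbitrary choices for illustration.

```python
a = 0b1100
b = 0b1010
MASK = 0b1111  # keep results to 4 bits, since Python integers are unbounded

print(format(a & b, "04b"))      # 1000  (AND)
print(format(a | b, "04b"))      # 1110  (OR)
print(format(a ^ b, "04b"))      # 0110  (XOR)
print(format(~a & MASK, "04b"))  # 0011  (NOT, masked to 4 bits)
```

Each output column is the result of the corresponding Boolean operation applied to one pair of bits, treating 1 as true and 0 as false.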
In statistics
Often, binary data is used to represent one of two conceptually opposed values, e.g.:
- the outcome of an experiment ("success" or "failure")
- the response to a yes–no question ("yes" or "no")
- presence or absence of some feature ("is present" or "is not present")
- the truth or falsehood of a proposition ("true" or "false", "correct" or "incorrect")
However, it can also be used for data that is assumed to have only two possible values, even if they are not conceptually opposed or conceptually represent all possible values in the space. For example, binary data is often used to represent the party choices of voters in elections in the United States, i.e. Republican or Democratic. In this case, there is no inherent reason why only two political parties should exist, and indeed, other parties do exist in the U.S., but they are so minor that they are generally simply ignored. Modeling continuous data (or categorical data of more than 2 categories) as a binary variable for analysis purposes is called dichotomization (creating a dichotomy). Like all discretization, it involves discretization error, but the goal is to learn something valuable despite the error: treating it as negligible for the purpose at hand, but remembering that it cannot be assumed to be negligible in general.
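Dichotomization can be sketched as a simple threshold rule. In the Python example below, the function name `dichotomize`, the sample values, and the cutoff of 18.0 are all hypothetical and serve only to show the idea.

```python
def dichotomize(values, threshold):
    """Map each continuous value to 1 if it is at or above threshold, else 0."""
    return [1 if v >= threshold else 0 for v in values]

ages = [12.5, 17.9, 18.0, 34.2, 65.0]  # hypothetical continuous data
is_adult = dichotomize(ages, threshold=18.0)
print(is_adult)  # [0, 0, 1, 1, 1]
```

The values 12.5 and 17.9 become indistinguishable after coding, as do 18.0 and 65.0, which is a concrete instance of the discretization error mentioned above.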
Binary variables
A binary variable is a random variable of binary type, meaning that it has two possible values. Independent and identically distributed (i.i.d.) binary variables follow a Bernoulli distribution, but in general binary data need not come from i.i.d. variables. Total counts of i.i.d. binary variables (equivalently, sums of i.i.d. binary variables coded as 1 or 0) follow a binomial distribution, but when binary variables are not i.i.d., the distribution need not be binomial.
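A minimal simulation sketch, assuming a success probability of 0.3 and 10 trials per repetition (both arbitrary), shows i.i.d. binary variables being drawn and summed. The function name `bernoulli_trials` is hypothetical, and the fixed seed is only there to make runs repeatable.

```python
import random

def bernoulli_trials(p, n, rng):
    """Draw n i.i.d. binary values, each equal to 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

rng = random.Random(0)  # seeded generator for repeatable runs
p, n, repetitions = 0.3, 10, 2000

# Each count is the sum of n i.i.d. binary variables coded as 1 or 0.
counts = [sum(bernoulli_trials(p, n, rng)) for _ in range(repetitions)]
mean_count = sum(counts) / repetitions
print(mean_count)  # expected to be close to n * p = 3
```

If the draws were not i.i.d., for example if each trial copied the previous result, the counts would pile up at 0 and n and would no longer look binomial.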
Counting
Like categorical data, binary data can be converted to a vector of counts, with one coordinate for each possible value.
Since there are only two possible values, this can be simplified to a single count (a scalar value) by considering one value as "success" and the other as "failure", coding the success as 1 and the failure as 0 (using only the coordinate for the "success" value, not the coordinate for the "failure" value). For example, if the value A is considered "success" (and thus B is considered "failure"), the data set A, A, B would be represented as 1, 1, 0. When this is grouped, the values are added, while the number of trials is generally tracked implicitly. For example, A, A, B would be grouped as 1 + 1 + 0 = 2 successes (out of n = 3 trials). Going the other way, count data with n = 1 is binary data, with the two classes being 0 (failure) or 1 (success).
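The coding and grouping steps for the data set A, A, B can be sketched in Python as follows. The function name `encode` is hypothetical, and treating A as the "success" value follows the example in the text.

```python
def encode(data, success):
    """Code each observation as 1 if it equals the success value, else 0."""
    return [1 if x == success else 0 for x in data]

data = ["A", "A", "B"]
coded = encode(data, success="A")

successes = sum(coded)  # grouping adds the coded values
trials = len(coded)     # the number of trials is tracked alongside the sum

print(coded)               # [1, 1, 0]
print(successes, trials)   # 2 3
```

Keeping `trials` next to `successes` matters because the scalar count alone does not say how many observations were grouped.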
Counts of i.i.d. binary variables follow a binomial distribution, with n being the total number of trials (points in the grouped data).
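For a concrete sense of the binomial distribution of such counts, the Python sketch below evaluates the standard binomial probability mass function. The function name `binomial_pmf` and the success probability of 0.5 are assumptions made for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n i.i.d. trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Probability of observing 2 successes out of 3 trials when p = 0.5
print(binomial_pmf(2, 3, 0.5))  # 0.375

# The probabilities over all possible counts 0..n sum to 1
total = sum(binomial_pmf(k, 3, 0.5) for k in range(4))
```

With n = 1 the same function reduces to the Bernoulli case, returning p for k = 1 and 1 - p for k = 0.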
Regression
Similarly, counts of i.i.d. categorical variables with more than two categories can be modeled with a multinomial regression.
In computer science
![](http://upload.wikimedia.org/wikipedia/commons/d/d7/Commons_QR_code.png)
In hardware, 1 and 0 are typically represented by two distinct voltage levels, for example a higher voltage for 1 and a lower voltage for 0. There are many ways to store two such states. Magnetic media such as floppy disks and magnetic tape carry a coating of ferromagnetic material, whose magnetic domains stay aligned in a particular direction and so retain a remanent magnetic field even after the applied current or magnetic field is removed. When data is written, a magnetic field applied in one direction leaves the domains in an orientation that is read as 1, and a field applied in the opposite direction leaves an orientation that is read as 0.[3]
See also
- Bit array
- Binary protocol
- Bernoulli distribution
- Boolean data type
- Computer memory
- Categorical data
- Qualitative data
References
- Collett 2002, p. 1.
- ISBN 978-0470463635.
- Gul, Najam (2022-08-18). "How do different types of Data get stored in form of 0 and 1?". Curiosity Tea. Retrieved 2023-01-05.
- Collett, David (2002). Modelling Binary Data (Second ed.). CRC Press. ISBN 9781420057386.