Array (data structure)
This article needs additional citations for verification. (September 2008) |
In
For example, an array of ten
0x7D0
, 0x7D4
, 0x7D8
, ..., 0x7F4
) so that the element with index i has the address 2000 + (i × 4).[4]Because the mathematical concept of a matrix can be represented as a two-dimensional grid, two-dimensional arrays are also sometimes called "matrices". In some cases the term "vector" is used in computing to refer to an array, although tuples rather than vectors are the more mathematically correct equivalent. Tables are often implemented in the form of arrays, especially lookup tables; the word "table" is sometimes used as a synonym of array.
Arrays are among the oldest and most important data structures, and are used by almost every program. They are also used to implement many other data structures, such as
Arrays are useful mostly because the element indices can be computed at
The term "array" may also refer to an
The term is also used, especially in the description of algorithms, to mean associative array or "abstract array", a theoretical computer science model (an abstract data type or ADT) intended to capture the essential properties of arrays.
History
The first digital computers used machine-language programming to set up and access array structures for data tables, vector and matrix computations, and for many other purposes.
Assembly languages generally have no special support for arrays, other than what the machine itself provides. The earliest high-level programming languages, including FORTRAN (1957), Lisp (1958), COBOL (1960), and ALGOL 60 (1960), had support for multi-dimensional arrays, and so has C (1972). In C++ (1983), class templates exist for multi-dimensional arrays whose dimension is fixed at runtime[3][5] as well as for runtime-flexible arrays.[2]
Applications
Arrays are used to implement mathematical vectors and matrices, as well as other kinds of rectangular tables. Many databases, small and large, consist of (or include) one-dimensional arrays whose elements are records.
Arrays are used to implement other data structures, such as lists,
One or more large arrays are sometimes used to emulate in-program
Arrays can be used to determine partial or complete
statements) that direct the path of the execution.Element identifier and addressing formulas
When data objects are stored in an array, individual objects are selected by an index that is usually a non-negative
There are three ways in which the elements of an array can be indexed:
- 0 (zero-based indexing)
- The first element of the array is indexed by subscript of 0.[8]
- 1 (one-based indexing)
- The first element of the array is indexed by subscript of 1.
- n (n-based indexing)
- The base index of an array can be freely chosen. Usually programming languages allowing n-based indexing also allow negative index values and other scalar data types like enumerations, or charactersmay be used as an array index.
Using zero based indexing is the design choice of many influential programming languages, including C, Java and Lisp. This leads to simpler implementation where the subscript refers to an offset from the starting position of an array, so the first element has an offset of zero.
Arrays can have multiple dimensions, thus it is not uncommon to access an array using multiple indices. For example, a two-dimensional array A
with three rows and four columns might provide access to the element at the 2nd row and 4th column by the expression A[1][3]
in the case of a zero-based indexing system. Thus two indices are used for a two-dimensional array, three for a three-dimensional array, and n for an n-dimensional array.
The number of indices needed to specify an element is called the dimension, dimensionality, or rank of the array.
In standard arrays, each index is restricted to a certain range of consecutive integers (or consecutive values of some enumerated type), and the address of an element is computed by a "linear" formula on the indices.
One-dimensional arrays
A one-dimensional array (or single dimension array) is a type of linear array. Accessing its elements involves a single subscript which can either represent a row or column index.
As an example consider the C declaration int anArrayName[10];
which declares a one-dimensional array of ten integers. Here, the array can store ten elements of type int
. This array has indices starting from zero through nine. For example, the expressions anArrayName[0]
and anArrayName[9]
are the first and last elements respectively.
For a vector with linear addressing, the element with index i is located at the address B + c · i, where B is a fixed base address and c a fixed constant, sometimes called the address increment or stride.
If the valid element indices begin at 0, the constant B is simply the address of the first element of the array. For this reason, the C programming language specifies that array indices always begin at 0; and many programmers will call that element "zeroth" rather than "first".
However, one can choose the index of the first element by an appropriate choice of the base address B. For example, if the array has five elements, indexed 1 through 5, and the base address B is replaced by B + 30c, then the indices of those same elements will be 31 to 35. If the numbering does not start at 0, the constant B may not be the address of any element.
Multidimensional arrays
For a multidimensional array, the element with indices i,j would have address B + c · i + d · j, where the coefficients c and d are the row and column address increments, respectively.
More generally, in a k-dimensional array, the address of an element with indices i1, i2, ..., ik is
- B + c1 · i1 + c2 · i2 + … + ck · ik.
For example: int a[2][3];
This means that array a has 2 rows and 3 columns, and the array is of integer type. Here we can store 6 elements they will be stored linearly but starting from first row linear then continuing with second row. The above array will be stored as a11, a12, a13, a21, a22, a23.
This formula requires only k multiplications and k additions, for any array that can fit in memory. Moreover, if any coefficient is a fixed power of 2, the multiplication can be replaced by bit shifting.
The coefficients ck must be chosen so that every valid index tuple maps to the address of a distinct element.
If the minimum legal value for every index is 0, then B is the address of the element whose indices are all zero. As in the one-dimensional case, the element indices may be changed by changing the base address B. Thus, if a two-dimensional array has rows and columns indexed from 1 to 10 and 1 to 20, respectively, then replacing B by B + c1 − 3c2 will cause them to be renumbered from 0 through 9 and 4 through 23, respectively. Taking advantage of this feature, some languages (like FORTRAN 77) specify that array indices begin at 1, as in mathematical tradition while other languages (like Fortran 90, Pascal and Algol) let the user choose the minimum value for each index.
Dope vectors
The addressing formula is completely defined by the dimension d, the base address B, and the increments c1, c2, ..., ck. It is often useful to pack these parameters into a record called the array's descriptor or stride vector or
Compact layouts
Often the coefficients are chosen so that the elements occupy a contiguous area of memory. However, that is not necessary. Even if arrays are always created with contiguous elements, some array slicing operations may create non-contiguous sub-arrays from them.
There are two systematic compact layouts for a two-dimensional array. For example, consider the matrix
In the row-major order layout (adopted by C for statically declared arrays), the elements in each row are stored in consecutive positions and all of the elements of a row have a lower address than any of the elements of a consecutive row:
1 2 3 4 5 6 7 8 9
In column-major order (traditionally used by Fortran), the elements in each column are consecutive in memory and all of the elements of a column have a lower address than any of the elements of a consecutive column:
1 4 7 2 5 8 3 6 9
For arrays with three or more indices, "row major order" puts in consecutive positions any two elements whose index tuples differ only by one in the last index. "Column major order" is analogous with respect to the first index.
In systems which use
Resizing
Static arrays have a size that is fixed when they are created and consequently do not allow elements to be inserted or removed. However, by allocating a new array and copying the contents of the old array to it, it is possible to effectively implement a dynamic version of an array; see dynamic array. If this operation is done infrequently, insertions at the end of the array require only amortized constant time.
Some array data structures do not reallocate storage, but do store a count of the number of elements of the array in use, called the count or size. This effectively makes the array a
Non-linear formulas
More complicated (non-linear) formulas are occasionally used. For a compact two-dimensional triangular array, for instance, the addressing formula is a polynomial of degree 2.
Efficiency
Both store and select take (deterministic worst case)
In an array with element size k and on a machine with a cache line size of B bytes, iterating through an array of n elements requires the minimum of ceiling(nk/B) cache misses, because its elements occupy contiguous memory locations. This is roughly a factor of B/k better than the number of cache misses needed to access n elements at random memory locations. As a consequence, sequential iteration over an array is noticeably faster in practice than iteration over many other data structures, a property called
Memory-wise, arrays are compact data structures with no per-element
Array accesses with statically predictable access patterns are a major source of data parallelism.
Comparison with other data structures
Peek (index) |
Mutate (insert or delete) at … | Excess space, average | |||
---|---|---|---|---|---|
Beginning | End | Middle | |||
Linked list | Θ(n) | Θ(1) | Θ(1), known end element; Θ(n), unknown end element |
Θ(n)[9][10] | Θ(n) |
Array
|
Θ(1) | — | — | — | 0 |
Dynamic array | Θ(1) | Θ(n) | Θ(1) amortized | Θ(n) | Θ(n)[11] |
Balanced tree | Θ(log n) | Θ(log n) | Θ(log n) | Θ(log n) | Θ(n) |
Random-access list | Θ(log n)[12] | Θ(1) | —[12] | —[12] | Θ(n) |
Hashed array tree | Θ(1) | Θ(n) | Θ(1) amortized | Θ(n) | Θ(√n) |
Associative arrays provide a mechanism for array-like functionality without huge storage overheads when the index values are sparse. For example, an array that contains values only at indexes 1 and 2 billion may benefit from using such a structure. Specialized associative arrays with integer keys include Patricia tries, Judy arrays, and van Emde Boas trees.
Balanced trees require O(log n) time for indexed access, but also permit inserting or deleting elements in O(log n) time,[13] whereas growable arrays require linear (Θ(n)) time to insert or delete elements at an arbitrary position.
Linked lists allow constant time removal and insertion in the middle but take linear time for indexed access. Their memory use is typically worse than arrays, but is still linear.
An Iliffe vector is an alternative to a multidimensional array structure. It uses a one-dimensional array of references to arrays of one dimension less. For two dimensions, in particular, this alternative structure would be a vector of pointers to vectors, one for each row(pointer on c or c++). Thus an element in row i and column j of an array A would be accessed by double indexing (A[i][j] in typical notation). This alternative structure allows jagged arrays, where each row may have a different size—or, in general, where the valid range of each index depends on the values of all preceding indices. It also saves one multiplication (by the column address increment) replacing it by a bit shift (to index the vector of row pointers) and one extra memory access (fetching the row address), which may be worthwhile in some architectures.
Dimension
The dimension of an array is the number of indices needed to select an element. Thus, if the array is seen as a function on a set of possible index combinations, it is the dimension of the space of which its domain is a discrete subset. Thus a one-dimensional array is a list of data, a two-dimensional array is a rectangle of data,[14] a three-dimensional array a block of data, etc.
This should not be confused with the dimension of the set of all matrices with a given domain, that is, the number of elements in the array. For example, an array with 5 rows and 4 columns is two-dimensional, but such matrices form a 20-dimensional space. Similarly, a three-dimensional vector can be represented by a one-dimensional array of size three.
See also
References
- Dictionary of Algorithms and Data Structures. National Institute of Standards and Technology. Retrieved 22 August 2010.
- ^ ].
- ^ S2CID 10890293.
- ISBN 978-0-595-24039-5.
- ^ ISBN 978-3-540-65387-5.[dead link]
- ^ Knuth, Donald (1998). Sorting and Searching. The Art of Computer Programming. Vol. 3. Reading, MA: Addison-Wesley Professional. p. 159.
- ISBN 9780932376220.
- ^ "Array Code Examples - PHP Array Functions - PHP code". Computer Programming Web programming Tips. Archived from the original on 13 April 2011. Retrieved 8 April 2011.
In most computer languages array index (counting) starts from 0, not from 1. Index of the first element of the array is 0, index of the second element of the array is 1, and so on. In array of names below you can see indexes and values.
- ^ Day 1 Keynote - Bjarne Stroustrup: C++11 Style at GoingNative 2012 on channel9.msdn.com from minute 45 or foil 44
- ^ Number crunching: Why you should never, ever, EVER use linked-list in your code again at kjellkod.wordpress.com
- ^ Brodnik, Andrej; Carlsson, Svante; Sedgewick, Robert; Munro, JI; Demaine, ED (1999), Resizable Arrays in Optimal Time and Space (Technical Report CS-99-09) (PDF), Department of Computer Science, University of Waterloo
- ^ .
- ^ "Counted B-Trees".
- ^ "Two-Dimensional Arrays \ Processing.org". processing.org. Retrieved 1 May 2020.
External links
- Data Structures/Arrays at Wikibooks