Array programming: Difference between revisions

Source: Wikipedia, the free encyclopedia.
Roger Hui (talk | contribs)
→‎APL: scalar v array
Added to the list of example languages and changed Mathematica to Wolfram Language.
Revision as of 18:57, 13 December 2018

In computer science, array programming generalizes operations on scalars to apply transparently to vectors, matrices, and higher-dimensional arrays.

Array programming primitives concisely express broad ideas about data manipulation. The level of concision can be dramatic in certain cases: it is not uncommon to find array programming language one-liners whose equivalents require more than a couple of pages of Java code.[1]

Modern programming languages that support array programming are commonly used in scientific and engineering settings; these include Fortran 90, Mata, MATLAB, Analytica, TK Solver (as lists), Octave, R, Cilk Plus, Julia, Perl Data Language (PDL), Wolfram Language, and the NumPy extension to Python. In these languages, an operation that operates on entire arrays can be called a vectorized operation,[2] regardless of whether it is executed on a vector processor.

Concepts of array

The fundamental idea behind array programming is that operations apply at once to an entire set of values. This makes it a high-level programming model, as it allows the programmer to think about and operate on whole aggregates of data without having to resort to explicit loops of individual scalar operations.
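As a rough illustration of this idea, the following Python sketch contrasts an explicit loop of scalar additions with a single operation stated over whole lists. Plain Python lists stand in for arrays here, and the function names add_with_loop and add_whole_arrays are hypothetical, chosen only for this example.

```python
from operator import add


def add_with_loop(a, b):
    """Scalar style: visit every index explicitly."""
    result = []
    for i in range(len(a)):
        result.append(a[i] + b[i])
    return result


def add_whole_arrays(a, b):
    """Array style: state the operation once for the whole aggregate."""
    if len(a) != len(b):
        raise ValueError("arrays must have the same length")
    return list(map(add, a, b))


prices = [10.0, 20.0, 30.0]
taxes = [1.0, 2.0, 3.0]
print(add_with_loop(prices, taxes))     # [11.0, 22.0, 33.0]
print(add_whole_arrays(prices, taxes))  # [11.0, 22.0, 33.0]
```

Both functions compute the same result; the second merely hides the iteration, which is the part that a true array language moves into the language itself.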

Iverson described the rationale behind array programming (actually referring to APL) as follows:[3]

most programming languages are decidedly inferior to mathematical notation and are little used as tools of thought in ways that would be considered significant by, say, an applied mathematician.

The thesis is that the advantages of executability and universality found in programming languages can be effectively combined, in a single coherent language, with the advantages offered by mathematical notation. [...] it is important to distinguish the difficulty of describing and of learning a piece of notation from the difficulty of mastering its implications. For example, learning the rules for computing a matrix product is easy, but a mastery of its implications (such as its associativity, its distributivity over addition, and its ability to represent linear functions and geometric operations) is a different and much more difficult matter.

Indeed, the very suggestiveness of a notation may make it seem harder to learn because of the many properties it suggests for explorations.

[...]

Users of computers and programming languages are often concerned primarily with the efficiency of execution of algorithms, and might, therefore, summarily dismiss many of the algorithms presented here. Such dismissal would be short-sighted, since a clear statement of an algorithm can usually be used as a basis from which one may easily derive a more efficient algorithm.

The basis behind array programming and thinking is to find and exploit the properties of data where individual elements are similar or adjacent. Unlike object orientation, which implicitly breaks down data to its constituent parts (or scalar quantities), array orientation looks to group data and apply a uniform handling.

Collapse operators reduce the dimensionality of an input data array by one or more dimensions. For example, summing over elements collapses the input array by one dimension.
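A minimal sketch of a collapse operator in Python, assuming rectangular, non-empty nested lists as the array representation. The helper names rank and sum_over_rows are hypothetical; each summation removes one dimension from its input.

```python
def rank(array):
    """Number of nesting levels of a rectangular nested list (0 for a scalar)."""
    depth = 0
    while isinstance(array, list):
        depth += 1
        array = array[0]
    return depth


def sum_over_rows(matrix):
    """Collapse the first axis of a rank-2 array: one total per column."""
    return [sum(column) for column in zip(*matrix)]


m = [[1, 2, 3],
     [4, 5, 6]]
column_totals = sum_over_rows(m)  # rank 2 -> rank 1
grand_total = sum(column_totals)  # rank 1 -> rank 0

print(rank(m), rank(column_totals), rank(grand_total))  # 2 1 0
print(column_totals, grand_total)                        # [5, 7, 9] 21
```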

Uses

Array programming is very well suited to implicit parallelization, a topic of much current research. Further, [...] (MIMD) to be solved piecemeal by numerous processors. Processors with two or more cores are increasingly common today.
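The piecemeal decomposition mentioned above can be sketched in Python as follows. Because an element-wise addition has no dependencies between elements, the arrays can be cut into chunks and each chunk handed to a separate worker. The names add_chunk and parallel_add are hypothetical, and the use of a thread pool is an assumption made for brevity: in standard CPython builds with the global interpreter lock, threads will not speed up this pure-Python arithmetic, so the sketch shows only the structure of the decomposition, not a performance gain.

```python
from concurrent.futures import ThreadPoolExecutor


def add_chunk(pair):
    """Add one chunk of the first array to the matching chunk of the second."""
    a_chunk, b_chunk = pair
    return [x + y for x, y in zip(a_chunk, b_chunk)]


def parallel_add(a, b, workers=4):
    """Element-wise a + b, computed chunk by chunk on a pool of workers."""
    if len(a) != len(b):
        raise ValueError("arrays must have the same length")
    size = max(1, -(-len(a) // workers))  # ceiling division, at least 1
    chunks = [(a[i:i + size], b[i:i + size]) for i in range(0, len(a), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns the partial results in the order of the input chunks.
        parts = list(pool.map(add_chunk, chunks))
    return [value for part in parts for value in part]


print(parallel_add(list(range(10)), [100] * 10))
# [100, 101, 102, 103, 104, 105, 106, 107, 108, 109]
```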

Languages

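The canonical examples of array programming languages are APL, J, and Fortran. Others include: A+, Analytica, Chapel, IDL, Julia, K, Klong, Q, Mata, Wolfram Language, MATLAB, MOLSF, NumPy, GNU Octave, PDL, R, S-Lang, SAC, Nial and ZPL.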

Scalar languages

In scalar languages such as C and Pascal, operations apply only to single values, so a+b expresses the addition of two numbers. In such languages, adding one array to another requires indexing and looping, the coding of which is tedious.

for (i = 0; i < n; i++)
    for (j = 0; j < n; j++)
        a[i][j] += b[i][j];

Array languages

In array languages, operations are generalized to apply to both scalars and arrays. Thus, a+b expresses the sum of two scalars if a and b are scalars, or the sum of two arrays if they are arrays.
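The generalization described above can be imitated in Python with a single function that accepts scalars or nested lists. This is only a sketch: the function name add is hypothetical, nested lists stand in for arrays, and the mixed scalar-with-array case is an extra assumption modelled on the scalar extension that some array languages provide.

```python
def add(a, b):
    """Add two scalars, two equally shaped arrays, or a scalar and an array."""
    a_is_array = isinstance(a, list)
    b_is_array = isinstance(b, list)
    if not a_is_array and not b_is_array:
        return a + b                              # scalar + scalar
    if a_is_array and b_is_array:
        if len(a) != len(b):
            raise ValueError("shape mismatch")
        return [add(x, y) for x, y in zip(a, b)]  # array + array
    if a_is_array:
        return [add(x, b) for x in a]             # array + scalar
    return [add(a, y) for y in b]                 # scalar + array


print(add(1, 2))                    # 3
print(add([1, 2, 3], [10, 20, 30]))  # [11, 22, 33]
print(add([[1, 2], [3, 4]], 10))     # [[11, 12], [13, 14]]
```

The caller writes the same expression in every case; the dispatch on rank happens inside the operation, which is where an array language places it.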

An array language simplifies programming but possibly at a cost known as the abstraction penalty.


Ada

The previous C code would become the following in the Ada language,[7] which supports array-programming syntax.

 A := A + B;

APL

APL uses single-character Unicode symbols with no syntactic sugar.

 A ← A + B

This operation works on arrays of any rank (including rank 0), and on a scalar and an array. Dyalog APL extends the original language with augmented assignments:

A +← B

Analytica

Analytica provides the same economy of expression as Ada.

 A := A + B;

BASIC

Dartmouth BASIC had MAT statements for matrix and array manipulation in its third edition (1966).

 DIM A(4),B(4),C(4)
 MAT A = 1
 MAT B = 2*A
 MAT C = A + B
 MAT PRINT A,B,C

Mata

Stata's matrix programming language Mata supports array programming. Below, we illustrate addition, multiplication, addition of a matrix and a scalar, element by element multiplication, subscripting, and one of Mata's many inverse matrix functions.

. mata:

: A = (1,2,3) \(4,5,6)

: A
       1   2   3
    +-------------+
  1 |  1   2   3  |
  2 |  4   5   6  |
    +-------------+

: B = (2..4) \(1..3)

: B
       1   2   3
    +-------------+
  1 |  2   3   4  |
  2 |  1   2   3  |
    +-------------+

: C = J(3,2,1)           // A 3 by 2 matrix of ones

: C
       1   2
    +---------+
  1 |  1   1  |
  2 |  1   1  |
  3 |  1   1  |
    +---------+

: D = A + B

: D
       1   2   3
    +-------------+
  1 |  3   5   7  |
  2 |  5   7   9  |
    +-------------+

: E = A*C

: E
        1    2
    +-----------+
  1 |   6    6  |
  2 |  15   15  |
    +-----------+

: F = A:*B

: F
        1    2    3
    +----------------+
  1 |   2    6   12  |
  2 |   4   10   18  |
    +----------------+

: G = E :+ 3

: G
        1    2
    +-----------+
  1 |   9    9  |
  2 |  18   18  |
    +-----------+

: H = F[(2\1), (1, 2)]    // Subscripting to get a submatrix of F and

:                         // switch row 1 and 2
: H
        1    2
    +-----------+
  1 |   4   10  |
  2 |   2    6  |
    +-----------+

: I = invsym(F'*F)        // Generalized inverse (F*F^(-1)F=F) of a

:                         // symmetric positive semi-definite matrix
: I
[symmetric]
                 1             2             3
    +-------------------------------------------+
  1 |            0                              |
  2 |            0          3.25                |
  3 |            0         -1.75   .9444444444  |
    +-------------------------------------------+

: end

MATLAB

MATLAB allows the same economy of expression as Ada.

A = A + B;

A variant of the MATLAB language is the GNU Octave language, which extends the original language with augmented assignments:

A += B;

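Both MATLAB and GNU Octave natively support linear algebra operations such as matrix multiplication, matrix inversion, and the numerical solution of systems of linear equations, including by means of the Moore–Penrose pseudoinverse.[8][9]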

The Nial example of the inner product of two arrays can be implemented using the native matrix multiplication operator, if a is a row vector of size [1 n] and b is a corresponding column vector of size [n 1]:

a * b;

The inner product between two matrices having the same number of elements can be implemented with the auxiliary operator (:), which reshapes a given matrix into a column vector, and the transpose operator ':

A(:)' * B(:);
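What the expression A(:)' * B(:) computes can be spelled out in Python. The sketch below assumes nested lists of rows as the matrix representation; the helper names flatten_column_major and inner_product are hypothetical. The flattening visits the elements column by column, which is the order in which MATLAB's (:) operator stacks them.

```python
def flatten_column_major(matrix):
    """Stack the columns of a matrix into one flat list."""
    n_cols = len(matrix[0])
    return [row[j] for j in range(n_cols) for row in matrix]


def inner_product(a, b):
    """Sum of products of corresponding elements of two flattened matrices."""
    flat_a = flatten_column_major(a)
    flat_b = flatten_column_major(b)
    if len(flat_a) != len(flat_b):
        raise ValueError("matrices must have the same number of elements")
    return sum(x * y for x, y in zip(flat_a, flat_b))


A = [[1, 2],
     [3, 4]]
B = [[5, 6],
     [7, 8]]
print(flatten_column_major(A))  # [1, 3, 2, 4]
print(inner_product(A, B))      # 70
```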

rasql

The rasdaman query language is a database-oriented array-programming language. For example, two arrays could be added with the following query:

SELECT A + B
FROM   A, B

R

The R language supports the array paradigm by default. The following example multiplies two matrices, then adds a scalar (which is, in fact, a one-element vector) and finally a vector to the result:

> A <- matrix(1:6, nrow=2)  # 2 rows, filled column by column
> A
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
> B <- t( matrix(6:1, nrow=2) )  # t() is a transpose operator, so B has 3 rows
> B
     [,1] [,2]
[1,]    6    5
[2,]    4    3
[3,]    2    1
> C <- A %*% B
> C
     [,1] [,2]
[1,]   28   19
[2,]   40   28
> D <- C + 1
> D
     [,1] [,2]
[1,]   29   20
[2,]   41   29
> D + c(1, 1)  # c() creates a vector
     [,1] [,2]
[1,]   30   21
[2,]   42   30
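The last step of the R session adds a two-element vector to a 2 by 2 matrix. R does this by recycling the shorter vector along the matrix's column-major storage. The Python sketch below imitates that rule under simplifying assumptions: nested lists of rows represent the matrix, the function name add_recycled is hypothetical, and R's warning for vector lengths that do not divide the matrix size is not reproduced.

```python
from itertools import cycle


def add_recycled(matrix, vector):
    """Add a vector to a matrix, reusing the vector's values column by column."""
    if not vector:
        raise ValueError("vector must not be empty")
    n_rows, n_cols = len(matrix), len(matrix[0])
    result = [[None] * n_cols for _ in range(n_rows)]
    values = cycle(vector)          # restart the vector whenever it runs out
    for j in range(n_cols):         # walk the matrix in column-major order
        for i in range(n_rows):
            result[i][j] = matrix[i][j] + next(values)
    return result


D = [[29, 20],
     [41, 29]]
print(add_recycled(D, [1, 1]))  # [[30, 21], [42, 30]]
print(add_recycled(D, [1, 2]))  # [[30, 21], [43, 31]]
```

The first call reproduces the final matrix of the R session; the second uses a vector with distinct values to make the column-wise recycling visible.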

Mathematical reasoning and language notation

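The matrix left-division operator concisely expresses some semantic properties of matrices. As in the scalar equivalent, if the (determinant of the) coefficient (matrix) A is not null, then it is possible to solve the (vectorial) equation A * x = b by left-multiplying both sides by the inverse of A, written A^-1. The following mathematical statements hold when A is a full rank square matrix: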

A^-1 *(A * x)==A^-1 * (b)
(A^-1 * A)* x ==A^-1 * b       (matrix-multiplication associativity)
x = A^-1 * b

where == is the equivalence relational operator. The previous statements are also valid MATLAB expressions if the third one is executed before the others (numerical comparisons may be false because of round-off errors).

If the system is overdetermined, so that A has more rows than columns, the pseudoinverse A^+ (in MATLAB and GNU Octave languages: pinv(A)) can replace the inverse A^-1, as follows:

pinv(A) *(A * x)==pinv(A) * (b)
(pinv(A) * A)* x ==pinv(A) * b       (matrix-multiplication associativity)
x = pinv(A) * b

However, these solutions are neither the most concise (for example, the need remains to notationally differentiate overdetermined systems) nor the most computationally efficient. The latter point is easy to understand when considering again the scalar equivalent a * x = b, for which the solution x = a^-1 * b would require two operations instead of the more efficient x = b / a. The problem is that matrix multiplication is generally not commutative, as the extension of the scalar solution to the matrix case would require:

(a * x)/ a ==b / a
(x * a)/ a ==b / a       (commutativity does not hold for matrices!)
x * (a / a)==b / a       (associativity also holds for matrices)
x = b / a

The MATLAB language introduces the left-division operator \ to maintain the essential part of the analogy with the scalar case, therefore simplifying the mathematical reasoning and preserving the conciseness:

A \ (A * x)==A \ b
(A \ A)* x ==A \ b       (associativity also holds for matrices, commutativity is no longer required)
x = A \ b
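The point that x = A \ b can be computed without ever forming A^-1 can be illustrated with a small Python sketch. It solves A * x = b for a square, full rank A by Gaussian elimination with row pivoting, using exact fractions so that the result can be checked without round-off. The function name left_divide is hypothetical, and the sketch does not reproduce how MATLAB or GNU Octave actually implement the \ operator.

```python
from fractions import Fraction


def left_divide(A, b):
    """Solve A x = b for a square, full rank A without computing its inverse."""
    n = len(A)
    # Augmented matrix [A | b] in exact rational arithmetic.
    aug = [[Fraction(v) for v in row] + [Fraction(rhs)]
           for row, rhs in zip(A, b)]
    # Forward elimination with row pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    # Back substitution.
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        known = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - known) / aug[i][i]
    return x


A = [[2, 1],
     [1, 3]]
b = [3, 5]
print(left_divide(A, b))  # [Fraction(4, 5), Fraction(7, 5)]
```

Substituting the result back gives 2 * 4/5 + 7/5 = 3 and 4/5 + 3 * 7/5 = 5, as required.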

This is not only an example of terse array programming from the coding point of view but also from the computational efficiency perspective, which in several array programming languages benefits from quite efficient linear algebra libraries such as ATLAS or LAPACK.[10][11]

Returning to the previous quotation of Iverson, the rationale behind it should now be evident:

it is important to distinguish the difficulty of describing and of learning a piece of notation from the difficulty of mastering its implications. For example, learning the rules for computing a matrix product is easy, but a mastery of its implications (such as its associativity, its distributivity over addition, and its ability to represent linear functions and geometric operations) is a different and much more difficult matter. Indeed, the very suggestiveness of a notation may make it seem harder to learn because of the many properties it suggests for explorations.

Third-party libraries

The use of specialized and efficient libraries to provide more terse abstractions is also common in other programming languages. In C++, several linear algebra libraries exploit the language's ability to overload operators. In some cases, a very terse abstraction in those languages is explicitly influenced by the array programming paradigm, as in the Armadillo and Blitz++ libraries.[12][13]
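The operator-overloading technique that such libraries rely on can be sketched in Python, which also lets a class define the meaning of + and of a matrix-product operator. The Matrix class below is hypothetical and written only for this illustration; it is not the API of Armadillo, Blitz++, or any other library.

```python
class Matrix:
    """A tiny matrix type whose operators apply to whole matrices."""

    def __init__(self, rows):
        self.rows = [list(row) for row in rows]

    def __add__(self, other):
        """Element-wise sum of two equally shaped matrices."""
        if (len(self.rows) != len(other.rows)
                or len(self.rows[0]) != len(other.rows[0])):
            raise ValueError("shape mismatch")
        return Matrix([[x + y for x, y in zip(ra, rb)]
                       for ra, rb in zip(self.rows, other.rows)])

    def __matmul__(self, other):
        """Matrix product, written with the @ operator."""
        if len(self.rows[0]) != len(other.rows):
            raise ValueError("inner dimensions do not match")
        columns = list(zip(*other.rows))
        return Matrix([[sum(x * y for x, y in zip(row, col))
                        for col in columns]
                       for row in self.rows])

    def __eq__(self, other):
        return isinstance(other, Matrix) and self.rows == other.rows

    def __repr__(self):
        return f"Matrix({self.rows})"


A = Matrix([[1, 2], [3, 4]])
B = Matrix([[5, 6], [7, 8]])
print(A + B)  # Matrix([[6, 8], [10, 12]])
print(A @ B)  # Matrix([[19, 22], [43, 50]])
```

With the operators defined once inside the class, client code reads like the one-line array expressions shown in the earlier language examples.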

References

  1. ^ Michael Schidlowsky. "Java and K". Retrieved 2008-01-23.
  2. ^ Stéfan van der Walt; S. Chris Colbert; Gaël Varoquaux (2011). "The NumPy array: a structure for efficient numerical computation". Computing in Science and Engineering. IEEE.
  3. . Retrieved 2011-03-22.
  4. ^ Surana P (2006). "Meta-Compilation of Language Abstractions" (PDF). Archived from the original (PDF) on 2015-02-17. Retrieved 2008-03-17.
  5. ^ Kuketayev. "The Data Abstraction Penalty (DAP) Benchmark for Small Objects in Java". Retrieved 2008-03-17.
  6. .
  7. ^ Ada Reference Manual: G.3.1 Real Vectors and Matrices
  8. ^ "GNU Octave Manual. Arithmetic Operators". Retrieved 2011-03-19.
  9. ^ "MATLAB documentation. Arithmetic Operators". Retrieved 2011-03-19.
  10. ^ "GNU Octave Manual. Appendix G Installing Octave". Retrieved 2011-03-19.
  11. ^ "Mathematica 5.2 Documentation. Software References". Retrieved 2011-03-19.
  12. ^ "Reference for Armadillo 1.1.8. Examples of Matlab/Octave syntax and conceptually corresponding Armadillo syntax". Retrieved 2011-03-19.
  13. ^ "Blitz++ User's Guide. 3. Array Expressions". Archived from the original on 2011-03-23. Retrieved 2011-03-19.
