All components of an array are of the same data type

In computer science, array is a data type that represents a collection of elements [values or variables], each selected by one or more indices [identifying keys] that can be computed at run time during program execution. Such a collection is usually called an array variable or array value. By analogy with the mathematical concepts vector and matrix, array types with one and two indices are often called vector type and matrix type, respectively. More generally, a multidimensional array type can be called a tensor type, by anology with the physical concept, tensor.

Language support for array types may include certain built-in array data types, some syntactic constructions [array type constructors] that the programmer may use to define such types and declare array variables, and special notation for indexing array elements. For example, in the Pascal programming language, the declaration type MyTable = array [1..4,1..2] of integer, defines a new array data type called MyTable. The declaration var A: MyTable then defines a variable A of that type, which is an aggregate of eight elements, each being an integer variable identified by two indices. In the Pascal program, those elements are denoted A[1,1], A[1,2], A[2,1], …, A[4,2]. Special array types are often defined by the language's standard libraries.

Dynamic lists are also more common and easier to implement[ – ] than dynamic arrays. Array types are distinguished from record types mainly because they allow the element indices to be computed at run time, as in the Pascal assignment A[I,J] := A[N-I,2*J]. Among other things, this feature allows a single iterative statement to process arbitrarily many elements of an array variable.

In more theoretical contexts, especially in type theory and in the description of abstract algorithms, the terms "array" and "array type" sometimes refer to an abstract data type [ADT] also called abstract array or may refer to an associative array, a mathematical model with the basic operations and behavior of a typical array type in most languages — basically, a collection of elements that are selected by indices computed at run-time.

Depending on the language, array types may overlap [or be identified with] other data types that describe aggregates of values, such as lists and strings. Array types are often implemented by array data structures, but sometimes by other means, such as hash tables, linked lists, or search trees.

History[edit]

Heinz Rutishauser's programming language Superplan [1949–1951] included multi-dimensional arrays. Rutishauser however although describing how a compiler for his language should be built, did not implement one.

Assembly languages and low-level languages like BCPL generally have no syntactic support for arrays.

Because of the importance of array structures for efficient computation, the earliest high-level programming languages, including FORTRAN [1957], COBOL [1960], and Algol 60 [1960], provided support for multi-dimensional arrays.

Abstract arrays[edit]

An array data structure can be mathematically modeled as an abstract data structure [an abstract array] with two operations

get[A, I]: the data stored in the element of the array A whose indices are the integer tuple I.set[A,I,V]: the array that results by setting the value of that element to V.

These operations are required to satisfy the axioms

get[set[A,I, V], I] = Vget[set[A,I, V], J] = get[A, J] if I ≠ J

for any array state A, any value V, and any tuples I, J for which the operations are defined.

The first axiom means that each element behaves like a variable. The second axiom means that elements with distinct indices behave as disjoint variables, so that storing a value in one element does not affect the value of any other element.

These axioms do not place any constraints on the set of valid index tuples I, therefore this abstract model can be used for triangular matrices and other oddly-shaped arrays.

Implementations[edit]

In order to effectively implement variables of such types as array structures [with indexing done by pointer arithmetic], many languages restrict the indices to integer data types [or other types that can be interpreted as integers, such as bytes and enumerated types], and require that all elements have the same data type and storage size. Most of those languages also restrict each index to a finite interval of integers, that remains fixed throughout the lifetime of the array variable. In some compiled languages, in fact, the index ranges may have to be known at compile time.

On the other hand, some programming languages provide more liberal array types, that allow indexing by arbitrary values, such as floating-point numbers, strings, objects, references, etc.. Such index values cannot be restricted to an interval, much less a fixed interval. So, these languages usually allow arbitrary new elements to be created at any time. This choice precludes the implementation of array types as array data structures. That is, those languages use array-like syntax to implement a more general associative array semantics, and must therefore be implemented by a hash table or some other search data structure.

Language support[edit]

Multi-dimensional arrays[edit]

The number of indices needed to specify an element is called the dimension, dimensionality, or rank of the array type. [This nomenclature conflicts with the concept of dimension in linear algebra, where it is the number of elements. Thus, an array of numbers with 5 rows and 4 columns, hence 20 elements, is said to have dimension 2 in computing contexts, but represents a matrix with dimension 4-by-5 or 20 in mathematics. Also, the computer science meaning of "rank" is similar to its meaning in tensor algebra but not to the linear algebra concept of rank of a matrix.]

Many languages support only one-dimensional arrays. In those languages, a multi-dimensional array is typically represented by an Iliffe vector, a one-dimensional array of references to arrays of one dimension less. A two-dimensional array, in particular, would be implemented as a vector of pointers to its rows. Thus an element in row i and column j of an array A would be accessed by double indexing [A[i][j] in typical notation]. This way of emulating multi-dimensional arrays allows the creation of jagged arrays, where each row may have a different size — or, in general, where the valid range of each index depends on the values of all preceding indices.

This representation for multi-dimensional arrays is quite prevalent in C and C++ software. However, C and C++ will use a linear indexing formula for multi-dimensional arrays that are declared with compile time constant size, e.g. by int A[10][20] or MyTable0, instead of the traditional MyTable1.

Indexing notation[edit]

Most programming languages that support arrays support the store and select operations, and have special syntax for indexing. Early languages used parentheses, e.g. MyTable2, as in FORTRAN; others choose square brackets, e.g. MyTable3 or MyTable4, as in Algol 60 and Pascal [to distinguish from the use of parentheses for function calls].

Index types[edit]

Array data types are most often implemented as array structures: with the indices restricted to integer [or totally ordered] values, index ranges fixed at array creation time, and multilinear element addressing. This was the case in most "third generation" languages, and is still the case of most systems programming languages such as Ada, C, and C++. In some languages, however, array data types have the semantics of associative arrays, with indices of arbitrary type and dynamic element creation. This is the case in some scripting languages such as Awk and Lua, and of some array types provided by standard C++ libraries.

Bounds checking[edit]

Some languages [like Pascal and Modula] perform bounds checking on every access, raising an exception or aborting the program when any index is out of its valid range. Compilers may allow these checks to be turned off to trade safety for speed. Other languages [like FORTRAN and C] trust the programmer and perform no checks. Good compilers may also analyze the program to determine the range of possible values that the index may have, and this analysis may lead to bounds-checking elimination.

Index origin[edit]

Some languages, such as C, provide only zero-based array types, for which the minimum valid value for any index is 0. This choice is convenient for array implementation and address computations. With a language such as C, a pointer to the interior of any array can be defined that will symbolically act as a pseudo-array that accommodates negative indices. This works only because C does not check an index against bounds when used.

Other languages provide only one-based array types, where each index starts at 1; this is the traditional convention in mathematics for matrices and mathematical sequences. A few languages, such as Pascal and Lua, support n-based array types, whose minimum legal indices are chosen by the programmer. The relative merits of each choice have been the subject of heated debate. Zero-based indexing can avoid off-by-one or fencepost errors, specifically a 0-based for[i=0;i

Chủ Đề