Coarray Fortran
Developer | PL22.3 Fortran Committee |
---|---|
Stable release | 2008 (ISO/IEC 1539-1:2010) |
NAG Fortran Compiler | |
Influenced by | Fortran |
Coarray Fortran (CAF), formerly known as F--, started as an extension of
A CAF
The CAF extension was implemented in some Fortran
Implementation in compilers
CAF is often implemented on top of a Message Passing Interface (MPI) library for portability. Some implementations, such as the ones available in the GNU Fortran and OpenUH compilers, may run on top of other low-level layers (for example, GASNet) designed for supporting partitioned global address space languages.
Examples
CAF is used in CGPACK, an open source package for simulating polycrystalline materials developed at the University of Bristol. [1] A simple example is given below.
program Hello_World
  implicit none
  integer :: i  ! Local variable
  character(len=20) :: name[*]  ! Scalar coarray, one "name" for each image.
  ! Note: "name" is the local variable while "name[<index>]" accesses the
  ! variable in a specific image; "name[this_image()]" is the same as "name".

  ! Interact with the user on image 1; all other images skip this block.
  if (this_image() == 1) then
    write(*,'(a)',advance='no') 'Enter your name: '
    read(*,'(a)') name
    ! Distribute information to other images
    do i = 2, num_images()
      name[i] = name
    end do
  end if

  sync all  ! Barrier to make sure the data have arrived.

  ! I/O from all images, executing in any order, but each record written is intact.
  write(*,'(3a,i0)') 'Hello ',trim(name),' from image ', this_image()
end program Hello_World
The program above scales poorly because the loop that distributes the name executes sequentially on image 1. Writing scalable programs often requires a sophisticated understanding of parallel algorithms, detailed knowledge of the underlying network characteristics, and tuning for application characteristics such as the size of data transfers. For most application developers, letting the compiler or runtime library choose the algorithm is more robust and performs better. Fortran 2018 offers collective subroutines that let compiler and runtime library teams encapsulate efficient parallel algorithms for collective communication and distributed computation. These subroutines and other parallel programming features were first specified in a technical specification [2] that the Fortran standards committee voted to incorporate into Fortran 2018. They enable the user to write a more efficient version of the above algorithm:
program Hello_World
  implicit none
  character(len=20) :: name[*]  ! Scalar coarray, one "name" for each image.
  ! Note: "name" is the local variable while "name[<index>]" accesses the
  ! variable in a specific image; "name[this_image()]" is the same as "name".

  ! Interact with the user on image 1; all other images skip this block.
  if (this_image() == 1) then
    write(*,'(a)',advance='no') 'Enter your name: '
    read(*,'(a)') name
  end if

  ! Distribute information to all images
  call co_broadcast(name,source_image=1)

  ! I/O from all images, executing in any order, but each record written is intact.
  write(*,'(3a,i0)') 'Hello ',trim(name),' from image ', this_image()
end program Hello_World
In this version, the lack of explicit synchronization offers the potential for higher performance because the images coordinate less. Furthermore, TS 18508 guarantees that "A transfer from an image cannot occur before the collective subroutine has been invoked on that image." This implies some partial synchronization inside co_broadcast, which could still perform better than the "sync all" in the prior example. TS 18508 also incorporates several other new features that address issues targeted by the CAF 2.0 effort described below, including teams of images and events.
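The same collective style applies to reductions. The following is a minimal sketch, assuming a compiler that supports the Fortran 2018 collective subroutine co_sum; the program name and the variable "total" are illustrative and do not come from TS 18508 or from the examples above.

```fortran
program sum_across_images
  implicit none
  integer :: total  ! Ordinary (non-coarray) variable; illustrative name

  ! Each image contributes its own image index.
  total = this_image()

  ! Collective reduction: after the call, "total" on image 1 holds the sum
  ! over all images. On the other images "total" becomes undefined because
  ! result_image is present.
  call co_sum(total, result_image=1)

  if (this_image() == 1) then
    write(*,'(a,i0,a,i0)') 'Sum of image indices over ', num_images(), ' images: ', total
  end if
end program sum_across_images
```

With n images, the value printed on image 1 should equal n(n+1)/2. As with co_broadcast, no explicit "sync all" is written; any coordination needed is left to the compiler and runtime library.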
An alternate perspective
In 2011,
- There is no support for processor subsets; for instance, coarrays must be allocated over all images.
- The coarray extensions lack any notion of global pointers, which are essential for creating and manipulating any kind of linked data structure.
- Reliance on named critical sections for mutual exclusion hinders scalable parallelism by associating mutual exclusion with code regions rather than data objects.
- Fortran 2008's sync images statement does not provide a safe synchronization space. As a result, synchronization operations in user's code that are pending when a library call is made can interfere with synchronization in the library call.
- There are no mechanisms to avoid or tolerate latency when manipulating data on remote images.
- There is no support for collective communication.
To address these shortcomings, the Rice University group is developing a clean-slate redesign of the Coarray Fortran programming model. The new design, which the group calls Coarray Fortran 2.0, is an expressive set of coarray-based extensions to Fortran intended to provide a productive parallel programming model. Compared with Fortran 2008, Rice's coarray-based language extensions include the following additional features:
- process subsets known as teams, which support coarrays, collective communication, and relative indexing of process images for pair-wise operations,
- topologies, which augment teams with a logical communication structure,
- dynamic allocation/deallocation of coarrays and other shared data,
- team-based coarray allocation and deallocation,
- global pointers in support of dynamic data structures,
- support for latency hiding and avoidance:
  - asynchronous copies,
  - asynchronous collective operations, and
  - function shipping,
- enhanced support for synchronization for fine-grain control over program execution:
  - safe and scalable support for mutual exclusion, including locks and lock sets,
  - events, which provide a safe space for point-to-point synchronization,
  - cofence, which forces local completion of asynchronous operations,
  - finish, a barrier-like SPMD construct that forces completion of asynchronous operations across a team.
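Teams and events also exist in standard Fortran 2018 through TS 18508. The sketch below illustrates those two ideas using the Fortran 2018 syntax, not the Coarray Fortran 2.0 syntax from Rice, which may differ. It assumes a compiler with Fortran 2018 team and event support; the names "ready", "half", "payload", and "team_id" are hypothetical.

```fortran
program events_and_teams
  use, intrinsic :: iso_fortran_env, only: event_type, team_type
  implicit none
  type(event_type) :: ready[*]  ! Event coarray: one event variable per image
  type(team_type)  :: half      ! Team handle (hypothetical name)
  integer :: payload[*]         ! Data handed from image 1 to image 2
  integer :: team_id

  ! Point-to-point synchronization: image 1 writes, then posts the event
  ! on image 2; image 2 waits on its own event before reading.
  if (num_images() >= 2) then
    if (this_image() == 1) then
      payload[2] = 42
      event post (ready[2])
    else if (this_image() == 2) then
      event wait (ready)
      write(*,'(a,i0)') 'Image 2 received ', payload
    end if
  end if

  ! Split the images into two teams by parity of the image index.
  team_id = 1 + mod(this_image() - 1, 2)
  form team (team_id, half)
  change team (half)
    ! Inside the construct, this_image() and num_images() are relative to the team.
    write(*,'(a,i0,a,i0,a,i0)') 'Team ', team_id, ': image ', this_image(), ' of ', num_images()
  end team
end program events_and_teams
```

The event pair involves only images 1 and 2, so no other image has to wait as it would with a full "sync all" barrier. The change team construct restricts image indexing to a subset of images, which is the kind of processor-subset support the list above describes.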
See also
- Array programming
- Chapel
- Fortress
- Parallel computing
- Partitioned global address space
- Unified Parallel C
- X10
References
- ISBN 978-0-9926615-0-2
- TS 18508 Additional Parallel Features in Fortran
- "CoArray Fortran 2.0".