Dataflow programming
In computer programming, dataflow programming is a programming paradigm that models a program as a directed graph of the data flowing between operations, thus implementing dataflow principles and architecture. Dataflow programming languages share some features of functional languages, and were generally developed in order to bring some functional concepts to a language more suitable for numeric processing.[1]
Considerations
Traditionally, a program is modelled as a series of operations happening in a specific order; this may be referred to as sequential,[2]: p.3 procedural,[3] control flow[3] (indicating that the program chooses a specific path), or imperative programming. The program focuses on commands, in line with the von Neumann[2]: p.3 vision of sequential programming, where data is normally "at rest".[3]: p.7
In contrast, dataflow programming emphasizes the movement of data and models programs as a series of connections. Explicitly defined inputs and outputs connect operations, which function like black boxes.[3]: p.2 An operation runs as soon as all of its inputs become valid.[4] Thus, dataflow languages are inherently parallel and can work well in large, decentralized systems.[2]: p.3 [5] [6]
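This firing rule can be sketched in a few lines of Python. The node names and the `(a + b) * (a - b)` graph below are purely illustrative, not taken from any particular dataflow language; the point is that execution order is driven by data availability, not by the order in which the operations are listed.

```python
# Minimal dataflow sketch: each node is a black box with named inputs;
# a node fires as soon as all of its inputs hold valid values.

class Node:
    def __init__(self, name, func, inputs):
        self.name = name        # identifier for this operation's output
        self.func = func        # the black-box computation
        self.inputs = inputs    # names of the values it consumes
        self.fired = False

    def ready(self, values):
        return not self.fired and all(i in values for i in self.inputs)

def run(nodes, values):
    """Fire any node whose inputs are all valid until nothing changes."""
    progress = True
    while progress:
        progress = False
        for node in nodes:
            if node.ready(values):
                values[node.name] = node.func(*(values[i] for i in node.inputs))
                node.fired = True
                progress = True
    return values

# (a + b) * (a - b), expressed as a graph rather than a sequence of steps.
# Note "diff" is listed after "sum" but either could fire first.
nodes = [
    Node("sum", lambda a, b: a + b, ["a", "b"]),
    Node("diff", lambda a, b: a - b, ["a", "b"]),
    Node("prod", lambda s, d: s * d, ["sum", "diff"]),
]
result = run(nodes, {"a": 5, "b": 3})
print(result["prod"])  # (5+3)*(5-3) = 16
```

Because "sum" and "diff" depend only on the initial inputs, a parallel runtime could fire them simultaneously; "prod" must wait for both.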
State
One of the key concepts in computer programming is the idea of state, essentially a snapshot of the various conditions in the system. Most programming languages require a considerable amount of state information, which is generally hidden from the programmer. Often, the computer itself has no idea which piece of information encodes the enduring state. This is a serious problem, as the state information needs to be shared across multiple processors in parallel processing machines. Most languages force the programmer to add extra code to indicate which data and parts of the code are important to the state. This code tends to be both expensive in terms of performance and difficult to read or debug.
Where a sequential program can be imagined as a single worker moving between tasks (operations), a dataflow program is more like a series of workers on an assembly line, each doing a specific task whenever materials are available. Since the operations are only concerned with the availability of data inputs, they have no hidden state to track, and are all "ready" at the same time.
Representation
Dataflow programs are represented in different ways. A traditional program is usually represented as a series of text instructions, which is reasonable for describing a serial system which pipes data between small, single-purpose tools that receive, process, and return. Dataflow programs start with an input, perhaps the command line parameters, and illustrate how that data is used and modified. The flow of data is explicit, often visually illustrated as a line or pipe.
In terms of encoding, a dataflow program might be implemented as a hash table, with uniquely identified inputs as the keys, used to look up pointers to the instructions. When any operation completes, the program scans down the list of operations until it finds the first operation where all inputs are currently valid, and runs it. When that operation finishes, it will typically output data, thereby making another operation become valid.
For parallel operation, only the list needs to be shared; it is the state of the entire program. Thus the task of maintaining state is removed from the programmer and given to the language's runtime. On machines with a single processor core, where an implementation designed for parallel operation would simply introduce overhead, this overhead can be removed completely by using a different runtime.
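The table-driven encoding described above can be sketched as follows. This is an illustrative toy, not a real implementation; it refines the linear scan slightly by indexing operations under the names of their inputs, so that when a value arrives only its consumers need to be re-checked for readiness.

```python
# Sketch of the encoding described above: operations are looked up by the
# names of their inputs; when a value becomes valid, its consumers are
# checked, and firing one operation may make further operations valid.

from collections import defaultdict

def execute(operations, initial):
    """operations: list of (output_name, function, input_names) triples."""
    consumers = defaultdict(list)      # input name -> operations needing it
    for op in operations:
        for inp in op[2]:
            consumers[inp].append(op)

    values = {}                        # the shared state of the program
    pending = list(initial.items())    # valid values not yet propagated
    done = set()
    while pending:
        name, value = pending.pop()
        values[name] = value
        for out, func, inputs in consumers[name]:
            if out not in done and all(i in values for i in inputs):
                done.add(out)          # fire each operation at most once
                pending.append((out, func(*(values[i] for i in inputs))))
    return values

ops = [("sum", lambda a, b: a + b, ("a", "b")),
       ("prod", lambda s, b: s * b, ("sum", "b"))]
print(execute(ops, {"a": 1, "b": 2})["prod"])  # (1+2)*2 = 6
```

The `values` dictionary is the only shared state, matching the observation that for parallel operation only the table needs to be shared.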
Incremental updates
Some recent dataflow libraries such as Differential/Timely Dataflow have used incremental computing for much more efficient data processing.[1][7][8]
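The idea can be illustrated with a toy dependency graph in Python. This is only a sketch of the principle, not the Differential/Timely Dataflow API (real systems propagate fine-grained deltas rather than invalidating whole cached values): when one input changes, only the operations downstream of it are recomputed.

```python
# Toy incremental dataflow: changing an input invalidates only the
# operations that (transitively) depend on it; everything else keeps
# its cached value. Illustrative only, not a real library's API.

class Graph:
    def __init__(self):
        self.ops = {}         # name -> (function, input names)
        self.values = {}      # cached results and inputs
        self.recomputed = []  # record of work actually done

    def define(self, name, func, inputs):
        self.ops[name] = (func, inputs)

    def set_input(self, name, value):
        self.values[name] = value
        self._invalidate(name)

    def _invalidate(self, changed):
        for name, (_, inputs) in self.ops.items():
            if changed in inputs and name in self.values:
                del self.values[name]
                self._invalidate(name)   # cascade to downstream operations

    def get(self, name):
        if name not in self.values:
            func, inputs = self.ops[name]
            self.values[name] = func(*(self.get(i) for i in inputs))
            self.recomputed.append(name)
        return self.values[name]

g = Graph()
g.define("total", lambda a, b: a + b, ("a", "b"))
g.define("double", lambda c: c * 2, ("c",))
g.set_input("a", 1); g.set_input("b", 2); g.set_input("c", 10)
g.get("total"); g.get("double")      # initial computation of both
g.recomputed.clear()
g.set_input("b", 5)                  # only "total" depends on "b"
print(g.get("total"), g.get("double"), g.recomputed)
```

After changing `b`, only `total` is recomputed; `double`, which depends solely on `c`, is served from its cache.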
History
A pioneer dataflow language was BLODI (BLOck DIagram), developed in 1961 by John Larry Kelly, Jr., Carol Lochbaum and Victor A. Vyssotsky for specifying sampled data systems. A BLODI specification of functional units (amplifiers, adders, delay lines, etc.) and their interconnections was compiled into a single loop that updated the entire system for one clock tick.
In a 1966 Ph.D. thesis, The On-line Graphical Specification of Computer Procedures, Bert Sutherland created one of the first graphical dataflow programming frameworks in order to make parallel programming easier. Subsequent dataflow languages were often developed at the large supercomputer labs. POGOL, an otherwise conventional data-processing language developed at NSA, compiled large-scale applications composed of multiple file-to-file operations, e.g. merge, select, summarize, or transform, into efficient code that eliminated the creation of or writing to intermediate files to the greatest extent possible.[11]
The United States Navy funded development of ACOS and SPGN (signal processing graph notation) starting in the early 1980s. This is in use on a number of platforms in the field today.[12]
A more radical concept is Prograph, in which programs are constructed as graphs onscreen, and variables are replaced entirely with lines linking inputs to outputs. Incidentally, Prograph was originally written on the Macintosh, which remained single-processor until the introduction of the DayStar Genesis MP in 1996.
There are many hardware architectures oriented toward the efficient implementation of dataflow programming models. MIT's tagged token dataflow architecture was designed by Greg Papadopoulos.
Data flow has been proposed as an abstraction for specifying the global behavior of distributed system components: in the live distributed objects programming model, distributed data flows are used to store and communicate state, and as such, they play a role analogous to variables, fields, and parameters in Java-like programming languages.
Languages
Dataflow programming languages include:
- Céu (programming language)
- ASCET
- AviSynth scripting language, for video processing
- BMDFM (Binary Modular Dataflow Machine)
- CAL
- Cuneiform, a functional workflow language
- CMS Pipelines
- Hume
- Joule
- Keysight VEE
- KNIME is a free and open-source data analytics, reporting and integration platform
- LabVIEW, G[4]
- Linda
- Lucid[3]
- Lustre
- Max/MSP
- Microsoft Robotics Studio, designed for robotics programming
- Nextflow: a workflow language
- Orange - An open-source, visual programming tool for data mining, statistical data analysis, and machine learning.
- Oz - now also distributed since version 1.4.0
- Pipeline Pilot
- Prograph
- Pure Data
- Quartz Composer - Designed by Apple; used for graphic animations and effects
- SAC Single assignment C
- SIGNAL (a dataflow-oriented synchronous language enabling multi-clock specifications)
- Simulink
- SISAL
- SystemVerilog - A hardware description language
- Verilog - A hardware description language absorbed into the SystemVerilog standard in 2009
- VisSim - A block diagram language for simulation of dynamic systems and automatic firmware generation
- VHDL - A hardware description language
- Wapice IOT-TICKET implements an unnamed visual dataflow programming language for IoT data analysis and reporting.
- XEE (Starlight) XML engineering environment
- XProc
Libraries
- Apache Beam: Java/Scala SDK that unifies streaming (and batch) processing with several execution engines supported (Apache Spark, Apache Flink, Google Dataflow etc.)
- Apache Flink: Java/Scala library that allows streaming (and batch) computations to be run atop a distributed Hadoop (or other) cluster
- Apache Spark
- SystemC: Library for C++, mainly aimed at hardware design.
- TensorFlow: A machine-learning library based on dataflow programming.
See also
- Actor model
- Data-driven programming
- Digital signal processing
- Event-driven programming
- Flow-based programming
- Functional reactive programming
- Glossary of reconfigurable computing
- High-performance reconfigurable computing
- Incremental computing
- Parallel programming model
- Partitioned global address space
- Pipeline (Unix)
- Quantum circuit
- Signal programming
- Stream processing
- Yahoo Pipes
References
- ^ a b Schwarzkopf, Malte (7 March 2020). "The Remarkable Utility of Dataflow Computing". ACM SIGOPS. Retrieved 31 July 2022.
- ^ S2CID 5257722. Retrieved 15 August 2013.
- ^ ISBN 9780127296500. Retrieved 15 August 2013.
- ^ a b "Dataflow Programming Basics". Getting Started with NI Products. National Instruments Corporation. Retrieved 15 August 2013.
- ^ Harter, Richard. "Data Flow languages and programming - Part I". Richard Harter's World. Archived from the original on 8 December 2015. Retrieved 15 August 2013.
- ^ "Why Dataflow Programming Languages are Ideal for Programming Parallel Hardware". Multicore Programming Fundamentals Whitepaper Series. National Instruments Corporation. Retrieved 15 August 2013.
- ^ McSherry, Frank; Murray, Derek; Isaacs, Rebecca; Isard, Michael (5 January 2013). "Differential dataflow". Microsoft. Retrieved 31 July 2022.
- ^ "Differential Dataflow". Timely Dataflow. 30 July 2022. Retrieved 31 July 2022.
- ^ Sutherland, William Robert (1966). The On-line Graphical Specification of Computer Procedures (Ph.D. thesis). MIT. hdl:1721.1/13474. Retrieved 2022-08-25.
- ^ Gloria Lambert (1973). "Large scale file processing: POGOL". POPL '73: Proceedings of the 1st annual ACM SIGACT-SIGPLAN symposium on Principles of programming languages. ACM. pp. 226–234.
- ^ Underwater Acoustic Data Processing, Y.T. Chan
External links
- Book: Dataflow and Reactive Programming Systems
- Basics of Dataflow Programming in F# and C#
- Dataflow Programming - Concept, Languages and Applications
- Static Scheduling of Synchronous Data Flow Programs for Digital Signal Processing
- Handling huge loads without adding complexity: The basic concepts of dataflow programming, Dr. Dobb's, Sept. 2011