Serialization
In computing, serialization (or serialisation) is the process of translating a
This process of serializing an object is also called
Uses
Uses of serialization include:
- serializing data for transfer across wires and networks (messaging).
- storing data (in databases, on hard disk drives).
- remote procedure calls, e.g., as in SOAP.
- distributing objects, especially in CORBA, etc.
- detecting changes in time-varying data.
For some of these features to be useful, architecture independence must be maintained. For example, for maximal use of distribution, a computer running on a different
Inherent to any serialization scheme is that, because the encoding of the data is by definition serial, extracting one part of the serialized data structure requires that the entire object be read from start to end, and reconstructed. In many applications, this linearity is an asset, because it enables simple, common I/O interfaces to be utilized to hold and pass on the state of an object. In applications where higher performance is an issue, it can make sense to expend more effort to deal with a more complex, non-linear storage organization.
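The linearity described above can be illustrated with a minimal Python sketch. It assumes the standard pickle module as the serializer and an in-memory io.BytesIO object as a stand-in for a file or socket; the record contents are made up for illustration.

```python
import io
import pickle

# Illustrative data; the field names are invented for this sketch.
record = {"id": 7, "name": "sensor-a", "readings": [1.5, 2.5, 4.0]}

# Any object with a write() method can receive the serial byte stream.
stream = io.BytesIO()
pickle.dump(record, stream)

# To obtain a single field, the whole object is decoded from the start
# of the stream and reconstructed first.
stream.seek(0)
restored = pickle.load(stream)
name = restored["name"]
```

The same two calls would work unchanged with an open file, which is the sense in which a serial encoding allows simple, common I/O interfaces to carry an object's state.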
Even on a single machine, primitive
Since both serializing and deserializing can be driven from common code (for example, the Serialize function in
Drawbacks
Serialization breaks the opacity of an
To discourage competitors from making compatible products, publishers of
Many institutions, such as archives and libraries, attempt to
Serialization formats
The
In the late 1990s, a push to provide an alternative to the standard serialization protocols started:
YAML is a strict superset of JSON and includes additional features such as data type tags, support for cyclic data structures, indentation-sensitive syntax, and multiple forms of scalar data quoting. YAML is an open format.
Property lists are used for serialization by NeXTSTEP, GNUstep, macOS, and iOS frameworks. Property list, or p-list for short, doesn't refer to a single serialization format but instead several different variants, some human-readable and one binary.
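As a small sketch of the point that one property list can be written in more than one variant, the following uses Python's standard plistlib module, which supports the XML and binary variants. The dictionary contents are invented for illustration.

```python
import plistlib

# Illustrative settings; keys and values are made up for this sketch.
settings = {"name": "example", "version": 3, "enabled": True, "tags": ["a", "b"]}

# The same data written as a human-readable XML property list ...
xml_bytes = plistlib.dumps(settings, fmt=plistlib.FMT_XML)

# ... and as a binary property list.
binary_bytes = plistlib.dumps(settings, fmt=plistlib.FMT_BINARY)

# loads() detects the variant from the data itself.
from_xml = plistlib.loads(xml_bytes)
from_binary = plistlib.loads(binary_bytes)
```

Both variants decode to the same dictionary; only the on-disk representation differs.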
For large volume scientific datasets, such as satellite data and output of numerical climate, weather, or ocean models, specific binary serialization standards have been developed, e.g. HDF, netCDF and the older GRIB.
Programming language support
Several
family of languages. There are also libraries available that add serialization support to languages that lack native support for it.
C and C++
(Microsoft) also provides serialization methodology as part of its Document-View architecture.
CFML
<cfwddx> tag and to JSON with the SerializeJSON()
Delphi
Go
Go natively supports unmarshalling/marshalling of JSON and XML data.[10] There are also third-party modules that support YAML[11] and Protocol Buffers.[12] Go also supports Gobs.[13]
Haskell
In Haskell, serialization is supported for types that are members of the Read and Show type classes. Every type that is a member of the Read type class defines a function that will extract the data from the string representation of the dumped data. The Show type class, in turn, contains the show function from which a string representation of the object can be generated. The programmer need not define the functions explicitly; merely declaring a type to be deriving Read or deriving Show, or both, can make the compiler generate the appropriate functions for many cases (but not all: function types, for example, cannot automatically derive Show or Read). The auto-generated instance for Show also produces valid source code, so the same Haskell value can be generated by running the code produced by show in, for example, a Haskell interpreter.[14] For more efficient serialization, there are Haskell libraries that allow high-speed serialization in binary format, e.g. binary.
Java
Java provides automatic serialization which requires that the object be
components do implement the Serializable interface, they are not guaranteed to be portable between different versions of the Java Virtual Machine. As such, a Swing component, or any component which inherits it, may be serialized to a byte stream, but it is not guaranteed that this will be re-constitutable on another machine.
JavaScript
Since ECMAScript 5.1,
Julia
Julia implements serialization through the serialize() and deserialize() functions,[21] which are intended to work within the same version of Julia and/or the same system image.[22] The HDF5.jl package offers a more stable alternative, using a documented format and common library with wrappers for different languages,[23] while the default serialization format is reported to have been designed with maximal performance for network communication in mind.[24]
Lisp
Generally a Lisp data structure can be serialized with the functions "read" and "print". A variable foo containing, for example, a list of arrays would be printed by (print foo). Similarly, an object can be read from a stream named s by (read s). These two parts of the Lisp implementation are called the Printer and the Reader. The output of "print" is human readable; it uses lists delimited by parentheses, for example: (4 2.9 "x" y). In many types of Lisp, including Common Lisp, the printer cannot represent every type of data because it is not clear how to do so. In Common Lisp, for example, the printer cannot print CLOS objects. Instead, the programmer may write a method on the generic function print-object, which will be invoked when the object is printed. This is somewhat similar to the method used in Ruby. Lisp code itself is written in the syntax of the reader, called read syntax. Most languages use separate and different parsers to deal with code and data; Lisp uses only one. A file containing Lisp code may be read into memory as a data structure, transformed by another program, then possibly executed or written out, such as in a read–eval–print loop. Not all readers/writers support cyclic, recursive or shared structures.
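The print/read round trip described above has rough analogues in other languages. The following is a minimal Python sketch, not Lisp: it assumes repr() as the counterpart of the printer and ast.literal_eval() as a restricted counterpart of the reader, and it only covers built-in literal types.

```python
import ast

# A nested value built only from literal types (illustrative data).
value = {"point": (4, 2.9), "label": "x", "flags": [True, None]}

# "Print": produce a human-readable text form that is also valid source.
text = repr(value)

# "Read": parse the text back into an equal data structure.
# literal_eval accepts literals only and does not execute arbitrary code.
restored = ast.literal_eval(text)
```

As with the Lisp printer, this approach has limits: objects without a literal representation cannot be round-tripped this way.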
.NET Framework
.NET Framework has several serializers designed by Microsoft, and many third-party serializers are also available. More than a dozen serializers are discussed and tested in two comparisons.[25][26]
OCaml
OCaml's standard library provides marshalling through the Marshal module[3] and the Pervasives functions output_value and input_value. While OCaml programming is statically type-checked, uses of the Marshal module may break type guarantees, as there is no way to check whether an unmarshalled stream represents objects of the expected type. In OCaml it is difficult to marshal a function or a data structure which contains a function (e.g. an object which contains a method), because executable code in functions cannot be transmitted across different programs. (There is a flag to marshal the code position of a function, but it can only be unmarshalled in exactly the same program.) The standard marshalling functions can preserve sharing and handle cyclic data, which can be configured by a flag.
Perl
Several Perl modules available from CPAN provide serialization mechanisms, including Storable, JSON::XS and FreezeThaw. Storable includes functions to serialize and deserialize Perl data structures to and from files or Perl scalars. In addition to serializing directly to files, Storable includes the freeze function to return a serialized copy of the data packed into a scalar, and thaw to deserialize such a scalar. This is useful for sending a complex data structure over a network socket or storing it in a database. When serializing structures with Storable, there are network-safe functions that always store their data in a format that is readable on any computer at a small cost of speed. These functions are named nstore, nfreeze, etc. There are no "n" functions for deserializing these structures; the regular thaw and retrieve deserialize structures serialized with the "n" functions and their machine-specific equivalents.
PHP
PHP originally implemented serialization through the built-in serialize() and unserialize() functions.[27] PHP can serialize any of its data types except resources (file pointers, sockets, etc.). The built-in unserialize() function is often dangerous when used on completely untrusted data.[28] For objects, there are two "magic methods" that can be implemented within a class, __sleep() and __wakeup(), which are called from within serialize() and unserialize(), respectively, and can clean up and restore an object. For example, it may be desirable to close a database connection on serialization and restore the connection on deserialization; this functionality would be handled in these two magic methods. They also permit the object to pick which properties are serialized. Since PHP 5.1, there is an object-oriented serialization mechanism for objects, the Serializable interface.[29]
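The pattern of dropping a live resource on serialization and restoring it on deserialization is not specific to PHP. The following minimal sketch shows it in Python, assuming pickle's __getstate__ and __setstate__ hooks as the counterparts of __sleep() and __wakeup(). The Connection and Repository classes and the "db://example" string are hypothetical stand-ins, not a real database driver.

```python
import pickle


class Connection:
    """Hypothetical stand-in for a database connection."""

    def __init__(self, dsn):
        self.dsn = dsn
        self.is_open = True


class Repository:
    def __init__(self, dsn):
        self.dsn = dsn
        self.conn = Connection(dsn)

    def __getstate__(self):
        # Choose which attributes are serialized: leave the connection out.
        state = self.__dict__.copy()
        del state["conn"]
        return state

    def __setstate__(self, state):
        # Restore the plain attributes, then re-create the connection.
        self.__dict__.update(state)
        self.conn = Connection(self.dsn)


repo = Repository("db://example")
payload = pickle.dumps(repo)   # calls __getstate__
clone = pickle.loads(payload)  # calls __setstate__
```

The serialized payload carries only the connection string, and the restored object opens its own new connection.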
Prolog
Prolog's term structure, which is the only data structure of the language, can be serialized out through the built-in predicate write_term/3 and serialized-in through the built-in predicates read/1 and read_term/2. The resulting stream is uncompressed text (in some encoding determined by configuration of the target stream), with any free variables in the term represented by placeholder variable names. The predicate write_term/3 is standardized in the ISO Specification for Prolog (ISO/IEC 13211-1) on pages 59 ff. ("Writing a term, § 7.10.5"). Therefore it is expected that terms serialized-out by one implementation can be serialized-in by another without ambiguity or surprises. In practice, implementation-specific extensions (e.g. SWI-Prolog's dictionaries) may use non-standard term structures, so interoperability may break in edge cases. As examples, see the corresponding manual pages for SWI-Prolog,[30] SICStus Prolog,[31] GNU Prolog.[32] Whether and how serialized terms received over the network are checked against a specification (after deserialization from the character stream has happened) is left to the implementer. Prolog's built-in Definite Clause Grammars can be applied at that stage.
Python
The core general serialization mechanism is the pickle
R
R has the function dput, which writes an ASCII text representation of an R object to a file or connection. A representation can be read from a file using dget.[39] More specifically, the function serialize serializes an R object to a connection, the output being a raw vector coded in hexadecimal format. The unserialize function reads an object from a connection or a raw vector.[40]
REBOL
load function. RProtoBuf provides cross-language data serialization in R, using Protocol Buffers.[41]
Ruby
_load should take a String and return an object of this class.
Rust
Serde is the most widely used library, or crate, for serialization in
Smalltalk
In general, non-recursive and non-sharing objects can be stored and retrieved in a human readable form using the storeOn:/readFrom: protocol. The storeOn: method generates the text of a Smalltalk expression which, when evaluated using readFrom:, recreates the original object. This scheme is special, in that it uses a procedural description of the object, not the data itself. It is therefore very flexible, allowing for classes to define more compact representations. However, in its original form, it does not handle cyclic data structures or preserve the identity of shared references (i.e. two references to a single object will be restored as references to two equal, but not identical copies). For this, various portable and non-portable alternatives exist. Some of them are specific to a particular Smalltalk implementation or class library.
There are several ways in Squeak Smalltalk to serialize and store objects. The easiest and most used are storeOn:/readFrom: and binary storage formats based on SmartRefStream serializers. In addition, bundled objects can be stored and retrieved using ImageSegments. Both provide a so-called "binary-object storage framework", which supports serialization into and retrieval from a compact binary form. Both handle cyclic, recursive and shared structures, storage/retrieval of class and metaclass info and include mechanisms for "on the fly" object migration (i.e. to convert instances which were written by an older version of a class with a different object layout). The APIs are similar (storeBinary/readBinary), but the encoding details are different, making these two formats incompatible. However, the Smalltalk/X code is open source and free and can be loaded into other Smalltalks to allow for cross-dialect object interchange.
Object serialization is not part of the ANSI Smalltalk specification. As a result, the code to serialize an object varies by Smalltalk implementation. The resulting binary data also varies. For instance, a serialized object created in Squeak Smalltalk cannot be restored in Ambrai Smalltalk. Consequently, various applications that work on multiple Smalltalk implementations and rely on object serialization cannot share data between these different implementations. These applications include the MinneStore object database[42] and some RPC packages. A solution to this problem is SIXX,[43] which is a package for multiple Smalltalks that uses an XML-based format for serialization.
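The difference between a format that restores shared references as equal but separate copies and one that preserves identity and cycles can be shown with a short sketch. It is written in Python rather than Smalltalk, and it assumes the standard json module as an example of the first kind of format and pickle as an example of the second.

```python
import json
import pickle

# Two keys refer to the very same list object.
shared = [1, 2]
graph = {"a": shared, "b": shared}

# A plain text format writes the list twice, so identity is lost.
via_text = json.loads(json.dumps(graph))

# pickle records references, so the two entries stay one object.
via_pickle = pickle.loads(pickle.dumps(graph))

# A list that contains itself survives a pickle round trip as a cycle.
cyclic = []
cyclic.append(cyclic)
cyclic_copy = pickle.loads(pickle.dumps(cyclic))
```

After the round trip, via_text["a"] and via_text["b"] are equal but distinct lists, while via_pickle["a"] and via_pickle["b"] are the same object.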
Swift
The Swift standard library provides two protocols, Encodable and Decodable (composed together as Codable), which allow instances of conforming types to be serialized to or deserialized from JSON, property lists, or other formats.[44] Default implementations of these protocols can be generated by the compiler for types whose stored properties are also Decodable or Encodable.
Windows PowerShell
Windows PowerShell implements serialization through the built-in cmdlet Export-CliXML. Export-CliXML serializes .NET objects and stores the resulting XML in a file. To reconstitute the objects, use the Import-CliXML cmdlet, which generates a deserialized object from the XML in the exported file. Deserialized objects, often known as "property bags", are not live objects; they are snapshots that have properties, but no methods. Two-dimensional data structures can also be (de)serialized in CSV format using the Import-CSV and Export-CSV cmdlets.
See also
- Commutation (telemetry)
- Comparison of data serialization formats
- Container format
- Hibernate (Java)
- XML Schema
- Basic Encoding Rules
- Google Protocol Buffers
- Wikibase
- Apache Avro
References
- ^ Cline, Marshall. "C++ FAQ: "What's This "Serialization" Thing All About?"". Archived from the original on 2015-04-05.
It lets you take an object or group of objects, put them on a disk or send them through a wire or wireless transport mechanism, then later, perhaps on another computer, reverse the process, resurrecting the original object(s). The basic mechanisms are to flatten object(s) into a one-dimensional stream of bits, and to turn that stream of bits back into the original object(s).
- ^ "Module: Marshal (Ruby 3.0.2)". ruby-doc.org. Retrieved 25 July 2021.
- ^ a b "Marshal". OCaml. Retrieved 25 July 2021.
- ^ "Python 3.9.6 documentation - Python object serialization —pickle". Documentation - The Python Standard Library.
- ^ S. Miller, Mark. "Safe Serialization Under Mutual Suspicion". ERights.org.
Serialization, explained below, is an example of a tool for use by objects within an object system for operating on the graph they are embedded in. This seems to require violating the encapsulation provided by the pure object model.
- ^ Sun Microsystems (1987). "XDR: External Data Representation Standard". RFC 1014. Network Working Group. Retrieved July 11, 2011.
- ^ "Serialization". www.boost.org.
- ^ Beal, Stephan. "s11n.net: object serialization/persistence in C++". s11n.net.
- ^ "cereal Docs - Main". uscilab.github.io.
- ^ "Package encoding". pkg.go.dev. 12 July 2021.
- ^ "GitHub - YAML support for the Go language". GitHub. Retrieved 25 July 2021.
- ^ "proto · pkg.go.dev". pkg.go.dev. Retrieved 2021-06-22.
- ^ "gob package - encoding/gob - pkg.go.dev". pkg.go.dev. Retrieved 2022-03-04.
- ^ "Text.Show Documentation". Retrieved 15 January 2014.
- ISBN 978-0134685991.
- ^ "Ask TOM "Serializing Java Objects into the database (and ge..."". asktom.oracle.com.
- ^ "JSON". MDN Web Docs. Retrieved 22 March 2018.
- ^ "JSON". www.json.org. Retrieved 22 March 2018.
- ^ Holm, Magnus (15 May 2011). "JSON: The JavaScript subset that isn't". The timeless repository. Archived from the original on 13 May 2012. Retrieved 23 September 2016.
- ^ "TC39 Proposal: Subsume JSON". ECMA TC39 committee. 22 May 2018.
- ^ "Serialization". The Julia Language. Retrieved 25 July 2021.
- ^ "faster and more compact serialization of symbols and strings · JuliaLang/julia@bb67ff2". GitHub.
- ^ "HDF5.jl: Saving and loading data in the HDF5 file format". 20 August 2017 – via GitHub.
- ^ "Julia: how stable are serialize() / deserialize()". stackoverflow.com. 2014.
- ^ ".NET Serializers".
There are many kinds of serializers; they produce very compact data very fast. There are serializers for messaging, for data stores, for marshaling objects. What is the best serializer in .NET?
- ^ "SERBENCH by aumcode". aumcode.github.io.
- ^ "PHP: Object Serialization - Manual". ca.php.net.
- ^ Esser, Stephen (2009-11-28). "Shocking News in PHP Exploitation". Suspekt... Archived from the original on 2012-01-06.
- ^ "PHP: Serializable - Manual". www.php.net.
- ^ "Term reading and writing". www.swi-prolog.org.
- ^ "write_term/[2,3]". sicstus.sics.se.
- ^ "Term input/output". gprolog.org.
- S2CID 8126961.
- dynamically typed values, while our RPC implementation works only by generating code for the marshalling of statically typed values. Each facility would benefit from adding the mechanisms of the other, but that has not yet been done.
- ^ van Rossum, Guido (1 December 1994). "Flattening Python Objects". Python Programming Language – Legacy Website. Delaware, United States: Python Software Foundation. Retrieved 6 April 2017.
Origin of the name 'flattening': Because I want to leave the original 'marshal' module alone, and Jim complained that 'serialization' also means something totally different that's actually relevant in the context of concurrent access to persistent objects, I'll use the term 'flattening' from now on. ... (The Modula-3 system uses the term 'pickled' data for this concept. They have probably solved all problems already, and in a type-safe manner :-)
- ^ a b "11.1. pickle — Python object serialization — Python 2.7.14rc1 documentation". docs.python.org.
- ^ "pickle — Python object serialization — Python v3.0.1 documentation". docs.python.org.
- ^ "What's New In Python 3.0 — Python v3.1.5 documentation". docs.python.org.
- ^ R manual. http://stat.ethz.ch/R-manual/R-patched/library/base/html/dput.html
- ^ R manual. http://stat.ethz.ch/R-manual/R-patched/library/base/html/serialize.html
- S2CID 36239952.
- ^ "MinneStore version 2". SourceForge. Archived from the original on 11 May 2008.
- ^ "What's new". SIXX - Smalltalk Instance eXchange in XML. 23 January 2010. Retrieved 25 July 2021.
- ^ "Swift Archival & Serialization". www.github.com. 2018-12-02.
External links
- Java Object Serialization documentation
- Java 1.4 Object Serialization documentation.
- Durable Java: Serialization Archived 25 November 2005 at the Wayback Machine
- XML Data Binding Resources
- Databoard - Binary serialization with partial and random access, type system, RPC, type adaption, and text format