Distributed data store

A distributed data store is a computer network where information is stored on more than one node, often in a replicated fashion.^[1] The term usually refers either to a distributed database, where users store information on a number of nodes, or to a computer network in which users store information on a number of peer network nodes.^[2]

Distributed databases

Examples of distributed databases include Amazon's Dynamo^[4] and Microsoft Azure Storage.^[5]

Because arbitrary querying matters less than availability in these systems, designers of distributed data stores have increased availability at the expense of consistency. High-speed read/write access likewise comes with reduced consistency: as the CAP theorem states, it is not possible to guarantee both consistency and availability on a partitioned network.
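The trade-off described by the CAP theorem can be illustrated with a toy simulation. The following is only a minimal sketch, not the behaviour of any real product: the `Replica` and `Cluster` classes, and the `write_available` and `write_consistent` methods, are hypothetical names introduced here. One write path always accepts writes (favouring availability), the other refuses them unless a majority of replicas is reachable (favouring consistency).

```python
class Replica:
    """One storage node holding its own copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class Cluster:
    """Toy cluster whose replicas can be split into isolated groups."""

    def __init__(self, names):
        self.replicas = {n: Replica(n) for n in names}
        # With no partition, every replica can reach every other one.
        self.groups = [set(names)]

    def partition(self, *groups):
        """Simulate a network partition into the given groups of names."""
        self.groups = [set(g) for g in groups]

    def reachable_from(self, name):
        for group in self.groups:
            if name in group:
                return group
        raise KeyError(name)

    def write_available(self, via, key, value):
        """Availability first: always accept, update reachable replicas only."""
        for n in self.reachable_from(via):
            self.replicas[n].data[key] = value
        return True

    def write_consistent(self, via, key, value):
        """Consistency first: refuse unless a strict majority is reachable."""
        group = self.reachable_from(via)
        if len(group) <= len(self.replicas) // 2:
            return False
        for n in group:
            self.replicas[n].data[key] = value
        return True

    def read(self, via, key):
        """Read the local copy held by the replica named `via`."""
        return self.replicas[via].data.get(key)


cluster = Cluster(["a", "b", "c"])
cluster.partition({"a", "b"}, {"c"})

# Both sides accept writes, so the replicas now disagree about "x".
cluster.write_available("a", "x", 1)
cluster.write_available("c", "x", 2)

# Only the majority side may write "y"; the minority side is unavailable.
minority_ok = cluster.write_consistent("c", "y", 9)
majority_ok = cluster.write_consistent("a", "y", 9)
```

In this sketch the availability-first path leaves replicas `a` and `c` with different values for `x`, while the consistency-first path rejects the write on the isolated replica. A real consistency-first system would also have to refuse or redirect reads on the minority side, which this sketch omits.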

Peer network node data stores

In peer network data stores, the user can usually reciprocate and allow other users to use their computer as a storage node as well. Information may or may not be accessible to other users depending on the design of the network.

In networks such as Freenet, Winny, Share and Perfect Dark, any node may be storing any part of the files on the network.
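To make concrete the idea that any node may hold any part of a file, the sketch below splits a file into chunks and spreads copies of each chunk across nodes. It is an illustrative assumption only and does not describe how Freenet, Winny, Share or Perfect Dark actually place data: the functions `split_chunks`, `place`, `store_file` and `fetch_file`, the chunk size, and the hash-ranking placement rule are all hypothetical.

```python
import hashlib


def split_chunks(data: bytes, size: int):
    """Cut the file into fixed-size pieces (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def chunk_id(chunk: bytes) -> str:
    """Identify a chunk by the SHA-256 digest of its content."""
    return hashlib.sha256(chunk).hexdigest()


def place(key: str, nodes, copies: int = 2):
    """Pick `copies` distinct nodes by ranking hashes of node name and key."""
    ranked = sorted(
        nodes,
        key=lambda n: hashlib.sha256(f"{n}:{key}".encode()).hexdigest(),
    )
    return ranked[:copies]


def store_file(data: bytes, nodes, size: int = 4, copies: int = 2):
    """Return a manifest of chunk ids and the per-node chunk storage."""
    storage = {n: {} for n in nodes}
    manifest = []
    for chunk in split_chunks(data, size):
        key = chunk_id(chunk)
        manifest.append(key)
        for n in place(key, nodes, copies):
            storage[n][key] = chunk
    return manifest, storage


def fetch_file(manifest, storage) -> bytes:
    """Reassemble the file by asking each remaining node for every chunk."""
    parts = []
    for key in manifest:
        for node_store in storage.values():
            if key in node_store:
                parts.append(node_store[key])
                break
        else:
            raise KeyError(f"chunk {key} is not available on any node")
    return b"".join(parts)


original = b"hello distributed world"
manifest, storage = store_file(original, ["n1", "n2", "n3"])
del storage["n1"]  # one node leaves the network
restored = fetch_file(manifest, storage)
```

Because each chunk is kept on two distinct nodes in this sketch, the file can still be reassembled after any single node goes offline.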

Distributed data stores typically use an error detection and correction technique. Some distributed data stores use forward error correction techniques to recover the original file when parts of that file are damaged or unavailable. Others try again to download that file from a different mirror.
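A very simple form of forward error correction is a single XOR parity block, which allows any one missing block to be rebuilt from the others. The sketch below is a minimal illustration under that assumption; production systems generally use stronger codes that tolerate several losses, and the function names `encode` and `recover` are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(blocks):
    """Return the data blocks followed by one parity block."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("need at least one block, all of equal length")
    parity = reduce(xor_bytes, blocks)
    return list(blocks) + [parity]


def recover(stored):
    """Rebuild the data blocks from `stored`, where one entry may be None."""
    missing = [i for i, b in enumerate(stored) if b is None]
    if len(missing) > 1:
        raise ValueError("a single parity block can repair only one loss")
    stored = list(stored)
    if missing:
        # XOR of all surviving blocks (data and parity) equals the lost one.
        present = [b for b in stored if b is not None]
        stored[missing[0]] = reduce(xor_bytes, present)
    return stored[:-1]  # drop the parity block


data = [b"abcd", b"efgh", b"ijkl"]
stored = encode(data)
stored[1] = None  # simulate a damaged or unavailable block
repaired = recover(stored)
```

Here `repaired` equals the original `data` list even though the second block was lost, without contacting another mirror. If two blocks were lost, this scheme would fail and a retry from a different source would be needed.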

Examples

Distributed non-relational databases

Product	License	High availability	Notes
Apache Accumulo	AL2
Aerospike	AGPL
Apache Cassandra	AL2	Yes	formerly used by Facebook
Apache Ignite	AL2
Bigtable	Proprietary		used by Google
Couchbase	AL2		used by LinkedIn, PayPal, and eBay
CrateDB	AL2	Yes
Apache Druid	AL2		used by Yahoo
Dynamo	Proprietary		used by Amazon
etcd	AL2	Yes
Hazelcast	AL2, Proprietary
HBase	AL2	Yes	formerly used by Facebook
Hypertable	GPL 2		used by Baidu
MongoDB	SSPL
MySQL NDB Cluster	GPL 2	Yes	SQL and NoSQL APIs
Riak	AL2	Yes
Redis	BSD License	Yes
ScyllaDB	AGPL
Voldemort	AL2		used by LinkedIn

Peer network node data stores

BitTorrent
Blockchain (database)
Chord project
Freenet
GNUnet
IPFS
Mnet
Napster
NNTP (the distributed data storage protocol used for Usenet news)
Unity, of the software Perfect Dark
Share
Siacoin
DeNet
Storage@home
Tahoe-LAFS
Winny
ZeroNet

References

[1] OL 25423189M

[2] "Distributed Data Storage - an overview | ScienceDirect Topics".

[3] "Bigtable: Google's Distributed Data Store". Paper Trail. Archived from the original on 2017-07-16. Retrieved 2011-04-05. Although GFS provides Google with reliable, scalable distributed file storage, it does not provide any facility for structuring the data contained in the files beyond a hierarchical directory structure and meaningful file names. It's well known that more expressive solutions are required for large data sets. Google's terabytes upon terabytes of data that they retrieve from web crawlers, amongst many other sources, need organising, so that client applications can quickly perform lookups and updates at a finer granularity than the file level. [...] The very first thing you need to know about Bigtable is that it isn't a relational database. This should come as no surprise: one persistent theme through all of these large scale distributed data store papers is that RDBMSs are hard to do with good performance. There is no hard, fixed schema in a Bigtable, no referential integrity between tables (so no foreign keys) and therefore little support for optimised joins.

[4] Sarah Pidcock (2011-01-31). "Dynamo: Amazon's Highly Available Key-value Store" (PDF). WATERLOO – CHERITON SCHOOL OF COMPUTER SCIENCE. p. 2/22. Retrieved 2011-04-05. Dynamo: a highly available and scalable distributed data store

[5] "Windows Azure Storage". Microsoft. 2011-09-16. Archived from the original on 9 November 2011. Retrieved 6 November 2011.
