Durability (database systems)

Source: Wikipedia, the free encyclopedia.

In database systems, durability is the ACID property that guarantees that the effects of transactions that have been committed will survive permanently, even in case of failures,[1] including incidents and catastrophic events. For example, if a flight booking reports that a seat has successfully been booked, then the seat will remain booked even if the system crashes.[2]

Formally, a database system ensures the durability property if it tolerates three types of failures: transaction, system, and media failures.[1] In particular, a transaction fails if its execution is interrupted before all its operations have been processed by the system.[3] Interruptions of this kind can originate at the transaction level from data-entry errors, operator cancellation, timeouts, or application-specific errors, such as withdrawing money from a bank account with insufficient funds.[1] At the system level, a failure occurs if the contents of volatile storage are lost, for instance because of a system crash such as an out-of-memory event.[3] At the media level, where media means stable storage that withstands system failures, a failure happens when the stable storage, or part of it, is lost.[3] These cases are typically represented by disk failures.[1]

Thus, to be durable, the database system should implement strategies and operations that guarantee that the effects of transactions committed before the failure will survive the event (even by reconstruction), while the changes of incomplete transactions, which had not yet been committed at the time of failure, will be reverted and will not affect the state of the database system. These behaviours are proven to be correct when the execution of transactions has, respectively, the resilience and recoverability properties.[3]

Mechanisms

A simplified finite-state automaton showing the possible states of a DBMS after a failure (in red) and the transitions (in black) that are necessary to return to a running system and achieve durability.

In transaction-based systems, the mechanisms that assure durability are historically associated with the concept of reliability of systems, as proposed by Jim Gray in 1981.[1] This concept includes durability, but it also relies on aspects of the atomicity and consistency properties.[4] Specifically, a reliability mechanism requires primitives that explicitly state the beginning, the end, and the rollback of transactions,[1] which are also implied by the other two aforementioned properties. In this article, only the mechanisms strictly related to durability are considered. These mechanisms are divided into three levels: transaction, system, and media level. The same levels describe the scenarios in which failures can happen and that have to be considered in the design of database systems to address durability.[3]

Transaction level

Durability against failures that occur at transaction level, such as canceled calls and inconsistent actions that may be blocked before committing by

locking.[1]

System level

At system level, failures happen, by definition,

non-volatile memories (NVM) technologies grows.[7]

In systems that include non-volatile storage, durability can be achieved by keeping an immutable sequential log of the transactions and flushing it to such non-volatile storage before acknowledging commitment. Thanks to their atomicity property, transactions can be treated as the unit of work in the recovery process that guarantees durability by exploiting the log. In particular, the logging mechanism is called write-ahead log (WAL): changes are recorded in the log on disk before the corresponding data are synchronized from main memory. In this way, by reconstruction from the log file, all committed transactions are resilient to system-level failures, because they can be redone. Non-committed transactions, instead, are recoverable, since their operations are logged to non-volatile storage before they effectively modify the state of the database.[8] Their partially executed operations can therefore be undone without affecting the state of the system, after which the incomplete transactions can be executed again. The transaction log in non-volatile storage can thus be reprocessed to recreate the system state as it was right before a system-level failure. It is worth mentioning that, for performance reasons, logging is done as a combination of tracking data and operations (i.e. transactions).[9]
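
The logging rule described above can be illustrated with a minimal Python sketch. It is not how any particular database implements its log: the class TinyWAL, the function recover, the JSON record layout, and the file name are all assumptions made for illustration. The sketch only shows the ordering that matters, namely that the commit record is forced to non-volatile storage with os.fsync before the commit would be acknowledged, and that recovery redoes committed transactions while ignoring the others.

```python
import json
import os
import tempfile


class TinyWAL:
    """Toy write-ahead log (hypothetical): one JSON record per line."""

    def __init__(self, path):
        self.path = path
        self.log = open(path, "a", encoding="utf-8")

    def _append(self, record):
        self.log.write(json.dumps(record) + "\n")

    def write(self, txid, key, value):
        # Log the intended change first; the database state is not touched here.
        self._append({"tx": txid, "op": "set", "key": key, "value": value})

    def commit(self, txid):
        self._append({"tx": txid, "op": "commit"})
        self.log.flush()
        os.fsync(self.log.fileno())  # force the log to non-volatile storage
        # Only after fsync returns may the commit be acknowledged.

    def close(self):
        self.log.close()


def recover(path):
    """Rebuild the state from the log: redo committed work, skip the rest."""
    with open(path, encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]
    committed = {r["tx"] for r in records if r["op"] == "commit"}
    state = {}
    for r in records:
        if r["op"] == "set" and r["tx"] in committed:
            state[r["key"]] = r["value"]
    return state


def demo():
    path = os.path.join(tempfile.mkdtemp(), "wal.log")
    wal = TinyWAL(path)
    wal.write(1, "seat_12A", "booked")
    wal.commit(1)
    wal.write(2, "seat_14C", "booked")  # transaction 2 never commits
    wal.close()  # stands in for a crash: no commit record for transaction 2
    return recover(path)


if __name__ == "__main__":
    print(demo())
```

In this toy version an uncommitted transaction never modifies the state, so recovery only needs a redo pass. A system that applies changes in place before commit would also need an undo pass, and a real log would have to cope with a partially written final record.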

Media level

At media level, failure scenarios affect non-volatile storage, like hard disk drives, solid-state drives, and other types of storage hardware components.[8] To guarantee durability at this level, the database system must rely on stable memory, that is, memory that is ideally completely failure-resistant. This kind of memory can be achieved with mechanisms of replication and robust writing protocols.[4]
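
One way to picture replication combined with a robust writing protocol is the following Python sketch. It is only an assumption-laden illustration, not a description of a specific product: each replica is modelled as a plain file, a SHA-256 checksum is stored in front of the payload, and the helper names stable_write and stable_read are hypothetical. Every copy is forced to storage with os.fsync, and a read returns the first copy whose checksum still verifies.

```python
import hashlib
import os
import tempfile


def _frame(payload: bytes) -> bytes:
    # Prefix the payload with its SHA-256 digest (32 bytes).
    return hashlib.sha256(payload).digest() + payload


def stable_write(paths, payload: bytes):
    """Write the same framed payload to every replica, syncing each one."""
    for path in paths:
        with open(path, "wb") as f:
            f.write(_frame(payload))
            f.flush()
            os.fsync(f.fileno())


def stable_read(paths):
    """Return the payload from the first replica that passes its checksum."""
    for path in paths:
        try:
            with open(path, "rb") as f:
                blob = f.read()
        except OSError:
            continue  # replica lost: try the next one
        digest, payload = blob[:32], blob[32:]
        if len(digest) == 32 and hashlib.sha256(payload).digest() == digest:
            return payload
    raise IOError("all replicas lost or corrupted")


def demo():
    d = tempfile.mkdtemp()
    paths = [os.path.join(d, "replica_a"), os.path.join(d, "replica_b")]
    stable_write(paths, b"committed log segment")
    os.remove(paths[0])  # simulate the loss of one medium
    return stable_read(paths)


if __name__ == "__main__":
    print(demo())
```

The data survives as long as at least one replica is intact, which is why the replicas would normally be placed on independent devices.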

Many tools and technologies are available to provide a logical stable memory, such as the

disaster recovery.[11]

Therefore, in case of media failure, the durability of transactions is guaranteed by the ability to reconstruct the state of the database from the log files stored in the stable memory, however that memory is implemented in the database system.

Distributed databases

In distributed transactions, ensuring durability requires additional mechanisms to preserve a consistent state sequence across all database nodes. This means, for example, that a single node may not be able to decide on its own to conclude a transaction by committing it, because the resources used in that transaction may reside on other nodes, where other transactions are occurring concurrently. If consistency could not be guaranteed, then after a failure it would be impossible to identify a safe state of the database from which to recover. For this reason, all participating nodes must coordinate before a commit can be acknowledged. This is usually done by a two-phase commit protocol.[13]
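
The coordination step can be sketched in Python as a simplified, in-memory two-phase commit. The Participant class, its can_commit flag, and the two_phase_commit function are hypothetical names chosen for this illustration. The sketch leaves out everything a real protocol needs for durability, such as logging votes and decisions to stable storage, timeouts, and coordinator recovery.

```python
class Participant:
    """Hypothetical node holding part of the data touched by a transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # stands in for local checks and locks
        self.state = "idle"

    def prepare(self, txid):
        # Phase 1: vote. A real node would durably log its vote before replying.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self, txid):
        self.state = "committed"

    def abort(self, txid):
        self.state = "aborted"


def two_phase_commit(txid, participants):
    """Commit only if every participant votes yes; otherwise abort everywhere."""
    # Phase 1 (voting): collect a vote from every participant.
    votes = [p.prepare(txid) for p in participants]
    decision = all(votes)
    # Phase 2 (completion): broadcast the single global decision.
    for p in participants:
        if decision:
            p.commit(txid)
        else:
            p.abort(txid)
    return decision


if __name__ == "__main__":
    nodes = [Participant("node_a"), Participant("node_b", can_commit=False)]
    print(two_phase_commit(42, nodes), [n.state for n in nodes])
```

A single negative vote is enough to abort the transaction on every node, so the commit is acknowledged only when all participants have agreed to it.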

In addition, in distributed databases, the protocols for logging and recovery must also address the issues of distributed environments, such as deadlocks, that could prevent the resilience and recoverability of transactions and, thus, durability.[13] A widely adopted family of algorithms that ensures these properties is Algorithms for Recovery and Isolation Exploiting Semantics (ARIES).[8]

References

  1. ^ a b c d e f g h Gray, Jim (1981). "The transaction concept: Virtues and limitations" (PDF). VLDB. 81: 144–154.
  2. ^ "ACID Compliance: What It Means and Why You Should Care". MariaDB. 29 July 2018. Retrieved 22 September 2021.
  3. ^ S2CID 7052304.
  4. ^ .
  5. ^ Svobodova, L. (1980). "Management of Object Histories in the Swallow Repository". MIT/LCS TR-243. USA.
  6. .
  7. .
  8. ^ .
  9. .
  10. .
  11. .
  12. .
  13. ^ .
