Berkeley DB Reference Guide:
Transaction Subsystem


Berkeley DB and transactions

The Transaction subsystem makes operations atomic, consistent, isolated, and durable in the face of system and application failures. The subsystem requires that the data be properly logged and locked in order to attain these properties. Berkeley DB contains all the components necessary to transaction-protect the Berkeley DB access methods, and other forms of data may be protected if they are logged and locked appropriately.

The Transaction subsystem is created, initialized, and opened by calls to DB_ENV->open with the DB_INIT_TXN flag specified. Note that enabling transactions automatically enables logging, but does not enable locking because a single thread of control that needed atomicity and recoverability would not require it.

The txn_begin function starts a transaction, returning an opaque handle to a transaction. If the parent parameter to txn_begin is non-NULL, the new transaction is a child of the designated parent transaction.

The txn_abort function ends the designated transaction and causes all updates performed by the transaction to be undone. The end result is that the database is left in a state identical to the state that existed prior to the txn_begin. If the aborting transaction has any child transactions associated with it (even ones that have already been committed), they are also aborted. Any transactions that are unresolved (neither committed nor aborted) when the application or system fails are aborted during recovery.

The txn_commit function ends the designated transaction and makes all the updates performed by the transaction permanent, even in the face of application or system failure. If this is a parent transaction committing, all child transactions that individually committed or had not been resolved are also committed.

Transactions are identified by 32-bit unsigned integers. The ID associated with any transaction can be obtained using the txn_id function. If an application is maintaining information outside of Berkeley DB that it wishes to transaction-protect, it should use this transaction ID as the locking ID.

The txn_checkpoint function causes a transaction checkpoint. A checkpoint is performed using to a specific log sequence number (LSN), referred to as the checkpoint LSN. When a checkpoint completes successfully, it means that all data buffers whose updates are described by LSNs less than the checkpoint LSN have been written to disk. This, in turn, means that the log records less than the checkpoint LSN are no longer necessary for normal recovery (although they would be required for catastrophic recovery if the database files were lost), and all log files containing only records prior to the checkpoint LSN may be safely archived and removed.

It is possible that in order to complete a transaction checkpoint, it will be necessary to write a buffer that is currently in use (that is, it is actively being read or written by some transaction). In this case, txn_checkpoint will not be able to write the buffer because doing so might cause an inconsistent version of the page to be written to disk, and will return with an error code of DB_INCOMPLETE instead of completing successfully. In such cases, the checkpoint can simply be retried after a short delay.

The time required to run normal recovery is proportional to the amount of work done between checkpoints. If a large number of modifications happen between checkpoints, many updates recorded in the log may not have been written to disk when failure occurred, and recovery may take longer to run. Generally, if the interval between checkpoints is short, data may be being written to disk more frequently, but the recovery time will be shorter. Often, the checkpoint interval is tuned for each specific application.

The txn_stat function returns information about the status of the transaction subsystem. It is the programmatic interface used by the db_stat utility.

The transaction system is closed by a call to DB_ENV->close.

Finally, the entire transaction system may be removed using the DB_ENV->remove interface.


Copyright Sleepycat Software