Berkeley DB Reference Guide:
Transaction Subsystem


Transactions and non-Berkeley DB applications

It is possible to use the Locking, Logging and Transaction subsystems of Berkeley DB to provide transaction semantics on objects other than those described by the Berkeley DB access methods. In these cases, the application must customize the subsystems more explicitly and must supply recovery functions specific to its own data structures.

For example, consider an application that provides transaction semantics on data stored in plain UNIX files accessed using the POSIX read and write system calls. The operations for which transaction protection is desired are bracketed by calls to txn_begin and txn_commit.

Before data is accessed, the application must call the lock manager, lock_get, to obtain a lock of the appropriate type (for example, read) on the object being accessed. The object might be a page in the file, a byte, a range of bytes, or some key. It is up to the application to ensure that appropriate locks are acquired. Before a write is performed, the application should acquire a write lock on the object, again by calling lock_get. The application should then call the log manager, log_put, to record enough information to redo the operation if a failure occurs after commit and to undo the operation if the transaction aborts.

When designing applications that will use the log subsystem, it is important to remember that the application is responsible for providing any necessary structure to the log record. For example, the application must understand what part of the log record is an operation code, what part identifies the file being modified, what part is redo information, and what part is undo information.
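One way to impose such a structure is sketched below. The layout, the OP_WRITE code, and the names rec_encode, rec_decode and struct rec_view are all hypothetical; they are not part of Berkeley DB, which treats the record as opaque bytes. The sketch assumes a fixed 20-byte big-endian header followed by an undo image and a redo image of equal length.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: 20-byte big-endian header, then the undo image,
 * then the redo image. The log manager sees only opaque bytes. */
enum { OP_WRITE = 1 };          /* hypothetical operation code */
#define REC_HDR 20u

struct rec_view {
    uint32_t opcode;            /* which operation was logged */
    uint32_t file_id;           /* application-assigned file identifier */
    uint64_t offset;            /* byte offset of the update */
    uint32_t len;               /* length of each image */
    const unsigned char *undo;  /* before image, len bytes */
    const unsigned char *redo;  /* after image, len bytes */
};

/* Store the low nbytes of v at p, most significant byte first. */
static void put_be(unsigned char *p, uint64_t v, int nbytes)
{
    int i;
    for (i = nbytes - 1; i >= 0; i--) {
        p[i] = (unsigned char)(v & 0xff);
        v >>= 8;
    }
}

/* Read nbytes at p as a big-endian unsigned integer. */
static uint64_t get_be(const unsigned char *p, int nbytes)
{
    uint64_t v = 0;
    int i;
    for (i = 0; i < nbytes; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Build a record in buf. Returns its size, or 0 if buf is too small. */
size_t rec_encode(unsigned char *buf, size_t cap, uint32_t opcode,
                  uint32_t file_id, uint64_t offset,
                  const void *undo, const void *redo, uint32_t len)
{
    size_t need = REC_HDR + 2 * (size_t)len;

    assert(buf != NULL && undo != NULL && redo != NULL);
    if (need > cap)
        return 0;
    put_be(buf, opcode, 4);
    put_be(buf + 4, file_id, 4);
    put_be(buf + 8, offset, 8);
    put_be(buf + 16, len, 4);
    memcpy(buf + REC_HDR, undo, len);
    memcpy(buf + REC_HDR + len, redo, len);
    return need;
}

/* Split a record into its parts. Returns 0, or -1 if it is malformed. */
int rec_decode(const unsigned char *buf, size_t size, struct rec_view *out)
{
    if (size < REC_HDR)
        return -1;
    out->opcode = (uint32_t)get_be(buf, 4);
    out->file_id = (uint32_t)get_be(buf + 4, 4);
    out->offset = get_be(buf + 8, 8);
    out->len = (uint32_t)get_be(buf + 16, 4);
    if ((uint64_t)(size - REC_HDR) != 2 * (uint64_t)out->len)
        return -1;
    out->undo = buf + REC_HDR;
    out->redo = buf + REC_HDR + out->len;
    return 0;
}
```

The encoded buffer is what the application would hand to the log manager, and rec_decode is what its recovery function would use to interpret the record later. A fixed byte order is used so that the log does not depend on compiler padding or host endianness.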

After the log record is written, the application may issue the write system call. After all of the transaction's writes have been issued, the application may call txn_commit. When txn_commit returns, the caller is guaranteed that all necessary log records have been written to disk.
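The ordering described above (log first, then write) can be sketched as follows. This is an illustration under stated assumptions, not Berkeley DB code: struct wal and wal_put are a hypothetical in-memory stand-in for the log manager, logged_pwrite and demo are invented names, and the lock acquisition is indicated only by a comment. The POSIX calls pread and pwrite are used so that the update addresses an explicit offset.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_IMG  64   /* largest before/after image this sketch logs */
#define MAX_RECS 16   /* capacity of the stand-in log */

struct wal_rec {
    off_t offset;
    size_t len;
    unsigned char undo[MAX_IMG];  /* before image */
    unsigned char redo[MAX_IMG];  /* after image */
};

/* Hypothetical in-memory stand-in for the log manager. */
struct wal {
    struct wal_rec recs[MAX_RECS];
    size_t n;
};

/* Stand-in for a log_put call: append one record. */
static int wal_put(struct wal *log, const struct wal_rec *r)
{
    if (log->n == MAX_RECS)
        return -1;
    log->recs[log->n++] = *r;
    return 0;
}

/* Log the update, then apply it. Returns 0 on success, -1 on failure. */
int logged_pwrite(int fd, struct wal *log, off_t offset,
                  const void *data, size_t len)
{
    struct wal_rec r;

    assert(log != NULL && data != NULL);
    if (len > MAX_IMG)
        return -1;
    /* A write lock on the byte range would be acquired here. */
    memset(&r, 0, sizeof r);
    r.offset = offset;
    r.len = len;
    /* Capture the before image; bytes past end of file stay zero,
     * a simplification that ignores file extension on undo. */
    if (pread(fd, r.undo, len, offset) < 0)
        return -1;
    memcpy(r.redo, data, len);
    if (wal_put(log, &r) != 0)          /* 1. write the log record */
        return -1;
    return pwrite(fd, data, len, offset) == (ssize_t)len ? 0 : -1; /* 2. */
}

/* Usage: overwrite "hello" with "HELLO" in a scratch file and check
 * that the log holds both images. Returns 0 if everything matches. */
int demo(void)
{
    char path[] = "/tmp/walXXXXXX";
    struct wal log = { .n = 0 };
    char back[5];
    int fd = mkstemp(path), ok;

    if (fd < 0)
        return -1;
    unlink(path);
    ok = pwrite(fd, "hello", 5, 0) == 5
        && logged_pwrite(fd, &log, 0, "HELLO", 5) == 0
        && pread(fd, back, 5, 0) == 5
        && memcmp(back, "HELLO", 5) == 0
        && log.n == 1
        && memcmp(log.recs[0].undo, "hello", 5) == 0
        && memcmp(log.recs[0].redo, "HELLO", 5) == 0;
    close(fd);
    return ok ? 0 : -1;
}
```

In a real application the log record would go to the log manager rather than to an array, and the calls would sit between txn_begin and txn_commit. If logging fails, the data file is left untouched, which is the property the ordering is meant to preserve.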

At any time, the application may call txn_abort, which will result in restoration of the database to a consistent pre-transaction state. (The application may specify its own recovery function for this purpose using the DB_ENV->set_tx_recover function. For each type of log record, the recovery function must be able to either reapply or undo the update, depending on the context.)
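The shape of such a recovery function can be sketched as a dispatch on record type and direction. Everything here is hypothetical: the record types OP_OVERWRITE and OP_FILL, the enum rec_dir that stands for "the context", and the names recover_one and rollback. The sketch operates on an in-memory buffer standing in for the file contents, and it does not reproduce the callback signature that Berkeley DB expects.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum rec_dir { REC_REDO, REC_UNDO };             /* hypothetical context */
enum rec_op  { OP_OVERWRITE = 1, OP_FILL = 2 };  /* hypothetical types */

struct log_rec {
    enum rec_op op;
    size_t offset, len;
    unsigned char undo[16];  /* before image, len bytes */
    unsigned char redo[16];  /* OP_OVERWRITE: after image;
                              * OP_FILL: redo[0] is the fill byte */
};

/* Reapply or undo one record against the buffer. Returns 0 or -1. */
int recover_one(unsigned char *file, size_t file_len,
                const struct log_rec *r, enum rec_dir dir)
{
    unsigned char *dst;

    assert(file != NULL && r != NULL);
    if (r->len > sizeof r->undo || r->offset > file_len ||
        r->len > file_len - r->offset)
        return -1;                   /* record does not fit the file */
    dst = file + r->offset;

    switch (r->op) {
    case OP_OVERWRITE:
        memcpy(dst, dir == REC_REDO ? r->redo : r->undo, r->len);
        return 0;
    case OP_FILL:
        if (dir == REC_REDO)
            memset(dst, r->redo[0], r->len);
        else
            memcpy(dst, r->undo, r->len);
        return 0;
    }
    return -1;                       /* unknown record type */
}

/* Abort: undo the transaction's records, newest first. */
int rollback(unsigned char *file, size_t file_len,
             const struct log_rec *recs, size_t n)
{
    while (n-- > 0)
        if (recover_one(file, file_len, &recs[n], REC_UNDO) != 0)
            return -1;
    return 0;
}
```

Undo is applied in reverse order because a later record's before image may include bytes written by an earlier record in the same transaction.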

If the application crashes, the recovery process uses the log to restore the database to a consistent state.

The txn_prepare function provides the core functionality to implement distributed transactions, but it does not manage the notification of distributed transaction managers. The caller is responsible for issuing txn_prepare calls to all sites participating in the transaction. If all responses are positive, the caller can issue a txn_commit. If any response is negative, the caller should issue a txn_abort. In general, the txn_prepare call requires that the transaction log be flushed to disk.
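The caller's side of this protocol can be sketched as a small coordinator loop. The struct site type and the site_prepare, site_commit and site_abort functions are hypothetical stand-ins for whatever mechanism the application uses to reach each participant and invoke txn_prepare, txn_commit or txn_abort there; the can_prepare field exists only to simulate a negative response.

```c
#include <assert.h>
#include <stddef.h>

enum site_state { SITE_ACTIVE, SITE_PREPARED, SITE_COMMITTED, SITE_ABORTED };

/* Hypothetical participant. A real one would wrap a remote request
 * that ends in a prepare, commit or abort call at that site. */
struct site {
    enum site_state state;
    int can_prepare;        /* simulation knob: 0 makes the site vote no */
};

/* Returns 0 for a positive response, -1 for a negative one. */
static int site_prepare(struct site *s)
{
    if (!s->can_prepare)
        return -1;
    s->state = SITE_PREPARED;
    return 0;
}

static void site_commit(struct site *s)
{
    assert(s->state == SITE_PREPARED);  /* never commit an unprepared site */
    s->state = SITE_COMMITTED;
}

static void site_abort(struct site *s)
{
    s->state = SITE_ABORTED;
}

/* Returns 1 if the transaction committed at every site, 0 if it was
 * aborted at every site. */
int two_phase_commit(struct site *sites, size_t n)
{
    size_t i, prepared = 0;

    /* Phase 1: ask every site to prepare; stop at the first refusal. */
    while (prepared < n && site_prepare(&sites[prepared]) == 0)
        prepared++;

    /* Phase 2: commit only if every response was positive. */
    if (prepared == n) {
        for (i = 0; i < n; i++)
            site_commit(&sites[i]);
        return 1;
    }
    /* Otherwise abort everywhere, including sites never asked. */
    for (i = 0; i < n; i++)
        site_abort(&sites[i]);
    return 0;
}
```

The sketch ignores coordinator failure and lost messages, which a production distributed transaction manager would also have to handle.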


Copyright Sleepycat Software