+** 2004 April 6
+**
+** The author disclaims copyright to this source code. In place of
+** a legal notice, here is a blessing:
+**
+** May you do good and not evil.
+** May you find forgiveness for yourself and forgive others.
+** May you share freely, never taking more than you give.
+**
+*************************************************************************
+** $Id: btreeInt.h,v 1.42 2009/02/03 16:51:25 danielk1977 Exp $
+**
+** This file implements a external (disk-based) database using BTrees.
+** For a detailed discussion of BTrees, refer to
+**
+** Donald E. Knuth, THE ART OF COMPUTER PROGRAMMING, Volume 3:
+** "Sorting And Searching", pages 473-480. Addison-Wesley
+** Publishing Company, Reading, Massachusetts.
+**
+** The basic idea is that each page of the file contains N database
+** entries and N+1 pointers to subpages.
+**
+** ----------------------------------------------------------------
+** | Ptr(0) | Key(0) | Ptr(1) | Key(1) | ... | Key(N-1) | Ptr(N) |
+** ----------------------------------------------------------------
+**
+** All of the keys on the page that Ptr(0) points to have values less
+** than Key(0). All of the keys on page Ptr(1) and its subpages have
+** values greater than Key(0) and less than Key(1). All of the keys
+** on Ptr(N) and its subpages have values greater than Key(N-1). And
+** so forth.
+**
+** Finding a particular key requires reading O(log(M)) pages from the
+** disk where M is the number of entries in the tree.
+**
+** In this implementation, a single file can hold one or more separate
+** BTrees. Each BTree is identified by the index of its root page. The
+** key and data for any entry are combined to form the "payload". A
+** fixed amount of payload can be carried directly on the database
+** page. If the payload is larger than the preset amount then surplus
+** bytes are stored on overflow pages. The payload for an entry
+** and the preceding pointer are combined to form a "Cell". Each
+** page has a small header which contains the Ptr(N) pointer and other
+** information such as the size of key and data.
+**
+** FORMAT DETAILS
+**
+** The file is divided into pages. The first page is called page 1,
+** the second is page 2, and so forth. A page number of zero indicates
+** "no such page". The page size can be anything between 512 and 65536.
+** Each page can be either a btree page, a freelist page or an overflow
+** page.
+**
+** The first page is always a btree page. The first 100 bytes of the first
+** page contain a special header (the "file header") that describes the file.
+** The format of the file header is as follows:
+**
+** OFFSET SIZE DESCRIPTION
+** 0 16 Header string: "SQLite format 3\000"
+** 16 2 Page size in bytes.
+** 18 1 File format write version
+** 19 1 File format read version
+** 20 1 Bytes of unused space at the end of each page
+** 21 1 Max embedded payload fraction
+** 22 1 Min embedded payload fraction
+** 23 1 Min leaf payload fraction
+** 24 4 File change counter
+** 28 4 Reserved for future use
+** 32 4 First freelist page
+** 36 4 Number of freelist pages in the file
+** 40 60 15 4-byte meta values passed to higher layers
+**
+** All of the integer values are big-endian (most significant byte first).
+**
+** The file change counter is incremented when the database is changed
+** This counter allows other processes to know when the file has changed
+** and thus when they need to flush their cache.
+**
+** The max embedded payload fraction is the amount of the total usable
+** space in a page that can be consumed by a single cell for standard
+** B-tree (non-LEAFDATA) tables. A value of 255 means 100%. The default
+** is to limit the maximum cell size so that at least 4 cells will fit
+** on one page. Thus the default max embedded payload fraction is 64.
+**
+** If the payload for a cell is larger than the max payload, then extra
+** payload is spilled to overflow pages. Once an overflow page is allocated,
+** as many bytes as possible are moved into the overflow pages without letting
+** the cell size drop below the min embedded payload fraction.
+**
+** The min leaf payload fraction is like the min embedded payload fraction
+** except that it applies to leaf nodes in a LEAFDATA tree. The maximum
+** payload fraction for a LEAFDATA tree is always 100% (or 255) and it
+** not specified in the header.
+**
+** Each btree pages is divided into three sections: The header, the
+** cell pointer array, and the cell content area. Page 1 also has a 100-byte
+** file header that occurs before the page header.
+**
+** |----------------|
+** | file header | 100 bytes. Page 1 only.
+** |----------------|
+** | page header | 8 bytes for leaves. 12 bytes for interior nodes
+** |----------------|
+** | cell pointer | | 2 bytes per cell. Sorted order.
+** | array | | Grows downward
+** | | v
+** |----------------|
+** | unallocated |
+** | space |
+** |----------------| ^ Grows upwards
+** | cell content | | Arbitrary order interspersed with freeblocks.
+** | area | | and free space fragments.
+** |----------------|
+**
+** The page headers looks like this:
+**
+** OFFSET SIZE DESCRIPTION
+** 0 1 Flags. 1: intkey, 2: zerodata, 4: leafdata, 8: leaf
+** 1 2 byte offset to the first freeblock
+** 3 2 number of cells on this page
+** 5 2 first byte of the cell content area
+** 7 1 number of fragmented free bytes
+** 8 4 Right child (the Ptr(N) value). Omitted on leaves.
+**
+** The flags define the format of this btree page. The leaf flag means that
+** this page has no children. The zerodata flag means that this page carries
+** only keys and no data. The intkey flag means that the key is a integer
+** which is stored in the key size entry of the cell header rather than in
+** the payload area.
+**
+** The cell pointer array begins on the first byte after the page header.
+** The cell pointer array contains zero or more 2-byte numbers which are
+** offsets from the beginning of the page to the cell content in the cell
+** content area. The cell pointers occur in sorted order. The system strives
+** to keep free space after the last cell pointer so that new cells can
+** be easily added without having to defragment the page.
+**
+** Cell content is stored at the very end of the page and grows toward the
+** beginning of the page.
+**
+** Unused space within the cell content area is collected into a linked list of
+** freeblocks. Each freeblock is at least 4 bytes in size. The byte offset
+** to the first freeblock is given in the header. Freeblocks occur in
+** increasing order. Because a freeblock must be at least 4 bytes in size,
+** any group of 3 or fewer unused bytes in the cell content area cannot
+** exist on the freeblock chain. A group of 3 or fewer free bytes is called
+** a fragment. The total number of bytes in all fragments is recorded.
+** in the page header at offset 7.
+**
+** SIZE DESCRIPTION
+** 2 Byte offset of the next freeblock
+** 2 Bytes in this freeblock
+**
+** Cells are of variable length. Cells are stored in the cell content area at
+** the end of the page. Pointers to the cells are in the cell pointer array
+** that immediately follows the page header. Cells is not necessarily
+** contiguous or in order, but cell pointers are contiguous and in order.
+**
+** Cell content makes use of variable length integers. A variable
+** length integer is 1 to 9 bytes where the lower 7 bits of each
+** byte are used. The integer consists of all bytes that have bit 8 set and
+** the first byte with bit 8 clear. The most significant byte of the integer
+** appears first. A variable-length integer may not be more than 9 bytes long.
+** As a special case, all 8 bytes of the 9th byte are used as data. This
+** allows a 64-bit integer to be encoded in 9 bytes.
+**
+** 0x00 becomes 0x00000000
+** 0x7f becomes 0x0000007f
+** 0x81 0x00 becomes 0x00000080
+** 0x82 0x00 becomes 0x00000100
+** 0x80 0x7f becomes 0x0000007f
+** 0x8a 0x91 0xd1 0xac 0x78 becomes 0x12345678
+** 0x81 0x81 0x81 0x81 0x01 becomes 0x10204081
+**
+** Variable length integers are used for rowids and to hold the number of
+** bytes of key and data in a btree cell.
+**
+** The content of a cell looks like this:
+**
+** SIZE DESCRIPTION
+** 4 Page number of the left child. Omitted if leaf flag is set.
+** var Number of bytes of data. Omitted if the zerodata flag is set.
+** var Number of bytes of key. Or the key itself if intkey flag is set.
+** * Payload
+** 4 First page of the overflow chain. Omitted if no overflow
+**
+** Overflow pages form a linked list. Each page except the last is completely
+** filled with data (pagesize - 4 bytes). The last page can have as little
+** as 1 byte of data.
+**
+** SIZE DESCRIPTION
+** 4 Page number of next overflow page
+** * Data
+**
+** Freelist pages come in two subtypes: trunk pages and leaf pages. The
+** file header points to the first in a linked list of trunk page. Each trunk
+** page points to multiple leaf pages. The content of a leaf page is
+** unspecified. A trunk page looks like this:
+**
+** SIZE DESCRIPTION
+** 4 Page number of next trunk page
+** 4 Number of leaf pointers on this page
+** * zero or more pages numbers of leaves
+*/
+
+/* Round up a number to the next larger multiple of 8. This is used
+** to force 8-byte alignment on 64-bit architectures.
+*/
+#define ROUND8(x) ((x+7)&~7)
+
+
+/* The following value is the maximum cell size assuming a maximum page
+** size give above.
+*/
+#define MX_CELL_SIZE(pBt) (pBt->pageSize-8)
+
+/* The maximum number of cells on a single page of the database. This
+** assumes a minimum cell size of 6 bytes (4 bytes for the cell itself
+** plus 2 bytes for the index to the cell in the page header). Such
+** small cells will be rare, but they are possible.
+*/
+#define MX_CELL(pBt) ((pBt->pageSize-8)/6)
+
+/* Forward declarations */
+typedef struct MemPage MemPage;
+typedef struct BtLock BtLock;
+
+/*
+** This is a magic string that appears at the beginning of every
+** SQLite database in order to identify the file as a real database.
+**
+** You can change this value at compile-time by specifying a
+** -DSQLITE_FILE_HEADER="..." on the compiler command-line. The
+** header must be exactly 16 bytes including the zero-terminator so
+** the string itself should be 15 characters long. If you change
+** the header, then your custom library will not be able to read
+** databases generated by the standard tools and the standard tools
+** will not be able to read databases created by your custom library.
+*/
+#ifndef SQLITE_FILE_HEADER /* 123456789 123456 */
+# define SQLITE_FILE_HEADER "SQLite format 3"
+#endif
+
+/*
+** Page type flags. An ORed combination of these flags appear as the
+** first byte of on-disk image of every BTree page.
+*/
+#define PTF_INTKEY 0x01
+#define PTF_ZERODATA 0x02
+#define PTF_LEAFDATA 0x04
+#define PTF_LEAF 0x08
+
+/*
+** As each page of the file is loaded into memory, an instance of the following
+** structure is appended and initialized to zero. This structure stores
+** information about the page that is decoded from the raw file page.
+**
+** The pParent field points back to the parent page. This allows us to
+** walk up the BTree from any leaf to the root. Care must be taken to
+** unref() the parent page pointer when this page is no longer referenced.
+** The pageDestructor() routine handles that chore.
+**
+** Access to all fields of this structure is controlled by the mutex
+** stored in MemPage.pBt->mutex.
+*/
+struct MemPage {
+ u8 isInit; /* True if previously initialized. MUST BE FIRST! */
+ u8 nOverflow; /* Number of overflow cell bodies in aCell[] */
+ u8 intKey; /* True if intkey flag is set */
+ u8 leaf; /* True if leaf flag is set */
+ u8 hasData; /* True if this page stores data */
+ u8 hdrOffset; /* 100 for page 1. 0 otherwise */
+ u8 childPtrSize; /* 0 if leaf==1. 4 if leaf==0 */
+ u16 maxLocal; /* Copy of BtShared.maxLocal or BtShared.maxLeaf */
+ u16 minLocal; /* Copy of BtShared.minLocal or BtShared.minLeaf */
+ u16 cellOffset; /* Index in aData of first cell pointer */
+ u16 nFree; /* Number of free bytes on the page */
+ u16 nCell; /* Number of cells on this page, local and ovfl */
+ u16 maskPage; /* Mask for page offset */
+ struct _OvflCell { /* Cells that will not fit on aData[] */
+ u8 *pCell; /* Pointers to the body of the overflow cell */
+ u16 idx; /* Insert this cell before idx-th non-overflow cell */
+ } aOvfl[5];
+ BtShared *pBt; /* Pointer to BtShared that this page is part of */
+ u8 *aData; /* Pointer to disk image of the page data */
+ DbPage *pDbPage; /* Pager page handle */
+ Pgno pgno; /* Page number for this page */
+};
+
+/*
+** The in-memory image of a disk page has the auxiliary information appended
+** to the end. EXTRA_SIZE is the number of bytes of space needed to hold
+** that extra information.
+*/
+#define EXTRA_SIZE sizeof(MemPage)
+
+/* A Btree handle
+**
+** A database connection contains a pointer to an instance of
+** this object for every database file that it has open. This structure
+** is opaque to the database connection. The database connection cannot
+** see the internals of this structure and only deals with pointers to
+** this structure.
+**
+** For some database files, the same underlying database cache might be
+** shared between multiple connections. In that case, each contection
+** has it own pointer to this object. But each instance of this object
+** points to the same BtShared object. The database cache and the
+** schema associated with the database file are all contained within
+** the BtShared object.
+**
+** All fields in this structure are accessed under sqlite3.mutex.
+** The pBt pointer itself may not be changed while there exists cursors
+** in the referenced BtShared that point back to this Btree since those
+** cursors have to do go through this Btree to find their BtShared and
+** they often do so without holding sqlite3.mutex.
+*/
+struct Btree {
+ sqlite3 *db; /* The database connection holding this btree */
+ BtShared *pBt; /* Sharable content of this btree */
+ u8 inTrans; /* TRANS_NONE, TRANS_READ or TRANS_WRITE */
+ u8 sharable; /* True if we can share pBt with another db */
+ u8 locked; /* True if db currently has pBt locked */
+ int wantToLock; /* Number of nested calls to sqlite3BtreeEnter() */
+ int nBackup; /* Number of backup operations reading this btree */
+ Btree *pNext; /* List of other sharable Btrees from the same db */
+ Btree *pPrev; /* Back pointer of the same list */
+};
+
+/*
+** Btree.inTrans may take one of the following values.
+**
+** If the shared-data extension is enabled, there may be multiple users
+** of the Btree structure. At most one of these may open a write transaction,
+** but any number may have active read transactions.
+*/
+#define TRANS_NONE 0
+#define TRANS_READ 1
+#define TRANS_WRITE 2
+
+/*
+** An instance of this object represents a single database file.
+**
+** A single database file can be in use as the same time by two
+** or more database connections. When two or more connections are
+** sharing the same database file, each connection has it own
+** private Btree object for the file and each of those Btrees points
+** to this one BtShared object. BtShared.nRef is the number of
+** connections currently sharing this database file.
+**
+** Fields in this structure are accessed under the BtShared.mutex
+** mutex, except for nRef and pNext which are accessed under the
+** global SQLITE_MUTEX_STATIC_MASTER mutex. The pPager field
+** may not be modified once it is initially set as long as nRef>0.
+** The pSchema field may be set once under BtShared.mutex and
+** thereafter is unchanged as long as nRef>0.
+*/
+struct BtShared {
+ Pager *pPager; /* The page cache */
+ sqlite3 *db; /* Database connection currently using this Btree */
+ BtCursor *pCursor; /* A list of all open cursors */
+ MemPage *pPage1; /* First page of the database */
+ u8 inStmt; /* True if we are in a statement subtransaction */
+ u8 readOnly; /* True if the underlying file is readonly */
+ u8 pageSizeFixed; /* True if the page size can no longer be changed */
+#ifndef SQLITE_OMIT_AUTOVACUUM
+ u8 autoVacuum; /* True if auto-vacuum is enabled */
+ u8 incrVacuum; /* True if incr-vacuum is enabled */
+#endif
+ u16 pageSize; /* Total number of bytes on a page */
+ u16 usableSize; /* Number of usable bytes on each page */
+ u16 maxLocal; /* Maximum local payload in non-LEAFDATA tables */
+ u16 minLocal; /* Minimum local payload in non-LEAFDATA tables */
+ u16 maxLeaf; /* Maximum local payload in a LEAFDATA table */
+ u16 minLeaf; /* Minimum local payload in a LEAFDATA table */
+ u8 inTransaction; /* Transaction state */
+ int nTransaction; /* Number of open transactions (read + write) */
+ void *pSchema; /* Pointer to space allocated by sqlite3BtreeSchema() */
+ void (*xFreeSchema)(void*); /* Destructor for BtShared.pSchema */
+ sqlite3_mutex *mutex; /* Non-recursive mutex required to access this struct */
+ Bitvec *pHasContent; /* Set of pages moved to free-list this transaction */
+#ifndef SQLITE_OMIT_SHARED_CACHE
+ int nRef; /* Number of references to this structure */
+ BtShared *pNext; /* Next on a list of sharable BtShared structs */
+ BtLock *pLock; /* List of locks held on this shared-btree struct */
+ Btree *pExclusive; /* Btree with an EXCLUSIVE lock on the whole db */
+#endif
+ u8 *pTmpSpace; /* BtShared.pageSize bytes of space for tmp use */
+};
+
+/*
+** An instance of the following structure is used to hold information
+** about a cell. The parseCellPtr() function fills in this structure
+** based on information extract from the raw disk page.
+*/
+typedef struct CellInfo CellInfo;
+struct CellInfo {
+ u8 *pCell; /* Pointer to the start of cell content */
+ i64 nKey; /* The key for INTKEY tables, or number of bytes in key */
+ u32 nData; /* Number of bytes of data */
+ u32 nPayload; /* Total amount of payload */
+ u16 nHeader; /* Size of the cell content header in bytes */
+ u16 nLocal; /* Amount of payload held locally */
+ u16 iOverflow; /* Offset to overflow page number. Zero if no overflow */
+ u16 nSize; /* Size of the cell content on the main b-tree page */
+};
+
+/*
+** Maximum depth of an SQLite B-Tree structure. Any B-Tree deeper than
+** this will be declared corrupt. This value is calculated based on a
+** maximum database size of 2^31 pages a minimum fanout of 2 for a
+** root-node and 3 for all other internal nodes.
+**
+** If a tree that appears to be taller than this is encountered, it is
+** assumed that the database is corrupt.
+*/
+#define BTCURSOR_MAX_DEPTH 20
+
+/*
+** A cursor is a pointer to a particular entry within a particular
+** b-tree within a database file.
+**
+** The entry is identified by its MemPage and the index in
+** MemPage.aCell[] of the entry.
+**
+** When a single database file can shared by two more database connections,
+** but cursors cannot be shared. Each cursor is associated with a
+** particular database connection identified BtCursor.pBtree.db.
+**
+** Fields in this structure are accessed under the BtShared.mutex
+** found at self->pBt->mutex.
+*/
+struct BtCursor {
+ Btree *pBtree; /* The Btree to which this cursor belongs */
+ BtShared *pBt; /* The BtShared this cursor points to */
+ BtCursor *pNext, *pPrev; /* Forms a linked list of all cursors */
+ struct KeyInfo *pKeyInfo; /* Argument passed to comparison function */
+ Pgno pgnoRoot; /* The root page of this tree */
+ CellInfo info; /* A parse of the cell we are pointing at */
+ u8 wrFlag; /* True if writable */
+ u8 atLast; /* Cursor pointing to the last entry */
+ u8 validNKey; /* True if info.nKey is valid */
+ u8 eState; /* One of the CURSOR_XXX constants (see below) */
+ void *pKey; /* Saved key that was cursor's last known position */
+ i64 nKey; /* Size of pKey, or last integer key */
+ int skip; /* (skip<0) -> Prev() is a no-op. (skip>0) -> Next() is */
+#ifndef SQLITE_OMIT_INCRBLOB
+ u8 isIncrblobHandle; /* True if this cursor is an incr. io handle */
+ Pgno *aOverflow; /* Cache of overflow page locations */
+#endif
+#ifndef NDEBUG
+ u8 pagesShuffled; /* True if Btree pages are rearranged by balance()*/
+#endif
+ i16 iPage; /* Index of current page in apPage */
+ MemPage *apPage[BTCURSOR_MAX_DEPTH]; /* Pages from root to current page */
+ u16 aiIdx[BTCURSOR_MAX_DEPTH]; /* Current index in apPage[i] */
+};
+
+/*
+** Potential values for BtCursor.eState.
+**
+** CURSOR_VALID:
+** Cursor points to a valid entry. getPayload() etc. may be called.
+**
+** CURSOR_INVALID:
+** Cursor does not point to a valid entry. This can happen (for example)
+** because the table is empty or because BtreeCursorFirst() has not been
+** called.
+**
+** CURSOR_REQUIRESEEK:
+** The table that this cursor was opened on still exists, but has been
+** modified since the cursor was last used. The cursor position is saved
+** in variables BtCursor.pKey and BtCursor.nKey. When a cursor is in
+** this state, restoreCursorPosition() can be called to attempt to
+** seek the cursor to the saved position.
+**
+** CURSOR_FAULT:
+** A unrecoverable error (an I/O error or a malloc failure) has occurred
+** on a different connection that shares the BtShared cache with this
+** cursor. The error has left the cache in an inconsistent state.
+** Do nothing else with this cursor. Any attempt to use the cursor
+** should return the error code stored in BtCursor.skip
+*/
+#define CURSOR_INVALID 0
+#define CURSOR_VALID 1
+#define CURSOR_REQUIRESEEK 2
+#define CURSOR_FAULT 3
+
+/*
+** The database page the PENDING_BYTE occupies. This page is never used.
+*/
+# define PENDING_BYTE_PAGE(pBt) PAGER_MJ_PGNO(pBt)
+
+/*
+** A linked list of the following structures is stored at BtShared.pLock.
+** Locks are added (or upgraded from READ_LOCK to WRITE_LOCK) when a cursor
+** is opened on the table with root page BtShared.iTable. Locks are removed
+** from this list when a transaction is committed or rolled back, or when
+** a btree handle is closed.
+*/
+struct BtLock {
+ Btree *pBtree; /* Btree handle holding this lock */
+ Pgno iTable; /* Root page of table */
+ u8 eLock; /* READ_LOCK or WRITE_LOCK */
+ BtLock *pNext; /* Next in BtShared.pLock list */
+};
+
+/* Candidate values for BtLock.eLock */
+#define READ_LOCK 1
+#define WRITE_LOCK 2
+
+/*
+** These macros define the location of the pointer-map entry for a
+** database page. The first argument to each is the number of usable
+** bytes on each page of the database (often 1024). The second is the
+** page number to look up in the pointer map.
+**
+** PTRMAP_PAGENO returns the database page number of the pointer-map
+** page that stores the required pointer. PTRMAP_PTROFFSET returns
+** the offset of the requested map entry.
+**
+** If the pgno argument passed to PTRMAP_PAGENO is a pointer-map page,
+** then pgno is returned. So (pgno==PTRMAP_PAGENO(pgsz, pgno)) can be
+** used to test if pgno is a pointer-map page. PTRMAP_ISPAGE implements
+** this test.
+*/
+#define PTRMAP_PAGENO(pBt, pgno) ptrmapPageno(pBt, pgno)
+#define PTRMAP_PTROFFSET(pgptrmap, pgno) (5*(pgno-pgptrmap-1))
+#define PTRMAP_ISPAGE(pBt, pgno) (PTRMAP_PAGENO((pBt),(pgno))==(pgno))
+
+/*
+** The pointer map is a lookup table that identifies the parent page for
+** each child page in the database file. The parent page is the page that
+** contains a pointer to the child. Every page in the database contains
+** 0 or 1 parent pages. (In this context 'database page' refers
+** to any page that is not part of the pointer map itself.) Each pointer map
+** entry consists of a single byte 'type' and a 4 byte parent page number.
+** The PTRMAP_XXX identifiers below are the valid types.
+**
+** The purpose of the pointer map is to facility moving pages from one
+** position in the file to another as part of autovacuum. When a page
+** is moved, the pointer in its parent must be updated to point to the
+** new location. The pointer map is used to locate the parent page quickly.
+**
+** PTRMAP_ROOTPAGE: The database page is a root-page. The page-number is not
+** used in this case.
+**
+** PTRMAP_FREEPAGE: The database page is an unused (free) page. The page-number
+** is not used in this case.
+**
+** PTRMAP_OVERFLOW1: The database page is the first page in a list of
+** overflow pages. The page number identifies the page that
+** contains the cell with a pointer to this overflow page.
+**
+** PTRMAP_OVERFLOW2: The database page is the second or later page in a list of
+** overflow pages. The page-number identifies the previous
+** page in the overflow page list.
+**
+** PTRMAP_BTREE: The database page is a non-root btree page. The page number
+** identifies the parent page in the btree.
+*/
+#define PTRMAP_ROOTPAGE 1
+#define PTRMAP_FREEPAGE 2
+#define PTRMAP_OVERFLOW1 3
+#define PTRMAP_OVERFLOW2 4
+#define PTRMAP_BTREE 5
+
+/* A bunch of assert() statements to check the transaction state variables
+** of handle p (type Btree*) are internally consistent.
+*/
+#define btreeIntegrity(p) \
+ assert( p->pBt->inTransaction!=TRANS_NONE || p->pBt->nTransaction==0 ); \
+ assert( p->pBt->inTransaction>=p->inTrans );
+
+
+/*
+** The ISAUTOVACUUM macro is used within balance_nonroot() to determine
+** if the database supports auto-vacuum or not. Because it is used
+** within an expression that is an argument to another macro
+** (sqliteMallocRaw), it is not possible to use conditional compilation.
+** So, this macro is defined instead.
+*/
+#ifndef SQLITE_OMIT_AUTOVACUUM
+#define ISAUTOVACUUM (pBt->autoVacuum)
+#else
+#define ISAUTOVACUUM 0
+#endif
+
+
+/*
+** This structure is passed around through all the sanity checking routines
+** in order to keep track of some global state information.
+*/
+typedef struct IntegrityCk IntegrityCk;
+struct IntegrityCk {
+ BtShared *pBt; /* The tree being checked out */
+ Pager *pPager; /* The associated pager. Also accessible by pBt->pPager */
+ Pgno nPage; /* Number of pages in the database */
+ int *anRef; /* Number of times each page is referenced */
+ int mxErr; /* Stop accumulating errors when this reaches zero */
+ int nErr; /* Number of messages written to zErrMsg so far */
+ int mallocFailed; /* A memory allocation error has occurred */
+ StrAccum errMsg; /* Accumulate the error message text here */
+};
+
+/*
+** Read or write a two- and four-byte big-endian integer values.
+*/
+#define get2byte(x) ((x)[0]<<8 | (x)[1])
+#define put2byte(p,v) ((p)[0] = (u8)((v)>>8), (p)[1] = (u8)(v))
+#define get4byte sqlite3Get4byte
+#define put4byte sqlite3Put4byte
+
+/*
+** Internal routines that should be accessed by the btree layer only.
+*/
+SQLITE_PRIVATE int sqlite3BtreeGetPage(BtShared*, Pgno, MemPage**, int);
+SQLITE_PRIVATE int sqlite3BtreeInitPage(MemPage *pPage);
+SQLITE_PRIVATE void sqlite3BtreeParseCellPtr(MemPage*, u8*, CellInfo*);
+SQLITE_PRIVATE void sqlite3BtreeParseCell(MemPage*, int, CellInfo*);
+SQLITE_PRIVATE int sqlite3BtreeRestoreCursorPosition(BtCursor *pCur);
+SQLITE_PRIVATE void sqlite3BtreeGetTempCursor(BtCursor *pCur, BtCursor *pTempCur);
+SQLITE_PRIVATE void sqlite3BtreeReleaseTempCursor(BtCursor *pCur);
+SQLITE_PRIVATE void sqlite3BtreeMoveToParent(BtCursor *pCur);
+
+/************** End of btreeInt.h ********************************************/
+/************** Continuing where we left off in btmutex.c ********************/
+#if SQLITE_THREADSAFE && !defined(SQLITE_OMIT_SHARED_CACHE)
+
+
+/*
+** Enter a mutex on the given BTree object.
+**
+** If the object is not sharable, then no mutex is ever required
+** and this routine is a no-op. The underlying mutex is non-recursive.
+** But we keep a reference count in Btree.wantToLock so the behavior
+** of this interface is recursive.
+**
+** To avoid deadlocks, multiple Btrees are locked in the same order
+** by all database connections. The p->pNext is a list of other
+** Btrees belonging to the same database connection as the p Btree
+** which need to be locked after p. If we cannot get a lock on
+** p, then first unlock all of the others on p->pNext, then wait
+** for the lock to become available on p, then relock all of the
+** subsequent Btrees that desire a lock.
+*/
+SQLITE_PRIVATE void sqlite3BtreeEnter(Btree *p){
+ Btree *pLater;
+
+ /* Some basic sanity checking on the Btree. The list of Btrees
+ ** connected by pNext and pPrev should be in sorted order by
+ ** Btree.pBt value. All elements of the list should belong to
+ ** the same connection. Only shared Btrees are on the list. */
+ assert( p->pNext==0 || p->pNext->pBt>p->pBt );
+ assert( p->pPrev==0 || p->pPrev->pBt<p->pBt );
+ assert( p->pNext==0 || p->pNext->db==p->db );
+ assert( p->pPrev==0 || p->pPrev->db==p->db );
+ assert( p->sharable || (p->pNext==0 && p->pPrev==0) );
+
+ /* Check for locking consistency */
+ assert( !p->locked || p->wantToLock>0 );
+ assert( p->sharable || p->wantToLock==0 );
+
+ /* We should already hold a lock on the database connection */
+ assert( sqlite3_mutex_held(p->db->mutex) );
+
+ if( !p->sharable ) return;
+ p->wantToLock++;
+ if( p->locked ) return;
+
+ /* In most cases, we should be able to acquire the lock we
+ ** want without having to go throught the ascending lock
+ ** procedure that follows. Just be sure not to block.
+ */
+ if( sqlite3_mutex_try(p->pBt->mutex)==SQLITE_OK ){
+ p->locked = 1;
+ return;
+ }
+
+ /* To avoid deadlock, first release all locks with a larger
+ ** BtShared address. Then acquire our lock. Then reacquire
+ ** the other BtShared locks that we used to hold in ascending
+ ** order.
+ */
+ for(pLater=p->pNext; pLater; pLater=pLater->pNext){
+ assert( pLater->sharable );
+ assert( pLater->pNext==0 || pLater->pNext->pBt>pLater->pBt );
+ assert( !pLater->locked || pLater->wantToLock>0 );
+ if( pLater->locked ){
+ sqlite3_mutex_leave(pLater->pBt->mutex);
+ pLater->locked = 0;
+ }
+ }
+ sqlite3_mutex_enter(p->pBt->mutex);
+ p->locked = 1;
+ for(pLater=p->pNext; pLater; pLater=pLater->pNext){
+ if( pLater->wantToLock ){
+ sqlite3_mutex_enter(pLater->pBt->mutex);
+ pLater->locked = 1;
+ }
+ }
+}
+
+/*
+** Exit the recursive mutex on a Btree.
+*/
+SQLITE_PRIVATE void sqlite3BtreeLeave(Btree *p){
+ if( p->sharable ){
+ assert( p->wantToLock>0 );
+ p->wantToLock--;
+ if( p->wantToLock==0 ){
+ assert( p->locked );
+ sqlite3_mutex_leave(p->pBt->mutex);
+ p->locked = 0;
+ }
+ }
+}
+
+#ifndef NDEBUG
+/*
+** Return true if the BtShared mutex is held on the btree.
+**
+** This routine makes no determination one why or another if the
+** database connection mutex is held.
+**
+** This routine is used only from within assert() statements.
+*/
+SQLITE_PRIVATE int sqlite3BtreeHoldsMutex(Btree *p){
+ return (p->sharable==0 ||
+ (p->locked && p->wantToLock && sqlite3_mutex_held(p->pBt->mutex)));
+}
+#endif
+
+
+#ifndef SQLITE_OMIT_INCRBLOB
+/*
+** Enter and leave a mutex on a Btree given a cursor owned by that
+** Btree. These entry points are used by incremental I/O and can be
+** omitted if that module is not used.
+*/
+SQLITE_PRIVATE void sqlite3BtreeEnterCursor(BtCursor *pCur){
+ sqlite3BtreeEnter(pCur->pBtree);
+}
+SQLITE_PRIVATE void sqlite3BtreeLeaveCursor(BtCursor *pCur){
+ sqlite3BtreeLeave(pCur->pBtree);
+}
+#endif /* SQLITE_OMIT_INCRBLOB */
+
+
+/*
+** Enter the mutex on every Btree associated with a database
+** connection. This is needed (for example) prior to parsing
+** a statement since we will be comparing table and column names
+** against all schemas and we do not want those schemas being
+** reset out from under us.
+**
+** There is a corresponding leave-all procedures.
+**
+** Enter the mutexes in accending order by BtShared pointer address
+** to avoid the possibility of deadlock when two threads with
+** two or more btrees in common both try to lock all their btrees
+** at the same instant.
+*/
+SQLITE_PRIVATE void sqlite3BtreeEnterAll(sqlite3 *db){
+ int i;
+ Btree *p, *pLater;
+ assert( sqlite3_mutex_held(db->mutex) );
+ for(i=0; i<db->nDb; i++){
+ p = db->aDb[i].pBt;
+ if( p && p->sharable ){
+ p->wantToLock++;
+ if( !p->locked ){
+ assert( p->wantToLock==1 );
+ while( p->pPrev ) p = p->pPrev;
+ while( p->locked && p->pNext ) p = p->pNext;
+ for(pLater = p->pNext; pLater; pLater=pLater->pNext){
+ if( pLater->locked ){
+ sqlite3_mutex_leave(pLater->pBt->mutex);
+ pLater->locked = 0;
+ }
+ }
+ while( p ){
+ sqlite3_mutex_enter(p->pBt->mutex);
+ p->locked++;
+ p = p->pNext;
+ }
+ }
+ }
+ }
+}
+SQLITE_PRIVATE void sqlite3BtreeLeaveAll(sqlite3 *db){
+ int i;
+ Btree *p;
+ assert( sqlite3_mutex_held(db->mutex) );
+ for(i=0; i<db->nDb; i++){
+ p = db->aDb[i].pBt;
+ if( p && p->sharable ){
+ assert( p->wantToLock>0 );
+ p->wantToLock--;
+ if( p->wantToLock==0 ){
+ assert( p->locked );
+ sqlite3_mutex_leave(p->pBt->mutex);
+ p->locked = 0;
+ }
+ }
+ }
+}
+
+#ifndef NDEBUG
+/*
+** Return true if the current thread holds the database connection
+** mutex and all required BtShared mutexes.
+**
+** This routine is used inside assert() statements only.
+*/
+SQLITE_PRIVATE int sqlite3BtreeHoldsAllMutexes(sqlite3 *db){
+ int i;
+ if( !sqlite3_mutex_held(db->mutex) ){
+ return 0;
+ }
+ for(i=0; i<db->nDb; i++){
+ Btree *p;
+ p = db->aDb[i].pBt;
+ if( p && p->sharable &&
+ (p->wantToLock==0 || !sqlite3_mutex_held(p->pBt->mutex)) ){
+ return 0;
+ }
+ }
+ return 1;
+}
+#endif /* NDEBUG */
+
+/*
+** Add a new Btree pointer to a BtreeMutexArray.
+** if the pointer can possibly be shared with
+** another database connection.
+**
+** The pointers are kept in sorted order by pBtree->pBt. That
+** way when we go to enter all the mutexes, we can enter them
+** in order without every having to backup and retry and without
+** worrying about deadlock.
+**
+** The number of shared btrees will always be small (usually 0 or 1)
+** so an insertion sort is an adequate algorithm here.
+*/
+SQLITE_PRIVATE void sqlite3BtreeMutexArrayInsert(BtreeMutexArray *pArray, Btree *pBtree){
+ int i, j;
+ BtShared *pBt;
+ if( pBtree==0 || pBtree->sharable==0 ) return;
+#ifndef NDEBUG
+ {
+ for(i=0; i<pArray->nMutex; i++){
+ assert( pArray->aBtree[i]!=pBtree );
+ }
+ }
+#endif
+ assert( pArray->nMutex>=0 );
+ assert( pArray->nMutex<ArraySize(pArray->aBtree)-1 );
+ pBt = pBtree->pBt;
+ for(i=0; i<pArray->nMutex; i++){
+ assert( pArray->aBtree[i]!=pBtree );
+ if( pArray->aBtree[i]->pBt>pBt ){
+ for(j=pArray->nMutex; j>i; j--){
+ pArray->aBtree[j] = pArray->aBtree[j-1];
+ }
+ pArray->aBtree[i] = pBtree;
+ pArray->nMutex++;
+ return;
+ }
+ }
+ pArray->aBtree[pArray->nMutex++] = pBtree;
+}
+
+/*
+** Enter the mutex of every btree in the array. This routine is
+** called at the beginning of sqlite3VdbeExec(). The mutexes are
+** exited at the end of the same function.
+*/
+SQLITE_PRIVATE void sqlite3BtreeMutexArrayEnter(BtreeMutexArray *pArray){
+ int i;
+ for(i=0; i<pArray->nMutex; i++){
+ Btree *p = pArray->aBtree[i];
+ /* Some basic sanity checking */
+ assert( i==0 || pArray->aBtree[i-1]->pBt<p->pBt );
+ assert( !p->locked || p->wantToLock>0 );
+
+ /* We should already hold a lock on the database connection */
+ assert( sqlite3_mutex_held(p->db->mutex) );
+
+ p->wantToLock++;
+ if( !p->locked && p->sharable ){
+ sqlite3_mutex_enter(p->pBt->mutex);
+ p->locked = 1;
+ }
+ }
+}
+
+/*
+** Leave the mutex of every btree in the group.
+*/
+SQLITE_PRIVATE void sqlite3BtreeMutexArrayLeave(BtreeMutexArray *pArray){
+ int i;
+ for(i=0; i<pArray->nMutex; i++){
+ Btree *p = pArray->aBtree[i];
+ /* Some basic sanity checking */
+ assert( i==0 || pArray->aBtree[i-1]->pBt<p->pBt );
+ assert( p->locked || !p->sharable );
+ assert( p->wantToLock>0 );
+
+ /* We should already hold a lock on the database connection */
+ assert( sqlite3_mutex_held(p->db->mutex) );
+
+ p->wantToLock--;
+ if( p->wantToLock==0 && p->locked ){
+ sqlite3_mutex_leave(p->pBt->mutex);
+ p->locked = 0;
+ }
+ }
+}
+
+
+#endif /* SQLITE_THREADSAFE && !SQLITE_OMIT_SHARED_CACHE */
+
+/************** End of btmutex.c *********************************************/
+/************** Begin file btree.c *******************************************/
+/*
+** 2004 April 6
+**
+** The author disclaims copyright to this source code. In place of
+** a legal notice, here is a blessing:
+**
+** May you do good and not evil.
+** May you find forgiveness for yourself and forgive others.
+** May you share freely, never taking more than you give.
+**
+*************************************************************************
+** $Id: btree.c,v 1.565 2009/02/04 01:49:30 shane Exp $
+**
+** This file implements a external (disk-based) database using BTrees.
+** See the header comment on "btreeInt.h" for additional information.
+** Including a description of file format and an overview of operation.
+*/
+
+/*
+** The header string that appears at the beginning of every
+** SQLite database.
+*/
+static const char zMagicHeader[] = SQLITE_FILE_HEADER;
+
+/*
+** Set this global variable to 1 to enable tracing using the TRACE
+** macro.
+*/
+#if 0
+int sqlite3BtreeTrace=0; /* True to enable tracing */
+# define TRACE(X) if(sqlite3BtreeTrace){printf X;fflush(stdout);}
+#else
+# define TRACE(X)
+#endif
+
+
+
+#ifndef SQLITE_OMIT_SHARED_CACHE
+/*
+** A list of BtShared objects that are eligible for participation
+** in shared cache. This variable has file scope during normal builds,
+** but the test harness needs to access it so we make it global for
+** test builds.
+*/
+#ifdef SQLITE_TEST
+SQLITE_PRIVATE BtShared *SQLITE_WSD sqlite3SharedCacheList = 0;
+#else
+static BtShared *SQLITE_WSD sqlite3SharedCacheList = 0;
+#endif
+#endif /* SQLITE_OMIT_SHARED_CACHE */
+
+#ifndef SQLITE_OMIT_SHARED_CACHE
+/*
+** Enable or disable the shared pager and schema features.
+**
+** This routine has no effect on existing database connections.
+** The shared cache setting effects only future calls to
+** sqlite3_open(), sqlite3_open16(), or sqlite3_open_v2().
+*/
+SQLITE_API int sqlite3_enable_shared_cache(int enable){
+ sqlite3GlobalConfig.sharedCacheEnabled = enable;
+ return SQLITE_OK;
+}
+#endif
+
+
+/*
+** Forward declaration
+*/
+static int checkReadLocks(Btree*, Pgno, BtCursor*, i64);
+
+
+#ifdef SQLITE_OMIT_SHARED_CACHE
+ /*
+ ** The functions queryTableLock(), lockTable() and unlockAllTables()
+ ** manipulate entries in the BtShared.pLock linked list used to store
+ ** shared-cache table level locks. If the library is compiled with the
+ ** shared-cache feature disabled, then there is only ever one user
+ ** of each BtShared structure and so this locking is not necessary.
+ ** So define the lock related functions as no-ops.
+ */
+ #define queryTableLock(a,b,c) SQLITE_OK
+ #define lockTable(a,b,c) SQLITE_OK
+ #define unlockAllTables(a)
+#endif
+
+#ifndef SQLITE_OMIT_SHARED_CACHE
+/*
+** Query to see if btree handle p may obtain a lock of type eLock
+** (READ_LOCK or WRITE_LOCK) on the table with root-page iTab. Return
+** SQLITE_OK if the lock may be obtained (by calling lockTable()), or
+** SQLITE_LOCKED if not.
+*/
+static int queryTableLock(Btree *p, Pgno iTab, u8 eLock){
+ BtShared *pBt = p->pBt;
+ BtLock *pIter;
+
+ assert( sqlite3BtreeHoldsMutex(p) );
+ assert( eLock==READ_LOCK || eLock==WRITE_LOCK );
+ assert( p->db!=0 );
+
+ /* This is a no-op if the shared-cache is not enabled */
+ if( !p->sharable ){
+ return SQLITE_OK;
+ }
+
+ /* If some other connection is holding an exclusive lock, the
+ ** requested lock may not be obtained.
+ */
+ if( pBt->pExclusive && pBt->pExclusive!=p ){
+ return SQLITE_LOCKED;
+ }
+
+ /* This (along with lockTable()) is where the ReadUncommitted flag is
+ ** dealt with. If the caller is querying for a read-lock and the flag is
+ ** set, it is unconditionally granted - even if there are write-locks
+ ** on the table. If a write-lock is requested, the ReadUncommitted flag
+ ** is not considered.
+ **
+ ** In function lockTable(), if a read-lock is demanded and the
+ ** ReadUncommitted flag is set, no entry is added to the locks list
+ ** (BtShared.pLock).
+ **
+ ** To summarize: If the ReadUncommitted flag is set, then read cursors do
+ ** not create or respect table locks. The locking procedure for a
+ ** write-cursor does not change.
+ */
+ if(
+ 0==(p->db->flags&SQLITE_ReadUncommitted) ||
+ eLock==WRITE_LOCK ||
+ iTab==MASTER_ROOT
+ ){
+ for(pIter=pBt->pLock; pIter; pIter=pIter->pNext){
+ if( pIter->pBtree!=p && pIter->iTable==iTab &&
+ (pIter->eLock!=eLock || eLock!=READ_LOCK) ){
+ return SQLITE_LOCKED;
+ }
+ }
+ }
+ return SQLITE_OK;
+}
+#endif /* !SQLITE_OMIT_SHARED_CACHE */
+
+#ifndef SQLITE_OMIT_SHARED_CACHE
+/*
+** Add a lock on the table with root-page iTable to the shared-btree used
+** by Btree handle p. Parameter eLock must be either READ_LOCK or
+** WRITE_LOCK.
+**
+** SQLITE_OK is returned if the lock is added successfully. SQLITE_BUSY and
+** SQLITE_NOMEM may also be returned.
+*/
+static int lockTable(Btree *p, Pgno iTable, u8 eLock){
+ BtShared *pBt = p->pBt;
+ BtLock *pLock = 0;
+ BtLock *pIter;
+
+ assert( sqlite3BtreeHoldsMutex(p) );
+ assert( eLock==READ_LOCK || eLock==WRITE_LOCK );
+ assert( p->db!=0 );
+
+ /* This is a no-op if the shared-cache is not enabled */
+ if( !p->sharable ){
+ return SQLITE_OK;
+ }
+
+ assert( SQLITE_OK==queryTableLock(p, iTable, eLock) );
+
+ /* If the read-uncommitted flag is set and a read-lock is requested,
+ ** return early without adding an entry to the BtShared.pLock list. See
+ ** comment in function queryTableLock() for more info on handling
+ ** the ReadUncommitted flag.
+ */
+ if(
+ (p->db->flags&SQLITE_ReadUncommitted) &&
+ (eLock==READ_LOCK) &&
+ iTable!=MASTER_ROOT
+ ){
+ return SQLITE_OK;
+ }
+
+ /* First search the list for an existing lock on this table. */
+ for(pIter=pBt->pLock; pIter; pIter=pIter->pNext){
+ if( pIter->iTable==iTable && pIter->pBtree==p ){
+ pLock = pIter;
+ break;
+ }
+ }
+
+ /* If the above search did not find a BtLock struct associating Btree p
+ ** with table iTable, allocate one and link it into the list.
+ */
+ if( !pLock ){
+ pLock = (BtLock *)sqlite3MallocZero(sizeof(BtLock));
+ if( !pLock ){
+ return SQLITE_NOMEM;
+ }
+ pLock->iTable = iTable;
+ pLock->pBtree = p;
+ pLock->pNext = pBt->pLock;
+ pBt->pLock = pLock;
+ }
+
+ /* Set the BtLock.eLock variable to the maximum of the current lock
+ ** and the requested lock. This means if a write-lock was already held
+ ** and a read-lock requested, we don't incorrectly downgrade the lock.
+ */
+ assert( WRITE_LOCK>READ_LOCK );
+ if( eLock>pLock->eLock ){
+ pLock->eLock = eLock;
+ }
+
+ return SQLITE_OK;
+}
+#endif /* !SQLITE_OMIT_SHARED_CACHE */
+
+#ifndef SQLITE_OMIT_SHARED_CACHE
+/*
+** Release all the table locks (locks obtained via calls to the lockTable()
+** procedure) held by Btree handle p.
+*/
+static void unlockAllTables(Btree *p){
+ BtShared *pBt = p->pBt;
+ BtLock **ppIter = &pBt->pLock;
+
+ assert( sqlite3BtreeHoldsMutex(p) );
+ assert( p->sharable || 0==*ppIter );
+
+ while( *ppIter ){
+ BtLock *pLock = *ppIter;
+ assert( pBt->pExclusive==0 || pBt->pExclusive==pLock->pBtree );
+ if( pLock->pBtree==p ){
+ *ppIter = pLock->pNext;
+ sqlite3_free(pLock);
+ }else{
+ ppIter = &pLock->pNext;
+ }
+ }
+
+ if( pBt->pExclusive==p ){
+ pBt->pExclusive = 0;
+ }
+}
+#endif /* SQLITE_OMIT_SHARED_CACHE */
+
+static void releasePage(MemPage *pPage); /* Forward reference */
+
+/*
+** Verify that the cursor holds a mutex on the BtShared
+*/
+#ifndef NDEBUG
+static int cursorHoldsMutex(BtCursor *p){
+ return sqlite3_mutex_held(p->pBt->mutex);
+}
+#endif
+
+
+#ifndef SQLITE_OMIT_INCRBLOB
+/*
+** Invalidate the overflow page-list cache for cursor pCur, if any.
+*/
+static void invalidateOverflowCache(BtCursor *pCur){
+ assert( cursorHoldsMutex(pCur) );
+ sqlite3_free(pCur->aOverflow);
+ pCur->aOverflow = 0;
+}
+
+/*
+** Invalidate the overflow page-list cache for all cursors opened
+** on the shared btree structure pBt.
+*/
+static void invalidateAllOverflowCache(BtShared *pBt){
+ BtCursor *p;
+ assert( sqlite3_mutex_held(pBt->mutex) );
+ for(p=pBt->pCursor; p; p=p->pNext){
+ invalidateOverflowCache(p);
+ }
+}
+#else
+ #define invalidateOverflowCache(x)
+ #define invalidateAllOverflowCache(x)
+#endif
+
+/*
+** Set bit pgno of the BtShared.pHasContent bitvec. This is called
+** when a page that previously contained data becomes a free-list leaf
+** page.
+**
+** The BtShared.pHasContent bitvec exists to work around an obscure
+** bug caused by the interaction of two useful IO optimizations surrounding
+** free-list leaf pages:
+**
+** 1) When all data is deleted from a page and the page becomes
+** a free-list leaf page, the page is not written to the database
+** (as free-list leaf pages contain no meaningful data). Sometimes
+** such a page is not even journalled (as it will not be modified,
+** why bother journalling it?).
+**
+** 2) When a free-list leaf page is reused, its content is not read
+** from the database or written to the journal file (why should it
+** be, if it is not at all meaningful?).
+**
+** By themselves, these optimizations work fine and provide a handy
+** performance boost to bulk delete or insert operations. However, if
+** a page is moved to the free-list and then reused within the same
+** transaction, a problem comes up. If the page is not journalled when
+** it is moved to the free-list and it is also not journalled when it
+** is extracted from the free-list and reused, then the original data
+** may be lost. In the event of a rollback, it may not be possible
+** to restore the database to its original configuration.
+**
+** The solution is the BtShared.pHasContent bitvec. Whenever a page is
+** moved to become a free-list leaf page, the corresponding bit is
+** set in the bitvec. Whenever a leaf page is extracted from the free-list,
+** optimization 2 above is ommitted if the corresponding bit is already
+** set in BtShared.pHasContent. The contents of the bitvec are cleared
+** at the end of every transaction.
+*/
+static int btreeSetHasContent(BtShared *pBt, Pgno pgno){
+ int rc = SQLITE_OK;
+ if( !pBt->pHasContent ){
+ int nPage;
+ rc = sqlite3PagerPagecount(pBt->pPager, &nPage);
+ if( rc==SQLITE_OK ){
+ pBt->pHasContent = sqlite3BitvecCreate((u32)nPage);
+ if( !pBt->pHasContent ){
+ rc = SQLITE_NOMEM;
+ }
+ }
+ }
+ if( rc==SQLITE_OK && pgno<=sqlite3BitvecSize(pBt->pHasContent) ){
+ rc = sqlite3BitvecSet(pBt->pHasContent, pgno);
+ }
+ return rc;
+}
+
+/*
+** Query the BtShared.pHasContent vector.
+**
+** This function is called when a free-list leaf page is removed from the
+** free-list for reuse. It returns false if it is safe to retrieve the
+** page from the pager layer with the 'no-content' flag set. True otherwise.
+*/
+static int btreeGetHasContent(BtShared *pBt, Pgno pgno){
+ Bitvec *p = pBt->pHasContent;
+ return (p && (pgno>sqlite3BitvecSize(p) || sqlite3BitvecTest(p, pgno)));
+}
+
+/*
+** Clear (destroy) the BtShared.pHasContent bitvec. This should be
+** invoked at the conclusion of each write-transaction.
+*/
+static void btreeClearHasContent(BtShared *pBt){
+ sqlite3BitvecDestroy(pBt->pHasContent);
+ pBt->pHasContent = 0;
+}
+
+/*
+** Save the current cursor position in the variables BtCursor.nKey
+** and BtCursor.pKey. The cursor's state is set to CURSOR_REQUIRESEEK.
+*/
+static int saveCursorPosition(BtCursor *pCur){
+ int rc;
+
+ assert( CURSOR_VALID==pCur->eState );
+ assert( 0==pCur->pKey );
+ assert( cursorHoldsMutex(pCur) );
+
+ rc = sqlite3BtreeKeySize(pCur, &pCur->nKey);
+
+ /* If this is an intKey table, then the above call to BtreeKeySize()
+ ** stores the integer key in pCur->nKey. In this case this value is
+ ** all that is required. Otherwise, if pCur is not open on an intKey
+ ** table, then malloc space for and store the pCur->nKey bytes of key
+ ** data.
+ */
+ if( rc==SQLITE_OK && 0==pCur->apPage[0]->intKey){
+ void *pKey = sqlite3Malloc( (int)pCur->nKey );
+ if( pKey ){
+ rc = sqlite3BtreeKey(pCur, 0, (int)pCur->nKey, pKey);
+ if( rc==SQLITE_OK ){
+ pCur->pKey = pKey;
+ }else{
+ sqlite3_free(pKey);
+ }
+ }else{
+ rc = SQLITE_NOMEM;
+ }
+ }
+ assert( !pCur->apPage[0]->intKey || !pCur->pKey );
+
+ if( rc==SQLITE_OK ){
+ int i;
+ for(i=0; i<=pCur->iPage; i++){
+ releasePage(pCur->apPage[i]);
+ pCur->apPage[i] = 0;
+ }
+ pCur->iPage = -1;
+ pCur->eState = CURSOR_REQUIRESEEK;
+ }
+
+ invalidateOverflowCache(pCur);
+ return rc;
+}
+
+/*
+** Save the positions of all cursors except pExcept open on the table
+** with root-page iRoot. Usually, this is called just before cursor
+** pExcept is used to modify the table (BtreeDelete() or BtreeInsert()).
+*/
+static int saveAllCursors(BtShared *pBt, Pgno iRoot, BtCursor *pExcept){
+ BtCursor *p;
+ assert( sqlite3_mutex_held(pBt->mutex) );
+ assert( pExcept==0 || pExcept->pBt==pBt );
+ for(p=pBt->pCursor; p; p=p->pNext){
+ if( p!=pExcept && (0==iRoot || p->pgnoRoot==iRoot) &&
+ p->eState==CURSOR_VALID ){
+ int rc = saveCursorPosition(p);
+ if( SQLITE_OK!=rc ){
+ return rc;
+ }
+ }
+ }
+ return SQLITE_OK;
+}
+
+/*
+** Clear the current cursor position.
+*/
+SQLITE_PRIVATE void sqlite3BtreeClearCursor(BtCursor *pCur){
+ assert( cursorHoldsMutex(pCur) );
+ sqlite3_free(pCur->pKey);
+ pCur->pKey = 0;
+ pCur->eState = CURSOR_INVALID;
+}
+
+/*
+** Restore the cursor to the position it was in (or as close to as possible)
+** when saveCursorPosition() was called. Note that this call deletes the
+** saved position info stored by saveCursorPosition(), so there can be
+** at most one effective restoreCursorPosition() call after each
+** saveCursorPosition().
+*/
+SQLITE_PRIVATE int sqlite3BtreeRestoreCursorPosition(BtCursor *pCur){
+ int rc;
+ assert( cursorHoldsMutex(pCur) );
+ assert( pCur->eState>=CURSOR_REQUIRESEEK );
+ if( pCur->eState==CURSOR_FAULT ){
+ return pCur->skip;
+ }
+ pCur->eState = CURSOR_INVALID;
+ rc = sqlite3BtreeMoveto(pCur, pCur->pKey, pCur->nKey, 0, &pCur->skip);
+ if( rc==SQLITE_OK ){
+ sqlite3_free(pCur->pKey);
+ pCur->pKey = 0;
+ assert( pCur->eState==CURSOR_VALID || pCur->eState==CURSOR_INVALID );
+ }
+ return rc;
+}
+
+#define restoreCursorPosition(p) \
+ (p->eState>=CURSOR_REQUIRESEEK ? \
+ sqlite3BtreeRestoreCursorPosition(p) : \
+ SQLITE_OK)
+
+/*
+** Determine whether or not a cursor has moved from the position it
+** was last placed at. Cursors can move when the row they are pointing
+** at is deleted out from under them.
+**
+** This routine returns an error code if something goes wrong. The
+** integer *pHasMoved is set to one if the cursor has moved and 0 if not.
+*/
+SQLITE_PRIVATE int sqlite3BtreeCursorHasMoved(BtCursor *pCur, int *pHasMoved){
+ int rc;
+
+ rc = restoreCursorPosition(pCur);
+ if( rc ){
+ *pHasMoved = 1;
+ return rc;
+ }
+ if( pCur->eState!=CURSOR_VALID || pCur->skip!=0 ){
+ *pHasMoved = 1;
+ }else{
+ *pHasMoved = 0;
+ }
+ return SQLITE_OK;
+}
+
+#ifndef SQLITE_OMIT_AUTOVACUUM
+/*
+** Given a page number of a regular database page, return the page
+** number for the pointer-map page that contains the entry for the
+** input page number.
+*/
+static Pgno ptrmapPageno(BtShared *pBt, Pgno pgno){
+ int nPagesPerMapPage;
+ Pgno iPtrMap, ret;
+ assert( sqlite3_mutex_held(pBt->mutex) );
+ nPagesPerMapPage = (pBt->usableSize/5)+1;
+ iPtrMap = (pgno-2)/nPagesPerMapPage;
+ ret = (iPtrMap*nPagesPerMapPage) + 2;
+ if( ret==PENDING_BYTE_PAGE(pBt) ){
+ ret++;
+ }
+ return ret;
+}
+
+/*
+** Write an entry into the pointer map.
+**
+** This routine updates the pointer map entry for page number 'key'
+** so that it maps to type 'eType' and parent page number 'pgno'.
+** An error code is returned if something goes wrong, otherwise SQLITE_OK.
+*/
+static int ptrmapPut(BtShared *pBt, Pgno key, u8 eType, Pgno parent){
+ DbPage *pDbPage; /* The pointer map page */
+ u8 *pPtrmap; /* The pointer map data */
+ Pgno iPtrmap; /* The pointer map page number */
+ int offset; /* Offset in pointer map page */
+ int rc;
+
+ assert( sqlite3_mutex_held(pBt->mutex) );
+ /* The master-journal page number must never be used as a pointer map page */
+ assert( 0==PTRMAP_ISPAGE(pBt, PENDING_BYTE_PAGE(pBt)) );
+
+ assert( pBt->autoVacuum );
+ if( key==0 ){
+ return SQLITE_CORRUPT_BKPT;
+ }
+ iPtrmap = PTRMAP_PAGENO(pBt, key);
+ rc = sqlite3PagerGet(pBt->pPager, iPtrmap, &pDbPage);
+ if( rc!=SQLITE_OK ){
+ return rc;
+ }
+ offset = PTRMAP_PTROFFSET(iPtrmap, key);
+ pPtrmap = (u8 *)sqlite3PagerGetData(pDbPage);
+
+ if( eType!=pPtrmap[offset] || get4byte(&pPtrmap[offset+1])!=parent ){
+ TRACE(("PTRMAP_UPDATE: %d->(%d,%d)\n", key, eType, parent));
+ rc = sqlite3PagerWrite(pDbPage);
+ if( rc==SQLITE_OK ){
+ pPtrmap[offset] = eType;
+ put4byte(&pPtrmap[offset+1], parent);
+ }
+ }
+
+ sqlite3PagerUnref(pDbPage);
+ return rc;
+}
+
+/*
+** Read an entry from the pointer map.
+**
+** This routine retrieves the pointer map entry for page 'key', writing
+** the type and parent page number to *pEType and *pPgno respectively.
+** An error code is returned if something goes wrong, otherwise SQLITE_OK.
+*/
+static int ptrmapGet(BtShared *pBt, Pgno key, u8 *pEType, Pgno *pPgno){
+ DbPage *pDbPage; /* The pointer map page */
+ int iPtrmap; /* Pointer map page index */
+ u8 *pPtrmap; /* Pointer map page data */
+ int offset; /* Offset of entry in pointer map */
+ int rc;
+
+ assert( sqlite3_mutex_held(pBt->mutex) );
+
+ iPtrmap = PTRMAP_PAGENO(pBt, key);
+ rc = sqlite3PagerGet(pBt->pPager, iPtrmap, &pDbPage);
+ if( rc!=0 ){
+ return rc;
+ }
+ pPtrmap = (u8 *)sqlite3PagerGetData(pDbPage);
+
+ offset = PTRMAP_PTROFFSET(iPtrmap, key);
+ assert( pEType!=0 );
+ *pEType = pPtrmap[offset];
+ if( pPgno ) *pPgno = get4byte(&pPtrmap[offset+1]);
+
+ sqlite3PagerUnref(pDbPage);
+ if( *pEType<1 || *pEType>5 ) return SQLITE_CORRUPT_BKPT;
+ return SQLITE_OK;
+}
+
+#else /* if defined SQLITE_OMIT_AUTOVACUUM */
+ #define ptrmapPut(w,x,y,z) SQLITE_OK
+ #define ptrmapGet(w,x,y,z) SQLITE_OK
+ #define ptrmapPutOvfl(y,z) SQLITE_OK
+#endif
+
+/*
+** Given a btree page and a cell index (0 means the first cell on
+** the page, 1 means the second cell, and so forth) return a pointer
+** to the cell content.
+**
+** This routine works only for pages that do not contain overflow cells.
+*/
+#define findCell(P,I) \
+ ((P)->aData + ((P)->maskPage & get2byte(&(P)->aData[(P)->cellOffset+2*(I)])))
+
+/*
+** This a more complex version of findCell() that works for
+** pages that do contain overflow cells. See insert
+*/
+static u8 *findOverflowCell(MemPage *pPage, int iCell){
+ int i;
+ assert( sqlite3_mutex_held(pPage->pBt->mutex) );
+ for(i=pPage->nOverflow-1; i>=0; i--){
+ int k;
+ struct _OvflCell *pOvfl;
+ pOvfl = &pPage->aOvfl[i];
+ k = pOvfl->idx;
+ if( k<=iCell ){
+ if( k==iCell ){
+ return pOvfl->pCell;
+ }
+ iCell--;
+ }
+ }
+ return findCell(pPage, iCell);
+}
+
+/*
+** Parse a cell content block and fill in the CellInfo structure. There
+** are two versions of this function. sqlite3BtreeParseCell() takes a
+** cell index as the second argument and sqlite3BtreeParseCellPtr()
+** takes a pointer to the body of the cell as its second argument.
+**
+** Within this file, the parseCell() macro can be called instead of
+** sqlite3BtreeParseCellPtr(). Using some compilers, this will be faster.
+*/
+SQLITE_PRIVATE void sqlite3BtreeParseCellPtr(
+ MemPage *pPage, /* Page containing the cell */
+ u8 *pCell, /* Pointer to the cell text. */
+ CellInfo *pInfo /* Fill in this structure */
+){
+ u16 n; /* Number bytes in cell content header */
+ u32 nPayload; /* Number of bytes of cell payload */
+
+ assert( sqlite3_mutex_held(pPage->pBt->mutex) );
+
+ pInfo->pCell = pCell;
+ assert( pPage->leaf==0 || pPage->leaf==1 );
+ n = pPage->childPtrSize;
+ assert( n==4-4*pPage->leaf );
+ if( pPage->intKey ){
+ if( pPage->hasData ){
+ n += getVarint32(&pCell[n], nPayload);
+ }else{
+ nPayload = 0;
+ }
+ n += getVarint(&pCell[n], (u64*)&pInfo->nKey);
+ pInfo->nData = nPayload;
+ }else{
+ pInfo->nData = 0;
+ n += getVarint32(&pCell[n], nPayload);
+ pInfo->nKey = nPayload;
+ }
+ pInfo->nPayload = nPayload;
+ pInfo->nHeader = n;
+ if( likely(nPayload<=pPage->maxLocal) ){
+ /* This is the (easy) common case where the entire payload fits
+ ** on the local page. No overflow is required.
+ */
+ int nSize; /* Total size of cell content in bytes */
+ nSize = nPayload + n;
+ pInfo->nLocal = (u16)nPayload;
+ pInfo->iOverflow = 0;
+ if( (nSize & ~3)==0 ){
+ nSize = 4; /* Minimum cell size is 4 */
+ }
+ pInfo->nSize = (u16)nSize;
+ }else{
+ /* If the payload will not fit completely on the local page, we have
+ ** to decide how much to store locally and how much to spill onto
+ ** overflow pages. The strategy is to minimize the amount of unused
+ ** space on overflow pages while keeping the amount of local storage
+ ** in between minLocal and maxLocal.
+ **
+ ** Warning: changing the way overflow payload is distributed in any
+ ** way will result in an incompatible file format.
+ */
+ int minLocal; /* Minimum amount of payload held locally */
+ int maxLocal; /* Maximum amount of payload held locally */
+ int surplus; /* Overflow payload available for local storage */
+
+ minLocal = pPage->minLocal;
+ maxLocal = pPage->maxLocal;
+ surplus = minLocal + (nPayload - minLocal)%(pPage->pBt->usableSize - 4);
+ if( surplus <= maxLocal ){
+ pInfo->nLocal = (u16)surplus;
+ }else{
+ pInfo->nLocal = (u16)minLocal;
+ }
+ pInfo->iOverflow = (u16)(pInfo->nLocal + n);
+ pInfo->nSize = pInfo->iOverflow + 4;
+ }
+}
+#define parseCell(pPage, iCell, pInfo) \
+ sqlite3BtreeParseCellPtr((pPage), findCell((pPage), (iCell)), (pInfo))
+SQLITE_PRIVATE void sqlite3BtreeParseCell(
+ MemPage *pPage, /* Page containing the cell */
+ int iCell, /* The cell index. First cell is 0 */
+ CellInfo *pInfo /* Fill in this structure */
+){
+ parseCell(pPage, iCell, pInfo);
+}
+
+/*
+** Compute the total number of bytes that a Cell needs in the cell
+** data area of the btree-page. The return number includes the cell
+** data header and the local payload, but not any overflow page or
+** the space used by the cell pointer.
+*/
+#ifndef NDEBUG
+static u16 cellSize(MemPage *pPage, int iCell){
+ CellInfo info;
+ sqlite3BtreeParseCell(pPage, iCell, &info);
+ return info.nSize;
+}
+#endif
+static u16 cellSizePtr(MemPage *pPage, u8 *pCell){
+ CellInfo info;
+ sqlite3BtreeParseCellPtr(pPage, pCell, &info);
+ return info.nSize;
+}
+
+#ifndef SQLITE_OMIT_AUTOVACUUM
+/*
+** If the cell pCell, part of page pPage contains a pointer
+** to an overflow page, insert an entry into the pointer-map
+** for the overflow page.
+*/
+static int ptrmapPutOvflPtr(MemPage *pPage, u8 *pCell){
+ CellInfo info;
+ assert( pCell!=0 );
+ sqlite3BtreeParseCellPtr(pPage, pCell, &info);
+ assert( (info.nData+(pPage->intKey?0:info.nKey))==info.nPayload );
+ if( (info.nData+(pPage->intKey?0:info.nKey))>info.nLocal ){
+ Pgno ovfl = get4byte(&pCell[info.iOverflow]);
+ return ptrmapPut(pPage->pBt, ovfl, PTRMAP_OVERFLOW1, pPage->pgno);
+ }
+ return SQLITE_OK;
+}
+/*
+** If the cell with index iCell on page pPage contains a pointer
+** to an overflow page, insert an entry into the pointer-map
+** for the overflow page.
+*/
+static int ptrmapPutOvfl(MemPage *pPage, int iCell){
+ u8 *pCell;
+ assert( sqlite3_mutex_held(pPage->pBt->mutex) );
+ pCell = findOverflowCell(pPage, iCell);
+ return ptrmapPutOvflPtr(pPage, pCell);
+}
+#endif
+
+
+/*
+** Defragment the page given. All Cells are moved to the
+** end of the page and all free space is collected into one
+** big FreeBlk that occurs in between the header and cell
+** pointer array and the cell content area.
+*/
+static int defragmentPage(MemPage *pPage){
+ int i; /* Loop counter */
+ int pc; /* Address of a i-th cell */
+ int addr; /* Offset of first byte after cell pointer array */
+ int hdr; /* Offset to the page header */
+ int size; /* Size of a cell */
+ int usableSize; /* Number of usable bytes on a page */
+ int cellOffset; /* Offset to the cell pointer array */
+ int cbrk; /* Offset to the cell content area */
+ int nCell; /* Number of cells on the page */
+ unsigned char *data; /* The page data */
+ unsigned char *temp; /* Temp area for cell content */
+
+ assert( sqlite3PagerIswriteable(pPage->pDbPage) );
+ assert( pPage->pBt!=0 );
+ assert( pPage->pBt->usableSize <= SQLITE_MAX_PAGE_SIZE );
+ assert( pPage->nOverflow==0 );
+ assert( sqlite3_mutex_held(pPage->pBt->mutex) );
+ temp = sqlite3PagerTempSpace(pPage->pBt->pPager);
+ data = pPage->aData;
+ hdr = pPage->hdrOffset;
+ cellOffset = pPage->cellOffset;
+ nCell = pPage->nCell;
+ assert( nCell==get2byte(&data[hdr+3]) );
+ usableSize = pPage->pBt->usableSize;
+ cbrk = get2byte(&data[hdr+5]);
+ memcpy(&temp[cbrk], &data[cbrk], usableSize - cbrk);
+ cbrk = usableSize;
+ for(i=0; i<nCell; i++){
+ u8 *pAddr; /* The i-th cell pointer */
+ pAddr = &data[cellOffset + i*2];
+ pc = get2byte(pAddr);
+ if( pc>=usableSize ){
+ return SQLITE_CORRUPT_BKPT;
+ }
+ size = cellSizePtr(pPage, &temp[pc]);
+ cbrk -= size;
+ if( cbrk<cellOffset+2*nCell || pc+size>usableSize ){
+ return SQLITE_CORRUPT_BKPT;
+ }
+ assert( cbrk+size<=usableSize && cbrk>=0 );
+ memcpy(&data[cbrk], &temp[pc], size);
+ put2byte(pAddr, cbrk);
+ }
+ assert( cbrk>=cellOffset+2*nCell );
+ put2byte(&data[hdr+5], cbrk);
+ data[hdr+1] = 0;
+ data[hdr+2] = 0;
+ data[hdr+7] = 0;
+ addr = cellOffset+2*nCell;
+ memset(&data[addr], 0, cbrk-addr);
+ assert( sqlite3PagerIswriteable(pPage->pDbPage) );
+ if( cbrk-addr!=pPage->nFree ){
+ return SQLITE_CORRUPT_BKPT;
+ }
+ return SQLITE_OK;
+}
+
+/*
+** Allocate nByte bytes of space on a page.
+**
+** Return the index into pPage->aData[] of the first byte of
+** the new allocation. The caller guarantees that there is enough
+** space. This routine will never fail.
+**
+** If the page contains nBytes of free space but does not contain
+** nBytes of contiguous free space, then this routine automatically
+** calls defragementPage() to consolidate all free space before
+** allocating the new chunk.
+*/
+static int allocateSpace(MemPage *pPage, int nByte){
+ int addr, pc, hdr;
+ int size;
+ int nFrag;
+ int top;
+ int nCell;
+ int cellOffset;
+ unsigned char *data;
+
+ data = pPage->aData;
+ assert( sqlite3PagerIswriteable(pPage->pDbPage) );
+ assert( pPage->pBt );
+ assert( sqlite3_mutex_held(pPage->pBt->mutex) );
+ assert( nByte>=0 ); /* Minimum cell size is 4 */
+ assert( pPage->nFree>=nByte );
+ assert( pPage->nOverflow==0 );
+ pPage->nFree -= (u16)nByte;
+ hdr = pPage->hdrOffset;
+
+ nFrag = data[hdr+7];
+ if( nFrag<60 ){
+ /* Search the freelist looking for a slot big enough to satisfy the
+ ** space request. */
+ addr = hdr+1;
+ while( (pc = get2byte(&data[addr]))>0 ){
+ size = get2byte(&data[pc+2]);
+ if( size>=nByte ){
+ int x = size - nByte;
+ if( size<nByte+4 ){
+ memcpy(&data[addr], &data[pc], 2);
+ data[hdr+7] = (u8)(nFrag + x);
+ return pc;
+ }else{
+ put2byte(&data[pc+2], x);
+ return pc + x;
+ }
+ }
+ addr = pc;
+ }
+ }
+
+ /* Allocate memory from the gap in between the cell pointer array
+ ** and the cell content area.
+ */
+ top = get2byte(&data[hdr+5]);
+ nCell = get2byte(&data[hdr+3]);
+ cellOffset = pPage->cellOffset;
+ if( nFrag>=60 || cellOffset + 2*nCell > top - nByte ){
+ defragmentPage(pPage);
+ top = get2byte(&data[hdr+5]);
+ }
+ top -= nByte;
+ assert( cellOffset + 2*nCell <= top );
+ put2byte(&data[hdr+5], top);
+ assert( sqlite3PagerIswriteable(pPage->pDbPage) );
+ return top;
+}
+
+/*
+** Return a section of the pPage->aData to the freelist.
+** The first byte of the new free block is pPage->aDisk[start]
+** and the size of the block is "size" bytes.
+**
+** Most of the effort here is involved in coalesing adjacent
+** free blocks into a single big free block.
+*/
+static int freeSpace(MemPage *pPage, int start, int size){
+ int addr, pbegin, hdr;
+ unsigned char *data = pPage->aData;
+
+ assert( pPage->pBt!=0 );
+ assert( sqlite3PagerIswriteable(pPage->pDbPage) );
+ assert( start>=pPage->hdrOffset+6+(pPage->leaf?0:4) );
+ assert( (start + size)<=pPage->pBt->usableSize );
+ assert( sqlite3_mutex_held(pPage->pBt->mutex) );
+ assert( size>=0 ); /* Minimum cell size is 4 */
+
+#ifdef SQLITE_SECURE_DELETE
+ /* Overwrite deleted information with zeros when the SECURE_DELETE
+ ** option is enabled at compile-time */
+ memset(&data[start], 0, size);
+#endif
+
+ /* Add the space back into the linked list of freeblocks */
+ hdr = pPage->hdrOffset;
+ addr = hdr + 1;
+ while( (pbegin = get2byte(&data[addr]))<start && pbegin>0 ){
+ assert( pbegin<=pPage->pBt->usableSize-4 );
+ if( pbegin<=addr ) {
+ return SQLITE_CORRUPT_BKPT;
+ }
+ addr = pbegin;
+ }
+ if ( pbegin>pPage->pBt->usableSize-4 ) {
+ return SQLITE_CORRUPT_BKPT;
+ }
+ assert( pbegin>addr || pbegin==0 );
+ put2byte(&data[addr], start);
+ put2byte(&data[start], pbegin);
+ put2byte(&data[start+2], size);
+ pPage->nFree += (u16)size;
+
+ /* Coalesce adjacent free blocks */
+ addr = pPage->hdrOffset + 1;
+ while( (pbegin = get2byte(&data[addr]))>0 ){
+ int pnext, psize, x;
+ assert( pbegin>addr );
+ assert( pbegin<=pPage->pBt->usableSize-4 );
+ pnext = get2byte(&data[pbegin]);
+ psize = get2byte(&data[pbegin+2]);
+ if( pbegin + psize + 3 >= pnext && pnext>0 ){
+ int frag = pnext - (pbegin+psize);
+ if( (frag<0) || (frag>(int)data[pPage->hdrOffset+7]) ){
+ return SQLITE_CORRUPT_BKPT;
+ }
+ data[pPage->hdrOffset+7] -= (u8)frag;
+ x = get2byte(&data[pnext]);
+ put2byte(&data[pbegin], x);
+ x = pnext + get2byte(&data[pnext+2]) - pbegin;
+ put2byte(&data[pbegin+2], x);
+ }else{
+ addr = pbegin;
+ }
+ }
+
+ /* If the cell content area begins with a freeblock, remove it. */
+ if( data[hdr+1]==data[hdr+5] && data[hdr+2]==data[hdr+6] ){
+ int top;
+ pbegin = get2byte(&data[hdr+1]);
+ memcpy(&data[hdr+1], &data[pbegin], 2);
+ top = get2byte(&data[hdr+5]) + get2byte(&data[pbegin+2]);
+ put2byte(&data[hdr+5], top);
+ }
+ assert( sqlite3PagerIswriteable(pPage->pDbPage) );
+ return SQLITE_OK;
+}
+
+/*
+** Decode the flags byte (the first byte of the header) for a page
+** and initialize fields of the MemPage structure accordingly.
+**
+** Only the following combinations are supported. Anything different
+** indicates a corrupt database files:
+**
+** PTF_ZERODATA
+** PTF_ZERODATA | PTF_LEAF
+** PTF_LEAFDATA | PTF_INTKEY
+** PTF_LEAFDATA | PTF_INTKEY | PTF_LEAF
+*/
+static int decodeFlags(MemPage *pPage, int flagByte){
+ BtShared *pBt; /* A copy of pPage->pBt */
+
+ assert( pPage->hdrOffset==(pPage->pgno==1 ? 100 : 0) );
+ assert( sqlite3_mutex_held(pPage->pBt->mutex) );
+ pPage->leaf = (u8)(flagByte>>3); assert( PTF_LEAF == 1<<3 );
+ flagByte &= ~PTF_LEAF;
+ pPage->childPtrSize = 4-4*pPage->leaf;
+ pBt = pPage->pBt;
+ if( flagByte==(PTF_LEAFDATA | PTF_INTKEY) ){
+ pPage->intKey = 1;
+ pPage->hasData = pPage->leaf;
+ pPage->maxLocal = pBt->maxLeaf;
+ pPage->minLocal = pBt->minLeaf;
+ }else if( flagByte==PTF_ZERODATA ){
+ pPage->intKey = 0;
+ pPage->hasData = 0;
+ pPage->maxLocal = pBt->maxLocal;
+ pPage->minLocal = pBt->minLocal;
+ }else{
+ return SQLITE_CORRUPT_BKPT;
+ }
+ return SQLITE_OK;
+}
+
+/*
+** Initialize the auxiliary information for a disk block.
+**
+** Return SQLITE_OK on success. If we see that the page does
+** not contain a well-formed database page, then return
+** SQLITE_CORRUPT. Note that a return of SQLITE_OK does not
+** guarantee that the page is well-formed. It only shows that
+** we failed to detect any corruption.
+*/
+SQLITE_PRIVATE int sqlite3BtreeInitPage(MemPage *pPage){
+
+ assert( pPage->pBt!=0 );
+ assert( sqlite3_mutex_held(pPage->pBt->mutex) );
+ assert( pPage->pgno==sqlite3PagerPagenumber(pPage->pDbPage) );
+ assert( pPage == sqlite3PagerGetExtra(pPage->pDbPage) );
+ assert( pPage->aData == sqlite3PagerGetData(pPage->pDbPage) );
+
+ if( !pPage->isInit ){
+ u16 pc; /* Address of a freeblock within pPage->aData[] */
+ u8 hdr; /* Offset to beginning of page header */
+ u8 *data; /* Equal to pPage->aData */
+ BtShared *pBt; /* The main btree structure */
+ u16 usableSize; /* Amount of usable space on each page */
+ u16 cellOffset; /* Offset from start of page to first cell pointer */
+ u16 nFree; /* Number of unused bytes on the page */
+ u16 top; /* First byte of the cell content area */
+
+ pBt = pPage->pBt;
+
+ hdr = pPage->hdrOffset;
+ data = pPage->aData;
+ if( decodeFlags(pPage, data[hdr]) ) return SQLITE_CORRUPT_BKPT;
+ assert( pBt->pageSize>=512 && pBt->pageSize<=32768 );
+ pPage->maskPage = pBt->pageSize - 1;
+ pPage->nOverflow = 0;
+ usableSize = pBt->usableSize;
+ pPage->cellOffset = cellOffset = hdr + 12 - 4*pPage->leaf;
+ top = get2byte(&data[hdr+5]);
+ pPage->nCell = get2byte(&data[hdr+3]);
+ if( pPage->nCell>MX_CELL(pBt) ){
+ /* To many cells for a single page. The page must be corrupt */
+ return SQLITE_CORRUPT_BKPT;
+ }
+
+ /* Compute the total free space on the page */
+ pc = get2byte(&data[hdr+1]);
+ nFree = data[hdr+7] + top - (cellOffset + 2*pPage->nCell);
+ while( pc>0 ){
+ u16 next, size;
+ if( pc>usableSize-4 ){
+ /* Free block is off the page */
+ return SQLITE_CORRUPT_BKPT;
+ }
+ next = get2byte(&data[pc]);
+ size = get2byte(&data[pc+2]);
+ if( next>0 && next<=pc+size+3 ){
+ /* Free blocks must be in accending order */
+ return SQLITE_CORRUPT_BKPT;
+ }
+ nFree += size;
+ pc = next;
+ }
+ pPage->nFree = (u16)nFree;
+ if( nFree>=usableSize ){
+ /* Free space cannot exceed total page size */
+ return SQLITE_CORRUPT_BKPT;
+ }
+
+#if 0
+ /* Check that all the offsets in the cell offset array are within range.
+ **
+ ** Omitting this consistency check and using the pPage->maskPage mask
+ ** to prevent overrunning the page buffer in findCell() results in a
+ ** 2.5% performance gain.
+ */
+ {
+ u8 *pOff; /* Iterator used to check all cell offsets are in range */
+ u8 *pEnd; /* Pointer to end of cell offset array */
+ u8 mask; /* Mask of bits that must be zero in MSB of cell offsets */
+ mask = ~(((u8)(pBt->pageSize>>8))-1);
+ pEnd = &data[cellOffset + pPage->nCell*2];
+ for(pOff=&data[cellOffset]; pOff!=pEnd && !((*pOff)&mask); pOff+=2);
+ if( pOff!=pEnd ){
+ return SQLITE_CORRUPT_BKPT;
+ }
+ }
+#endif
+
+ pPage->isInit = 1;
+ }
+ return SQLITE_OK;
+}
+
+/*
+** Set up a raw page so that it looks like a database page holding
+** no entries.
+*/
+static void zeroPage(MemPage *pPage, int flags){
+ unsigned char *data = pPage->aData;
+ BtShared *pBt = pPage->pBt;
+ u8 hdr = pPage->hdrOffset;
+ u16 first;
+
+ assert( sqlite3PagerPagenumber(pPage->pDbPage)==pPage->pgno );
+ assert( sqlite3PagerGetExtra(pPage->pDbPage) == (void*)pPage );
+ assert( sqlite3PagerGetData(pPage->pDbPage) == data );
+ assert( sqlite3PagerIswriteable(pPage->pDbPage) );
+ assert( sqlite3_mutex_held(pBt->mutex) );
+ /*memset(&data[hdr], 0, pBt->usableSize - hdr);*/
+ data[hdr] = (char)flags;
+ first = hdr + 8 + 4*((flags&PTF_LEAF)==0 ?1:0);
+ memset(&data[hdr+1], 0, 4);
+ data[hdr+7] = 0;
+ put2byte(&data[hdr+5], pBt->usableSize);
+ pPage->nFree = pBt->usableSize - first;
+ decodeFlags(pPage, flags);
+ pPage->hdrOffset = hdr;
+ pPage->cellOffset = first;
+ pPage->nOverflow = 0;
+ assert( pBt->pageSize>=512 && pBt->pageSize<=32768 );
+ pPage->maskPage = pBt->pageSize - 1;
+ pPage->nCell = 0;
+ pPage->isInit = 1;
+}
+
+
+/*
+** Convert a DbPage obtained from the pager into a MemPage used by
+** the btree layer.
+*/
+static MemPage *btreePageFromDbPage(DbPage *pDbPage, Pgno pgno, BtShared *pBt){
+ MemPage *pPage = (MemPage*)sqlite3PagerGetExtra(pDbPage);
+ pPage->aData = sqlite3PagerGetData(pDbPage);
+ pPage->pDbPage = pDbPage;
+ pPage->pBt = pBt;
+ pPage->pgno = pgno;
+ pPage->hdrOffset = pPage->pgno==1 ? 100 : 0;
+ return pPage;
+}
+
+/*
+** Get a page from the pager. Initialize the MemPage.pBt and
+** MemPage.aData elements if needed.
+**
+** If the noContent flag is set, it means that we do not care about
+** the content of the page at this time. So do not go to the disk
+** to fetch the content. Just fill in the content with zeros for now.
+** If in the future we call sqlite3PagerWrite() on this page, that
+** means we have started to be concerned about content and the disk
+** read should occur at that point.
+*/
+SQLITE_PRIVATE int sqlite3BtreeGetPage(
+ BtShared *pBt, /* The btree */
+ Pgno pgno, /* Number of the page to fetch */
+ MemPage **ppPage, /* Return the page in this parameter */
+ int noContent /* Do not load page content if true */
+){
+ int rc;
+ DbPage *pDbPage;
+
+ assert( sqlite3_mutex_held(pBt->mutex) );
+ rc = sqlite3PagerAcquire(pBt->pPager, pgno, (DbPage**)&pDbPage, noContent);
+ if( rc ) return rc;
+ *ppPage = btreePageFromDbPage(pDbPage, pgno, pBt);
+ return SQLITE_OK;
+}
+
+/*
+** Retrieve a page from the pager cache. If the requested page is not
+** already in the pager cache return NULL. Initialize the MemPage.pBt and
+** MemPage.aData elements if needed.
+*/
+static MemPage *btreePageLookup(BtShared *pBt, Pgno pgno){
+ DbPage *pDbPage;
+ assert( sqlite3_mutex_held(pBt->mutex) );
+ pDbPage = sqlite3PagerLookup(pBt->pPager, pgno);
+ if( pDbPage ){
+ return btreePageFromDbPage(pDbPage, pgno, pBt);
+ }
+ return 0;
+}
+
+/*
+** Return the size of the database file in pages. If there is any kind of
+** error, return ((unsigned int)-1).
+*/
+static Pgno pagerPagecount(BtShared *pBt){
+ int nPage = -1;
+ int rc;
+ assert( pBt->pPage1 );
+ rc = sqlite3PagerPagecount(pBt->pPager, &nPage);
+ assert( rc==SQLITE_OK || nPage==-1 );
+ return (Pgno)nPage;
+}
+
+/*
+** Get a page from the pager and initialize it. This routine
+** is just a convenience wrapper around separate calls to
+** sqlite3BtreeGetPage() and sqlite3BtreeInitPage().
+*/
+static int getAndInitPage(
+ BtShared *pBt, /* The database file */
+ Pgno pgno, /* Number of the page to get */
+ MemPage **ppPage /* Write the page pointer here */
+){
+ int rc;
+ MemPage *pPage;
+
+ assert( sqlite3_mutex_held(pBt->mutex) );
+ if( pgno==0 ){
+ return SQLITE_CORRUPT_BKPT;
+ }
+
+ /* It is often the case that the page we want is already in cache.
+ ** If so, get it directly. This saves us from having to call
+ ** pagerPagecount() to make sure pgno is within limits, which results
+ ** in a measureable performance improvements.
+ */
+ *ppPage = pPage = btreePageLookup(pBt, pgno);
+ if( pPage ){
+ /* Page is already in cache */
+ rc = SQLITE_OK;
+ }else{
+ /* Page not in cache. Acquire it. */
+ if( pgno>pagerPagecount(pBt) ){
+ return SQLITE_CORRUPT_BKPT;
+ }
+ rc = sqlite3BtreeGetPage(pBt, pgno, ppPage, 0);
+ if( rc ) return rc;
+ pPage = *ppPage;
+ }
+ if( !pPage->isInit ){
+ rc = sqlite3BtreeInitPage(pPage);
+ }
+ if( rc!=SQLITE_OK ){
+ releasePage(pPage);
+ *ppPage = 0;
+ }
+ return rc;
+}
+
+/*
+** Release a MemPage. This should be called once for each prior
+** call to sqlite3BtreeGetPage.
+*/
+static void releasePage(MemPage *pPage){
+ if( pPage ){
+ assert( pPage->nOverflow==0 || sqlite3PagerPageRefcount(pPage->pDbPage)>1 );
+ assert( pPage->aData );
+ assert( pPage->pBt );
+ assert( sqlite3PagerGetExtra(pPage->pDbPage) == (void*)pPage );
+ assert( sqlite3PagerGetData(pPage->pDbPage)==pPage->aData );
+ assert( sqlite3_mutex_held(pPage->pBt->mutex) );
+ sqlite3PagerUnref(pPage->pDbPage);
+ }
+}
+
+/*
+** During a rollback, when the pager reloads information into the cache
+** so that the cache is restored to its original state at the start of
+** the transaction, for each page restored this routine is called.
+**
+** This routine needs to reset the extra data section at the end of the
+** page to agree with the restored data.
+*/
+static void pageReinit(DbPage *pData){
+ MemPage *pPage;
+ pPage = (MemPage *)sqlite3PagerGetExtra(pData);
+ if( pPage->isInit ){
+ assert( sqlite3_mutex_held(pPage->pBt->mutex) );
+ pPage->isInit = 0;
+ if( sqlite3PagerPageRefcount(pData)>0 ){
+ sqlite3BtreeInitPage(pPage);
+ }
+ }
+}
+
+/*
+** Invoke the busy handler for a btree.
+*/
+static int btreeInvokeBusyHandler(void *pArg){
+ BtShared *pBt = (BtShared*)pArg;
+ assert( pBt->db );
+ assert( sqlite3_mutex_held(pBt->db->mutex) );
+ return sqlite3InvokeBusyHandler(&pBt->db->busyHandler);
+}
+
+/*
+** Open a database file.
+**
+** zFilename is the name of the database file. If zFilename is NULL
+** a new database with a random name is created. This randomly named
+** database file will be deleted when sqlite3BtreeClose() is called.
+** If zFilename is ":memory:" then an in-memory database is created
+** that is automatically destroyed when it is closed.
+*/
+SQLITE_PRIVATE int sqlite3BtreeOpen(
+ const char *zFilename, /* Name of the file containing the BTree database */
+ sqlite3 *db, /* Associated database handle */
+ Btree **ppBtree, /* Pointer to new Btree object written here */
+ int flags, /* Options */
+ int vfsFlags /* Flags passed through to sqlite3_vfs.xOpen() */
+){
+ sqlite3_vfs *pVfs; /* The VFS to use for this btree */
+ BtShared *pBt = 0; /* Shared part of btree structure */
+ Btree *p; /* Handle to return */
+ int rc = SQLITE_OK;
+ u8 nReserve;
+ unsigned char zDbHeader[100];
+
+ /* Set the variable isMemdb to true for an in-memory database, or
+ ** false for a file-based database. This symbol is only required if
+ ** either of the shared-data or autovacuum features are compiled
+ ** into the library.
+ */
+#if !defined(SQLITE_OMIT_SHARED_CACHE) || !defined(SQLITE_OMIT_AUTOVACUUM)
+ #ifdef SQLITE_OMIT_MEMORYDB
+ const int isMemdb = 0;
+ #else
+ const int isMemdb = zFilename && !strcmp(zFilename, ":memory:");
+ #endif
+#endif
+
+ assert( db!=0 );
+ assert( sqlite3_mutex_held(db->mutex) );
+
+ pVfs = db->pVfs;
+ p = sqlite3MallocZero(sizeof(Btree));
+ if( !p ){
+ return SQLITE_NOMEM;
+ }
+ p->inTrans = TRANS_NONE;
+ p->db = db;
+
+#if !defined(SQLITE_OMIT_SHARED_CACHE) && !defined(SQLITE_OMIT_DISKIO)
+ /*
+ ** If this Btree is a candidate for shared cache, try to find an
+ ** existing BtShared object that we can share with
+ */
+ if( isMemdb==0
+ && (db->flags & SQLITE_Vtab)==0
+ && zFilename && zFilename[0]
+ ){
+ if( sqlite3GlobalConfig.sharedCacheEnabled ){
+ int nFullPathname = pVfs->mxPathname+1;
+ char *zFullPathname = sqlite3Malloc(nFullPathname);
+ sqlite3_mutex *mutexShared;
+ p->sharable = 1;
+ db->flags |= SQLITE_SharedCache;
+ if( !zFullPathname ){
+ sqlite3_free(p);
+ return SQLITE_NOMEM;
+ }
+ sqlite3OsFullPathname(pVfs, zFilename, nFullPathname, zFullPathname);
+ mutexShared = sqlite3MutexAlloc(SQLITE_MUTEX_STATIC_MASTER);
+ sqlite3_mutex_enter(mutexShared);
+ for(pBt=GLOBAL(BtShared*,sqlite3SharedCacheList); pBt; pBt=pBt->pNext){
+ assert( pBt->nRef>0 );
+ if( 0==strcmp(zFullPathname, sqlite3PagerFilename(pBt->pPager))
+ && sqlite3PagerVfs(pBt->pPager)==pVfs ){
+ p->pBt = pBt;
+ pBt->nRef++;
+ break;
+ }
+ }
+ sqlite3_mutex_leave(mutexShared);
+ sqlite3_free(zFullPathname);
+ }
+#ifdef SQLITE_DEBUG
+ else{
+ /* In debug mode, we mark all persistent databases as sharable
+ ** even when they are not. This exercises the locking code and
+ ** gives more opportunity for asserts(sqlite3_mutex_held())
+ ** statements to find locking problems.
+ */
+ p->sharable = 1;
+ }
+#endif
+ }
+#endif
+ if( pBt==0 ){
+ /*
+ ** The following asserts make sure that structures used by the btree are
+ ** the right size. This is to guard against size changes that result
+ ** when compiling on a different architecture.
+ */
+ assert( sizeof(i64)==8 || sizeof(i64)==4 );
+ assert( sizeof(u64)==8 || sizeof(u64)==4 );
+ assert( sizeof(u32)==4 );
+ assert( sizeof(u16)==2 );
+ assert( sizeof(Pgno)==4 );
+
+ pBt = sqlite3MallocZero( sizeof(*pBt) );
+ if( pBt==0 ){
+ rc = SQLITE_NOMEM;
+ goto btree_open_out;
+ }
+ rc = sqlite3PagerOpen(pVfs, &pBt->pPager, zFilename,
+ EXTRA_SIZE, flags, vfsFlags);
+ if( rc==SQLITE_OK ){
+ rc = sqlite3PagerReadFileheader(pBt->pPager,sizeof(zDbHeader),zDbHeader);
+ }
+ if( rc!=SQLITE_OK ){
+ goto btree_open_out;
+ }
+ sqlite3PagerSetBusyhandler(pBt->pPager, btreeInvokeBusyHandler, pBt);
+ p->pBt = pBt;
+
+ sqlite3PagerSetReiniter(pBt->pPager, pageReinit);
+ pBt->pCursor = 0;
+ pBt->pPage1 = 0;
+ pBt->readOnly = sqlite3PagerIsreadonly(pBt->pPager);
+ pBt->pageSize = get2byte(&zDbHeader[16]);
+ if( pBt->pageSize<512 || pBt->pageSize>SQLITE_MAX_PAGE_SIZE
+ || ((pBt->pageSize-1)&pBt->pageSize)!=0 ){
+ pBt->pageSize = 0;
+ sqlite3PagerSetPagesize(pBt->pPager, &pBt->pageSize);
+#ifndef SQLITE_OMIT_AUTOVACUUM
+ /* If the magic name ":memory:" will create an in-memory database, then
+ ** leave the autoVacuum mode at 0 (do not auto-vacuum), even if
+ ** SQLITE_DEFAULT_AUTOVACUUM is true. On the other hand, if
+ ** SQLITE_OMIT_MEMORYDB has been defined, then ":memory:" is just a
+ ** regular file-name. In this case the auto-vacuum applies as per normal.
+ */
+ if( zFilename && !isMemdb ){
+ pBt->autoVacuum = (SQLITE_DEFAULT_AUTOVACUUM ? 1 : 0);
+ pBt->incrVacuum = (SQLITE_DEFAULT_AUTOVACUUM==2 ? 1 : 0);
+ }
+#endif
+ nReserve = 0;
+ }else{
+ nReserve = zDbHeader[20];
+ pBt->pageSizeFixed = 1;
+#ifndef SQLITE_OMIT_AUTOVACUUM
+ pBt->autoVacuum = (get4byte(&zDbHeader[36 + 4*4])?1:0);
+ pBt->incrVacuum = (get4byte(&zDbHeader[36 + 7*4])?1:0);
+#endif
+ }
+ pBt->usableSize = pBt->pageSize - nReserve;
+ assert( (pBt->pageSize & 7)==0 ); /* 8-byte alignment of pageSize */
+ sqlite3PagerSetPagesize(pBt->pPager, &pBt->pageSize);
+
+#if !defined(SQLITE_OMIT_SHARED_CACHE) && !defined(SQLITE_OMIT_DISKIO)
+ /* Add the new BtShared object to the linked list sharable BtShareds.
+ */
+ if( p->sharable ){
+ sqlite3_mutex *mutexShared;
+ pBt->nRef = 1;
+ mutexShared = sqlite3MutexAlloc(SQLITE_MUTEX_STATIC_MASTER);
+ if( SQLITE_THREADSAFE && sqlite3GlobalConfig.bCoreMutex ){
+ pBt->mutex = sqlite3MutexAlloc(SQLITE_MUTEX_FAST);
+ if( pBt->mutex==0 ){
+ rc = SQLITE_NOMEM;
+ db->mallocFailed = 0;
+ goto btree_open_out;
+ }
+ }
+ sqlite3_mutex_enter(mutexShared);
+ pBt->pNext = GLOBAL(BtShared*,sqlite3SharedCacheList);
+ GLOBAL(BtShared*,sqlite3SharedCacheList) = pBt;
+ sqlite3_mutex_leave(mutexShared);
+ }
+#endif
+ }
+
+#if !defined(SQLITE_OMIT_SHARED_CACHE) && !defined(SQLITE_OMIT_DISKIO)
+ /* If the new Btree uses a sharable pBtShared, then link the new
+ ** Btree into the list of all sharable Btrees for the same connection.
+ ** The list is kept in ascending order by pBt address.
+ */
+ if( p->sharable ){
+ int i;
+ Btree *pSib;
+ for(i=0; i<db->nDb; i++){
+ if( (pSib = db->aDb[i].pBt)!=0 && pSib->sharable ){
+ while( pSib->pPrev ){ pSib = pSib->pPrev; }
+ if( p->pBt<pSib->pBt ){
+ p->pNext = pSib;
+ p->pPrev = 0;
+ pSib->pPrev = p;
+ }else{
+ while( pSib->pNext && pSib->pNext->pBt<p->pBt ){
+ pSib = pSib->pNext;
+ }
+ p->pNext = pSib->pNext;
+ p->pPrev = pSib;
+ if( p->pNext ){
+ p->pNext->pPrev = p;
+ }
+ pSib->pNext = p;
+ }
+ break;
+ }
+ }
+ }
+#endif
+ *ppBtree = p;
+
+btree_open_out:
+ if( rc!=SQLITE_OK ){
+ if( pBt && pBt->pPager ){
+ sqlite3PagerClose(pBt->pPager);
+ }
+ sqlite3_free(pBt);
+ sqlite3_free(p);
+ *ppBtree = 0;
+ }
+ return rc;
+}
+
+/*
+** Decrement the BtShared.nRef counter. When it reaches zero,
+** remove the BtShared structure from the sharing list. Return
+** true if the BtShared.nRef counter reaches zero and return
+** false if it is still positive.
+*/
+static int removeFromSharingList(BtShared *pBt){
+#ifndef SQLITE_OMIT_SHARED_CACHE
+ sqlite3_mutex *pMaster;
+ BtShared *pList;
+ int removed = 0;
+
+ assert( sqlite3_mutex_notheld(pBt->mutex) );
+ pMaster = sqlite3MutexAlloc(SQLITE_MUTEX_STATIC_MASTER);
+ sqlite3_mutex_enter(pMaster);
+ pBt->nRef--;
+ if( pBt->nRef<=0 ){
+ if( GLOBAL(BtShared*,sqlite3SharedCacheList)==pBt ){
+ GLOBAL(BtShared*,sqlite3SharedCacheList) = pBt->pNext;
+ }else{
+ pList = GLOBAL(BtShared*,sqlite3SharedCacheList);
+ while( ALWAYS(pList) && pList->pNext!=pBt ){
+ pList=pList->pNext;
+ }
+ if( ALWAYS(pList) ){
+ pList->pNext = pBt->pNext;
+ }
+ }
+ if( SQLITE_THREADSAFE ){
+ sqlite3_mutex_free(pBt->mutex);
+ }
+ removed = 1;
+ }
+ sqlite3_mutex_leave(pMaster);
+ return removed;
+#else
+ return 1;
+#endif
+}
+
+/*
+** Make sure pBt->pTmpSpace points to an allocation of
+** MX_CELL_SIZE(pBt) bytes.
+*/
+static void allocateTempSpace(BtShared *pBt){
+ if( !pBt->pTmpSpace ){
+ pBt->pTmpSpace = sqlite3PageMalloc( pBt->pageSize );
+ }
+}
+
+/*
+** Free the pBt->pTmpSpace allocation
+*/
+static void freeTempSpace(BtShared *pBt){
+ sqlite3PageFree( pBt->pTmpSpace);
+ pBt->pTmpSpace = 0;
+}
+
+/*
+** Close an open database and invalidate all cursors.
+*/
+SQLITE_PRIVATE int sqlite3BtreeClose(Btree *p){
+ BtShared *pBt = p->pBt;
+ BtCursor *pCur;
+
+ /* Close all cursors opened via this handle. */
+ assert( sqlite3_mutex_held(p->db->mutex) );
+ sqlite3BtreeEnter(p);
+ pBt->db = p->db;
+ pCur = pBt->pCursor;
+ while( pCur ){
+ BtCursor *pTmp = pCur;
+ pCur = pCur->pNext;
+ if( pTmp->pBtree==p ){
+ sqlite3BtreeCloseCursor(pTmp);
+ }
+ }
+
+ /* Rollback any active transaction and free the handle structure.
+ ** The call to sqlite3BtreeRollback() drops any table-locks held by
+ ** this handle.
+ */
+ sqlite3BtreeRollback(p);
+ sqlite3BtreeLeave(p);
+
+ /* If there are still other outstanding references to the shared-btree
+ ** structure, return now. The remainder of this procedure cleans
+ ** up the shared-btree.
+ */
+ assert( p->wantToLock==0 && p->locked==0 );
+ if( !p->sharable || removeFromSharingList(pBt) ){
+ /* The pBt is no longer on the sharing list, so we can access
+ ** it without having to hold the mutex.
+ **
+ ** Clean out and delete the BtShared object.
+ */
+ assert( !pBt->pCursor );
+ sqlite3PagerClose(pBt->pPager);
+ if( pBt->xFreeSchema && pBt->pSchema ){
+ pBt->xFreeSchema(pBt->pSchema);
+ }
+ sqlite3_free(pBt->pSchema);
+ freeTempSpace(pBt);
+ sqlite3_free(pBt);
+ }
+
+#ifndef SQLITE_OMIT_SHARED_CACHE
+ assert( p->wantToLock==0 );
+ assert( p->locked==0 );
+ if( p->pPrev ) p->pPrev->pNext = p->pNext;
+ if( p->pNext ) p->pNext->pPrev = p->pPrev;
+#endif
+
+ sqlite3_free(p);
+ return SQLITE_OK;
+}
+
+/*
+** Change the limit on the number of pages allowed in the cache.
+**
+** The maximum number of cache pages is set to the absolute
+** value of mxPage. If mxPage is negative, the pager will
+** operate asynchronously - it will not stop to do fsync()s
+** to insure data is written to the disk surface before
+** continuing. Transactions still work if synchronous is off,
+** and the database cannot be corrupted if this program
+** crashes. But if the operating system crashes or there is
+** an abrupt power failure when synchronous is off, the database
+** could be left in an inconsistent and unrecoverable state.
+** Synchronous is on by default so database corruption is not
+** normally a worry.
+*/
+SQLITE_PRIVATE int sqlite3BtreeSetCacheSize(Btree *p, int mxPage){
+ BtShared *pBt = p->pBt;
+ assert( sqlite3_mutex_held(p->db->mutex) );
+ sqlite3BtreeEnter(p);
+ sqlite3PagerSetCachesize(pBt->pPager, mxPage);
+ sqlite3BtreeLeave(p);
+ return SQLITE_OK;
+}
+
+/*
+** Change the way data is synced to disk in order to increase or decrease
+** how well the database resists damage due to OS crashes and power
+** failures. Level 1 is the same as asynchronous (no syncs() occur and
+** there is a high probability of damage) Level 2 is the default. There
+** is a very low but non-zero probability of damage. Level 3 reduces the
+** probability of damage to near zero but with a write performance reduction.
+*/
+#ifndef SQLITE_OMIT_PAGER_PRAGMAS
+SQLITE_PRIVATE int sqlite3BtreeSetSafetyLevel(Btree *p, int level, int fullSync){
+ BtShared *pBt = p->pBt;
+ assert( sqlite3_mutex_held(p->db->mutex) );
+ sqlite3BtreeEnter(p);
+ sqlite3PagerSetSafetyLevel(pBt->pPager, level, fullSync);
+ sqlite3BtreeLeave(p);
+ return SQLITE_OK;
+}
+#endif
+
+/*
+** Return TRUE if the given btree is set to safety level 1. In other
+** words, return TRUE if no sync() occurs on the disk files.
+*/
+SQLITE_PRIVATE int sqlite3BtreeSyncDisabled(Btree *p){
+ BtShared *pBt = p->pBt;
+ int rc;
+ assert( sqlite3_mutex_held(p->db->mutex) );
+ sqlite3BtreeEnter(p);
+ assert( pBt && pBt->pPager );
+ rc = sqlite3PagerNosync(pBt->pPager);
+ sqlite3BtreeLeave(p);
+ return rc;
+}
+
+#if !defined(SQLITE_OMIT_PAGER_PRAGMAS) || !defined(SQLITE_OMIT_VACUUM)
+/*
+** Change the default pages size and the number of reserved bytes per page.
+**
+** The page size must be a power of 2 between 512 and 65536. If the page
+** size supplied does not meet this constraint then the page size is not
+** changed.
+**
+** Page sizes are constrained to be a power of two so that the region
+** of the database file used for locking (beginning at PENDING_BYTE,
+** the first byte past the 1GB boundary, 0x40000000) needs to occur
+** at the beginning of a page.
+**
+** If parameter nReserve is less than zero, then the number of reserved
+** bytes per page is left unchanged.
+*/
+SQLITE_PRIVATE int sqlite3BtreeSetPageSize(Btree *p, int pageSize, int nReserve){
+ int rc = SQLITE_OK;
+ BtShared *pBt = p->pBt;
+ assert( nReserve>=-1 && nReserve<=255 );
+ sqlite3BtreeEnter(p);
+ if( pBt->pageSizeFixed ){
+ sqlite3BtreeLeave(p);
+ return SQLITE_READONLY;
+ }
+ if( nReserve<0 ){
+ nReserve = pBt->pageSize - pBt->usableSize;
+ }
+ assert( nReserve>=0 && nReserve<=255 );
+ if( pageSize>=512 && pageSize<=SQLITE_MAX_PAGE_SIZE &&
+ ((pageSize-1)&pageSize)==0 ){
+ assert( (pageSize & 7)==0 );
+ assert( !pBt->pPage1 && !pBt->pCursor );
+ pBt->pageSize = (u16)pageSize;
+ freeTempSpace(pBt);
+ rc = sqlite3PagerSetPagesize(pBt->pPager, &pBt->pageSize);
+ }
+ pBt->usableSize = pBt->pageSize - (u16)nReserve;
+ sqlite3BtreeLeave(p);
+ return rc;
+}
+
+/*
+** Return the currently defined page size
+*/
+SQLITE_PRIVATE int sqlite3BtreeGetPageSize(Btree *p){
+ return p->pBt->pageSize;
+}
+SQLITE_PRIVATE int sqlite3BtreeGetReserve(Btree *p){
+ int n;
+ sqlite3BtreeEnter(p);
+ n = p->pBt->pageSize - p->pBt->usableSize;
+ sqlite3BtreeLeave(p);
+ return n;
+}
+
+/*
+** Set the maximum page count for a database if mxPage is positive.
+** No changes are made if mxPage is 0 or negative.
+** Regardless of the value of mxPage, return the maximum page count.
+*/
+SQLITE_PRIVATE int sqlite3BtreeMaxPageCount(Btree *p, int mxPage){
+ int n;
+ sqlite3BtreeEnter(p);
+ n = sqlite3PagerMaxPageCount(p->pBt->pPager, mxPage);
+ sqlite3BtreeLeave(p);
+ return n;
+}
+#endif /* !defined(SQLITE_OMIT_PAGER_PRAGMAS) || !defined(SQLITE_OMIT_VACUUM) */
+
+/*
+** Change the 'auto-vacuum' property of the database. If the 'autoVacuum'
+** parameter is non-zero, then auto-vacuum mode is enabled. If zero, it
+** is disabled. The default value for the auto-vacuum property is
+** determined by the SQLITE_DEFAULT_AUTOVACUUM macro.
+*/
+SQLITE_PRIVATE int sqlite3BtreeSetAutoVacuum(Btree *p, int autoVacuum){
+#ifdef SQLITE_OMIT_AUTOVACUUM
+ return SQLITE_READONLY;
+#else
+ BtShared *pBt = p->pBt;
+ int rc = SQLITE_OK;
+ u8 av = autoVacuum ?1:0;
+
+ sqlite3BtreeEnter(p);
+ if( pBt->pageSizeFixed && av!=pBt->autoVacuum ){
+ rc = SQLITE_READONLY;
+ }else{
+ pBt->autoVacuum = av;
+ }
+ sqlite3BtreeLeave(p);
+ return rc;
+#endif
+}
+
+/*
+** Return the value of the 'auto-vacuum' property. If auto-vacuum is
+** enabled 1 is returned. Otherwise 0.
+*/
+SQLITE_PRIVATE int sqlite3BtreeGetAutoVacuum(Btree *p){
+#ifdef SQLITE_OMIT_AUTOVACUUM
+ return BTREE_AUTOVACUUM_NONE;
+#else
+ int rc;
+ sqlite3BtreeEnter(p);
+ rc = (
+ (!p->pBt->autoVacuum)?BTREE_AUTOVACUUM_NONE:
+ (!p->pBt->incrVacuum)?BTREE_AUTOVACUUM_FULL:
+ BTREE_AUTOVACUUM_INCR
+ );
+ sqlite3BtreeLeave(p);
+ return rc;
+#endif
+}
+
+
+/*
+** Get a reference to pPage1 of the database file. This will
+** also acquire a readlock on that file.
+**
+** SQLITE_OK is returned on success. If the file is not a
+** well-formed database file, then SQLITE_CORRUPT is returned.
+** SQLITE_BUSY is returned if the database is locked. SQLITE_NOMEM
+** is returned if we run out of memory.
+*/
+static int lockBtree(BtShared *pBt){
+ int rc;
+ MemPage *pPage1;
+ int nPage;
+
+ assert( sqlite3_mutex_held(pBt->mutex) );
+ if( pBt->pPage1 ) return SQLITE_OK;
+ rc = sqlite3BtreeGetPage(pBt, 1, &pPage1, 0);
+ if( rc!=SQLITE_OK ) return rc;
+
+ /* Do some checking to help insure the file we opened really is
+ ** a valid database file.
+ */
+ rc = sqlite3PagerPagecount(pBt->pPager, &nPage);
+ if( rc!=SQLITE_OK ){
+ goto page1_init_failed;
+ }else if( nPage>0 ){
+ int pageSize;
+ int usableSize;
+ u8 *page1 = pPage1->aData;
+ rc = SQLITE_NOTADB;
+ if( memcmp(page1, zMagicHeader, 16)!=0 ){
+ goto page1_init_failed;
+ }
+ if( page1[18]>1 ){
+ pBt->readOnly = 1;
+ }
+ if( page1[19]>1 ){
+ goto page1_init_failed;
+ }
+
+ /* The maximum embedded fraction must be exactly 25%. And the minimum
+ ** embedded fraction must be 12.5% for both leaf-data and non-leaf-data.
+ ** The original design allowed these amounts to vary, but as of
+ ** version 3.6.0, we require them to be fixed.
+ */
+ if( memcmp(&page1[21], "\100\040\040",3)!=0 ){
+ goto page1_init_failed;
+ }
+ pageSize = get2byte(&page1[16]);
+ if( ((pageSize-1)&pageSize)!=0 || pageSize<512 ||
+ (SQLITE_MAX_PAGE_SIZE<32768 && pageSize>SQLITE_MAX_PAGE_SIZE)
+ ){
+ goto page1_init_failed;
+ }
+ assert( (pageSize & 7)==0 );
+ usableSize = pageSize - page1[20];
+ if( pageSize!=pBt->pageSize ){
+ /* After reading the first page of the database assuming a page size
+ ** of BtShared.pageSize, we have discovered that the page-size is
+ ** actually pageSize. Unlock the database, leave pBt->pPage1 at
+ ** zero and return SQLITE_OK. The caller will call this function
+ ** again with the correct page-size.
+ */
+ releasePage(pPage1);
+ pBt->usableSize = (u16)usableSize;
+ pBt->pageSize = (u16)pageSize;
+ freeTempSpace(pBt);
+ sqlite3PagerSetPagesize(pBt->pPager, &pBt->pageSize);
+ return SQLITE_OK;
+ }
+ if( usableSize<500 ){
+ goto page1_init_failed;
+ }
+ pBt->pageSize = (u16)pageSize;
+ pBt->usableSize = (u16)usableSize;
+#ifndef SQLITE_OMIT_AUTOVACUUM
+ pBt->autoVacuum = (get4byte(&page1[36 + 4*4])?1:0);
+ pBt->incrVacuum = (get4byte(&page1[36 + 7*4])?1:0);
+#endif
+ }
+
+ /* maxLocal is the maximum amount of payload to store locally for
+ ** a cell. Make sure it is small enough so that at least minFanout
+ ** cells can will fit on one page. We assume a 10-byte page header.
+ ** Besides the payload, the cell must store:
+ ** 2-byte pointer to the cell
+ ** 4-byte child pointer
+ ** 9-byte nKey value
+ ** 4-byte nData value
+ ** 4-byte overflow page pointer
+ ** So a cell consists of a 2-byte poiner, a header which is as much as
+ ** 17 bytes long, 0 to N bytes of payload, and an optional 4 byte overflow
+ ** page pointer.
+ */
+ pBt->maxLocal = (pBt->usableSize-12)*64/255 - 23;
+ pBt->minLocal = (pBt->usableSize-12)*32/255 - 23;
+ pBt->maxLeaf = pBt->usableSize - 35;
+ pBt->minLeaf = (pBt->usableSize-12)*32/255 - 23;
+ assert( pBt->maxLeaf + 23 <= MX_CELL_SIZE(pBt) );
+ pBt->pPage1 = pPage1;
+ return SQLITE_OK;
+
+page1_init_failed:
+ releasePage(pPage1);
+ pBt->pPage1 = 0;
+ return rc;
+}
+
+/*
+** This routine works like lockBtree() except that it also invokes the
+** busy callback if there is lock contention.
+*/
+static int lockBtreeWithRetry(Btree *pRef){
+ int rc = SQLITE_OK;
+
+ assert( sqlite3BtreeHoldsMutex(pRef) );
+ if( pRef->inTrans==TRANS_NONE ){
+ u8 inTransaction = pRef->pBt->inTransaction;
+ btreeIntegrity(pRef);
+ rc = sqlite3BtreeBeginTrans(pRef, 0);
+ pRef->pBt->inTransaction = inTransaction;
+ pRef->inTrans = TRANS_NONE;
+ if( rc==SQLITE_OK ){
+ pRef->pBt->nTransaction--;
+ }
+ btreeIntegrity(pRef);
+ }
+ return rc;
+}
+
+
+/*
+** If there are no outstanding cursors and we are not in the middle
+** of a transaction but there is a read lock on the database, then
+** this routine unrefs the first page of the database file which
+** has the effect of releasing the read lock.
+**
+** If there are any outstanding cursors, this routine is a no-op.
+**
+** If there is a transaction in progress, this routine is a no-op.
+*/
+static void unlockBtreeIfUnused(BtShared *pBt){
+ assert( sqlite3_mutex_held(pBt->mutex) );
+ if( pBt->inTransaction==TRANS_NONE && pBt->pCursor==0 && pBt->pPage1!=0 ){
+ if( sqlite3PagerRefcount(pBt->pPager)>=1 ){
+ assert( pBt->pPage1->aData );
+#if 0
+ if( pBt->pPage1->aData==0 ){
+ MemPage *pPage = pBt->pPage1;
+ pPage->aData = sqlite3PagerGetData(pPage->pDbPage);
+ pPage->pBt = pBt;
+ pPage->pgno = 1;
+ }
+#endif
+ releasePage(pBt->pPage1);
+ }
+ pBt->pPage1 = 0;
+ pBt->inStmt = 0;
+ }
+}
+
+/*
+** Create a new database by initializing the first page of the
+** file.
+*/
+static int newDatabase(BtShared *pBt){
+ MemPage *pP1;
+ unsigned char *data;
+ int rc;
+ int nPage;
+
+ assert( sqlite3_mutex_held(pBt->mutex) );
+ rc = sqlite3PagerPagecount(pBt->pPager, &nPage);
+ if( rc!=SQLITE_OK || nPage>0 ){
+ return rc;
+ }
+ pP1 = pBt->pPage1;
+ assert( pP1!=0 );
+ data = pP1->aData;
+ rc = sqlite3PagerWrite(pP1->pDbPage);
+ if( rc ) return rc;
+ memcpy(data, zMagicHeader, sizeof(zMagicHeader));
+ assert( sizeof(zMagicHeader)==16 );
+ put2byte(&data[16], pBt->pageSize);
+ data[18] = 1;
+ data[19] = 1;
+ assert( pBt->usableSize<=pBt->pageSize && pBt->usableSize+255>=pBt->pageSize);
+ data[20] = (u8)(pBt->pageSize - pBt->usableSize);
+ data[21] = 64;
+ data[22] = 32;
+ data[23] = 32;
+ memset(&data[24], 0, 100-24);
+ zeroPage(pP1, PTF_INTKEY|PTF_LEAF|PTF_LEAFDATA );
+ pBt->pageSizeFixed = 1;
+#ifndef SQLITE_OMIT_AUTOVACUUM
+ assert( pBt->autoVacuum==1 || pBt->autoVacuum==0 );
+ assert( pBt->incrVacuum==1 || pBt->incrVacuum==0 );
+ put4byte(&data[36 + 4*4], pBt->autoVacuum);
+ put4byte(&data[36 + 7*4], pBt->incrVacuum);
+#endif
+ return SQLITE_OK;
+}
+
+/*
+** Attempt to start a new transaction. A write-transaction
+** is started if the second argument is nonzero, otherwise a read-
+** transaction. If the second argument is 2 or more and exclusive
+** transaction is started, meaning that no other process is allowed
+** to access the database. A preexisting transaction may not be
+** upgraded to exclusive by calling this routine a second time - the
+** exclusivity flag only works for a new transaction.
+**
+** A write-transaction must be started before attempting any
+** changes to the database. None of the following routines
+** will work unless a transaction is started first:
+**
+** sqlite3BtreeCreateTable()
+** sqlite3BtreeCreateIndex()
+** sqlite3BtreeClearTable()
+** sqlite3BtreeDropTable()
+** sqlite3BtreeInsert()
+** sqlite3BtreeDelete()
+** sqlite3BtreeUpdateMeta()
+**
+** If an initial attempt to acquire the lock fails because of lock contention
+** and the database was previously unlocked, then invoke the busy handler
+** if there is one. But if there was previously a read-lock, do not
+** invoke the busy handler - just return SQLITE_BUSY. SQLITE_BUSY is
+** returned when there is already a read-lock in order to avoid a deadlock.
+**
+** Suppose there are two processes A and B. A has a read lock and B has
+** a reserved lock. B tries to promote to exclusive but is blocked because
+** of A's read lock. A tries to promote to reserved but is blocked by B.
+** One or the other of the two processes must give way or there can be
+** no progress. By returning SQLITE_BUSY and not invoking the busy callback
+** when A already has a read lock, we encourage A to give up and let B
+** proceed.
+*/
+SQLITE_PRIVATE int sqlite3BtreeBeginTrans(Btree *p, int wrflag){
+ BtShared *pBt = p->pBt;
+ int rc = SQLITE_OK;
+
+ sqlite3BtreeEnter(p);
+ pBt->db = p->db;
+ btreeIntegrity(p);
+
+ /* If the btree is already in a write-transaction, or it
+ ** is already in a read-transaction and a read-transaction
+ ** is requested, this is a no-op.
+ */
+ if( p->inTrans==TRANS_WRITE || (p->inTrans==TRANS_READ && !wrflag) ){
+ goto trans_begun;
+ }
+
+ /* Write transactions are not possible on a read-only database */
+ if( pBt->readOnly && wrflag ){
+ rc = SQLITE_READONLY;
+ goto trans_begun;
+ }
+
+ /* If another database handle has already opened a write transaction
+ ** on this shared-btree structure and a second write transaction is
+ ** requested, return SQLITE_BUSY.
+ */
+ if( pBt->inTransaction==TRANS_WRITE && wrflag ){
+ rc = SQLITE_BUSY;
+ goto trans_begun;
+ }
+
+#ifndef SQLITE_OMIT_SHARED_CACHE
+ if( wrflag>1 ){
+ BtLock *pIter;
+ for(pIter=pBt->pLock; pIter; pIter=pIter->pNext){
+ if( pIter->pBtree!=p ){
+ rc = SQLITE_BUSY;
+ goto trans_begun;
+ }
+ }
+ }
+#endif
+
+ do {
+ if( pBt->pPage1==0 ){
+ do{
+ rc = lockBtree(pBt);
+ }while( pBt->pPage1==0 && rc==SQLITE_OK );
+ }
+
+ if( rc==SQLITE_OK && wrflag ){
+ if( pBt->readOnly ){
+ rc = SQLITE_READONLY;
+ }else{
+ rc = sqlite3PagerBegin(pBt->pPager, wrflag>1);
+ if( rc==SQLITE_OK ){
+ rc = newDatabase(pBt);
+ }
+ }
+ }
+
+ if( rc==SQLITE_OK ){
+ if( wrflag ) pBt->inStmt = 0;
+ }else{
+ unlockBtreeIfUnused(pBt);
+ }
+ }while( rc==SQLITE_BUSY && pBt->inTransaction==TRANS_NONE &&
+ btreeInvokeBusyHandler(pBt) );
+
+ if( rc==SQLITE_OK ){
+ if( p->inTrans==TRANS_NONE ){
+ pBt->nTransaction++;
+ }
+ p->inTrans = (wrflag?TRANS_WRITE:TRANS_READ);
+ if( p->inTrans>pBt->inTransaction ){
+ pBt->inTransaction = p->inTrans;
+ }
+#ifndef SQLITE_OMIT_SHARED_CACHE
+ if( wrflag>1 ){
+ assert( !pBt->pExclusive );
+ pBt->pExclusive = p;
+ }
+#endif
+ }
+
+
+trans_begun:
+ if( rc==SQLITE_OK && wrflag ){
+ /* This call makes sure that the pager has the correct number of
+ ** open savepoints. If the second parameter is greater than 0 and
+ ** the sub-journal is not already open, then it will be opened here.
+ */
+ rc = sqlite3PagerOpenSavepoint(pBt->pPager, p->db->nSavepoint);
+ }
+
+ btreeIntegrity(p);
+ sqlite3BtreeLeave(p);
+ return rc;
+}
+
+#ifndef SQLITE_OMIT_AUTOVACUUM
+
+/*
+** Set the pointer-map entries for all children of page pPage. Also, if
+** pPage contains cells that point to overflow pages, set the pointer
+** map entries for the overflow pages as well.
+*/
+static int setChildPtrmaps(MemPage *pPage){
+ int i; /* Counter variable */
+ int nCell; /* Number of cells in page pPage */
+ int rc; /* Return code */
+ BtShared *pBt = pPage->pBt;
+ u8 isInitOrig = pPage->isInit;
+ Pgno pgno = pPage->pgno;
+
+ assert( sqlite3_mutex_held(pPage->pBt->mutex) );
+ rc = sqlite3BtreeInitPage(pPage);
+ if( rc!=SQLITE_OK ){
+ goto set_child_ptrmaps_out;
+ }
+ nCell = pPage->nCell;
+
+ for(i=0; i<nCell; i++){
+ u8 *pCell = findCell(pPage, i);
+
+ rc = ptrmapPutOvflPtr(pPage, pCell);
+ if( rc!=SQLITE_OK ){
+ goto set_child_ptrmaps_out;
+ }
+
+ if( !pPage->leaf ){
+ Pgno childPgno = get4byte(pCell);
+ rc = ptrmapPut(pBt, childPgno, PTRMAP_BTREE, pgno);
+ if( rc!=SQLITE_OK ) goto set_child_ptrmaps_out;
+ }
+ }
+
+ if( !pPage->leaf ){
+ Pgno childPgno = get4byte(&pPage->aData[pPage->hdrOffset+8]);
+ rc = ptrmapPut(pBt, childPgno, PTRMAP_BTREE, pgno);
+ }
+
+set_child_ptrmaps_out:
+ pPage->isInit = isInitOrig;
+ return rc;
+}
+
+/*
+** Somewhere on pPage, which is guarenteed to be a btree page, not an overflow
+** page, is a pointer to page iFrom. Modify this pointer so that it points to
+** iTo. Parameter eType describes the type of pointer to be modified, as
+** follows:
+**
+** PTRMAP_BTREE: pPage is a btree-page. The pointer points at a child
+** page of pPage.
+**
+** PTRMAP_OVERFLOW1: pPage is a btree-page. The pointer points at an overflow
+** page pointed to by one of the cells on pPage.
+**
+** PTRMAP_OVERFLOW2: pPage is an overflow-page. The pointer points at the next
+** overflow page in the list.
+*/
+static int modifyPagePointer(MemPage *pPage, Pgno iFrom, Pgno iTo, u8 eType){
+ assert( sqlite3_mutex_held(pPage->pBt->mutex) );
+ assert( sqlite3PagerIswriteable(pPage->pDbPage) );
+ if( eType==PTRMAP_OVERFLOW2 ){
+ /* The pointer is always the first 4 bytes of the page in this case. */
+ if( get4byte(pPage->aData)!=iFrom ){
+ return SQLITE_CORRUPT_BKPT;
+ }
+ put4byte(pPage->aData, iTo);
+ }else{
+ u8 isInitOrig = pPage->isInit;
+ int i;
+ int nCell;
+
+ sqlite3BtreeInitPage(pPage);
+ nCell = pPage->nCell;
+
+ for(i=0; i<nCell; i++){
+ u8 *pCell = findCell(pPage, i);
+ if( eType==PTRMAP_OVERFLOW1 ){
+ CellInfo info;
+ sqlite3BtreeParseCellPtr(pPage, pCell, &info);
+ if( info.iOverflow ){
+ if( iFrom==get4byte(&pCell[info.iOverflow]) ){
+ put4byte(&pCell[info.iOverflow], iTo);
+ break;
+ }
+ }
+ }else{
+ if( get4byte(pCell)==iFrom ){
+ put4byte(pCell, iTo);
+ break;
+ }
+ }
+ }
+
+ if( i==nCell ){
+ if( eType!=PTRMAP_BTREE ||
+ get4byte(&pPage->aData[pPage->hdrOffset+8])!=iFrom ){
+ return SQLITE_CORRUPT_BKPT;
+ }
+ put4byte(&pPage->aData[pPage->hdrOffset+8], iTo);
+ }
+
+ pPage->isInit = isInitOrig;
+ }
+ return SQLITE_OK;
+}
+
+
+/*
+** Move the open database page pDbPage to location iFreePage in the
+** database. The pDbPage reference remains valid.
+*/
+static int relocatePage(
+ BtShared *pBt, /* Btree */
+ MemPage *pDbPage, /* Open page to move */
+ u8 eType, /* Pointer map 'type' entry for pDbPage */
+ Pgno iPtrPage, /* Pointer map 'page-no' entry for pDbPage */
+ Pgno iFreePage, /* The location to move pDbPage to */
+ int isCommit
+){
+ MemPage *pPtrPage; /* The page that contains a pointer to pDbPage */
+ Pgno iDbPage = pDbPage->pgno;
+ Pager *pPager = pBt->pPager;
+ int rc;
+
+ assert( eType==PTRMAP_OVERFLOW2 || eType==PTRMAP_OVERFLOW1 ||
+ eType==PTRMAP_BTREE || eType==PTRMAP_ROOTPAGE );
+ assert( sqlite3_mutex_held(pBt->mutex) );
+ assert( pDbPage->pBt==pBt );
+
+ /* Move page iDbPage from its current location to page number iFreePage */
+ TRACE(("AUTOVACUUM: Moving %d to free page %d (ptr page %d type %d)\n",
+ iDbPage, iFreePage, iPtrPage, eType));
+ rc = sqlite3PagerMovepage(pPager, pDbPage->pDbPage, iFreePage, isCommit);
+ if( rc!=SQLITE_OK ){
+ return rc;
+ }
+ pDbPage->pgno = iFreePage;
+
+ /* If pDbPage was a btree-page, then it may have child pages and/or cells
+ ** that point to overflow pages. The pointer map entries for all these
+ ** pages need to be changed.
+ **
+ ** If pDbPage is an overflow page, then the first 4 bytes may store a
+ ** pointer to a subsequent overflow page. If this is the case, then
+ ** the pointer map needs to be updated for the subsequent overflow page.
+ */
+ if( eType==PTRMAP_BTREE || eType==PTRMAP_ROOTPAGE ){
+ rc = setChildPtrmaps(pDbPage);
+ if( rc!=SQLITE_OK ){
+ return rc;
+ }
+ }else{
+ Pgno nextOvfl = get4byte(pDbPage->aData);
+ if( nextOvfl!=0 ){
+ rc = ptrmapPut(pBt, nextOvfl, PTRMAP_OVERFLOW2, iFreePage);
+ if( rc!=SQLITE_OK ){
+ return rc;
+ }
+ }
+ }
+
+ /* Fix the database pointer on page iPtrPage that pointed at iDbPage so
+ ** that it points at iFreePage. Also fix the pointer map entry for
+ ** iPtrPage.
+ */
+ if( eType!=PTRMAP_ROOTPAGE ){
+ rc = sqlite3BtreeGetPage(pBt, iPtrPage, &pPtrPage, 0);
+ if( rc!=SQLITE_OK ){
+ return rc;
+ }
+ rc = sqlite3PagerWrite(pPtrPage->pDbPage);
+ if( rc!=SQLITE_OK ){
+ releasePage(pPtrPage);
+ return rc;
+ }
+ rc = modifyPagePointer(pPtrPage, iDbPage, iFreePage, eType);
+ releasePage(pPtrPage);
+ if( rc==SQLITE_OK ){
+ rc = ptrmapPut(pBt, iFreePage, eType, iPtrPage);
+ }
+ }
+ return rc;
+}
+
+/* Forward declaration required by incrVacuumStep(). */
+static int allocateBtreePage(BtShared *, MemPage **, Pgno *, Pgno, u8);
+
+/*
+** Perform a single step of an incremental-vacuum. If successful,
+** return SQLITE_OK. If there is no work to do (and therefore no
+** point in calling this function again), return SQLITE_DONE.
+**
+** More specificly, this function attempts to re-organize the
+** database so that the last page of the file currently in use
+** is no longer in use.
+**
+** If the nFin parameter is non-zero, the implementation assumes
+** that the caller will keep calling incrVacuumStep() until
+** it returns SQLITE_DONE or an error, and that nFin is the
+** number of pages the database file will contain after this
+** process is complete.
+*/
+static int incrVacuumStep(BtShared *pBt, Pgno nFin, Pgno iLastPg){
+ Pgno nFreeList; /* Number of pages still on the free-list */
+
+ assert( sqlite3_mutex_held(pBt->mutex) );
+
+ if( !PTRMAP_ISPAGE(pBt, iLastPg) && iLastPg!=PENDING_BYTE_PAGE(pBt) ){
+ int rc;
+ u8 eType;
+ Pgno iPtrPage;
+
+ nFreeList = get4byte(&pBt->pPage1->aData[36]);
+ if( nFreeList==0 || nFin==iLastPg ){
+ return SQLITE_DONE;
+ }
+
+ rc = ptrmapGet(pBt, iLastPg, &eType, &iPtrPage);
+ if( rc!=SQLITE_OK ){
+ return rc;
+ }
+ if( eType==PTRMAP_ROOTPAGE ){
+ return SQLITE_CORRUPT_BKPT;
+ }
+
+ if( eType==PTRMAP_FREEPAGE ){
+ if( nFin==0 ){
+ /* Remove the page from the files free-list. This is not required
+ ** if nFin is non-zero. In that case, the free-list will be
+ ** truncated to zero after this function returns, so it doesn't
+ ** matter if it still contains some garbage entries.
+ */
+ Pgno iFreePg;
+ MemPage *pFreePg;
+ rc = allocateBtreePage(pBt, &pFreePg, &iFreePg, iLastPg, 1);
+ if( rc!=SQLITE_OK ){
+ return rc;
+ }
+ assert( iFreePg==iLastPg );
+ releasePage(pFreePg);
+ }
+ } else {
+ Pgno iFreePg; /* Index of free page to move pLastPg to */
+ MemPage *pLastPg;
+
+ rc = sqlite3BtreeGetPage(pBt, iLastPg, &pLastPg, 0);
+ if( rc!=SQLITE_OK ){
+ return rc;
+ }
+
+ /* If nFin is zero, this loop runs exactly once and page pLastPg
+ ** is swapped with the first free page pulled off the free list.
+ **
+ ** On the other hand, if nFin is greater than zero, then keep
+ ** looping until a free-page located within the first nFin pages
+ ** of the file is found.
+ */
+ do {
+ MemPage *pFreePg;
+ rc = allocateBtreePage(pBt, &pFreePg, &iFreePg, 0, 0);
+ if( rc!=SQLITE_OK ){
+ releasePage(pLastPg);
+ return rc;
+ }
+ releasePage(pFreePg);
+ }while( nFin!=0 && iFreePg>nFin );
+ assert( iFreePg<iLastPg );
+
+ rc = sqlite3PagerWrite(pLastPg->pDbPage);
+ if( rc==SQLITE_OK ){
+ rc = relocatePage(pBt, pLastPg, eType, iPtrPage, iFreePg, nFin!=0);
+ }
+ releasePage(pLastPg);
+ if( rc!=SQLITE_OK ){
+ return rc;
+ }
+ }
+ }
+
+ if( nFin==0 ){
+ iLastPg--;
+ while( iLastPg==PENDING_BYTE_PAGE(pBt)||PTRMAP_ISPAGE(pBt, iLastPg) ){
+ iLastPg--;
+ }
+ sqlite3PagerTruncateImage(pBt->pPager, iLastPg);
+ }
+ return SQLITE_OK;
+}
+
+/*
+** A write-transaction must be opened before calling this function.
+** It performs a single unit of work towards an incremental vacuum.
+**
+** If the incremental vacuum is finished after this function has run,
+** SQLITE_DONE is returned. If it is not finished, but no error occured,
+** SQLITE_OK is returned. Otherwise an SQLite error code.
+*/
+SQLITE_PRIVATE int sqlite3BtreeIncrVacuum(Btree *p){
+ int rc;
+ BtShared *pBt = p->pBt;
+
+ sqlite3BtreeEnter(p);
+ pBt->db = p->db;
+ assert( pBt->inTransaction==TRANS_WRITE && p->inTrans==TRANS_WRITE );
+ if( !pBt->autoVacuum ){
+ rc = SQLITE_DONE;
+ }else{
+ invalidateAllOverflowCache(pBt);
+ rc = incrVacuumStep(pBt, 0, pagerPagecount(pBt));
+ }
+ sqlite3BtreeLeave(p);
+ return rc;
+}
+
+/*
+** This routine is called prior to sqlite3PagerCommit when a transaction
+** is commited for an auto-vacuum database.
+**
+** If SQLITE_OK is returned, then *pnTrunc is set to the number of pages
+** the database file should be truncated to during the commit process.
+** i.e. the database has been reorganized so that only the first *pnTrunc
+** pages are in use.
+*/
+static int autoVacuumCommit(BtShared *pBt){
+ int rc = SQLITE_OK;
+ Pager *pPager = pBt->pPager;
+ VVA_ONLY( int nRef = sqlite3PagerRefcount(pPager) );
+
+ assert( sqlite3_mutex_held(pBt->mutex) );
+ invalidateAllOverflowCache(pBt);
+ assert(pBt->autoVacuum);
+ if( !pBt->incrVacuum ){
+ Pgno nFin;
+ Pgno nFree;
+ Pgno nPtrmap;
+ Pgno iFree;
+ const int pgsz = pBt->pageSize;
+ Pgno nOrig = pagerPagecount(pBt);
+
+ if( PTRMAP_ISPAGE(pBt, nOrig) ){
+ return SQLITE_CORRUPT_BKPT;
+ }
+ if( nOrig==PENDING_BYTE_PAGE(pBt) ){
+ nOrig--;
+ }
+ nFree = get4byte(&pBt->pPage1->aData[36]);
+ nPtrmap = (nFree-nOrig+PTRMAP_PAGENO(pBt, nOrig)+pgsz/5)/(pgsz/5);
+ nFin = nOrig - nFree - nPtrmap;
+ if( nOrig>PENDING_BYTE_PAGE(pBt) && nFin<=PENDING_BYTE_PAGE(pBt) ){
+ nFin--;
+ }
+ while( PTRMAP_ISPAGE(pBt, nFin) || nFin==PENDING_BYTE_PAGE(pBt) ){
+ nFin--;
+ }
+
+ for(iFree=nOrig; iFree>nFin && rc==SQLITE_OK; iFree--){
+ rc = incrVacuumStep(pBt, nFin, iFree);
+ }
+ if( (rc==SQLITE_DONE || rc==SQLITE_OK) && nFree>0 ){
+ rc = SQLITE_OK;
+ rc = sqlite3PagerWrite(pBt->pPage1->pDbPage);
+ put4byte(&pBt->pPage1->aData[32], 0);
+ put4byte(&pBt->pPage1->aData[36], 0);
+ sqlite3PagerTruncateImage(pBt->pPager, nFin);
+ }
+ if( rc!=SQLITE_OK ){
+ sqlite3PagerRollback(pPager);
+ }
+ }
+
+ assert( nRef==sqlite3PagerRefcount(pPager) );
+ return rc;
+}
+
+#endif /* ifndef SQLITE_OMIT_AUTOVACUUM */
+
+/*
+** This routine does the first phase of a two-phase commit. This routine
+** causes a rollback journal to be created (if it does not already exist)
+** and populated with enough information so that if a power loss occurs
+** the database can be restored to its original state by playing back
+** the journal. Then the contents of the journal are flushed out to
+** the disk. After the journal is safely on oxide, the changes to the
+** database are written into the database file and flushed to oxide.
+** At the end of this call, the rollback journal still exists on the
+** disk and we are still holding all locks, so the transaction has not
+** committed. See sqlite3BtreeCommit() for the second phase of the
+** commit process.
+**
+** This call is a no-op if no write-transaction is currently active on pBt.
+**
+** Otherwise, sync the database file for the btree pBt. zMaster points to
+** the name of a master journal file that should be written into the
+** individual journal file, or is NULL, indicating no master journal file
+** (single database transaction).
+**
+** When this is called, the master journal should already have been
+** created, populated with this journal pointer and synced to disk.
+**
+** Once this is routine has returned, the only thing required to commit
+** the write-transaction for this database file is to delete the journal.
+*/
+SQLITE_PRIVATE int sqlite3BtreeCommitPhaseOne(Btree *p, const char *zMaster){
+ int rc = SQLITE_OK;
+ if( p->inTrans==TRANS_WRITE ){
+ BtShared *pBt = p->pBt;
+ sqlite3BtreeEnter(p);
+ pBt->db = p->db;
+#ifndef SQLITE_OMIT_AUTOVACUUM
+ if( pBt->autoVacuum ){
+ rc = autoVacuumCommit(pBt);
+ if( rc!=SQLITE_OK ){
+ sqlite3BtreeLeave(p);
+ return rc;
+ }
+ }
+#endif
+ rc = sqlite3PagerCommitPhaseOne(pBt->pPager, zMaster, 0);
+ sqlite3BtreeLeave(p);
+ }
+ return rc;
+}
+
+/*
+** Commit the transaction currently in progress.
+**
+** This routine implements the second phase of a 2-phase commit. The
+** sqlite3BtreeSync() routine does the first phase and should be invoked
+** prior to calling this routine. The sqlite3BtreeSync() routine did
+** all the work of writing information out to disk and flushing the
+** contents so that they are written onto the disk platter. All this
+** routine has to do is delete or truncate the rollback journal
+** (which causes the transaction to commit) and drop locks.
+**
+** This will release the write lock on the database file. If there
+** are no active cursors, it also releases the read lock.
+*/
+SQLITE_PRIVATE int sqlite3BtreeCommitPhaseTwo(Btree *p){
+ BtShared *pBt = p->pBt;
+
+ sqlite3BtreeEnter(p);
+ pBt->db = p->db;
+ btreeIntegrity(p);
+
+ /* If the handle has a write-transaction open, commit the shared-btrees
+ ** transaction and set the shared state to TRANS_READ.
+ */
+ if( p->inTrans==TRANS_WRITE ){
+ int rc;
+ assert( pBt->inTransaction==TRANS_WRITE );
+ assert( pBt->nTransaction>0 );
+ rc = sqlite3PagerCommitPhaseTwo(pBt->pPager);
+ if( rc!=SQLITE_OK ){
+ sqlite3BtreeLeave(p);
+ return rc;
+ }
+ pBt->inTransaction = TRANS_READ;
+ pBt->inStmt = 0;
+ }
+ unlockAllTables(p);
+
+ /* If the handle has any kind of transaction open, decrement the transaction
+ ** count of the shared btree. If the transaction count reaches 0, set
+ ** the shared state to TRANS_NONE. The unlockBtreeIfUnused() call below
+ ** will unlock the pager.
+ */
+ if( p->inTrans!=TRANS_NONE ){
+ pBt->nTransaction--;
+ if( 0==pBt->nTransaction ){
+ pBt->inTransaction = TRANS_NONE;
+ }
+ }
+
+ /* Set the handles current transaction state to TRANS_NONE and unlock
+ ** the pager if this call closed the only read or write transaction.
+ */
+ btreeClearHasContent(pBt);
+ p->inTrans = TRANS_NONE;
+ unlockBtreeIfUnused(pBt);
+
+ btreeIntegrity(p);
+ sqlite3BtreeLeave(p);
+ return SQLITE_OK;
+}
+
+/*
+** Do both phases of a commit.
+*/
+SQLITE_PRIVATE int sqlite3BtreeCommit(Btree *p){
+ int rc;
+ sqlite3BtreeEnter(p);
+ rc = sqlite3BtreeCommitPhaseOne(p, 0);
+ if( rc==SQLITE_OK ){
+ rc = sqlite3BtreeCommitPhaseTwo(p);
+ }
+ sqlite3BtreeLeave(p);
+ return rc;
+}
+
+#ifndef NDEBUG
+/*
+** Return the number of write-cursors open on this handle. This is for use
+** in assert() expressions, so it is only compiled if NDEBUG is not
+** defined.
+**
+** For the purposes of this routine, a write-cursor is any cursor that
+** is capable of writing to the databse. That means the cursor was
+** originally opened for writing and the cursor has not be disabled
+** by having its state changed to CURSOR_FAULT.
+*/
+static int countWriteCursors(BtShared *pBt){
+ BtCursor *pCur;
+ int r = 0;
+ for(pCur=pBt->pCursor; pCur; pCur=pCur->pNext){
+ if( pCur->wrFlag && pCur->eState!=CURSOR_FAULT ) r++;
+ }
+ return r;
+}
+#endif
+
+/*
+** This routine sets the state to CURSOR_FAULT and the error
+** code to errCode for every cursor on BtShared that pBtree
+** references.
+**
+** Every cursor is tripped, including cursors that belong
+** to other database connections that happen to be sharing
+** the cache with pBtree.
+**
+** This routine gets called when a rollback occurs.
+** All cursors using the same cache must be tripped
+** to prevent them from trying to use the btree after
+** the rollback. The rollback may have deleted tables
+** or moved root pages, so it is not sufficient to
+** save the state of the cursor. The cursor must be
+** invalidated.
+*/
+SQLITE_PRIVATE void sqlite3BtreeTripAllCursors(Btree *pBtree, int errCode){
+ BtCursor *p;
+ sqlite3BtreeEnter(pBtree);
+ for(p=pBtree->pBt->pCursor; p; p=p->pNext){
+ int i;
+ sqlite3BtreeClearCursor(p);
+ p->eState = CURSOR_FAULT;
+ p->skip = errCode;
+ for(i=0; i<=p->iPage; i++){
+ releasePage(p->apPage[i]);
+ p->apPage[i] = 0;
+ }
+ }
+ sqlite3BtreeLeave(pBtree);
+}
+
+/*
+** Rollback the transaction in progress. All cursors will be
+** invalided by this operation. Any attempt to use a cursor
+** that was open at the beginning of this operation will result
+** in an error.
+**
+** This will release the write lock on the database file. If there
+** are no active cursors, it also releases the read lock.
+*/
+SQLITE_PRIVATE int sqlite3BtreeRollback(Btree *p){
+ int rc;
+ BtShared *pBt = p->pBt;
+ MemPage *pPage1;
+
+ sqlite3BtreeEnter(p);
+ pBt->db = p->db;
+ rc = saveAllCursors(pBt, 0, 0);
+#ifndef SQLITE_OMIT_SHARED_CACHE
+ if( rc!=SQLITE_OK ){
+ /* This is a horrible situation. An IO or malloc() error occured whilst
+ ** trying to save cursor positions. If this is an automatic rollback (as
+ ** the result of a constraint, malloc() failure or IO error) then
+ ** the cache may be internally inconsistent (not contain valid trees) so
+ ** we cannot simply return the error to the caller. Instead, abort
+ ** all queries that may be using any of the cursors that failed to save.
+ */
+ sqlite3BtreeTripAllCursors(p, rc);
+ }
+#endif
+ btreeIntegrity(p);
+ unlockAllTables(p);
+
+ if( p->inTrans==TRANS_WRITE ){
+ int rc2;
+
+ assert( TRANS_WRITE==pBt->inTransaction );
+ rc2 = sqlite3PagerRollback(pBt->pPager);
+ if( rc2!=SQLITE_OK ){
+ rc = rc2;
+ }
+
+ /* The rollback may have destroyed the pPage1->aData value. So
+ ** call sqlite3BtreeGetPage() on page 1 again to make
+ ** sure pPage1->aData is set correctly. */
+ if( sqlite3BtreeGetPage(pBt, 1, &pPage1, 0)==SQLITE_OK ){
+ releasePage(pPage1);
+ }
+ assert( countWriteCursors(pBt)==0 );
+ pBt->inTransaction = TRANS_READ;
+ }
+
+ if( p->inTrans!=TRANS_NONE ){
+ assert( pBt->nTransaction>0 );
+ pBt->nTransaction--;
+ if( 0==pBt->nTransaction ){
+ pBt->inTransaction = TRANS_NONE;
+ }
+ }
+
+ btreeClearHasContent(pBt);
+ p->inTrans = TRANS_NONE;
+ pBt->inStmt = 0;
+ unlockBtreeIfUnused(pBt);
+
+ btreeIntegrity(p);
+ sqlite3BtreeLeave(p);
+ return rc;
+}
+
+/*
+** Start a statement subtransaction. The subtransaction can
+** can be rolled back independently of the main transaction.
+** You must start a transaction before starting a subtransaction.
+** The subtransaction is ended automatically if the main transaction
+** commits or rolls back.
+**
+** Only one subtransaction may be active at a time. It is an error to try
+** to start a new subtransaction if another subtransaction is already active.
+**
+** Statement subtransactions are used around individual SQL statements
+** that are contained within a BEGIN...COMMIT block. If a constraint
+** error occurs within the statement, the effect of that one statement
+** can be rolled back without having to rollback the entire transaction.
+*/
+SQLITE_PRIVATE int sqlite3BtreeBeginStmt(Btree *p){
+ int rc;
+ BtShared *pBt = p->pBt;
+ sqlite3BtreeEnter(p);
+ pBt->db = p->db;
+ assert( p->inTrans==TRANS_WRITE );
+ assert( !pBt->inStmt );
+ assert( pBt->readOnly==0 );
+ if( NEVER(p->inTrans!=TRANS_WRITE || pBt->inStmt || pBt->readOnly) ){
+ rc = SQLITE_INTERNAL;
+ }else{
+ assert( pBt->inTransaction==TRANS_WRITE );
+ /* At the pager level, a statement transaction is a savepoint with
+ ** an index greater than all savepoints created explicitly using
+ ** SQL statements. It is illegal to open, release or rollback any
+ ** such savepoints while the statement transaction savepoint is active.
+ */
+ rc = sqlite3PagerOpenSavepoint(pBt->pPager, p->db->nSavepoint+1);
+ pBt->inStmt = 1;
+ }
+ sqlite3BtreeLeave(p);
+ return rc;
+}
+
+/*
+** Commit the statment subtransaction currently in progress. If no
+** subtransaction is active, this is a no-op.
+*/
+SQLITE_PRIVATE int sqlite3BtreeCommitStmt(Btree *p){
+ int rc;
+ BtShared *pBt = p->pBt;
+ sqlite3BtreeEnter(p);
+ pBt->db = p->db;
+ assert( pBt->readOnly==0 );
+ if( pBt->inStmt ){
+ int iStmtpoint = p->db->nSavepoint;
+ rc = sqlite3PagerSavepoint(pBt->pPager, SAVEPOINT_RELEASE, iStmtpoint);
+ }else{
+ rc = SQLITE_OK;
+ }
+ pBt->inStmt = 0;
+ sqlite3BtreeLeave(p);
+ return rc;
+}
+
+/*
+** Rollback the active statement subtransaction. If no subtransaction
+** is active this routine is a no-op.
+**
+** All cursors will be invalidated by this operation. Any attempt
+** to use a cursor that was open at the beginning of this operation
+** will result in an error.
+*/
+SQLITE_PRIVATE int sqlite3BtreeRollbackStmt(Btree *p){
+ int rc = SQLITE_OK;
+ BtShared *pBt = p->pBt;
+ sqlite3BtreeEnter(p);
+ pBt->db = p->db;
+ assert( pBt->readOnly==0 );
+ if( pBt->inStmt ){
+ int iStmtpoint = p->db->nSavepoint;
+ rc = sqlite3PagerSavepoint(pBt->pPager, SAVEPOINT_ROLLBACK, iStmtpoint);
+ if( rc==SQLITE_OK ){
+ rc = sqlite3PagerSavepoint(pBt->pPager, SAVEPOINT_RELEASE, iStmtpoint);
+ }
+ pBt->inStmt = 0;
+ }
+ sqlite3BtreeLeave(p);
+ return rc;
+}
+
+/*
+** The second argument to this function, op, is always SAVEPOINT_ROLLBACK
+** or SAVEPOINT_RELEASE. This function either releases or rolls back the
+** savepoint identified by parameter iSavepoint, depending on the value
+** of op.
+**
+** Normally, iSavepoint is greater than or equal to zero. However, if op is
+** SAVEPOINT_ROLLBACK, then iSavepoint may also be -1. In this case the
+** contents of the entire transaction are rolled back. This is different
+** from a normal transaction rollback, as no locks are released and the
+** transaction remains open.
+*/
+SQLITE_PRIVATE int sqlite3BtreeSavepoint(Btree *p, int op, int iSavepoint){
+ int rc = SQLITE_OK;
+ if( p && p->inTrans==TRANS_WRITE ){
+ BtShared *pBt = p->pBt;
+ assert( pBt->inStmt==0 );
+ assert( op==SAVEPOINT_RELEASE || op==SAVEPOINT_ROLLBACK );
+ assert( iSavepoint>=0 || (iSavepoint==-1 && op==SAVEPOINT_ROLLBACK) );
+ sqlite3BtreeEnter(p);
+ pBt->db = p->db;
+ rc = sqlite3PagerSavepoint(pBt->pPager, op, iSavepoint);
+ if( rc==SQLITE_OK ){
+ rc = newDatabase(pBt);
+ }
+ sqlite3BtreeLeave(p);
+ }
+ return rc;
+}
+
+/*
+** Create a new cursor for the BTree whose root is on the page
+** iTable. The act of acquiring a cursor gets a read lock on
+** the database file.
+**
+** If wrFlag==0, then the cursor can only be used for reading.
+** If wrFlag==1, then the cursor can be used for reading or for
+** writing if other conditions for writing are also met. These
+** are the conditions that must be met in order for writing to
+** be allowed:
+**
+** 1: The cursor must have been opened with wrFlag==1
+**
+** 2: Other database connections that share the same pager cache
+** but which are not in the READ_UNCOMMITTED state may not have
+** cursors open with wrFlag==0 on the same table. Otherwise
+** the changes made by this write cursor would be visible to
+** the read cursors in the other database connection.
+**
+** 3: The database must be writable (not on read-only media)
+**
+** 4: There must be an active transaction.
+**
+** No checking is done to make sure that page iTable really is the
+** root page of a b-tree. If it is not, then the cursor acquired
+** will not work correctly.
+**
+** It is assumed that the sqlite3BtreeCursorSize() bytes of memory
+** pointed to by pCur have been zeroed by the caller.
+*/
+static int btreeCursor(
+ Btree *p, /* The btree */
+ int iTable, /* Root page of table to open */
+ int wrFlag, /* 1 to write. 0 read-only */
+ struct KeyInfo *pKeyInfo, /* First arg to comparison function */
+ BtCursor *pCur /* Space for new cursor */
+){
+ int rc;
+ Pgno nPage;
+ BtShared *pBt = p->pBt;
+
+ assert( sqlite3BtreeHoldsMutex(p) );
+ assert( wrFlag==0 || wrFlag==1 );
+ if( wrFlag ){
+ assert( !pBt->readOnly );
+ if( NEVER(pBt->readOnly) ){
+ return SQLITE_READONLY;
+ }
+ if( checkReadLocks(p, iTable, 0, 0) ){
+ return SQLITE_LOCKED;
+ }
+ }
+
+ if( pBt->pPage1==0 ){
+ rc = lockBtreeWithRetry(p);
+ if( rc!=SQLITE_OK ){
+ return rc;
+ }
+ }
+ pCur->pgnoRoot = (Pgno)iTable;
+ rc = sqlite3PagerPagecount(pBt->pPager, (int *)&nPage);
+ if( rc!=SQLITE_OK ){
+ return rc;
+ }
+ if( iTable==1 && nPage==0 ){
+ rc = SQLITE_EMPTY;
+ goto create_cursor_exception;
+ }
+ rc = getAndInitPage(pBt, pCur->pgnoRoot, &pCur->apPage[0]);
+ if( rc!=SQLITE_OK ){
+ goto create_cursor_exception;
+ }
+
+ /* Now that no other errors can occur, finish filling in the BtCursor
+ ** variables, link the cursor into the BtShared list and set *ppCur (the
+ ** output argument to this function).
+ */
+ pCur->pKeyInfo = pKeyInfo;
+ pCur->pBtree = p;
+ pCur->pBt = pBt;
+ pCur->wrFlag = (u8)wrFlag;
+ pCur->pNext = pBt->pCursor;
+ if( pCur->pNext ){
+ pCur->pNext->pPrev = pCur;
+ }
+ pBt->pCursor = pCur;
+ pCur->eState = CURSOR_INVALID;
+
+ return SQLITE_OK;
+
+create_cursor_exception:
+ releasePage(pCur->apPage[0]);
+ unlockBtreeIfUnused(pBt);
+ return rc;
+}
+SQLITE_PRIVATE int sqlite3BtreeCursor(
+ Btree *p, /* The btree */
+ int iTable, /* Root page of table to open */
+ int wrFlag, /* 1 to write. 0 read-only */
+ struct KeyInfo *pKeyInfo, /* First arg to xCompare() */
+ BtCursor *pCur /* Write new cursor here */
+){
+ int rc;
+ sqlite3BtreeEnter(p);
+ p->pBt->db = p->db;
+ rc = btreeCursor(p, iTable, wrFlag, pKeyInfo, pCur);
+ sqlite3BtreeLeave(p);
+ return rc;
+}
+SQLITE_PRIVATE int sqlite3BtreeCursorSize(){
+ return sizeof(BtCursor);
+}
+
+
+
+/*
+** Close a cursor. The read lock on the database file is released
+** when the last cursor is closed.
+*/
+SQLITE_PRIVATE int sqlite3BtreeCloseCursor(BtCursor *pCur){
+ Btree *pBtree = pCur->pBtree;
+ if( pBtree ){
+ int i;
+ BtShared *pBt = pCur->pBt;
+ sqlite3BtreeEnter(pBtree);
+ pBt->db = pBtree->db;
+ sqlite3BtreeClearCursor(pCur);
+ if( pCur->pPrev ){
+ pCur->pPrev->pNext = pCur->pNext;
+ }else{
+ pBt->pCursor = pCur->pNext;
+ }
+ if( pCur->pNext ){
+ pCur->pNext->pPrev = pCur->pPrev;
+ }
+ for(i=0; i<=pCur->iPage; i++){
+ releasePage(pCur->apPage[i]);
+ }
+ unlockBtreeIfUnused(pBt);
+ invalidateOverflowCache(pCur);
+ /* sqlite3_free(pCur); */
+ sqlite3BtreeLeave(pBtree);
+ }
+ return SQLITE_OK;
+}
+
+/*
+** Make a temporary cursor by filling in the fields of pTempCur.
+** The temporary cursor is not on the cursor list for the Btree.
+*/
+SQLITE_PRIVATE void sqlite3BtreeGetTempCursor(BtCursor *pCur, BtCursor *pTempCur){
+ int i;
+ assert( cursorHoldsMutex(pCur) );
+ memcpy(pTempCur, pCur, sizeof(BtCursor));
+ pTempCur->pNext = 0;
+ pTempCur->pPrev = 0;
+ for(i=0; i<=pTempCur->iPage; i++){
+ sqlite3PagerRef(pTempCur->apPage[i]->pDbPage);
+ }
+ assert( pTempCur->pKey==0 );
+}
+
+/*
+** Delete a temporary cursor such as was made by the CreateTemporaryCursor()
+** function above.
+*/
+SQLITE_PRIVATE void sqlite3BtreeReleaseTempCursor(BtCursor *pCur){
+ int i;
+ assert( cursorHoldsMutex(pCur) );
+ for(i=0; i<=pCur->iPage; i++){
+ sqlite3PagerUnref(pCur->apPage[i]->pDbPage);
+ }
+ sqlite3_free(pCur->pKey);
+}
+
+/*
+** Make sure the BtCursor* given in the argument has a valid
+** BtCursor.info structure. If it is not already valid, call
+** sqlite3BtreeParseCell() to fill it in.
+**
+** BtCursor.info is a cache of the information in the current cell.
+** Using this cache reduces the number of calls to sqlite3BtreeParseCell().
+**
+** 2007-06-25: There is a bug in some versions of MSVC that cause the
+** compiler to crash when getCellInfo() is implemented as a macro.
+** But there is a measureable speed advantage to using the macro on gcc
+** (when less compiler optimizations like -Os or -O0 are used and the
+** compiler is not doing agressive inlining.) So we use a real function
+** for MSVC and a macro for everything else. Ticket #2457.
+*/
+#ifndef NDEBUG
+ static void assertCellInfo(BtCursor *pCur){
+ CellInfo info;
+ int iPage = pCur->iPage;
+ memset(&info, 0, sizeof(info));
+ sqlite3BtreeParseCell(pCur->apPage[iPage], pCur->aiIdx[iPage], &info);
+ assert( memcmp(&info, &pCur->info, sizeof(info))==0 );
+ }
+#else
+ #define assertCellInfo(x)
+#endif
+#ifdef _MSC_VER
+ /* Use a real function in MSVC to work around bugs in that compiler. */
+ static void getCellInfo(BtCursor *pCur){
+ if( pCur->info.nSize==0 ){
+ int iPage = pCur->iPage;
+ sqlite3BtreeParseCell(pCur->apPage[iPage],pCur->aiIdx[iPage],&pCur->info);
+ pCur->validNKey = 1;
+ }else{
+ assertCellInfo(pCur);
+ }
+ }
+#else /* if not _MSC_VER */
+ /* Use a macro in all other compilers so that the function is inlined */
+#define getCellInfo(pCur) \
+ if( pCur->info.nSize==0 ){ \
+ int iPage = pCur->iPage; \
+ sqlite3BtreeParseCell(pCur->apPage[iPage],pCur->aiIdx[iPage],&pCur->info); \
+ pCur->validNKey = 1; \
+ }else{ \
+ assertCellInfo(pCur); \
+ }
+#endif /* _MSC_VER */
+
+/*
+** Set *pSize to the size of the buffer needed to hold the value of
+** the key for the current entry. If the cursor is not pointing
+** to a valid entry, *pSize is set to 0.
+**
+** For a table with the INTKEY flag set, this routine returns the key
+** itself, not the number of bytes in the key.
+*/
+SQLITE_PRIVATE int sqlite3BtreeKeySize(BtCursor *pCur, i64 *pSize){
+ int rc;
+
+ assert( cursorHoldsMutex(pCur) );
+ rc = restoreCursorPosition(pCur);
+ if( rc==SQLITE_OK ){
+ assert( pCur->eState==CURSOR_INVALID || pCur->eState==CURSOR_VALID );
+ if( pCur->eState==CURSOR_INVALID ){
+ *pSize = 0;
+ }else{
+ getCellInfo(pCur);
+ *pSize = pCur->info.nKey;
+ }
+ }
+ return rc;
+}
+
+/*
+** Set *pSize to the number of bytes of data in the entry the
+** cursor currently points to. Always return SQLITE_OK.
+** Failure is not possible. If the cursor is not currently
+** pointing to an entry (which can happen, for example, if
+** the database is empty) then *pSize is set to 0.
+*/
+SQLITE_PRIVATE int sqlite3BtreeDataSize(BtCursor *pCur, u32 *pSize){
+ int rc;
+
+ assert( cursorHoldsMutex(pCur) );
+ rc = restoreCursorPosition(pCur);
+ if( rc==SQLITE_OK ){
+ assert( pCur->eState==CURSOR_INVALID || pCur->eState==CURSOR_VALID );
+ if( pCur->eState==CURSOR_INVALID ){
+ /* Not pointing at a valid entry - set *pSize to 0. */
+ *pSize = 0;
+ }else{
+ getCellInfo(pCur);
+ *pSize = pCur->info.nData;
+ }
+ }
+ return rc;
+}
+
+/*
+** Given the page number of an overflow page in the database (parameter
+** ovfl), this function finds the page number of the next page in the
+** linked list of overflow pages. If possible, it uses the auto-vacuum
+** pointer-map data instead of reading the content of page ovfl to do so.
+**
+** If an error occurs an SQLite error code is returned. Otherwise:
+**
+** The page number of the next overflow page in the linked list is
+** written to *pPgnoNext. If page ovfl is the last page in its linked
+** list, *pPgnoNext is set to zero.
+**
+** If ppPage is not NULL, and a reference to the MemPage object corresponding
+** to page number pOvfl was obtained, then *ppPage is set to point to that
+** reference. It is the responsibility of the caller to call releasePage()
+** on *ppPage to free the reference. In no reference was obtained (because
+** the pointer-map was used to obtain the value for *pPgnoNext), then
+** *ppPage is set to zero.
+*/
+static int getOverflowPage(
+ BtShared *pBt,
+ Pgno ovfl, /* Overflow page */
+ MemPage **ppPage, /* OUT: MemPage handle (may be NULL) */
+ Pgno *pPgnoNext /* OUT: Next overflow page number */
+){
+ Pgno next = 0;
+ MemPage *pPage = 0;
+ int rc = SQLITE_OK;
+
+ assert( sqlite3_mutex_held(pBt->mutex) );
+ assert(pPgnoNext);
+
+#ifndef SQLITE_OMIT_AUTOVACUUM
+ /* Try to find the next page in the overflow list using the
+ ** autovacuum pointer-map pages. Guess that the next page in
+ ** the overflow list is page number (ovfl+1). If that guess turns
+ ** out to be wrong, fall back to loading the data of page
+ ** number ovfl to determine the next page number.
+ */
+ if( pBt->autoVacuum ){
+ Pgno pgno;
+ Pgno iGuess = ovfl+1;
+ u8 eType;
+
+ while( PTRMAP_ISPAGE(pBt, iGuess) || iGuess==PENDING_BYTE_PAGE(pBt) ){
+ iGuess++;
+ }
+
+ if( iGuess<=pagerPagecount(pBt) ){
+ rc = ptrmapGet(pBt, iGuess, &eType, &pgno);
+ if( rc==SQLITE_OK && eType==PTRMAP_OVERFLOW2 && pgno==ovfl ){
+ next = iGuess;
+ rc = SQLITE_DONE;
+ }
+ }
+ }
+#endif
+
+ if( rc==SQLITE_OK ){
+ rc = sqlite3BtreeGetPage(pBt, ovfl, &pPage, 0);
+ assert(rc==SQLITE_OK || pPage==0);
+ if( next==0 && rc==SQLITE_OK ){
+ next = get4byte(pPage->aData);
+ }
+ }
+
+ *pPgnoNext = next;
+ if( ppPage ){
+ *ppPage = pPage;
+ }else{
+ releasePage(pPage);
+ }
+ return (rc==SQLITE_DONE ? SQLITE_OK : rc);
+}
+
+/*
+** Copy data from a buffer to a page, or from a page to a buffer.
+**
+** pPayload is a pointer to data stored on database page pDbPage.
+** If argument eOp is false, then nByte bytes of data are copied
+** from pPayload to the buffer pointed at by pBuf. If eOp is true,
+** then sqlite3PagerWrite() is called on pDbPage and nByte bytes
+** of data are copied from the buffer pBuf to pPayload.
+**
+** SQLITE_OK is returned on success, otherwise an error code.
+*/
+static int copyPayload(
+ void *pPayload, /* Pointer to page data */
+ void *pBuf, /* Pointer to buffer */
+ int nByte, /* Number of bytes to copy */
+ int eOp, /* 0 -> copy from page, 1 -> copy to page */
+ DbPage *pDbPage /* Page containing pPayload */
+){
+ if( eOp ){
+ /* Copy data from buffer to page (a write operation) */
+ int rc = sqlite3PagerWrite(pDbPage);
+ if( rc!=SQLITE_OK ){
+ return rc;
+ }
+ memcpy(pPayload, pBuf, nByte);
+ }else{
+ /* Copy data from page to buffer (a read operation) */
+ memcpy(pBuf, pPayload, nByte);
+ }
+ return SQLITE_OK;
+}
+
+/*
+** This function is used to read or overwrite payload information
+** for the entry that the pCur cursor is pointing to. If the eOp