mirror of
https://github.com/mdbtools/mdbtools.git
synced 2025-04-05 14:11:01 +08:00
710 lines
39 KiB
Plaintext
710 lines
39 KiB
Plaintext
This file documents the Microsoft MDB file format for Jet3 and Jet4 databases.
|
|
|
|
General Notes
|
|
-------------
|
|
|
|
Access (Jet) does not in general initialize pages to zero before writing them,
|
|
so the file will contains a lot of unititialized data. This makes the task of
|
|
figuring out the format a bit more difficult than it otherwise would be.
|
|
|
|
This document will, generally speaking, provide all offsets and constants in
|
|
hex format.
|
|
|
|
Most multibyte pointer and integers are stored in little endian (LSB-MSB) order.
|
|
There is an exception in the case of indexes, see the section on index pages for
|
|
details.
|
|
|
|
Terminology
|
|
-----------
|
|
|
|
This section contains a mix of information about data structures used in the MDB
|
|
file format along with general database terminology needed to explain these
|
|
structures.
|
|
|
|
Page - A fixed size region within the file on a 2 or 4K boundry. All
|
|
data in the file exists inside pages.
|
|
Catalog Table - Tables in Access generally starting with "MSys". See the TDEF
|
|
(table definition) pages for "System Table" field.
|
|
Catalog Entry - A row from the MSysObjects table describing another database
|
|
object. The MSysObjects table definition page is always at
|
|
page 2 of the database, and a phony tdef structure is
|
|
bootstrapped to initially read the database.
|
|
Page Split - A process in which a row is added to a page with no space left.
|
|
A second page is allocated and rows on the original page are
|
|
split between the two pages and then indexes are updated. Pages
|
|
can use a variety of algorithms for splitting the rows, the
|
|
most popular being a 50/50 split in which rows are divided
|
|
evenly between pages.
|
|
Overflow Page - Instead of doing a full page split with associated index writes,
|
|
a pointer to an "overflow" page can be stored at the original
|
|
row's location. Compacting a database would normally rewrite
|
|
overflow pages back into regular pages.
|
|
Leaf Page - The lowest page on an index tree. In Access, leaf pages are of
|
|
a different type than other index pages.
|
|
UCS-2 - a two byte unicode encoding used in Jet4 files.
|
|
Covered Query - a query that can be satisfied by reading only index pages. For
|
|
instance if the query
|
|
"SELECT count(*) from Table1 where Column3 = 4" were run and
|
|
Column3 was indexed, the query could be satisfied by reading
|
|
only indexes. Because of the way Access hashes text columns
|
|
in indexes, covered queries on text columns are not possible.
|
|
|
|
Pages
|
|
-----
|
|
|
|
At it's topmost level MDB files are organized into a series of fixed sized
|
|
pages. These are 2K in size for Jet3 (Access 97) and 4K for Jet4 (Access
|
|
2000/2002). All data in MDB files exists within pages, of which there are
|
|
a number of types.
|
|
|
|
The first byte of each page idenitifies the page type as follows.
|
|
|
|
0x00 Database definition page. (Always page 0)
|
|
0x01 Data page
|
|
0x02 Table definition
|
|
0x03 Intermediate Index pages
|
|
0x04 Leaf Index pages
|
|
0x05 Page Usage Bitmaps (extended page usage)
|
|
|
|
Database Definition Page
|
|
------------------------
|
|
|
|
Each MDB database has a single definition page located at beginning of the file.
|
|
Not a lot is known about this page, and it is one of the least documented page
|
|
types. However, it contains things like Jet version, encryption keys, and name
|
|
of the creating program.
|
|
|
|
Offset 0x14 contains the Jet version of this database 0x00 for 3, 0x01 for 4
|
|
This is used by the mdb-ver utility to determine the Jet version.
|
|
|
|
Data Pages
|
|
----------
|
|
|
|
All data rows are stored in type 0x01 pages.
|
|
|
|
The header of a Jet3 data page looks like this:
|
|
|
|
+--------------------------------------------------------------------------+
|
|
| Jet3 Data Page Definition |
|
|
+------+---------+---------------------------------------------------------+
|
|
| data | length | name | description |
|
|
+------+---------+---------------------------------------------------------+
|
|
| 0x01 | 1 byte | page_type | 0x01 indicates a data page. |
|
|
| 0x01 | 1 byte | unknown | |
|
|
| ???? | 2 bytes | free_space | Free space in this page |
|
|
| ???? | 4 bytes | tdef_pg | Page pointer to table definition |
|
|
| ???? | 4 bytes | num_rows | number of records on this page |
|
|
+------+---------+---------------------------------------------------------+
|
|
| Iterate for the number of records |
|
|
+--------------------------------------------------------------------------+
|
|
| ???? | 2 bytes | offset_row | The records location on this page |
|
|
+--------------------------------------------------------------------------+
|
|
|
|
In Jet4, an additional four byte field was added. It's purpose is currently
|
|
unknown.
|
|
|
|
+--------------------------------------------------------------------------+
|
|
| Jet4 Data Page Definition |
|
|
+------+---------+---------------------------------------------------------+
|
|
| data | length | name | description |
|
|
+------+---------+---------------------------------------------------------+
|
|
| 0x01 | 1 byte | page_type | 0x01 indicates a data page. |
|
|
| 0x01 | 1 byte | unknown | |
|
|
| ???? | 2 bytes | free_space | Free space in this page |
|
|
| ???? | 4 bytes | tdef_pg | Page pointer to table definition |
|
|
| ???? | 4 bytes | unknown | Unknown |
|
|
| ???? | 4 bytes | num_rows | number of records on this page |
|
|
+------+---------+---------------------------------------------------------+
|
|
| Iterate for the number of records |
|
|
+--------------------------------------------------------------------------+
|
|
| ???? | 2 bytes | offset_row | The records location on this page |
|
|
+--------------------------------------------------------------------------+
|
|
|
|
Notes for offset_row:
|
|
- Offsets that have 0x40 in the high order byte point to a location within
|
|
the page where a Data Pointer (4 bytes) to another data page is stored. Also
|
|
known as an overflow page.
|
|
- Offsets that have 0x80 in the high order byte are deleted rows.
|
|
(These flags are delflag and lookupflag in source code)
|
|
|
|
Rows are stored from the end of the page to the top of the page. So, the first
|
|
row stored runs from bytes offset_row to page_size - 1. The next row runs from
|
|
its offset to the previous row's offset, and so on.
|
|
|
|
Decoding a row requires knowing the number and types of columns from its TDEF
|
|
page. Decoding is handled by the routine mdb_crack_row().
|
|
|
|
The Jet3 row format is:
|
|
|
|
+--------------------------------------------------------------------------+
|
|
| Jet3 Row Definition |
|
|
+------+---------+---------------------------------------------------------+
|
|
| data | length | name | description |
|
|
+------+---------+---------------------------------------------------------+
|
|
| ???? | 1 byte | num_cols | Number of columns stored on this row. |
|
|
| ???? | n bytes | fixed_cols | Fixed length columns |
|
|
| ???? | n bytes | var_cols | Variable length columns |
|
|
| ???? | 1 byte | eod | length of data from begining of record |
|
|
| ???? | n bytes | var_table[]| offset from start of row for each var_col |
|
|
| ???? | 1 byte | var_len | number of variable length columns |
|
|
| ???? | n bytes | jump_table | number of variable length columns |
|
|
| ???? | n bytes | null_mask | Null indicator. size is 1 byte per 8 cols |
|
|
| | | | 0 indicates a null value. Also used to |
|
|
| | | | represent value of boolean type columns |
|
|
+--------------------------------------------------------------------------+
|
|
|
|
Notes:
|
|
|
|
. A row will always have the number of fixed columns as specified in the table
|
|
definition, but may have less variable columns, as rows are not updated when
|
|
columns are added.
|
|
. All fixed length columns are stored first to last, followed by variable length
|
|
columns.
|
|
. The size of the null table is computed by (num_cols - 1)/8 + 1
|
|
. Fixed columns can be null (unlike some other databases).
|
|
. The var_len field indicates the size of the var_table[].
|
|
. The eod field points at the last byte of the var_cols field. It is used to
|
|
determine where the last var_col ends.
|
|
. For boolean fixed columns, the values are in null_table[]: 0 indicates a false
|
|
value, 1 indicates a true value
|
|
. An 0xFF stored in the var_table indicates that this column has been deleted.
|
|
|
|
In Jet3 offsets are stored as 1 byte fields yielding a maximum of 256 bytes. To
|
|
get around this offsets are computed using a jump table. The jump table stores
|
|
the number of the first column in this jump segment. If the size of the data is
|
|
less than 256 then no jump table will be present.
|
|
|
|
For example if the row contains 45 columns and the offset of the 14th column is
|
|
more than 256 then the first entry in the jump table will be 0xe (14). If the
|
|
23rd column is the first one at offset > 512 the second entry of the jump table
|
|
would be 0x17 (23) and so on.
|
|
|
|
+--------------------------------------------------------------------------+
|
|
| Jet4 Row Definition |
|
|
+------+---------+---------------------------------------------------------+
|
|
| data | length | name | description |
|
|
+------+---------+---------------------------------------------------------+
|
|
| ???? | 2 bytes | num_cols | Number of columns stored on this row. |
|
|
| ???? | n bytes | fixed_cols | Fixed length columns |
|
|
| ???? | n bytes | var_cols | Variable length columns |
|
|
| ???? | 2 bytes | eod | length of data from begining of record |
|
|
| ???? | n bytes | var_table[]| offset from start of row for each var_col |
|
|
| ???? | 2 bytes | var_len | number of variable length columns |
|
|
| ???? | n bytes | null_mask | Null indicator. size is 1 byte per 8 cols |
|
|
| | | | 0 indicates a null value. Also used to |
|
|
| | | | represent value of bit type columns |
|
|
+--------------------------------------------------------------------------+
|
|
|
|
Notes:
|
|
|
|
. All offsets are stored as 2 byte fields including the var_table entries.
|
|
. the jump table was (thankfully) ditched in Jet4.
|
|
|
|
Each memo column (or other long binary data) in a row
|
|
|
|
+-------------------------------------------------------------------------+
|
|
| Memo Field Definition (12 bytes)
|
|
+------+---------+-------------+------------------------------------------+
|
|
| data | length | name | description |
|
|
+------+---------+-------------+------------------------------------------+
|
|
| ???? | 2 bytes | memo_len | Total length of the memo |
|
|
| ???? | 2 bytes | bitmask | See values |
|
|
| ???? | 4 bytes | lval_dp | Data pointer to LVAL page (if needed) |
|
|
| 0x00 | 4 bytes | unknown | |
|
|
+------+---------+-------------+------------------------------------------+
|
|
Values for the bitmask:
|
|
|
|
0x8000= the memo is in a string at the end of this header (memo_len bytes)
|
|
0x4000= the memo is in a unique LVAL page in a record type 1
|
|
0x0000= the memo is in n LVAL pages in a record type 2
|
|
|
|
If the memo is in a LVAL page, we use row_id of lval_dp to find the row.
|
|
offset_start of memo = (int16*) LVAL_page[ 10 + row_id * 2]
|
|
if (rowid=0)
|
|
offset_stop of memo = 2048
|
|
else
|
|
offset_stop of memo = (int16*) LVAL_page[ 10 + row_id * 2 - 2]
|
|
|
|
The length (partial if type 2) for the memo is:
|
|
memo_page_len = offset_stop - offset_start
|
|
|
|
LVAL Pages
|
|
----------
|
|
(LVAL Page are particular data pages for long data storages )
|
|
|
|
The header of a LVAL page looks like this (10 bytes) :
|
|
+------+---------+-------------+------------------------------------------+
|
|
| data | length | name | description |
|
|
+------+---------+-------------+------------------------------------------+
|
|
| 0x01 | 1 bytes | page_type | 0x01 indicate a data page |
|
|
| 0x01 | 1 bytes | unknown | |
|
|
| ???? | 2 bytes | free_space | The free space in this page |
|
|
| LVAL | 4 bytes | lval_id | The word 'LVAL' |
|
|
| ???? | 2 bytes | num_rows | Number of rows in this page |
|
|
+-------------------------------------------------------------------------+
|
|
| Iterate for the number of records |
|
|
+-------------------------------------------------------------------------+
|
|
| ???? | 2 bytes | row_offset | to the records location on this page |
|
|
+-------------------------------------------------------------------------+
|
|
|
|
Each memo record type 1 looks like this:
|
|
+------+---------+-------------+------------------------------------------+
|
|
| data | length | name | description |
|
|
+------+---------+-------------+------------------------------------------+
|
|
| ???? | n bytes | memo_value | A string which is the memo |
|
|
+-------------------------------------------------------------------------+
|
|
|
|
Each memo record type 2 looks like this:
|
|
+------+---------+-------------+------------------------------------------+
|
|
| data | length | name | description |
|
|
+------+---------+-------------+------------------------------------------+
|
|
| ???? | 4 bytes | lval_dp | Next page LVAL type 2 if memo is too long|
|
|
| ???? | n bytes | memo_value | A string which is the memo (partial) |
|
|
+-------------------------------------------------------------------------+
|
|
|
|
In a LVAL type 2 data page, you have
|
|
10 or 12 bytes for the header of the data page,
|
|
2 bytes for an offset,
|
|
4 bytes for the next lval_pg
|
|
|
|
So there is a bloc of 2048 - (10+2+4) = 2032(jet3)
|
|
or 4096 - (12+2+4) = 4078(jet4) bytes max in a page.
|
|
|
|
TDEF Pages (Table Definition)
|
|
-----------------------------
|
|
|
|
Every table in the database has a TDEF page. It contains a definition of
|
|
the columns, types, sizes, indexes, and similar information.
|
|
|
|
+-------------------------------------------------------------------------+
|
|
| Jet3/Jet4 TDEF Header
|
|
+------+---------+-------------+------------------------------------------+
|
|
| data | length | name | description |
|
|
+------+---------+-------------+------------------------------------------+
|
|
| 0x02 | 1 bytes | page_type | 0x02 indicate a tabledef page |
|
|
| 0x01 | 1 bytes | unknown | |
|
|
| 'VC' | 2 bytes | tdef_id | The word 'VC' (Jet3 only, Jet4 unknown) |
|
|
| 0x00 | 4 bytes | next_pg | Next tdef page pointer (0 if none) |
|
|
+------+---------+-------------+------------------------------------------+
|
|
|
|
TDEFs can span multiple pages for large tables, this is accomplished using the
|
|
next_pg field.
|
|
|
|
|
|
+-------------------------------------------------------------------------+
|
|
| Jet3 Table Definition Block (35 bytes) |
|
|
+------+---------+-------------+------------------------------------------+
|
|
| data | length | name | description |
|
|
+------+---------+-------------+------------------------------------------+
|
|
| ???? | 4 bytes | tdef_len | Length of the data for this page |
|
|
| ???? | 4 bytes | num_rows | Number of records in this table |
|
|
| 0x00 | 4 bytes | autonumber | value for the next value of the |
|
|
| | | | autonumber column, if any. 0 otherwise |
|
|
| 0x4e | 1 byte | table_type | 0x53: user table, 0x4e: system table |
|
|
| ???? | 2 bytes | max_cols | Max columns a row will have (deletions) |
|
|
| ???? | 2 bytes | num_var_cols| Number of variable columns in table |
|
|
| ???? | 2 bytes | num_cols | Number of columns in table (repeat) |
|
|
| ???? | 4 bytes | num_idx | Number of indexes in table |
|
|
| ???? | 4 bytes | num_real_idx| Number of indexes in table (repeat) |
|
|
| ???? | 4 bytes | used_pages | Points to a record containing the |
|
|
| | | | usage bitmask for this table. |
|
|
| ???? | 4 bytes | free_pages | Points to a similar record as above, |
|
|
| | | | listing pages which contain free space. |
|
|
+-------------------------------------------------------------------------+
|
|
| Iterate for the number of num_real_idx (8 bytes per idxs) |
|
|
+-------------------------------------------------------------------------+
|
|
| 0x00 | 4 bytes | ??? | |
|
|
| ???? | 4 bytes | num_idx_rows| (not sure) |
|
|
+-------------------------------------------------------------------------+
|
|
| Iterate for the number of num_cols (18 bytes per column) |
|
|
+-------------------------------------------------------------------------+
|
|
| ???? | 1 byte | col_type | Column Type (see table below) |
|
|
| ???? | 2 bytes | col_num | Column Number, (not always) |
|
|
| ???? | 2 bytes | offset_V | Offset for variable length columns |
|
|
| ???? | 4 bytes | ??? | |
|
|
| ???? | 4 bytes | ??? | |
|
|
| ???? | 1 byte | bitmask | low order bit indicates variable columns |
|
|
| ???? | 2 bytes | offset_F | Offset for fixed length columns |
|
|
| ???? | 2 bytes | col_len | Length of the column (0 if memo) |
|
|
+-------------------------------------------------------------------------+
|
|
| Iterate for the number of num_cols (n bytes per column) |
|
|
+-------------------------------------------------------------------------+
|
|
| ???? | 1 byte | col_name_len| len of the name of the column |
|
|
| ???? | n bytes | col_name | Name of the column |
|
|
+-------------------------------------------------------------------------+
|
|
| Iterate for the number of num_real_idx (30+9 = 39 bytes) |
|
|
+-------------------------------------------------------------------------+
|
|
| Iterate 10 times for 10 possible columns (10*3 = 30 bytes) |
|
|
+-------------------------------------------------------------------------+
|
|
| ???? | 2 bytes | col_num | number of a column (0xFFFF= none) |
|
|
| ???? | 1 byte | col_order | 0x01 = ascendency order |
|
|
+-------------------------------------------------------------------------+
|
|
| ???? | 4 bytes | unknown | |
|
|
| ???? | 4 bytes | first_dp | Data pointer of the index page |
|
|
| ???? | 1 byte | flags | See flags table for indexes |
|
|
+-------------------------------------------------------------------------+
|
|
| Iterate for the number of num_real_idx |
|
|
+-------------------------------------------------------------------------+
|
|
| ???? | 4 bytes | index_num | Number of the index |
|
|
| | | |(warn: not always in the sequential order)|
|
|
| ???? | 4 bytes | index_num2 | Number of the index (repeat) |
|
|
| 0xFF | 4 bytes | ??? | |
|
|
| 0x00 | 4 bytes | ??? | |
|
|
| 0x04 | 2 bytes | ??? | |
|
|
| ???? | 1 byte | primary_key | 0x01 if this index is primary |
|
|
+-------------------------------------------------------------------------+
|
|
| Iterate for the number of num_real_idx |
|
|
+-------------------------------------------------------------------------+
|
|
| ???? | 1 byte | idx_name_len| len of the name of the index |
|
|
| ???? | n bytes | idx_name | Name of the index |
|
|
+-------------------------------------------------------------------------+
|
|
| ???? | n bytes | ??? | |
|
|
| 0xFF | 2 bytes | ??? | End of the tableDef ? |
|
|
+-------------------------------------------------------------------------+
|
|
|
|
+-------------------------------------------------------------------------+
|
|
| Jet4 Table Definition Block (55 bytes) |
|
|
+------+---------+-------------+------------------------------------------+
|
|
| data | length | name | description |
|
|
+------+---------+-------------+------------------------------------------+
|
|
| ???? | 4 bytes | tdef_len | Length of the data for this page |
|
|
| ???? | 4 bytes | unknown | unknown |
|
|
| ???? | 4 bytes | num_rows | Number of records in this table |
|
|
| 0x00 | 4 bytes | autonumber | value for the next value of the |
|
|
| | | | autonumber column, if any. 0 otherwise |
|
|
| ???? |16 bytes | unknown | unknown |
|
|
| 0x4e | 1 byte | table_type | 0x53: user table, 0x4e: system table |
|
|
| ???? | 2 bytes | max_cols | Max columns a row will have (deletions) |
|
|
| ???? | 2 bytes | num_var_cols| Number of variable columns in table |
|
|
| ???? | 2 bytes | num_cols | Number of columns in table (repeat) |
|
|
| ???? | 4 bytes | num_idx | Number of indexes in table |
|
|
| ???? | 4 bytes | num_real_idx| Number of indexes in table (repeat) |
|
|
| ???? | 4 bytes | used_pages | Points to a record containing the |
|
|
| | | | usage bitmask for this table. |
|
|
| ???? | 4 bytes | free_pages | Points to a similar record as above, |
|
|
| | | | listing pages which contain free space. |
|
|
+-------------------------------------------------------------------------+
|
|
| Iterate for the number of num_real_idx (12 bytes per idxs) |
|
|
+-------------------------------------------------------------------------+
|
|
| 0x00 | 4 bytes | ??? | |
|
|
| ???? | 4 bytes | num_idx_rows| (not sure) |
|
|
| 0x00 | 4 bytes | ??? | |
|
|
+-------------------------------------------------------------------------+
|
|
| Iterate for the number of num_cols (25 bytes per column) |
|
|
+-------------------------------------------------------------------------+
|
|
| ???? | 1 byte | col_type | Column Type (see table below) |
|
|
| ???? | 4 bytes | unknown | matches first unknown definition block |
|
|
| ???? | 2 bytes | col_num | Column Number |
|
|
| ???? | 2 bytes | offset_V | Offset for variable length columns |
|
|
| ???? | 2 bytes | col_num | Column Number (repeat) |
|
|
| ???? | 4 bytes | ??? | |
|
|
| ???? | 1 byte | bitmask | low order bit indicates variable columns |
|
|
| ???? | 1 byte | ??? | seems to be 1 when variable len |
|
|
| 0000 | 4 bytes | ??? | |
|
|
| ???? | 2 bytes | offset_F | Offset for fixed length columns |
|
|
| ???? | 2 bytes | col_len | Length of the column (0 if memo) |
|
|
+-------------------------------------------------------------------------+
|
|
| Iterate for the number of num_cols (n*2 bytes per column) |
|
|
+-------------------------------------------------------------------------+
|
|
| ???? | 2 bytes | col_name_len| len of the name of the column |
|
|
| ???? | n bytes | col_name | Name of the column (UCS-2 format) |
|
|
+-------------------------------------------------------------------------+
|
|
| Iterate for the number of num_real_idx (30+9 = 39 bytes) |
|
|
+-------------------------------------------------------------------------+
|
|
| ???? | 4 bytes | ??? | |
|
|
+-------------------------------------------------------------------------+
|
|
| Iterate 10 times for 10 possible columns (10*3 = 30 bytes) |
|
|
+-------------------------------------------------------------------------+
|
|
| ???? | 2 bytes | col_num | number of a column (0xFFFF= none) |
|
|
| ???? | 1 byte | col_order | 0x01 = ascendency order |
|
|
+-------------------------------------------------------------------------+
|
|
| ???? | 4 bytes | unknown | |
|
|
| ???? | 4 bytes | first_dp | Data pointer of the index page |
|
|
| ???? | 1 byte | flags | See flags table for indexes |
|
|
| ???? | 9 bytes | unknown | |
|
|
+-------------------------------------------------------------------------+
|
|
| Iterate for the number of num_real_idx (27 bytes) |
|
|
+-------------------------------------------------------------------------+
|
|
| ???? | 4 bytes | unknown | matches first unknown definition block |
|
|
| ???? | 4 bytes | index_num | Number of the index |
|
|
| | | |(warn: not always in the sequential order)|
|
|
| ???? | 4 bytes | index_num2 | Number of the index (repeat) |
|
|
| 0xFF | 4 bytes | ??? | |
|
|
| 0x00 | 4 bytes | ??? | |
|
|
| 0x04 | 2 bytes | ??? | |
|
|
| ???? | 1 byte | primary_key | 0x01 if this index is primary |
|
|
| ???? | 4 bytes | unknown | |
|
|
+-------------------------------------------------------------------------+
|
|
| Iterate for the number of num_real_idx |
|
|
+-------------------------------------------------------------------------+
|
|
| ???? | 2 bytes | idx_name_len| len of the name of the index |
|
|
| ???? | n bytes | idx_name | Name of the index (UCS-2) |
|
|
+-------------------------------------------------------------------------+
|
|
| ???? | n bytes | ??? | |
|
|
| 0xFF | 2 bytes | ??? | End of the tableDef ? |
|
|
+-------------------------------------------------------------------------+
|
|
|
|
Index flags (not complete):
|
|
0x01 Unique
|
|
0x02 IgnoreNuls
|
|
0x08 Required
|
|
|
|
|
|
Column Type may be one of the following (not complete):
|
|
|
|
BOOL = 0x01 /* boolean ( 1 bit ) */
|
|
BYTE = 0x02 /* byte ( 8 bits ) */
|
|
INT = 0x03 /* Integer (16 bits ) */
|
|
LONGINT = 0x04 /* Long Integer (32 bits ) */
|
|
MONEY = 0x05 /* Currency ( 8 bytes) */
|
|
FLOAT = 0x06 /* Single ( 4 bytes) */
|
|
DOUBLE = 0x07 /* Double ( 8 bytes) */
|
|
SDATETIME = 0x08 /* Short Date/Time ( 8 bytes) */
|
|
BINARY = 0x09 /* binay (255 bytes) */
|
|
TEXT = 0x0A /* Text (255 bytes) */
|
|
OLE = 0x0B /* OLE */
|
|
MEMO = 0x0C /* Memo, Hyperlink */
|
|
UNKNOWN_0D = 0x0D
|
|
REPID = 0x0F /* GUID */
|
|
|
|
Notes on deleted columns:
|
|
|
|
If a fixed length column is deleted the offset_F field will contain the values
|
|
of the original row definition. Thus is the number of columns on the row does
|
|
not match the number in the tdef, the offset_F field could be used to return
|
|
the proper data. No idea how multiple column deletions are handled, my instinct
|
|
is that Access can only handle a single column deletion event (which may involve
|
|
more than one column if new rows have not been added) without touching all the
|
|
rows.
|
|
|
|
Don't know if variable columns work by the same mechanism yet. Stay tuned.
|
|
|
|
Page Usage Maps
|
|
---------------
|
|
|
|
There are three uses for the page usage bitmaps. There is a global page usage
|
|
stored on page 1 which tracks allocated pages throughout the database.
|
|
|
|
Tables store two page usage bitmaps. One is straight map of which pages are
|
|
owned by the table. The second is a map of the pages owned by the table which
|
|
have free space on them (used for inserting data).
|
|
|
|
The table bitmaps appear to be of a fixed size for both Jet 3
|
|
and 4 (128 and 64 bytes respectively). The first byte of the map is a type
|
|
field.
|
|
|
|
Type 0 page usage map definition follows:
|
|
|
|
+--------------------------------------------------------------------------+
|
|
| Type 0 Page Usage Map |
|
|
+------+---------+---------------------------------------------------------+
|
|
| data | length | name | description |
|
|
+------+---------+---------------------------------------------------------+
|
|
| 0x00 | 1 byte | map_type | 0x00 indicates map stored within. |
|
|
| ???? | 4 byte | page_start | first page for which this map applies |
|
|
+------+---------+---------------------------------------------------------+
|
|
| Iterate for the length of map |
|
|
+--------------------------------------------------------------------------+
|
|
| ???? | 1 byte | bitmap | each bit encodes the allocation status of a|
|
|
| | | | page. 1 indicates allocated to this table. |
|
|
| | | | Pages are stored from msb to lsb. |
|
|
+--------------------------------------------------------------------------+
|
|
|
|
If you're paying attention then you'll realize that the relatively small size of
|
|
the map (128*8*2048 or 64*8*4096 = 2 Meg) means that this scheme won't work with
|
|
larger database files although the initial start page helps a bit. To overcome
|
|
this there is a second page usage map scheme with the map_type of 0x01 as
|
|
follows:
|
|
|
|
+--------------------------------------------------------------------------+
|
|
| Type 1 Page Usage Map |
|
|
+------+---------+---------------------------------------------------------+
|
|
| data | length | name | description |
|
|
+------+---------+---------------------------------------------------------+
|
|
| 0x01 | 1 byte | map_type | 0x01 indicates this is a indirection list. |
|
|
+------+---------+---------------------------------------------------------+
|
|
| Iterate for the length of map |
|
|
+--------------------------------------------------------------------------+
|
|
| ???? | 4 bytes | map_page | pointer to page type 0x05 containing map |
|
|
+--------------------------------------------------------------------------+
|
|
|
|
Note that the intial start page is gone and is reused for the first page
|
|
indirection. The 0x05 type page header looks like:
|
|
|
|
+--------------------------------------------------------------------------+
|
|
| Usage Map Page (type 0x05) |
|
|
+------+---------+---------------------------------------------------------+
|
|
| data | length | name | description |
|
|
+------+---------+---------------------------------------------------------+
|
|
| 0x05 | 1 byte | page_type | allocation map page |
|
|
| 0x01 | 1 byte | unknown | always 1 as with other page types |
|
|
| 0x00 | 2 bytes | unknown | |
|
|
+------+---------+---------------------------------------------------------+
|
|
|
|
The rest of the page is the allocation bitmap following the same scheme (lsb
|
|
to msb order, 1 bit per page) as a type 0 map. This yields a maximum of
|
|
2044*8=16352 (jet3) or 4092*8 = 32736 (jet4) pages mapped per type 0x05 page.
|
|
Given 128/4+1 = 33 or 64/4+1 = 17 page pointers per indirection row (remember
|
|
the start page field is reused, thus the +1), this yields 33*16352*2048 = 1053
|
|
Meg (jet3) or 17*32736*4096 = 2173 Meg (jet4) or enough to cover the maximum
|
|
size of each of the database formats comfortably, so there is no reason to
|
|
believe any other page map schemes exist.
|
|
|
|
Indices
|
|
-------
|
|
|
|
Indices are not completely understood but here is what we know.
|
|
|
|
+-------------------------------------------------------------------------+
|
|
| Index Page (type 0x03) |
|
|
+------+---------+-------------+------------------------------------------+
|
|
| data | length | name | description |
|
|
+------+---------+-------------+------------------------------------------+
|
|
| 0x01 | 1 bytes | page_type | 0x03 indicate an index page |
|
|
| 0x01 | 1 bytes | unknown | |
|
|
| ???? | 2 bytes | free_space | The free space at the end this page |
|
|
| ???? | 4 bytes | parent_page | The page number of the TDEF for this idx |
|
|
| ???? | 4 bytes | prev_page | Previous page at this index level |
|
|
| ???? | 4 bytes | next_page | Next page at this index level |
|
|
| ???? | 4 bytes | leaf_page | Pointer to leaf page, purpose unknown |
|
|
+-------------------------------------------------------------------------+
|
|
|
|
Index pages come in two flavors.
|
|
|
|
0x04 pages are leaf pages which contain one entry for each row in the table.
|
|
Each entry is composed of a flag, the indexed column values and a page/row
|
|
pointer to the data.
|
|
|
|
0x03 index pages make up the rest of the index tree and contain a flag, the
|
|
indexed columns, the page/row contains this entry, and the leaf page or
|
|
intermediate (another 0x03 page) page pointer for which this is the first
|
|
entry on.
|
|
|
|
Both index types have a bitmask starting at 0x16 which identifies the starting
|
|
location of each index entry on this page. The first entry is assumed and
|
|
the count starts from the low order bit. For example take the data:
|
|
|
|
00 20 00 04 80 00 ...
|
|
|
|
This first entry starts at 0xf8 (always). Convert the bytes to binary starting
|
|
with the low order bit and stopping at the first "on" bit:
|
|
|
|
0000 0000 0000 01
|
|
-- 00 --- -- 20 -->
|
|
This next entry starts 14 (0xe) bytes in at 0x105. Proceding from here, the next
|
|
entry:
|
|
|
|
00 0000 0000 001
|
|
<-- 20 -- -- 00 --- -- 04
|
|
|
|
starts 13 (0xd) bytes further in at 0x112. The final entry starts at
|
|
|
|
0 0000 0000 0001
|
|
<-- 04 -- -- 80 ---
|
|
|
|
or 13 (0xd) bytes more at 0x120. In this example the rest of the mask (up
|
|
to offset 0xf8) would be zero filled and thus this last entry at 0x120 isn't
|
|
an actual entry but the stopping point of the data.
|
|
|
|
Since 0xf8 = 248 and 0x16 = 22, (248 - 22) * 8 = 1808 and 2048 - 1808 = 240
|
|
leaving just enough space for the bit mask to encode the remainder of the page.
|
|
One wonders why MS didn't use a row offset table like they did on data pages,
|
|
seems like it would have been easier and more flexible.
|
|
|
|
So now we come to the index entries for type 0x03 pages which look like this:
|
|
|
|
+-------------------------------------------------------------------------+
|
|
| Index Record |
|
|
+------+---------+-------------+------------------------------------------+
|
|
| data | length | name | description |
|
|
+------+---------+-------------+------------------------------------------+
|
|
| 0x7f | 1 byte | flags | 0x80 LSB, 0x7f MSB, 0x00 null? |
|
|
| ???? | variable| indexed cols| indexed column data |
|
|
| ???? | 3 bytes | data page | page containing row referred to by this |
|
|
| | | | index entry |
|
|
| ???? | 1 byte | data row | row number on that page of this entry |
|
|
| ???? | 4 bytes | child page | next level index page containing this |
|
|
| | | | entry as first entry. Could be a leaf |
|
|
| | | | node. |
|
|
+-------------------------------------------------------------------------+
|
|
|
|
The flag field is generally either 0x00, 0x7f, 0x80. 0x80 is the one's
|
|
complement of 0x7f and all text data in the index would then need to be negated.
|
|
The reason for this negation is unknown, although I suspect it has to do with
|
|
descending order. The 0x00 flag indicates that the key column is null, and no
|
|
data will follow, only the page pointer. In multicolumn indexes the flag field
|
|
plus data is repeated for the number of columns participating in the key.
|
|
|
|
Update: There is a compression scheme utilized on leaf pages as follows:
|
|
Normally an index entry with an integer primary key would be 9 bytes (1
|
|
for the flags field, 4 for the integer, 3 for page, and 1 for row). The
|
|
entry can be shorter than 9, containing only 5 bytes, the first byte is the last
|
|
octet of the encoded primary key field (integer) and the last four are the page/row
|
|
pointer. Thus if the first key value on the page is 1 and it points to page 261
|
|
(00 01 05) row 3, it becomes
|
|
7f 00 00 00 01 00 01 05 03
|
|
|
|
the next index entry can be:
|
|
02 00 01 05 04
|
|
|
|
that is, the key value is 2 (the last octet changes to 02) page 261 row 4.
|
|
|
|
Access stores an 'alphabetic sort order' version of the text key columns in the index.
|
|
Basically this means that upper and lower case characters A-Z are merged and start at
|
|
0x60. Digits are 0x56 through 0x5f. Once converted into this (non-ascii) character set,
|
|
the text value is able to be sorted in 'alphabetic' order. A text column will end with
|
|
a NULL (0x00 or 0xff if negated).
|
|
|
|
The leaf page entries store the key column and the 3 byte page and 1 byte row
|
|
number.
|
|
|
|
The value of the index root page in the index definition may be an index page
|
|
(type 0x03), an index leaf page (type 0x04) if there is only one index page,
|
|
or (in the case of tables small enough to fit on one page) a data page
|
|
(type 0x01).
|
|
|
|
So to search the index, you need to convert your value into the alphabetic
|
|
character set, compare against each index entry, and on successful comparison
|
|
follow the page and row number to the data. Because text data is managled
|
|
during this conversion there is no 'covered querys' possible on text columns.
|
|
|
|
To conserve on frequent index updates, Jet also does something special when
|
|
creating new leaf pages at the end of a primary key (maybe others as well) index.
|
|
The next leaf page pointer of the last leaf node points to the new leaf page but
|
|
the index tree is not otherwise updated. In src/libmdb/index.c, the last leaf
|
|
read is stored, once the index search has been exhausted by the normal search
|
|
routine, it enters a "clean up mode" and reads the next leaf page pointer until
|
|
it's null.
|
|
|
|
KKD Records
|
|
-----------
|
|
|
|
Design View table definitions appear to be stored in 'KKD' records (my name for
|
|
them...they always start with 'KKD\0'). Again these reside on pages, packed to
|
|
the end of the page.
|
|
|
|
Update: The KKD records are stored in LvProp column of MSysObjects so they are
|
|
stored as other OLE/Memo fields are.
|
|
|
|
They look a little like this: (this needs work...see the kkd.c)
|
|
|
|
'K' 'K' 'D' 0x00
|
|
16 bit length value (this includes the length)
|
|
0x00 0x00
|
|
0x80 0x00 (0x80 seems to indicate a header)
|
|
Then one or more of: 16 bit length field and a value of that size.
|
|
For instance:
|
|
0x0d 0x00 and 'AccessVersion' (AccessVersion is 13 bytes, 0x0d 0x00 intel order)
|
|
|
|
Next comes one of more rows of data. (column names, descriptions, etc...)
|
|
16 bit length value (this includes the length)
|
|
0x00 0x00
|
|
0x00 0x00
|
|
16bit length field (this include the length itself)
|
|
4 bytes of unknown purpose
|
|
16 bit length field (non-inclusive)
|
|
value (07.53 for the AccessVersion example above)
|
|
|
|
See kkd.c for an example, although it needs cleanup.
|
|
|