mirror of
https://github.com/mdbtools/mdbtools.git
synced 2025-04-05 14:11:01 +08:00
OK, this is a brain-dump of everything I've learned about MDB files. I am
using Access 97, so everything I say applies to that and may or may not apply
to other versions.
|
|
|
|
Right, so here goes:
|
|
|
|
Note: It appears that much of the data in the pages is uninitialized garbage.
This makes the task of figuring out the format a bit more challenging.
|
|
|
|
Pages
|
|
-----
|
|
|
|
MDB files are a set of pages. These pages are 2K (2048 bytes) in size, so in a
hex dump of the data they start on addresses like xxx000 and xxx800. Access
2000 has increased the page size to 4K and thus pages would appear on hex
addresses ending in xxx000.
|
|
|
|
Each page is known by a page_id of 3 bytes (max value is 0x07FFFF).
|
|
The start address of a page is at page_id * 0x800.
|
|
So the maximum data storage for an Access 97 database is near
   0x080000 * 0x800 = 0x40000000 bytes (1 GB)
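
The address arithmetic above can be sketched in a few lines. This is only an
illustration assuming the Jet 3 page size of 2048 bytes; page_offset and
max_file_size are hypothetical helper names, not functions from mdbtools.

```python
JET3_PAGE_SIZE = 0x800   # 2048 bytes, per the text above
MAX_PAGE_ID = 0x07FFFF   # largest page_id quoted above


def page_offset(page_id: int, page_size: int = JET3_PAGE_SIZE) -> int:
    """Return the byte offset in the file at which a page starts."""
    if not 0 <= page_id <= MAX_PAGE_ID:
        raise ValueError("page_id out of range")
    return page_id * page_size


def max_file_size(page_size: int = JET3_PAGE_SIZE) -> int:
    """Upper bound on file size: one page for every possible page_id."""
    return (MAX_PAGE_ID + 1) * page_size
```

For example, page_offset(0x12) gives 0x9000.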
|
|
|
|
We have two different structures which use page_id :
|
|
|
|
1) Data pointer structure (_dp):
|
|
+------+---------+-------------+------------------------------------------+
| data | length  | name        | description                              |
+------+---------+-------------+------------------------------------------+
| ???? | 1 byte  | row_id      | The row id in the data page              |
| ???? | 3 bytes | page_id     | Max value is 0x07FFFF                    |
+-------------------------------------------------------------------------+
|
|
|
|
2) Page pointer structure (_pg):
|
|
+------+---------+-------------+------------------------------------------+
| data | length  | name        | description                              |
+------+---------+-------------+------------------------------------------+
| ???? | 3 bytes | page_id     | Max value is 0x07FFFF                    |
| ???? | 1 byte  | flags       | If not null, indicates a system object.  |
+-------------------------------------------------------------------------+
|
|
|
|
The first byte of each page seems to be a type identifier, for instance, the
|
|
first page in the mdb file is 0x00, which no other page seems to share. Other
|
|
pages have the following values:
|
|
|
|
0x00 Database definition page. (Page 0)
|
|
0x01 Data page
|
|
0x02 Table definition
|
|
0x03 Index pages
|
|
0x04 Index pages (Leaf nodes?)
|
|
0x05 Page Usage Bitmaps (extended page usage)
|
|
|
|
The second byte is always 0x01 as far as I can tell.
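
As a quick illustration of the type byte, the sketch below walks a file image
page by page and names each page from its first byte. It assumes 2048 byte
(Jet 3) pages; classify_page and iter_pages are hypothetical helpers.

```python
PAGE_SIZE = 0x800  # Jet 3 page size

PAGE_TYPES = {
    0x00: "database definition",
    0x01: "data",
    0x02: "table definition",
    0x03: "index",
    0x04: "index (leaf?)",
    0x05: "page usage bitmap",
}


def classify_page(page: bytes) -> str:
    """Name a page from its first byte, per the table above."""
    return PAGE_TYPES.get(page[0], f"unknown (0x{page[0]:02x})")


def iter_pages(data: bytes, page_size: int = PAGE_SIZE):
    """Yield (page_id, page_bytes) for every whole page in the file image."""
    for page_id in range(len(data) // page_size):
        start = page_id * page_size
        yield page_id, data[start:start + page_size]
```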
|
|
|
|
At some point in the file the page layout is apparently abandoned though the
|
|
very last 2K in the file again looks like a valid page. The purpose of this
|
|
non-paged region is so far unknown. Could be a corrupt db as well. My current
|
|
thinking is that this area is unallocated pages based on the GAM (global
|
|
allocation map stored on page 0x01).
|
|
|
|
Bytes after the first and second seem to depend on the type of page, although
bytes 4-7 are page pointers that refer to the parent (data pages) or a
continuation page (table definition).
|
|
|
|
Pages seem to have two parts, a header and a data portion. The header starts
|
|
at the front of the page and builds up. The data is packed to the end of the
|
|
page. This means the last byte of the data portion is the last byte of the
|
|
page.
|
|
|
|
Byte Order
|
|
----------
|
|
|
|
All offsets to data within the file are in little endian (intel) order
|
|
|
|
Catalogs
|
|
--------
|
|
|
|
Note: This section was written fairly early in the process of determining the file
|
|
format. It is now understood that the catalog pages are data for the MSysObjects
|
|
system table (with a table definition starting at page 2). The rest of this
|
|
section is presented for the understanding of the current code until it may be
|
|
replaced by a more proper implementation.
|
|
|
|
|
|
So far the first page of the catalog has always been seen at 0x9000 bytes into
|
|
the file. It is unclear whether this is always where it occurs, or whether a
|
|
pointer to this location exists elsewhere.
|
|
|
|
The header of a catalog page looks something like this:
|
|
|
|
+------+---------+--------------------------------------------------------+
| 0x01 | 1 byte  | Page type                                              |
| 0x01 | 1 byte  | Unknown                                                |
| ???? | 2 bytes | A pointer of unknown use into the page                 |
| 0x02 | 1 byte  | Unknown                                                |
| 0x00 | 3 bytes | Possibly part of a 32 bit int including the 0x02 above |
| ???? | 2 bytes | a 16bit int of the number of records on this page      |
+-------------------------------------------------------------------------+
| Iterate for the number of records                                       |
+-------------------------------------------------------------------------+
| ???? | 2 bytes | offset to the records location on this page            |
+-------------------------------------------------------------------------+
|
|
|
|
The rest of the data is packed to the end of the page, such that the last
|
|
record ends on byte 2047 (0 based).
|
|
|
|
Some of the offsets are not within the bounds of the page. The reason for this
|
|
is not presently understood and the current code discards them silently.
|
|
Offsets that have 0x40 in the high order byte point to a location within the
|
|
page where a pointer to another catalog page is stored. This does not seem to
|
|
yield a complete chain of catalog pages and is currently being ignored in favor
|
|
of a brute force read of the entire database for catalog pages.
|
|
|
|
Little is understood of the meaning of the bytes that make up the records. They
vary in size, but the portion prior to the object's name seems to be fixed. All
records start with 0x11. The next two bytes are a page number to the column
definitions (see Column Definition).
|
|
|
|
Byte offset 9 from the beginning of the record contains its type. Here is a
|
|
table of known types:
|
|
|
|
0x00 Form
|
|
0x01 User Table
|
|
0x02 Macro
|
|
0x03 System Table
|
|
0x04 Report
|
|
0x05 Query
|
|
0x06 Linked Table
|
|
0x07 Module
|
|
0x0b Unknown but used for two objects (AccessLayout and UserDefined)
|
|
|
|
Byte offset 31 from the beginning of the record starts the object's name. I am
not presently aware of any field defining the length of the name, so the present
course of action has been to stop at the first non-printable character
(generally a 0x03 or 0x02).
|
|
|
|
After the name there is sometimes (not yet determined why only sometimes)
a page pointer and offset to the KKD records (see below). There is also a
pointer to other catalog pages, but I'm not really sure how to parse those.
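
The record layout described above (0x11 marker, type at offset 9, name from
offset 31 up to the first non-printable byte) can be sketched like this. The
function name is made up and the name is assumed to be plain ASCII; the bytes
after the name are ignored.

```python
OBJECT_TYPES = {
    0x00: "Form", 0x01: "User Table", 0x02: "Macro", 0x03: "System Table",
    0x04: "Report", 0x05: "Query", 0x06: "Linked Table", 0x07: "Module",
}


def parse_catalog_record(record: bytes) -> tuple[str, str]:
    """Return (object type, object name) for one catalog record."""
    if record[0] != 0x11:
        raise ValueError("catalog records are expected to start with 0x11")
    obj_type = OBJECT_TYPES.get(record[9], f"unknown (0x{record[9]:02x})")
    # No length field is known, so stop at the first non-printable byte.
    end = 31
    while end < len(record) and 0x20 <= record[end] < 0x7F:
        end += 1
    return obj_type, record[31:end].decode("ascii")
```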
|
|
|
|
TDEF Pages (Table Definition)
|
|
-----------------------------
|
|
A table definition includes name, type, size, number of data rows, a pointer
to the first data page, and possibly more.
|
|
|
|
The header of each Tdef page looks like this (8 bytes) :
|
|
+------+---------+-------------+------------------------------------------+
| data | length  | name        | description                              |
+------+---------+-------------+------------------------------------------+
| 0x02 | 1 byte  | page_type   | 0x02 indicates a tabledef page           |
| 0x01 | 1 byte  | unknown     |                                          |
| 'VC' | 2 bytes | tdef_id     | The word 'VC'                            |
| 0x00 | 4 bytes | next_pg     | Next tdef page pointer (0 if none)       |
+------+---------+-------------+------------------------------------------+
|
|
|
|
Note: The tabledef can be very long, so it can take many TDEF pages linked
|
|
with the next_pg pointer.
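
Following the next_pg chain might look like the sketch below. It assumes Jet 3
pages of 2048 bytes, that next_pg can be read as a plain 32 bit page number,
and that the definition data on every page in the chain starts right after
the 8 byte header. Both function names are hypothetical.

```python
import struct

PAGE_SIZE = 0x800  # Jet 3 page size


def parse_tdef_header(page: bytes) -> int:
    """Check the 8-byte TDEF header and return next_pg (0 if none)."""
    page_type, _unknown, tdef_id, next_pg = struct.unpack_from("<BB2sI", page, 0)
    if page_type != 0x02 or tdef_id != b"VC":
        raise ValueError("not a TDEF page")
    return next_pg


def read_tdef(data: bytes, first_pg: int) -> bytes:
    """Concatenate the bodies of all TDEF pages linked from first_pg."""
    body = bytearray()
    seen = set()
    pg = first_pg
    while pg != 0:
        if pg in seen:
            raise ValueError("loop in TDEF page chain")
        seen.add(pg)
        page = data[pg * PAGE_SIZE:(pg + 1) * PAGE_SIZE]
        pg = parse_tdef_header(page)
        body += page[8:]  # everything after the 8-byte header
    return bytes(body)
```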
|
|
|
|
+-------------------------------------------------------------------------+
| Table definition block (35 bytes)                                       |
+------+---------+-------------+------------------------------------------+
| data | length  | name        | description                              |
+------+---------+-------------+------------------------------------------+
| ???? | 4 bytes | tdef_len    | Length of the data for this page         |
| ???? | 4 bytes | num_rows    | Number of records in this table          |
| 0x00 | 4 bytes | autonumber  | value for the next value of the          |
|      |         |             | autonumber column, if any. 0 otherwise   |
| 0x4e | 1 byte  | table_type  | 0x4e: user table, 0x53: system table     |
| ???? | 2 bytes | num_real_col| Number of columns in table (not always)  |
| ???? | 2 bytes | num_var_cols| Number of variable columns in table      |
| ???? | 2 bytes | num_cols    | Number of columns in table (repeat)      |
| ???? | 4 bytes | num_idx     | Number of indexes in table               |
| ???? | 4 bytes | num_real_idx| Number of indexes in table (repeat)      |
| ???? | 4 bytes | used_pages  | Points to a record containing the        |
|      |         |             | usage bitmask for this table.            |
| ???? | 4 bytes |             | Points to a similar record as above,     |
|      |         |             | listing pages which contain free space.  |
+-------------------------------------------------------------------------+
| Iterate for the number of num_real_idx (8 bytes per index)              |
+-------------------------------------------------------------------------+
| 0x00 | 4 bytes | ???         |                                          |
| ???? | 4 bytes | num_idx_rows| (not sure)                               |
+-------------------------------------------------------------------------+
| Iterate for the number of num_cols (18 bytes per column)                |
+-------------------------------------------------------------------------+
| ???? | 1 byte  | col_type    | Column Type (see table below)            |
| ???? | 2 bytes | col_num     | Column Number, (not always)              |
| ???? | 2 bytes | offset_V    | Offset for variable length columns       |
| ???? | 4 bytes | ???         |                                          |
| ???? | 4 bytes | ???         |                                          |
| ???? | 1 byte  | bitmask     | low order bit indicates variable columns |
| ???? | 2 bytes | offset_F    | Offset for fixed length columns          |
| ???? | 2 bytes | col_len     | Length of the column (0 if memo)         |
+-------------------------------------------------------------------------+
| Iterate for the number of num_cols (n bytes per column)                 |
+-------------------------------------------------------------------------+
| ???? | 1 byte  | col_name_len| len of the name of the column            |
| ???? | n bytes | col_name    | Name of the column                       |
+-------------------------------------------------------------------------+
| Iterate for the number of num_real_idx (30+9 = 39 bytes)                |
+-------------------------------------------------------------------------+
| Iterate 10 times for 10 possible columns (10*3 = 30 bytes)              |
+-------------------------------------------------------------------------+
| ???? | 2 bytes | col_num     | number of a column (0xFFFF = none)       |
| ???? | 1 byte  | col_order   | 0x01 = ascending order                   |
+-------------------------------------------------------------------------+
| ???? | 4 bytes | unknown     |                                          |
| ???? | 4 bytes | first_dp    | Data pointer of the index page           |
| ???? | 1 byte  | flags       | See flags table for indexes              |
+-------------------------------------------------------------------------+
| Iterate for the number of num_real_idx                                  |
+-------------------------------------------------------------------------+
| ???? | 4 bytes | index_num   | Number of the index                      |
|      |         |             |(warn: not always in the sequential order)|
| ???? | 4 bytes | index_num2  | Number of the index (repeat)             |
| 0xFF | 4 bytes | ???         |                                          |
| 0x00 | 4 bytes | ???         |                                          |
| 0x04 | 2 bytes | ???         |                                          |
| ???? | 1 byte  | primary_key | 0x01 if this index is primary            |
+-------------------------------------------------------------------------+
| Iterate for the number of num_real_idx                                  |
+-------------------------------------------------------------------------+
| ???? | 1 byte  | idx_name_len| len of the name of the index             |
| ???? | n bytes | idx_name    | Name of the index                        |
+-------------------------------------------------------------------------+
| ???? | n bytes | ???         |                                          |
| 0xFF | 2 bytes | ???         | End of the tableDef ?                    |
+-------------------------------------------------------------------------+
|
|
Index flags (not complete):
0x01 Unique
0x02 IgnoreNulls
0x08 Required
|
|
|
|
|
|
Column Type may be one of the following (not complete):
|
|
|
|
    BOOL       = 0x01  /* boolean         (  1 bit  ) */
    BYTE       = 0x02  /* byte            (  8 bits ) */
    INT        = 0x03  /* Integer         ( 16 bits ) */
    LONGINT    = 0x04  /* Long Integer    ( 32 bits ) */
    MONEY      = 0x05  /* Currency        (  8 bytes) */
    FLOAT      = 0x06  /* Single          (  4 bytes) */
    DOUBLE     = 0x07  /* Double          (  8 bytes) */
    SDATETIME  = 0x08  /* Short Date/Time (  8 bytes) */
    BINARY     = 0x09  /* binary          (255 bytes) */
    TEXT       = 0x0A  /* Text            (255 bytes) */
    OLE        = 0x0B  /* OLE */
    MEMO       = 0x0C  /* Memo, Hyperlink */
    UNKNOWN_0D = 0x0D
    REPID      = 0x0F  /* GUID */
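
To make the field sizes concrete, here is a sketch that unpacks the 35 byte
table block, skips the 8 byte per index entries, and reads the 18 byte column
entries and the column names. It assumes the whole definition sits in one
contiguous buffer starting right after the 8 byte page header; the function
and dictionary keys are invented for the example and the index sections are
not parsed.

```python
import struct

TABLE_BLOCK = struct.Struct("<IIIBHHHIIII")  # 35 bytes, fields as tabulated
COLUMN_ENTRY = struct.Struct("<BHHIIBHH")    # 18 bytes per column
COLUMN_TYPES = {
    0x01: "BOOL", 0x02: "BYTE", 0x03: "INT", 0x04: "LONGINT",
    0x05: "MONEY", 0x06: "FLOAT", 0x07: "DOUBLE", 0x08: "SDATETIME",
    0x09: "BINARY", 0x0A: "TEXT", 0x0B: "OLE", 0x0C: "MEMO", 0x0F: "REPID",
}


def parse_tdef_body(buf: bytes, offset: int = 8) -> dict:
    """Parse the table block, the column entries and the column names."""
    block = TABLE_BLOCK.unpack_from(buf, offset)
    num_rows, num_cols, num_real_idx = block[1], block[6], block[8]
    # Skip the table block and the 8-byte-per-index entries that follow it.
    pos = offset + TABLE_BLOCK.size + 8 * num_real_idx
    columns = []
    for _ in range(num_cols):
        col_type, col_num, _off_v, _u1, _u2, bitmask, _off_f, col_len = (
            COLUMN_ENTRY.unpack_from(buf, pos))
        columns.append({
            "type": COLUMN_TYPES.get(col_type, hex(col_type)),
            "num": col_num,
            "variable": bool(bitmask & 0x01),  # low order bit
            "len": col_len,
        })
        pos += COLUMN_ENTRY.size
    # Column names follow, each preceded by a 1-byte length.
    for col in columns:
        name_len = buf[pos]
        col["name"] = buf[pos + 1:pos + 1 + name_len].decode("latin-1")
        pos += 1 + name_len
    return {"num_rows": num_rows, "columns": columns}
```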
|
|
|
|
Note: this is where my stuff didn't mesh with Yves Maingoy's who reworked the section above.
|
|
|
|
(start old stuff)
|
|
Following the 18 byte column records begins the column names, listed in order
|
|
with a 1 byte size prefix preceding each name.
|
|
|
|
After this are a series of 39 byte fields for each index. At offset 34 is a 4 byte page number where the index lives.
|
|
|
|
Beyond this are a series of 20 byte fields for each 'index entry'. There may be
more entries than indexes and byte 20 represents its type (0x00 for normal
index, 0x01 for Primary Key, and 0x02 otherwise).
|
|
|
|
It is currently unknown how indexes are mapped to columns or the format of the index pages.
|
|
(end old stuff)
|
|
|
|
Page Usage Map
|
|
--------------
|
|
|
|
The purpose of the page usage bitmap (called object allocation map (OAM) by
|
|
SQL Server, not sure what the official terminology is for Access) is to store
|
|
a bitmap of page allocations for a table. This determines quickly which pages
|
|
are owned by the table and helps speed up access to the data.
|
|
|
|
The table definition contains a data pointer to a usage bitmap of pages
|
|
allocated to this table. It appears to be of a fixed size for both Jet 3
|
|
and 4 (128 and 64 bytes respectively). The first byte of the map is a type
|
|
field.
|
|
|
|
Type 0 page usage map definition follows:
|
|
|
|
+------+---------+---------------------------------------------------------+
| data | length  | name       | description                                |
+------+---------+---------------------------------------------------------+
| 0x00 | 1 byte  | map_type   | 0x00 indicates map stored within.          |
| ???? | 4 bytes | page_start | first page for which this map applies      |
+------+---------+---------------------------------------------------------+
| Iterate for the length of map                                            |
+--------------------------------------------------------------------------+
| ???? | 1 byte  | bitmap     | each bit encodes the allocation status of a|
|      |         |            | page. 1 indicates allocated to this table. |
|      |         |            | Pages are stored from msb to lsb.          |
+--------------------------------------------------------------------------+
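
A type 0 map can be expanded into a list of page numbers along the lines of
the sketch below. Note the bit order is an assumption: this document describes
it both as msb to lsb and as lsb to msb, and the sketch takes the low order bit
of each byte as the lowest page number. Reverse the inner loop if your files
disagree. The function name is made up.

```python
import struct


def pages_in_type0_map(usage_map: bytes) -> list[int]:
    """Return the page numbers marked as allocated in a type 0 usage map."""
    if usage_map[0] != 0x00:
        raise ValueError("not a type 0 usage map")
    (page_start,) = struct.unpack_from("<I", usage_map, 1)
    pages = []
    for i, byte in enumerate(usage_map[5:]):
        for bit in range(8):  # assumed order: low order bit first
            if byte & (1 << bit):
                pages.append(page_start + i * 8 + bit)
    return pages
```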
|
|
|
|
If you're paying attention then you'll realize that the relatively small size of the map (128*8*2048 or 64*8*4096 = 2 Meg) means that this scheme won't work with larger database files although the initial start page helps a bit. To overcome this there is a second page usage map scheme with the map_type of 0x01 as follows:
|
|
|
|
+------+---------+---------------------------------------------------------+
| data | length  | name       | description                                |
+------+---------+---------------------------------------------------------+
| 0x01 | 1 byte  | map_type   | 0x01 indicates this is an indirection list.|
+------+---------+---------------------------------------------------------+
| Iterate for the length of map                                            |
+--------------------------------------------------------------------------+
| ???? | 4 bytes | map_page   | pointer to page type 0x05 containing map   |
+--------------------------------------------------------------------------+
|
|
|
|
Note that the initial start page is gone and is reused for the first page
indirection. The 0x05 type page header looks like:
|
|
|
|
+------+---------+---------------------------------------------------------+
| data | length  | name       | description                                |
+------+---------+---------------------------------------------------------+
| 0x05 | 1 byte  | page_type  | allocation map page                        |
| 0x01 | 1 byte  | unknown    | always 1 as with other page types          |
| 0x00 | 2 bytes | unknown    |                                            |
+------+---------+---------------------------------------------------------+
|
|
|
|
The rest of the page is the allocation bitmap following the same scheme (lsb
|
|
to msb order, 1 bit per page) as a type 0 map. This yields a maximum of
|
|
2044*8=16352 (jet3) or 4092*8 = 32736 (jet4) pages mapped per type 0x05 page.
|
|
Given 128/4+1 = 33 or 64/4+1 = 17 page pointers per indirection row (remember
|
|
the start page field is reused, thus the +1), this yields 33*16352*2048 = 1053
|
|
Meg (jet3) or 17*32736*4096 = 2173 Meg (jet4) or enough to cover the maximum
|
|
size of each of the database formats comfortably, so there is no reason to
|
|
believe any other page map schemes exist.
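
The capacity figures above are easy to re-check. The sketch below just repeats
the arithmetic from the text (4 byte header on a 0x05 page, one bit per page,
map_len/4 + 1 page pointers); the function name is made up.

```python
def indirect_map_capacity(page_size: int, map_len: int) -> int:
    """Bytes of database addressable through one type 0x01 usage map."""
    pages_per_map_page = (page_size - 4) * 8  # bits left after the header
    pointers = map_len // 4 + 1               # start page field is reused
    return pointers * pages_per_map_page * page_size


jet3_bytes = indirect_map_capacity(2048, 128)
jet4_bytes = indirect_map_capacity(4096, 64)
print(jet3_bytes // 2**20, "Meg (jet3)")
print(jet4_bytes // 2**20, "Meg (jet4)")
```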
|
|
|
|
Data Pages
|
|
----------
|
|
|
|
The header of a data page looks like this:
|
|
|
|
+------+---------+---------------------------------------------------------+
| data | length  | name       | description                                |
+------+---------+---------------------------------------------------------+
| 0x01 | 1 byte  | page_type  | 0x01 indicates a data page.                |
| 0x01 | 1 byte  | unknown    |                                            |
| ???? | 2 bytes | free_space | Free space in this page                    |
| ???? | 4 bytes | tdef_pg    | Page pointer to table definition           |
| ???? | 2 bytes | num_rows   | number of records on this page             |
+------+---------+---------------------------------------------------------+
| Iterate for the number of records                                        |
+--------------------------------------------------------------------------+
| ???? | 2 bytes | offset_row | The records location on this page          |
+--------------------------------------------------------------------------+
|
|
|
|
Notes for offset_row:
|
|
- Offsets that have 0x40 in the high order byte point to a location within
|
|
the page where a Data Pointer (4 bytes) to another data page is stored.
|
|
- Offsets that have 0x80 in the high order byte are deleted rows.
|
|
(These flags are delflag and lookupflag in source code)
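
Here is a sketch of reading the row offset table of a Jet 3 data page. It
assumes a 2 byte row count at offset 8 with the offsets starting at byte 10
(this matches the 10 byte LVAL header and the 10 + row_id * 2 lookup used for
memo rows), and that a row ends where the previous row starts, the first row
ending at the end of the page. The names are mine, not the ones in the source
code.

```python
import struct

LOOKUP_FLAG = 0x4000  # 0x40 in the high order byte
DEL_FLAG = 0x8000     # 0x80 in the high order byte
OFFSET_MASK = 0x3FFF  # assumed: everything except the two flag bits


def parse_data_page(page: bytes):
    """Return (tdef_pg, rows) for a data page."""
    page_type, _unknown, _free_space, tdef_pg, num_rows = struct.unpack_from(
        "<BBHIH", page, 0)
    if page_type != 0x01:
        raise ValueError("not a data page")
    rows = []
    end = len(page)  # row 0 is packed against the end of the page
    for (raw,) in struct.iter_unpack("<H", page[10:10 + 2 * num_rows]):
        start = raw & OFFSET_MASK
        rows.append({
            "start": start,
            "end": end,
            "deleted": bool(raw & DEL_FLAG),
            "lookup": bool(raw & LOOKUP_FLAG),
        })
        end = start
    return tdef_pg, rows
```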
|
|
|
|
|
|
Each data row looks like this (JET3):
|
|
|
|
+------+---------+----------------------------------------------------------+
| data | length  | name        | description                                |
+------+---------+----------------------------------------------------------+
| ???? | 1 byte  | num_cols    | Number of columns stored in this row       |
| ???? | n bytes |             | Fixed length columns                       |
| ???? | n bytes |             | Variable length columns                    |
| ???? | 1 byte  | fixed_len   | length of data from beginning of record    |
| ???? | n bytes | var_table[] | offset from start of row for each variable |
|      |         |             | length column                              |
| ???? | 1 byte  | var_len     | number of variable length columns          |
| ???? | n bytes | null_table[]| Null indicator. size is 1 byte per 8 cols. |
|      |         |             | 0 indicates a null value.                  |
+------+---------+----------------------------------------------------------+
|
|
|
|
Note: For boolean fixed columns, the values are in null_table[]:
|
|
0 indicates a false value
|
|
1 indicates a true value
|
|
|
|
An 0xFF stored in the var_table indicates that this column has been deleted.
|
|
|
|
Note: A row will always have the number of fixed columns as specified in the
table definition, but may have fewer variable columns, as rows are not updated
when columns are added.
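
Reading the tail end of a Jet 3 row backwards could be sketched as follows.
This takes the table above literally: one var_table byte per variable column,
var_len directly before the null table, and fixed_len directly before
var_table. It also assumes the null table is read low order bit first, which
the text does not state. Rows long enough to need the jump counter are not
handled, and the function name is invented.

```python
def parse_jet3_row_trailer(row: bytes) -> dict:
    """Pick apart the trailing bookkeeping bytes of a short Jet 3 row."""
    num_cols = row[0]
    null_len = (num_cols + 7) // 8          # 1 byte per 8 columns
    null_pos = len(row) - null_len
    null_table = row[null_pos:]
    var_len = row[null_pos - 1]             # number of variable columns
    var_table = row[null_pos - 1 - var_len:null_pos - 1]
    fixed_len = row[null_pos - 1 - var_len - 1]
    # A 0 bit means the column is null (or false for a boolean column).
    not_null = [bool(null_table[i // 8] & (1 << (i % 8)))
                for i in range(num_cols)]
    return {
        "num_cols": num_cols,
        "fixed_len": fixed_len,
        "var_table": list(var_table),
        "var_len": var_len,
        "not_null": not_null,
    }
```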
|
|
|
|
In Access 2000 (JET4) data rows are like this
|
|
|
|
+------+---------+----------------------------------------------------------+
| data | length  | name        | description                                |
+------+---------+----------------------------------------------------------+
| ???? | 2 bytes | num_cols    | Number of columns stored in this row       |
| ???? | n bytes |             | Fixed length columns                       |
| ???? | n bytes |             | Variable length columns                    |
| ???? | 2 bytes | fixed_len   | length of data from beginning of record    |
| ???? | n bytes | var_table[] | offset from start of row for each variable |
|      |         |             | length column. (2 bytes per var column)    |
| ???? | 2 bytes | var_len     | number of variable length columns          |
| ???? | n bytes | null_table[]| Null indicator. size is 1 byte per 8 cols. |
|      |         |             | 0 indicates a null value.                  |
+------+---------+----------------------------------------------------------+
|
|
|
|
Note: it is possible for the offset to the beginning of a variable length
|
|
column to require more than one byte (if the sum of the lengths of columns is
|
|
greater than 255). I have no idea how this is represented in the data as I
|
|
have not looked at tables large enough for this to occur yet.
|
|
Update: This is currently implemented using a jump counter for Jet 3 files, see
|
|
src/libmdb/data.c for details.
|
|
|
|
Each memo column (or other long binary data) in a row
|
|
looks like this (12 bytes):
|
|
+------+---------+-------------+------------------------------------------+
| data | length  | name        | description                              |
+------+---------+-------------+------------------------------------------+
| ???? | 2 bytes | memo_len    | Total length of the memo                 |
| ???? | 2 bytes | bitmask     | See values                               |
| ???? | 4 bytes | lval_dp     | Data pointer to LVAL page (if needed)    |
| 0x00 | 4 bytes | unknown     |                                          |
+------+---------+-------------+------------------------------------------+
|
|
Values for the bitmask:
|
|
|
|
0x8000= the memo is in a string at the end of this header (memo_len bytes)
|
|
0x4000= the memo is in a unique LVAL page in a record type 1
|
|
0x0000= the memo is in n LVAL pages in a record type 2
|
|
|
|
If the memo is in a LVAL page, we use row_id of lval_dp to find the row.
   offset_start of memo = (int16*) LVAL_page[ 10 + row_id * 2]
   if (row_id == 0)
      offset_stop of memo = 2048
   else
      offset_stop of memo = (int16*) LVAL_page[ 10 + row_id * 2 - 2]
|
|
|
|
The length (partial if type 2) for the memo is:
|
|
memo_page_len = offset_stop - offset_start
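
Put together, decoding the 12 byte memo header and locating the row in a LVAL
page could look like this sketch. It assumes Jet 3 (2048 byte pages) and that
lval_dp is a data pointer with row_id in its first byte; the function names
and the storage labels are made up.

```python
import struct


def parse_memo_header(field: bytes) -> dict:
    """Decode the 12-byte memo header found in a data row."""
    memo_len, bitmask, lval_dp, _unknown = struct.unpack_from("<HHII", field, 0)
    if bitmask & 0x8000:
        storage = "inline"               # memo follows this header
    elif bitmask & 0x4000:
        storage = "single LVAL page"     # record type 1
    else:
        storage = "chained LVAL pages"   # record type 2
    return {
        "memo_len": memo_len,
        "storage": storage,
        "page_id": lval_dp >> 8,
        "row_id": lval_dp & 0xFF,
    }


def lval_row_bounds(lval_page: bytes, row_id: int) -> tuple[int, int]:
    """Return (offset_start, offset_stop) of a row in a LVAL page."""
    (start,) = struct.unpack_from("<H", lval_page, 10 + row_id * 2)
    if row_id == 0:
        stop = 2048
    else:
        (stop,) = struct.unpack_from("<H", lval_page, 10 + row_id * 2 - 2)
    return start, stop
```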
|
|
|
|
LVAL Pages
|
|
----------
|
|
(LVAL pages are particular data pages for long data storage)
|
|
|
|
The header of a LVAL page looks like this (10 bytes) :
|
|
+------+---------+-------------+------------------------------------------+
| data | length  | name        | description                              |
+------+---------+-------------+------------------------------------------+
| 0x01 | 1 byte  | page_type   | 0x01 indicates a data page               |
| 0x01 | 1 byte  | unknown     |                                          |
| ???? | 2 bytes | free_space  | The free space in this page              |
| LVAL | 4 bytes | lval_id     | The word 'LVAL'                          |
| ???? | 2 bytes | num_rows    | Number of rows in this page              |
+-------------------------------------------------------------------------+
| Iterate for the number of records                                       |
+-------------------------------------------------------------------------+
| ???? | 2 bytes | row_offset  | to the records location on this page     |
+-------------------------------------------------------------------------+
|
|
|
|
Each memo record type 1 looks like this:
|
|
+------+---------+-------------+------------------------------------------+
| data | length  | name        | description                              |
+------+---------+-------------+------------------------------------------+
| ???? | n bytes | memo_value  | A string which is the memo               |
+-------------------------------------------------------------------------+
|
|
|
|
Each memo record type 2 looks like this:
|
|
+------+---------+-------------+------------------------------------------+
| data | length  | name        | description                              |
+------+---------+-------------+------------------------------------------+
| ???? | 4 bytes | lval_dp     | Next page LVAL type 2 if memo is too long|
| ???? | n bytes | memo_value  | A string which is the memo (partial)     |
+-------------------------------------------------------------------------+
|
|
|
|
In a LVAL type 2 data page, you have
   10 bytes for the header of the data page,
   2 bytes for an offset,
   4 bytes for the next lval_pg
So you have a block of 2048 - (10+2+4) = 2032 bytes max in a page.
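
Collecting a memo spread over several type 2 records might be sketched as
below. Two things here are guesses: that the chain ends with a zero lval_dp
(the loop also stops once memo_len bytes have been gathered), and that the
row bounds are found the same way as for a single LVAL row. get_page is a
hypothetical callable returning the 2048 bytes of a given page.

```python
import struct

PAGE_SIZE = 0x800  # Jet 3 page size


def read_chained_memo(get_page, lval_dp: int, memo_len: int) -> bytes:
    """Follow type 2 LVAL records and return the assembled memo."""
    out = bytearray()
    while lval_dp != 0 and len(out) < memo_len:
        page = get_page(lval_dp >> 8)
        row_id = lval_dp & 0xFF
        (start,) = struct.unpack_from("<H", page, 10 + row_id * 2)
        if row_id == 0:
            stop = PAGE_SIZE
        else:
            (stop,) = struct.unpack_from("<H", page, 10 + row_id * 2 - 2)
        record = page[start:stop]
        # First 4 bytes point at the next piece, the rest is memo data.
        (lval_dp,) = struct.unpack_from("<I", record, 0)
        out += record[4:]
    return bytes(out[:memo_len])
```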
|
|
|
|
Indices
|
|
-------
|
|
|
|
Indices are not completely understood but here is what we know.
|
|
|
|
+------+---------+-------------+------------------------------------------+
| data | length  | name        | description                              |
+------+---------+-------------+------------------------------------------+
| 0x03 | 1 byte  | page_type   | 0x03 indicates an index page             |
| 0x01 | 1 byte  | unknown     |                                          |
| ???? | 2 bytes | free_space  | The free space at the end of this page   |
| ???? | 4 bytes | parent_page | The page number of the TDEF for this idx |
| ???? | 4 bytes | prev_page   | Previous page at this index level        |
| ???? | 4 bytes | next_page   | Next page at this index level            |
| ???? | 4 bytes | leaf_page   | Pointer to leaf page, purpose unknown    |
+-------------------------------------------------------------------------+
|
|
|
|
Index pages come in two flavors.
|
|
|
|
0x04 pages are leaf pages which contain one entry for each row in the table.
|
|
Each entry is composed of a flag, the indexed column values and a page/row
|
|
pointer to the data.
|
|
|
|
0x03 index pages make up the rest of the index tree and contain a flag, the
indexed columns, the page/row that contains this entry, and a pointer to the
leaf page or intermediate page (another 0x03 page) on which this is the first
entry.
|
|
|
|
Both index types have a bitmask starting at 0x16 which identifies the starting
|
|
location of each index entry on this page. The first entry is assumed and
|
|
the count starts from the low order bit. For example take the data:
|
|
|
|
00 20 00 04 80 00 ...
|
|
|
|
The first entry starts at 0xf8 (always). Convert the bytes to binary starting
with the low order bit and stopping at the first "on" bit:

0000 0000 0000 01
-- 00 --- -- 20 -->

The "on" bit is bit 13 (counting from 0), so the next entry starts 13 (0xd)
bytes in at 0x105. Proceeding from here, the next entry:

00 0000 0000 001
<-- 20 -- -- 00 --- -- 04

starts 13 (0xd) bytes further in at 0x112. The final entry starts at

0 0000 0000 0001
<-- 04 -- -- 80 ---

or 13 (0xd) bytes more at 0x11f. In this example the rest of the mask (up to
offset 0xf8) would be zero filled and thus this last entry at 0x11f isn't an
actual entry but the stopping point of the data.
|
|
|
|
Since 0xf8 = 248 and 0x16 = 22, (248 - 22) * 8 = 1808 and 2048 - 1808 = 240 leaving just enough space for the bit mask to encode the remainder of the page. One wonders why MS didn't use a row offset table like they did on data pages,
|
|
seems like it would have been easier and more flexible.
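
Decoding the bitmask can be sketched as follows, under the assumption that bit
n of the mask (counting from the low order bit of the byte at 0x16) marks page
offset 0xf8 + n. The last value returned is the end of the data rather than a
real entry. The function name is made up.

```python
def index_entry_starts(page: bytes, mask_start: int = 0x16,
                       data_start: int = 0xF8) -> list[int]:
    """Return the start offsets encoded by the index page bitmask."""
    starts = [data_start]  # the first entry is assumed, not encoded
    bit_no = 0
    for byte in page[mask_start:data_start]:
        for bit in range(8):  # low order bit first
            if byte & (1 << bit):
                starts.append(data_start + bit_no)
            bit_no += 1
    return starts
```

Feeding it a page whose mask begins with 00 20 00 04 80 00 yields four
offsets: the assumed first entry plus one per "on" bit.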
|
|
|
|
So now we come to the index entries for type 0x03 pages which look like this:
|
|
|
|
+------+---------+-------------+------------------------------------------+
| data | length  | name        | description                              |
+------+---------+-------------+------------------------------------------+
| 0x7f | 1 byte  | flags       | 0x80 LSB, 0x7f MSB, 0x00 null?           |
| ???? | variable| indexed cols| indexed column data                      |
| ???? | 3 bytes | data page   | page containing row referred to by this  |
|      |         |             | index entry                              |
| ???? | 1 byte  | data row    | row number on that page of this entry    |
| ???? | 4 bytes | child page  | next level index page containing this    |
|      |         |             | entry as first entry. Could be a leaf    |
|      |         |             | node.                                    |
+-------------------------------------------------------------------------+
|
|
|
|
The flag field is generally either 0x00, 0x7f, 0x80. 0x80 is the one's
|
|
complement of 0x7f and all text data in the index would then need to be negated.
|
|
The reason for this negation is unknown, although I suspect it has to do with
|
|
descending order. The 0x00 flag indicates that the key column is null, and no
|
|
data will follow, only the page pointer. In multicolumn indexes the flag field plus data is repeated for the number of columns participating in the key.
|
|
|
|
Update: There is a compression scheme utilized on leaf pages as follows:
|
|
Normally an index entry with an integer primary key would be 9 bytes (1 for the
flags field, 4 for the integer, 3 for page, and 1 for row). The entry can be
shorter than 9, containing only 5 bytes: the first byte is the last octet of the
encoded primary key field (integer) and the last four are the page/row pointer.
Thus if the first key value on the page is 1 and it points to page 261
(00 01 05) row 3, it becomes
|
|
7f 00 00 00 01 00 01 05 03
|
|
|
|
the next index entry can be:
|
|
02 00 01 05 04
|
|
|
|
that is, the key value is 2 (the last octet changes to 02) page 261 row 4.
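
The worked example can be replayed with the sketch below. It only mirrors that
example: entries are assumed to be already split apart, the key and the 3 byte
page number are read most significant byte first (as 00 01 05 = 261 suggests),
and a 5 byte entry replaces just the last octet of the previous key. Real files
may well encode the integer differently. The function name is invented.

```python
def decode_int_leaf_entries(raw_entries):
    """Yield (key, page, row) for full (9 byte) or short (5 byte) entries."""
    prev_key = None
    for raw in raw_entries:
        if len(raw) == 9:
            key = bytearray(raw[1:5])        # skip the flags byte
        elif len(raw) == 5 and prev_key is not None:
            key = bytearray(prev_key)
            key[-1] = raw[0]                 # only the last octet changes
        else:
            raise ValueError("unexpected entry length")
        page = int.from_bytes(raw[-4:-1], "big")
        row = raw[-1]
        prev_key = bytes(key)
        yield int.from_bytes(key, "big"), page, row
```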
|
|
|
|
Access stores an 'alphabetic sort order' version of the text key columns in the index. Basically this means that upper and lower case characters A-Z are merged and start at 0x60. Digits are 0x56 through 0x5f. Once converted into this
|
|
(non-ascii) character set, the text value is able to be sorted in 'alphabetic'
|
|
order. A text column will end with a NULL (0x00 or 0xff if negated).
|
|
|
|
The leaf page entries store the key column and the 3 byte page and 1 byte row
|
|
number.
|
|
|
|
The value of the index root page in the index definition may be an index page
|
|
(type 0x03), an index leaf page (type 0x04) if there is only one index page,
|
|
or (in the case of tables small enough to fit on one page) a data page
|
|
(type 0x01).
|
|
|
|
So to search the index, you need to convert your value into the alphabetic
character set, compare against each index entry, and on successful comparison
follow the page and row number to the data. Because text data is mangled
during this conversion there are no 'covered queries' possible (a query that can
be satisfied by reading the index, without descending to the leaf page to read
the data).
|
|
|
|
KKD Records
|
|
-----------
|
|
|
|
Design View table definitions appear to be stored in 'KKD' records (my name for
|
|
them...they always start with 'KKD\0'). Again these reside on pages, packed to
|
|
the end of the page.
|
|
|
|
Update: The KKD records are stored in LvProp column of MSysObjects so they are stored as other OLE/Memo fields are.
|
|
|
|
They look a little like this: (this needs work...see the kkd.c)
|
|
|
|
'K' 'K' 'D' 0x00
|
|
16 bit length value (this includes the length)
|
|
0x00 0x00
|
|
0x80 0x00 (0x80 seems to indicate a header)
|
|
Then one or more of: 16 bit length field and a value of that size.
|
|
For instance:
|
|
0x0d 0x00 and 'AccessVersion' (AccessVersion is 13 bytes, 0x0d 0x00 intel order)
|
|
|
|
Next comes one or more rows of data. (column names, descriptions, etc...)
|
|
16 bit length value (this includes the length)
|
|
0x00 0x00
|
|
0x00 0x00
|
|
16 bit length field (this includes the length itself)
|
|
4 bytes of unknown purpose
|
|
16 bit length field (non-inclusive)
|
|
value (07.53 for the AccessVersion example above)
|
|
|
|
See kkd.c for an example, although it needs cleanup.
|
|
|