Add new mdbi_register_backend2 function with the features introduced in
PRs #321 and #322. The purpose is to prevent API/ABI breakage in the
1.0 release.
Access's `LIKE` is actually case-insensitive, but to prevent breaking existing
programs that rely on mdbtools' case-sensitive behavior, introduce a new
`ILIKE` operator to perform a case-insensitive match. Use GLib's `g_utf8_casefold`
to make the comparison UTF-8 aware. A "poor man's" version is implemented
in fakeglib, which relies on `towlower`, and won't work with multi-grapheme
case transformations (e.g. German Eszett).
Fixes #233
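As a rough illustration of the towlower limitation mentioned above, here is a minimal sketch of a per-character case-insensitive comparison. It is not the fakeglib code; the helper name sketch_wcs_equal_nocase is made up for this example. Because it folds one wide character at a time, it can never match "STRASSE" against "straße", where a single character would have to expand into two.

```c
#include <assert.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper (not part of mdbtools or fakeglib): returns 1 if
 * a and b are equal ignoring case, folding one wide character at a time
 * with towlower(). One-to-many foldings such as Eszett -> "ss" are not
 * handled, which is the limitation described in the commit message. */
int sketch_wcs_equal_nocase(const wchar_t *a, const wchar_t *b)
{
    assert(a != NULL && b != NULL);
    while (*a && *b) {
        if (towlower((wint_t)*a) != towlower((wint_t)*b))
            return 0;
        a++;
        b++;
    }
    /* Equal only if both strings ended at the same position. */
    return *a == *b;
}
```

A full case fold, such as the one g_utf8_casefold performs, works on whole strings and can change their length, which is why the GLib path is preferred when it is available.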
Per #260 I am assuming that the internal version 6 refers to files
created with Access 2019. I can't find any documentation on this format,
so I am calling it ACE17. Testing welcome.
* Differentiate character lengths from byte lengths (see #112)
* Use GNU-style indexed initializers for clarity
* Remove needs_quotes since it's not used anywhere
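For readers unfamiliar with indexed initializers, a small sketch of the style follows. The enum and table are invented for illustration and do not mirror the real mdbtools type tables. Each entry is tied to its enum value by name, so reordering the enum cannot silently shift the table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical column types, for illustration only. */
enum sketch_type {
    SKETCH_BOOL,
    SKETCH_INT,
    SKETCH_TEXT,
    SKETCH_TYPE_COUNT
};

/* Indexed (designated) initializers: the index is spelled out per entry. */
static const char *const sketch_type_names[SKETCH_TYPE_COUNT] = {
    [SKETCH_TEXT] = "Text",
    [SKETCH_BOOL] = "Boolean",
    [SKETCH_INT]  = "Integer",
};

/* Look up a type name, returning "Unknown" for out-of-range values. */
const char *sketch_type_name(int type)
{
    if (type < 0 || type >= SKETCH_TYPE_COUNT)
        return "Unknown";
    assert(sketch_type_names[type] != NULL);
    return sketch_type_names[type];
}
```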
GLib will automatically convert command line options to UTF-8 provided that setlocale(LC_CTYPE, "") is called first, and the argument type is STRING (but not FILENAME). Update the CLI tools to take advantage of this behavior, and likewise implement it in fakeglib.
GLib does not automatically convert non-option arguments (i.e. everything remaining in argv after option processing), so manually call g_locale_to_utf8 on these arguments when they represent table names. This should fix the CLI tools when processing non-ASCII table names in non-UTF-8 locales. Also update fakeglib to implement a fast and loose version of g_locale_to_utf8, and factor out some of the code page => iconv name logic in iconv.c so it can be used in our fake g_locale_to_utf8. This adds a new symbol mdb_iconv_name_from_code_page that is not advertised in the main header file. I did not want to include mdbtools.h from fakeglib.c, but maybe that's not important.
Other programs (e.g. gmdb2) use mdb_print_col, so restore the old enum
names and values. MDB_EXPORT_ESCAPE_INVISIBLE can be OR'ed into the
last argument to enable C-style escaping of text fields.
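To show what OR'ing an escape flag into a flags argument can look like, here is a self-contained sketch. The flag value, the function name, and the exact set of escaped characters are assumptions made for this example; they are not taken from mdb_print_col.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits, for illustration only. */
enum {
    SKETCH_BIN_RAW          = 0,
    SKETCH_ESCAPE_INVISIBLE = 1 << 4
};

/* Copy `in` to `out`. When SKETCH_ESCAPE_INVISIBLE is set in `flags`,
 * tab, CR, LF and backslash are written as C-style escapes.
 * Returns the number of bytes written (excluding the terminating NUL),
 * or (size_t)-1 if `out` is too small. */
size_t sketch_print_text(const char *in, char *out, size_t out_size, int flags)
{
    size_t n = 0;
    assert(in != NULL && out != NULL);
    if (out_size == 0)
        return (size_t)-1;
    for (; *in; in++) {
        char esc = 0;
        if (flags & SKETCH_ESCAPE_INVISIBLE) {
            switch (*in) {
            case '\n': esc = 'n';  break;
            case '\r': esc = 'r';  break;
            case '\t': esc = 't';  break;
            case '\\': esc = '\\'; break;
            default: break;
            }
        }
        /* Leave room for this character (1 or 2 bytes) plus the NUL. */
        if (n + (esc ? 2 : 1) >= out_size)
            return (size_t)-1;
        if (esc) {
            out[n++] = '\\';
            out[n++] = esc;
        } else {
            out[n++] = *in;
        }
    }
    out[n] = '\0';
    return n;
}
```

A caller would combine the escape bit with whatever binary export mode it already uses, e.g. sketch_print_text(s, buf, sizeof buf, SKETCH_BIN_RAW | SKETCH_ESCAPE_INVISIBLE).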
According to the HACKING file, the file's default language ID is stored
in the database header. Use this value instead of a generic English
language locale for indexing JET4 files.
Columns can have their own text sorting rules, including language ID
distinct from the file's language ID, but this is not addressed as we'd
have to break the mdb_index_hash_text function signature, which I'm not
prepared to do just yet.
There appear to be two bytes after the language ID that may indicate
additional sorting flags. These bytes need additional research.
Using the notes and RC4 key provided in the HACKING file, decrypt the
database definition page all at once instead of decrypting individual
fields with ad-hoc keys. Use the newly decrypted header to access the
database code page at offset 0x3C, and use this numeric value to
initialize the iconv converter with an appropriate charset name for
popular Windows code pages. More encodings can be added later, with
the eventual goal of getting rid of the MDB_JET3_CHARSET environment
variable.
Note that individual columns can have their own code pages but this
issue is not addressed.
An extra field is added to the MdbFile structure - because this
struct is allocated internally, this should not break the public
ABI.
Finally, only set the db_passwd field if it's a JET3 database (see #144)
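The code page lookup described above could be sketched roughly as follows. The function name is hypothetical and the table is deliberately short; the "CPnnnn" charset names are assumed to be accepted by the iconv implementation in use (glibc and GNU libiconv both recognize them). Returning NULL lets the caller fall back to a default charset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: map a Windows code page number, as read from the
 * database header, to a charset name suitable for iconv_open().
 * Returns NULL when the code page is not in the table. */
const char *sketch_iconv_name_from_code_page(int code_page)
{
    assert(code_page >= 0);
    switch (code_page) {
    case 874:  return "CP874";   /* Thai */
    case 932:  return "CP932";   /* Japanese */
    case 936:  return "CP936";   /* Simplified Chinese */
    case 949:  return "CP949";   /* Korean */
    case 950:  return "CP950";   /* Traditional Chinese */
    case 1250: return "CP1250";  /* Central European */
    case 1251: return "CP1251";  /* Cyrillic */
    case 1252: return "CP1252";  /* Western European */
    case 1253: return "CP1253";  /* Greek */
    case 1254: return "CP1254";  /* Turkish */
    case 1255: return "CP1255";  /* Hebrew */
    case 1256: return "CP1256";  /* Arabic */
    case 1257: return "CP1257";  /* Baltic */
    case 1258: return "CP1258";  /* Vietnamese */
    default:   return NULL;      /* unknown: caller picks a fallback */
    }
}
```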
Replace the jerry-built UTF-16 => Latin-1 code path with a cross-platform wcstombs solution that emits UTF-8.
This adds an element to the end of the MdbHandle struct, but should not break any existing code.
A run-time option could be added later to emit other encodings, but people who care about such things can just use the iconv code path.
Merge in pull request #108 with a few changes:
* Use the newer mdb_print_col function
* Redefine the last argument of mdb_print_col to be a flags argument
* Rename and redefine the BINEXPORT enums. While technically public,
these were never intended as a public API.
* Name the command line option --escape-c
mdb_init() and mdb_exit() have done nothing for a while.
mdb_get_coltype_string() and mdb_coltype_takes_length() were previously
removed, but remained in the header file by accident.
Update the SQL parser to support "SELECT TOP n [PERCENT] ..." queries,
matching the Microsoft Access SQL language.
Export these queries from databases with mdb-queries.
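One way to turn a TOP clause into a row limit is sketched below. This is an illustration, not the parser's actual logic: the function name is invented, and rounding the PERCENT case up to the next whole row is an assumption about the intended semantics.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: number of rows to return for "TOP n" or
 * "TOP n PERCENT" given the total row count.
 * Assumption: a fractional PERCENT result is rounded up. */
unsigned long sketch_top_limit(unsigned long total_rows, unsigned long n,
                               int is_percent)
{
    if (!is_percent)
        return n < total_rows ? n : total_rows;
    if (n >= 100)
        return total_rows;
    /* Guard against overflow in the multiplication below. */
    assert(total_rows <= ULONG_MAX / 100);
    return (total_rows * n + 99) / 100;
}
```

For example, TOP 25 PERCENT over 10 rows yields 3 under this rounding rule, while TOP 3 over 2 rows yields 2.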
* Separate -D (date only) and -T (date/time) format options in mdb-export and mdb-json
* New public mdb_set_shortdate_fmt() function in libmdb
* New private(ish) mdb_col_is_shortdate() function
I'm calling it "shortdate" in order to preserve the existing API.
See https://github.com/mdbtools/mdbtools/issues/12
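The split between a date-only format and a date/time format can be pictured with plain strftime. The function below is a sketch with a made-up name and signature; it only shows the idea of choosing one of two format strings depending on whether a column is a "shortdate".

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: format `t` with the date-only format when the
 * column is a short date, otherwise with the date/time format.
 * Returns the number of bytes written, or 0 if `out` is too small. */
size_t sketch_format_date(char *out, size_t out_size, const struct tm *t,
                          int is_shortdate,
                          const char *date_fmt, const char *datetime_fmt)
{
    assert(out != NULL && t != NULL);
    assert(date_fmt != NULL && datetime_fmt != NULL);
    return strftime(out, out_size, is_shortdate ? date_fmt : datetime_fmt, t);
}
```

With "%Y-%m-%d" as the date format and "%Y-%m-%d %H:%M:%S" as the date/time format, a short date column prints only the calendar date while other date columns keep the time of day.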
This should fix long-standing complaints about the default bind size
without causing undue memory inflation in existing applications.
Could make this adjustable on the command line later.
Supersedes:
https://github.com/mdbtools/mdbtools/pull/137
Quickstart (requires Clang 6 or later):
$ export LIB_FUZZING_ENGINE=/path/to/fuzzing/library.a
$ ./configure --enable-fuzz-testing
$ make
$ cd src/fuzz
$ make fuzz_mdb
$ ./fuzz_mdb
Also add a new `mdb_open_buffer` function to facilitate in-memory
fuzz-testing. This requires fmemopen, which may not be present on all
systems. The internal API has been reworked to use file streams instead
of file descriptors. This allows reading from memory and reading from
files using a consistent API.
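The benefit of moving from file descriptors to file streams can be sketched as follows: once a memory buffer is wrapped with fmemopen, the same page-reading routine works for both buffers and regular files. The function names and the page-based layout here are illustrative assumptions, not the library's internal API.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for fmemopen() */
#endif
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: wrap a read-only memory buffer in a FILE stream.
 * Returns NULL if fmemopen() fails or is unsupported. */
FILE *sketch_open_buffer(const void *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    /* Mode "r" never writes through the pointer, so the cast is safe. */
    return fmemopen((void *)buf, len, "r");
}

/* Hypothetical helper: read page `page_no` of `page_size` bytes from any
 * stream, whether it came from fopen() or fmemopen().
 * Returns the number of bytes actually read. */
size_t sketch_read_page(FILE *stream, long page_no, size_t page_size,
                        unsigned char *out)
{
    assert(stream != NULL && out != NULL);
    if (fseek(stream, page_no * (long)page_size, SEEK_SET) != 0)
        return 0;
    return fread(out, 1, page_size, stream);
}
```

A fuzzing harness can hand its input bytes to something like sketch_open_buffer and exercise the same read path that on-disk files use, without touching the filesystem.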