According to the HACKING file, the file's default language ID is stored
in the database header. Use this value instead of a generic English
language locale for indexing JET4 files.

Columns can have their own text sorting rules, including a language ID
distinct from the file's, but this is not addressed here: supporting it
would mean breaking the mdb_index_hash_text function signature, which
I'm not prepared to do just yet.
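
A minimal sketch of the lookup, assuming the header has already been
decrypted and stores integers little-endian. The lang_id_offset
parameter, the header_lang_id helper, and treating a stored ID of zero
as "unset" are all hypothetical; the real offset is whatever HACKING
documents. LCID 0x0409 (English, United States) stands in for the
generic English locale.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* LCID 0x0409 = English (United States), used as the generic fallback. */
#define GENERIC_ENGLISH_LANG_ID 0x0409

/* Little-endian 16-bit read (assumes the header stores integers LE). */
static uint16_t read_le16(const unsigned char *buf, size_t off)
{
    return (uint16_t)(buf[off] | (buf[off + 1] << 8));
}

/* Hypothetical helper: lang_id_offset must come from HACKING. Falls back
 * to generic English if the offset is out of range or the ID is zero. */
static uint16_t header_lang_id(const unsigned char *hdr, size_t hdr_len,
                               size_t lang_id_offset)
{
    assert(hdr != NULL);
    if (hdr_len < 2 || lang_id_offset > hdr_len - 2)
        return GENERIC_ENGLISH_LANG_ID;
    uint16_t id = read_le16(hdr, lang_id_offset);
    return id != 0 ? id : GENERIC_ENGLISH_LANG_ID;
}
```

The index hashing code would then take this value instead of a
hard-coded English locale.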

The two bytes following the language ID may indicate additional
sorting flags; their meaning needs further research.

Using the notes and RC4 key provided in the HACKING file, decrypt the
database definition page all at once instead of decrypting individual
fields with ad-hoc keys. Use the newly decrypted header to read the
database code page at offset 0x3C, and use this numeric value to
initialize the iconv converter with an appropriate charset name for
popular Windows code pages. More encodings can be added later, with
the eventual goal of getting rid of the MDB_JET3_CHARSET environment
variable.
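
A sketch of the single-pass approach, assuming standard RC4: one
keystream is XORed over the whole encrypted region of the page, after
which the code page is read as a little-endian 16-bit value at 0x3C
(assumed to be relative to the start of the page). The region bounds
and the key are passed in rather than hard-coded; take them from
HACKING. rc4_region, db_code_page and charset_for_code_page are
hypothetical names, and the "CP125x" strings assume an iconv (such as
glibc's) that recognizes those charset names.

```c
#include <assert.h>
#include <stddef.h>

#define DB_CODE_PAGE_OFFSET 0x3C

/* Decrypt (or encrypt; RC4 is symmetric) page[start .. start+len) in place. */
static void rc4_region(unsigned char *page, size_t start, size_t len,
                       const unsigned char *key, size_t key_len)
{
    unsigned char S[256];
    unsigned int i, j = 0;
    assert(key != NULL && key_len > 0);
    /* Key-scheduling algorithm (KSA). */
    for (i = 0; i < 256; i++)
        S[i] = (unsigned char)i;
    for (i = 0; i < 256; i++) {
        j = (j + S[i] + key[i % key_len]) & 0xFF;
        unsigned char t = S[i]; S[i] = S[j]; S[j] = t;
    }
    /* Keystream generation (PRGA), applied to the whole region at once. */
    i = j = 0;
    for (size_t n = 0; n < len; n++) {
        i = (i + 1) & 0xFF;
        j = (j + S[i]) & 0xFF;
        unsigned char t = S[i]; S[i] = S[j]; S[j] = t;
        page[start + n] ^= S[(S[i] + S[j]) & 0xFF];
    }
}

/* Little-endian 16-bit code page at 0x3C of the decrypted page, or -1. */
static int db_code_page(const unsigned char *page, size_t page_len)
{
    if (page_len < DB_CODE_PAGE_OFFSET + 2)
        return -1;
    return page[DB_CODE_PAGE_OFFSET] | (page[DB_CODE_PAGE_OFFSET + 1] << 8);
}

/* Map popular Windows code pages to iconv charset names; NULL if unknown. */
static const char *charset_for_code_page(int code_page)
{
    static const struct { int cp; const char *name; } table[] = {
        {1250, "CP1250"}, {1251, "CP1251"}, {1252, "CP1252"},
        {1253, "CP1253"}, {1254, "CP1254"}, {1255, "CP1255"},
        {1256, "CP1256"}, {1257, "CP1257"}, {1258, "CP1258"},
    };
    for (size_t k = 0; k < sizeof table / sizeof table[0]; k++)
        if (table[k].cp == code_page)
            return table[k].name;
    return NULL;
}
```

A non-NULL name would then be handed to iconv_open(3); a NULL result is
where a caller would fall back to MDB_JET3_CHARSET until more encodings
are added.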

Note that individual columns can have their own code pages; this is
not addressed here.

An extra field is added to the MdbFile structure. Because this struct
is allocated internally, this should not break the public ABI.
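
To illustrate why appending a field is safe, here is a sketch with two
hypothetical layouts (file_v1 and file_v2 stand in for the struct
before and after the change; they are not the real MdbFile). Existing
members keep their offsets, and because the library does the
allocation, callers never depend on the struct's size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the struct before and after the change. */
typedef struct {
    int jet_version;
    char *filename;
} file_v1;

typedef struct {
    int jet_version;
    char *filename;
    int code_page;   /* new field, appended at the end */
} file_v2;

/* The library allocates the struct itself, so callers never embed it
 * by value or depend on sizeof(file_v2); they only hold a pointer. */
static file_v2 *file_alloc(void)
{
    return calloc(1, sizeof(file_v2));
}

/* On the usual C ABIs, appending a member leaves the offsets of the
 * existing members unchanged. */
static void check_prefix_layout(void)
{
    assert(offsetof(file_v1, jet_version) == offsetof(file_v2, jet_version));
    assert(offsetof(file_v1, filename) == offsetof(file_v2, filename));
}
```

The same reasoning would not hold if callers allocated the struct
themselves or embedded it in their own structs.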

Finally, only set the db_passwd field if it's a JET3 database (see #144).

Replace the jerry-built UTF-16 => Latin-1 code path with a
cross-platform wcstombs solution that emits UTF-8.

This adds a field to the end of the MdbHandle struct, but should not
break any existing code.

A run-time option could be added later to emit other encodings, but
people who care about such things can just use the iconv code path.
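
A minimal sketch of a wcstombs-based path, assuming UTF-16LE input and
an output length that includes the terminator. utf16le_to_mbs is a
hypothetical name. Note that wcstombs produces the multibyte encoding
of the current LC_CTYPE locale, so the output is UTF-8 only when a
UTF-8 locale is in effect (for example after
setlocale(LC_CTYPE, "C.UTF-8"), where that locale exists).

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: UTF-16LE bytes -> multibyte string via wcstombs.
 * out_len includes the null terminator. Returns bytes written (excluding
 * the terminator), or (size_t)-1 on failure with out set to "". */
static size_t utf16le_to_mbs(const unsigned char *in, size_t in_len,
                             char *out, size_t out_len)
{
    assert(out != NULL && out_len > 0);
    size_t units = in_len / 2;
    wchar_t *wide = malloc((units + 1) * sizeof *wide);
    if (wide == NULL) {
        out[0] = '\0';
        return (size_t)-1;
    }
    size_t w = 0;
    for (size_t i = 0; i < units; i++) {
        unsigned long u = in[2 * i] | ((unsigned long)in[2 * i + 1] << 8);
        /* With a 32-bit wchar_t, join surrogate pairs into one code point;
         * with a 16-bit wchar_t (Windows) the pair is passed through. */
        if (sizeof(wchar_t) >= 4 && u >= 0xD800 && u <= 0xDBFF && i + 1 < units) {
            unsigned long lo = in[2 * i + 2] | ((unsigned long)in[2 * i + 3] << 8);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                u = 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00);
                i++;
            }
        }
        wide[w++] = (wchar_t)u;
    }
    wide[w] = L'\0';
    /* Leave one byte for the terminator: at most out_len - 1 bytes written. */
    size_t n = wcstombs(out, wide, out_len - 1);
    free(wide);
    if (n == (size_t)-1) {
        out[0] = '\0';
        return n;
    }
    out[n] = '\0';
    return n;
}
```

Characters the current locale cannot represent make wcstombs fail,
which this sketch reports as (size_t)-1 with an empty string.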

There was some confusion as to whether the destination buffer length
should include space for the null terminator. Some callers of the
function assumed that a terminator would be added beyond the end of
the stated buffer size, while others did not. Make everything
consistent, and also fix an overrun when there was insufficient space
for the output in the non-iconv implementation.
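
A sketch of a bounds-safe fallback under one consistent convention,
assuming the stated length includes the terminator. The function name
and the choice to replace code units above 0xFF with '?' are
hypothetical; the point is that the loop stops at dest_len - 1, so
the terminator always lands inside the buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical non-iconv fallback: UTF-16LE -> Latin-1.
 * dest_len INCLUDES the null terminator, so at most dest_len - 1
 * characters are written. Code units above 0xFF become '?' (a surrogate
 * pair therefore yields "??"). Returns characters written, excluding
 * the terminator. */
static size_t utf16le_to_latin1(const unsigned char *src, size_t src_len,
                                char *dest, size_t dest_len)
{
    size_t n = 0;
    assert(dest != NULL || dest_len == 0);
    if (dest_len == 0)
        return 0;                 /* no room even for the terminator */
    for (size_t i = 0; i + 1 < src_len && n < dest_len - 1; i += 2) {
        unsigned int u = src[i] | (src[i + 1] << 8);
        dest[n++] = (char)(u <= 0xFF ? u : '?');
    }
    dest[n] = '\0';               /* always within dest[0 .. dest_len - 1] */
    return n;
}
```

With a 4-byte buffer, "Hello" is truncated to "Hel" plus the
terminator instead of running past the end.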

As stated in a code comment, a better solution would follow the lead
of libxls and use wcstombs and friends when iconv is not available,
but that gets into the weeds, since the conversion functions are named
differently across platforms. The goal here is to fix the buffer
overrun.
See oss-fuzz/28773

This is used to build RPMs, but it is out of date and has no test
coverage. If someone would like to restore it, please add some kind of
test coverage so that it does not fall out of date in the future.
See #201