if skip missing fonts is set we want to read as much of the
file as possible, so we also skip any missing xobjects such
as images, forms or postscript code
the default word extractor consumed words sorted in
descending y order and checked whether the following letter
was more than 0.5 units away from the current baseline (taken
from the first letter). however we were checking whether the
new value was more than the current baseline, which it could
never be since the letter was always guaranteed to have an
equal or lower y value based on the initial sort (descending
y order reads the page from top to bottom)
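
A minimal sketch of the comparison problem described above, in Python rather than the project's own language. The function names and the use of an absolute difference are assumptions made for illustration; the commit message does not state the exact form of the corrected check.

```python
# Hypothetical illustration only; these names do not come from the library.

def broken_is_new_line(baseline_y, letter_y, tolerance=0.5):
    # Original style of check: only fires when the letter is ABOVE the
    # baseline, which cannot happen after a descending-y sort.
    return letter_y - baseline_y > tolerance


def fixed_is_new_line(baseline_y, letter_y, tolerance=0.5):
    # Assumed correction: compare the distance from the baseline,
    # regardless of direction.
    return abs(letter_y - baseline_y) > tolerance


def split_lines(ys, is_new_line):
    """Group letter y values into lines, visiting them in descending y order."""
    lines = []
    baseline = None
    for y in sorted(ys, reverse=True):
        if baseline is None or is_new_line(baseline, y):
            baseline = y  # the first letter of a line sets its baseline
            lines.append([])
        lines[-1].append(y)
    return lines


letters = [700.0, 700.2, 688.0, 687.9]
print(len(split_lines(letters, broken_is_new_line)))
print(len(split_lines(letters, fixed_is_new_line)))
```

With the one-directional check every letter ends up on the same line, while the distance-based check separates the two baselines.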
if we're parsing a known dictionary (e.g. all keys are required
and there are no additional optional keys) and parsing by
looking for the dictionary end token fails with a format
exception, we provide the possibility to recover by assuming
a dictionary end token after all required tokens are consumed
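
The recovery idea can be sketched roughly as follows, in Python rather than the project's own language. The token representation (a flat list of keys and values with ">>" as the end token), the FormatError type and both function names are assumptions for illustration, not the library's API.

```python
# Hypothetical illustration only; token model and names are assumed.

class FormatError(Exception):
    """Stands in for the format exception raised by the strict parser."""


def parse_until_end(tokens):
    """Strict parse: read key/value pairs until the '>>' end token."""
    result = {}
    i = 0
    while i < len(tokens):
        if tokens[i] == ">>":
            return result, i + 1
        if i + 1 >= len(tokens):
            raise FormatError("key without a value")
        result[tokens[i]] = tokens[i + 1]
        i += 2
    raise FormatError("dictionary end token not found")


def parse_known_dictionary(tokens, required_keys):
    """Try the strict parse first; on failure assume the dictionary ends
    once every required key has been consumed."""
    try:
        return parse_until_end(tokens)
    except FormatError:
        required = set(required_keys)
        result = {}
        i = 0
        while i + 1 < len(tokens) and set(result) != required:
            if tokens[i] not in required:
                break  # unexpected token, stop consuming
            result[tokens[i]] = tokens[i + 1]
            i += 2
        if set(result) != required:
            raise  # recovery is not possible, rethrow the original error
        return result, i


# The end token is missing, but both required keys are present.
tokens = ["/Size", 3, "/Root", "1 0 R", "startxref"]
print(parse_known_dictionary(tokens, ["/Size", "/Root"]))
```

Recovery is only attempted for dictionaries whose full key set is known in advance, since otherwise there is no way to tell where the dictionary was meant to end.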
these issues reported that parsing was failing due to a missing
token being referenced in the tounicode entry. since neither
issue included a sample file it's impossible to determine the
right fix accurately. however, since the tounicode entry is
optional in the spec we can try being more lenient here. this
might just result in more errors once we try to use the font,
but logging the problem will at least prevent parsing of the
entire document from failing
* For a Type3 font with a zero width/height bounding box, set it to a sensible
default using the font matrix. This ensures the letter bounding boxes will
not have height 0.
* Also added a test to check for non-zero height in the sample Type3 PDF
* Prevent division by zero error
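
One way such a default could be derived is sketched below in Python. The choice of default (a box that maps to one text-space unit under the font matrix), the 1000-unit fallback and the function name are all assumptions for illustration; the commit does not state the exact rule used.

```python
# Hypothetical illustration only; the default chosen here is an assumption.

def default_type3_bbox(bbox, font_matrix):
    """Return bbox unchanged unless it has zero width or height, in which
    case build a replacement from the font matrix scale factors."""
    left, bottom, right, top = bbox
    if right - left != 0 and top - bottom != 0:
        return bbox

    # font_matrix is [a b c d e f]; a and d are the x and y scale factors.
    a, d = font_matrix[0], font_matrix[3]

    # Guard against division by zero when the matrix is degenerate.
    width = 1.0 / a if a != 0 else 1000.0
    height = 1.0 / d if d != 0 else 1000.0
    return (0.0, 0.0, width, height)


print(default_type3_bbox((0, 0, 0, 0), (0.001, 0, 0, 0.001, 0, 0)))
```

A bounding box that already has a non-zero width and height is returned untouched, so only degenerate fonts are affected.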
---------
Co-authored-by: mvantzet <mark@radialsg.com>
* Add Named Destinations to Catalog so that bookmarks and links can access
them.
The named destinations require access to page nodes, so created a Pages object
that is built by PagesFactory (which contains the page-related code from
Catalog).
* Further implementation of destinations:
- Implement NamedDestinations in AnnotationProvider, so that we can look
up named destinations for annotations and turn them into explicit destinations.
Reused existing code inside BookmarksProvider to get destinations/actions.
- Added GoToE action
- According to the PDF reference, destinations are also required for
external destinations and hence for ExternalBookmarkNode. This allows us
to push up DocumentBookmarkNode.Destination to BookmarkNode.
* Implemented stateful appearance streams and integration test
* Added AppearanceStream to public API because it is used in the (public)
Annotation constructor
* After #552, must push down ExplicitDestination to DocumentBookmarkNode since it
does not apply to UriBookmarkNode.
* Added actions, which fit the PDF model better and work well with the
new bookmarks code (after PR #552)
* Rename Action to PdfAction + removed unused using in ActionProvider.cs
---------
Co-authored-by: mvantzet <mark@radialsg.com>