Commit Graph

969 Commits

Author SHA1 Message Date
Eliot Jones
98af575ee3 version 0.1.2 2020-07-04 16:55:14 +01:00
Eliot Jones
5caf5f2686 Merge pull request #187 from BobLd/dla-example-1 (Add AdvancedTextExtraction example) 2020-07-01 17:35:49 +01:00
Eliot Jones
7d166131ad Merge pull request #186 from BobLd/dupOverlap (Implement DuplicateOverlappingTextProcessor) 2020-07-01 16:09:48 +01:00
BobLd
a60be8d60a add AdvancedTextExtraction 2020-06-26 12:28:12 +01:00
BobLd
4cda06c2fa make Letter public and flag letter as bold 2020-06-26 10:29:44 +01:00
BobLd
fe8bc0e5be Implement DuplicateOverlappingTextProcessor 2020-06-25 15:00:20 +01:00
Eliot Jones
0f65397f48 Merge pull request #184 from BobLd/docstrum-v2.3 (Fix DocstrumBoundingBoxes when dXj=0) 2020-06-20 15:27:05 +01:00
BobLd
4b88f4adbe correct typo in section numers 2020-06-20 14:27:13 +01:00
BobLd
4f78e58195 remove unnecessary 'inheritdoc' 2020-06-20 14:24:29 +01:00
BobLd
924c0138e0 fix Docstrum's GetTranslatedPoint() to handle dXj=0 2020-06-20 14:19:23 +01:00
BobLd
091c17bdf8 Merge branch 'master' of https://github.com/UglyToad/PdfPig 2020-06-20 13:30:30 +01:00
Eliot Jones
5fb04582a7 0.1.2-alpha003 2020-06-20 12:54:31 +01:00
Eliot Jones
982c331935 re-use truetype parser for opentype cid fonts 2020-06-20 12:46:41 +01:00
BobLd
4758820db5 Merge branch 'master' of https://github.com/UglyToad/PdfPig 2020-06-20 12:29:39 +01:00
Eliot Jones
79dea8d314 Merge pull request #179 from BobLd/master (Improve ContentStreamProcessor) 2020-06-20 12:13:40 +01:00
BobLd
8ef70d9a9d Merge branch 'master' of https://github.com/BobLd/PdfPig 2020-06-18 18:31:29 +01:00
BobLd
7a393383de Merge branch 'master' of https://github.com/UglyToad/PdfPig 2020-06-17 22:56:41 +01:00
BobLd
33f92cd11c handle page rotation by updating initial TransformationMatrix 2020-06-02 16:12:24 +01:00
BobLd
6e773446df simplify double cast 2020-06-01 14:55:45 +01:00
BobLd
2d9a4e5adb fix CurrentTransformationMatrix multiplication order in ProcessFormXObject 2020-06-01 14:00:17 +01:00
BobLd
958beada48 Merge branch 'master' of https://github.com/UglyToad/PdfPig 2020-06-01 13:54:01 +01:00
Eliot Jones
bf45602ac5 fix #176, allow startxref to appear earlier in the document 2020-05-31 17:01:38 +01:00
BobLd
4312aa470e minor optimisations 2020-05-30 13:03:59 +01:00
BobLd
755e199fed Add new dla images 2020-05-30 13:03:59 +01:00
BobLd
b3665f10c9 Delete recursive xy cut example.png 2020-05-30 13:03:59 +01:00
BobLd
0ec6e2389f Delete nearest neighbour word example.PNG 2020-05-30 13:03:59 +01:00
BobLd
1d6b819579 Delete docstrum bounding boxes example.png 2020-05-30 13:03:59 +01:00
BobLd
3ac26bb1bc fix bbox for TextLine and TextBlock 2020-05-30 13:03:59 +01:00
BobLd
14454184ad update RecursiveXYCutTests 2020-05-30 13:03:59 +01:00
BobLd
6d31ef80a7 add RecursiveXYCutTests 2020-05-30 13:03:59 +01:00
BobLd
aa0e75d768 update DocstrumBoundingBoxesTests 2020-05-30 13:03:59 +01:00
BobLd
208e1dd8f2 add DocstrumBoundingBoxesTests 2020-05-30 13:03:59 +01:00
BobLd
75e9046c16 add DlaHelper for tests and correct minor typos 2020-05-30 13:03:59 +01:00
BobLd
05d96cd9c4 add documents for tests 2020-05-30 13:03:59 +01:00
BobLd
465cf3f072 update word rotated bbox with previous PdfRectangle constructor order 2020-05-30 13:03:59 +01:00
BobLd
dacf816a86 add summary doc to Clipper 2020-05-30 13:03:59 +01:00
BobLd
f883b56e72 completely rework DocstrumBoundingBoxes, now handle rotated text 2020-05-30 13:03:59 +01:00
BobLd
a16f377d5a update DefaultPageSegmenter to use DlaOptions 2020-05-30 13:03:59 +01:00
BobLd
1438fec741 update RecursiveXYCut to use DlaOptions 2020-05-30 13:03:59 +01:00
BobLd
5362a335f5 update XYLeaf with word separator 2020-05-30 13:03:59 +01:00
BobLd
79b78f486a add ReadingOrderHelper 2020-05-30 13:03:59 +01:00
BobLd
ec613d337f correct Word bounding box 2020-05-30 13:03:59 +01:00
BobLd
8f1ab2022f update NearestNeighbourWordExtractor to use DlaOptions, stop ordering words 2020-05-30 13:03:59 +01:00
BobLd
43a68693ba allow oriented bounding box for TextBlock 2020-05-30 13:03:59 +01:00
BobLd
5b0b0a6db3 allow oriented bounding box for TextLine 2020-05-30 13:03:59 +01:00
BobLd
bb94348127 add text Separator in TextBlock and TextLine 2020-05-30 13:03:59 +01:00
BobLd
5f75205e41 rename TextDirection into TextOrientation 2020-05-30 13:03:59 +01:00
BobLd
33ee66af42 add PageSegmenterOptions abstract class 2020-05-30 13:03:59 +01:00
BobLd
dd546dcfc8 update IPageSegmenter with DlaOptions 2020-05-30 13:03:59 +01:00
BobLd
3cf7c45994 add DlaOptions abstract class 2020-05-30 13:03:59 +01:00