[PdfPig](https://github.com/UglyToad/PdfPig) is a fully open-source Apache 2.0 licensed and .NET Standard compatible library that enables users to read and create PDFs in C#, F# and other .NET languages. It supports all versions of .NET back to .NET 4.5.
+ Extracts the position and size of letters from any PDF document. This enables access to the text and words in a PDF document.
+ Allows the user to retrieve images from the PDF document.
+ Allows the user to read PDF annotations, PDF forms, embedded documents and hyperlinks from a PDF.
+ Provides access to metadata in the document.
+ Exposes the internal structure of the PDF document.
+ Creates PDF documents containing text and path operations.
+ Read content from encrypted files by providing the password.
This provides an alternative to the commercial libraries such as [SpirePDF](https://www.e-iceblue.com/Introduce/pdf-for-net-introduce.html) or copyleft alternatives such as [iText 7](https://github.com/itext/itext7-dotnet) (AGPL) for some use-cases.
It should be noted the library does not support use-cases such as converting HTML to PDF or from other document formats to PDF. For HTML to PDF a good quality solution is [wkhtmltopdf](https://wkhtmltopdf.org/). It also does not currently support generating images from PDF pages. If you need this functionality see if [docnet](https://github.com/GowenGit/docnet) meets your requirements.
The Portable Document Format (PDF) is a document format which is focused on presentation. This means as far as possible PDFs will appear the same on most devices. For this reason PDFs tend to lose semantic meaning for their content including ordering of text, separation of text sections, etc.
This also shows accessing document metadata using the `document.Information` property. All metadata is optional according to the specification so all entries can be `null`.
Letters can be used by consumers to build text and content extraction capabilities into their software. For example table detection. There are many properties for letters in PDFs. See the [wiki page](https://github.com/UglyToad/PdfPig/wiki/Letters) for full details of the `Letter` API.

PdfPig can be used to make a PDF document in C# and other .NET languages. At the moment the API supports drawing letters and paths. The code snippet shows creating a new PDF document with 1 A4 page and writing some text on that page in Helvetica before saving the file to `C:\temp\file.pdf`:
The output is a file with the text "My first PDF document!" and then "Hello World!". Since PDF coordinates run from the bottom of the page upwards the Y coordinate of the top of the page is higher than 0 and the bottom of the page has a Y value of 0. The output file is shown below in Chrome's PDF viewer:

The `PdfDocument` provides access to XMP format metadata, AcroForms, Embedded files used by file annotations, bookmarks indicating the internal structure of the document and much more. Some examples are shown in the code sample:
using System;
using System.Collections.Generic;
using System.Xml.Linq;
using UglyToad.PdfPig;
using UglyToad.PdfPig.AcroForms;
using UglyToad.PdfPig.AcroForms.Fields;
using UglyToad.PdfPig.Content;
using UglyToad.PdfPig.Outline;
public static class Program
{
public static void Main()
{
using (PdfDocument document = PdfDocument.Open(@"C:\temp\file.pdf"))
{
Console.WriteLine($"Document has {document.NumberOfPages} pages.");
if (document.TryGetForm(out AcroForm form))
{
foreach (AcroFieldBase field in form.GetFieldsForPage(1))
{
switch (field)
{
case AcroCheckboxField cb:
if (cb.IsChecked)
{
Console.WriteLine($"Checkbox was checked: {cb.Information.MappingName}.");
}
break;
}
}
}
if (document.TryGetXmpMetadata(out XmpMetadata metadata))
{
XDocument xmp = metadata.GetXDocument();
}
if (document.TryGetBookmarks(out Bookmarks bookmarks))
{
Console.WriteLine($"Document contained bookmarks with {bookmarks.Roots.Count} root nodes.");
}
Console.WriteLine($"Document uses version {document.Version} of the PDF specification.");
if (document.Advanced.TryGetEmbeddedFiles(out IReadOnlyList<EmbeddedFile> embeddedFiles))
PdfPig also comes with some tools for document layout analysis such as the Recursive XY Cut, Document Spectrum and Nearest Neighbour algorithms, along with others. It also provides support for exporting page contents to Alto, PageXML and hOcr format.
An example of the output of the Recursive XY Cut algorithm viewed in an external viewer such as [LayoutEvalGUI](https://www.primaresearch.org/tools/PerformanceEvaluation) is shown below:
