Editing Existing PDF Files in Java

1. Overview

In this article, we will see how to edit the contents of an existing PDF file in Java. At first, we’ll just add new content. Then, we’ll look at removing or replacing some of the already existing content.

2. Adding iText7 Dependencies

We will be using the iText7 library to add content to the PDF file. Later, we will use the pdfSweep add-on to remove or replace the content.

Note that iText is licensed under the AGPL, which may limit the distribution of a commercial application; see the iText license model for details.

First, let’s add these dependencies to our pom.xml:

<dependency>
    <groupId>com.itextpdf</groupId>
    <artifactId>itext7-core</artifactId>
    <version>7.2.3</version>
    <type>pom</type>
</dependency>
<dependency>
    <groupId>com.itextpdf</groupId>
    <artifactId>cleanup</artifactId>
    <version>3.0.1</version>
</dependency>

3. File Handling

Let’s understand the steps to handle your PDF with iText7:

  • First, we open a PdfReader to read the contents of the source file. It throws an IOException if an error occurs at any time while reading the file.
  • Then, we open a PdfWriter for the destination file. If this file does not exist or cannot be created, a FileNotFoundException is thrown.
  • After that, we’ll open a PdfDocument that uses our PdfReader and PdfWriter.
  • Finally, closing the PdfDocument closes both the underlying PdfReader and PdfWriter.

Let’s write a main() method that runs our whole processing. For the sake of simplicity, we’ll simply re-throw any Exception that may occur:

public static void main(String[] args) throws IOException {
    PdfReader reader = new PdfReader("src/main/resources/baeldung.pdf");
    PdfWriter writer = new PdfWriter("src/main/resources/baeldung-modified.pdf");
    PdfDocument pdfDocument = new PdfDocument(reader, writer);
    addContentToDocument(pdfDocument);
    pdfDocument.close();
}
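Because both the reader and the writer fail with exceptions when a path is wrong, a small check before opening them can produce clearer error messages. The following is a minimal sketch that uses only java.nio.file. The PdfPathCheck class and its check() method are hypothetical helpers introduced here for illustration; they are not part of iText:

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of iText: validates paths before a reader and a writer are opened.
class PdfPathCheck {

    // Throws if the source is not a readable file, and creates the destination's parent directory if needed.
    static void check(Path source, Path destination) throws IOException {
        if (!Files.isRegularFile(source) || !Files.isReadable(source)) {
            throw new FileNotFoundException("Source PDF not found or unreadable: " + source);
        }
        Path parent = destination.toAbsolutePath().getParent();
        if (parent != null) {
            // Creates missing directories; does nothing if the directory already exists
            Files.createDirectories(parent);
        }
    }
}
```

We could call PdfPathCheck.check(Paths.get("src/main/resources/baeldung.pdf"), Paths.get("src/main/resources/baeldung-modified.pdf")) as the first statement of main(), before creating the reader.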

In the following section, we’ll go step by step through the addContentToDocument() method to fill our PDF with new content. The source document is a PDF file that contains only the text “Hello Baeldung” in the top-left corner. The destination file will be created by the program.

4. Adding Content to the File

Now we will add different types of content to the file.

4.1. Adding a Form

We’ll start by adding a form to the file. Our form will be very simple and contain a single field called name.

In addition, we need to tell iText where to place the field. In this case, we’ll put it at the point (35, 400). The coordinates (0, 0) refer to the bottom left of the document. Finally, we’ll set the dimensions of the field to 100×30:

PdfFormField personal = PdfFormField.createEmptyField(pdfDocument);
personal.setFieldName("information");
PdfTextFormField name = PdfFormField.createText(pdfDocument, new Rectangle(35, 400, 100, 30), "name", "");
personal.addKid(name);
PdfAcroForm.getAcroForm(pdfDocument, true)
    .addField(personal, pdfDocument.getFirstPage());

Additionally, we explicitly told iText to add the form to the first page of the document.
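Since the origin is at the bottom left, positions measured from the top of the page (as in most image editors) need to be converted first. Here is a small sketch of that arithmetic. The PdfCoordinates class is a hypothetical helper, not an iText API, and the page height of 842 points assumes an A4 page, which may differ from the actual source file:

```java
// Hypothetical helper, not part of iText: converts a top-based position to PDF coordinates.
class PdfCoordinates {

    // Returns the y of the lower edge of a box, measured from the bottom of the page,
    // given the distance from the top of the page to the upper edge of the box.
    static float toPdfY(float pageHeight, float distanceFromTop, float boxHeight) {
        return pageHeight - distanceFromTop - boxHeight;
    }

    public static void main(String[] args) {
        float pageHeight = 842f; // assumption: an A4 page, 842 points high
        // A 30-point-high field whose upper edge is 412 points below the top of the page
        System.out.println(toPdfY(pageHeight, 412f, 30f)); // prints 400.0
    }
}
```

Under that assumption, a field whose upper edge sits 412 points below the top of the page ends up at y = 400, the value we passed to the Rectangle.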

4.2. Adding a New Page

Let’s now see how we can add a new page to the document. We’ll use the addNewPage() method.

This method can accept the index of the page added if we want to specify it. For example, we can add a new page to the beginning of the document:

pdfDocument.addNewPage(1);
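Page numbers start at 1, so inserting at index 1 pushes every existing page back by one. The following sketch only models the pages as a list of labels to show the shift; PageInsertDemo is an illustrative class and does not touch any PDF:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: models a document's pages as a list to show how a 1-based insert shifts pages.
class PageInsertDemo {

    // Inserts a page label at a 1-based page number and returns the new page order.
    static List<String> insertPage(List<String> pages, int pageNumber, String newPage) {
        List<String> result = new ArrayList<>(pages);
        result.add(pageNumber - 1, newPage); // convert the 1-based page number to a 0-based list index
        return result;
    }

    public static void main(String[] args) {
        List<String> pages = Arrays.asList("page with form");
        System.out.println(insertPage(pages, 1, "blank page"));
        // prints [blank page, page with form]
    }
}
```

This is why the page holding our form becomes page 2 after the call.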

4.3. Adding an Annotation

Now we would like to add an annotation to the document. Simply put, an annotation looks like a square comic bubble.

We’ll add it on top of the form, which is now located on the second page of the document. Consequently, we’ll place it at the coordinates (40, 435). Additionally, we’ll give it a simple name and content. These will only be displayed when hovering over the annotation:

PdfAnnotation ann = new PdfTextAnnotation(new Rectangle(40, 435, 0, 0)).setTitle(new PdfString("name"))
    .setContents("Your name");
pdfDocument.getPage(2)
    .addAnnotation(ann);

Here’s what the middle of our second page looks like now:

 

4.4. Adding an Image

From now on, we’ll add layout elements to the page. To do so, we can no longer manipulate the PdfDocument directly. Instead, we’ll create a Document from it and work with that. Furthermore, we’ll need to close the Document in the end. Closing the Document automatically closes the underlying PdfDocument, so we can remove the part where we closed the PdfDocument earlier:

Document document = new Document(pdfDocument);
// add layout elements
document.close();

Now, to add the image, we have to load it from its location. We’ll do this using the create() method of the ImageDataFactory class. It throws a MalformedURLException if the passed file URL can’t be parsed. In this example, we’ll use an image of the Baeldung logo placed in the resources directory:

ImageData imageData = ImageDataFactory.create("src/main/resources/baeldung.png");

The next step is to set the properties of the image in the file. We’ll set its size to 550×100 and place it on the first page of our PDF, at the coordinates (10, 50). Let’s see the code to add the image:

Image image = new Image(imageData).scaleAbsolute(550,100)
    .setFixedPosition(1, 10, 50);
document.add(image);
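Note that scaleAbsolute() sets the width and height independently, so an image whose proportions differ from 550:100 gets stretched. If we want to keep the proportions instead, the target size can be computed as in the sketch below. ImageFit is a hypothetical helper, and the 1100×400 source size is an assumption, since the real dimensions of baeldung.png aren’t given here:

```java
// Hypothetical helper, not part of iText: computes a size that fits a box while keeping the aspect ratio.
class ImageFit {

    // Returns {width, height} of the largest size with the original proportions that fits in the box.
    static float[] fitWithin(float imageWidth, float imageHeight, float boxWidth, float boxHeight) {
        float scale = Math.min(boxWidth / imageWidth, boxHeight / imageHeight);
        return new float[] { imageWidth * scale, imageHeight * scale };
    }

    public static void main(String[] args) {
        // assumption: a 1100x400 logo; the actual image dimensions are unknown
        float[] size = fitWithin(1100f, 400f, 550f, 100f);
        System.out.println(size[0] + " x " + size[1]); // prints 275.0 x 100.0
    }
}
```

The resulting values could then be passed to scaleAbsolute(). The iText Image class also offers a scaleToFit() method that appears to serve the same purpose.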

The image is automatically resized to the given size. So here’s how it looks in the document:

 

4.5. Adding a Paragraph

The iText library brings some tools to add text to the file. The font can be configured on individual text fragments or directly on the Paragraph element.

For example, let’s add the following sentence to the top of the first page: This is a demo from Baeldung tutorials. We’ll set the font size of the beginning of this sentence to 16 and the global font size of the Paragraph to 8:

Text title = new Text("This is a demo").setFontSize(16);
Text author = new Text("Baeldung tutorials.");
Paragraph p = new Paragraph().setFontSize(8)
    .add(title)
    .add(" from ")
    .add(author);
document.add(p);

4.6. Adding a Table

Last but not least, we can also add a table to the file. For example, we’ll define a double-entry table with two cells and two headers above them. We won’t specify any position, so it’ll naturally be added at the top of the document, right after the Paragraph we just added:

Table table = new Table(UnitValue.createPercentArray(2));
table.addHeaderCell("#");
table.addHeaderCell("company");
table.addCell("name");
table.addCell("baeldung");
document.add(table);

Let’s now look at the beginning of the first page of the document:

5. Deleting Content from File

Let’s now see how we can remove content from the PDF file. To keep things simple, we’ll write another main() method.

Our source PDF file will be the baeldung-modified.pdf file, and the destination will be a new baeldung-cleaned.pdf file. We’ll work directly on the PdfDocument object. From now on, we’ll be using iText’s pdfSweep add-on.

5.1. Deleting Text From the File

To remove a given text from a file, we need to define a cleanup strategy. In this example, the strategy will simply be to find all text matching Baeldung. Finally, the last step is to call the autoSweepCleanUp() static method of PdfCleaner. This method creates a custom PdfCleanUpTool, which throws an IOException if an error occurs during file handling:

CompositeCleanupStrategy strategy = new CompositeCleanupStrategy();
strategy.add(new RegexBasedCleanupStrategy("Baeldung"));
PdfCleaner.autoSweepCleanUp(pdfDocument, strategy);
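To get a feel for what such a pattern selects, we can try it on plain strings first. The sketch below assumes that the strategy treats its argument as an ordinary java.util.regex pattern; the sample string only imitates the text we added earlier, and RegexMatchDemo is an illustrative class that doesn’t read any PDF:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: counts regex matches in a plain string, independent of any PDF.
class RegexMatchDemo {

    // Returns the number of non-overlapping matches of the regex in the text.
    static int countMatches(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "Hello Baeldung. This is a demo from Baeldung tutorials. name baeldung";
        System.out.println(countMatches("Baeldung", text));     // prints 2
        System.out.println(countMatches("(?i)baeldung", text)); // prints 3
    }
}
```

Under that assumption, the lowercase baeldung in our table cell would only be matched by a case-insensitive pattern such as (?i)baeldung.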

As we can see, the occurrences of the word Baeldung in the source file are overlaid with a black rectangle in the result file. This behavior is suitable, for example, for data anonymization:

5.2. Deleting Other Content From the File

Unfortunately, it is very hard to detect any non-text content in the file. However, pdfSweep offers the possibility to erase the contents of a portion of the file. Thus, if we know where the content we want to remove is located, we will be able to take advantage of this possibility.

As an example, we’ll erase the contents of a 100×35 rectangle located at (35, 400) on the second page. This means that we’ll get rid of all the content of the form and of the annotation. We’ll also erase a 90×70 rectangle located at (10, 50) on the first page. This basically removes the B from the Baeldung logo. Using the PdfCleanUpTool class, here’s the code to do all that:

List<PdfCleanUpLocation> cleanUpLocations = Arrays.asList(
    new PdfCleanUpLocation(1, new Rectangle(10, 50, 90, 70)),
    new PdfCleanUpLocation(2, new Rectangle(35, 400, 100, 35)));
PdfCleanUpTool cleaner = new PdfCleanUpTool(pdfDocument, cleanUpLocations, new CleanUpProperties());
cleaner.cleanUp();
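As a quick sanity check of the numbers, the sketch below verifies that the 100×35 rectangle spans both the 100×30 form field and the annotation point at (40, 435). It is plain arithmetic on (x, y, width, height) values with the lower-left corner as (x, y). CleanUpAreaDemo is an illustrative class, and it makes no claim about how pdfSweep itself treats content lying exactly on an edge:

```java
// Illustration only: checks rectangle coverage using the (x, y, width, height) convention,
// where (x, y) is the lower-left corner.
class CleanUpAreaDemo {

    // Returns true if the inner rectangle lies entirely within the outer one (edges included).
    static boolean covers(float[] outer, float[] inner) {
        return inner[0] >= outer[0]
            && inner[1] >= outer[1]
            && inner[0] + inner[2] <= outer[0] + outer[2]
            && inner[1] + inner[3] <= outer[1] + outer[3];
    }

    public static void main(String[] args) {
        float[] cleanUpArea = { 35f, 400f, 100f, 35f };
        float[] formField = { 35f, 400f, 100f, 30f };
        float[] annotation = { 40f, 435f, 0f, 0f };
        System.out.println(covers(cleanUpArea, formField));  // prints true
        System.out.println(covers(cleanUpArea, annotation)); // prints true
    }
}
```

The annotation point sits exactly on the upper edge of the cleanup area, which explains why the height was chosen as 35 rather than 30.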

We can now have a look at the baeldung-cleaned.pdf file:

6. Changing the Contents in the File

In this section, we’ll do the same thing as before, except that we’ll replace the old text with a new one instead of just deleting it.

For more clarity, we’ll use a new main() method again. Our source file will be the baeldung-modified.pdf file, and our destination file will be a new baeldung-fixed.pdf file.

Previously, we noticed that the deleted text was overlaid with a black background. However, this color is configurable. As we know that the background of the text in our file is white, we’ll force the overlay to be white. The processing will begin the same way as before, except that we’ll be searching for the text Baeldung tutorials.

However, after calling autoSweepCleanUp(), we’ll query the strategy to get the locations of the removed text. Then, we’ll instantiate a PdfCanvas that will contain the replacement text HIDDEN. Additionally, we’ll remove the top margin to align it a bit better with the original text. The default alignment is actually not that good. Let’s look at the resulting code:

CompositeCleanupStrategy strategy = new CompositeCleanupStrategy();
strategy.add(new RegexBasedCleanupStrategy("Baeldung").setRedactionColor(ColorConstants.WHITE));
PdfCleaner.autoSweepCleanUp(pdfDocument, strategy);
for (IPdfTextLocation location : strategy.getResultantLocations()) {
    PdfPage page = pdfDocument.getPage(location.getPageNumber() + 1);
    PdfCanvas pdfCanvas = new PdfCanvas(page.newContentStreamAfter(), page.getResources(), page.getDocument());
    Canvas canvas = new Canvas(pdfCanvas, location.getRectangle());
    canvas.add(new Paragraph("HIDDEN").setFontSize(8)
        .setMarginTop(0f));
}

And we can take a look at the file:

7. Conclusion

In this tutorial, we have seen how to edit the contents of a PDF file. We’ve seen that we can add new content, delete existing content, and even replace the text of the original file with a new one.

As always, the code for this article can be found on GitHub.

       
