PDFBox: How to read a PDF file in Java
By: Roy.Liu
Last updated: 2019-08-11
This article shows you how to use Apache PDFBox to read a PDF file in Java.
1. Get PDFBox
pom.xml
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>2.0.6</version>
</dependency>
2. Print PDF file
This example extracts all text from a PDF file and prints it line by line.
ReadPdf.java
package com.mkyong;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

import java.io.File;
import java.io.IOException;

public class ReadPdf {

    public static void main(String[] args) throws IOException {

        // try-with-resources closes the document automatically
        try (PDDocument document = PDDocument.load(new File("/path-to/abc.pdf"))) {

            // skip encrypted documents
            if (!document.isEncrypted()) {

                PDFTextStripper tStripper = new PDFTextStripper();
                String pdfFileInText = tStripper.getText(document);

                // split by line break (Unix or Windows line endings)
                String[] lines = pdfFileInText.split("\\r?\\n");
                for (String line : lines) {
                    System.out.println(line);
                }
            }
        }
    }
}
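The last step of the example splits the extracted text with the regex \r?\n, which matches Unix and Windows line breaks rather than all whitespace. The following is a minimal sketch of that step on its own, without PDFBox. The class name LineSplitDemo, the helper splitLines, and the sample string are hypothetical and only stand in for the text returned by PDFTextStripper.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class for illustration; not part of PDFBox.
class LineSplitDemo {

    // Splits text on Unix (\n) or Windows (\r\n) line endings.
    static List<String> splitLines(String text) {
        return Arrays.asList(text.split("\\r?\\n"));
    }

    public static void main(String[] args) {
        // Sample string standing in for the extracted PDF text.
        String sample = "First line\r\nSecond line\nThird line";
        for (String line : splitLines(sample)) {
            System.out.println(line);
        }
    }
}
```

Note that String.split drops trailing empty strings, so blank lines at the very end of the text are not returned.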
Note
Please refer to the Apache PDFBox SVN repository for more examples.
From:一号门