Hello readers, in this blog I discuss how to extract text from a PDF file.
With the help of itextpdf, we can convert a PDF file to text. Let's go through the following steps:
Step 1: Download the itextpdf jar file. You can download it from the following link:
http://www.java2s.com/Code/JarDownload/itextpdf/itextpdf-5.1.0.jar.zip
Step 2: Write the following code:
package com.tech;

import java.io.IOException;

import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;

public class Convertor {
    public static void main(String[] args) {
        try {
            // PdfReader opens the PDF at the specified location.
            // Change the path to wherever your file resides.
            PdfReader rd = new PdfReader("c:/sample.pdf");
            // getNumberOfPages() returns the number of pages in the PDF we provided.
            System.out.println("This PDF has " + rd.getNumberOfPages() + " pages.");
            // PdfTextExtractor reads the text of the given page (pages are numbered from 1).
            String page = PdfTextExtractor.getTextFromPage(rd, 1);
            // Display the content of the page.
            System.out.println("Page Content:\n\n" + page + "\n\n");
            // isTampered() reports whether the document has been marked as changed.
            System.out.println("Is this document tampered: " + rd.isTampered());
            // Release the file once we are done with it.
            rd.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
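The program above reads only page 1. To collect the text of every page, the same extraction call can be repeated for pages 1 through getNumberOfPages(). Below is a minimal sketch of that loop. The class name AllPagesText and the method extractAll are hypothetical, and a stand-in function replaces the iText call so the sketch runs without the jar. In a real program, the pageText function would wrap PdfTextExtractor.getTextFromPage(rd, i) and handle its IOException.

```java
import java.util.function.IntFunction;

public class AllPagesText {

    // Joins the text of pages 1..pageCount, putting a header line before each page.
    // pageText is a hypothetical hook: given a 1-based page number, it returns that page's text.
    public static String extractAll(int pageCount, IntFunction<String> pageText) {
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= pageCount; i++) {
            sb.append("--- Page ").append(i).append(" ---\n");
            sb.append(pageText.apply(i)).append("\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in data (assumption): with iText, the count would come from
        // rd.getNumberOfPages() and the text from PdfTextExtractor.getTextFromPage(rd, i).
        String[] fakePages = {"first page", "second page"};
        System.out.print(extractAll(fakePages.length, i -> fakePages[i - 1]));
    }
}
```

Keeping the loop separate from the extraction call makes it easy to try the logic on plain strings before pointing it at a real PDF.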
Step 3: Sample data. Download the following PDF and save it at the location used in the code (c:/sample.pdf):
https://www.polyu.edu.hk/iaee/files/pdf-sample.pdf
Step 4: Output:
This PDF has 1 pages.
Page Content:
Adobe Acrobat PDF Files
Adobe Portable Document Format (PDF) is a universal file format that preserves all
of the fonts, formatting, colours and graphics of any source document, regardless of
the application and platform used to create it.
Adobe PDF is an ideal format for electronic document distribution as it overcomes the
problems commonly encountered with electronic file sharing.
Anyone, anywhere can open a PDF file. All you need is the free Adobe Acrobat
Reader. Recipients of other file formats sometimes can't open files because they
don't have the applications used to create the documents.
PDF files always print correctly on any printing device.
PDF files always display exactly as created, regardless of fonts, software, and
operating systems. Fonts, and graphics are not lost due to platform, software, and
version incompatibilities.
The free Acrobat Reader is easy to download and can be freely distributed by
anyone.
Compact PDF files are smaller than their source files and download a
page at a time for fast display on the Web.
Is this document tampered: false
Note:
This conversion is only possible for actual text in a PDF document. Text that appears inside images (for example, scanned pages) cannot be retrieved with this method.
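One way to notice this limitation in practice is to check for pages whose extracted text is empty or only whitespace, since a page made up purely of images usually yields no text. The following is a minimal sketch, assuming the page texts have already been extracted into a list. The class name BlankPageCheck and the method pagesWithoutText are hypothetical, and an empty result is only a hint, not proof, that a page is a scanned image.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BlankPageCheck {

    // Returns the 1-based numbers of pages whose extracted text is null, empty, or whitespace only.
    public static List<Integer> pagesWithoutText(List<String> pageTexts) {
        List<Integer> blank = new ArrayList<>();
        for (int i = 0; i < pageTexts.size(); i++) {
            String text = pageTexts.get(i);
            if (text == null || text.trim().isEmpty()) {
                blank.add(i + 1); // list index 0 corresponds to page 1
            }
        }
        return blank;
    }

    public static void main(String[] args) {
        // Stand-in data (assumption): each entry is the text extracted from one page.
        List<String> texts = Arrays.asList("Adobe Acrobat PDF Files", "  \n", "");
        System.out.println("Pages with no extractable text: " + pagesWithoutText(texts));
    }
}
```

Pages flagged this way would need a different approach, such as OCR, which is outside the scope of this method.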
I hope this helps you convert a PDF document into text format. Thanks for reading the blog.