How to Extract Text From a PDF


    Hello readers. In this blog post I discuss how to extract text from a PDF file.

    With the help of the iText library (itextpdf) we can convert a PDF file to text. Follow these steps:

    Step 1: Download the itextpdf JAR file from the following link, unzip it, and add the JAR to your project's classpath:

    http://www.java2s.com/Code/JarDownload/itextpdf/itextpdf-5.1.0.jar.zip

    Step 2: Write the following code:

    package com.tech;

    import java.io.IOException;

    import com.itextpdf.text.pdf.PdfReader;
    import com.itextpdf.text.pdf.parser.PdfTextExtractor;

    public class Convertor {
        public static void main(String[] args) {
            PdfReader rd = null;
            try {
                // PdfReader opens the PDF at the given path.
                // Change the path to wherever your file resides.
                rd = new PdfReader("c:/sample.pdf");
                // getNumberOfPages() returns the number of pages in the PDF.
                System.out.println("This PDF has " + rd.getNumberOfPages() + " pages.");
                // PdfTextExtractor reads the text of the given page (pages are numbered from 1).
                String page = PdfTextExtractor.getTextFromPage(rd, 1);
                // Display the content of the page.
                System.out.println("Page Content:\n\n" + page + "\n\n");
                // isTampered() reports whether the document has been flagged as changed.
                System.out.println("Is this document tampered: " + rd.isTampered());
            } catch (IOException e) {
                e.printStackTrace();
            } finally {
                // Release the file once we are done with it.
                if (rd != null) {
                    rd.close();
                }
            }
        }
    }
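
    The program above prints only page 1. As a minimal sketch of how the same idea could be extended to every page, the loop below walks the page numbers from 1 up to the page count and joins the results. To keep it runnable without the library, the per-page lookup is passed in as a function. The names AllPagesSketch, extractAll, and fakePages are hypothetical and used only for illustration. With iText, the function would presumably be a call to PdfTextExtractor.getTextFromPage(rd, n), wrapped to handle its IOException, and the count would come from rd.getNumberOfPages().

```java
import java.util.function.IntFunction;

public class AllPagesSketch {
    // Joins the text of pages 1..pageCount, putting a label before each page.
    // pageText is a stand-in (an assumption of this sketch) for a per-page
    // lookup such as PdfTextExtractor.getTextFromPage(rd, n).
    public static String extractAll(int pageCount, IntFunction<String> pageText) {
        StringBuilder all = new StringBuilder();
        // PDF page numbers start at 1, not 0.
        for (int n = 1; n <= pageCount; n++) {
            all.append("--- Page ").append(n).append(" ---\n");
            all.append(pageText.apply(n)).append("\n");
        }
        return all.toString();
    }

    public static void main(String[] args) {
        // A fake two-page document, used here instead of a real PDF.
        String[] fakePages = {"first page text", "second page text"};
        System.out.print(extractAll(fakePages.length, n -> fakePages[n - 1]));
    }
}
```

    Collecting the text in a StringBuilder avoids building many intermediate strings when the document has a lot of pages.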
    	

    Step 3: Sample data. Download the following sample PDF and save it at the path used in the code (c:/sample.pdf):

    https://www.polyu.edu.hk/iaee/files/pdf-sample.pdf

    Step 4: Output:

    This PDF has 1 pages.
    Page Content:
    
    Adobe Acrobat PDF Files
    Adobe Portable Document Format (PDF) is a universal file format that preserves all
    of the fonts, formatting, colours and graphics of any source document, regardless of
    the application and platform used to create it.
    Adobe PDF is an ideal format for electronic document distribution as it overcomes the
    problems commonly encountered with electronic file sharing.
     Anyone, anywhere can open a PDF file. All you need is the free Adobe Acrobat
    Reader. Recipients of other file formats sometimes can't open files because they
    don't have the applications used to create the documents.
     PDF files always print correctly on any printing device.
     PDF files always display exactly as created, regardless of fonts, software, and
    operating systems. Fonts, and graphics are not lost due to platform, software, and
    version incompatibilities.
     The free Acrobat Reader is easy to download and can be freely distributed by
    anyone.
     Compact PDF files are smaller than their source files and download a
    page at a time for fast display on the Web.
    
    
    Is this document tampered: false

    Note:

    This approach only works for content stored as text in the PDF document. Text that appears inside images (for example, scanned pages) cannot be retrieved with this method.
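
    One rough way to notice such pages is to check whether the extracted text of a page is empty. The sketch below assumes the page texts have already been extracted into a list. BlankPageCheck and pagesWithoutText are hypothetical names, not part of iText. An empty result only suggests that a page may hold images instead of text; it does not prove it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BlankPageCheck {
    // Returns the 1-based numbers of pages whose extracted text is
    // null, empty, or whitespace only.
    public static List<Integer> pagesWithoutText(List<String> pageTexts) {
        List<Integer> blank = new ArrayList<>();
        for (int i = 0; i < pageTexts.size(); i++) {
            String text = pageTexts.get(i);
            if (text == null || text.trim().isEmpty()) {
                blank.add(i + 1); // convert list index to page number
            }
        }
        return blank;
    }

    public static void main(String[] args) {
        // Stand-in page texts; in practice these would come from the extractor.
        List<String> pageTexts = Arrays.asList("Adobe Acrobat PDF Files", "  \n", "");
        System.out.println("Pages with no extractable text: " + pagesWithoutText(pageTexts));
    }
}
```

    Pages flagged this way would need a different tool, such as OCR, to recover their text.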

    I hope this helps you convert PDF documents to text. Thanks for reading.

     
