  • Data Scraping in CakePHP


    Hello Readers,

    Hope you are doing well today.

    Today we will discuss scraping. Sometimes we want to scrape data from another website, so here we will learn how to do that.
    Before you start scraping, download the simple_html_dom.php library and put it into the Vendor folder in your project directory:


    /Project_name/app/Vendor/simple_html_dom.php


    Now create a function in your controller and use cURL to fetch the page from the website. Here we fetch results data from a government results portal.

      

    $Url = 'http://sarkariresults.info/page/latestresult.php';
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $Url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the page as a string instead of printing it
    $output = curl_exec($ch); // false if the request failed
    curl_close($ch);
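
    The request above does not check whether the download succeeded. As a minimal sketch, the same cURL calls could be wrapped in a helper that sets timeouts and reports failures. The function name fetchPage and the timeout values are assumptions made for illustration; they are not part of the original code.

```php
<?php
// Hypothetical helper (not from the original post): fetch a page with cURL
// and fail loudly instead of handing `false` to the HTML parser.
function fetchPage($url, $timeoutSeconds = 15)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);     // return the body as a string
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);     // follow HTTP redirects
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);       // seconds allowed to connect (assumed value)
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds); // seconds allowed for the whole request

    $body   = curl_exec($ch);
    $error  = curl_error($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    if ($body === false) {
        throw new RuntimeException('cURL request failed: ' . $error);
    }
    if ($status >= 400) {
        throw new RuntimeException('Unexpected HTTP status: ' . $status);
    }
    return $body;
}
```

    In the controller, $output = fetchPage($Url); inside a try/catch block would then replace the five cURL lines.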

    Now import the simple_html_dom.php file in your controller and parse the HTML that cURL returned.

    // Load the library from app/Vendor.
    App::import('Vendor', 'SimpleName', array('file' => 'simple_html_dom.php'));
    set_time_limit(0); // no execution time limit while parsing
    $html = new simple_html_dom();
    $html->load($output);

    $first = array();
    $tag = 'span';
    foreach ($html->find('ol') as $article) {
        foreach ($article->find('li') as $art) {
            $link = $art->find('a', 0);
            if (!$link) {
                continue; // skip list items that contain no link
            }
            $item = array();
            // Remove <span> tags from the link text.
            $item['Result']['title'] = preg_replace('#</?' . $tag . '[^>]*>#is', '', $link->innertext);
            $item['Result']['href'] = $link->href;
            $first[] = $item;
        }
    }
    $this->set('results', $first); // make the data available to the view as $results
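
    To try the extraction logic without CakePHP or simple_html_dom, the loop above can be approximated with PHP's bundled DOM extension. This is a swapped-in technique (DOMDocument and DOMXPath), not the library used in this post. The function names stripTag and extractResults and the sample markup are made up for illustration.

```php
<?php
// Hypothetical helper: the same span-stripping pattern as in the loop above.
function stripTag($html, $tag)
{
    return preg_replace('#</?' . preg_quote($tag, '#') . '[^>]*>#is', '', $html);
}

// Hypothetical stand-in for the simple_html_dom loop, using DOMDocument/DOMXPath.
function extractResults($html)
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML is rarely valid; collect warnings quietly
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $results = array();
    foreach ($xpath->query('//ol//li') as $li) {
        $link = $xpath->query('.//a', $li)->item(0);
        if ($link === null) {
            continue; // skip list items that contain no link
        }
        $results[] = array('Result' => array(
            'title' => trim($link->textContent), // textContent drops nested tags such as <span>
            'href'  => $link->getAttribute('href'),
        ));
    }
    return $results;
}

// Made-up sample markup, not taken from the real site.
$sample = '<ol><li><a href="/a.php"><span class="new">Exam A</span> Result</a></li>'
        . '<li>No link here</li></ol>';
$results = extractResults($sample);
print_r($results);
```

    The array has the same shape as $first, so it could be passed to $this->set('results', ...) in the same way.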

    For more details about simple_html_dom, see the library's documentation.

    Putting all the code together:

    <?php

    // This code goes inside a function in your controller.

    // Fetch the page with cURL.
    $Url = 'http://sarkariresults.info/page/latestresult.php';
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $Url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the page as a string instead of printing it
    $output = curl_exec($ch); // false if the request failed
    curl_close($ch);

    // Import simple_html_dom.php and parse the HTML that cURL returned.
    App::import('Vendor', 'SimpleName', array('file' => 'simple_html_dom.php'));
    set_time_limit(0); // no execution time limit while parsing
    $html = new simple_html_dom();
    $html->load($output);

    $first = array();
    $tag = 'span';
    foreach ($html->find('ol') as $article) {
        foreach ($article->find('li') as $art) {
            $link = $art->find('a', 0);
            if (!$link) {
                continue; // skip list items that contain no link
            }
            $item = array();
            // Remove <span> tags from the link text.
            $item['Result']['title'] = preg_replace('#</?' . $tag . '[^>]*>#is', '', $link->innertext);
            $item['Result']['href'] = $link->href;
            $first[] = $item;
        }
    }
    $this->set('results', $first); // make the data available to the view as $results

    ?>
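
    The controller passes the scraped rows to the view as results, but the post does not show how they are displayed. A rough sketch of turning that array into an HTML list follows. The function renderResults and the sample data are hypothetical. In CakePHP this markup would normally live in the view file, and escaping matters because the text comes from another site.

```php
<?php
// Hypothetical rendering helper (not from the original post).
function renderResults(array $results)
{
    $html = "<ul>\n";
    foreach ($results as $row) {
        // Escape scraped values before printing them into our own page.
        $title = htmlspecialchars($row['Result']['title'], ENT_QUOTES, 'UTF-8');
        $href  = htmlspecialchars($row['Result']['href'], ENT_QUOTES, 'UTF-8');
        $html .= '  <li><a href="' . $href . '">' . $title . "</a></li>\n";
    }
    return $html . "</ul>\n";
}

// Made-up data in the same shape as the $first array built above.
$results = array(
    array('Result' => array('title' => 'Exam A & B Result', 'href' => '/a.php?x=1&y=2')),
);
echo renderResults($results);
```

    Note that the scraped href values may be relative to the source site, so they may need the site's base URL prepended before they work as links.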

    You can find simple_html_dom.php in the file attached below.

    I hope this will help you. Please feel free to give us your feedback in the comments.
