Hello Readers,
Hope you are doing well today.
Today we will discuss scraping. Sometimes we want to scrape data from another website, so here we will learn how to do that.
Before you start scraping, download the simple_html_dom.php library and put it in the Vendor folder of your project directory:
/Project_name/app/Vendor/simple_html_dom.php
Now create a function in your controller and use cURL to fetch the page from the website. Here we fetch results data from a government results portal.
$Url='http://sarkariresults.info/page/latestresult.php';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $Url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$output = curl_exec($ch);
curl_close($ch);
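The snippet above assumes the request always succeeds. As a rough sketch, the same call can be wrapped in a helper that reports failures; fetchPage() and its timeout values are hypothetical names and settings chosen for illustration, not part of the original code.

```php
<?php
// Hypothetical helper: fetch a page with cURL and fail loudly
// instead of silently passing false on to the parser.
function fetchPage($url, $timeoutSeconds = 30)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);     // return the body as a string
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);     // follow redirects
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);       // assumed value, seconds
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds); // assumed value, seconds

    $output = curl_exec($ch);
    if ($output === false) {
        // curl_exec() returns false on failure; read the message before closing
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException('cURL request failed: ' . $error);
    }
    curl_close($ch);
    return $output;
}
```

With this in place, $output = fetchPage($Url); could replace the six lines above, and the controller can catch the RuntimeException to show an error message.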
Now import the simple_html_dom.php file in your controller and parse the data fetched with cURL.
App::import('Vendor', 'SimpleName', array('file' => 'simple_html_dom.php'));
set_time_limit(0);
$html = new simple_html_dom();
$html->load($output);
$first=array();
$item=array();
foreach ($html->find('ol') as $article)
{
    foreach ($article->find('li') as $art)
    {
        $link = $art->find('a', 0);
        if (!$link) {
            continue; // skip list items that contain no link
        }
        $tag = 'span';
        // Remove <span> tags from the link text
        $item['Result']['title'] = preg_replace('#</?'.$tag.'[^>]*>#is', '', $link->innertext);
        $item['Result']['href'] = $link->href;
        $first[] = $item;
    }
}
$this->set('results',$first);
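The preg_replace() call in the loop strips the opening and closing span tags but keeps the text between them. A small standalone sketch of that step is shown here; stripTag() is a hypothetical helper name, and the sample string is made up for illustration. It also adds preg_quote() and a word boundary (\b), which the original pattern does not have, so that a tag name such as span does not also match a longer name that merely starts with it.

```php
<?php
// Hypothetical helper: remove the opening and closing tags of one
// element name while keeping the text between them.
function stripTag($html, $tag)
{
    // '</?' matches both '<span ...>' and '</span>';
    // '[^>]*' swallows any attributes up to the closing '>'.
    $pattern = '#</?' . preg_quote($tag, '#') . '\b[^>]*>#is';
    return preg_replace($pattern, '', $html);
}

echo stripTag('<span class="new">SSC Result</span> 2015', 'span'), PHP_EOL;
// prints: SSC Result 2015
```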
For more details about simple_html_dom, see its documentation.
Putting all the code together:
<?php
$Url='http://sarkariresults.info/page/latestresult.php';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $Url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$output = curl_exec($ch);
curl_close($ch);
// Now import the simple_html_dom.php file and parse the data fetched with cURL.
App::import('Vendor', 'SimpleName', array('file' => 'simple_html_dom.php'));
set_time_limit(0);
$html = new simple_html_dom();
$html->load($output);
$first=array();
$item=array();
foreach ($html->find('ol') as $article)
{
    foreach ($article->find('li') as $art)
    {
        $link = $art->find('a', 0);
        if (!$link) {
            continue; // skip list items that contain no link
        }
        $tag = 'span';
        // Remove <span> tags from the link text
        $item['Result']['title'] = preg_replace('#</?'.$tag.'[^>]*>#is', '', $link->innertext);
        $item['Result']['href'] = $link->href;
        $first[] = $item;
    }
}
$this->set('results',$first);
?>
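The listing above depends on CakePHP and simple_html_dom. To try the same extraction logic outside a CakePHP project, here is a minimal sketch that swaps in PHP's built-in DOMDocument and DOMXPath classes instead of simple_html_dom. extractResults() is a hypothetical function name, and this is an alternative technique, not the method used in this post.

```php
<?php
// Sketch using PHP's built-in DOM extension instead of simple_html_dom.
// extractResults() is a hypothetical name chosen for this example.
function extractResults($htmlSource)
{
    if (trim($htmlSource) === '') {
        return array(); // nothing to parse
    }

    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $doc->loadHTML($htmlSource);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $results = array();
    // Every <li> inside an <ol>, mirroring the nested foreach loops above
    foreach ($xpath->query('//ol//li') as $li) {
        $link = $xpath->query('.//a', $li)->item(0);
        if ($link === null) {
            continue; // skip list items that contain no link
        }
        $results[] = array('Result' => array(
            // textContent drops nested tags such as <span> automatically
            'title' => trim($link->textContent),
            'href'  => $link->getAttribute('href'),
        ));
    }
    return $results;
}
```

The returned array has the same shape as $first, so it could be passed to $this->set('results', ...) in the same way.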
You can find simple_html_dom.php in the attached file below.
I hope this will help you. Please feel free to give us your feedback in the comments.