How to crawl many pages in a single page hit:
I ran into a problem fetching data from multiple URLs in a loop. On the first iteration the Cron function returned the data, but when Cron was called for the second page the browser showed a "No DATA RECEIVED" message. I also tried adding a delay between the calls, but that did not help.
This is the code for crawling multiple pages at once:
public function Crawl(){
    $url=array(
        0=>array("ABCD"=>array(0=>"http://abcd.com")),
        1=>array("XYZ"=>array(0=>"xyz.com"))
    );
    foreach ($url as $key => $value) {
        $this->Cron($value);
        sleep(5);
    }
}
public function Cron($url=null){
    ob_start();
    set_time_limit(0); // Remove the script execution time limit (PHP has no set_time_out())
    foreach($url as $urlkey=>$urlvalue){
        for($prodcount=0;$prodcount<count($urlvalue);$prodcount++){
            $ch = curl_init(); // Initialise cURL
            curl_setopt($ch, CURLOPT_URL, $urlvalue[$prodcount]); // Set the URL to fetch for this iteration
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Return the page data instead of printing it
            $data = curl_exec($ch); // Execute the request; $data is false on failure
            curl_close($ch); // Close the cURL handle
            if ($data === false) {
                continue; // Skip pages that could not be fetched
            }
            // str_get_html() comes from the Simple HTML DOM parser, which must be included separately
            $html = str_get_html($data);
            echo $html;
            ob_flush(); // Flush PHP's output buffer
            flush(); // Ask PHP to push the output on to the browser
        }
    }
}
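The nested $url array used above (site name mapped to a list of links) can be awkward to loop over. As a minimal sketch, assuming the same array shape as in Crawl(), it could be flattened into a plain list of links before fetching. The helper name flattenUrls is hypothetical and not part of the original code:

```php
<?php
// Hypothetical helper (not in the original post): turns the nested
// [index => [siteName => [links...]]] structure into a flat list of links.
function flattenUrls(array $sites): array
{
    $flat = [];
    foreach ($sites as $site) {
        foreach ($site as $name => $links) {
            foreach ($links as $link) {
                $flat[] = $link; // Keep each link exactly as written
            }
        }
    }
    return $flat;
}

// Same data as in Crawl(), written with short array syntax.
$url = [
    ['ABCD' => ['http://abcd.com']],
    ['XYZ'  => ['xyz.com']],
];

print_r(flattenUrls($url));
```

With a flat list, a single foreach over the links would be enough, and the inner for loop with count() would no longer be needed.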
The "No DATA RECEIVED" issue was resolved later. The solution was to add the following line to the HTML header:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
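One way to make sure that meta tag reaches the browser is to print the page head once, before any crawled markup is echoed. This is only a sketch: the function name crawlPageHeader and the page title are assumptions for illustration, not part of the original code.

```php
<?php
// Hypothetical helper (not in the original post): builds the opening of the
// HTML page, including the Content-Type meta tag mentioned above.
function crawlPageHeader(string $title = 'Crawl output'): string
{
    return "<!DOCTYPE html>\n<html>\n<head>\n"
        . '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">' . "\n"
        . '<title>' . htmlspecialchars($title, ENT_QUOTES, 'UTF-8') . "</title>\n"
        . "</head>\n<body>\n";
}

// Emit the head once, before the crawl loop starts echoing fetched pages.
echo crawlPageHeader();
```

In the code above this would be called at the start of Crawl(), before the first call to Cron().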
Thanks for reading the post.