Simple Ways of Getting Data from URL in PHP (Web Scraping)
When you need to get data or content from a website URL (for scraping, data fetching, and so on), there are two simple ways to do it: the built-in PHP function file_get_contents() and the cURL library.
file_get_contents() is by far the simplest way of getting data, but it offers fewer options than cURL, so decide which method suits your needs. If you just need a page's content, file_get_contents() is the best choice. If you need more, such as setting the user agent, timeouts, the response type, error handling, or fine-grained control over HTTPS connections, cURL is the most capable. (file_get_contents() can also fetch HTTPS URLs when PHP's openssl extension is enabled.)
Get Content from URL using PHP file_get_contents()
Getting data using file_get_contents() is as simple as:
<?php
$url = "http://example.com";
$data = file_get_contents($url); // returns false on failure
echo $data;
?>
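For the cases mentioned in the introduction where more control is needed, a cURL version might look like the sketch below. This is only a minimal illustration, not a definitive implementation: the helper name fetch_url(), the user agent string, and the 10-second default timeout are assumptions made for this example and do not come from any library.

```php
<?php
// Hypothetical helper (name is an assumption for this example).
// Returns the response body as a string, or null on any failure.
function fetch_url(string $url, int $timeoutSeconds = 10): ?string
{
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }

    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,               // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,               // follow HTTP redirects
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds,    // limit on connecting
        CURLOPT_TIMEOUT        => $timeoutSeconds,    // limit on the whole request
        CURLOPT_USERAGENT      => 'ExampleFetcher/1.0', // placeholder user agent
    ]);

    $body = curl_exec($ch);
    if ($body === false) {
        // Transport-level failure: DNS, refused connection, timeout, TLS, ...
        error_log('cURL error: ' . curl_error($ch));
        return null;
    }

    // Treat anything outside the 2xx range as a failure.
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    return ($status >= 200 && $status < 300) ? $body : null;
}
```

Calling it would then be something like $data = fetch_url('http://example.com'); followed by a check that $data is not null before echoing it. Returning null for both transport errors and non-2xx responses keeps the caller simple; a real scraper may want to distinguish the two.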
Get Content from URL using PHP DOMDocument class
Another way to fetch web content in PHP is to use the DOMDocument class. With this method, we can process the retrieved document directly through the DOM. The downside is that the content is limited to HTML and XML. DOMDocument is usually bundled with PHP, so you don't have to install anything. Using DOMDocument is as simple as:
<?php
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTMLFile('http://example.com/');
$data = $dom->getElementById("banner"); // null if no such element exists
if ($data !== null) {
    echo $data->nodeValue . "\n";
}
?>
First, create a new DOMDocument object:
$dom = new DOMDocument();
Then load the HTML file using the loadHTMLFile() method, with the URL as its parameter:
$dom->loadHTMLFile('http://example.com/');
Finally, access the DOM (see the DOMDocument section of the PHP manual for the available methods):
$dom->getElementById("elementId");
In the real world, not every HTML document is well formed: there may be unclosed tags, mismatched brackets, malformed attribute-value pairs, and so on. These confuse DOMDocument and cause it to emit a stream of warnings. To avoid that, we need to set
libxml_use_internal_errors(true);
so that libxml collects the errors internally instead of emitting them as warnings.
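The steps above can be combined into one small helper, sketched below. To keep it runnable without network access, this sketch parses an HTML string with loadHTML() rather than fetching a URL with loadHTMLFile(); the helper name element_text_by_id() and the sample markup are assumptions made for illustration only.

```php
<?php
// Hypothetical helper (name is an assumption for this example).
// Returns the trimmed text of the element with the given id, or null if absent.
function element_text_by_id(string $html, string $id): ?string
{
    // Collect libxml parse errors internally instead of emitting warnings,
    // and remember the previous setting so it can be restored afterwards.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    // Collected errors could be inspected with libxml_get_errors() here;
    // this sketch simply discards them.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $node = $dom->getElementById($id);
    return $node === null ? null : trim($node->textContent);
}

// Deliberately malformed sample markup: <b> and <p> are never closed.
$html = '<html><body><div id="banner">Hello <b>world</div><p>unclosed</body></html>';

$text = element_text_by_id($html, 'banner');
echo ($text ?? '(not found)') . "\n";
```

Checking for null before reading the node matters because getElementById() returns null when no element has the requested id.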
Get Content from URL using PHPQuery class
If you are familiar with how jQuery works, this method is the best one for you. This class library was originally created by Tobiasz Cudnik. It implements CSS3 selectors, modeled on the jQuery JavaScript library. With the PHPQuery class you can do something like:
$doc['ul > li']->addClass('active-list');
Feels familiar?
You can find more examples in the PHPQuery project documentation.
Besides the three above, there are many other ways to acquire data from web pages in PHP. IMHO, these three are the best of the ones I am familiar with.