Simmons Consulting, the Website of Toby Simmons

PHP class function for screen scraping

10 Jan

I’ve updated my simple PHP function (the one to replace fopen()) for grabbing URLs using cURL. I’ve added some features and turned it into a class instead of a standalone PHP function. One improvement is the ability to normalize URLs, so you can pass relative URLs as well as absolute ones. It also does more error checking and sends a standard user-agent by default.

The syntax is a little different from the previous version. To use it, create an instance of the class, then call the appropriate method:

  $urlScoop = new UrlGrabber;
  $rawhtml = $urlScoop->_get($urlScoop->_normalize("https://www.simmonsconsulting.com/"));
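
For a rough idea of what a cURL-based fetch with a default user-agent and basic error checking involves, here is a minimal sketch. It is not the UrlGrabber code: the function name fetchUrl, the user-agent string, and the timeout values are all made up for illustration, and it assumes PHP 7 or later with the cURL extension loaded.

```php
<?php
// Hypothetical helper for illustration only; not the UrlGrabber class from this post.
function fetchUrl(string $url, string $userAgent = 'Mozilla/5.0 (compatible; ExampleFetcher/1.0)'): string
{
    // Reject anything that is not an absolute http(s) URL before touching the network.
    $scheme = parse_url($url, PHP_URL_SCHEME);
    if (!is_string($scheme) || !in_array(strtolower($scheme), ['http', 'https'], true)) {
        throw new InvalidArgumentException("Not an absolute http(s) URL: $url");
    }

    $ch = curl_init($url);
    if ($ch === false) {
        throw new RuntimeException('Could not initialize cURL');
    }

    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,       // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,       // follow redirects
        CURLOPT_MAXREDIRS      => 5,          // assumed limit
        CURLOPT_CONNECTTIMEOUT => 10,         // assumed timeout, in seconds
        CURLOPT_TIMEOUT        => 30,         // assumed timeout, in seconds
        CURLOPT_USERAGENT      => $userAgent, // sent unless the caller overrides it
    ]);

    // curl_exec() returns false on transport errors (DNS failure, timeout, ...).
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }

    // Treat HTTP 4xx/5xx responses as errors too.
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    if ($status >= 400) {
        throw new RuntimeException("HTTP error $status for $url");
    }

    return $body;
}
```

Throwing exceptions is just one way to report failures; returning false and exposing the error message separately would work as well.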

Fetching a relative URL would look like this:

  $urlScoop = new UrlGrabber;
  $rawhtml = $urlScoop->_get($urlScoop->_normalize("../../Photos/"));
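
To show what normalizing a relative URL means in practice, here is a small sketch that resolves a relative reference against a base URL. It is an illustration under assumptions, not the _normalize() method itself: the function name resolveUrl and its explicit base-URL parameter are hypothetical, and it covers only the common cases of RFC 3986 reference resolution (absolute URLs, protocol-relative URLs, absolute paths, and "./" and "../" segments). It assumes PHP 7 or later.

```php
<?php
// Hypothetical helper for illustration only; not the _normalize() method from this post.
function resolveUrl(string $base, string $relative): string
{
    // A reference that already has a scheme is absolute: return it unchanged.
    if (is_string(parse_url($relative, PHP_URL_SCHEME))) {
        return $relative;
    }

    $b = parse_url($base);
    if ($b === false || !isset($b['scheme'], $b['host'])) {
        throw new InvalidArgumentException("Base must be an absolute URL: $base");
    }
    $root = $b['scheme'] . '://' . $b['host'] . (isset($b['port']) ? ':' . $b['port'] : '');

    // Protocol-relative reference ("//host/path"): borrow only the scheme.
    if (strpos($relative, '//') === 0) {
        return $b['scheme'] . ':' . $relative;
    }

    if (strpos($relative, '/') === 0) {
        // Absolute path: replaces the base path entirely.
        $path = $relative;
    } else {
        // Relative path: append to the directory part of the base path.
        $basePath = $b['path'] ?? '/';
        $path = substr($basePath, 0, strrpos($basePath, '/') + 1) . $relative;
    }

    // Set any query string or fragment aside so dots inside it are left alone.
    $cut    = strcspn($path, '?#');
    $suffix = substr($path, $cut);
    $path   = substr($path, 0, $cut);

    // A trailing "." or ".." refers to a directory, so keep the closing slash.
    if (preg_match('#/\.\.?$#', $path)) {
        $path .= '/';
    }

    // Collapse "." and ".." segments without climbing above the root.
    $out = [];
    foreach (explode('/', $path) as $segment) {
        if ($segment === '..') {
            if (count($out) > 1) {
                array_pop($out);
            }
        } elseif ($segment !== '.') {
            $out[] = $segment;
        }
    }

    return $root . implode('/', $out) . $suffix;
}
```

For example, resolving "../../Photos/" against the made-up base https://www.example.com/blog/2008/post.html gives https://www.example.com/Photos/.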

The class is included after the jump.