
Why not use curl instead? #4

Open
ccazette opened this issue Jul 27, 2011 · 5 comments

Comments

@ccazette

A simple change: cURL is a lot more compatible, with far fewer issues than file_get_contents() (allow_url_fopen must be enabled, plus other problems fetching remote content via file_get_contents() on my production server that I couldn't even sort out).

Just a suggestion, as I made the change on my own and things work fine for me now.

/**
 * Fetches a URI and parses it for Open Graph data; returns
 * false on error.
 *
 * @param $URI URI of the page to parse for Open Graph data
 * @return OpenGraph|false
 */
static public function fetch($URI) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $URI);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $contents = curl_exec($ch);
    curl_close($ch);
    if ($contents === false) {
        return false; // network or cURL error
    }
    return self::_parse($contents);
}

@tmaiaroto

I've used cURL in my fork, if you'd like to take a look.

@Argonalyst

Well, actually, I don't know why, but a lot of websites won't let you fetch their contents using cURL. I tried this cURL function to get the contents of http://nytimes.com, for example, and I just can't; with the regular file_get_contents() I was successful in retrieving the Open Graph data. cURL may be faster, but in my experience file_get_contents() currently handles a greater range of websites.

@tmaiaroto

That's interesting. I wonder if there are any options that could be passed to cURL to change that. Maybe those sites try to prevent scraping, so passing a user agent (for example) with the request might help.

@MitchellMcKenna
Contributor

In pull request #8, with the cURL options I have set, I've found I'm actually getting more results back with cURL than with file_get_contents(). nytimes.com worked fine. I did notice some websites required a user agent to be set; they didn't seem to care what it was set to, just as long as it was set, so I set it to $_SERVER['HTTP_USER_AGENT'].
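The exact options from that pull request aren't shown in this thread, but as a rough sketch (the function name and option values here are my own assumptions, not taken from PR #8), a reusable option set along those lines might look like:

```php
<?php
// Sketch only: build a cURL option array like the one described above,
// so it can be inspected and reused. Values are illustrative defaults.
function og_curl_options($userAgent) {
    return array(
        CURLOPT_RETURNTRANSFER => true,        // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,        // follow 301/302 redirects
        CURLOPT_TIMEOUT        => 15,          // don't hang forever on slow hosts
        CURLOPT_USERAGENT      => $userAgent,  // some sites refuse UA-less requests
    );
}

// Forward the visitor's own user agent when available, else fall back.
$ua   = isset($_SERVER['HTTP_USER_AGENT'])
    ? $_SERVER['HTTP_USER_AGENT']
    : 'Mozilla/5.0 (compatible; OpenGraphFetcher/1.0)';
$opts = og_curl_options($ua);
// curl_setopt_array($ch, $opts); then curl_exec($ch) as usual.
```

Using curl_setopt_array() keeps the option set in one place, which makes it easier to tweak per-site quirks like the user-agent requirement mentioned above.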

@feelsickened

Hi guys,
I'm not the most advanced user, but this OpenGraph script works a treat for all the sites I'm working with except nytimes.com. I've migrated from the version that used file_get_contents() over to cURL, and have even tried setting my own user agent, for example:

$user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36';
curl_setopt($curl, CURLOPT_USERAGENT, $user_agent);

Result is always:
array(1) { [0]=> string(5) "title" } NULL title => Log In - The New York Times

What workarounds/tricks have resolved this for you?
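In case it helps: the "Log In - The New York Times" title suggests the request is being redirected to a login page, possibly because cookies set along the redirect chain are being dropped. A hedged sketch of one thing to try (the function name is mine, and I haven't verified this against nytimes.com):

```php
<?php
// Sketch only: options that follow redirects while persisting cookies
// across them, which login-wall redirects sometimes require.
function og_cookie_options($cookieJar, $userAgent) {
    return array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,        // follow the redirect chain
        CURLOPT_MAXREDIRS      => 5,           // but not forever
        CURLOPT_COOKIEJAR      => $cookieJar,  // write cookies after the transfer
        CURLOPT_COOKIEFILE     => $cookieJar,  // send them back on redirects
        CURLOPT_USERAGENT      => $userAgent,
    );
}

$jar  = tempnam(sys_get_temp_dir(), 'ogcookies');
$opts = og_cookie_options($jar, 'Mozilla/5.0 (compatible; OpenGraphFetcher/1.0)');
// $ch = curl_init('http://nytimes.com');
// curl_setopt_array($ch, $opts);
// $html = curl_exec($ch);
```

If the site still serves a login page, it may simply require a logged-in session, in which case no cURL option will get the public Open Graph tags.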
