
Why not use curl instead? #4

Open
ccazette opened this issue Jul 27, 2011 · 5 comments

Comments

@ccazette

A simple change: cURL is a lot more compatible, with far fewer issues than file_get_contents() (allow_url_fopen must be enabled, plus other problems fetching remote content via file_get_contents() on my production server that I couldn't even sort out).

Just a suggestion, as I made the change on my own and things work fine for me now.

/**
 * Fetches a URI and parses it for Open Graph data; returns
 * false on error.
 *
 * @param $URI URI of the page to parse for Open Graph data
 * @return OpenGraph|false
 */
static public function fetch($URI) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $URI);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $contents = curl_exec($ch);
    curl_close($ch);
    if ($contents === false) {
        return false; // network or cURL error
    }
    return self::_parse($contents);
}

@tmaiaroto

I've used cURL in my fork, if you'd like to take a look.

@Argonalyst

Well, actually, I don't know why, but a lot of websites won't let you fetch their contents using cURL. I tried this cURL function to get the contents of http://nytimes.com, for example, and I just can't; with the regular file_get_contents() I was successful in retrieving the Open Graph data. cURL may be faster, but in my experience file_get_contents() currently handles a greater range of websites.

@tmaiaroto

That's interesting. I wonder if there are any options that could be passed to cURL to change that. Maybe those sites try to prevent scraping, so passing a user agent (for example) with the request might help.

@MitchellMcKenna
Contributor

In pull request #8, with the cURL options I have set, I've found I'm actually getting more results back with cURL than with file_get_contents(). nytimes.com worked fine. I did notice some websites required a user agent to be set; they didn't seem to care what it was set to, just as long as it was set, so I set it to $_SERVER['HTTP_USER_AGENT'].
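The exact options from that pull request aren't shown in this thread, but as a rough sketch (the function name and option values here are my own assumptions, not taken from PR #8), a reusable option set along those lines might look like:

```php
<?php
// Sketch only: build a cURL option array like the one described above,
// so it can be inspected and reused. Values are illustrative defaults.
function og_curl_options($userAgent) {
    return array(
        CURLOPT_RETURNTRANSFER => true,        // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,        // follow 301/302 redirects
        CURLOPT_TIMEOUT        => 15,          // don't hang forever on slow hosts
        CURLOPT_USERAGENT      => $userAgent,  // some sites refuse UA-less requests
    );
}

// Forward the visitor's own user agent when available, else fall back.
$ua   = isset($_SERVER['HTTP_USER_AGENT'])
    ? $_SERVER['HTTP_USER_AGENT']
    : 'Mozilla/5.0 (compatible; OpenGraphFetcher/1.0)';
$opts = og_curl_options($ua);
// curl_setopt_array($ch, $opts); then curl_exec($ch) as usual.
```

Using curl_setopt_array() keeps the option set in one place, which makes it easier to tweak per-site quirks like the user-agent requirement mentioned above.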

@feelsickened

Hi guys,
I'm not the most advanced user, but this OpenGraph script works a treat for all the sites I'm working with except nytimes.com. I've migrated from the version that used file_get_contents() over to cURL, and have even tried setting my own user agent, for example:

$user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36';
curl_setopt($curl, CURLOPT_USERAGENT, $user_agent);

Result is always:
array(1) { [0]=> string(5) "title" } NULL title => Log In - The New York Times

What workarounds/tricks have resolved this for you?
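In case it helps: the "Log In - The New York Times" title suggests the request is being redirected to a login page, possibly because cookies set along the redirect chain are being dropped. A hedged sketch of one thing to try (the function name is mine, and I haven't verified this against nytimes.com):

```php
<?php
// Sketch only: options that follow redirects while persisting cookies
// across them, which login-wall redirects sometimes require.
function og_cookie_options($cookieJar, $userAgent) {
    return array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,        // follow the redirect chain
        CURLOPT_MAXREDIRS      => 5,           // but not forever
        CURLOPT_COOKIEJAR      => $cookieJar,  // write cookies after the transfer
        CURLOPT_COOKIEFILE     => $cookieJar,  // send them back on redirects
        CURLOPT_USERAGENT      => $userAgent,
    );
}

$jar  = tempnam(sys_get_temp_dir(), 'ogcookies');
$opts = og_cookie_options($jar, 'Mozilla/5.0 (compatible; OpenGraphFetcher/1.0)');
// $ch = curl_init('http://nytimes.com');
// curl_setopt_array($ch, $opts);
// $html = curl_exec($ch);
```

If the site still serves a login page, it may simply require a logged-in session, in which case no cURL option will get the public Open Graph tags.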
