From: Antonio Ospite
Date: Sat, 27 Jul 2019 20:06:15 +0000 (+0200)
Subject: src/Tweeper.php: enable cookie handling to fix scraping twitter.com
X-Git-Tag: v1.4.1~2
X-Git-Url: https://git.ao2.it/tweeper.git/commitdiff_plain/2efcaf768f68d35872c0d06136279e673128c46f?hp=4b5f17e2b3d0851f7f038e84cbcf9b5a97b0e9e9

src/Tweeper.php: enable cookie handling to fix scraping twitter.com

When the user agent used by a client matches an actual browser,
twitter.com enables content-security-policy and redirects the client on
the first request to make it reload the content.

After the redirection, the server assumes that the client handles cookies
appropriately; however, cURL does not do that by default.

Enable cookie handling in cURL to fix scraping twitter.com.

NOTE: the CURLOPT_COOKIEFILE option is set to an empty string to enable
in-memory handling of the cookies, removing the need for a temporary
file on the filesystem, see:
https://www.php.net/manual/en/function.curl-setopt.php
---

diff --git a/src/Tweeper.php b/src/Tweeper.php
index 09bd7cc..7ecbf2f 100644
--- a/src/Tweeper.php
+++ b/src/Tweeper.php
@@ -121,6 +121,7 @@ class Tweeper {
 			CURLOPT_CONNECTTIMEOUT => Tweeper::$maxConnectionTimeout,
 			// Follow http redirects to get the real URL.
 			CURLOPT_FOLLOWLOCATION => TRUE,
+			CURLOPT_COOKIEFILE => "",
 			CURLOPT_RETURNTRANSFER => TRUE,
 			CURLOPT_SSL_VERIFYHOST => FALSE,
 			CURLOPT_SSL_VERIFYPEER => FALSE,
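For reference, here is a minimal standalone sketch of the same technique, assuming PHP 8 with the curl extension loaded. The functions `tweeper_curl_options()` and `tweeper_fetch_with_cookies()` are hypothetical and are not part of Tweeper. Passing an empty string to CURLOPT_COOKIEFILE turns on the cURL cookie engine without reading any file. As a result, cookies set by the redirect response are sent back on the follow-up request made with the same handle.

```php
<?php
// Hypothetical helper (not part of Tweeper): the subset of options
// relevant to this change, with the cookie engine enabled in memory only.
function tweeper_curl_options(): array
{
    return [
        // Follow http redirects to get the real URL.
        CURLOPT_FOLLOWLOCATION => TRUE,
        // An empty file name enables cookie handling without loading
        // cookies from, or writing them to, a file on the filesystem.
        CURLOPT_COOKIEFILE => "",
        // Return the body from curl_exec() instead of printing it.
        CURLOPT_RETURNTRANSFER => TRUE,
    ];
}

// Hypothetical helper (not part of Tweeper): fetch $url so that cookies
// set by a redirect response are replayed on the redirected request.
function tweeper_fetch_with_cookies(string $url): string|false
{
    $ch = curl_init($url);
    if ($ch === false) {
        return false;
    }
    curl_setopt_array($ch, tweeper_curl_options());
    $body = curl_exec($ch);
    // In PHP 8 the handle, and its in-memory cookies, are released
    // once $ch goes out of scope; no explicit close is required.
    return $body;
}
```

Keeping the cookies in memory means they live only as long as the cURL handle, which suits a one-shot scraper. A persistent cookie jar would instead need CURLOPT_COOKIEJAR pointing at a file.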