From: Antonio Ospite <ao2@ao2.it>
Date: Sun, 9 Feb 2020 22:31:43 +0000 (+0100)
Subject: src/Tweeper.php: use a minimal User-Agent string to fix scraping twitter.com
X-Git-Tag: v1.4.2~10
X-Git-Url: https://git.ao2.it/tweeper.git/commitdiff_plain/da4568f5a2d24e0933d44b16b5ef180095c42dab?ds=inline

src/Tweeper.php: use a minimal User-Agent string to fix scraping twitter.com

Twitter.com has started serving the user timeline via json when the user
agent is a modern browser, this breaks scraping in Tweeper which expects
html content.

Remove any version info from the User-Agent header used by Tweeper to
make twitter.com think it is talking with a very old browser, tricking
it into serving html content.

NOTE: Tweeper cannot just use the default User-Agent from the CURL
library because this would break scraping Facebook.com; using a minimal
but still browser-like User-Agent seems to be a viable common
denominator for all sites currently supported by Tweeper.
---

diff --git a/src/Tweeper.php b/src/Tweeper.php
index 877e882..aedde4d 100644
--- a/src/Tweeper.php
+++ b/src/Tweeper.php
@@ -36,7 +36,7 @@ date_default_timezone_set('UTC');
  */
 class Tweeper {
 
-  private static $userAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:64.0) Gecko/20100101 Firefox/64.0";
+  private static $userAgent = "Mozilla/5.0";
   private static $maxConnectionTimeout = 5;
   private static $maxConnectionRetries = 5;