Revert "src/Tweeper.php: fix rendering Instagram images in some feed readers" This reverts commit 6525c19868a0511abaaac9d2ba452ba640899209. The problem was probably not Instagram using Cross Origin Resource Policy but rather Liferea not parsing the image URLs correctly. So revert the change, since it broke images in other feed readers which do not support Data URLs, such as newsboat.
src/Tweeper.php: fix rendering Instagram images in some feed readers Instagram.com is using Cross Origin Resource Policy and this prevents images in RSS items from being displayed in the Web view of some feed readers like Liferea. Add a function to generate Data URLs with base64 payloads and use that for Instagram images as a workaround to fix rendering images in some feed readers.
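As an illustration of the Data URL approach described in this commit, here is a minimal sketch. The function name `toDataUrl` and its parameters are hypothetical and not taken from the Tweeper source; the sketch only shows the general `data:<mime>;base64,<payload>` shape.

```php
<?php
// Hypothetical helper, NOT the function from the commit: build a Data URL
// with a base64 payload from raw bytes and a MIME type.
function toDataUrl(string $bytes, string $mimeType): string
{
    return 'data:' . $mimeType . ';base64,' . base64_encode($bytes);
}

// In real use $bytes would be the downloaded image content; a short
// string is used here only to keep the example self-contained.
echo toDataUrl('hello', 'text/plain'), PHP_EOL;
// prints: data:text/plain;base64,aGVsbG8=
```

Embedding the payload this way avoids a cross-origin image request, but only helps in readers that accept Data URLs.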
Tweeper.php: fix Invalid Character Error when converting Instagram json to XML Converting Instagram json data to XML was failing with the following message: "PHP Fatal error: Uncaught DOMException: Invalid Character Error in /usr/share/php/Symfony/Component/Serializer/Encoder/XmlEncoder.php:445". This was caused by some items whose names start with a number, which resulted in invalid XML element names. Remove the items containing the problematic names from the json data before converting to XML. Also stop handling the "knobs" element, which does not seem to be there anymore.
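The filtering step described in this commit could look roughly like the sketch below. The helper name `dropDigitLeadingKeys` is hypothetical, and the choice to keep integer keys (list indexes produced by `json_decode`) is an assumption; the actual commit may select the problematic items differently.

```php
<?php
// Hypothetical helper: recursively drop string keys that start with a digit,
// because XML element names must not start with one.
function dropDigitLeadingKeys(array $data): array
{
    $clean = [];
    foreach ($data as $key => $value) {
        // Assumption: integer keys are list indexes and are left alone;
        // only string keys are treated as candidate element names.
        if (is_string($key) && preg_match('/^[0-9]/', $key)) {
            continue;
        }
        $clean[$key] = is_array($value) ? dropDigitLeadingKeys($value) : $value;
    }
    return $clean;
}

$json = '{"user":{"name":"a","3d_asset":{"x":1}},"items":[{"id":1,"2fa":true}]}';
$data = dropDigitLeadingKeys(json_decode($json, true));
// $data no longer contains the "3d_asset" and "2fa" entries.
```

Note that PHP turns purely numeric object keys such as "123" into integer array keys, so this sketch cannot tell them apart from list indexes.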
src/Tweeper.php: check http response code and return error for error codes Check http response code from curl and return error for codes greater than 400. In particular this covers the case of non-existing accounts on social media sites as the failure will propagate to the main function which will exit with a non-zero code.
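A minimal sketch of this kind of check follows. The names `isHttpError` and `fetchOrFail` are hypothetical, and the threshold of 400 and above is an assumption based on where the HTTP error classes begin; the commit text itself says "greater than 400".

```php
<?php
// Hypothetical helper: decide whether an HTTP status code counts as a failure.
// Assumption: 4xx and 5xx codes (400 and above) are treated as errors.
function isHttpError(int $code): bool
{
    return $code >= 400;
}

// Hypothetical wrapper: fetch a URL with cURL and fail on transport
// errors or HTTP error codes, so the caller can exit with a non-zero code.
function fetchOrFail(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }
    $code = (int) curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
    if (isHttpError($code)) {
        throw new RuntimeException("HTTP error $code for $url");
    }
    return $body;
}
```

A caller would wrap `fetchOrFail()` in a try/catch and turn the exception into a non-zero exit status.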
src/Tweeper.php: set User-Agent to impersonate a Google crawler Set the User-Agent to impersonate a Google crawler; this makes twitter.com return the old desktop UI, which can be scraped more easily. This restores support for twitter.com, which had stopped serving the mobile UI that was still scrapeable to some extent.
src/Tweeper.php: only override the User-Agent to a mobile one for twitter.com Using a mobile User-Agent made it possible to scrape twitter.com again, but it also had a side effect: it was forcing facebook.com to serve the mobile version too. However, tweeper expected the desktop version of facebook.com, so this was breaking support for facebook.com. Scraping the mobile version of facebook.com would be inconvenient because the xsl would have to be rewritten extensively, and also the date of posts is not readily available as a timestamp in the mobile version. So override the User-Agent for twitter.com only; this makes the code a little uglier but it works well enough for now.
src/Tweeper.php: allow overriding the User-Agent in cURL requests Allow overriding the User-Agent in cURL requests, to make it possible to use different user agents for different requests. This can be useful to have finer control over the version of the site served by the different supported services.
Fix style issues pointed out by PHP_CodeSniffer Fix the following errors from PHP_CodeSniffer with the help of phpcbf:

FILE: /home/ao2/Proj/Tweeper/tweeper/tweeper.php
----------------------------------------------------------------------
FOUND 3 ERRORS AFFECTING 3 LINES
----------------------------------------------------------------------
  54 | ERROR | [x] Short array syntax must be used to define arrays
  65 | ERROR | [x] Short array syntax must be used to define arrays
 124 | ERROR | [x] Short array syntax must be used to define arrays
----------------------------------------------------------------------
PHPCBF CAN FIX THE 3 MARKED SNIFF VIOLATIONS AUTOMATICALLY
----------------------------------------------------------------------

FILE: /home/ao2/Proj/Tweeper/tweeper/src/Tweeper.php
----------------------------------------------------------------------
FOUND 15 ERRORS AFFECTING 12 LINES
----------------------------------------------------------------------
 162 | ERROR | [x] Short array syntax must be used to define arrays
 169 | ERROR | [x] Short array syntax must be used to define arrays
 183 | ERROR | [x] Short array syntax must be used to define arrays
 212 | ERROR | [x] Short array syntax must be used to define arrays
 313 | ERROR | [x] Short array syntax must be used to define arrays
 313 | ERROR | [x] Short array syntax must be used to define arrays
 315 | ERROR | [x] Short array syntax must be used to define arrays
 378 | ERROR | [x] Short array syntax must be used to define arrays
 378 | ERROR | [x] Short array syntax must be used to define arrays
 437 | ERROR | [x] Short array syntax must be used to define arrays
 466 | ERROR | [x] Short array syntax must be used to define arrays
 466 | ERROR | [x] Short array syntax must be used to define arrays
----------------------------------------------------------------------
PHPCBF CAN FIX THE 12 MARKED SNIFF VIOLATIONS AUTOMATICALLY
----------------------------------------------------------------------

Time: 273ms; Memory: 10MB
Add back partial support for twitter.com using the old twitter mobile UI On June 1st 2020 twitter.com completely disabled serving the legacy UI, which tweeper kept supporting using a User-Agent trick. The new official UI retrieves json after authenticating with cookies and generates the HTML client-side, so it's too complicated for the current Tweeper structure. Work around the issue with the help of another User-Agent trick: pretend to be an old Android phone, which makes twitter.com serve the old mobile UI, which can be easily scraped by tweeper. This approach loses support for some functionality like embedded media, but at least it makes Tweeper work again with twitter.com.
Add option to enable or disable showing verbose output Tweeper by default shows non-fatal errors and warnings from the php XML parser. These messages can be distracting for some users, so add a '-v' option to enable or disable the verbose output. Keep the current behavior of showing verbose output as the default for backwards compatibility; the user can pass '-v 0' to silence it.
src/Tweeper.php: do not disable CURLOPT_SSL_VERIFYHOST and CURLOPT_SSL_VERIFYPEER Do not disable CURLOPT_SSL_VERIFYHOST and CURLOPT_SSL_VERIFYPEER, so that certificate verification is actually enforced on TLS connections. Disabling them was a relic of some early experimental code and should not have made it to the stable release. Moreover, the value passed to CURLOPT_SSL_VERIFYHOST was also of the wrong type: it should have been an integer rather than a boolean.