Antonio Ospite [Sun, 27 Dec 2020 17:07:53 +0000 (18:07 +0100)]
debian/changelog: release package version 1.4.3-1
Gbp-Dch: ignore
Antonio Ospite [Sun, 27 Dec 2020 17:13:04 +0000 (18:13 +0100)]
debian/control: bump Standards-Version to 4.5.1
Antonio Ospite [Sun, 27 Dec 2020 17:05:34 +0000 (18:05 +0100)]
Merge tag 'v1.4.3' into debian/master
Release v1.4.3
Antonio Ospite [Sun, 27 Dec 2020 16:59:54 +0000 (17:59 +0100)]
NEWS: add release notes for the v1.4.3 release
Antonio Ospite [Sun, 27 Dec 2020 16:13:50 +0000 (17:13 +0100)]
src/Tweeper.php: stop and return failure when Instagram.com redirects to login page
Instagram redirects to the login page when too many consecutive
connections have been made from the same IP; detect that case, stop
processing, and return a failure.
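A minimal sketch of such a check, written in Python for illustration only (Tweeper itself is PHP). The function name and the login path prefix are assumptions, not taken from the Tweeper sources:

```python
from urllib.parse import urlparse

# Assumed path prefix of the login page; the real check in Tweeper may differ.
LOGIN_PATH_PREFIX = "/accounts/login"


def redirected_to_login(requested_url: str, effective_url: str) -> bool:
    """Return True when a request ended up on the login page unexpectedly."""
    requested = urlparse(requested_url)
    effective = urlparse(effective_url)
    if requested.path.startswith(LOGIN_PATH_PREFIX):
        # The caller asked for the login page on purpose.
        return False
    return effective.path.startswith(LOGIN_PATH_PREFIX)
```

A caller would compare the URL it requested with the final URL reported by the HTTP client after following redirects, and give up when the function returns True.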
Antonio Ospite [Thu, 24 Dec 2020 09:10:55 +0000 (10:10 +0100)]
src/Tweeper.php: check http response code and return error for error codes
Check http response code from curl and return error for codes greater
than 400.
In particular this covers the case of non-existing accounts on social
media sites as the failure will propagate to the main function which
will exit with a non-zero code.
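The idea can be sketched as follows, in Python for illustration only. The helper names are hypothetical, and treating 400 itself as an error is an assumption of this sketch:

```python
from typing import Optional


def is_http_error(status_code: int) -> bool:
    """Treat client and server error codes as failures (assumed threshold: 400)."""
    return status_code >= 400


def fetch_result(status_code: int, body: str) -> Optional[str]:
    """Return the body on success, or None so that callers can propagate the failure."""
    if is_http_error(status_code):
        return None
    return body
```

Returning None instead of raising lets each caller decide how to report the failure, which is how the error can reach the main function and become a non-zero exit code.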
Antonio Ospite [Thu, 24 Dec 2020 09:04:59 +0000 (10:04 +0100)]
src/Tweeper.php: set User-Agent to impersonate a Google crawler
Set User-Agent to impersonate a Google crawler, this makes twitter.com
return the old desktop UI which can be more easily scraped.
This brings back support for twitter.com which has stopped
serving the mobile UI which was still scrapeable somehow.
Antonio Ospite [Fri, 18 Dec 2020 21:30:16 +0000 (22:30 +0100)]
Revert "Add back partial support for twitter.com using the old twitter mobile UI"
This reverts commit af103c976dd4992d79e9d9a71837aecff30d6e9c.
Antonio Ospite [Fri, 18 Dec 2020 21:29:59 +0000 (22:29 +0100)]
Revert "src/Tweeper.php: only override the User-Agent to a mobile one for twitter.com"
This reverts commit b922824bc561f7f3e31c6f9962d96e9084497ced.
Antonio Ospite [Wed, 10 Jun 2020 21:18:51 +0000 (23:18 +0200)]
debian/changelog: release package version 1.4.2-1
Gbp-Dch: ignore
Antonio Ospite [Fri, 12 Jun 2020 21:30:29 +0000 (23:30 +0200)]
debian/control: add Rules-Requires-Root field
Add Rules-Requires-Root as suggested by lintian:
P: tweeper source: rules-requires-root-missing
Antonio Ospite [Wed, 10 Jun 2020 22:01:48 +0000 (00:01 +0200)]
debian/control: increase debhelper compatibility version to 13
Increase debhelper compatibility version to 13 as pointed out by
lintian:
P: tweeper source: package-uses-old-debhelper-compat-version 12
Antonio Ospite [Wed, 10 Jun 2020 21:34:39 +0000 (23:34 +0200)]
debian/copyright: update copyright years
Antonio Ospite [Wed, 10 Jun 2020 21:29:38 +0000 (23:29 +0200)]
debian/control: bump Standards-Version to 4.5.0
Antonio Ospite [Thu, 11 Jun 2020 22:04:52 +0000 (00:04 +0200)]
README: fix license so that 'licensecheck' determines the right one
Antonio Ospite [Wed, 10 Jun 2020 21:17:19 +0000 (23:17 +0200)]
Merge tag 'v1.4.2' into debian/master
Release v1.4.2
Antonio Ospite [Wed, 10 Jun 2020 20:39:54 +0000 (22:39 +0200)]
NEWS: add release notes for the v1.4.2 release
Antonio Ospite [Wed, 10 Jun 2020 20:38:05 +0000 (22:38 +0200)]
NEWS: fix indentation for some entries
Antonio Ospite [Tue, 9 Jun 2020 22:28:54 +0000 (00:28 +0200)]
src/Tweeper.php: only override the User-Agent to a mobile one for twitter.com
Using a mobile User-Agent made it possible to scrape twitter.com again
but it also had side effects: it was forcing facebook.com to serve the
mobile version too.
However tweeper expected the desktop version of facebook.com so this was
breaking support for facebook.com
Scraping the mobile version of facebook.com would be inconvenient
because the xsl would have to be rewritten extensively, and also the
date of posts is not readily available as a timestamp in the mobile
version.
So override the User-Agent for twitter.com only, this makes the code
a little uglier but it works well enough for now.
Antonio Ospite [Tue, 9 Jun 2020 22:27:35 +0000 (00:27 +0200)]
src/Tweeper.php: allow overriding the User-Agent in cURL requests
Allow overriding the User-Agent in cURL requests, to make it possible to
use different user agents for different requests.
This can be useful to have a finer control on the version of the site
served by the different supported services.
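A rough illustration of a per-request override, in Python rather than the PHP cURL code of Tweeper. The function name and the default User-Agent string are placeholders, not the values used by Tweeper:

```python
from typing import Optional
from urllib.request import Request

# Placeholder default; the actual string used by Tweeper is not shown in this log.
DEFAULT_USER_AGENT = "Mozilla/5.0"


def build_request(url: str, user_agent: Optional[str] = None) -> Request:
    """Build a request, letting the caller override the default User-Agent."""
    headers = {"User-Agent": user_agent or DEFAULT_USER_AGENT}
    return Request(url, headers=headers)
```

Keeping the override as an optional argument means existing callers keep the default, while a single service can ask for a different value.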
Antonio Ospite [Tue, 9 Jun 2020 22:11:12 +0000 (00:11 +0200)]
src/Tweeper.php: use file_get_contents to retrieve the local stylesheet
Using Tweeper::getUrlContents(), which uses cURL, is really overkill to
get local file contents, keep things simple and use file_get_contents.
Antonio Ospite [Mon, 8 Jun 2020 21:58:50 +0000 (23:58 +0200)]
Fix style issues pointed out by PHP_CodeSniffer
Fix the following errors from PHP_CodeSniffer with the help of phpcbf:
FILE: /home/ao2/Proj/Tweeper/tweeper/tweeper.php
----------------------------------------------------------------------
FOUND 3 ERRORS AFFECTING 3 LINES
----------------------------------------------------------------------
54 | ERROR | [x] Short array syntax must be used to define arrays
65 | ERROR | [x] Short array syntax must be used to define arrays
124 | ERROR | [x] Short array syntax must be used to define arrays
----------------------------------------------------------------------
PHPCBF CAN FIX THE 3 MARKED SNIFF VIOLATIONS AUTOMATICALLY
----------------------------------------------------------------------
FILE: /home/ao2/Proj/Tweeper/tweeper/src/Tweeper.php
----------------------------------------------------------------------
FOUND 15 ERRORS AFFECTING 12 LINES
----------------------------------------------------------------------
162 | ERROR | [x] Short array syntax must be used to define arrays
169 | ERROR | [x] Short array syntax must be used to define arrays
183 | ERROR | [x] Short array syntax must be used to define arrays
212 | ERROR | [x] Short array syntax must be used to define arrays
313 | ERROR | [x] Short array syntax must be used to define arrays
313 | ERROR | [x] Short array syntax must be used to define arrays
315 | ERROR | [x] Short array syntax must be used to define arrays
378 | ERROR | [x] Short array syntax must be used to define arrays
378 | ERROR | [x] Short array syntax must be used to define arrays
437 | ERROR | [x] Short array syntax must be used to define arrays
466 | ERROR | [x] Short array syntax must be used to define arrays
466 | ERROR | [x] Short array syntax must be used to define arrays
----------------------------------------------------------------------
PHPCBF CAN FIX THE 12 MARKED SNIFF VIOLATIONS AUTOMATICALLY
----------------------------------------------------------------------
Time: 273ms; Memory: 10MB
Antonio Ospite [Mon, 8 Jun 2020 21:55:06 +0000 (23:55 +0200)]
Update copyright years in recently modified files
Antonio Ospite [Mon, 8 Jun 2020 21:49:15 +0000 (23:49 +0200)]
Add back partial support for twitter.com using the old twitter mobile UI
On June 1st 2020 twitter.com completely disabled serving the legacy UI
which tweeper kept supporting using a User-Agent trick.
The new official UI retrieves json after authenticating with
cookies and generates the HTML client-side, so it's too complicated for
the current Tweeper structure.
Work around the issue with the help of another User-Agent trick: pretend
to be an old Android phone, which makes twitter.com serve the old mobile UI
which can be easily scraped by tweeper.
This approach loses support for some functionalities like embedded
media but at least makes Tweeper work again with twitter.com
Antonio Ospite [Mon, 8 Jun 2020 21:32:00 +0000 (23:32 +0200)]
Add option to enable or disable showing verbose output
Tweeper by default shows non-fatal errors and warnings from the php XML
parser.
These messages can be distracting for some users, so add a '-v' option
to enable or disable the verbose output.
Keep the current behavior of showing verbose output as the default one
for backwards compatibility, the user can pass '-v 0' to silence it.
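The option semantics described above, sketched with Python's argparse for illustration only. The program name and help text are made up; only the '-v' flag and its default come from the message:

```python
import argparse


def parse_args(argv):
    """Parse a '-v' option that takes 0 or 1 and defaults to verbose output."""
    parser = argparse.ArgumentParser(prog="tweeper-sketch")
    parser.add_argument(
        "-v",
        dest="verbose",
        type=int,
        choices=[0, 1],
        default=1,  # verbose stays the default for backwards compatibility
        help="enable (1) or disable (0) verbose output",
    )
    return parser.parse_args(argv)
```

With this shape, running without '-v' behaves as before, and '-v 0' silences the parser warnings.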
Antonio Ospite [Wed, 3 Jun 2020 20:15:49 +0000 (22:15 +0200)]
src/Tweeper.php: do not disable CURLOPT_SSL_VERIFYHOST and CURLOPT_SSL_VERIFYPEER
Do not disable CURLOPT_SSL_VERIFYHOST and CURLOPT_SSL_VERIFYPEER to
actually enforce certificate verification on TLS connections.
This was a relic of some early experimental code and should not have
made it to the stable release.
Moreover the value passed to CURLOPT_SSL_VERIFYHOST was also of the
wrong type, it should have been an integer rather than a boolean.
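For comparison, the equivalent of keeping both checks enabled in Python's standard library looks roughly like this. It is an analogy under the assumption that peer verification maps to verify_mode and host verification maps to check_hostname; it is not the cURL code touched by this commit:

```python
import ssl


def make_tls_context() -> ssl.SSLContext:
    """Create a TLS context that verifies both the certificate chain and the host name."""
    context = ssl.create_default_context()
    # These are already the defaults; they are set explicitly to document the intent.
    context.check_hostname = True
    context.verify_mode = ssl.CERT_REQUIRED
    return context
```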
Antonio Ospite [Sun, 9 Feb 2020 22:31:43 +0000 (23:31 +0100)]
src/Tweeper.php: use a minimal User-Agent string to fix scraping twitter.com
Twitter.com has started serving the user timeline via json when the user
agent is a modern browser, this breaks scraping in Tweeper which expects
html content.
Remove any version info from the User-Agent header used by Tweeper to
make twitter.com think it is talking with a very old browser, tricking
it into serving html content.
NOTE: Tweeper cannot just use the default User-Agent from the CURL
library because this would break scraping Facebook.com; using a minimal
but still browser-like User-Agent seems to be a viable common
denominator for all sites currently supported by Tweeper.
Antonio Ospite [Sat, 7 Sep 2019 20:12:41 +0000 (22:12 +0200)]
debian/changelog: release package version 1.4.1-1
Gbp-Dch: ignore
Antonio Ospite [Sat, 7 Sep 2019 20:30:56 +0000 (22:30 +0200)]
debian/copyright: update copyright years
Antonio Ospite [Sat, 7 Sep 2019 20:27:07 +0000 (22:27 +0200)]
debian/control: remove unnecessary versioned dependency on pkg-php-tools
Remove an unnecessary versioned dependency on pkg-php-tools, as
suggested by the following cme warning:
-----------------------------------------------------------------------
Warning in 'control source Build-Depends:1': unnecessary greater-than versioned dependency: pkg-php-tools (>= 1.7~). Debian has oldoldstable -> 1.28; oldstable -> 1.35; stable -> 1.37; unstable -> 1.37; testing -> 1.37;
Offending value: 'pkg-php-tools (>= 1.7~)'
-----------------------------------------------------------------------
Antonio Ospite [Sat, 7 Sep 2019 20:25:45 +0000 (22:25 +0200)]
debian/control: bump Standards-Version to 4.4.0
Bump Standards-Version to 4.4.0 as suggested by lintian:
I: tweeper source: out-of-date-standards-version 4.2.1 (released 2018-08-25) (current is 4.4.0)
Antonio Ospite [Sat, 7 Sep 2019 20:20:11 +0000 (22:20 +0200)]
debian/{compat,control}: increase debhelper compatibility version to 12
Increase debhelper compatibility version to 12 as pointed out by
lintian:
P: tweeper source: package-uses-old-debhelper-compat-version 11
Also depend on debhelper-compat instead of debhelper and drop the
debian/compat file, as suggested by the following cme warnings:
-----------------------------------------------------------------------
Warning in 'control source Build-Depends:0': compat parameter is deprecated. Please use debhelper-compat dependency. See debhelper(7) for details.
Offending value: 'debhelper (= 12)'
Warning in 'control source Build-Depends:0': debhelper dependency is deprecated. It should be a dependency for debhelper-compat package
Offending value: 'debhelper (= 12)'
-----------------------------------------------------------------------
Antonio Ospite [Sat, 7 Sep 2019 20:10:00 +0000 (22:10 +0200)]
Merge tag 'v1.4.1' into debian/master
Release v1.4.1
Antonio Ospite [Sat, 7 Sep 2019 19:57:34 +0000 (21:57 +0200)]
NEWS: add release notes for the v1.4.1 release
Antonio Ospite [Fri, 6 Sep 2019 21:19:55 +0000 (23:19 +0200)]
src/Tweeper.php: bump version in the User-Agent string
By using a more recent version of the User-Agent, twitter.com will
return entries in the result when visiting hashtag pages.
This fixes scraping hashtag pages.
This change is similar to what was done in commit 45060bb (Tweeper.php:
bump version in the User-Agent string, 2018-08-13)
Antonio Ospite [Sat, 27 Jul 2019 20:06:15 +0000 (22:06 +0200)]
src/Tweeper.php: enable cookie handling to fix scraping twitter.com
When the user agent used by a client matches an actual browser,
twitter.com enables content-security-policy and redirects the client on
the first request to make it reload the content.
After the redirection, the server assumes that the client sets cookies
appropriately, however cURL does not do that by default.
Enable cookie handling in cURL to fix scraping twitter.com.
NOTE: the CURLOPT_COOKIEFILE option is set to an empty string to enable
in-memory handling of the cookies, removing the need for a temporary
file on the filesystem, see:
https://www.php.net/manual/en/function.curl-setopt.php
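The same idea of keeping cookies only in memory, sketched with Python's standard library for illustration. The function name is hypothetical and this is an analogy to the cURL option, not the Tweeper code:

```python
from http.cookiejar import CookieJar
from urllib.request import HTTPCookieProcessor, build_opener


def make_cookie_opener():
    """Return an opener that stores cookies in memory and sends them back on later requests."""
    jar = CookieJar()  # in-memory only, nothing is written to the filesystem
    opener = build_opener(HTTPCookieProcessor(jar))
    return opener, jar
```

Cookies set by the first response are kept in the jar, so the request that follows a redirect carries them without any temporary file.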
Antonio Ospite [Sat, 17 Nov 2018 22:39:58 +0000 (23:39 +0100)]
debian/changelog: release package version 1.4.0-1
Gbp-Dch: ignore
Antonio Ospite [Fri, 16 Nov 2018 22:32:30 +0000 (23:32 +0100)]
debian/control: bump Standards-Version to 4.2.1
Suggested by lintian:
I: tweeper source: out-of-date-standards-version 4.1.4 (released 2018-04-05) (current is 4.2.1)
Antonio Ospite [Fri, 16 Nov 2018 22:21:33 +0000 (23:21 +0100)]
Merge tag 'v1.4.0' into debian/master
Release v1.4.0
Antonio Ospite [Fri, 16 Nov 2018 22:16:07 +0000 (23:16 +0100)]
NEWS: add release notes for the v1.4.0 release
Antonio Ospite [Fri, 16 Nov 2018 22:06:38 +0000 (23:06 +0100)]
src/Tweeper.php: make enclosure validate when there is no Content-Length
When the server does not provide a Content-Length header, curl_getinfo()
would return a negative value for "download_content_length".
However RSS recommends using 0 when the enclosure's size cannot be
determined.
See: https://www.feedvalidator.org/docs/error/UseZeroForUnknown.html
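The normalization amounts to something like the following Python sketch. The function name is hypothetical; the negative input stands for the value reported when the size is unknown:

```python
def enclosure_length(download_content_length: float) -> int:
    """Map an unknown (negative) content length to 0, as recommended for RSS enclosures."""
    if download_content_length < 0:
        return 0
    return int(download_content_length)
```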
Antonio Ospite [Fri, 16 Nov 2018 17:27:16 +0000 (18:27 +0100)]
src/rss_converter_dilbert.com.xsl: fix generating enclosures
Enclosures were not generated for Dilbert.com because the URLs of the
pictures are protocol-relative and curl cannot work with these URLs.
Fix the URLs by prepending a protocol scheme to them.
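The actual fix lives in the XSL stylesheet; the transformation itself can be illustrated in Python as below. The function name, the example host, and the choice of "https" as the scheme are assumptions of this sketch:

```python
def absolutize(url: str, scheme: str = "https") -> str:
    """Prepend a scheme to a protocol-relative URL, leaving other URLs untouched."""
    if url.startswith("//"):
        return f"{scheme}:{url}"
    return url
```

For example, `absolutize("//assets.example.com/strip.png")` yields a URL that an HTTP client can fetch directly.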
Antonio Ospite [Fri, 16 Nov 2018 10:50:12 +0000 (11:50 +0100)]
Add option to enable or disable showing multimedia content in RSS items
Tweeper by default shows multimedia contents like Twitter and Instagram
images in items descriptions.
However sometimes just having multimedia contents in the <enclosure/>
element may be enough, so make it optional to also have the content in
the item description.
Keep the current default behavior for backwards compatibility.
Antonio Ospite [Wed, 14 Nov 2018 16:24:30 +0000 (17:24 +0100)]
Fix PHP_CodeSniffer errors
Fix the following errors reported by PHP_CodeSniffer:
FILE: .../tweeper.php
-----------------------------------------------------------------------------
FOUND 1 ERROR AFFECTING 1 LINE
-----------------------------------------------------------------------------
1 | ERROR | [x] The PHP open tag must be followed by exactly one blank line
-----------------------------------------------------------------------------
PHPCBF CAN FIX THE 1 MARKED SNIFF VIOLATIONS AUTOMATICALLY
-----------------------------------------------------------------------------
FILE: .../src/Tweeper.php
-----------------------------------------------------------------------------------------------------------------------------------
FOUND 6 ERRORS AFFECTING 2 LINES
-----------------------------------------------------------------------------------------------------------------------------------
373 | ERROR | [x] Incorrect spacing between argument "$host" and equals sign; expected 1 but found 0
373 | ERROR | [x] Incorrect spacing between default value and equals sign for argument "$host"; expected 1 but found 0
373 | ERROR | [x] Incorrect spacing between argument "$validate_scheme" and equals sign; expected 1 but found 0
373 | ERROR | [x] Incorrect spacing between default value and equals sign for argument "$validate_scheme"; expected 1 but found 0
388 | ERROR | [x] Inline comments must start with a capital letter
388 | ERROR | [x] Inline comments must end in full-stops, exclamation marks, colons, question marks, or closing parentheses
-----------------------------------------------------------------------------------------------------------------------------------
PHPCBF CAN FIX THE 6 MARKED SNIFF VIOLATIONS AUTOMATICALLY
-----------------------------------------------------------------------------------------------------------------------------------
FILE: .../autoload.php
-----------------------------------------------------------------------------
FOUND 1 ERROR AFFECTING 1 LINE
-----------------------------------------------------------------------------
1 | ERROR | [x] The PHP open tag must be followed by exactly one blank line
-----------------------------------------------------------------------------
PHPCBF CAN FIX THE 1 MARKED SNIFF VIOLATIONS AUTOMATICALLY
-----------------------------------------------------------------------------
Time: 260ms; Memory: 10Mb
Antonio Ospite [Wed, 14 Nov 2018 16:18:12 +0000 (17:18 +0100)]
TODO: remove the item about trigger_error, the concern has been addressed
Tweeper stopped using E_USER_ERROR and survives after trigger_error()
calls.
Antonio Ospite [Wed, 14 Nov 2018 16:03:06 +0000 (17:03 +0100)]
src/Tweeper.php: add a retry mechanism for cURL sessions
Sometimes the connection to a remote host may stall and a resource
cannot be retrieved. This makes Tweeper hang for a very long time which
can be annoying for users.
Setting a shorter timeout and a retry mechanism usually works around the
problem allowing the resource to be retrieved eventually.
Implement such a mechanism by adding a curlExec() method and while at it
move non-curl related messages outside of getUrlContents() and
getUrlInfo() to give the user a better understanding of what actually
failed when even the retry mechanism was not able to retrieve the
resource.
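A retry loop of this kind can be sketched in Python as follows. This is an illustration only, not the curlExec() implementation: the names, the attempt count, and the simulated fetch are all hypothetical:

```python
import time


def fetch_with_retry(fetch_once, max_attempts=3, delay_seconds=0.0):
    """Call fetch_once() until it succeeds or max_attempts is reached.

    Returns the fetched content, or None when every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch_once()
        except OSError:
            # Timeouts and connection errors are OSError subclasses in Python.
            if attempt < max_attempts and delay_seconds > 0:
                time.sleep(delay_seconds)
    return None


class FlakyFetch:
    """Simulated fetch that fails a given number of times before succeeding."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise TimeoutError("connection stalled")
        return "content"
```

In a real client, fetch_once would perform a request with a short timeout; returning None after the last attempt leaves the error message to the caller, which matches the idea of reporting what actually failed.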
Antonio Ospite [Wed, 14 Nov 2018 14:57:36 +0000 (15:57 +0100)]
src/Tweeper.php: harmonize error messages
Since the Tweeper class is supposed to be used as a library don't let
any error be fatal and convert all current uses of E_USER_ERROR into
E_USER_WARNING.
Also convert the few instances of E_USER_NOTICE into E_USER_WARNING.
Finally, stop using error_log as well in favour of trigger_error which
provides more context in the produced message.
Antonio Ospite [Tue, 13 Nov 2018 16:56:44 +0000 (17:56 +0100)]
src/Tweeper.php: make code more robust by properly checking return values
Check return values to catch errors earlier, and while at it also emit
more error messages in case of failures.
Antonio Ospite [Tue, 13 Nov 2018 15:14:09 +0000 (16:14 +0100)]
Add option to enable or disable showing usernames in RSS items
Tweeper shows usernames by default in items created from multi-user
sites like Twitter or Instagram.
This is because the main use case is to aggregate multiple feeds in the
same viewer, and in this scenario having some info about where the
messages are coming from can be useful.
However sometimes tweeper can be used to track one single feed and in
this case having always the same username repeated over and over is
unnecessary.
Make showing the username optional, but keep the current behavior as
default.
NOTE: for Twitter keep always showing the username in case of retweets
($screen-name != $user-name).
Antonio Ospite [Tue, 13 Nov 2018 15:29:57 +0000 (16:29 +0100)]
src/rss_converter_*.xsl: add missing generate-enclosure parameter
XSL parameters do not necessarily need to be declared in the stylesheet
if no default value is explicitly set, however tweeper is doing that for
other stylesheets, so declare the parameter in rss_converter_pump.io.xsl
and rss_converter_dilbert.com.xsl as well for consistency.
Antonio Ospite [Fri, 9 Nov 2018 14:42:28 +0000 (15:42 +0100)]
src/Tweeper.php: silence error message when processing Instagram json
Remove the "knobs" element from the Instagram json data because it
contains elements with an undefined namespace which results in an error
message when json is converted to XML.
Antonio Ospite [Fri, 9 Nov 2018 14:40:17 +0000 (15:40 +0100)]
src/Tweeper.php: put a comment right before the code it refers to
Antonio Ospite [Fri, 9 Nov 2018 14:25:01 +0000 (15:25 +0100)]
src/Tweeper.php: rearrange blank lines to a consistent style
In other parts of the file there is no blank line between the
assignment and the check for the return value of a function call.
Use the same style everywhere.
Antonio Ospite [Fri, 9 Nov 2018 14:21:24 +0000 (15:21 +0100)]
Remove unneeded attribute extension-element-prefixes from xsl stylesheets
It looks like the "extension-element-prefixes" attribute is not strictly
needed for php extension functions to work, so remove it.
If it turns out that the attribute is actually needed in some cases it
can always be added back.
Antonio Ospite [Thu, 8 Nov 2018 08:29:35 +0000 (09:29 +0100)]
rss_converter_twitter.com.xsl: explain why the style attribute is removed
Since commit 6817108 (rss_converter_twitter.com.xsl: strip the style
attribute from HTML elements, 2017-06-08) the twitter.com stylesheet
removes the "style" attribute from elements when copying them.
This is in order to create a more visually neutral output, but also
because the style attribute may even contain dangerous content:
https://validator.w3.org/feed/docs/warning/DangerousStyleAttr.html
However someone who reads the code may not be familiar with (or may have
forgotten) why this is done, so explain that in a comment to spare them
the burden of digging in the project history.
Antonio Ospite [Mon, 13 Aug 2018 15:17:41 +0000 (17:17 +0200)]
src/rss_converter_twitter.com.xsl: add a label to tweets containing GIFs
The static scraped content only provides a preview of GIF files with the
first frame only, just like in the case of videos.
Set a label when a tweet contains a GIF so that the user can decide to
open the tweet in a full fledged browser to properly see the GIF.
Antonio Ospite [Mon, 13 Aug 2018 15:14:25 +0000 (17:14 +0200)]
src/rss_converter_twitter.com.xsl: make images more adaptive
Adapt images to the screen width to avoid horizontal scrolling in the
feed reader.
Antonio Ospite [Mon, 13 Aug 2018 15:08:03 +0000 (17:08 +0200)]
Tweeper.php: bump version in the User-Agent string
By using a more recent version of the User-Agent, twitter.com will
return more entries in the result when visiting hashtag pages.
This makes tracking hashtag pages more usable.
This change is similar to what was done in commit 0db2f37 ("Tweeper.php:
bump version in the User-Agent string", 2018-06-06)
Antonio Ospite [Wed, 6 Jun 2018 14:14:42 +0000 (16:14 +0200)]
debian/changelog: release package version 1.3.0-1
Gbp-Dch: ignore
Antonio Ospite [Wed, 6 Jun 2018 14:20:33 +0000 (16:20 +0200)]
debian/control: bump Standards-Version to 4.1.4
Suggested by lintian:
I: tweeper source: out-of-date-standards-version 4.1.3 (released 2017-12-27) (current is 4.1.4)
Antonio Ospite [Wed, 6 Jun 2018 14:08:18 +0000 (16:08 +0200)]
Merge tag 'v1.3.0' into debian/master
Release v1.3.0
Antonio Ospite [Wed, 6 Jun 2018 13:50:13 +0000 (15:50 +0200)]
NEWS: add release notes for the v1.3.0 release
Antonio Ospite [Wed, 6 Jun 2018 13:36:39 +0000 (15:36 +0200)]
src/rss_converter_twitter.com.xsl: only output channel image when it's available
Hashtag pages do not have an image usable as a channel logo, and in
cases like this the <url/> element would be empty, but this would make
the feed invalid according to https://www.feedvalidator.org
So, to produce feeds which validate, avoid outputting the whole <image/>
element when there is no suitable image to use as a channel logo.
Antonio Ospite [Wed, 6 Jun 2018 13:34:12 +0000 (15:34 +0200)]
src/rss_converter_twitter.com.xsl: fix getting description for hashtag pages
Antonio Ospite [Wed, 6 Jun 2018 12:59:30 +0000 (14:59 +0200)]
TODO: remove the entry about instagram tags, tweeper can now track them
Antonio Ospite [Wed, 6 Jun 2018 12:57:10 +0000 (14:57 +0200)]
src/rss_converter_instagram.com.xsl: add support for Instagram.com tags
Supporting Instagram tags is quite easy, so let's do it and while at it
refactor how the channel description is set depending on the kind of
page.
Antonio Ospite [Wed, 6 Jun 2018 12:46:07 +0000 (14:46 +0200)]
src/rss_converter_pump.io.xsl: fix getting the channel logo URL
Antonio Ospite [Wed, 6 Jun 2018 11:20:28 +0000 (13:20 +0200)]
Tweeper.php: bump version in the User-Agent string
By using a more recent version of the User-Agent, twitter.com will
return more entries in the result when visiting hashtag pages.
This makes tracking hashtag pages actually usable.
Tested with https://twitter.com/hashtag/tweeper
Antonio Ospite [Thu, 24 May 2018 21:43:16 +0000 (23:43 +0200)]
Tweeper.php: update the User-Agent string to fix parsing twitter.com
It looks like twitter.com started serving the mobile version of the site
to old browsers and Tweeper cannot parse that content.
By using a more up to date User-Agent string twitter.com returns the
desktop version of the page which Tweeper can process without problems.
Antonio Ospite [Tue, 3 Apr 2018 16:12:22 +0000 (18:12 +0200)]
rss_converter_instagram.com.xsl: don't put location coordinates in screen name
Remove location coordinates from the location screen name as the latter
also shows up in item titles, but still emit the coordinates in the
channel description.
Antonio Ospite [Tue, 3 Apr 2018 16:11:04 +0000 (18:11 +0200)]
rss_converter_instagram.com.xsl: use the screen name in item titles
The user name is not always defined, for example in case of locations,
so use the screen name in item titles.
Antonio Ospite [Tue, 3 Apr 2018 16:08:59 +0000 (18:08 +0200)]
rss_converter_instagram.com.xsl: fix scraping Instagram.com
Antonio Ospite [Fri, 16 Mar 2018 11:49:41 +0000 (12:49 +0100)]
rss_converter_twitter.com.xsl: show again the user name in the description
Having the user name also in the description makes it easier to see who
the author is in case of re-tweeted messages.
Leave the line-break after the username to have the actual message start
at the beginning of the line, this is done to preserve the formatting of
the original message as much as possible.
Antonio Ospite [Thu, 15 Mar 2018 08:04:15 +0000 (09:04 +0100)]
Update copyright years
Antonio Ospite [Thu, 15 Mar 2018 08:00:03 +0000 (09:00 +0100)]
INSTALL: add some notes about dependencies
Antonio Ospite [Thu, 15 Mar 2018 07:30:16 +0000 (08:30 +0100)]
INSTALL: explain better what "usable HTML" means in this context
Antonio Ospite [Sun, 25 Feb 2018 09:45:30 +0000 (10:45 +0100)]
debian/changelog: release package version 1.2.0-1
Gbp-Dch: ignore
Antonio Ospite [Sun, 25 Feb 2018 09:28:47 +0000 (10:28 +0100)]
debian/copyright: update copyright years
Antonio Ospite [Sat, 24 Feb 2018 18:12:55 +0000 (19:12 +0100)]
debian/control: fix the short description
Remove parentheses from the short description as this may trigger
a false positive in lintian.
Parentheses are usually used to specify the role of alternative
packages, see the Debian Developer's Reference, section 6.2.2.
The change fixes this lintian notice:
I: tweeper: description-synopsis-might-not-be-phrased-properly
Antonio Ospite [Sat, 24 Feb 2018 18:05:18 +0000 (19:05 +0100)]
debian/control: bump Standards-Version to 4.1.3
Suggested by lintian:
I: tweeper source: out-of-date-standards-version 4.0.0 (released 2017-05-28) (current is 4.1.3)
Antonio Ospite [Sat, 24 Feb 2018 18:03:21 +0000 (19:03 +0100)]
debian/{compat,control}: increase debhelper compatibility version to 11
This has been suggested by lintian:
P: tweeper source: package-uses-old-debhelper-compat-version 9
Antonio Ospite [Sat, 24 Feb 2018 17:53:42 +0000 (18:53 +0100)]
debian/{control,gbp.conf}: change the debian branch name to follow DEP-14
Antonio Ospite [Sat, 24 Feb 2018 17:52:09 +0000 (18:52 +0100)]
Merge tag 'v1.2.0' into debian/master
Release v1.2.0
Antonio Ospite [Sat, 24 Feb 2018 15:29:31 +0000 (16:29 +0100)]
NEWS: add release notes for the v1.2.0 release
Antonio Ospite [Sat, 24 Feb 2018 14:33:58 +0000 (15:33 +0100)]
rss_converter_instagram.com.xsl: fix validation for Instagram location feeds
Avoid outputting an <image/> element with an empty <url/>, as this
breaks validation.
Antonio Ospite [Fri, 23 Feb 2018 15:10:52 +0000 (16:10 +0100)]
Tweeper.php: a more robust fix for 4b9692a19e06f3cf698d23a3854fd34b9914a32a
The "qe" element in the json data is the one containing the problematic
element mentioned in commit 4b9692a19e06f3cf698d23a3854fd34b9914a32a and
it may contain multiple elements with problematic names, so just remove
the "qe" element altogether.
Antonio Ospite [Fri, 23 Feb 2018 14:34:02 +0000 (15:34 +0100)]
rss_converter_twitter.com.xsl: preserve spaces in tweet content
Wrap the tweet content into a span element with a CSS style attribute
set to "white-space: pre-wrap", so that the content is rendered
like on the twitter web page: with spaces and newlines preserved.
This is especially desirable if the tweet content contains any ASCII
art, like in https://twitter.com/sarahjeong/status/955651919279722496
Antonio Ospite [Fri, 23 Feb 2018 14:29:44 +0000 (15:29 +0100)]
rss_converter_twitter.com.xsl: add support for permalink URLs
This way it is possible to generate an RSS feed of all the replies to
a certain tweet using its permalink URL.
Antonio Ospite [Fri, 23 Feb 2018 13:55:10 +0000 (14:55 +0100)]
rss_converter_twitter.com.xsl: add a line break after the "(Video)" label
This is to start the actual original tweet content on a new line, this
is important for example if the content contains some ASCII art.
Antonio Ospite [Fri, 23 Feb 2018 13:48:06 +0000 (14:48 +0100)]
rss_converter_twitter.com.xsl: don't print the user name in description
This is in the spirit of leaving the tweet content untouched as much as
possible.
Antonio Ospite [Fri, 23 Feb 2018 13:43:57 +0000 (14:43 +0100)]
rss_converter_twitter.com.xsl: use a different rule to get the tweet user-name
Instead of looking for 'js-stream-tweet' in the class attribute, pick
the element which has the 'data-tweet-id' attribute, this is more
generic and works also with permalink tweets.
Antonio Ospite [Sun, 14 Jan 2018 18:46:54 +0000 (19:46 +0100)]
Tweeper.php: fix converting Instagram data to RSS
There is one new element in the json data served by Instagram named
"404_as_react", and this makes the conversion from json to XML fail
because names starting with a number are illegal in XML.
Fix the problem by prepending an underscore to the problematic name.
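The renaming can be illustrated with the Python sketch below. It is not the PHP code from this commit; the function names are hypothetical and the recursive walk over nested data is an assumption about how such names could be handled in general:

```python
def fix_name(name: str) -> str:
    """Prepend an underscore to names starting with a digit, which are illegal in XML."""
    return "_" + name if name[:1].isdigit() else name


def sanitize_keys(value):
    """Recursively rename dictionary keys so they can be used as XML element names."""
    if isinstance(value, dict):
        return {fix_name(key): sanitize_keys(item) for key, item in value.items()}
    if isinstance(value, list):
        return [sanitize_keys(item) for item in value]
    return value
```

Applied to decoded json data, a key such as "404_as_react" becomes "_404_as_react" while all other keys and values are left unchanged.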
Antonio Ospite [Mon, 6 Nov 2017 17:15:59 +0000 (18:15 +0100)]
rss_converter_facebook.com.xsl: fix channel link, image, and description
Antonio Ospite [Mon, 6 Nov 2017 16:53:42 +0000 (17:53 +0100)]
rss_converter_facebook.com.xsl: fix scraping facebook.com pages once again
Add back support for 'userContentWrapper' which seems to be still used.
Antonio Ospite [Mon, 6 Nov 2017 16:52:56 +0000 (17:52 +0100)]
TODO: add an entry about Instagram tags
Antonio Ospite [Mon, 11 Sep 2017 11:17:31 +0000 (13:17 +0200)]
rss_converter_facebook.com.xsl: fix scraping facebook.com pages once again
Tip: in order to get more posts, and not just the last two, append
"/posts" to the facebook page URL, or use the URL of the "See all" link
in the "Posts" section.
Antonio Ospite [Mon, 10 Jul 2017 08:29:01 +0000 (10:29 +0200)]
rss_converter_instagram.com.xsl: support scraping Instagram locations pages
Antonio Ospite [Mon, 10 Jul 2017 08:05:31 +0000 (10:05 +0200)]
rss_converter_instagram.com.xsl: improve the comment about full names
Antonio Ospite [Tue, 27 Jun 2017 12:44:38 +0000 (14:44 +0200)]
debian/changelog: release package version 1.1.0-1
Gbp-Dch: ignore
Antonio Ospite [Tue, 27 Jun 2017 12:42:05 +0000 (14:42 +0200)]
debian/watch: use https for the git repository URL