Antonio Ospite [Tue, 28 Dec 2021 22:49:39 +0000 (23:49 +0100)]
Tweeper.php: fix Invalid Character Error when converting Instagram json to XML
Converting Instagram json data to XML was failing with the following
message:
PHP Fatal error: Uncaught DOMException: Invalid Character Error in
/usr/share/php/Symfony/Component/Serializer/Encoder/XmlEncoder.php:445
This was caused by some item starting with a number which resulted in
invalid XML element names.
Remove the items containing the problematic names from the json data
before converting to XML.
Also stop handling the "knobs" element which does not seem to be there
anymore.
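A minimal sketch of the filtering idea, written in Python for illustration rather than taken from Tweeper's PHP code. The function name `drop_invalid_xml_keys` is hypothetical, and the name check is deliberately simplified (the real XML grammar accepts more start characters than letters and underscores):

```python
import re

# Simplified check: accept only names starting with an ASCII letter or an
# underscore. This is an assumption; the XML spec allows more characters.
VALID_NAME_START = re.compile(r"^[A-Za-z_]")


def drop_invalid_xml_keys(data):
    """Recursively remove dict items whose key cannot start an XML element name."""
    if isinstance(data, dict):
        return {
            key: drop_invalid_xml_keys(value)
            for key, value in data.items()
            if VALID_NAME_START.match(str(key))
        }
    if isinstance(data, list):
        return [drop_invalid_xml_keys(item) for item in data]
    return data
```

Dropping the items loses their data, which is acceptable only when the converted XML does not need them.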
Antonio Ospite [Sun, 27 Dec 2020 16:59:54 +0000 (17:59 +0100)]
NEWS: add release notes for the v1.4.3 release
Antonio Ospite [Sun, 27 Dec 2020 16:13:50 +0000 (17:13 +0100)]
src/Tweeper.php: stop and return failure when Instagram.com redirects to login page
Instagram redirects to the login page when too many consecutive
connections have been made from the same IP. Detect that case, stop
processing and return a failure.
Antonio Ospite [Thu, 24 Dec 2020 09:10:55 +0000 (10:10 +0100)]
src/Tweeper.php: check http response code and return error for error codes
Check http response code from curl and return error for codes greater
than 400.
In particular this covers the case of non-existing accounts on social
media sites as the failure will propagate to the main function which
will exit with a non-zero code.
Antonio Ospite [Thu, 24 Dec 2020 09:04:59 +0000 (10:04 +0100)]
src/Tweeper.php: set User-Agent to impersonate a Google crawler
Set User-Agent to impersonate a Google crawler, this makes twitter.com
return the old desktop UI which can be more easily scraped.
This brings back support for twitter.com, which has stopped
serving the mobile UI that was still scrapeable somehow.
Antonio Ospite [Fri, 18 Dec 2020 21:30:16 +0000 (22:30 +0100)]
Revert "Add back partial support for twitter.com using the old twitter mobile UI"
This reverts commit af103c976dd4992d79e9d9a71837aecff30d6e9c.
Antonio Ospite [Fri, 18 Dec 2020 21:29:59 +0000 (22:29 +0100)]
Revert "src/Tweeper.php: only override the User-Agent to a mobile one for twitter.com"
This reverts commit b922824bc561f7f3e31c6f9962d96e9084497ced.
Antonio Ospite [Thu, 11 Jun 2020 22:04:52 +0000 (00:04 +0200)]
README: fix license so that 'licensecheck' determines the right one
Antonio Ospite [Wed, 10 Jun 2020 20:39:54 +0000 (22:39 +0200)]
NEWS: add release notes for the v1.4.2 release
Antonio Ospite [Wed, 10 Jun 2020 20:38:05 +0000 (22:38 +0200)]
NEWS: fix indentation for some entries
Antonio Ospite [Tue, 9 Jun 2020 22:28:54 +0000 (00:28 +0200)]
src/Tweeper.php: only override the User-Agent to a mobile one for twitter.com
Using a mobile User-Agent made it possible to scrape twitter.com again
but it also had side effects: it was forcing facebook.com to serve the
mobile version too.
However tweeper expected the desktop version of facebook.com so this was
breaking support for facebook.com
Scraping the mobile version of facebook.com would be inconvenient
because the xsl would have to be rewritten extensively, and also the
date of posts is not readily available as a timestamp in the mobile
version.
So override the User-Agent for twitter.com only, this makes the code
a little uglier but it works well enough for now.
Antonio Ospite [Tue, 9 Jun 2020 22:27:35 +0000 (00:27 +0200)]
src/Tweeper.php: allow overriding the User-Agent in cURL requests
Allow overriding the User-Agent in cURL requests, to make it possible to
use different user agents for different requests.
This can be useful to have a finer control on the version of the site
served by the different supported services.
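The per-request override can be sketched as follows, in Python for illustration only. `build_request` and `DEFAULT_USER_AGENT` are hypothetical names, and the default value is a placeholder rather than the string Tweeper actually sends:

```python
import urllib.request

# Placeholder default, assumed for this sketch only.
DEFAULT_USER_AGENT = "Mozilla/5.0"


def build_request(url, user_agent=None):
    """Create a request, letting the caller override the User-Agent header."""
    headers = {"User-Agent": user_agent or DEFAULT_USER_AGENT}
    return urllib.request.Request(url, headers=headers)
```

Keeping the override as an optional argument means existing callers keep the default, while a single site can be given its own user agent.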
Antonio Ospite [Tue, 9 Jun 2020 22:11:12 +0000 (00:11 +0200)]
src/Tweeper.php: use file_get_contents to retrieve the local stylesheet
Using Tweeper::getUrlContents(), which uses cURL, is really overkill to
get local file contents, keep things simple and use file_get_contents.
Antonio Ospite [Mon, 8 Jun 2020 21:58:50 +0000 (23:58 +0200)]
Fix style issues pointed out by PHP_CodeSniffer
Fix the following errors from PHP_CodeSniffer with the help of phpcbf:
FILE: /home/ao2/Proj/Tweeper/tweeper/tweeper.php
----------------------------------------------------------------------
FOUND 3 ERRORS AFFECTING 3 LINES
----------------------------------------------------------------------
54 | ERROR | [x] Short array syntax must be used to define arrays
65 | ERROR | [x] Short array syntax must be used to define arrays
124 | ERROR | [x] Short array syntax must be used to define arrays
----------------------------------------------------------------------
PHPCBF CAN FIX THE 3 MARKED SNIFF VIOLATIONS AUTOMATICALLY
----------------------------------------------------------------------
FILE: /home/ao2/Proj/Tweeper/tweeper/src/Tweeper.php
----------------------------------------------------------------------
FOUND 15 ERRORS AFFECTING 12 LINES
----------------------------------------------------------------------
162 | ERROR | [x] Short array syntax must be used to define arrays
169 | ERROR | [x] Short array syntax must be used to define arrays
183 | ERROR | [x] Short array syntax must be used to define arrays
212 | ERROR | [x] Short array syntax must be used to define arrays
313 | ERROR | [x] Short array syntax must be used to define arrays
313 | ERROR | [x] Short array syntax must be used to define arrays
315 | ERROR | [x] Short array syntax must be used to define arrays
378 | ERROR | [x] Short array syntax must be used to define arrays
378 | ERROR | [x] Short array syntax must be used to define arrays
437 | ERROR | [x] Short array syntax must be used to define arrays
466 | ERROR | [x] Short array syntax must be used to define arrays
466 | ERROR | [x] Short array syntax must be used to define arrays
----------------------------------------------------------------------
PHPCBF CAN FIX THE 12 MARKED SNIFF VIOLATIONS AUTOMATICALLY
----------------------------------------------------------------------
Time: 273ms; Memory: 10MB
Antonio Ospite [Mon, 8 Jun 2020 21:55:06 +0000 (23:55 +0200)]
Update copyright years in recently modified files
Antonio Ospite [Mon, 8 Jun 2020 21:49:15 +0000 (23:49 +0200)]
Add back partial support for twitter.com using the old twitter mobile UI
On June 1st 2020 twitter.com completely disabled serving the legacy UI
which tweeper kept supporting using a User-Agent trick.
The new official UI retrieves json after authenticating with
cookies and generates the HTML client-side, so it's too complicated for
the current Tweeper structure.
Work around the issue with the help of another User-Agent trick: pretend
to be an old Android phone, which makes twitter.com serve the old mobile
UI which can be easily scraped by tweeper.
This approach loses support for some functionalities like embedded
media but at least makes Tweeper work again with twitter.com
Antonio Ospite [Mon, 8 Jun 2020 21:32:00 +0000 (23:32 +0200)]
Add option to enable or disable showing verbose output
Tweeper by default shows non-fatal errors and warnings from the php XML
parser.
These messages can be distracting for some users, so add a '-v' option
to enable or disable the verbose output.
Keep the current behavior of showing verbose output as the default one
for backwards compatibility, the user can pass '-v 0' to silence it.
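A sketch of an option with this behavior, using Python's argparse for illustration. Only the `-v` flag and its 0/1 values come from the message above; the function name and the destination name `verbose` are assumptions:

```python
import argparse


def parse_args(argv):
    """Parse a '-v' option that defaults to verbose output (1)."""
    parser = argparse.ArgumentParser()
    # Default to 1 so that omitting the option keeps the old behavior.
    parser.add_argument("-v", dest="verbose", type=int, choices=[0, 1], default=1)
    return parser.parse_args(argv)
```

For example, `parse_args(["-v", "0"]).verbose` evaluates to 0, which a caller could use to suppress parser warnings.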
Antonio Ospite [Wed, 3 Jun 2020 20:15:49 +0000 (22:15 +0200)]
src/Tweeper.php: do not disable CURLOPT_SSL_VERIFYHOST and CURLOPT_SSL_VERIFYPEER
Do not disable CURLOPT_SSL_VERIFYHOST and CURLOPT_SSL_VERIFYPEER to
actually enforce certificate verification on TLS connections.
This was a relic of some early experimental code and should not have
made it to the stable release.
Moreover the value passed to CURLOPT_SSL_VERIFYHOST was also of the
wrong type, it should have been an integer rather than a boolean.
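For comparison, the equivalent safe defaults in Python's standard library look like this. It is only an analogy to the cURL options named above, and `build_tls_context` is a hypothetical helper name:

```python
import ssl


def build_tls_context():
    """Return a TLS context that verifies both the certificate and the hostname."""
    # create_default_context() enables certificate and hostname verification;
    # the safe choice is simply not to switch them off afterwards.
    return ssl.create_default_context()
```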
Antonio Ospite [Sun, 9 Feb 2020 22:31:43 +0000 (23:31 +0100)]
src/Tweeper.php: use a minimal User-Agent string to fix scraping twitter.com
Twitter.com has started serving the user timeline via json when the user
agent is a modern browser, this breaks scraping in Tweeper which expects
html content.
Remove any version info from the User-Agent header used by Tweeper to
make twitter.com think it is talking with a very old browser, tricking
it into serving html content.
NOTE: Tweeper cannot just use the default User-Agent from the CURL
library because this would break scraping Facebook.com; using a minimal
but still browser-like User-Agent seems to be a viable common
denominator for all sites currently supported by Tweeper.
Antonio Ospite [Sat, 7 Sep 2019 19:57:34 +0000 (21:57 +0200)]
NEWS: add release notes for the v1.4.1 release
Antonio Ospite [Fri, 6 Sep 2019 21:19:55 +0000 (23:19 +0200)]
src/Tweeper.php: bump version in the User-Agent string
By using a more recent version of the User-Agent, twitter.com will
return entries in the result when visiting hashtag pages.
This fixes scraping hashtag pages.
This change is similar to what was done in commit 45060bb (Tweeper.php:
bump version in the User-Agent string, 2018-08-13)
Antonio Ospite [Sat, 27 Jul 2019 20:06:15 +0000 (22:06 +0200)]
src/Tweeper.php: enable cookie handling to fix scraping twitter.com
When the user agent used by a client matches an actual browser,
twitter.com enables content-security-policy and redirects the client on
the first request to make it reload the content.
After the redirection, the server assumes that the client sets cookies
appropriately, however cURL does not do that by default.
Enable cookie handling in cURL to fix scraping twitter.com.
NOTE: the CURLOPT_COOKIEFILE option is set to an empty string to enable
in-memory handling of the cookies, removing the need for a temporary
file on the filesystem, see:
https://www.php.net/manual/en/function.curl-setopt.php
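The same in-memory approach can be sketched in Python, as an illustration and not as Tweeper's PHP code. `build_cookie_opener` is a hypothetical name:

```python
import http.cookiejar
import urllib.request


def build_cookie_opener():
    """Return an opener that remembers cookies across redirects, in memory only."""
    jar = http.cookiejar.CookieJar()  # never written to the filesystem
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar
```

Cookies set by the first response are then sent again on the redirected request made through the same opener.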
Antonio Ospite [Fri, 16 Nov 2018 22:16:07 +0000 (23:16 +0100)]
NEWS: add release notes for the v1.4.0 release
Antonio Ospite [Fri, 16 Nov 2018 22:06:38 +0000 (23:06 +0100)]
src/Tweeper.php: make enclosure validate when there is no Content-Length
When the server does not provide a Content-Length header, curl_getinfo()
would return a negative value for "download_content_length".
However RSS recommends using 0 when the enclosure's size cannot be
determined.
See: https://www.feedvalidator.org/docs/error/UseZeroForUnknown.html
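The normalization amounts to clamping the reported size at zero. A minimal Python sketch, with the hypothetical name `enclosure_length`:

```python
def enclosure_length(download_content_length):
    """Map an unknown (negative) content length to 0 for the enclosure."""
    return max(0, int(download_content_length))
```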
Antonio Ospite [Fri, 16 Nov 2018 17:27:16 +0000 (18:27 +0100)]
src/rss_converter_dilbert.com.xsl: fix generating enclosures
Enclosures were not generated for Dilbert.com because the URLs of the
pictures are protocol-relative and curl cannot work with these URLs.
Fix the URLs by prepending a protocol scheme to them.
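A sketch of the URL fix in Python. The actual change lives in the XSL stylesheet, and both the helper name `absolutize` and the choice of `https` as the default scheme are assumptions:

```python
def absolutize(url, scheme="https"):
    """Turn a protocol-relative URL ('//host/path') into an absolute one."""
    if url.startswith("//"):
        return f"{scheme}:{url}"
    return url
```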
Antonio Ospite [Fri, 16 Nov 2018 10:50:12 +0000 (11:50 +0100)]
Add option to enable or disable showing multimedia content in RSS items
Tweeper by default shows multimedia contents like Twitter and Instagram
images in items descriptions.
However sometimes just having multimedia contents in the <enclosure/>
element may be enough, so make it optional to also have the content in
the item description.
Keep the current default behavior for backwards compatibility.
Antonio Ospite [Wed, 14 Nov 2018 16:24:30 +0000 (17:24 +0100)]
Fix PHP_CodeSniffer errors
Fix the following errors reported by PHP_CodeSniffer:
FILE: .../tweeper.php
-----------------------------------------------------------------------------
FOUND 1 ERROR AFFECTING 1 LINE
-----------------------------------------------------------------------------
1 | ERROR | [x] The PHP open tag must be followed by exactly one blank line
-----------------------------------------------------------------------------
PHPCBF CAN FIX THE 1 MARKED SNIFF VIOLATIONS AUTOMATICALLY
-----------------------------------------------------------------------------
FILE: .../src/Tweeper.php
-----------------------------------------------------------------------------------------------------------------------------------
FOUND 6 ERRORS AFFECTING 2 LINES
-----------------------------------------------------------------------------------------------------------------------------------
373 | ERROR | [x] Incorrect spacing between argument "$host" and equals sign; expected 1 but found 0
373 | ERROR | [x] Incorrect spacing between default value and equals sign for argument "$host"; expected 1 but found 0
373 | ERROR | [x] Incorrect spacing between argument "$validate_scheme" and equals sign; expected 1 but found 0
373 | ERROR | [x] Incorrect spacing between default value and equals sign for argument "$validate_scheme"; expected 1 but found 0
388 | ERROR | [x] Inline comments must start with a capital letter
388 | ERROR | [x] Inline comments must end in full-stops, exclamation marks, colons, question marks, or closing parentheses
-----------------------------------------------------------------------------------------------------------------------------------
PHPCBF CAN FIX THE 6 MARKED SNIFF VIOLATIONS AUTOMATICALLY
-----------------------------------------------------------------------------------------------------------------------------------
FILE: .../autoload.php
-----------------------------------------------------------------------------
FOUND 1 ERROR AFFECTING 1 LINE
-----------------------------------------------------------------------------
1 | ERROR | [x] The PHP open tag must be followed by exactly one blank line
-----------------------------------------------------------------------------
PHPCBF CAN FIX THE 1 MARKED SNIFF VIOLATIONS AUTOMATICALLY
-----------------------------------------------------------------------------
Time: 260ms; Memory: 10Mb
Antonio Ospite [Wed, 14 Nov 2018 16:18:12 +0000 (17:18 +0100)]
TODO: remove the item about trigger_error, the concern has been addressed
Tweeper stopped using E_USER_ERROR and survives after trigger_error()
calls.
Antonio Ospite [Wed, 14 Nov 2018 16:03:06 +0000 (17:03 +0100)]
src/Tweeper.php: add a retry mechanism for cURL sessions
Sometimes the connection to a remote host may stall and a resource
cannot be retrieved. This makes Tweeper hang for a very long time which
can be annoying for users.
Setting a shorter timeout and a retry mechanism usually works around the
problem allowing the resource to be retrieved eventually.
Implement such a mechanism by adding a curlExec() method and while at it
move non-curl related messages outside of getUrlContents() and
getUrlInfo() to give the user a better understanding of what actually
failed when even the retry mechanism was not able to retrieve the
resource.
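A generic retry loop of this kind could look like the Python sketch below. It is not a port of curlExec(): `fetch_with_retry` is a hypothetical name, `fetch` stands for any callable that raises `OSError` (which includes timeouts) on failure, and the default of three attempts is an assumption:

```python
def fetch_with_retry(fetch, attempts=3):
    """Call fetch() until it succeeds or the attempts are used up."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return fetch()
        except OSError as error:
            # Remember the failure and try again.
            last_error = error
    # Every attempt failed: report the last error to the caller.
    raise last_error
```

The short timeout itself belongs inside `fetch`, so that a stalled connection fails quickly and the loop gets a chance to retry.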
Antonio Ospite [Wed, 14 Nov 2018 14:57:36 +0000 (15:57 +0100)]
src/Tweeper.php: harmonize error messages
Since the Tweeper class is supposed to be used as a library don't let
any error be fatal and convert all current uses of E_USER_ERROR into
E_USER_WARNING.
Also convert the few instances of E_USER_NOTICE into E_USER_WARNING.
Finally, stop using error_log as well in favour of trigger_error which
provides more context in the produced message.
Antonio Ospite [Tue, 13 Nov 2018 16:56:44 +0000 (17:56 +0100)]
src/Tweeper.php: make code more robust by properly checking return values
Check return values to catch errors earlier, and while at it also emit
more error messages in case of failures.
Antonio Ospite [Tue, 13 Nov 2018 15:14:09 +0000 (16:14 +0100)]
Add option to enable or disable showing usernames in RSS items
Tweeper shows usernames by default in items created from multi-user
sites like Twitter or Instagram.
This is because the main use case is to aggregate multiple feeds in the
same viewer, and in this scenario having some info about where a
message is coming from can be useful.
However sometimes tweeper can be used to track one single feed and in
this case having always the same username repeated over and over is
unnecessary.
Make showing the username optional, but keep the current behavior as
default.
NOTE: for Twitter keep always showing the username in case of retweets
($screen-name != $user-name).
Antonio Ospite [Tue, 13 Nov 2018 15:29:57 +0000 (16:29 +0100)]
src/rss_converter_*.xsl: add missing generate-enclosure parameter
XSL parameters do not necessarily need to be declared in the stylesheet
if no default value is explicitly set, however tweeper is doing that for
other stylesheets, so declare the parameter in rss_converter_pump.io.xsl
and rss_converter_dilbert.com.xsl as well for consistency.
Antonio Ospite [Fri, 9 Nov 2018 14:42:28 +0000 (15:42 +0100)]
src/Tweeper.php: silence error message when processing Instagram json
Remove the "knobs" element from the Instagram json data because it
contains elements with an undefined namespace which results in an error
message when json is converted to XML.
Antonio Ospite [Fri, 9 Nov 2018 14:40:17 +0000 (15:40 +0100)]
src/Tweeper.php: put a comment right before the code it refers to
Antonio Ospite [Fri, 9 Nov 2018 14:25:01 +0000 (15:25 +0100)]
src/Tweeper.php: rearrange blank lines to a consistent style
In other parts of the file there is no blank line between the
assignment and the check for the return value of a function call.
Use the same style everywhere.
Antonio Ospite [Fri, 9 Nov 2018 14:21:24 +0000 (15:21 +0100)]
Remove unneeded attribute extension-element-prefixes from xsl stylesheets
It looks like the "extension-element-prefixes" attribute is not strictly
needed for php extension functions to work, so remove it.
If it turns out that the attribute is actually needed in some cases it
can always be added back.
Antonio Ospite [Thu, 8 Nov 2018 08:29:35 +0000 (09:29 +0100)]
rss_converter_twitter.com.xsl: explain why the style attribute is removed
Since commit 6817108 (rss_converter_twitter.com.xsl: strip the style
attribute from HTML elements, 2017-06-08) the twitter.com stylesheet
removes the "style" attribute from elements when copying them.
This is in order to create a more visually neutral output, but also
because the style attribute may even contain dangerous content:
https://validator.w3.org/feed/docs/warning/DangerousStyleAttr.html
However someone who reads the code may not be familiar with (or have
forgotten) why this is done, so explain that in a comment to spare them
the burden of digging in the project history.
Antonio Ospite [Mon, 13 Aug 2018 15:17:41 +0000 (17:17 +0200)]
src/rss_converter_twitter.com.xsl: add a label to tweets containing GIFs
The static scraped content only provides a preview of GIF files with the
first frame only, just like in the case of videos.
Set a label when a tweet contains a GIF so that the user can decide to
open the tweet in a full fledged browser to properly see the GIF.
Antonio Ospite [Mon, 13 Aug 2018 15:14:25 +0000 (17:14 +0200)]
src/rss_converter_twitter.com.xsl: make images more adaptive
Adapt images to the screen width to avoid horizontal scrolling in the
feed reader.
Antonio Ospite [Mon, 13 Aug 2018 15:08:03 +0000 (17:08 +0200)]
Tweeper.php: bump version in the User-Agent string
By using a more recent version of the User-Agent, twitter.com will
return more entries in the result when visiting hashtag pages.
This makes tracking hashtag pages more usable.
This change is similar to what was done in commit 0db2f37 ("Tweeper.php:
bump version in the User-Agent string", 2018-06-06)
Antonio Ospite [Wed, 6 Jun 2018 13:50:13 +0000 (15:50 +0200)]
NEWS: add release notes for the v1.3.0 release
Antonio Ospite [Wed, 6 Jun 2018 13:36:39 +0000 (15:36 +0200)]
src/rss_converter_twitter.com.xsl: only output channel image when it's available
Hashtag pages do not have an image usable as a channel logo, and in
cases like this the <url/> element would be empty, but this would make
the feed invalid according to https://www.feedvalidator.org
So, to produce feeds which validate, avoid outputting the whole <image/>
element when there is no suitable image to use as a channel logo.
Antonio Ospite [Wed, 6 Jun 2018 13:34:12 +0000 (15:34 +0200)]
src/rss_converter_twitter.com.xsl: fix getting description for hashtag pages
Antonio Ospite [Wed, 6 Jun 2018 12:59:30 +0000 (14:59 +0200)]
TODO: remove the entry about instagram tags, tweeper can now track them
Antonio Ospite [Wed, 6 Jun 2018 12:57:10 +0000 (14:57 +0200)]
src/rss_converter_instagram.com.xsl: add support for Instagram.com tags
Supporting Instagram tags is quite easy, so let's do it and while at it
refactor how the channel description is set depending on the kind of
page.
Antonio Ospite [Wed, 6 Jun 2018 12:46:07 +0000 (14:46 +0200)]
src/rss_converter_pump.io.xsl: fix getting the channel logo URL
Antonio Ospite [Wed, 6 Jun 2018 11:20:28 +0000 (13:20 +0200)]
Tweeper.php: bump version in the User-Agent string
By using a more recent version of the User-Agent, twitter.com will
return more entries in the result when visiting hashtag pages.
This makes tracking hashtag pages actually usable.
Tested with https://twitter.com/hashtag/tweeper
Antonio Ospite [Thu, 24 May 2018 21:43:16 +0000 (23:43 +0200)]
Tweeper.php: update the User-Agent string to fix parsing twitter.com
It looks like twitter.com started serving the mobile version of the site
to old browsers and Tweeper cannot parse that content.
By using a more up to date User-Agent string twitter.com returns the
desktop version of the page which Tweeper can process without problems.
Antonio Ospite [Tue, 3 Apr 2018 16:12:22 +0000 (18:12 +0200)]
rss_converter_instagram.com.xsl: don't put location coordinates in screen name
Remove location coordinates from the location screen name as the latter
also shows up in item titles, but still emit the coordinates in the
channel description.
Antonio Ospite [Tue, 3 Apr 2018 16:11:04 +0000 (18:11 +0200)]
rss_converter_instagram.com.xsl: use the screen name in item titles
The user name is not always defined, for example in case of locations,
so use the screen name in item titles.
Antonio Ospite [Tue, 3 Apr 2018 16:08:59 +0000 (18:08 +0200)]
rss_converter_instagram.com.xsl: fix scraping Instagram.com
Antonio Ospite [Fri, 16 Mar 2018 11:49:41 +0000 (12:49 +0100)]
rss_converter_twitter.com.xsl: show again the user name in the description
Having the user name also in the description makes it easier to see who
the author is in case of re-tweeted messages.
Leave the line-break after the username to have the actual message start
at the beginning of the line, this is done to preserve the formatting of
the original message as much as possible.
Antonio Ospite [Thu, 15 Mar 2018 08:04:15 +0000 (09:04 +0100)]
Update copyright years
Antonio Ospite [Thu, 15 Mar 2018 08:00:03 +0000 (09:00 +0100)]
INSTALL: add some notes about dependencies
Antonio Ospite [Thu, 15 Mar 2018 07:30:16 +0000 (08:30 +0100)]
INSTALL: explain better what "usable HTML" means in this context
Antonio Ospite [Sat, 24 Feb 2018 15:29:31 +0000 (16:29 +0100)]
NEWS: add release notes for the v1.2.0 release
Antonio Ospite [Sat, 24 Feb 2018 14:33:58 +0000 (15:33 +0100)]
rss_converter_instagram.com.xsl: fix validation for Instagram location feeds
Avoid outputting an <image/> element with an empty <url/>, as this
breaks validation.
Antonio Ospite [Fri, 23 Feb 2018 15:10:52 +0000 (16:10 +0100)]
Tweeper.php: a more robust fix for 4b9692a19e06f3cf698d23a3854fd34b9914a32a
The "qe" element in the json data is the one containing the problematic
element mentioned in commit 4b9692a19e06f3cf698d23a3854fd34b9914a32a and
it may contain multiple elements with problematic names, so just remove
the "qe" element altogether.
Antonio Ospite [Fri, 23 Feb 2018 14:34:02 +0000 (15:34 +0100)]
rss_converter_twitter.com.xsl: preserve spaces in tweet content
Wrap the tweet content into a span element with a CSS style attribute
set to "white-space: pre-wrap", so that the content is rendered like on
the twitter web page: with spaces and newlines preserved.
This is especially desirable if the tweet content contains any ASCII
art, like in https://twitter.com/sarahjeong/status/955651919279722496
Antonio Ospite [Fri, 23 Feb 2018 14:29:44 +0000 (15:29 +0100)]
rss_converter_twitter.com.xsl: add support for permalink URLs
This way it is possible to generate an RSS feed of all the replies to
a certain tweet using its permalink URL.
Antonio Ospite [Fri, 23 Feb 2018 13:55:10 +0000 (14:55 +0100)]
rss_converter_twitter.com.xsl: add a line break after the "(Video)" label
This is to start the actual original tweet content on a new line, this
is important for example if the content contains some ASCII art.
Antonio Ospite [Fri, 23 Feb 2018 13:48:06 +0000 (14:48 +0100)]
rss_converter_twitter.com.xsl: don't print the user name in description
This is in the spirit of leaving the tweet content untouched as much as
possible.
Antonio Ospite [Fri, 23 Feb 2018 13:43:57 +0000 (14:43 +0100)]
rss_converter_twitter.com.xsl: use a different rule to get the tweet user-name
Instead of looking for 'js-stream-tweet' in the class attribute, pick
the element which has the 'data-tweet-id' attribute, this is more
generic and works also with permalink tweets.
Antonio Ospite [Sun, 14 Jan 2018 18:46:54 +0000 (19:46 +0100)]
Tweeper.php: fix converting Instagram data to RSS
There is one new element in the json data served by Instagram named
"404_as_react", and this makes the conversion from json to XML fail
because names starting with a number are illegal in XML.
Fix the problem by prepending an underscore to the problematic name.
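The renaming approach can be sketched in Python as follows. This is an illustration rather than the project's PHP code, the name `prefix_numeric_keys` is hypothetical, and JSON object keys are assumed to be strings:

```python
def prefix_numeric_keys(data):
    """Recursively prepend '_' to dict keys that start with a digit."""
    if isinstance(data, dict):
        return {
            ("_" + key if key[:1].isdigit() else key): prefix_numeric_keys(value)
            for key, value in data.items()
        }
    if isinstance(data, list):
        return [prefix_numeric_keys(item) for item in data]
    return data
```

Unlike removing the item, renaming keeps its content available after the conversion to XML.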
Antonio Ospite [Mon, 6 Nov 2017 17:15:59 +0000 (18:15 +0100)]
rss_converter_facebook.com.xsl: fix channel link, image, and description
Antonio Ospite [Mon, 6 Nov 2017 16:53:42 +0000 (17:53 +0100)]
rss_converter_facebook.com.xsl: fix scraping facebook.com pages once again
Add back support for 'userContentWrapper' which seems to be still used.
Antonio Ospite [Mon, 6 Nov 2017 16:52:56 +0000 (17:52 +0100)]
TODO: add an entry about Instagram tags
Antonio Ospite [Mon, 11 Sep 2017 11:17:31 +0000 (13:17 +0200)]
rss_converter_facebook.com.xsl: fix scraping facebook.com pages once again
Tip: in order to get more posts, and not just the last two, append
"/posts" to the facebook page URL, or use the URL of the "See all" link
in the "Posts" section.
Antonio Ospite [Mon, 10 Jul 2017 08:29:01 +0000 (10:29 +0200)]
rss_converter_instagram.com.xsl: support scraping Instagram locations pages
Antonio Ospite [Mon, 10 Jul 2017 08:05:31 +0000 (10:05 +0200)]
rss_converter_instagram.com.xsl: improve the comment about full names
Antonio Ospite [Tue, 27 Jun 2017 10:01:14 +0000 (12:01 +0200)]
NEWS: add release notes for the v1.1.0 release
Antonio Ospite [Tue, 27 Jun 2017 08:59:31 +0000 (10:59 +0200)]
TODO: add an entry about the use of trigger_error()
Antonio Ospite [Tue, 27 Jun 2017 08:45:47 +0000 (10:45 +0200)]
Remove support for Howtoons.com, the old blog is not available anymore
Antonio Ospite [Thu, 22 Jun 2017 08:52:41 +0000 (10:52 +0200)]
Add an example of instrumentation to capture the HTML for later analysis
Antonio Ospite [Thu, 22 Jun 2017 08:47:35 +0000 (10:47 +0200)]
rss_converter_twitter.com.xsl: filter out promoted tweets
Antonio Ospite [Thu, 8 Jun 2017 13:35:27 +0000 (15:35 +0200)]
rss_converter_twitter.com.xsl: strip the style attribute from HTML elements
Elements in an RSS item description are not supposed to have a style
attribute, and they don't really need to anyways, so filter it out in
the identity template.
This also fixes an issue with Twitter images being shown with an offset
in liferea.
Antonio Ospite [Wed, 8 Mar 2017 08:20:01 +0000 (09:20 +0100)]
rss_converter_facebook.com.xsl: match both the new and the old wrapper class
Facebook still seems to use the "userContentWrapper" sometimes, it's not
clear if "fbUserContent" was only used for a short period of time or if
both are actually used; when in doubt, support both.
Antonio Ospite [Tue, 14 Feb 2017 08:41:35 +0000 (09:41 +0100)]
HACKING: add instructions about installing the Drupal style in PHP_CodeSniffer
Antonio Ospite [Thu, 9 Feb 2017 17:21:17 +0000 (18:21 +0100)]
Add the helper script tests/tweeper_file
The script allows scraping a local file, which speeds up development and
testing.
Antonio Ospite [Thu, 9 Feb 2017 17:15:54 +0000 (18:15 +0100)]
Add the helper script tests/fetch_facebook_page.sh
The script helps retrieve the actual html of a public page on
facebook.com, ignoring the pages which require the CAPTCHA.
This makes it possible to have a local copy of the page to test tweeper on.
Antonio Ospite [Thu, 9 Feb 2017 15:48:55 +0000 (16:48 +0100)]
Tweeper.php: allow to pass parameters to Tweeper::tweep()
This allows calling Tweeper::tweep() on file:// URLs which can make
development faster.
Antonio Ospite [Thu, 9 Feb 2017 14:49:59 +0000 (15:49 +0100)]
rss_converter_facebook.com.xsl: fix the URL of the channel image
David Kalnischkies [Wed, 8 Feb 2017 23:52:00 +0000 (00:52 +0100)]
rss_converter_facebook.com.xsl: new wrapper classname
Facebook seems to have changed the classname of the wrapping div
from "userContentWrapper" to "fbUserContent".
Antonio Ospite [Sun, 11 Dec 2016 09:23:20 +0000 (10:23 +0100)]
NEWS: add release notes for the v1.0.0 release
The release numbering scheme has been changed to match what composer
expects.
Antonio Ospite [Sat, 10 Dec 2016 23:38:14 +0000 (00:38 +0100)]
composer.json: make the dependencies on symfony components more relaxed
Antonio Ospite [Sat, 10 Dec 2016 21:01:47 +0000 (22:01 +0100)]
Makefile: mention DESTDIR in the "INSTALLATION COMPLETE" message
Antonio Ospite [Sat, 10 Dec 2016 20:59:19 +0000 (21:59 +0100)]
Makefile: make the symlink in BIN_DIR refer to the executable in DESTDIR
Also make the symlink relative, this way it is always valid whether
DESTDIR is specified or not.
Antonio Ospite [Sat, 10 Dec 2016 20:57:38 +0000 (21:57 +0100)]
Makefile: fix installation after the code restructuring
Antonio Ospite [Sat, 10 Dec 2016 18:34:57 +0000 (19:34 +0100)]
tweeper: allow running tweeper from vendor/bin also when it's not a symlink
Antonio Ospite [Sun, 6 Nov 2016 09:06:19 +0000 (10:06 +0100)]
autoload.php: improve the comment about the system-wide dependencies
Antonio Ospite [Sun, 6 Nov 2016 08:43:06 +0000 (09:43 +0100)]
TODO: add a note about the version of the dependencies in composer.json
Antonio Ospite [Sat, 5 Nov 2016 18:25:05 +0000 (19:25 +0100)]
Update copyright years in recently modified files
Antonio Ospite [Sat, 5 Nov 2016 16:55:56 +0000 (17:55 +0100)]
tweeper: allow to run tweeper either with or without composer
Antonio Ospite [Fri, 4 Nov 2016 12:18:08 +0000 (13:18 +0100)]
Add a composer.json file
Antonio Ospite [Fri, 4 Nov 2016 17:02:11 +0000 (18:02 +0100)]
rss_converters_*.xsl: prefix the namespace when calling Tweeper class methods
The Tweeper class is now in a namespace, without this change the XSLT
processor would give errors like this:
PHP Warning: XSLTProcessor::transformToXml(): Unable to call handler Tweeper::epochToRssDate() in .../src/Tweeper.php on line 356
Antonio Ospite [Fri, 4 Nov 2016 12:13:54 +0000 (13:13 +0100)]
tweeper: move the main Tweeper class to its own file under src/
This matches more closely the project structure expected by composer
packages.
Antonio Ospite [Fri, 4 Nov 2016 15:02:26 +0000 (16:02 +0100)]
TODO: improve wording and remove fullstops at the end of items
Antonio Ospite [Sun, 30 Oct 2016 10:34:22 +0000 (11:34 +0100)]
Fix information leakage by validating the URL scheme
Validate the scheme to prevent leaking information by abusing the
file:// scheme.
Before this change it was possible to see what files are available on
the system running tweeper.
The script in tests/test_information_leakage.sh shows the problem on
earlier versions.
Here is an execution with tweeper-0.6:
-----------------------------------------------------------------------
URL file://twitter.com//etc/passwd
--> /etc/passwd
exists
URL file://twitter.com//etc/file_with_an_unlikely_name
... /etc/file_with_an_unlikely_name
does not exist
Staring a test server
URL file://twitter.com//etc/passwd
--> /etc/passwd on http://localhost:8000
exists
URL file://twitter.com//etc/file_with_an_unlikely_name
... /etc/file_with_an_unlikely_name on http://localhost:8000
does not exist
Shutting down the test server
-----------------------------------------------------------------------
Here is an execution after this fix:
-----------------------------------------------------------------------
PHP Fatal error: unsupported scheme: file in /home/ao2/Proj/Tweeper/tweeper/tweeper.php on line 323
URL file://twitter.com//etc/passwd
... /etc/passwd
does not exist
PHP Fatal error: unsupported scheme: file in /home/ao2/Proj/Tweeper/tweeper/tweeper.php on line 323
URL file://twitter.com//etc/file_with_an_unlikely_name
... /etc/file_with_an_unlikely_name
does not exist
Staring a test server
URL file://twitter.com//etc/passwd
... /etc/passwd on http://localhost:8000
does not exist
URL file://twitter.com//etc/file_with_an_unlikely_name
... /etc/file_with_an_unlikely_name on http://localhost:8000
does not exist
Shutting down the test server
-----------------------------------------------------------------------
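A scheme check of this kind can be sketched in Python as below. The allow-list of `http` and `https` and the name `validate_scheme` are assumptions; only the "unsupported scheme" wording mirrors the output shown above:

```python
from urllib.parse import urlparse

# Assumed allow-list; the set actually accepted by Tweeper is not stated here.
ALLOWED_SCHEMES = {"http", "https"}


def validate_scheme(url):
    """Return the URL unchanged, or raise if its scheme is not allowed."""
    scheme = urlparse(url).scheme.lower()
    if scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"unsupported scheme: {scheme}")
    return url
```

Rejecting the URL before any request is made means a file:// URL can no longer be used to probe which files exist on the host.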
Antonio Ospite [Sun, 30 Oct 2016 09:28:41 +0000 (10:28 +0100)]
tweeper.php: check the return value of Tweeper::tweep()
If the tweep() method fails return 1 to the calling process so that it
can know that something failed.