Antonio Ospite [Wed, 6 Jun 2018 13:34:12 +0000 (15:34 +0200)]
src/rss_converter_twitter.com.xsl: fix getting description for hashtag pages
Antonio Ospite [Wed, 6 Jun 2018 12:59:30 +0000 (14:59 +0200)]
TODO: remove the entry about instagram tags, tweeper can now track them
Antonio Ospite [Wed, 6 Jun 2018 12:57:10 +0000 (14:57 +0200)]
src/rss_converter_instagram.com.xsl: add support for Instagram.com tags
Supporting Instagram tags is quite easy, so let's do it and while at it
refactor how the channel description is set depending on the kind of
page.
Antonio Ospite [Wed, 6 Jun 2018 12:46:07 +0000 (14:46 +0200)]
src/rss_converter_pump.io.xsl: fix getting the channel logo URL
Antonio Ospite [Wed, 6 Jun 2018 11:20:28 +0000 (13:20 +0200)]
Tweeper.php: bump version in the User-Agent string
By using a more recent version of the User-Agent, twitter.com will
return more entries in the result when visiting hashtag pages.
This makes tracking hashtag pages actually usable.
Tested with https://twitter.com/hashtag/tweeper
Antonio Ospite [Thu, 24 May 2018 21:43:16 +0000 (23:43 +0200)]
Tweeper.php: update the User-Agent string to fix parsing twitter.com
It looks like twitter.com started serving the mobile version of the site
to old browsers and Tweeper cannot parse that content.
By using a more up to date User-Agent string twitter.com returns the
desktop version of the page which Tweeper can process without problems.
Antonio Ospite [Tue, 3 Apr 2018 16:12:22 +0000 (18:12 +0200)]
rss_converter_instagram.com.xsl: don't put location coordinates in screen name
Remove location coordinates from the location screen name as the latter
also shows up in item titles, but still emit the coordinates in the
channel description.
Antonio Ospite [Tue, 3 Apr 2018 16:11:04 +0000 (18:11 +0200)]
rss_converter_instagram.com.xsl: use the screen name in item titles
The user name is not always defined, for example in case of locations,
so use the screen name in item titles.
Antonio Ospite [Tue, 3 Apr 2018 16:08:59 +0000 (18:08 +0200)]
rss_converter_instagram.com.xsl: fix scraping Instagram.com
Antonio Ospite [Fri, 16 Mar 2018 11:49:41 +0000 (12:49 +0100)]
rss_converter_twitter.com.xsl: show again the user name in the description
Having the user name also in the description makes it easier to see who
the author is in case of re-tweeted messages.
Leave the line-break after the username to have the actual message start
at the beginning of the line, this is done to preserve the formatting of
the original message as much as possible.
Antonio Ospite [Thu, 15 Mar 2018 08:04:15 +0000 (09:04 +0100)]
Update copyright years
Antonio Ospite [Thu, 15 Mar 2018 08:00:03 +0000 (09:00 +0100)]
INSTALL: add some notes about dependencies
Antonio Ospite [Thu, 15 Mar 2018 07:30:16 +0000 (08:30 +0100)]
INSTALL: explain better what "usable HTML" means in this context
Antonio Ospite [Sat, 24 Feb 2018 15:29:31 +0000 (16:29 +0100)]
NEWS: add release notes for the v1.2.0 release
Antonio Ospite [Sat, 24 Feb 2018 14:33:58 +0000 (15:33 +0100)]
rss_converter_instagram.com.xsl: fix validation for Instagram location feeds
Avoid outputting an <image/> element with an empty <url/>, as this
breaks validation.
Antonio Ospite [Fri, 23 Feb 2018 15:10:52 +0000 (16:10 +0100)]
Tweeper.php: a more robust fix for 4b9692a19e06f3cf698d23a3854fd34b9914a32a
The "qe" element in the json data is the one containing the problematic
element mentioned in commit 4b9692a19e06f3cf698d23a3854fd34b9914a32a and
it may contain multiple elements with problematic names, so just remove
the "qe" element altogether.
Antonio Ospite [Fri, 23 Feb 2018 14:34:02 +0000 (15:34 +0100)]
rss_converter_twitter.com.xsl: preserve spaces in tweet content
Wrap the tweet content into a span element with a CSS style attribute
set to "white-space: pre-wrap", so that the content is rendered
like on the twitter web page: with spaces and newlines preserved.
This is especially desirable if the tweet content contains any ASCII
art, like in https://twitter.com/sarahjeong/status/955651919279722496
Antonio Ospite [Fri, 23 Feb 2018 14:29:44 +0000 (15:29 +0100)]
rss_converter_twitter.com.xsl: add support for permalink URLs
This way it is possible to generate an RSS feed of all the replies to
a certain tweet using its permalink URL.
Antonio Ospite [Fri, 23 Feb 2018 13:55:10 +0000 (14:55 +0100)]
rss_converter_twitter.com.xsl: add a line break after the "(Video)" label
This is to start the actual original tweet content on a new line, this
is important for example if the content contains some ASCII art.
Antonio Ospite [Fri, 23 Feb 2018 13:48:06 +0000 (14:48 +0100)]
rss_converter_twitter.com.xsl: don't print the user name in description
This is in the spirit of leaving the tweet content untouched as much as
possible.
Antonio Ospite [Fri, 23 Feb 2018 13:43:57 +0000 (14:43 +0100)]
rss_converter_twitter.com.xsl: use a different rule to get the tweet user-name
Instead of looking for 'js-stream-tweet' in the class attribute, pick
the element which has the 'data-tweet-id' attribute, this is more
generic and works also with permalink tweets.
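A minimal Python sketch of the selection rule described above, not the
actual XSLT: it picks every element carrying a 'data-tweet-id' attribute,
whatever its class is. The sample markup and the 'data-screen-name'
attribute are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: one stream tweet, one permalink tweet, one unrelated div.
SAMPLE = """
<div>
  <div class="tweet js-stream-tweet" data-tweet-id="1" data-screen-name="alice"/>
  <div class="tweet permalink-tweet" data-tweet-id="2" data-screen-name="bob"/>
  <div class="other"/>
</div>
"""


def tweet_screen_names(markup):
    """Return the screen name of every element that has a data-tweet-id."""
    root = ET.fromstring(markup)
    # Match on the presence of the attribute instead of a class name.
    tweets = root.findall(".//*[@data-tweet-id]")
    return [el.get("data-screen-name") for el in tweets]


names = tweet_screen_names(SAMPLE)
```

Matching on the attribute selects both the stream tweet and the permalink
tweet, while a class-based match on 'js-stream-tweet' would miss the second.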
Antonio Ospite [Sun, 14 Jan 2018 18:46:54 +0000 (19:46 +0100)]
Tweeper.php: fix converting Instagram data to RSS
There is one new element in the json data served by Instagram named
"404_as_react", and this makes the conversion from json to XML fail
because names starting with a number are illegal in XML.
Fix the problem by prepending an underscore to the problematic name.
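The renaming can be sketched in Python as follows. This is an
illustration of the idea only, not the code in Tweeper.php; the helper
names xml_safe_name and sanitize_keys are hypothetical.

```python
import re


def xml_safe_name(name):
    """Prepend an underscore to names starting with a digit.

    XML element names cannot start with a number, so "404_as_react"
    becomes "_404_as_react".
    """
    return "_" + name if re.match(r"[0-9]", name) else name


def sanitize_keys(data):
    """Recursively rename the dictionary keys in decoded JSON data."""
    if isinstance(data, dict):
        return {xml_safe_name(key): sanitize_keys(value)
                for key, value in data.items()}
    if isinstance(data, list):
        return [sanitize_keys(item) for item in data]
    return data


cleaned = sanitize_keys({"404_as_react": True, "user": {"name": "x"}})
```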
Antonio Ospite [Mon, 6 Nov 2017 17:15:59 +0000 (18:15 +0100)]
rss_converter_facebook.com.xsl: fix channel link, image, and description
Antonio Ospite [Mon, 6 Nov 2017 16:53:42 +0000 (17:53 +0100)]
rss_converter_facebook.com.xsl: fix scraping facebook.com pages once again
Add back support for 'userContentWrapper' which seems to be still used.
Antonio Ospite [Mon, 6 Nov 2017 16:52:56 +0000 (17:52 +0100)]
TODO: add an entry about Instagram tags
Antonio Ospite [Mon, 11 Sep 2017 11:17:31 +0000 (13:17 +0200)]
rss_converter_facebook.com.xsl: fix scraping facebook.com pages once again
Tip: in order to get more posts, and not just the last two, append
"/posts" to the facebook page URL, or use the URL of the "See all" link
in the "Posts" section.
Antonio Ospite [Mon, 10 Jul 2017 08:29:01 +0000 (10:29 +0200)]
rss_converter_instagram.com.xsl: support scraping Instagram locations pages
Antonio Ospite [Mon, 10 Jul 2017 08:05:31 +0000 (10:05 +0200)]
rss_converter_instagram.com.xsl: improve the comment about full names
Antonio Ospite [Tue, 27 Jun 2017 10:01:14 +0000 (12:01 +0200)]
NEWS: add release notes for the v1.1.0 release
Antonio Ospite [Tue, 27 Jun 2017 08:59:31 +0000 (10:59 +0200)]
TODO: add an entry about the use of trigger_error()
Antonio Ospite [Tue, 27 Jun 2017 08:45:47 +0000 (10:45 +0200)]
Remove support for Howtoons.com, the old blog is not available anymore
Antonio Ospite [Thu, 22 Jun 2017 08:52:41 +0000 (10:52 +0200)]
Add an example of instrumentation to capture the HTML for later analysis
Antonio Ospite [Thu, 22 Jun 2017 08:47:35 +0000 (10:47 +0200)]
rss_converter_twitter.com.xsl: filter out promoted tweets
Antonio Ospite [Thu, 8 Jun 2017 13:35:27 +0000 (15:35 +0200)]
rss_converter_twitter.com.xsl: strip the style attribute from HTML elements
Elements in an RSS item description are not supposed to have a style
attribute, and they don't really need to anyways, so filter it out in
the identity template.
This also fixes an issue with Twitter images being shown with an offset
in liferea.
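The effect of the filter can be approximated in Python. The real change
lives in the XSLT identity template; this sketch only shows the same
result on a small, made-up fragment.

```python
import xml.etree.ElementTree as ET


def strip_style(markup):
    """Copy the markup unchanged, except for dropping style attributes."""
    root = ET.fromstring(markup)
    for element in root.iter():
        # pop() with a default is a no-op when the attribute is absent.
        element.attrib.pop("style", None)
    return ET.tostring(root, encoding="unicode")


# Hypothetical description fragment with inline styles.
cleaned = strip_style('<p style="margin: 0"><img src="x.png" style="top: -3px"/></p>')
```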
Antonio Ospite [Wed, 8 Mar 2017 08:20:01 +0000 (09:20 +0100)]
rss_converter_facebook.com.xsl: match both the new and the old wrapper class
Facebook still seems to use the "userContentWrapper" sometimes, it's not
clear if "fbUserContent" was only used for a short period of time or if
both are actually used; when in doubt, support both.
Antonio Ospite [Tue, 14 Feb 2017 08:41:35 +0000 (09:41 +0100)]
HACKING: add instructions about installing the Drupal style in PHP_CodeSniffer
Antonio Ospite [Thu, 9 Feb 2017 17:21:17 +0000 (18:21 +0100)]
Add the helper script tests/tweeper_file
The script allows scraping a local file, which speeds up development and
testing.
Antonio Ospite [Thu, 9 Feb 2017 17:15:54 +0000 (18:15 +0100)]
Add the helper script tests/fetch_facebook_page.sh
The script helps retrieve the actual HTML of a public page on
facebook.com, ignoring the pages which require the CAPTCHA.
This makes it possible to keep a local copy of the page to test tweeper on.
Antonio Ospite [Thu, 9 Feb 2017 15:48:55 +0000 (16:48 +0100)]
Tweeper.php: allow passing parameters to Tweeper::tweep()
This allows calling Tweeper::tweep() on file:// URLs, which can make
development faster.
Antonio Ospite [Thu, 9 Feb 2017 14:49:59 +0000 (15:49 +0100)]
rss_converter_facebook.com.xsl: fix the URL of the channel image
David Kalnischkies [Wed, 8 Feb 2017 23:52:00 +0000 (00:52 +0100)]
rss_converter_facebook.com.xsl: new wrapper classname
Facebook seems to have changed the classname of the wrapping div
from "userContentWrapper" to "fbUserContent".
Antonio Ospite [Sun, 11 Dec 2016 09:23:20 +0000 (10:23 +0100)]
NEWS: add release notes for the v1.0.0 release
The release numbering scheme has been changed to match what composer
expects.
Antonio Ospite [Sat, 10 Dec 2016 23:38:14 +0000 (00:38 +0100)]
composer.json: make the dependencies on symfony components more relaxed
Antonio Ospite [Sat, 10 Dec 2016 21:01:47 +0000 (22:01 +0100)]
Makefile: mention DESTDIR in the "INSTALLATION COMPLETE" message
Antonio Ospite [Sat, 10 Dec 2016 20:59:19 +0000 (21:59 +0100)]
Makefile: make the symlink in BIN_DIR refer to the executable in DESTDIR
Also make the symlink relative, this way it is always valid whether
DESTDIR is specified or not.
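Why a relative symlink stays valid can be shown with a short Python
sketch. The install paths below are hypothetical and are not taken from
the Makefile.

```python
import posixpath


def relative_link_target(executable, bin_dir):
    """Return the symlink target for `executable`, relative to `bin_dir`."""
    return posixpath.relpath(executable, start=bin_dir)


# Hypothetical locations, without and with a staging DESTDIR of /tmp/stage.
plain = relative_link_target("/usr/share/tweeper/tweeper", "/usr/bin")
staged = relative_link_target("/tmp/stage/usr/share/tweeper/tweeper",
                              "/tmp/stage/usr/bin")
```

Both calls give the same target, so the link does not embed the staging
directory and keeps working once the files are moved to their final place.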
Antonio Ospite [Sat, 10 Dec 2016 20:57:38 +0000 (21:57 +0100)]
Makefile: fix installation after the code restructuring
Antonio Ospite [Sat, 10 Dec 2016 18:34:57 +0000 (19:34 +0100)]
tweeper: allow running tweeper from vendor/bin also when it's not a symlink
Antonio Ospite [Sun, 6 Nov 2016 09:06:19 +0000 (10:06 +0100)]
autoload.php: improve the comment about the system-wide dependencies
Antonio Ospite [Sun, 6 Nov 2016 08:43:06 +0000 (09:43 +0100)]
TODO: add a note about the version of the dependencies in composer.json
Antonio Ospite [Sat, 5 Nov 2016 18:25:05 +0000 (19:25 +0100)]
Update copyright years in recently modified files
Antonio Ospite [Sat, 5 Nov 2016 16:55:56 +0000 (17:55 +0100)]
tweeper: allow running tweeper either with or without composer
Antonio Ospite [Fri, 4 Nov 2016 12:18:08 +0000 (13:18 +0100)]
Add a composer.json file
Antonio Ospite [Fri, 4 Nov 2016 17:02:11 +0000 (18:02 +0100)]
rss_converters_*.xsl: prefix the namespace when calling Tweeper class methods
The Tweeper class is now in a namespace, without this change the XSLT
processor would give errors like this:
PHP Warning: XSLTProcessor::transformToXml(): Unable to call handler Tweeper::epochToRssDate() in .../src/Tweeper.php on line 356
Antonio Ospite [Fri, 4 Nov 2016 12:13:54 +0000 (13:13 +0100)]
tweeper: move the main Tweeper class to its own file under src/
This matches more closely the project structure expected by composer
packages.
Antonio Ospite [Fri, 4 Nov 2016 15:02:26 +0000 (16:02 +0100)]
TODO: improve wording and remove full stops at the end of items
Antonio Ospite [Sun, 30 Oct 2016 10:34:22 +0000 (11:34 +0100)]
Fix information leakage by validating the URL scheme
Validate the scheme to prevent leaking information by abusing the
file:// scheme.
Before this change it was possible to see what files are available on
the system running tweeper.
The script in tests/test_information_leakage.sh shows the problem on
earlier versions.
Here is an execution with tweeper-0.6:
-----------------------------------------------------------------------
URL file://twitter.com//etc/passwd
--> /etc/passwd
exists
URL file://twitter.com//etc/file_with_an_unlikely_name
... /etc/file_with_an_unlikely_name
does not exist
Staring a test server
URL file://twitter.com//etc/passwd
--> /etc/passwd on http://localhost:8000
exists
URL file://twitter.com//etc/file_with_an_unlikely_name
... /etc/file_with_an_unlikely_name on http://localhost:8000
does not exist
Shutting down the test server
-----------------------------------------------------------------------
Here is an execution after this fix:
-----------------------------------------------------------------------
PHP Fatal error: unsupported scheme: file in /home/ao2/Proj/Tweeper/tweeper/tweeper.php on line 323
URL file://twitter.com//etc/passwd
... /etc/passwd
does not exist
PHP Fatal error: unsupported scheme: file in /home/ao2/Proj/Tweeper/tweeper/tweeper.php on line 323
URL file://twitter.com//etc/file_with_an_unlikely_name
... /etc/file_with_an_unlikely_name
does not exist
Staring a test server
URL file://twitter.com//etc/passwd
... /etc/passwd on http://localhost:8000
does not exist
URL file://twitter.com//etc/file_with_an_unlikely_name
... /etc/file_with_an_unlikely_name on http://localhost:8000
does not exist
Shutting down the test server
-----------------------------------------------------------------------
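The kind of check described above can be sketched in Python. This is not
the PHP code from the commit, and the set of allowed schemes is an
assumption; only the rejection of file:// and the error text come from
the log above.

```python
from urllib.parse import urlsplit

# Assumption: only web URLs are meant to be fetched.
ALLOWED_SCHEMES = {"http", "https"}


def validate_scheme(url):
    """Return the URL unchanged, or raise if its scheme is not allowed."""
    scheme = urlsplit(url).scheme.lower()
    if scheme not in ALLOWED_SCHEMES:
        raise ValueError("unsupported scheme: " + scheme)
    return url


checked = validate_scheme("https://twitter.com/hashtag/tweeper")
```

Validating before any fetch means a URL such as
file://twitter.com//etc/passwd is rejected the same way whether or not
the file exists, so the outcome no longer reveals anything about the
local file system.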
Antonio Ospite [Sun, 30 Oct 2016 09:28:41 +0000 (10:28 +0100)]
tweeper.php: check the return value of Tweeper::tweep()
If the tweep() method fails return 1 to the calling process so that it
can know that something failed.
Antonio Ospite [Sun, 30 Oct 2016 09:04:51 +0000 (10:04 +0100)]
tweeper.php: check curl_exec() return value
Also show the message of curl_error() when curl_exec() fails, this way
it's easier to diagnose problems.
Antonio Ospite [Sat, 29 Oct 2016 17:34:10 +0000 (19:34 +0200)]
tweeper.php: support "application/pdf" as an enclosure content type
Antonio Ospite [Sat, 29 Oct 2016 17:17:00 +0000 (19:17 +0200)]
tweeper.php: support "application/octet-stream" as an enclosure content type
This allows binary attachment without a more specific content type to be
supported for the enclosure element.
Adding "application/octet-stream" also covers the weird case of servers
sending multiple Content-Type headers, e.g.:
< HTTP/1.1 200 OK
< Server: Apache
< ETag: "a46d495ba00c35580f83344dd523ece2:1473631283"
< Last-Modified: Sun, 11 Sep 2016 22:01:22 GMT
< Accept-Ranges: bytes
< Content-Length: 14346711
< Content-Type: audio/mpeg
< Content-Type: application/octet-stream
< content-disposition: attachment
< Date: Mon, 26 Sep 2016 23:36:11 GMT
< Connection: keep-alive
< Content-Type: application/octet-stream
< content-disposition: attachment
In this case the ideal solution would be to use the more _specific_
content type, but CURL just remembers the _last_ one and it's not really
worth parsing the HTTP headers in tweeper just for this rare scenario.
Reported-by: Torsten Grote
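For illustration only, here is a Python sketch of the alternative that
this commit decided not to implement: parsing the headers and preferring
the more specific content type. The function names are hypothetical, and
last_content_type() merely mimics the "last one wins" behaviour described
above.

```python
GENERIC_TYPE = "application/octet-stream"


def content_types(raw_headers):
    """Collect the values of all Content-Type headers, in order."""
    types = []
    for line in raw_headers.splitlines():
        name, separator, value = line.partition(":")
        if separator and name.strip().lower() == "content-type":
            types.append(value.strip().lower())
    return types


def last_content_type(raw_headers):
    """Mimic a client that only remembers the last Content-Type header."""
    types = content_types(raw_headers)
    return types[-1] if types else None


def most_specific_content_type(raw_headers):
    """Prefer the first non-generic type, falling back to the last one."""
    types = content_types(raw_headers)
    specific = [t for t in types if t != GENERIC_TYPE]
    if specific:
        return specific[0]
    return types[-1] if types else None


HEADERS = (
    "HTTP/1.1 200 OK\n"
    "Content-Type: audio/mpeg\n"
    "Content-Type: application/octet-stream\n"
    "content-disposition: attachment\n"
)
```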
Antonio Ospite [Mon, 23 May 2016 14:17:37 +0000 (16:17 +0200)]
NEWS: add release notes for the v0.6 release
Antonio Ospite [Mon, 23 May 2016 14:16:57 +0000 (16:16 +0200)]
rss_converter_instagram.com.xsl: strip unneeded trailing space
Antonio Ospite [Mon, 23 May 2016 13:55:34 +0000 (15:55 +0200)]
Add a HACKING file to describe the coding style used in the project
Antonio Ospite [Mon, 23 May 2016 13:45:43 +0000 (15:45 +0200)]
INSTALL: mention php-symfony-property-access as a dependency
Antonio Ospite [Mon, 23 May 2016 13:32:27 +0000 (15:32 +0200)]
tweeper.php: fix a problem with https URLs ending up in the enclosure element
When the remote host forces every URL to be redirected to https, the
info returned by Tweeper::getUrlInfo() would contain an https URL, which
will end up being used in the "url" attribute of the enclosure element,
and this is invalid according to the RSS specification.
So make sure that an http URL is actually used for the "url" attribute.
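A minimal Python sketch of the scheme rewrite, assuming that the same
resource is also reachable over plain http. The helper name
enclosure_url is hypothetical and this is not the code in tweeper.php.

```python
from urllib.parse import urlsplit, urlunsplit


def enclosure_url(url):
    """Return the URL with an http scheme, leaving the rest untouched."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        parts = parts._replace(scheme="http")
    return urlunsplit(parts)


rewritten = enclosure_url("https://example.com/audio/episode.mp3?x=1")
```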
Antonio Ospite [Mon, 23 May 2016 13:28:07 +0000 (15:28 +0200)]
Use php:functionString() in the stylesheets where appropriate
Instead of converting to string in XSL and then calling php:function(),
use directly php:functionString() when calling PHP functions which
actually expect a string argument.
Besides possible performance improvements, this is done mainly for
readability.
Antonio Ospite [Mon, 23 May 2016 12:58:45 +0000 (14:58 +0200)]
Don't convert the timestamp to a number when calling php:functionString()
It's not needed to convert the $timestamp variable to a number before
passing it as an argument to php:functionString() as the latter will
convert it to a string anyways.
Antonio Ospite [Mon, 23 May 2016 12:52:18 +0000 (14:52 +0200)]
TODO: remove entry about using enclosure for pump.io images
Antonio Ospite [Mon, 23 May 2016 12:51:38 +0000 (14:51 +0200)]
rss_converter_pump.io.xsl: add enclosure element for images
Antonio Ospite [Fri, 20 May 2016 16:25:28 +0000 (18:25 +0200)]
TODO: remove the item about enclosures for dilbert.com, now they are supported
Antonio Ospite [Fri, 20 May 2016 16:17:59 +0000 (18:17 +0200)]
rss_converter_instagram.com.xsl: don't use a template for the enclosure
Each post has exactly one image, it is enough to copy the generated
element in-place without applying templates.
Antonio Ospite [Fri, 20 May 2016 16:15:29 +0000 (18:15 +0200)]
rss_converter_dilbert.com.xsl: add support for the <enclosure/> element
Antonio Ospite [Fri, 20 May 2016 16:06:32 +0000 (18:06 +0200)]
TODO: add an entry about adding direct links to Instagram videos
Antonio Ospite [Fri, 20 May 2016 16:05:29 +0000 (18:05 +0200)]
rss_converter_instagram.com.xsl: make images adapt to the viewer width
This way the user does not have to scroll horizontally to see the whole
picture.
Antonio Ospite [Fri, 20 May 2016 16:01:44 +0000 (18:01 +0200)]
rss_converter_instagram.com.xsl: use a stricter match for some elements
Antonio Ospite [Fri, 20 May 2016 16:00:50 +0000 (18:00 +0200)]
rss_converter_instagram.com.xsl: fix the channel link
Antonio Ospite [Fri, 20 May 2016 12:40:31 +0000 (14:40 +0200)]
TODO: update the entry about twitter images and cards, images are now supported
Antonio Ospite [Fri, 20 May 2016 12:04:03 +0000 (14:04 +0200)]
rss_converter_twitter.com.xsl: generate enclosure for images
Antonio Ospite [Fri, 20 May 2016 11:57:26 +0000 (13:57 +0200)]
rss_converter_twitter.com.xsl: show explicitly if the item has a video
Tweeper does not provide direct links to videos, so it's useful to tell
users that the content has a video so they can follow the link and view
it on the twitter.com page.
Antonio Ospite [Fri, 20 May 2016 11:48:00 +0000 (13:48 +0200)]
rss_converter_twitter.com.xsl: don't repeat background in embedded media
Tweeper doesn't provide direct links to videos and vines from twitter
but it still shows the preview picture provided by the original HTML
code; unfortunately the picture repeats itself, so avoid that.
Antonio Ospite [Fri, 20 May 2016 11:33:01 +0000 (13:33 +0200)]
rss_converter_twitter.com.xsl: present images in a more convenient way
Make images clickable and pointing to the original full-size picture.
Antonio Ospite [Fri, 20 May 2016 11:18:42 +0000 (13:18 +0200)]
rss_converter_twitter.com.xsl: show media content in the feed item description
Antonio Ospite [Fri, 20 May 2016 09:10:43 +0000 (11:10 +0200)]
rss_converter_twitter.com.xsl: use direct URLs for links, when possible
Replace the t.co URLs with the actual location the link was originally
meant to point to.
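The replacement can be sketched in Python as below. The
'data-expanded-url' attribute is the one named in the enclosure template
commit of this log; the sample markup is made up and the sketch is not
the actual stylesheet.

```python
import xml.etree.ElementTree as ET


def expand_links(markup):
    """Point each link at its expanded URL when one is available."""
    root = ET.fromstring(markup)
    for anchor in root.iter("a"):
        expanded = anchor.get("data-expanded-url")
        if expanded:
            anchor.set("href", expanded)
    return root


# Hypothetical tweet fragment: the second link has no expanded URL.
SAMPLE = (
    '<p>'
    '<a href="https://t.co/abc" data-expanded-url="https://example.org/page">a</a>'
    '<a href="https://t.co/def">b</a>'
    '</p>'
)
hrefs = [a.get("href") for a in expand_links(SAMPLE).iter("a")]
```

Links without an expanded URL keep their t.co address, which matches the
"when possible" in the subject line.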
Antonio Ospite [Wed, 18 May 2016 11:28:01 +0000 (13:28 +0200)]
rss_converter_twitter.com.xsl: add a mode attribute to the enclosure template
This is needed because another template with
match="a[@data-expanded-url]" will be added in a future commit.
Antonio Ospite [Wed, 18 May 2016 20:41:54 +0000 (22:41 +0200)]
rss_converter_twitter.com.xsl: cleanup titles
Prepend a white space in front of some URLs (those not preceded by an
open parenthesis) because otherwise they get rendered attached to the
preceding text.
Also strip non-breaking spaces and horizontal ellipses, they are not
needed because the RSS feed shows the full URLs.
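A rough Python equivalent of the title cleanup, for illustration only.
One assumption goes beyond the text above: the space is inserted only
when the URL directly follows a non-space character other than "(", so
that no double spaces are produced.

```python
import re

NBSP = "\u00a0"      # non-breaking space
ELLIPSIS = "\u2026"  # horizontal ellipsis


def cleanup_title(title):
    """Strip the unneeded characters and detach URLs from preceding text."""
    title = title.replace(NBSP, "").replace(ELLIPSIS, "")
    # Zero-width match: after a character that is neither whitespace nor
    # "(", and right before a URL.
    return re.sub(r"(?<=[^\s(])(?=https?://)", " ", title)


cleaned = cleanup_title("Read this" + NBSP + "https://example.org/a" + ELLIPSIS)
```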
Antonio Ospite [Wed, 18 May 2016 20:38:35 +0000 (22:38 +0200)]
tweeper.php: add "image/png" to the supported types for <enclosure/>
Antonio Ospite [Wed, 18 May 2016 20:35:28 +0000 (22:35 +0200)]
tweeper.php: rename DomDocument() to DOMDocument()
DOMDocument() is the more used form, it is also already used in some
other parts of the file.
Antonio Ospite [Wed, 18 May 2016 20:32:06 +0000 (22:32 +0200)]
Return a DOMElement instead of a string in Tweeper::generateEnclosure()
This makes the generated XML have proper indentation in case the
<enclosure/> element gets added.
Antonio Ospite [Wed, 18 May 2016 10:10:15 +0000 (12:10 +0200)]
tweeper.php: move the loadStylesheet() method further down in the file
This way all static methods are grouped together before non-static
methods.
Antonio Ospite [Wed, 18 May 2016 09:13:45 +0000 (11:13 +0200)]
tweeper.php: write XML in upper case inside comments
Antonio Ospite [Wed, 18 May 2016 09:06:34 +0000 (11:06 +0200)]
tweeper.php: make jsonToXml() a static method
Antonio Ospite [Wed, 18 May 2016 08:56:40 +0000 (10:56 +0200)]
tweeper.php: make logXmlError() a static method
Antonio Ospite [Tue, 17 May 2016 22:08:15 +0000 (00:08 +0200)]
tweeper.php: make it clearer that getUrlContents is a static method
Antonio Ospite [Tue, 17 May 2016 22:05:22 +0000 (00:05 +0200)]
tweeper.php: fix naming conventions for the get_xml_ and preprocess_html_ funcs
Antonio Ospite [Tue, 17 May 2016 21:37:35 +0000 (23:37 +0200)]
tweeper.php: rename the ERROR_STREAM variable to error_stream
Variables should be in lower case.
Antonio Ospite [Tue, 17 May 2016 21:28:45 +0000 (23:28 +0200)]
Use more accurate names for the date conversion functions
The new names are epochToRssDate and strToRssDate.
Don't refer to gmdate() in the function names, this is just an
implementation detail which should not have leaked into the external
interface, instead mention RssDate in the function names to communicate
something about the output they produce.
Also, while at it, use the DATE_RSS format when calling gmdate().
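For comparison, a Python sketch that produces the same kind of output as
gmdate() with the DATE_RSS format, that is an RFC 822 style date in UTC.
The function name mirrors epochToRssDate but is otherwise hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def epoch_to_rss_date(epoch):
    """Convert a Unix timestamp to an RSS date string in UTC."""
    moment = datetime.fromtimestamp(int(epoch), tz=timezone.utc)
    # format_datetime() is locale independent and emits a "+0000" offset
    # for timezone-aware UTC datetimes.
    return format_datetime(moment)


start = epoch_to_rss_date(0)
```

Naming the function after the output format keeps the caller unaware of
which library routine does the conversion.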
Antonio Ospite [Tue, 17 May 2016 21:15:51 +0000 (23:15 +0200)]
Rename epoch_to_gmdate to epochToGmdate, and str_to_gmdate to strToGmdate
Antonio Ospite [Tue, 17 May 2016 21:11:52 +0000 (23:11 +0200)]
tweeper.php: rename getContents to getUrlContents and getInfo to getUrlInfo
The new names should be more self-explanatory.
Antonio Ospite [Tue, 17 May 2016 21:07:57 +0000 (23:07 +0200)]
tweeper.php: use lowerCamel case for method names
Antonio Ospite [Tue, 17 May 2016 21:04:48 +0000 (23:04 +0200)]
Fix naming conventions for the generate_enclosure function
Use lowerCamel case for the function name, keep snake_case for the local
variable in the php code, and use a dash-separated case for the xsl
variable.