tweeper.git
Antonio Ospite [Mon, 11 Sep 2017 11:17:31 +0000 (13:17 +0200)]
rss_converter_facebook.com.xsl: fix scraping facebook.com pages once again

Tip: in order to get more posts, and not just the last two, append
"/posts" to the facebook page URL, or use the URL of the "See all" link
in the "Posts" section.

Antonio Ospite [Mon, 10 Jul 2017 08:29:01 +0000 (10:29 +0200)]
rss_converter_instagram.com.xsl: support scraping Instagram locations pages

Antonio Ospite [Mon, 10 Jul 2017 08:05:31 +0000 (10:05 +0200)]
rss_converter_instagram.com.xsl: improve the comment about full names

(tag: v1.1.0)
Antonio Ospite [Tue, 27 Jun 2017 10:01:14 +0000 (12:01 +0200)]
NEWS: add release notes for the v1.1.0 release

Antonio Ospite [Tue, 27 Jun 2017 08:59:31 +0000 (10:59 +0200)]
TODO: add an entry about the use of trigger_error()

Antonio Ospite [Tue, 27 Jun 2017 08:45:47 +0000 (10:45 +0200)]
Remove support for Howtoons.com, the old blog is not available anymore

Antonio Ospite [Thu, 22 Jun 2017 08:52:41 +0000 (10:52 +0200)]
Add an example of instrumentation to capture the HTML for later analysis

Antonio Ospite [Thu, 22 Jun 2017 08:47:35 +0000 (10:47 +0200)]
rss_converter_twitter.com.xsl: filter out promoted tweets

Antonio Ospite [Thu, 8 Jun 2017 13:35:27 +0000 (15:35 +0200)]
rss_converter_twitter.com.xsl: strip the style attribute from HTML elements

Elements in an RSS item description are not supposed to have a style
attribute, and they don't really need one anyway, so filter it out in
the identity template.

This also fixes an issue with Twitter images being shown with an offset
in Liferea.
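
For illustration only, the effect of filtering out the style attribute can
be sketched in Python. This is not the XSLT identity template from the
stylesheet; strip_style_attributes() and the sample markup are hypothetical.

```python
import xml.etree.ElementTree as ET


def strip_style_attributes(fragment):
    """Return the XML fragment with every style attribute removed."""
    root = ET.fromstring(fragment)
    for element in root.iter():
        # pop() with a default is a no-op when the attribute is absent.
        element.attrib.pop("style", None)
    return ET.tostring(root, encoding="unicode")


# Hypothetical markup, loosely modelled on an image with an inline offset.
cleaned = strip_style_attributes(
    '<div><img src="a.jpg" style="margin-top: -50px"/><p>text</p></div>'
)
print(cleaned)
```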

Antonio Ospite [Wed, 8 Mar 2017 08:20:01 +0000 (09:20 +0100)]
rss_converter_facebook.com.xsl: match both the new and the old wrapper class

Facebook still seems to use "userContentWrapper" sometimes; it's not
clear whether "fbUserContent" was only used for a short period of time or
whether both are actually in use. When in doubt, support both.
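
A rough Python sketch of matching either wrapper class follows. The actual
stylesheet uses an XPath match; the token-based check, find_wrappers() and
the sample page here are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# The two class names come from the commit messages.
WRAPPER_CLASSES = {"userContentWrapper", "fbUserContent"}


def find_wrappers(root):
    """Return the div elements carrying either wrapper class."""
    return [
        div for div in root.iter("div")
        if WRAPPER_CLASSES & set(div.get("class", "").split())
    ]


# Hypothetical page fragment with one old-style and one new-style wrapper.
page = ET.fromstring(
    "<body>"
    '<div class="userContentWrapper x">old</div>'
    '<div class="fbUserContent">new</div>'
    '<div class="other">skip</div>'
    "</body>"
)
texts = [div.text for div in find_wrappers(page)]
print(texts)
```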

Antonio Ospite [Tue, 14 Feb 2017 08:41:35 +0000 (09:41 +0100)]
HACKING: add instructions about installing the Drupal style in PHP_CodeSniffer

Antonio Ospite [Thu, 9 Feb 2017 17:21:17 +0000 (18:21 +0100)]
Add the helper script tests/tweeper_file

The script scrapes a local file, which speeds up development and
testing.

Antonio Ospite [Thu, 9 Feb 2017 17:15:54 +0000 (18:15 +0100)]
Add the helper script tests/fetch_facebook_page.sh

The script helps retrieve the actual HTML of a public page on
facebook.com, ignoring the pages which require the CAPTCHA.

This makes it possible to keep a local copy of the page to test tweeper on.

Antonio Ospite [Thu, 9 Feb 2017 15:48:55 +0000 (16:48 +0100)]
Tweeper.php: allow to pass parameters to Tweeper::tweep()

This allows calling Tweeper::tweep() on file:// URLs, which can make
development faster.

Antonio Ospite [Thu, 9 Feb 2017 14:49:59 +0000 (15:49 +0100)]
rss_converter_facebook.com.xsl: fix the URL of the channel image

David Kalnischkies [Wed, 8 Feb 2017 23:52:00 +0000 (00:52 +0100)]
rss_converter_facebook.com.xsl: new wrapper classname

Facebook seems to have changed the classname of the wrapping div
from "userContentWrapper" to "fbUserContent".

(tag: v1.0.0)
Antonio Ospite [Sun, 11 Dec 2016 09:23:20 +0000 (10:23 +0100)]
NEWS: add release notes for the v1.0.0 release

The release numbering scheme has been changed to match what composer
expects.

Antonio Ospite [Sat, 10 Dec 2016 23:38:14 +0000 (00:38 +0100)]
composer.json: make the dependencies on symfony components more relaxed

Antonio Ospite [Sat, 10 Dec 2016 21:01:47 +0000 (22:01 +0100)]
Makefile: mention DESTDIR in the "INSTALLATION COMPLETE" message

Antonio Ospite [Sat, 10 Dec 2016 20:59:19 +0000 (21:59 +0100)]
Makefile: make the symlink in BIN_DIR refer to the executable in DESTDIR

Also make the symlink relative, so that it is always valid whether
DESTDIR is specified or not.
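
Why a relative link survives staging can be shown with a small Python
sketch. The directory layout below is hypothetical and is not taken from
the Makefile; only the idea of computing the target relative to BIN_DIR
comes from the commit.

```python
import posixpath


def relative_symlink_target(executable, bin_dir):
    """Return the link target for a symlink placed in bin_dir."""
    return posixpath.relpath(executable, start=bin_dir)


# Hypothetical layout: the same tree, installed directly and under a
# staging directory playing the role of DESTDIR.
plain = relative_symlink_target("/usr/share/tweeper/tweeper", "/usr/bin")
staged = relative_symlink_target(
    "/tmp/stage/usr/share/tweeper/tweeper", "/tmp/stage/usr/bin"
)
print(plain, staged)
```

Both calls yield the same target, so the link stays valid after the staged
tree is moved to its final location.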

Antonio Ospite [Sat, 10 Dec 2016 20:57:38 +0000 (21:57 +0100)]
Makefile: fix installation after the code restructuring

Antonio Ospite [Sat, 10 Dec 2016 18:34:57 +0000 (19:34 +0100)]
tweeper: allow running tweeper from vendor/bin also when it's not a symlink

Antonio Ospite [Sun, 6 Nov 2016 09:06:19 +0000 (10:06 +0100)]
autoload.php: improve the comment about the system-wide dependencies

Antonio Ospite [Sun, 6 Nov 2016 08:43:06 +0000 (09:43 +0100)]
TODO: add a note about the version of the dependencies in composer.json

Antonio Ospite [Sat, 5 Nov 2016 18:25:05 +0000 (19:25 +0100)]
Update copyright years in recently modified files

Antonio Ospite [Sat, 5 Nov 2016 16:55:56 +0000 (17:55 +0100)]
tweeper: allow to run tweeper either with or without composer

Antonio Ospite [Fri, 4 Nov 2016 12:18:08 +0000 (13:18 +0100)]
Add a composer.json file

Antonio Ospite [Fri, 4 Nov 2016 17:02:11 +0000 (18:02 +0100)]
rss_converters_*.xsl: prefix the namespace when calling Tweeper class methods

The Tweeper class is now in a namespace; without this change the XSLT
processor would give errors like this:

PHP Warning:  XSLTProcessor::transformToXml(): Unable to call handler Tweeper::epochToRssDate() in .../src/Tweeper.php on line 356

Antonio Ospite [Fri, 4 Nov 2016 12:13:54 +0000 (13:13 +0100)]
tweeper: move the main Tweeper class to its own file under src/

This more closely matches the project structure expected by composer
packages.

Antonio Ospite [Fri, 4 Nov 2016 15:02:26 +0000 (16:02 +0100)]
TODO: improve wording and remove fullstops at the end of items

Antonio Ospite [Sun, 30 Oct 2016 10:34:22 +0000 (11:34 +0100)]
Fix information leakage by validating the URL scheme

Validate the scheme to prevent leaking information by abusing the
file:// scheme.

Before this change it was possible to probe which files exist on
the system running tweeper.

The script in tests/test_information_leakage.sh shows the problem on
earlier versions.

Here is an execution with tweeper-0.6:

-----------------------------------------------------------------------
URL file://twitter.com//etc/passwd
--> /etc/passwd
    exists

URL file://twitter.com//etc/file_with_an_unlikely_name
... /etc/file_with_an_unlikely_name
    does not exist

Staring a test server

URL file://twitter.com//etc/passwd
--> /etc/passwd on http://localhost:8000
    exists

URL file://twitter.com//etc/file_with_an_unlikely_name
... /etc/file_with_an_unlikely_name on http://localhost:8000
    does not exist

Shutting down the test server
-----------------------------------------------------------------------

Here is an execution after this fix:

-----------------------------------------------------------------------
PHP Fatal error:  unsupported scheme: file in /home/ao2/Proj/Tweeper/tweeper/tweeper.php on line 323
URL file://twitter.com//etc/passwd
... /etc/passwd
    does not exist

PHP Fatal error:  unsupported scheme: file in /home/ao2/Proj/Tweeper/tweeper/tweeper.php on line 323
URL file://twitter.com//etc/file_with_an_unlikely_name
... /etc/file_with_an_unlikely_name
    does not exist

Staring a test server

URL file://twitter.com//etc/passwd
... /etc/passwd on http://localhost:8000
    does not exist

URL file://twitter.com//etc/file_with_an_unlikely_name
... /etc/file_with_an_unlikely_name on http://localhost:8000
    does not exist

Shutting down the test server
-----------------------------------------------------------------------
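
The scheme check described in this commit could look roughly like the
following Python sketch. It is not the PHP code from tweeper; the
allow-list contents and the validate_scheme() name are assumptions, and
only the "unsupported scheme" message mirrors the output above.

```python
from urllib.parse import urlparse

# Assumption: only web schemes are accepted; the real list may differ.
ALLOWED_SCHEMES = {"http", "https"}


def validate_scheme(url):
    """Return url unchanged, or raise ValueError for a disallowed scheme."""
    scheme = urlparse(url).scheme.lower()
    if scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"unsupported scheme: {scheme}")
    return url


for candidate in ("https://twitter.com/ao2", "file://twitter.com//etc/passwd"):
    try:
        print("ok:", validate_scheme(candidate))
    except ValueError as error:
        print("rejected:", error)
```

Rejecting the URL before any fetch is attempted means that the response no
longer depends on whether the local file exists.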

Antonio Ospite [Sun, 30 Oct 2016 09:28:41 +0000 (10:28 +0100)]
tweeper.php: check the return value of Tweeper::tweep()

If the tweep() method fails, return 1 to the calling process so that it
knows that something failed.

Antonio Ospite [Sun, 30 Oct 2016 09:04:51 +0000 (10:04 +0100)]
tweeper.php: check curl_exec() return value

Also show the message from curl_error() when curl_exec() fails, which
makes it easier to diagnose problems.

Antonio Ospite [Sat, 29 Oct 2016 17:34:10 +0000 (19:34 +0200)]
tweeper.php: support "application/pdf" as an enclosure content type

Antonio Ospite [Sat, 29 Oct 2016 17:17:00 +0000 (19:17 +0200)]
tweeper.php: support "application/octet-stream" as an enclosure content type

This allows binary attachments without a more specific content type to be
supported for the enclosure element.

Adding "application/octet-stream" also covers the weird case of servers
sending multiple Content-Type headers, e.g.:

< HTTP/1.1 200 OK
< Server: Apache
< ETag: "a46d495ba00c35580f83344dd523ece2:1473631283"
< Last-Modified: Sun, 11 Sep 2016 22:01:22 GMT
< Accept-Ranges: bytes
< Content-Length: 14346711
< Content-Type: audio/mpeg
< Content-Type: application/octet-stream
< content-disposition: attachment
< Date: Mon, 26 Sep 2016 23:36:11 GMT
< Connection: keep-alive
< Content-Type: application/octet-stream
< content-disposition: attachment

In this case the ideal solution would be to use the more _specific_
content type, but CURL just remembers the _last_ one, and it's not really
worth parsing the HTTP headers in tweeper just for this rare scenario.

Reported-by: Torsten Grote

(tag: v0.6)
Antonio Ospite [Mon, 23 May 2016 14:17:37 +0000 (16:17 +0200)]
NEWS: add release notes for the v0.6 release

Antonio Ospite [Mon, 23 May 2016 14:16:57 +0000 (16:16 +0200)]
rss_converter_instagram.com.xsl: strip unneeded trailing space

Antonio Ospite [Mon, 23 May 2016 13:55:34 +0000 (15:55 +0200)]
Add a HACKING file to describe the coding style used in the project

Antonio Ospite [Mon, 23 May 2016 13:45:43 +0000 (15:45 +0200)]
INSTALL: mention php-symfony-property-access as a dependency

Antonio Ospite [Mon, 23 May 2016 13:32:27 +0000 (15:32 +0200)]
tweeper.php: fix a problem with https URLs ending up in the enclosure element

When the remote host forces every URL to be redirected to https, the
info returned by Tweeper::getUrlInfo() contains an https URL, which
ends up being used in the "url" attribute of the enclosure element,
and this is invalid according to the RSS specification.

So make sure that an http URL is actually used for the "url" attribute.
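
As a sketch of the idea, and not the PHP code in tweeper.php, the
downgrade of the scheme for the "url" attribute could be written like this
in Python. The enclosure_url() helper and the example URL are
hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def enclosure_url(url):
    """Return url with an https scheme replaced by http."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        # Only the scheme changes; host, path and query are kept as-is.
        parts = parts._replace(scheme="http")
    return urlunsplit(parts)


print(enclosure_url("https://example.com/audio/episode.mp3?dl=1"))
```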

Antonio Ospite [Mon, 23 May 2016 13:28:07 +0000 (15:28 +0200)]
Use php:functionString() in the stylesheets where appropriate

Instead of converting to a string in XSL and then calling php:function(),
use php:functionString() directly when calling PHP functions which
actually expect a string argument.

Besides possible performance improvements, this is done mainly for
readability.

Antonio Ospite [Mon, 23 May 2016 12:58:45 +0000 (14:58 +0200)]
Don't convert the timestamp to a number when calling php:functionString()

There is no need to convert the $timestamp variable to a number before
passing it as an argument to php:functionString(), as the latter will
convert it to a string anyway.

Antonio Ospite [Mon, 23 May 2016 12:52:18 +0000 (14:52 +0200)]
TODO: remove entry about using enclosure for pump.io images

Antonio Ospite [Mon, 23 May 2016 12:51:38 +0000 (14:51 +0200)]
rss_converter_pump.io.xsl: add enclosure element for images

Antonio Ospite [Fri, 20 May 2016 16:25:28 +0000 (18:25 +0200)]
TODO: remove the item about enclosures for dilbert.com, now they are supported

Antonio Ospite [Fri, 20 May 2016 16:17:59 +0000 (18:17 +0200)]
rss_converter_instagram.com.xsl: don't use a template for the enclosure

Each post has exactly one image, so it is enough to copy the generated
element in-place without applying templates.

Antonio Ospite [Fri, 20 May 2016 16:15:29 +0000 (18:15 +0200)]
rss_converter_dilbert.com.xsl: add support for the <enclosure/> element

Antonio Ospite [Fri, 20 May 2016 16:06:32 +0000 (18:06 +0200)]
TODO: add an entry about adding direct links to Instagram videos

Antonio Ospite [Fri, 20 May 2016 16:05:29 +0000 (18:05 +0200)]
rss_converter_instagram.com.xsl: make images adapt to the viewer width

This way the user does not have to scroll horizontally to see the whole
picture.

Antonio Ospite [Fri, 20 May 2016 16:01:44 +0000 (18:01 +0200)]
rss_converter_instagram.com.xsl: use a stricter match for some elements

Antonio Ospite [Fri, 20 May 2016 16:00:50 +0000 (18:00 +0200)]
rss_converter_instagram.com.xsl: fix the channel link

Antonio Ospite [Fri, 20 May 2016 12:40:31 +0000 (14:40 +0200)]
TODO: update the entry about twitter images and cards, images are now supported

Antonio Ospite [Fri, 20 May 2016 12:04:03 +0000 (14:04 +0200)]
rss_converter_twitter.com.xsl: generate enclosure for images

Antonio Ospite [Fri, 20 May 2016 11:57:26 +0000 (13:57 +0200)]
rss_converter_twitter.com.xsl: show explicitly if the item has a video

Tweeper does not provide direct links to videos, so it's useful to tell
users that the content has a video so they can follow the link and view
it on the twitter.com page.

Antonio Ospite [Fri, 20 May 2016 11:48:00 +0000 (13:48 +0200)]
rss_converter_twitter.com.xsl: don't repeat background in embedded media

Tweeper doesn't provide direct links to videos and vines from Twitter,
but it still shows the preview picture provided by the original HTML
code; unfortunately the picture is tiled as a repeating background, so
avoid that.

Antonio Ospite [Fri, 20 May 2016 11:33:01 +0000 (13:33 +0200)]
rss_converter_twitter.com.xsl: present images in a more convenient way

Make images clickable and point them to the original full-size picture.

Antonio Ospite [Fri, 20 May 2016 11:18:42 +0000 (13:18 +0200)]
rss_converter_twitter.com.xsl: show media content in the feed item description

Antonio Ospite [Fri, 20 May 2016 09:10:43 +0000 (11:10 +0200)]
rss_converter_twitter.com.xsl: use direct URLs for links, when possible

Replace the t.co URLs with the actual location the link was originally
meant to point to.
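
A minimal Python sketch of the replacement follows, assuming the expanded
location is carried by a data-expanded-url attribute on the link. The real
work is done by an XSLT template; expand_links() and the sample markup are
hypothetical.

```python
import xml.etree.ElementTree as ET


def expand_links(root):
    """Point each link at its expanded URL when one is available."""
    for link in root.iter("a"):
        expanded = link.get("data-expanded-url")
        if expanded:
            link.set("href", expanded)
    return root


# Hypothetical tweet text: one shortened link and one internal link.
tweet = ET.fromstring(
    '<p>See <a href="https://t.co/abc" '
    'data-expanded-url="https://ao2.it/">ao2.it</a> and '
    '<a href="/hashtag/rss">#rss</a></p>'
)
hrefs = [a.get("href") for a in expand_links(tweet).iter("a")]
print(hrefs)
```

Links without an expanded URL are left untouched, which matches the "when
possible" in the commit title.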

Antonio Ospite [Wed, 18 May 2016 11:28:01 +0000 (13:28 +0200)]
rss_converter_twitter.com.xsl: add a mode attribute to the enclosure template

This is needed because another template with
match="a[@data-expanded-url]" will be added in a future commit.

Antonio Ospite [Wed, 18 May 2016 20:41:54 +0000 (22:41 +0200)]
rss_converter_twitter.com.xsl: cleanup titles

Prepend a white space in front of some URLs (those not preceded by an
open parenthesis) because otherwise they get rendered attached to the
preceding text.

Also strip non-breaking spaces and horizontal ellipses; they are not
needed because the RSS feed shows the full URLs.
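
The title cleanup can be approximated in Python as below. The stylesheet
does this in XSLT; the regular expression and clean_title() are
assumptions about one way to express the same rules.

```python
import re

# A URL start that directly follows a character other than whitespace
# or an open parenthesis.
URL_START = re.compile(r"(?<=[^\s(])(?=https?://)")


def clean_title(title):
    """Drop non-breaking spaces and ellipses, then detach glued URLs."""
    title = title.replace("\u00a0", "").replace("\u2026", "")
    return URL_START.sub(" ", title)


print(clean_title("Read thishttps://ao2.it/\u00a0\u2026"))
print(clean_title("Home page (https://ao2.it/)"))
```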

Antonio Ospite [Wed, 18 May 2016 20:38:35 +0000 (22:38 +0200)]
tweeper.php: add "image/png" to the supported types for <enclosure/>

Antonio Ospite [Wed, 18 May 2016 20:35:28 +0000 (22:35 +0200)]
tweeper.php: rename DomDocument() to DOMDocument()

DOMDocument() is the more commonly used form, and it is already used in
some other parts of the file.

Antonio Ospite [Wed, 18 May 2016 20:32:06 +0000 (22:32 +0200)]
Return a DOMElement instead of a string in Tweeper::generateEnclosure()

This makes the generated XML have proper indentation in case the
<enclosure/> element gets added.

Antonio Ospite [Wed, 18 May 2016 10:10:15 +0000 (12:10 +0200)]
tweeper.php: move the loadStylesheet() method more down in the file

This way all static methods are grouped together before non-static
methods.

Antonio Ospite [Wed, 18 May 2016 09:13:45 +0000 (11:13 +0200)]
tweeper.php: write XML in upper case inside comments

Antonio Ospite [Wed, 18 May 2016 09:06:34 +0000 (11:06 +0200)]
tweeper.php: make jsonToXml() a static method

Antonio Ospite [Wed, 18 May 2016 08:56:40 +0000 (10:56 +0200)]
tweeper.php: make logXmlError() a static method

Antonio Ospite [Tue, 17 May 2016 22:08:15 +0000 (00:08 +0200)]
tweeper.php: make it clearer that getUrlContents is a static method

Antonio Ospite [Tue, 17 May 2016 22:05:22 +0000 (00:05 +0200)]
tweeper.php: fix naming conventions for the get_xml_ and preprocess_html_ funcs

Antonio Ospite [Tue, 17 May 2016 21:37:35 +0000 (23:37 +0200)]
tweeper.php: rename the ERROR_STREAM variable to error_stream

Variables should be in lower case.

Antonio Ospite [Tue, 17 May 2016 21:28:45 +0000 (23:28 +0200)]
Use more accurate names for the date conversion functions

The new names are epochToRssDate and strToRssDate.

Don't refer to gmdate() in the function names, this is just an
implementation detail which should not have leaked into the external
interface, instead mention RssDate in the function names to communicate
something about the output they produce.

Also, while at it, use the DATE_RSS format when calling gmdate().
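
For reference, a Python approximation of the two conversions is sketched
below. The PHP functions rely on gmdate() with DATE_RSS; the Python names
and the ISO 8601 input format accepted by str_to_rss_date() are
assumptions made for this example.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def epoch_to_rss_date(epoch):
    """Format a Unix timestamp as an RFC 822 style date in UTC."""
    moment = datetime.fromtimestamp(int(epoch), tz=timezone.utc)
    return format_datetime(moment)


def str_to_rss_date(text):
    """Format an ISO 8601 string (assumed input format) the same way."""
    moment = datetime.fromisoformat(text)
    if moment.tzinfo is None:
        # Assumption: naive timestamps are taken to be UTC already.
        moment = moment.replace(tzinfo=timezone.utc)
    return format_datetime(moment.astimezone(timezone.utc))


print(epoch_to_rss_date(0))
print(str_to_rss_date("2016-05-17T23:28:45+02:00"))
```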

Antonio Ospite [Tue, 17 May 2016 21:15:51 +0000 (23:15 +0200)]
Rename epoch_to_gmdate to epochToGmdate, and str_to_gmdate to strToGmdate

Antonio Ospite [Tue, 17 May 2016 21:11:52 +0000 (23:11 +0200)]
tweeper.php: rename getContents to getUrlContents and getInfo to getUrlInfo

The new names should be more descriptive.

Antonio Ospite [Tue, 17 May 2016 21:07:57 +0000 (23:07 +0200)]
tweeper.php: use lowerCamel case for method names

Antonio Ospite [Tue, 17 May 2016 21:04:48 +0000 (23:04 +0200)]
Fix naming conventions for the generate_enclosure function

Use lowerCamel case for the function name, keep snake_case for the local
variable in the php code, and use a dash-separated case for the xsl
variable.

Antonio Ospite [Tue, 17 May 2016 14:16:09 +0000 (16:16 +0200)]
tweeper.php: use lowerCamel case for class variables

Antonio Ospite [Tue, 17 May 2016 14:14:29 +0000 (16:14 +0200)]
tweeper.php: fix a typo s/Apparenty/Apparently/

Antonio Ospite [Tue, 17 May 2016 14:09:14 +0000 (16:09 +0200)]
tweeper.php: write the word "URL" in upper case

Antonio Ospite [Tue, 17 May 2016 14:06:39 +0000 (16:06 +0200)]
tweeper.php: fix style issues pointed out by PHP_CodeSniffer

These issues were fixed automatically by phpcbf with some minor manual
touches.

Antonio Ospite [Mon, 16 May 2016 13:35:58 +0000 (15:35 +0200)]
tweeper.php: use the same parenthesis style for all functions

Antonio Ospite [Mon, 16 May 2016 13:34:20 +0000 (15:34 +0200)]
tweeper: fix style issues found by Coder Sniffer

----------------------------------------------------------------------
FOUND 3 ERRORS AFFECTING 2 LINES
----------------------------------------------------------------------
 1 | ERROR | [x] Missing file doc comment
 4 | ERROR | [x] "require" is a statement not a function; no
   |       |     parentheses are required
 4 | ERROR | [x] Language constructs must be followed by a single
   |       |     space; expected "require (" but found "require("

Antonio Ospite [Mon, 16 May 2016 11:12:22 +0000 (13:12 +0200)]
tweeper.php: rename $rootName to $root_node_name

Antonio Ospite [Mon, 16 May 2016 11:10:28 +0000 (13:10 +0200)]
tweeper.php: make json_to_xml() do strictly what its name says

Extracting the json data from the HTML does not really belong to the
json_to_xml() function.

Antonio Ospite [Sun, 15 May 2016 14:48:28 +0000 (16:48 +0200)]
Use https in URLs for Twitter.com and ao2.it

Antonio Ospite [Sun, 15 May 2016 14:13:58 +0000 (16:13 +0200)]
rss_converter_twitter.com.xsl: fix getting the profile picture URL

Antonio Ospite [Sun, 15 May 2016 14:12:12 +0000 (16:12 +0200)]
rss_converter_dilbert.com.xsl: put the full text in the alt attribute

Antonio Ospite [Sun, 15 May 2016 14:03:29 +0000 (16:03 +0200)]
rss_converter_dilbert.com.xsl: ellipsize long titles

Antonio Ospite [Sun, 15 May 2016 13:52:10 +0000 (15:52 +0200)]
rss_converter_facebook.com.xsl: fix getting the item description

Hopefully this is a more stable way to get just the useful content of
a story, skipping the header and the footer.

Antonio Ospite [Sun, 15 May 2016 13:34:13 +0000 (15:34 +0200)]
rss_converter_facebook.com.xsl: fix the permalink

Use the page id and the story id to build a more robust permalink URL.

(tag: v0.5)
Antonio Ospite [Fri, 13 May 2016 16:12:56 +0000 (18:12 +0200)]
NEWS: add release notes for the v0.5 release

Antonio Ospite [Fri, 13 May 2016 16:09:13 +0000 (18:09 +0200)]
INSTALL: mention php-symfony-serializer instead of php-xml-serializer

Antonio Ospite [Fri, 13 May 2016 16:00:04 +0000 (18:00 +0200)]
Use the Symfony Serializer component instead of the PEAR XML_Serializer

XML_Serializer is old and unmaintained, and it is going to be removed
from Debian, so use a more robust and supported alternative.

Antonio Ospite [Mon, 30 Nov 2015 10:22:27 +0000 (11:22 +0100)]
rss_converter_twitter.com.xsl: restrict tweet selection some more

Only select elements which have the 'data-item-id' attribute; this way
we avoid picking up the image gallery at the top of hashtag pages, which
does not have an RSS item structure.

JFTR the gallery is inside an element like this:

  <li class="AdaptiveStreamImageGallery AdaptiveSearchTimeline-separationModule js-stream-item"
      data-item-type="tweet">

with no 'data-item-id'.
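
The selection rule can be tried out with a short Python sketch. The XPath
below is an assumption about how the restrictions combine and is not
copied from the stylesheet; the sample page is made up, with class names
shortened.

```python
import xml.etree.ElementTree as ET

# Tweets: items of the main stream that also carry a data-item-id.
TWEETS = (
    ".//ol[@id='stream-items-id']"
    "/li[@data-item-type='tweet'][@data-item-id]"
)

# Hypothetical page: a gallery entry, two tweets, and one popup entry.
page = ET.fromstring(
    "<body>"
    "<ol id='stream-items-id'>"
    "<li class='AdaptiveStreamImageGallery' data-item-type='tweet'/>"
    "<li data-item-type='tweet' data-item-id='1'/>"
    "<li data-item-type='tweet' data-item-id='2'/>"
    "</ol>"
    "<ol class='activity-popup-users'>"
    "<li data-item-type='tweet' data-item-id='3'/>"
    "</ol>"
    "</body>"
)
ids = [li.get("data-item-id") for li in page.findall(TWEETS)]
print(ids)
```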

Antonio Ospite [Fri, 27 Nov 2015 12:46:55 +0000 (13:46 +0100)]
rss_converter_twitter.com.xsl: set a fall-back channel title

When there is no screen-name, as on hashtag and search pages, use
the main page title as the RSS channel title.

Antonio Ospite [Fri, 27 Nov 2015 11:47:50 +0000 (12:47 +0100)]
rss_converter_twitter.com.xsl: restrict the criterion to match actual tweets

When only using li[@data-item-type='tweet'], empty entries were sometimes
selected, in particular the ones under <ol class="activity-popup-users">.

So just pick the items under <ol id="stream-items-id"> as the actual tweets
with valid contents in them.

(tag: v0.4)
Antonio Ospite [Sun, 13 Sep 2015 18:05:53 +0000 (20:05 +0200)]
NEWS: add release notes for the v0.4 release

Antonio Ospite [Sun, 13 Sep 2015 16:43:31 +0000 (18:43 +0200)]
rss_converter_instagram.com.xsl: use the username when there is no full name

Antonio Ospite [Sun, 13 Sep 2015 16:06:59 +0000 (18:06 +0200)]
rss_converter_instagram.com.xsl: improve channel description

Some users have a biography, some users only have an external URL, some
users have both and some have neither.

Make the channel description a little smarter by trying to handle these
cases.
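
One possible way to cover the four cases is sketched here in Python. The
separator, the fallback text and the channel_description() name are all
assumptions; the stylesheet may combine the fields differently.

```python
def channel_description(biography, external_url, fallback):
    """Build a description from whichever profile fields are present."""
    parts = [
        part.strip()
        for part in (biography, external_url)
        if part and part.strip()
    ]
    # Hypothetical choices: " - " as separator, fallback when both are empty.
    return " - ".join(parts) if parts else fallback


print(channel_description("Photos of cats", "https://example.com/", "n/a"))
print(channel_description("", "", "Instagram feed"))
```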

Antonio Ospite [Sun, 13 Sep 2015 15:43:13 +0000 (17:43 +0200)]
rss_converter_facebook.com.xsl: fix channel title, link and description

It looks like using the meta elements does not work anymore.

Antonio Ospite [Sun, 13 Sep 2015 09:59:43 +0000 (11:59 +0200)]
TODO: support for images on Twitter.com can be improved