Antonio Ospite [Fri, 4 Nov 2016 17:02:11 +0000 (18:02 +0100)]
rss_converters_*.xsl: prefix the namespace when calling Tweeper class methods
The Tweeper class is now in a namespace; without this change the XSLT
processor gives errors like this:
PHP Warning: XSLTProcessor::transformToXml(): Unable to call handler Tweeper::epochToRssDate() in .../src/Tweeper.php on line 356
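The warning above is consistent with how PHP resolves string callables: a handler named by a plain string is looked up from the global namespace, so a namespaced class has to be spelled with its prefix. A minimal sketch, assuming a hypothetical namespace `Demo` (the namespace tweeper actually uses is not stated in this log) and a stand-in method body:

```php
<?php
namespace Demo;

// Stand-in for the real class; only the name resolution matters here.
class Tweeper
{
    public static function epochToRssDate($timestamp)
    {
        return gmdate(DATE_RSS, (int) $timestamp);
    }
}

// String callables are not resolved relative to the current namespace,
// so the bare class name is not callable even from inside the namespace.
$bare = is_callable('Tweeper::epochToRssDate');

// The namespace-prefixed name is what a string-based caller must use.
$qualified = is_callable('Demo\Tweeper::epochToRssDate');
```

The same reasoning applies to the method name passed to php:function() in a stylesheet, since it is also just a string.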
Antonio Ospite [Fri, 4 Nov 2016 12:13:54 +0000 (13:13 +0100)]
tweeper: move the main Tweeper class to its own file under src/
This matches more closely the project structure expected by composer
packages.
Antonio Ospite [Fri, 4 Nov 2016 15:02:26 +0000 (16:02 +0100)]
TODO: improve wording and remove fullstops at the end of items
Antonio Ospite [Sun, 30 Oct 2016 10:34:22 +0000 (11:34 +0100)]
Fix information leakage by validating the URL scheme
Validate the scheme to prevent leaking information by abusing the
file:// scheme.
Before this change it was possible to see what files are available on
the system running tweeper.
The script in tests/test_information_leakage.sh shows the problem on
earlier versions.
Here is an execution with tweeper-0.6:
-----------------------------------------------------------------------
URL file://twitter.com//etc/passwd
--> /etc/passwd
exists
URL file://twitter.com//etc/file_with_an_unlikely_name
... /etc/file_with_an_unlikely_name
does not exist
Staring a test server
URL file://twitter.com//etc/passwd
--> /etc/passwd on http://localhost:8000
exists
URL file://twitter.com//etc/file_with_an_unlikely_name
... /etc/file_with_an_unlikely_name on http://localhost:8000
does not exist
Shutting down the test server
-----------------------------------------------------------------------
Here is an execution after this fix:
-----------------------------------------------------------------------
PHP Fatal error: unsupported scheme: file in /home/ao2/Proj/Tweeper/tweeper/tweeper.php on line 323
URL file://twitter.com//etc/passwd
... /etc/passwd
does not exist
PHP Fatal error: unsupported scheme: file in /home/ao2/Proj/Tweeper/tweeper/tweeper.php on line 323
URL file://twitter.com//etc/file_with_an_unlikely_name
... /etc/file_with_an_unlikely_name
does not exist
Staring a test server
URL file://twitter.com//etc/passwd
... /etc/passwd on http://localhost:8000
does not exist
URL file://twitter.com//etc/file_with_an_unlikely_name
... /etc/file_with_an_unlikely_name on http://localhost:8000
does not exist
Shutting down the test server
-----------------------------------------------------------------------
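The scheme validation described in this commit could look roughly like the sketch below. The function name and the exact allow-list (`http` and `https`) are assumptions made for illustration; only the "unsupported scheme" message is taken from the output above.

```php
<?php
// Hypothetical helper: accept only web schemes, reject everything else.
function validateUrlScheme(string $url): string
{
    $scheme = parse_url($url, PHP_URL_SCHEME);

    // parse_url() yields null when the scheme is absent and false when
    // the URL is seriously malformed.
    if (!is_string($scheme)) {
        throw new InvalidArgumentException('missing or invalid scheme');
    }

    $scheme = strtolower($scheme);
    if (!in_array($scheme, ['http', 'https'], true)) {
        throw new InvalidArgumentException("unsupported scheme: $scheme");
    }

    return $scheme;
}
```

Checking against an allow-list rather than rejecting `file` alone also covers other schemes the fetching library may understand.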
Antonio Ospite [Sun, 30 Oct 2016 09:28:41 +0000 (10:28 +0100)]
tweeper.php: check the return value of Tweeper::tweep()
If the tweep() method fails, return 1 to the calling process so that it
can tell that something went wrong.
Antonio Ospite [Sun, 30 Oct 2016 09:04:51 +0000 (10:04 +0100)]
tweeper.php: check curl_exec() return value
Also show the curl_error() message when curl_exec() fails; this makes
it easier to diagnose problems.
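A minimal sketch of such a check, assuming a hypothetical wrapper named `fetchUrl()` (not the method tweeper uses) and an arbitrary connection timeout:

```php
<?php
// Hypothetical wrapper: fetch a URL and report curl_error() on failure.
function fetchUrl(string $url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);

    $contents = curl_exec($ch);
    if ($contents === false) {
        // curl_exec() returns false on failure; curl_error() says why.
        error_log('curl_exec() failed: ' . curl_error($ch));
        return null;
    }

    return $contents;
}
```

The strict comparison with `false` matters because an empty response body is a valid, non-failing result.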
Antonio Ospite [Sat, 29 Oct 2016 17:34:10 +0000 (19:34 +0200)]
tweeper.php: support "application/pdf" as an enclosure content type
Antonio Ospite [Sat, 29 Oct 2016 17:17:00 +0000 (19:17 +0200)]
tweeper.php: support "application/octet-stream" as an enclosure content type
This allows binary attachments without a more specific content type to be
supported by the enclosure element.
Adding "application/octet-stream" also covers the weird case of servers
sending multiple Content-Type headers, e.g.:
< HTTP/1.1 200 OK
< Server: Apache
< ETag: "a46d495ba00c35580f83344dd523ece2:1473631283"
< Last-Modified: Sun, 11 Sep 2016 22:01:22 GMT
< Accept-Ranges: bytes
< Content-Length: 14346711
< Content-Type: audio/mpeg
< Content-Type: application/octet-stream
< content-disposition: attachment
< Date: Mon, 26 Sep 2016 23:36:11 GMT
< Connection: keep-alive
< Content-Type: application/octet-stream
< content-disposition: attachment
In this case the ideal solution would be to use the more _specific_
content type, but CURL just remembers the _last_ one and it's not really
worth parsing the HTTP headers in tweeper just for this rare scenario.
Reported-by: Torsten Grote
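A check against a list of supported enclosure types could be sketched as follows. The constant and function names are hypothetical; the list holds the types named in this log, plus "audio/mpeg" from the headers above, whose support in tweeper is an assumption.

```php
<?php
// Types named in the commit log; "audio/mpeg" is an assumption based on
// the example headers, not a confirmed entry in tweeper's own list.
const SUPPORTED_ENCLOSURE_TYPES = [
    'application/octet-stream',
    'application/pdf',
    'image/png',
    'audio/mpeg',
];

// Hypothetical check: drop parameters such as "; charset=..." and compare
// the bare media type case-insensitively.
function isSupportedEnclosureType(string $contentType): bool
{
    $type = strtolower(trim(explode(';', $contentType)[0]));
    return in_array($type, SUPPORTED_ENCLOSURE_TYPES, true);
}
```

With the headers shown above, the value reported for the response would be "application/octet-stream", which is why adding that type covers the duplicated-header case.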
Antonio Ospite [Mon, 23 May 2016 14:17:37 +0000 (16:17 +0200)]
NEWS: add release notes for the v0.6 release
Antonio Ospite [Mon, 23 May 2016 14:16:57 +0000 (16:16 +0200)]
rss_converter_instagram.com.xsl: strip unneeded trailing space
Antonio Ospite [Mon, 23 May 2016 13:55:34 +0000 (15:55 +0200)]
Add a HACKING file to describe the coding style used in the project
Antonio Ospite [Mon, 23 May 2016 13:45:43 +0000 (15:45 +0200)]
INSTALL: mention php-symfony-property-access as a dependency
Antonio Ospite [Mon, 23 May 2016 13:32:27 +0000 (15:32 +0200)]
tweeper.php: fix a problem with https URLs ending up in the enclosure element
When the remote host forces every URL to be redirected to https, the
info returned by Tweeper::getUrlInfo() would contain an https URL, which
will end up being used in the "url" attribute of the enclosure element,
and this is invalid according to the RSS specification.
So make sure that an http URL is actually used for the "url" attribute.
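One way to express this, as a sketch only (the helper name is hypothetical and the actual fix may be implemented differently), is to rewrite a leading "https://" before the URL is placed in the attribute:

```php
<?php
// Hypothetical helper: rewrite a leading "https://" to "http://".
function forceHttpScheme(string $url): string
{
    return preg_replace('#^https://#i', 'http://', $url);
}
```

URLs that already use http pass through unchanged because the pattern is anchored at the start of the string.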
Antonio Ospite [Mon, 23 May 2016 13:28:07 +0000 (15:28 +0200)]
Use php:functionString() in the stylesheets where appropriate
Instead of converting to string in XSL and then calling php:function(),
use directly php:functionString() when calling PHP functions which
actually expect a string argument.
Besides possible performance improvements, this is done mainly for
readability.
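The difference between the two call styles can be seen in a small self-contained sketch. The `shout()` function, the input document and the stylesheet are all made up for illustration, and the example needs the PHP xsl extension:

```php
<?php
// Hypothetical PHP function that expects a string argument.
function shout(string $text): string
{
    return strtoupper($text);
}

$xml = new DOMDocument();
$xml->loadXML('<item><title>hello</title></item>');

$stylesheet = <<<'XSL'
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:php="http://php.net/xsl">
  <xsl:output method="text"/>
  <xsl:template match="/item">
    <!-- Old style: convert to a string in XSL, then call php:function() -->
    <xsl:value-of select="php:function('shout', string(title))"/>
    <xsl:text>|</xsl:text>
    <!-- New style: php:functionString() does the conversion itself -->
    <xsl:value-of select="php:functionString('shout', title)"/>
  </xsl:template>
</xsl:stylesheet>
XSL;

$xsl = new DOMDocument();
$xsl->loadXML($stylesheet);

$proc = new XSLTProcessor();
$proc->registerPHPFunctions();
$proc->importStylesheet($xsl);
$result = $proc->transformToXml($xml);
```

Both expressions hand the same string to the PHP function; the second one simply leaves the conversion out of the XPath expression.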
Antonio Ospite [Mon, 23 May 2016 12:58:45 +0000 (14:58 +0200)]
Don't convert the timestamp to a number when calling php:functionString()
There is no need to convert the $timestamp variable to a number before
passing it as an argument to php:functionString(), as the latter will
convert it to a string anyway.
Antonio Ospite [Mon, 23 May 2016 12:52:18 +0000 (14:52 +0200)]
TODO: remove entry about using enclosure for pump.io images
Antonio Ospite [Mon, 23 May 2016 12:51:38 +0000 (14:51 +0200)]
rss_converter_pump.io.xsl: add enclosure element for images
Antonio Ospite [Fri, 20 May 2016 16:25:28 +0000 (18:25 +0200)]
TODO: remove the item about enclosures for dilbert.com, now they are supported
Antonio Ospite [Fri, 20 May 2016 16:17:59 +0000 (18:17 +0200)]
rss_converter_instagram.com.xsl: don't use a template for the enclosure
Each post has exactly one image, it is enough to copy the generated
element in-place without applying templates.
Antonio Ospite [Fri, 20 May 2016 16:15:29 +0000 (18:15 +0200)]
rss_converter_dilbert.com.xsl: add support for the <enclosure/> element
Antonio Ospite [Fri, 20 May 2016 16:06:32 +0000 (18:06 +0200)]
TODO: add an entry about adding direct links to Instagram videos
Antonio Ospite [Fri, 20 May 2016 16:05:29 +0000 (18:05 +0200)]
rss_converter_instagram.com.xsl: make images adapt to the viewer width
This way the user does not have to scroll horizontally to see the whole
picture.
Antonio Ospite [Fri, 20 May 2016 16:01:44 +0000 (18:01 +0200)]
rss_converter_instagram.com.xsl: use a stricter match for some elements
Antonio Ospite [Fri, 20 May 2016 16:00:50 +0000 (18:00 +0200)]
rss_converter_instagram.com.xsl: fix the channel link
Antonio Ospite [Fri, 20 May 2016 12:40:31 +0000 (14:40 +0200)]
TODO: update the entry about twitter images and cards, images are now supported
Antonio Ospite [Fri, 20 May 2016 12:04:03 +0000 (14:04 +0200)]
rss_converter_twitter.com.xsl: generate enclosure for images
Antonio Ospite [Fri, 20 May 2016 11:57:26 +0000 (13:57 +0200)]
rss_converter_twitter.com.xsl: show explicitly if the item has a video
Tweeper does not provide direct links to videos, so it's useful to tell
users that the content has a video so they can follow the link and view
it on the twitter.com page.
Antonio Ospite [Fri, 20 May 2016 11:48:00 +0000 (13:48 +0200)]
rss_converter_twitter.com.xsl: don't repeat background in embedded media
Tweeper doesn't provide direct links to videos and vines from twitter
but it still shows the preview picture provided by the original HTML
code; unfortunately the picture repeats itself, so avoid that.
Antonio Ospite [Fri, 20 May 2016 11:33:01 +0000 (13:33 +0200)]
rss_converter_twitter.com.xsl: present images in a more convenient way
Make images clickable and pointing to the original full-size picture.
Antonio Ospite [Fri, 20 May 2016 11:18:42 +0000 (13:18 +0200)]
rss_converter_twitter.com.xsl: show media content in the feed item description
Antonio Ospite [Fri, 20 May 2016 09:10:43 +0000 (11:10 +0200)]
rss_converter_twitter.com.xsl: use direct URLs for links, when possible
Replace the t.co URLs with the actual location the link was originally
meant to point to.
Antonio Ospite [Wed, 18 May 2016 11:28:01 +0000 (13:28 +0200)]
rss_converter_twitter.com.xsl: add a mode attribute to the enclosure template
This is needed because another template with
match="a[@data-expanded-url]" will be added in a future commit.
Antonio Ospite [Wed, 18 May 2016 20:41:54 +0000 (22:41 +0200)]
rss_converter_twitter.com.xsl: cleanup titles
Prepend a white space in front of some URLs (those not preceded by an
open parenthesis) because otherwise they get rendered attached to the
preceding text.
Also strip non-breaking spaces and horizontal ellipses; they are not
needed because the RSS feed shows the full URLs.
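The cleanup itself lives in the XSL stylesheet, which is not shown in this log. The following is only a PHP rendering of the two rules described above, with a hypothetical function name and an assumed way of recognizing URLs (a leading "http://" or "https://"):

```php
<?php
// Hypothetical PHP rendering of the title cleanup rules.
function cleanupTitle(string $title): string
{
    // Insert a space before a URL unless it already follows whitespace,
    // an open parenthesis, or the start of the string.
    $title = preg_replace('#(?<=[^\s(])(?=https?://)#u', ' ', $title);

    // Drop non-breaking spaces (U+00A0) and horizontal ellipses (U+2026).
    return str_replace(["\u{00A0}", "\u{2026}"], '', $title);
}
```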
Antonio Ospite [Wed, 18 May 2016 20:38:35 +0000 (22:38 +0200)]
tweeper.php: add "image/png" to the supported types for <enclosure/>
Antonio Ospite [Wed, 18 May 2016 20:35:28 +0000 (22:35 +0200)]
tweeper.php: rename DomDocument() to DOMDocument()
DOMDocument() is the more commonly used form, and it is already used in
other parts of the file.
Antonio Ospite [Wed, 18 May 2016 20:32:06 +0000 (22:32 +0200)]
Return a DOMElement instead of a string in Tweeper::generateEnclosure()
This makes the generated XML have proper indentation in case the
<enclosure/> element gets added.
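Building the element through the DOM API, rather than as a string, might look like the sketch below. The function name and signature are hypothetical and do not reproduce Tweeper::generateEnclosure(); the URL, length and type values are placeholders.

```php
<?php
// Hypothetical sketch: build <enclosure/> as a DOMElement, not a string.
function buildEnclosureElement(DOMDocument $doc, string $url, int $length, string $type): DOMElement
{
    $enclosure = $doc->createElement('enclosure');
    $enclosure->setAttribute('url', $url);
    $enclosure->setAttribute('length', (string) $length);
    $enclosure->setAttribute('type', $type);
    return $enclosure;
}

$doc = new DOMDocument('1.0', 'UTF-8');
// With formatOutput enabled the serializer indents real element nodes.
$doc->formatOutput = true;

$item = $doc->appendChild($doc->createElement('item'));
$item->appendChild($doc->createElement('title', 'Example'));
$item->appendChild(buildEnclosureElement($doc, 'http://example.com/a.png', 1234, 'image/png'));

$xml = $doc->saveXML();
```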
Antonio Ospite [Wed, 18 May 2016 10:10:15 +0000 (12:10 +0200)]
tweeper.php: move the loadStylesheet() method further down in the file
This way all static methods are grouped together before non-static
methods.
Antonio Ospite [Wed, 18 May 2016 09:13:45 +0000 (11:13 +0200)]
tweeper.php: write XML in upper case inside comments
Antonio Ospite [Wed, 18 May 2016 09:06:34 +0000 (11:06 +0200)]
tweeper.php: make jsonToXml() a static method
Antonio Ospite [Wed, 18 May 2016 08:56:40 +0000 (10:56 +0200)]
tweeper.php: make logXmlError() a static method
Antonio Ospite [Tue, 17 May 2016 22:08:15 +0000 (00:08 +0200)]
tweeper.php: make it clearer that getUrlContents is a static method
Antonio Ospite [Tue, 17 May 2016 22:05:22 +0000 (00:05 +0200)]
tweeper.php: fix naming conventions for the get_xml_ and preprocess_html_ funcs
Antonio Ospite [Tue, 17 May 2016 21:37:35 +0000 (23:37 +0200)]
tweeper.php: rename the ERROR_STREAM variable to error_stream
Variables should be in lower case.
Antonio Ospite [Tue, 17 May 2016 21:28:45 +0000 (23:28 +0200)]
Use more accurate names for the date conversion functions
The new names are epochToRssDate and strToRssDate.
Don't refer to gmdate() in the function names, this is just an
implementation detail which should not have leaked into the external
interface, instead mention RssDate in the function names to communicate
something about the output they produce.
Also, while at it, use the DATE_RSS format when calling gmdate().
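As a rough sketch of what the renamed helpers express, the two functions below use the names from this commit, but their bodies and signatures are assumptions rather than the project's code:

```php
<?php
// Names come from the commit message; the bodies are assumed.
function epochToRssDate($timestamp): string
{
    // DATE_RSS is PHP's predefined format string for RSS dates.
    return gmdate(DATE_RSS, (int) $timestamp);
}

function strToRssDate(string $date): string
{
    // No error handling here: strtotime() may return false.
    return gmdate(DATE_RSS, strtotime($date));
}
```

The names describe the output format, so callers do not need to know that gmdate() is used underneath.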
Antonio Ospite [Tue, 17 May 2016 21:15:51 +0000 (23:15 +0200)]
Rename epoch_to_gmdate to epochToGmdate, and str_to_gmdate to strToGmdate
Antonio Ospite [Tue, 17 May 2016 21:11:52 +0000 (23:11 +0200)]
tweeper.php: rename getContents to getUrlContents and getInfo to getUrlInfo
The new names should be more explanatory.
Antonio Ospite [Tue, 17 May 2016 21:07:57 +0000 (23:07 +0200)]
tweeper.php: use lowerCamel case for methods names
Antonio Ospite [Tue, 17 May 2016 21:04:48 +0000 (23:04 +0200)]
Fix naming conventions for the generate_enclosure function
Use lowerCamel case for the function name, keep snake_case for the local
variable in the php code, and use a dash-separated case for the xsl
variable.
Antonio Ospite [Tue, 17 May 2016 14:16:09 +0000 (16:16 +0200)]
tweeper.php: use lowerCamel case for class variables
Antonio Ospite [Tue, 17 May 2016 14:14:29 +0000 (16:14 +0200)]
tweeper.php: fix a typo s/Apparenty/Apparently/
Antonio Ospite [Tue, 17 May 2016 14:09:14 +0000 (16:09 +0200)]
tweeper.php: write the word "URL" in upper case
Antonio Ospite [Tue, 17 May 2016 14:06:39 +0000 (16:06 +0200)]
tweeper.php: fix style issues pointed out by PHP_CodeSniffer
These issues were fixed automatically by phpcbf with some minor manual
touches.
Antonio Ospite [Mon, 16 May 2016 13:35:58 +0000 (15:35 +0200)]
tweeper.php: use the same parenthesis style for all functions
Antonio Ospite [Mon, 16 May 2016 13:34:20 +0000 (15:34 +0200)]
tweeper: fix style issues found by Coder Sniffer
----------------------------------------------------------------------
FOUND 3 ERRORS AFFECTING 2 LINES
----------------------------------------------------------------------
1 | ERROR | [x] Missing file doc comment
4 | ERROR | [x] "require" is a statement not a function; no
| | parentheses are required
4 | ERROR | [x] Language constructs must be followed by a single
| | space; expected "require (" but found "require("
Antonio Ospite [Mon, 16 May 2016 11:12:22 +0000 (13:12 +0200)]
tweeper.php: rename $rootName to $root_node_name
Antonio Ospite [Mon, 16 May 2016 11:10:28 +0000 (13:10 +0200)]
tweeper.php: make json_to_xml() do strictly what its name says
Extracting the json data from the HTML does not really belong to the
json_to_xml() function.
Antonio Ospite [Sun, 15 May 2016 14:48:28 +0000 (16:48 +0200)]
Use https in URLs for Twitter.com and ao2.it
Antonio Ospite [Sun, 15 May 2016 14:13:58 +0000 (16:13 +0200)]
rss_converter_twitter.com.xsl: fix getting the profile picture URL
Antonio Ospite [Sun, 15 May 2016 14:12:12 +0000 (16:12 +0200)]
rss_converter_dilbert.com.xsl: put the full text in the alt attribute
Antonio Ospite [Sun, 15 May 2016 14:03:29 +0000 (16:03 +0200)]
rss_converter_dilbert.com.xsl: ellipsize long titles
Antonio Ospite [Sun, 15 May 2016 13:52:10 +0000 (15:52 +0200)]
rss_converter_facebook.com.xsl: fix getting the item description
Hopefully this is a more stable way to get just the useful content of
a story skipping the header and the footer.
Antonio Ospite [Sun, 15 May 2016 13:34:13 +0000 (15:34 +0200)]
rss_converter_facebook.com.xsl: fix the permalink
Use the page id and the story id to build a more robust permalink URL.
Antonio Ospite [Fri, 13 May 2016 16:12:56 +0000 (18:12 +0200)]
NEWS: add release notes for the v0.5 release
Antonio Ospite [Fri, 13 May 2016 16:09:13 +0000 (18:09 +0200)]
INSTALL: mention php-symfony-serializer instead of php-xml-serializer
Antonio Ospite [Fri, 13 May 2016 16:00:04 +0000 (18:00 +0200)]
Use the Symfony Serializer component instead of the PEAR XML_Serializer
XML_Serializer is old and unmaintained, and it is going to be removed
from Debian, so use a more robust and supported alternative.
Antonio Ospite [Mon, 30 Nov 2015 10:22:27 +0000 (11:22 +0100)]
rss_converter_twitter.com.xsl: restrict tweet selection some more
Only select elements which have the 'data-item-id' attribute, this way
we avoid picking up the image gallery at the top of hashtag pages which
does not have an RSS item structure.
JFTR the gallery is inside an element like this:
<li class="AdaptiveStreamImageGallery AdaptiveSearchTimeline-separationModule js-stream-item"
data-item-type="tweet">
with no 'data-item-id'.
Antonio Ospite [Fri, 27 Nov 2015 12:46:55 +0000 (13:46 +0100)]
rss_converter_twitter.com.xsl: set a fall-back channel title
When there is no screen-name, as for hashtag and search pages, use
the main page title as the RSS channel title.
Antonio Ospite [Fri, 27 Nov 2015 11:47:50 +0000 (12:47 +0100)]
rss_converter_twitter.com.xsl: restrict the criterion to match actual tweets
By only using li[@data-item-type='tweet'] sometimes void entries were
selected, and in particular the ones under <ol class="activity-popup-users">.
So just pick the items under <ol id="stream-items-id"> as the actual tweets
with valid contents in them.
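The effect of the stricter criterion can be tried outside XSLT with DOMXPath. The HTML fragment below is invented to mirror the structure described above, and the XPath expressions are an approximation of the ones in the stylesheet:

```php
<?php
// Invented fragment: one void entry in the popup list, two real tweets.
$html = <<<'HTML'
<div>
  <ol class="activity-popup-users">
    <li data-item-type="tweet"></li>
  </ol>
  <ol id="stream-items-id">
    <li data-item-type="tweet">first tweet</li>
    <li data-item-type="tweet">second tweet</li>
  </ol>
</div>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Loose criterion: also matches the void entry in the popup list.
$loose = $xpath->query("//li[@data-item-type='tweet']");

// Stricter criterion: only items under the stream list.
$strict = $xpath->query("//ol[@id='stream-items-id']/li[@data-item-type='tweet']");
```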
Antonio Ospite [Sun, 13 Sep 2015 18:05:53 +0000 (20:05 +0200)]
NEWS: add release notes for the v0.4 release
Antonio Ospite [Sun, 13 Sep 2015 16:43:31 +0000 (18:43 +0200)]
rss_converter_instagram.com.xsl: use the username when there is no full name
Antonio Ospite [Sun, 13 Sep 2015 16:06:59 +0000 (18:06 +0200)]
rss_converter_instagram.com.xsl: improve channel description
Some users have a biography, some users only have an external URL, some
users have both and some have neither.
Make the channel description a little smarter by trying to handle these
cases.
Antonio Ospite [Sun, 13 Sep 2015 15:43:13 +0000 (17:43 +0200)]
rss_converter_facebook.com.xsl: fix channel title, link and description
It looks like using the meta elements does not work anymore.
Antonio Ospite [Sun, 13 Sep 2015 09:59:43 +0000 (11:59 +0200)]
TODO: support for images on Twitter.com can be improved
Antonio Ospite [Sun, 13 Sep 2015 09:59:15 +0000 (11:59 +0200)]
tweeper.1.asciidoc: update the copyright years
Antonio Ospite [Sun, 13 Sep 2015 09:57:07 +0000 (11:57 +0200)]
tweeper.1.asciidoc: document how to use the PHP built-in web server
Antonio Ospite [Sun, 13 Sep 2015 09:55:59 +0000 (11:55 +0200)]
README: improve wording in a paragraph
Antonio Ospite [Sun, 13 Sep 2015 09:52:51 +0000 (11:52 +0200)]
tweeper.1.asciidoc: describe what tweeper is in a more generic way
Antonio Ospite [Sun, 13 Sep 2015 09:41:35 +0000 (11:41 +0200)]
README: describe what tweeper is in a more generic way
Since tweeper does not support only Twitter.com but also other social
websites, give a more general idea of what it can be used for.
Antonio Ospite [Wed, 29 Jul 2015 21:05:47 +0000 (23:05 +0200)]
rss_converter_instagram.com.xsl: show the user name in the titles
This makes it easier to see who created the item when different
Instagram feeds are grouped into a directory.
Antonio Ospite [Wed, 29 Jul 2015 20:56:42 +0000 (22:56 +0200)]
tweeper: avoid a reference to $argv in non-cli mode
$argv is not defined in non-cli mode, so protect its usage behind an
is_cli() check.
For instance, this avoids a message in the PHP built-in web server, when
usage() gets called:
PHP Notice: Undefined variable: argv in .../tweeper.php on line 357
Antonio Ospite [Wed, 29 Jul 2015 20:52:15 +0000 (22:52 +0200)]
tweeper: make is_cli() stricter
Assume that tweeper is running from an actual command line only when
php_sapi_name() matches _exactly_ the string "cli".
This makes it possible to use tweeper in a browser using the PHP
built-in web server, for which php_sapi_name() returns "cli-server".
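A minimal sketch of the exact-match check follows. The optional parameter is a hypothetical addition that makes the function easy to exercise; tweeper's own is_cli() may not take arguments.

```php
<?php
// Sketch; the optional $sapi parameter exists only for illustration.
function is_cli(?string $sapi = null): bool
{
    $sapi = $sapi ?? php_sapi_name();

    // Exact comparison: "cli-server" (the built-in web server) must not
    // be treated as a command line.
    return $sapi === 'cli';
}
```

Any looser test, for instance a prefix or substring match on "cli", would also accept "cli-server".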
Antonio Ospite [Sat, 25 Jul 2015 13:52:57 +0000 (15:52 +0200)]
TODO: remove item about duplicated RSS items
Now all the generated feeds use the <guid/> element to uniquely identify
items.
Antonio Ospite [Sat, 25 Jul 2015 13:50:12 +0000 (15:50 +0200)]
TODO: remove item about Instagram videos
Even if tweeper does not show the video itself in the RSS item content
it at least tells the user that the content is a video, so consider this
done.
Antonio Ospite [Sat, 25 Jul 2015 13:45:33 +0000 (15:45 +0200)]
rss_converter_instagram.com.xsl: add a label if the content is a video
Antonio Ospite [Sat, 25 Jul 2015 11:08:31 +0000 (13:08 +0200)]
rss_converter_instagram.com.xsl: fix enclosure generation
Antonio Ospite [Sat, 25 Jul 2015 10:48:30 +0000 (12:48 +0200)]
rss_converter_instagram.com.xsl: ellipsize titles
Antonio Ospite [Sat, 25 Jul 2015 10:43:49 +0000 (12:43 +0200)]
rss_converter_instagram.com.xsl: use the image caption as the item content
Instagram has reintroduced serving the image caption in the json data,
so use it; it is way nicer than the stats tweeper was showing before.
Antonio Ospite [Sat, 25 Jul 2015 10:36:58 +0000 (12:36 +0200)]
rss_converter_instagram.com.xsl: use better name for the image variable
Antonio Ospite [Wed, 1 Jul 2015 11:47:53 +0000 (13:47 +0200)]
Add support for Facebook.com public pages
Antonio Ospite [Wed, 1 Jul 2015 11:37:57 +0000 (13:37 +0200)]
tweeper.php: support host-specific methods for preprocessing the HTML data
Some sites serve mangled HTML code, so a mechanism to clean it up before
loading it as XML is needed.
For instance, facebook.com puts some content inside HTML comments, and
these must be stripped in order to make the content available to the
HTML parser when loading the data into a DOMDocument.
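Such a preprocessing step could be as simple as removing the comment delimiters before parsing. The function name and the sample markup below are hypothetical, and the real host-specific method may do more than this:

```php
<?php
// Hypothetical preprocessing step: drop the comment delimiters and keep
// what was inside them.
function uncommentHtml(string $html): string
{
    return str_replace(['<!--', '-->'], '', $html);
}

$html = '<div><!-- <p>Hidden story</p> --></div>';

$doc = new DOMDocument();
$doc->loadHTML(uncommentHtml($html));

// The paragraph is now a real element instead of comment text.
$paragraphs = $doc->getElementsByTagName('p');
```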
Antonio Ospite [Wed, 1 Jul 2015 11:35:56 +0000 (13:35 +0200)]
tweeper.php: strip the leading "www." from hosts
This makes tweeper more forgiving when it is passed URLs either with or
without the "www" subdomain for the same host.
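A sketch of this normalization, with a hypothetical helper name and an added lowercasing step that is an assumption rather than something stated in the commit:

```php
<?php
// Hypothetical helper: extract the host and strip a leading "www.".
function normalizedHost(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return null;
    }

    // Lowercasing is an extra assumption, not part of the commit message.
    return preg_replace('/^www\./', '', strtolower($host));
}
```

Since the stylesheets are named after hosts (for example rss_converter_facebook.com.xsl), a normalized host presumably lets both URL forms map to the same file.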
Antonio Ospite [Wed, 1 Jul 2015 11:34:53 +0000 (13:34 +0200)]
tweeper.php: make error about missing stylesheet more explicit
Antonio Ospite [Fri, 12 Jun 2015 10:06:02 +0000 (12:06 +0200)]
rss_converter_instagram.com.xsl: update to new json format
The new Instagram homepage provides json data in a different format than
before; update the xsl to support it.
Unfortunately the data in the new format does not provide the
descriptions of the items, so use some placeholder values (URL, comments
count, likes count) to present at least something.
Antonio Ospite [Sun, 31 May 2015 17:17:28 +0000 (19:17 +0200)]
rss_converter_twitter.com.xsl: update XPath of tweet content
Using the role attribute to differentiate between original tweets and
quoted tweet, as introduced in commit 4c2e986, does not work anymore,
but the fact that original tweets are <li></li> elements while quoted
tweets are <div></div> elements can be used instead.
Antonio Ospite [Tue, 5 May 2015 07:28:23 +0000 (09:28 +0200)]
rss_converter_twitter.com.xsl: improve matching the permalink
Extract the permalink using the @data-permalink-path attribute; this
works for withheld tweets too, preventing them from all having the same
guid.
Antonio Ospite [Tue, 5 May 2015 07:25:20 +0000 (09:25 +0200)]
rss_converter_twitter.com.xsl: restrict tweet matching
With new style retweets the quoted text is also matched by
[@data-item-type='tweet'] but then the content is not handled, resulting
in empty items in the RSS feed.
Also checking for @role='listitem' makes it possible to pick up only
top-level tweets.
Antonio Ospite [Tue, 5 May 2015 07:21:48 +0000 (09:21 +0200)]
tweeper.php: make date handling functions a little more robust
Provide at least _some_ error checking and a fall-back value for invalid
dates.
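The kind of error checking meant here might look like the sketch below. The function name, the logging call and the choice of the Unix epoch as the fall-back value are all assumptions; the commit does not say which fall-back tweeper uses.

```php
<?php
// Sketch with an assumed fall-back: the Unix epoch for unparsable input.
function strToRssDateSafe(string $date): string
{
    $timestamp = strtotime($date);
    if ($timestamp === false) {
        // strtotime() returns false when it cannot parse the string.
        error_log("invalid date: '$date', falling back to the epoch");
        $timestamp = 0;
    }

    return gmdate('D, d M Y H:i:s O', $timestamp);
}
```

A fixed fall-back keeps the feed well-formed even when a single item carries a date the parser does not understand.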
Antonio Ospite [Mon, 2 Mar 2015 14:05:54 +0000 (15:05 +0100)]
tweeper.php: factor out an is_cli() function
Antonio Ospite [Mon, 2 Mar 2015 12:05:32 +0000 (13:05 +0100)]
README: mention the supported sites in the README file
Antonio Ospite [Mon, 2 Mar 2015 12:00:59 +0000 (13:00 +0100)]
tweeper.1.asciidoc: mention the supported sites in the man page