7 years ago tweeper.php: fix validation when generating enclosure elements
Antonio Ospite [Sat, 28 Feb 2015 00:43:41 +0000 (01:43 +0100)]
tweeper.php: fix validation when generating enclosure elements

The RSS specification says that the enclosure element url must be http.

So follow the specification, for now. If feedvalidator decides to relax
this requirement this hack will be removed.

7 years ago tweeper.php: factor out an html_to_xml() function from the tweep() method
Antonio Ospite [Fri, 27 Feb 2015 16:08:31 +0000 (17:08 +0100)]
tweeper.php: factor out an html_to_xml() function from the tweep() method

Split out the operation that gets some XML out of the web page, in
preparation for adding more flexibility about what can be converted to
XML for a subsequent transformation.

7 years ago tweeper.php: move loading the stylesheet into the Tweeper class
Antonio Ospite [Fri, 27 Feb 2015 15:54:57 +0000 (16:54 +0100)]
tweeper.php: move loading the stylesheet into the Tweeper class

Let the Tweeper class load the stylesheet, specifically when the tweep()
method is called.

This way the same Tweeper object can be reused to convert different URLs
from different websites.

7 years ago use ProfileHeaderCard as channel description
Antonio Ospite [Fri, 27 Feb 2015 15:47:44 +0000 (16:47 +0100)]
use ProfileHeaderCard as channel description

7 years ago Strip trailing newlines
Antonio Ospite [Fri, 27 Feb 2015 14:16:55 +0000 (15:16 +0100)]
Strip trailing newlines

7 years ago make the feed validate with
Antonio Ospite [Fri, 27 Feb 2015 14:05:13 +0000 (15:05 +0100)]
make the feed validate with

Make the feed validate with and also
improve compatibility by providing a guid and by excluding the weird
xmlns:php namespace in the output.

7 years ago make the feed validate with
Antonio Ospite [Fri, 27 Feb 2015 12:59:38 +0000 (13:59 +0100)]
make the feed validate with

Make the feed validate with and also
improve compatibility by providing a guid and by excluding the weird
xmlns:php namespace in the output.

7 years ago fixes for the new
Antonio Ospite [Fri, 27 Feb 2015 12:58:33 +0000 (13:58 +0100)]
fixes for the new

Make the stylesheet work again with the new website.

7 years ago make the feed validate with
Antonio Ospite [Fri, 27 Feb 2015 12:29:32 +0000 (13:29 +0100)]
make the feed validate with

Make the feed validate with and
also improve compatibility by providing a guid and by excluding the
weird xmlns:php namespace in the output.

7 years ago improve the naming of some variables
Antonio Ospite [Fri, 27 Feb 2015 12:24:25 +0000 (13:24 +0100)]
improve the naming of some variables

Use item-content and item-permalink instead of tweet-text and
tweet-link, this way other stylesheets can use the same names resulting
in more consistency.

7 years ago rename twitterBaseURL to BaseURL
Antonio Ospite [Fri, 27 Feb 2015 12:15:59 +0000 (13:15 +0100)]
rename twitterBaseURL to BaseURL

This way the same notation is used in all the stylesheets.

7 years ago make the feed validate with
Antonio Ospite [Fri, 27 Feb 2015 12:10:20 +0000 (13:10 +0100)]
make the feed validate with

Make the Twitter feed validate with and also
improve compatibility by providing a guid and by excluding the weird
xmlns:php namespace in the output.

8 years ago NEWS: add release notes for the v0.3 release (tag: v0.3)
Antonio Ospite [Thu, 24 Apr 2014 12:29:40 +0000 (14:29 +0200)]
NEWS: add release notes for the v0.3 release

8 years ago Update email address and copyright years
Antonio Ospite [Thu, 24 Apr 2014 11:50:33 +0000 (13:50 +0200)]
Update email address and copyright years

8 years ago Add a stylesheet for
Antonio Ospite [Thu, 24 Apr 2014 11:46:33 +0000 (13:46 +0200)]
Add a stylesheet for

8 years ago fix getting the profile picture
Antonio Ospite [Thu, 24 Apr 2014 10:02:17 +0000 (12:02 +0200)]
fix getting the profile picture

Some more values were added to the class attribute, so the old equality
check does not work anymore.

Use a contains() check instead; this is more future-proof and also
allows supporting both the classic and the new profile pages.
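
A minimal sketch of the difference between the two checks, using PHP's
DOMXPath on a made-up fragment. The class names "avatar" and "size73" and
the image path are hypothetical, not the ones used by the real profile
pages or by the stylesheet:

```php
<?php
// Hypothetical markup: the class attribute now carries more than one value.
$html = '<div><img class="avatar size73" src="/pic.png"/></div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// The equality check only matches when the attribute is exactly "avatar".
$exact = $xpath->query('//img[@class="avatar"]');

// The contains() check still matches when other values are added.
$loose = $xpath->query('//img[contains(@class, "avatar")]');

// Number of matches for each expression.
echo $exact->length, " ", $loose->length, "\n";
```

Note that contains() is a plain substring match, so it would also match a
class value such as "avatar-small".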

8 years ago support the new Twitter profile page
Antonio Ospite [Thu, 24 Apr 2014 09:58:51 +0000 (11:58 +0200)]
support the new Twitter profile page

Adjust the XPath expressions to support both the classic and the new
profile pages.

8 years ago rss_converter_*.xsl: specify xml:base
Antonio Ospite [Sat, 22 Feb 2014 08:52:38 +0000 (09:52 +0100)]
rss_converter_*.xsl: specify xml:base

Some feed readers expand relative URLs in items by extracting the base
URL from the enclosing <link/> element, however this is not a globally
accepted practice.

Specifying xml:base is useful to cover other ways in which relative URLs
are expanded, see:

8 years ago tweeper.php: support generating enclosure for "audio/ogg"
Antonio Ospite [Fri, 27 Dec 2013 08:32:25 +0000 (09:32 +0100)]
tweeper.php: support generating enclosure for "audio/ogg"

8 years ago tweeper.1.asciidoc: reword some list items
Antonio Ospite [Wed, 20 Nov 2013 00:12:59 +0000 (01:12 +0100)]
tweeper.1.asciidoc: reword some list items

Remove the unneeded "as", it is in the list preamble already.

8 years ago Remove the auto-generated ChangeLog
Antonio Ospite [Tue, 19 Nov 2013 21:30:03 +0000 (22:30 +0100)]
Remove the auto-generated ChangeLog

It is not that useful once we have a nice NEWS file; and the latter can
even be used as upstream changelog in packages.

8 years ago tweeper.1.asciidoc: mention a way to use tweeper from a web server
Antonio Ospite [Mon, 18 Nov 2013 22:16:54 +0000 (23:16 +0100)]
tweeper.1.asciidoc: mention a way to use tweeper from a web server

8 years ago Makefile: fix typo s/INTALLATION/INSTALLATION/
Antonio Ospite [Mon, 18 Nov 2013 16:58:14 +0000 (17:58 +0100)]
Makefile: fix typo s/INTALLATION/INSTALLATION/

Thanks-to: gregor herrmann <>

8 years ago Changelog, NEWS: prepare for the v0.2 release (tag: v0.2)
Antonio Ospite [Mon, 18 Nov 2013 12:20:02 +0000 (13:20 +0100)]
Changelog, NEWS: prepare for the v0.2 release

8 years ago tweeper.1.asciidoc: small fixes to the man page
Antonio Ospite [Mon, 18 Nov 2013 11:56:52 +0000 (12:56 +0100)]
tweeper.1.asciidoc: small fixes to the man page

8 years ago tweeper.1.asciidoc: add a missing semicolon
Antonio Ospite [Mon, 18 Nov 2013 11:12:16 +0000 (12:12 +0100)]
tweeper.1.asciidoc: add a missing semicolon

8 years ago Add a ChangeLog file (tag: v0.1)
Antonio Ospite [Mon, 18 Nov 2013 00:01:29 +0000 (01:01 +0100)]
Add a ChangeLog file

8 years ago Add a NEWS file
Antonio Ospite [Sun, 17 Nov 2013 23:59:58 +0000 (00:59 +0100)]
Add a NEWS file

8 years ago Add a Makefile rule to generate a Changelog file
Antonio Ospite [Sun, 17 Nov 2013 23:59:20 +0000 (00:59 +0100)]
Add a Makefile rule to generate a Changelog file

8 years ago Add a man page
Antonio Ospite [Sun, 17 Nov 2013 23:43:00 +0000 (00:43 +0100)]
Add a man page

8 years ago Add a Makefile to simplify installation and packaging
Antonio Ospite [Fri, 8 Nov 2013 15:22:26 +0000 (16:22 +0100)]
Add a Makefile to simplify installation and packaging

8 years ago Add a wrapper script intended to be called as an executable
Antonio Ospite [Fri, 8 Nov 2013 15:17:48 +0000 (16:17 +0100)]
Add a wrapper script intended to be called as an executable

Add also an INSTALL file which explains how to set up tweeper globally
on the filesystem.

8 years ago Write error messages on STDERR and return saner values in CLI mode
Antonio Ospite [Fri, 8 Nov 2013 14:01:25 +0000 (15:01 +0100)]
Write error messages on STDERR and return saner values in CLI mode

The previous use of die() was not very useful in CLI mode as the script
was always returning 0.

Fix that and also write error messages to the appropriate output stream.

Note the use of 'php://output' as error stream for non CLI mode, this
ensures that error messages would be readable in the browser window.
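
A minimal sketch of the idea described above, not the actual tweeper code.
The helper names error_stream_name() and fail() are hypothetical, and the
exit status 1 is an assumption:

```php
<?php
// Choose where error messages go: STDERR on the command line,
// the regular output (the browser window) otherwise.
function error_stream_name($sapi)
{
    return ($sapi === 'cli') ? 'php://stderr' : 'php://output';
}

// Hypothetical replacement for die(): print the message on the chosen
// stream and terminate with a non-zero status so that callers can
// detect the failure.
function fail($message)
{
    $stream = fopen(error_stream_name(php_sapi_name()), 'w');
    fwrite($stream, $message . "\n");
    fclose($stream);
    exit(1);
}
```

A caller would then use something like fail("Invalid url") in place of
die("Invalid url\n").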

8 years ago TODO: add more info about checking UTF output
Antonio Ospite [Fri, 8 Nov 2013 12:09:19 +0000 (13:09 +0100)]
TODO: add more info about checking UTF output

8 years ago Handle errors and warnings from loadHTML()
Antonio Ospite [Fri, 8 Nov 2013 09:44:44 +0000 (10:44 +0100)]
Handle errors and warnings from loadHTML()

When parsing invalid documents loadHTML() spits out warnings and errors
which may end up polluting the output of tweeper depending on the value
of the "display_errors" variable in the PHP configuration; this may
result in the output being invalid RSS.

Handling those messages explicitly makes tweeper more robust against
different PHP configurations.
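
One common way to handle these messages explicitly is libxml's internal
error buffer. The sketch below illustrates that approach under the
assumption that the messages should go to STDERR; it is not necessarily
what tweeper does, and the input markup is made up:

```php
<?php
// Deliberately invalid markup (hypothetical input).
$html = '<html><body><foo>unclosed <p>text</body></html>';

// Collect libxml messages internally instead of emitting PHP warnings,
// so that "display_errors" cannot pollute the generated output.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$loaded = $doc->loadHTML($html);

// Report the collected messages on STDERR, away from the RSS output.
$stderr = fopen('php://stderr', 'w');
foreach (libxml_get_errors() as $error) {
    fwrite($stderr, "libxml: " . trim($error->message) . "\n");
}
fclose($stderr);

// Empty the buffer and restore the previous error handling mode.
libxml_clear_errors();
libxml_use_internal_errors($previous);
```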

Thanks-to: gregor herrmann <>

8 years ago Show the actual name of the user the tweet comes from
Antonio Ospite [Sun, 6 Oct 2013 09:01:46 +0000 (11:01 +0200)]
Show the actual name of the user the tweet comes from

The old way of using just the screen name made retweeted messages look
like they were coming from the retweeting user instead of the original
author. This is wrong and causes confusion, so fix it.

9 years ago Follow HTTP redirects in get_contents() too
Antonio Ospite [Mon, 12 Aug 2013 08:16:27 +0000 (10:16 +0200)]
Follow HTTP redirects in get_contents() too

This is especially needed when http:// URLs are redirected to https://

9 years ago Add some entries to the TODO file
Antonio Ospite [Sun, 11 Aug 2013 23:25:56 +0000 (01:25 +0200)]
Add some entries to the TODO file

9 years ago Merge branch 'generate-enclosure-element'
Antonio Ospite [Sun, 11 Aug 2013 23:22:35 +0000 (01:22 +0200)]
Merge branch 'generate-enclosure-element'

9 years ago Cosmetics: re-indent cURL options to follow the coding style
Antonio Ospite [Sun, 11 Aug 2013 23:16:10 +0000 (01:16 +0200)]
Cosmetics: re-indent cURL options to follow the coding style

9 years ago Use cURL for Tweeper::get_contents() too
Antonio Ospite [Sun, 11 Aug 2013 23:13:56 +0000 (01:13 +0200)]
Use cURL for Tweeper::get_contents() too

So that the same mechanism is used for getting content and info.

Note, the "file://" scheme has to be prepended to local files so cURL
can handle them.

9 years ago Remove double semicolon in Tweeper::get_info()
Antonio Ospite [Sun, 11 Aug 2013 23:06:23 +0000 (01:06 +0200)]
Remove double semicolon in Tweeper::get_info()

9 years ago Make get_url_info() and generate_enclosure() static methods
Antonio Ospite [Sun, 11 Aug 2013 19:23:42 +0000 (21:23 +0200)]
Make get_url_info() and generate_enclosure() static methods

Also rename get_url_info() to get_info() to match the naming scheme of

9 years ago Turn epoch_to_gmdate() and str_to_gmdate() into static methods
Antonio Ospite [Sun, 11 Aug 2013 19:15:41 +0000 (21:15 +0200)]
Turn epoch_to_gmdate() and str_to_gmdate() into static methods

9 years ago Make get_contents() a static method
Antonio Ospite [Sun, 11 Aug 2013 19:11:03 +0000 (21:11 +0200)]
Make get_contents() a static method

9 years ago Cosmetics: sort supported_content_types, remove unneeded spaces
Antonio Ospite [Sun, 11 Aug 2013 18:57:02 +0000 (20:57 +0200)]
Cosmetics: sort supported_content_types, remove unneeded spaces

9 years ago Use an array to list supported content types for enclosures
Antonio Ospite [Sun, 11 Aug 2013 18:52:47 +0000 (20:52 +0200)]
Use an array to list supported content types for enclosures

9 years ago Make it optional to generate the <enclosure/> element
Antonio Ospite [Sun, 11 Aug 2013 18:44:37 +0000 (20:44 +0200)]
Make it optional to generate the <enclosure/> element

9 years ago Use getopt() to parse command line options
Antonio Ospite [Sun, 11 Aug 2013 18:27:36 +0000 (20:27 +0200)]
Use getopt() to parse command line options

This will make it easier to add more options.

9 years ago Split parsing CLI options from parsing QUERY_STRING ones
Antonio Ospite [Sun, 11 Aug 2013 18:08:37 +0000 (20:08 +0200)]
Split parsing CLI options from parsing QUERY_STRING ones

This will make it easier to add more options.

9 years ago Use templates to generate enclosures
Antonio Ospite [Sun, 11 Aug 2013 11:43:05 +0000 (13:43 +0200)]
Use templates to generate enclosures

This has two benefits:
  - make it possible to handle multiple enclosures;
  - handle _only_ the anchors with the 'data-expanded-url' attribute,
    before the change every anchor with the 'twitter-timeline-link'
    attribute was handled.

The change also makes the DOM navigation a little lighter because now
only $tweet-text is searched for the 'data-expanded-url' attribute.

9 years ago Merge into generate-encolure-elements
Antonio Ospite [Sun, 11 Aug 2013 10:48:21 +0000 (12:48 +0200)]
Merge into generate-encolure-elements

9 years ago Fix a typo: s/tweeter/Twitter/
Antonio Ospite [Sun, 11 Aug 2013 10:43:42 +0000 (12:43 +0200)]
Fix a typo: s/tweeter/Twitter/

9 years ago only enclosify certain mimetypes, use same user agent
Torsten Grote [Sun, 4 Aug 2013 21:22:02 +0000 (23:22 +0200)]
only enclosify certain mimetypes, use same user agent

9 years ago add initial support for enclosures
Torsten Grote [Sun, 4 Aug 2013 20:00:51 +0000 (22:00 +0200)]
add initial support for enclosures

9 years ago Fix a typo in an error message
Antonio Ospite [Sat, 3 Aug 2013 18:56:55 +0000 (20:56 +0200)]
Fix a typo in an error message

9 years ago Add an RSS conversion stylesheet for
Antonio Ospite [Sun, 28 Jul 2013 20:34:06 +0000 (22:34 +0200)]
Add an RSS conversion stylesheet for

Since June 18, 2013 strips are not accessible anymore
directly from the RSS feed, this message is displayed instead:

  Dilbert readers - Please visit to read this feature. Due
  to changes with our feeds, we are now making this RSS feed a link to

How unhandy is that, was it because of a management decision?
Maybe a parody dilbert strip is needed about this issue...

9 years ago TODO: mention the <ttl/> RSS element
Antonio Ospite [Sun, 28 Jul 2013 20:30:26 +0000 (22:30 +0200)]
TODO: mention the <ttl/> RSS element

9 years ago use concat() more
Antonio Ospite [Sun, 28 Jul 2013 20:28:55 +0000 (22:28 +0200)]
use concat() more

I think it is a little more readable, and it surely takes less

9 years ago Add an example with
Antonio Ospite [Sat, 27 Jul 2013 15:14:07 +0000 (17:14 +0200)]
Add an example with

9 years ago Mention in the README that other sites can be converted to RSS
Antonio Ospite [Sat, 27 Jul 2013 15:05:03 +0000 (17:05 +0200)]
Mention in the README that other sites can be converted to RSS

9 years ago Add initial support for scraping activity streams
Antonio Ospite [Sat, 27 Jul 2013 14:51:38 +0000 (16:51 +0200)]
Add initial support for scraping activity streams

Use symlinks to represent alternate sites with the same structure (i.e.
same server software).

Symlinks are handy and concise. An alternative would be to introduce an
equivalence mapping, as in the patch below, but I don't really like
that:

  diff --git a/tweeper.php b/tweeper.php
  index a019684..eb12af2 100755
  --- a/tweeper.php
  +++ b/tweeper.php
  @@ -101,9 +101,18 @@ $url = parse_url($src_url);
   if (FALSE === $url || empty($url["host"]))
     die("Invalid url: $url\n");

  -$stylesheet = __DIR__ . "/rss_converter_" . $url["host"] . ".xsl";
  +$equivalence_map = array(
  +  "" => ""
  +);
  +
  +if (array_key_exists($url["host"], $equivalence_map))
  +  $host = $equivalence_map[$url["host"]];
  +else
  +  $host = $url["host"];
  +
  +$stylesheet = __DIR__ . "/rss_converter_" . $host . ".xsl";
   if (FALSE === file_exists($stylesheet))
  -  die("Conversion to RSS not supported: {$url["host"]}\n");
  +  die("Conversion to RSS not supported: {$host}\n");

   $tweeper = new Tweeper($stylesheet);
   echo $tweeper->tweep($src_url);

9 years ago Change mode of tweeper.php
Antonio Ospite [Sat, 27 Jul 2013 14:46:23 +0000 (16:46 +0200)]
Change mode of tweeper.php

It is not going to be executed directly anyways.

9 years ago Add -h and --help options
Antonio Ospite [Sat, 27 Jul 2013 14:45:47 +0000 (16:45 +0200)]
Add -h and --help options

9 years ago Add another date conversion routine
Antonio Ospite [Sat, 27 Jul 2013 14:38:46 +0000 (16:38 +0200)]
Add another date conversion routine

9 years ago Update the documentation to use URLs as arguments
Antonio Ospite [Sat, 27 Jul 2013 14:36:36 +0000 (16:36 +0200)]
Update the documentation to use URLs as arguments

This change of behaviour of the interface makes the implementation of
multi-site support a lot easier.

9 years ago Mention as an alternative service
Antonio Ospite [Sat, 27 Jul 2013 14:35:47 +0000 (16:35 +0200)]
Mention as an alternative service

9 years ago Use __DIR__ when building the stylesheet path name
Antonio Ospite [Sat, 27 Jul 2013 14:04:41 +0000 (16:04 +0200)]
Use __DIR__ when building the stylesheet path name

This makes it possible to call tweeper.php with an absolute path.

For now the stylesheets are assumed to be in the same directory as the
program; I have no experience with distributing PHP software for command
line usage, so I don't know yet how to properly handle include paths.

9 years ago Rename formatDate() function to epoch_to_gmdate()
Antonio Ospite [Sat, 27 Jul 2013 14:01:36 +0000 (16:01 +0200)]
Rename formatDate() function to epoch_to_gmdate()

There could be different date conversion functions in the future.

9 years ago Be more verbose in error messages
Antonio Ospite [Sat, 27 Jul 2013 11:31:59 +0000 (13:31 +0200)]
Be more verbose in error messages

9 years ago Make stylesheet file name parametric
Antonio Ospite [Sat, 27 Jul 2013 11:24:44 +0000 (13:24 +0200)]
Make stylesheet file name parametric

The host is encoded in the file name; this allows supporting more sites
with no changes to the code. All that is needed is a new stylesheet with
the host in its filename, following the scheme:


Where HOST has the meaning of the "host" field in the return value of
the PHP parse_url() function.
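
A minimal sketch of the lookup, based on the path expression visible in
the patch quoted elsewhere in this log ("/rss_converter_" followed by the
host and ".xsl"). The function name stylesheet_for_url() and the example
URL are hypothetical:

```php
<?php
// Build the stylesheet path for a source URL, or return null when the
// URL has no usable "host" field.
function stylesheet_for_url($src_url, $dir)
{
    $url = parse_url($src_url);
    if (FALSE === $url || empty($url["host"]))
        return null;

    return $dir . "/rss_converter_" . $url["host"] . ".xsl";
}

// Hypothetical usage: look for the stylesheet next to this script.
$stylesheet = stylesheet_for_url("https://example.com/some/page", __DIR__);
echo basename($stylesheet), "\n";
```

Supporting a new site then only requires dropping a stylesheet with the
matching host in its name into that directory.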

9 years ago Change of behavior| Now a URL is required as an argument
Antonio Ospite [Sat, 27 Jul 2013 11:09:08 +0000 (13:09 +0200)]
Change of behavior| Now a URL is required as an argument

This makes the program more generic and prepares it to support feed
scraping for more websites.

9 years ago Factor out a usage() function
Antonio Ospite [Sat, 27 Jul 2013 10:49:21 +0000 (12:49 +0200)]
Factor out a usage() function

9 years ago Use php_sapi_name() to check for CLI interface
Antonio Ospite [Sat, 27 Jul 2013 10:43:16 +0000 (12:43 +0200)]
Use php_sapi_name() to check for CLI interface

9 years ago Fix a typo
Antonio Ospite [Sun, 7 Jul 2013 13:34:21 +0000 (15:34 +0200)]
Fix a typo

9 years ago Add more info about how to call Tweeper from command line
Antonio Ospite [Sun, 7 Jul 2013 13:33:26 +0000 (15:33 +0200)]
Add more info about how to call Tweeper from command line

9 years ago Embed the full HTML content of the tweet in the description field
Antonio Ospite [Sat, 6 Jul 2013 23:22:47 +0000 (01:22 +0200)]
Embed the full HTML content of the tweet in the description field

Use CDATA to embed the exact copy of an element; it is neat and we get
click-able links in the feed reader for free.

9 years ago Format dates using an external php function
Antonio Ospite [Sat, 6 Jul 2013 21:06:12 +0000 (23:06 +0200)]
Format dates using an external php function

9 years ago Initial import
Antonio Ospite [Sat, 6 Jul 2013 19:51:53 +0000 (21:51 +0200)]
Initial import