tweeper.git
Antonio Ospite [Sun, 11 Aug 2013 23:13:56 +0000 (01:13 +0200)]
Use cURL for Tweeper::get_contents() too

So that the same mechanism is used for getting content and info.

Note, the "file://" scheme has to be prepended to local files so cURL
can handle them.
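
A minimal sketch of the "file://" note above, assuming POSIX-style paths. The function names are hypothetical and are not taken from tweeper.php; only the idea of prepending the scheme for local files comes from the commit message.

```php
<?php
// Hypothetical helper: turn a local path into a URL that cURL can fetch.
function sketch_to_curl_url($location)
{
    // Anything that already carries a scheme (http://, file://, ...) is kept.
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $location)) {
        return $location;
    }

    // realpath() returns an absolute path, or FALSE if the file is missing.
    $path = realpath($location);
    if (FALSE === $path) {
        return FALSE;
    }

    return 'file://' . $path;
}

// Hypothetical fetcher using the same cURL mechanism for remote and local
// sources. Returns the body as a string, or FALSE on failure.
function sketch_get_contents($location)
{
    $url = sketch_to_curl_url($location);
    if (FALSE === $url) {
        return FALSE;
    }

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);

    return curl_exec($ch);
}
```

With this shape, a call such as sketch_get_contents("saved_page.html") and one with an http:// URL go through the same code path.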

Antonio Ospite [Sun, 11 Aug 2013 23:06:23 +0000 (01:06 +0200)]
Remove double semicolon in Tweeper::get_info()

Antonio Ospite [Sun, 11 Aug 2013 19:23:42 +0000 (21:23 +0200)]
Make get_url_info() and generate_enclosure() static methods

Also rename get_url_info() to get_info() to match the naming scheme of
get_contents().

Antonio Ospite [Sun, 11 Aug 2013 19:15:41 +0000 (21:15 +0200)]
Turn epoch_to_gmdate() and str_to_gmdate() into static methods

Antonio Ospite [Sun, 11 Aug 2013 19:11:03 +0000 (21:11 +0200)]
Make get_contents() a static method

Antonio Ospite [Sun, 11 Aug 2013 18:57:02 +0000 (20:57 +0200)]
Cosmetics: sort supported_content_types, remove unneeded spaces

Antonio Ospite [Sun, 11 Aug 2013 18:52:47 +0000 (20:52 +0200)]
Use an array to list supported content types for enclosures
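
A rough sketch of what such a list could look like. The name supported_content_types comes from the log, but the entries below and the helper function are illustrative assumptions, not the project's actual list or code.

```php
<?php
// Illustrative entries only (kept sorted); the real list may differ.
$supported_content_types = array(
    "audio/mpeg",
    "image/gif",
    "image/jpeg",
    "image/png",
    "video/mp4",
);

// Hypothetical check: should an enclosure be generated for this type?
function sketch_is_supported_content_type($content_type, array $supported)
{
    // A Content-Type value may carry parameters, e.g. "image/png; charset=binary".
    $parts = explode(';', $content_type);
    $mime = strtolower(trim($parts[0]));

    return in_array($mime, $supported, TRUE);
}
```

Adding support for another type then only means adding one array entry.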

Antonio Ospite [Sun, 11 Aug 2013 18:44:37 +0000 (20:44 +0200)]
Make it optional to generate the <enclosure/> element

Antonio Ospite [Sun, 11 Aug 2013 18:27:36 +0000 (20:27 +0200)]
Use getopt() to parse command line options

This will make it easier to add more options.
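
A small sketch of getopt()-based parsing. The -h/--help options appear elsewhere in this log; the -e/--generate-enclosure option and the function name are assumptions made up for illustration.

```php
<?php
// Hypothetical: map the raw getopt() result to named settings.
function sketch_interpret_options(array $options)
{
    // getopt() stores FALSE as the value of flags that take no argument,
    // so isset() is enough to tell whether a flag was given.
    return array(
        'help' => isset($options['h']) || isset($options['help']),
        'generate_enclosure' => isset($options['e'])
            || isset($options['generate-enclosure']),
    );
}

// Short options "e" and "h", plus their long equivalents.
$raw_options = getopt('eh', array('generate-enclosure', 'help'));
if (FALSE === $raw_options) {
    $raw_options = array();
}

$settings = sketch_interpret_options($raw_options);
```

A new option then needs one more character in the short option string, one more long name, and one more entry in the returned array.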

Antonio Ospite [Sun, 11 Aug 2013 18:08:37 +0000 (20:08 +0200)]
Split parsing CLI options from parsing QUERY_STRING ones

This will make it easier to add more options.
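
One way the QUERY_STRING side of the split might look, as a sketch only. The parameter names src_url and generate_enclosure and the function name are assumptions, not taken from the code.

```php
<?php
// Hypothetical: read options from a query string (web invocation),
// kept apart from the command line parsing.
function sketch_parse_query_options($query_string)
{
    // parse_str() URL-decodes the values and fills $params.
    parse_str($query_string, $params);

    return array(
        'src_url' => isset($params['src_url']) ? $params['src_url'] : NULL,
        // A bare "generate_enclosure" with no value still counts as set.
        'generate_enclosure' => isset($params['generate_enclosure']),
    );
}
```

The caller could pick between the two parsers by checking whether php_sapi_name() returns "cli".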

Antonio Ospite [Sun, 11 Aug 2013 11:43:05 +0000 (13:43 +0200)]
Use templates to generate enclosures

This has two benefits:
  - make it possible to handle multiple enclosures;
  - handle _only_ the anchors with the 'data-expanded-url' attribute;
    before the change, every anchor with the 'twitter-timeline-link'
    attribute was handled.

The change also makes the DOM navigation a little lighter because now
only $tweet-text is searched for the 'data-expanded-url' attribute.
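
The selection described above can be sketched with PHP's DOM extension. This is not the XSLT template the commit refers to; it only illustrates the same idea, and it assumes the tweet text lives in an element with class "tweet-text". The function name is hypothetical.

```php
<?php
// Hypothetical: collect expanded URLs from anchors inside the tweet text.
function sketch_expanded_urls($html)
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid; keep libxml warnings quiet.
    $previous = libxml_use_internal_errors(TRUE);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Only anchors with 'data-expanded-url', and only below the tweet text.
    $query = '//*[contains(concat(" ", normalize-space(@class), " "), " tweet-text ")]'
           . '//a[@data-expanded-url]';

    $urls = array();
    foreach ($xpath->query($query) as $anchor) {
        $urls[] = $anchor->getAttribute('data-expanded-url');
    }

    return $urls;
}
```

Returning an array is what makes more than one enclosure per item possible.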

Antonio Ospite [Sun, 11 Aug 2013 10:48:21 +0000 (12:48 +0200)]
Merge https://github.com/grote/Tweeper into generate-encolure-elements

Antonio Ospite [Sun, 11 Aug 2013 10:43:42 +0000 (12:43 +0200)]
Fix a typo: s/tweeter/Twitter/

Torsten Grote [Sun, 4 Aug 2013 21:22:02 +0000 (23:22 +0200)]
only enclosify certain mimetypes, use same user agent

Torsten Grote [Sun, 4 Aug 2013 20:00:51 +0000 (22:00 +0200)]
add initial support for enclosures

Antonio Ospite [Sat, 3 Aug 2013 18:56:55 +0000 (20:56 +0200)]
Fix a typo in an error message

Antonio Ospite [Sun, 28 Jul 2013 20:34:06 +0000 (22:34 +0200)]
Add an RSS conversion stylesheet for dilbert.com

Since June 18, 2013 dilbert.com strips are no longer accessible
directly from the RSS feed; this message is displayed instead:

  Dilbert readers - Please visit Dilbert.com to read this feature. Due
  to changes with our feeds, we are now making this RSS feed a link to
  Dilbert.com.

How unhandy is that? Was it because of a management decision?
Maybe a parody Dilbert strip is needed about this issue...

Antonio Ospite [Sun, 28 Jul 2013 20:30:26 +0000 (22:30 +0200)]
TODO: mention the <ttl/> RSS element

Antonio Ospite [Sun, 28 Jul 2013 20:28:55 +0000 (22:28 +0200)]
rss_converter_twitter.com.xsl: use concat() more

I think it is a little more readable, and it surely takes fewer
characters.

Antonio Ospite [Sat, 27 Jul 2013 15:14:07 +0000 (17:14 +0200)]
Add an example with identi.ca

Antonio Ospite [Sat, 27 Jul 2013 15:05:03 +0000 (17:05 +0200)]
Mention in the README that other sites can be converted to RSS

Antonio Ospite [Sat, 27 Jul 2013 14:51:38 +0000 (16:51 +0200)]
Add initial support for scraping Pump.io activity streams

Use symlinks to represent alternate sites with the same structure (i.e.
same server software).

Symlinks are handy and concise. An alternative would be to introduce
some equivalence mapping, like in the patch below, but I don't really
like that:

  diff --git a/tweeper.php b/tweeper.php
  index a019684..eb12af2 100755
  --- a/tweeper.php
  +++ b/tweeper.php
  @@ -101,9 +101,18 @@ $url = parse_url($src_url);
   if (FALSE === $url || empty($url["host"]))
     die("Invalid url: $url\n");

  -$stylesheet = __DIR__ . "/rss_converter_" . $url["host"] . ".xsl";
  +$equivalence_map = array(
  +  "identi.ca" => "pump.io"
  +);
  +
  +if (array_key_exists($url["host"], $equivalence_map))
  +  $host = $equivalence_map[$url["host"]];
  +else
  +  $host = $url["host"];
  +
  +$stylesheet = __DIR__ . "/rss_converter_" . $host . ".xsl";
   if (FALSE === file_exists($stylesheet))
  -  die("Conversion to RSS not supported: {$url["host"]}\n");
  +  die("Conversion to RSS not supported: {$host}\n");

   $tweeper = new Tweeper($stylesheet);
   echo $tweeper->tweep($src_url);

Antonio Ospite [Sat, 27 Jul 2013 14:46:23 +0000 (16:46 +0200)]
Change mode of tweeper.php

It is not going to be executed directly anyways.

Antonio Ospite [Sat, 27 Jul 2013 14:45:47 +0000 (16:45 +0200)]
Add -h and --help options

Antonio Ospite [Sat, 27 Jul 2013 14:38:46 +0000 (16:38 +0200)]
Add another date conversion routine

Antonio Ospite [Sat, 27 Jul 2013 14:36:36 +0000 (16:36 +0200)]
Update the documentation to use URLs as arguments

This change of behaviour of the interface makes the implementation of
multi-site support a lot easier.

Antonio Ospite [Sat, 27 Jul 2013 14:35:47 +0000 (16:35 +0200)]
Mention http://rssitfor.me as an alternative service

Antonio Ospite [Sat, 27 Jul 2013 14:04:41 +0000 (16:04 +0200)]
Use __DIR__ when building the stylesheet path name

This makes it possible to call tweeper.php with an absolute path.

For now the stylesheets are assumed to be in the same directory as the
program; I have no experience with distributing PHP software for command
line usage, so I don't know yet how to properly handle include paths.

Antonio Ospite [Sat, 27 Jul 2013 14:01:36 +0000 (16:01 +0200)]
Rename formatDate() function to epoch_to_gmdate()

There could be different date conversion functions in the future.
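
For illustration, a pair of conversion routines with these names might look like the sketch below. The bodies are assumptions (the log only gives the names), and the functions are prefixed with sketch_ to make that clear. The output format is the RFC 822 style date used by RSS.

```php
<?php
// Assumed behaviour: Unix epoch seconds in, RSS-style GMT date string out.
function sketch_epoch_to_gmdate($timestamp)
{
    return gmdate('D, d M Y H:i:s', (int) $timestamp) . ' GMT';
}

// Assumed behaviour: free-form date string in, same output format.
// Returns FALSE if strtotime() cannot parse the input.
function sketch_str_to_gmdate($date)
{
    $timestamp = strtotime($date);
    if (FALSE === $timestamp) {
        return FALSE;
    }

    return sketch_epoch_to_gmdate($timestamp);
}
```

Naming each routine after its input type leaves room for further converters that produce the same output.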

Antonio Ospite [Sat, 27 Jul 2013 11:31:59 +0000 (13:31 +0200)]
Be more verbose in error messages

Antonio Ospite [Sat, 27 Jul 2013 11:24:44 +0000 (13:24 +0200)]
Make stylesheet file name parametric

The host is encoded in the file name in order to support more sites
with no changes to the code: all that is needed is new stylesheets
with the host in their file name, following the scheme:

  rss_converter_HOST.xsl

Where HOST has the meaning of the "host" field in the return value of
the PHP parse_url() function.
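
The naming scheme can be sketched as follows. The function name and the $base_dir parameter are hypothetical; the "rss_converter_HOST.xsl" pattern and the use of parse_url() come from the message above.

```php
<?php
// Hypothetical: build the stylesheet path for a source URL.
// Returns FALSE when the URL has no usable "host" component.
function sketch_stylesheet_for($src_url, $base_dir)
{
    $url = parse_url($src_url);
    if (FALSE === $url || empty($url['host'])) {
        return FALSE;
    }

    return $base_dir . '/rss_converter_' . $url['host'] . '.xsl';
}
```

The caller would still need to check that the resulting file exists before using it. Note that a URL without a scheme, such as "twitter.com/example", has no "host" field according to parse_url(), so it is rejected here.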

Antonio Ospite [Sat, 27 Jul 2013 11:09:08 +0000 (13:09 +0200)]
Change of behavior: now a URL is required as an argument

This makes the program more generic and prepares it to support feed
scraping for more websites.

Antonio Ospite [Sat, 27 Jul 2013 10:49:21 +0000 (12:49 +0200)]
Factor out a usage() function

Antonio Ospite [Sat, 27 Jul 2013 10:43:16 +0000 (12:43 +0200)]
Use php_sapi_name() to check for CLI interface

Antonio Ospite [Sun, 7 Jul 2013 13:34:21 +0000 (15:34 +0200)]
Fix a typo

Antonio Ospite [Sun, 7 Jul 2013 13:33:26 +0000 (15:33 +0200)]
Add more info about how to call Tweeper from command line

Antonio Ospite [Sat, 6 Jul 2013 23:22:47 +0000 (01:22 +0200)]
Embed the full HTML content of the tweet in the description field

Use CDATA to embed the exact copy of an element; it is neat and we get
clickable links in the feed reader for free.
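
The effect of the CDATA approach can be illustrated with PHP's DOM extension. This is a stand-in for illustration only: the commit most likely does this in the stylesheet, and the function name below is hypothetical. The sketch does not handle HTML that itself contains the "]]>" sequence.

```php
<?php
// Hypothetical: wrap an HTML fragment in a <description> element using CDATA,
// so the markup is carried verbatim instead of being entity-escaped.
function sketch_description_with_cdata($html)
{
    $doc = new DOMDocument('1.0', 'UTF-8');

    $description = $doc->createElement('description');
    $description->appendChild($doc->createCDATASection($html));
    $doc->appendChild($description);

    // Serialize only the element, without the XML declaration.
    return $doc->saveXML($description);
}
```

A feed reader that renders the description as HTML then shows the anchors as links.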

Antonio Ospite [Sat, 6 Jul 2013 21:06:12 +0000 (23:06 +0200)]
Format dates using an external php function

Antonio Ospite [Sat, 6 Jul 2013 19:51:53 +0000 (21:51 +0200)]
Initial import