fettig.net

I did not know that

Posted by Abe on Tuesday, November 26, 2002 @ 6:01 pm

Here’s a fairly common programming task:  Given a string, see if it begins with another string.  For example, you may have a list of words, and you want to grab only the ones that begin with "Test".  So you may do something like this (for those who just walked in, we’re talking Python here):

for word in wordList:
    if word[:4] == "Test":
        print(word)

…which is really not too bad, except that you’ve hard-coded your test as being the first four characters, so if you later change "Test" to something else you have to change the [:4] bit to match.  So instead you might do this:

searchWord = "Testing"

for word in wordList:
    if word[:len(searchWord)] == searchWord:
        print(word)

This is a bit more flexible, but harder to read.  Still, that’s what I had been doing most of the time.  But it turns out that Python gives you a much better way, one that I’d never seen before just now:

for word in wordList:
    if word.startswith("Test"):
        print(word)

Yes, every Python string has a startswith method that works just the way you’d expect it to.  Thanks to Mark for indirectly pointing this out to me through his code.
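As a quick check that the three approaches really select the same words, here is a small sketch.  The helper names and the sample word list are made up for illustration, and the code is written for Python 3.  It also shows that startswith accepts a tuple of prefixes (Python 2.5 and later), which the slicing versions cannot do in one test.

```python
def by_fixed_slice(words):
    # Hard-coded length: only correct while the prefix is exactly 4 characters.
    return [w for w in words if w[:4] == "Test"]


def by_computed_slice(words, prefix):
    # Slice length follows the prefix, at the cost of readability.
    return [w for w in words if w[:len(prefix)] == prefix]


def by_startswith(words, prefix):
    # prefix may be a single string or a tuple of strings.
    return [w for w in words if w.startswith(prefix)]


# Sample data, invented for this example.
wordList = ["Testing", "Test", "Tes", "Contest", "test", "Tester"]

same = (by_fixed_slice(wordList)
        == by_computed_slice(wordList, "Test")
        == by_startswith(wordList, "Test"))
either = by_startswith(wordList, ("Test", "Con"))
```

Note that the match is case-sensitive in all three versions, so "test" is not selected.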

A Python library for working with messages

Posted by Abe on Tuesday, November 26, 2002 @ 1:36 pm

Wari figures out how Hep and PyBlosxom are similar: they’re both Python apps that deal with reading, storing, and editing messages that live in files.


So they could both use a common library for handling the nitty-gritty of parsing and generating files.  Moreover, this library could provide a way to get generic Message objects out of many of the common formats for storing messages, including:

  • RSS (0.9.x, 1.0, 2.0)

  • maildir

  • mbox

  • blosxom-style text files

Having such a library would make it easier to write weblog tools, RSS aggregators, conversion utilities, and PIMs in Python. 
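To make the "generic Message object" idea concrete, here is a minimal sketch covering two of the formats above.  None of this is Hep or PyBlosxom code: the Message class, its fields, and both function names are assumptions made up for illustration.  It handles only un-namespaced RSS (0.9x/2.0 style) and assumes the blosxom convention that the first line of an entry file is its title.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical format-neutral message: just a title and a body."""
    title: str
    body: str


def messages_from_rss(xml_text):
    # Un-namespaced <item> elements only; RSS 1.0 (RDF) would need
    # namespace-aware lookups, which this sketch leaves out.
    root = ET.fromstring(xml_text)
    return [
        Message(item.findtext("title", default=""),
                item.findtext("description", default=""))
        for item in root.iter("item")
    ]


def message_from_blosxom(text):
    # Assumed blosxom-style entry: first line is the title, the rest is the body.
    title, _, body = text.partition("\n")
    return Message(title.strip(), body)


# Sample inputs, invented for this example.
SAMPLE_RSS = (
    '<rss version="2.0"><channel><title>Example</title>'
    "<item><title>Hep 0.3.2</title>"
    "<description>Bugfix release.</description></item>"
    "</channel></rss>"
)
SAMPLE_ENTRY = "Hep 0.3.2\nBugfix release."
```

Both inputs come out as the same kind of object, which is the point: code further up the stack never needs to know which file format a message came from.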


Then I’d like to write another library that sits on top of the first.  This second library would (optionally?) use Twisted to provide asynchronous network operations, and give you the ability to access:

  • RSS/maildir/mbox files on FTP and web servers

  • Weblogs through the Blogger and MetaWeblog APIs

  • Advogato diaries

  • IMAP mailboxes

With such a library, it would be really easy to work with messages in Python without worrying about the underlying protocols involved.  For example, you could copy the contents of an RSS feed to a weblog like so:

store1 = openMessageStore("http://www.mysite.com/rss.xml")
store2 = openMessageStore("blogger://username:password@plant.blogger.com/RPC2/MyBlogID")

for messageNo in range(store1.messageCount):
    message = store1.getMessage(messageNo)
    store2.appendMessage(message)

A lot of the code to build these libraries is already in Hep.  Probably the most important thing at this point is coming up with nice APIs that will work across many different message formats, and that other programmers will find easy to use.  I started working on a set of such interfaces last night.  If I have time later today I’ll post what I’ve done so far.
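One way such an interface could be shaped is sketched below.  It borrows the names openMessageStore, messageCount, getMessage, and appendMessage from the example above, but everything else (the scheme registry, the in-memory store, the "memory" URL scheme) is an assumption invented here, not the interface that ended up in Hep.  No network access is involved; a real driver for http or blogger URLs would register its own factory.

```python
from urllib.parse import urlsplit


class InMemoryStore:
    """Hypothetical store that keeps messages in a list."""

    def __init__(self):
        self._messages = []

    @property
    def messageCount(self):
        return len(self._messages)

    def getMessage(self, index):
        return self._messages[index]

    def appendMessage(self, message):
        self._messages.append(message)


_factories = {}      # URL scheme -> callable(url) returning a store
_memoryStores = {}   # URL -> InMemoryStore, so reopening a URL finds the same store


def registerScheme(scheme, factory):
    _factories[scheme] = factory


def openMessageStore(url):
    scheme = urlsplit(url).scheme
    if scheme not in _factories:
        raise ValueError("no message store registered for scheme %r" % scheme)
    return _factories[scheme](url)


def copyMessages(source, destination):
    # The same loop as in the example above, independent of the store type.
    for messageNo in range(source.messageCount):
        destination.appendMessage(source.getMessage(messageNo))


registerScheme("memory", lambda url: _memoryStores.setdefault(url, InMemoryStore()))
```

Dispatching on the URL scheme keeps the calling code identical whether the store is a file, a feed, or a weblog; only the registered factory changes.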

Pyblosxom

Posted by Abe on Monday, November 25, 2002 @ 3:19 pm

There’s a new Pyblosxom release out (0.5i_rev3), and the code is now on SourceForge.  Cool. 


I’ve been thinking about Hep and Pyblosxom, where they overlap, whether they could share some code.  These thoughts are starting to solidify, but not so much that I can actually explain them :-).

An Endorsement

Posted by Abe on Monday, November 25, 2002 @ 1:55 pm

Although he warns that "it takes a couple steps to get it running", Kreblog says that "Hep is definitely worth checking out."


That’s right, kids, say something nice about Hep and I’ll link to you, too.

RSS Politeness

Posted by Abe on Monday, November 25, 2002 @ 1:50 pm

Bill Kearney warns that he’s going to be posting some public criticism of RSS readers that don’t meet his standards for "politeness".  He thinks feeds shouldn’t be fetched more than once every few hours (or days), and that there should be support for feed-indicated scheduling and compression, among other things. 


I’m afraid Hep would get a big fat F on his test at the moment, but I’ll certainly add some of these things to my TODO list.  On the plus side, Wari just contributed a patch to make Hep’s ETag support work with more servers than it had before.  So we’re improving.


The one thing I don’t agree with is Bill’s statement that "most [feeds] could do with being polled every few days instead of every few hours."  To me, one of the big advantages of using RSS is that I get to read news soon after it’s posted.  Before I started using Hep I’d check some sites several times a day to see if there was any new news, and those visits used a lot more bandwidth than grabbing an RSS feed.
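For readers unfamiliar with what "polite" fetching involves, here is a rough sketch of two of the pieces mentioned above: a minimum polling interval and a conditional GET using an ETag.  This is not Hep's code.  The function names and the three-hour interval are assumptions chosen for illustration; only the standard-library urllib calls are real.

```python
import time
import urllib.error
import urllib.request

# Assumed minimum gap between polls of one feed; pick whatever the feed owner asks for.
MIN_INTERVAL_SECONDS = 3 * 60 * 60


def should_poll(last_fetch_time, now=None, min_interval=MIN_INTERVAL_SECONDS):
    """True if the feed has never been fetched or the interval has elapsed."""
    if now is None:
        now = time.time()
    return last_fetch_time is None or now - last_fetch_time >= min_interval


def build_feed_request(url, etag=None):
    """Build a GET request that sends the stored ETag, if there is one."""
    request = urllib.request.Request(url)
    if etag:
        request.add_header("If-None-Match", etag)
    return request


def fetch_feed(url, etag=None):
    """Return (body, new_etag); body is None when the server answers 304."""
    try:
        response = urllib.request.urlopen(build_feed_request(url, etag))
    except urllib.error.HTTPError as error:
        if error.code == 304:  # Not Modified: nothing was transferred
            return None, etag
        raise
    with response:
        return response.read(), response.headers.get("ETag")
```

With a matching ETag the server replies with an empty 304 response, so frequent polling costs far less bandwidth than re-downloading the whole feed each time.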

More On Highlighting Search Words

Posted by Abe on Tuesday, November 19, 2002 @ 6:12 pm

Wari points to a JavaScript script that will highlight Google search words, at kryogenix.org.  I’d considered that, but I didn’t think there was a way to get the referrer from JavaScript.  Turns out that there is: document.referrer.  I think this is a better way of highlighting search words.  Although it won’t work for a small percentage of clients, it should degrade gracefully (although I should test it on an old browser and see what happens).  More importantly, it takes the work off the server, and it will work on static HTML files that aren’t processed using pyblosxom (as long as they include the JavaScript).


This afternoon, in a strange coincidence, I came across Michael Radwin’s blog (which I’d never read before), and found that he too is working on highlighting Google search terms.  His idea is to implement it on the server, as an Apache module.

Helping people find what they’re searching for

Posted by Abe on Monday, November 18, 2002 @ 4:49 pm

Looking through my access logs the other day I noticed something interesting.  I’m getting some people visiting my site based on Google searches, maybe 10 hits a day.  Usually Google points these people to http://www.fettig.net/ , the default (a.k.a. "index") page of my site.


The problem is that I only keep the most recent 5 weblog entries on the front page, so a lot of the time the text Google was trying to direct them to isn’t there anymore.  So to find what they were looking for they have to either:

  • search through the archives manually

  • go back to Google and pull the old version of my default page out of Google’s cache

Of course, I should have a search box on my site.  That would give visitors a better way to find what they were looking for.  But it doesn’t solve the core problem:  People are coming to my site looking for something specific, and I’m giving them the same default page I give everyone else.  The experience they have is similar to when you call tech support and, after explaining your problem, get transferred to another representative, to whom you must explain your problem all over again:


Google: Hello, how may I help you?

User: I’d like to know about "python twisted asynchronous xml-rpc".

Google: I’d suggest you speak with Fettig.net.

User: OK. [clicks the link]

Fettig.net: Hi, welcome to fettig.net.  Here are the latest 5 weblog posts.

User: [scans posts] Actually, I’m not interested in any of that. [begins looking for a way to search/browse the site]


That looks like bad customer service to me.  But is there any alternative?  Why, yes.  How about checking to see where they came from before giving them the default page?  If they’re coming from Google, and they searched for something specific, don’t give them the latest 5 weblog entries.  Give them the 5 entries most related to what they searched for.
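A very rough sketch of "the 5 entries most related to what they searched for" follows.  The scoring here (counting how often each search term appears in an entry) is an assumption picked for simplicity, not a description of how this site does or will do it, and the function name is made up.

```python
def most_related(entries, terms, limit=5):
    """Return up to `limit` entries, best match first, skipping non-matches."""
    wanted = [term.lower() for term in terms]

    def score(entry):
        text = entry.lower()
        return sum(text.count(term) for term in wanted)

    scored = [(score(entry), position, entry)
              for position, entry in enumerate(entries)]
    matches = [item for item in scored if item[0] > 0]
    # Highest score first; ties keep their original (most recent first) order.
    matches.sort(key=lambda item: (-item[0], item[1]))
    return [entry for _, _, entry in matches[:limit]]


# Sample entries, invented for this example.
entries = [
    "Hep 0.3.2 is out",
    "Using Twisted for asynchronous XML-RPC in Python",
    "A Python string method I did not know about",
]
related = most_related(entries, ["python", "twisted", "asynchronous", "xml-rpc"])
```

If nothing matches, the function returns an empty list, and the page could fall back to the usual latest entries.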


To do this myself, I need to have some way to search my site, and I’m working on that.  But in the meantime I’ve added another feature that hopefully will make this site a little easier to use.  If you come to this weblog from a Google search, all the words that were in your search will be highlighted in orange (for example, search Google for "python twisted asynchronous xml-rpc", and click through to fettig.net).  This works on archives as well as the default page, so it should be helpful even when Google directs a user to a specific page.


This works through a pyblosxom preformatter called "highlightsearches", which you can download here.  Don’t use this preformatter if you’re using the preformatter-caching feature of pyblosxom: it needs to run for each page view, because the highlighting depends on each visitor’s referrer.
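The general technique can be sketched as follows.  This is not the highlightsearches source; the function names are invented, the check for a Google host is deliberately crude, and the highlighter works on plain text only (a real preformatter would have to avoid rewriting text inside HTML tags and attributes).

```python
import re
from urllib.parse import parse_qs, urlsplit


def google_search_terms(referrer):
    """Extract the search words from a Google referrer URL, or return []."""
    parts = urlsplit(referrer)
    if "google." not in parts.netloc:  # crude host check, assumed good enough here
        return []
    query = parse_qs(parts.query).get("q", [""])[0]
    # Treat quoted phrases as separate words.
    return query.replace('"', " ").split()


def highlight(text, terms, color="orange"):
    """Wrap each occurrence of each term in a coloured span (plain text only)."""
    if not terms:
        return text
    # Longest terms first so a short term cannot pre-empt a longer one.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(term) for term in ordered),
                         re.IGNORECASE)
    template = '<span style="background-color: %s">%s</span>'
    return pattern.sub(lambda m: template % (color, m.group(0)), text)


terms = google_search_terms(
    "http://www.google.com/search?q=python+twisted+asynchronous+xml-rpc")
```

Because the terms come from the referrer of each individual request, the output differs from visitor to visitor, which is why it cannot be cached.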

Where Hep is Going

Posted by Abe on Friday, November 15, 2002 @ 12:01 pm

With Hep 0.3 out, and the known bugs fixed, I’m thinking about where to go next.  Here are my thoughts, for everyone else to read and comment on: sort of my "State of Hep" address.


What is Hep?


I don’t think I’ve ever really taken the time to explain what my vision for Hep is.   If I have, I haven’t explained it well.  So I’ll try to do so here.


Hep is a message server.  It’s a server in the sense that it’s a program without a graphical user interface, that runs in the background, waiting for other programs to connect to it over the network.  I call it a "message server" because it does things with messages, which as far as Hep is concerned are any little bits of text or HTML that you want to read, or save, or publish, or pass on to somebody else, or convert into a different format, or organize.


The goal of Hep is to make it possible to work with messages in all these ways, without having to worry about where the messages are, what format they’re stored in, or what protocol you have to use to get at them.  Hep lets you use a program that was designed to work with a specific kind of message (like an e-mail client), and use it to work with all kinds of other types of messages (like RSS news feeds, news groups, and your weblog).


Looking Around


There are a lot of interesting message-related projects going on right now.  ZOE is a message server that handles only e-mail, but has a nice web interface and lets you do cool stuff with searching messages and seeing how they relate to each other.  Spaces is a GUI PIM app that can work with RSS as well as e-mail.  Apparently Spaces is also going to support sending messages to weblogs, as well as some kind of peer-to-peer framework.  Chandler is another GUI PIM application that will do more than just e-mail, although it’s still in the planning stages at this point.  And of course there’s Radio and a whole lot of similar tools for working with RSS feeds and weblogs.


What makes Hep different from all of these is that it’s not tied too closely to any one type of message, or messaging protocol.   Hep isn’t an e-mail tool, or an RSS tool.  It’s more of a universal message tool.


You can connect to Hep using ZOE, Spaces, or Radio if you want.  You can also use a traditional PIM program like Outlook, or Unix tools like fetchmail, procmail, and wget.  And Hep will let you connect these tools in ways that were never possible before.


Where Hep is Today


Hep 0.3 supports the idea of sources and destinations.  A message source is a place Hep can find new messages.  Out of the box Hep supports RSS news feeds, and diaries on Advogato (or other sites that use the Virgule community software).  Hep pulls messages from the sources you set up, and then stores them in its Inbox.  You can read the messages in your Hep Inbox by connecting to Hep with your browser.  Or you can download them into your e-mail program by connecting to Hep’s built-in POP server.


A destination is somewhere you send messages to.  Hep 0.3 supports most popular weblog systems as destinations.  Once you’ve set up a destination in Hep you can compose a message in your e-mail client and send it through Hep’s SMTP server.  Hep will deliver it for you, whether that means sending the message to an e-mail recipient or posting it to your weblog.
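The source/destination model described above can be pictured with a toy sketch like the one below.  The class and function names, and the idea of choosing a destination by recipient address, are all assumptions made for illustration; Hep's real classes are not shown in this post and may look nothing like this.

```python
class StaticSource:
    """Hypothetical source: hands out a fixed batch of messages once."""

    def __init__(self, messages):
        self._pending = list(messages)

    def fetch(self):
        new, self._pending = self._pending, []
        return new


class RecordingDestination:
    """Hypothetical destination: remembers what was delivered to it."""

    def __init__(self, name):
        self.name = name
        self.delivered = []

    def deliver(self, message):
        self.delivered.append(message)


def poll_sources(sources, inbox):
    # Pull whatever is new from every configured source into the inbox.
    for source in sources:
        inbox.extend(source.fetch())


def route(message, recipient, destinations, fallback):
    # Known recipient addresses map to a configured destination (for example
    # a weblog); anything else goes to the fallback (ordinary e-mail).
    destinations.get(recipient, fallback).deliver(message)


inbox = []
poll_sources([StaticSource(["post one", "post two"])], inbox)

weblog = RecordingDestination("weblog")
email = RecordingDestination("e-mail")
configured = {"weblog@hep.example": weblog}  # made-up address
route("Hello world", "weblog@hep.example", configured, email)
route("Hi Mark", "mark@example.com", configured, email)
```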


Where Hep is Going


Getting messages from sources and sending messages to destinations is nice, but it’s also limiting.  All you can do with a source is read messages from it, and a destination is basically a black hole that a message disappears into, never to be seen again (at least not in its original form).  In some cases this is an accurate representation of what’s actually going on - an RSS feed really is read-only, and an instant message really does vanish into the network - but in many cases looking at messages this way leaves out a lot of possibilities.


The next step for Hep is adding support for message stores, collections of messages that you can read, write, delete from, add to, and edit.  Hep should be able to provide access to messages in its own internal folders, as well as remote stores like weblogs, newsgroups, and IMAP mailboxes.  Of course this means that Hep will have to include some new server protocols: IMAP and WebDAV.


IMAP is the protocol I’m most excited about.  Almost all popular e-mail clients support it, and it’s designed specifically for managing messages.  Hep’s IMAP server would let you work with both internal and remote stores, so your weblog will look like just another folder full of messages.  Want to publish a set of e-mail messages as a weblog?  Just copy the e-mails into the weblog’s folder, and Hep will automatically convert and publish them.  IMAP also uses a constant connection between the client and server, and lets the server notify the client of changes.  This would let users see new messages as soon as they are received, which is important for instant messaging.


WebDAV is less immediately useful, but it’s an interesting possibility.  WebDAV is a protocol for working with remote files, supported by Windows 2000, OS X (I think) and Linux file managers like Nautilus.  If Hep included a WebDAV server, you could manage messages with your file manager, dragging and dropping to move them around, and edit them with standard tools.  I’d like to make the Hep WebDAV server support different ways of viewing messages, so you could make the same message store appear as a folder full of HTML files, or a maildir directory, or a single RSS file.  And what would be really cool is if you could switch views on the fly - to update an RSS file, just switch to the "bunch of HTML files" view, drag-and-drop the HTML file you want to add, then switch back to the RSS view and you’ve got a new RSS file that includes the HTML you just added.
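The "same store, different views" idea can be illustrated without any WebDAV at all.  In the sketch below a message store is just a list of (title, body) pairs, and two functions render it either as a set of HTML files or as a single RSS 2.0 document.  The function names, the file-naming scheme, and the store representation are all assumptions made up for this example.

```python
import xml.etree.ElementTree as ET
from html import escape


def html_files_view(messages):
    """Render the store as {filename: html}, one file per message."""
    files = {}
    for number, (title, body) in enumerate(messages, start=1):
        files["%04d.html" % number] = (
            "<html><head><title>%s</title></head><body>%s</body></html>"
            % (escape(title), body)
        )
    return files


def rss_view(messages, channel_title, channel_link):
    """Render the same store as one minimal RSS 2.0 document."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = channel_title
    for title, body in messages:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "description").text = body
    return ET.tostring(rss, encoding="unicode")


# One underlying store; adding a message changes every view of it.
store = [("First", "<p>one</p>")]
store.append(("Second", "<p>two</p>"))
feed_xml = rss_view(store, "Example weblog", "http://example.com/")
```

Because both views are computed from the one underlying store, "dropping a file" into the HTML view and then reading the RSS view is just an append followed by a re-render.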


FTP is a possibility here too, although I don’t think it offers the capabilities that WebDAV does.


Storing messages within Hep also will make it possible to search, filter, and group messages in interesting ways, and easily access the same messages from different client applications.


Making it Happen


So there’s a lot of work to do.  Right now my TODO list looks something like this:

  • Add support for storing messages, creating sub-folders, etc.

  • Upgrade weblog drivers to support viewing and editing messages as well as posting

  • Write the IMAP server (Twisted doesn’t include IMAP support)

  • Write plugins for Jabber, AIM, NNTP, and other protocols

  • Write the WebDAV server (One of the Twisted developers is already working on support for the DAV protocol)

  • Improve user interaction, error reporting for undeliverable messages, etc.

  • Improve the web interface

  • Add support for searching and filtering messages

  • Build installers for Mac and Windows, and RPMs and debs for Linux.

If you’re interested in Hep, now is a great time to get involved with the project.  Even if you’re not an experienced Python programmer, you can make a significant contribution.  The more people who are using Hep, testing it out, finding bugs, and making suggestions, the faster the development will go.  If you’re interested, send me an e-mail, or join the developers’ mailing list.

Hep 0.3.2

Posted by Abe on Tuesday, November 12, 2002 @ 6:16 pm

Hep 0.3.2 is out.  This is a bugfix release only, no new features.  Changelog.

Hep 0.3.1

Posted by Abe on Wednesday, November 6, 2002 @ 3:33 pm

Hep 0.3.1 is out.


Changes from Hep 0.3:

  • The web view of mailboxes is much nicer (Thanks to David Dorward for his suggestions and patches).

  • A potentially nasty bug in the message-delivery module was fixed.  Under unusual circumstances Hep would incorrectly think that it hadn’t delivered a message, and deliver the same message over and over.

  • RSS handling is improved, with support for guid and content:encoded (Patches from Wari Wahab).
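For context on the last item, here is a sketch of what reading guid and content:encoded from a feed can look like.  It is not the patch that went into Hep.  The function name and the returned dictionary layout are assumptions; the namespace URI is the one defined by the RSS content module.

```python
import xml.etree.ElementTree as ET

CONTENT_NS = {"content": "http://purl.org/rss/1.0/modules/content/"}


def parse_items(xml_text, seen_guids=None):
    """Return new items as dicts, skipping any guid already in seen_guids."""
    seen = set() if seen_guids is None else seen_guids
    items = []
    for item in ET.fromstring(xml_text).iter("item"):
        # Fall back to the link when the feed provides no guid.
        guid = item.findtext("guid") or item.findtext("link") or ""
        if guid and guid in seen:
            continue
        if guid:
            seen.add(guid)
        # Prefer the full content:encoded body over the short description.
        body = item.findtext("content:encoded", namespaces=CONTENT_NS)
        if body is None:
            body = item.findtext("description", default="")
        items.append({"guid": guid,
                      "title": item.findtext("title", default=""),
                      "body": body})
    return items


# Sample feed, invented for this example.
SAMPLE_FEED = (
    '<rss version="2.0" '
    'xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel>'
    "<item><title>One</title><guid>http://example.com/1</guid>"
    "<description>Short</description>"
    "<content:encoded>&lt;p&gt;Full text&lt;/p&gt;</content:encoded></item>"
    "<item><title>Two</title><guid>http://example.com/2</guid>"
    "<description>Only a summary</description></item>"
    "</channel></rss>"
)
```

Tracking guids is what lets an aggregator recognise an item it has already stored even when the title or body has been edited since.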
