fettig.net

Googlebot and RSS

Posted by Abe on Thursday, April 22, 2004 @ 10:38 am

Dave Winer is upset because he thinks Google’s Googlebot web crawler has started looking for Atom and RSS 1.0 files, while excluding RSS 2.0. However, a quick look at my logs reveals that Googlebot is crawling my RSS 2 feed just fine:

64.68.82.143 - - [22/Apr/2004:02:15:25 -0400] "GET /xml/rss2.xml HTTP/1.0" 200 12891 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"

I don’t see any requests for /atom.xml in my logs. There have been a few requests for /index.rdf, but that’s not suspicious: that file did exist on my server until a couple of months ago, and my Movable Type-generated home page linked to it until I edited the template.
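To repeat that log check yourself, a small helper like the one below counts Googlebot requests for a few common feed filenames. It is a sketch that assumes the Apache combined log format shown above (where field 7 is the requested path); the function name and the list of feed filenames are hypothetical, so adjust them to match your own feeds.

```shell
# Hypothetical helper: count Googlebot requests for common feed filenames
# in an Apache combined-format access log given as the first argument.
# The filename list in the regex is an assumption; edit it to taste.
googlebot_feed_hits() {
    # Field 7 of a combined-format line is the request path.
    awk '/Googlebot/ && $7 ~ /^\/(xml\/rss2\.xml|index\.rdf|index\.xml|atom\.xml)$/ {
        hits[$7]++
    }
    END {
        for (path in hits) print path, hits[path]
    }' "$1" | sort
}
```

Run it as `googlebot_feed_hits /path/to/access_log`; each output line is a feed path followed by the number of Googlebot requests for it.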

It looks to me like Googlebot is just doing what a web crawler should do: crawling all files linked to from the main page. If Dave’s anonymous correspondent is seeing hits on /index.rdf and /atom.xml, it probably means that his pages contain links to those files. Googlebot isn’t going to guess that a file called /index.xml exists - if you want Googlebot to crawl it, link to it!
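If link-following is the explanation, it helps to know exactly which feed URLs a page exposes to a crawler. The sketch below reads HTML on standard input and prints the distinct .rdf and .xml URLs found in href attributes. The function name is hypothetical, and it only recognizes double-quoted href values; a real HTML parser would be more robust.

```shell
# Hypothetical helper: print the distinct .rdf/.xml URLs that the HTML on
# stdin links to. Only double-quoted href="..." attributes are recognized.
feed_links() {
    grep -oE 'href="[^"]*\.(rdf|xml)"' |
        sed -E 's/^href="//; s/"$//' |
        sort -u
}
```

For example, `curl -s http://www.fettig.net/ | feed_links` lists the feed files the home page links to, which is what a crawler following links would find.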

One thing I don’t understand is that Dave’s correspondent says “It’s the first time I’ve seen googlebots looking for these files”. Possible explanations:

  • Googlebot did look for them before, but he never noticed until today.
  • He recently added links to these files.

Update, 1:00 PM

From the comments below, it seems clear that Googlebot is indeed asking some sites for /index.rdf and /atom.xml, even though it hasn’t seen any links to those files, and even when the site itself links to an /index.xml file. Interesting.

Out of curiosity, I ran a few queries to try to figure out how many feeds with common names Google has indexed:

Filename    Query               Hits
index.rdf   filetype:rdf index  188,000
rss.xml     filetype:xml rss    323,000
atom.xml    filetype:xml atom    11,100

Hiding File Extensions in Movable Type

Posted by Abe on Thursday, April 15, 2004 @ 10:35 am

Brad Choate has a good post on how he uses Movable Type to generate documents with a file extension, but then removes that file extension from his URLs. I’ve tried to accomplish the same thing on fettig.net. Most of what I do is the same as Brad’s, but there are a few differences in my technique, which I’ll describe here.

I use the following mod_rewrite rule to send a client-side redirect response if a visitor tries to access a URI with a .html, .php, or .cgi extension:

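RewriteEngine On
# file.* -> file (THE_REQUEST is the client's raw request line, so this only
# fires when the visitor asked for the extension; .htaccess context assumed)
RewriteCond %{THE_REQUEST} GET\ /.*\.(html|cgi|php)\  [NC]
RewriteRule ^(.*)(\.[A-Za-z0-9]*)$ http://%{HTTP_HOST}/$1 [L,R]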

This doesn’t address the issue of the Content-Location header that Brad is worried about, but it does make sure visitors see an extension-less URI in their location bars, which they will then use for linking or bookmarking.
For example, try clicking on this link:
http://www.fettig.net/2004/03/switching_to_emacs.html.
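To check the redirect from a terminal instead of a browser, you can fetch the page's headers and look at the Location header. Below is a minimal sketch with a hypothetical helper that reads raw response headers on standard input. One caveat: the condition above only matches GET request lines, so `curl -I` (which sends HEAD) would not trigger the redirect. Use a GET that dumps headers instead.

```shell
# Hypothetical helper: print the Location header from raw HTTP response
# headers on stdin. Carriage returns are stripped because header lines
# end in CRLF.
redirect_target() {
    tr -d '\r' | awk 'tolower($1) == "location:" { print $2 }'
}
```

Usage: `curl -s -D - -o /dev/null http://www.fettig.net/2004/03/switching_to_emacs.html | redirect_target`. Here `-D -` writes the headers to standard output and `-o /dev/null` discards the body. With the rule in place, it should print the same URL without the `.html` extension.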

Like Brad, I’ve also had to go through all my MT templates to strip the extension. The only difference is that I encapsulated the regular expression logic in a little plugin called “strip_extension.pl”. Here’s the code:

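# strip_extension.pl: adds a global "strip_ext" attribute that removes a
# trailing file extension (e.g. ".html") from a tag's output.
use strict;
use MT::Template::Context;

MT::Template::Context->add_global_filter(strip_ext => sub {
        my $s = shift;                   # the tag's output
        $s =~ s/\.[A-Za-z0-9]+\Z//;      # drop ".ext" at the very end
        return $s;
});

1;    # end with a true value, as Perl expects of files loaded via require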

Then in my templates, I replace all instances of <$MTEntryLink$> with <$MTEntryLink strip_ext="1"$>.
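To try the filter's regular expression without rebuilding the site, the hypothetical shell function below applies the same pattern with sed. It is only a command-line mirror of the Perl substitution, not part of Movable Type.

```shell
# Hypothetical shell mirror of the strip_ext filter: remove one trailing
# ".extension" made of letters and digits from the first argument.
strip_ext() {
    printf '%s\n' "$1" | sed -E 's/\.[A-Za-z0-9]+$//'
}
```

For example, `strip_ext http://www.fettig.net/2004/03/switching_to_emacs.html` prints `http://www.fettig.net/2004/03/switching_to_emacs`. The pattern strips any trailing dot-plus-alphanumerics, so a bare hostname such as `http://www.fettig.net` would lose its `.net`. That is why it only makes sense on tags like `<$MTEntryLink$>` that always end in a filename.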

Using your OS X fonts in The Gimp

Posted by Abe on Saturday, April 3, 2004 @ 5:10 pm

If you’ve installed TrueType fonts under your user account in OS X, you can make them available to Gimp.app (or, probably, any other gtk2 application) by linking your ~/Library/Fonts folder to ~/.fonts, which is where Gimp’s font library looks for user-specific fonts. In other words, just type “ln -s ~/Library/Fonts ~/.fonts” in a terminal window, and the next time you run Gimp all your user fonts will be available.

This is, as far as I can tell, an exclusive tip for Fettig.net readers! I just figured it out, and haven’t seen it anywhere else online.
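One gotcha with the one-line command: if `~/.fonts` already exists as a directory, `ln -s ~/Library/Fonts ~/.fonts` creates `~/.fonts/Fonts` instead of replacing it. The sketch below (with a hypothetical function name) only creates the link when `~/.fonts` is not already there.

```shell
# Sketch: link ~/Library/Fonts to ~/.fonts, but only if ~/.fonts does not
# already exist (as a file, directory, or symlink, even a dangling one).
link_user_fonts() {
    if [ -e "$HOME/.fonts" ] || [ -L "$HOME/.fonts" ]; then
        echo "$HOME/.fonts already exists; leaving it alone" >&2
        return 1
    fi
    ln -s "$HOME/Library/Fonts" "$HOME/.fonts"
}
```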

Update, 4/22

Thanks to Alf Eaton at HubLog for correcting an error: I mistakenly wrote ~/System/Fonts instead of ~/Library/Fonts. I’ve corrected that now. Alf also discovered that you can add both system and user-level fonts to the Gimp by using sub-directories of ~/.fonts. For example:

mkdir ~/.fonts
ln -s /Library/Fonts ~/.fonts/sys
ln -s ~/Library/Fonts ~/.fonts/user
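If you already followed the original tip, `~/.fonts` is a symlink to `~/Library/Fonts`. In that case the `mkdir` above fails, and the two `ln -s` commands end up creating `sys` and `user` inside `~/Library/Fonts`. The following is a sketch of an idempotent version under that assumption (the function name is hypothetical). It removes the old link first; `rm` on a symlink deletes only the link, not your fonts.

```shell
# Sketch: set up the sys/user layout under ~/.fonts, replacing the single
# ~/.fonts symlink from the original tip if it is still there. Safe to rerun.
setup_gimp_fonts() {
    if [ -L "$HOME/.fonts" ]; then
        rm "$HOME/.fonts"            # removes the link only
    fi
    mkdir -p "$HOME/.fonts"
    # Create each link only if nothing (not even a dangling link) exists yet.
    [ -e "$HOME/.fonts/sys" ] || [ -L "$HOME/.fonts/sys" ] ||
        ln -s /Library/Fonts "$HOME/.fonts/sys"
    [ -e "$HOME/.fonts/user" ] || [ -L "$HOME/.fonts/user" ] ||
        ln -s "$HOME/Library/Fonts" "$HOME/.fonts/user"
}
```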

Pinstripe for Thunderbird, Quicksilver

Posted by Abe on Thursday, April 1, 2004 @ 10:08 pm

Thunderbird is an excellent mail client, but since I started running it under OS X it’s felt a little out of place - the flat-looking icons, the almost-but-not-quite native controls. It just felt awkward and uncomfortable, so much so that I even thought about using Apple’s Mail.app until I realized what a pain it would be to transfer all my mail and settings. But now, with the just-released Pinstripe theme, Thunderbird has a whole new look, and it’s absolutely gorgeous:

In real life it’s even nicer. If you’re using OS X, grab the latest build and give it a try - it’s truly amazing.

Also, while I’m on the subject of OS X apps: Quicksilver is just as good as everyone says it is. For the uninitiated, here’s how it works. I think “I want to run Firefox” - but it’s not in the Dock! So I hit Control-space, and Quicksilver appears. I type ‘F-I’. The Firefox logo appears. I hit enter. Firefox is running. Total time: 2 seconds. No more searching through the Applications folder. Hooray!