fettig.net

Hep CVS, Part 1: Getting Started

Posted by Abe on Monday, April 7, 2003 @ 3:18 pm

First, check out Hep from CVS:


cvs -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/hepserver login

Hit Enter at the password prompt; there’s no password.


cvs -z3 -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/hepserver co hep 

Next, check out the Hep messaging library, which is in a separate CVS module:


cvs -z3 -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/hepserver co messaging 

Now you should have two directories, ‘hep’ and ‘messaging’.

Next, install Lupy, a Python port of the Lucene text-indexing system. Hep needs the Lupy modules to run.

Now cd into the hep directory. Hep needs to be able to find the messaging libraries, which are in the messaging folder you just checked out, so add that folder to your PYTHONPATH environment variable, like this:


export PYTHONPATH=$PYTHONPATH:/my/working/directory/messaging
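
If Python later complains that it can’t find the messaging modules, a quick check like the one below can confirm whether the directory really made it onto the search path. This is only a sketch: on_python_path is a hypothetical helper, not part of Hep, and the directory is the same placeholder used in the export line above.

```python
import os
import sys


def on_python_path(directory, path_entries=None):
    """Return True if `directory` is one of the path entries (default: sys.path)."""
    entries = sys.path if path_entries is None else path_entries
    target = os.path.realpath(directory)
    # An empty entry in sys.path means "the current directory".
    return any(os.path.realpath(entry or ".") == target for entry in entries)


if __name__ == "__main__":
    # Placeholder path; substitute the directory you actually checked out into.
    print(on_python_path("/my/working/directory/messaging"))
```

If this prints False in the same shell where you ran the export, the variable probably wasn’t set the way you expected.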

Now you’re ready to run Hep! If you want to upgrade from an existing Hep installation, copy your hep.ini file into the hep/ folder, and run ./hep-upgrade.py to upgrade your existing configuration. Otherwise, just rename ‘hep.ini-distrib’ to ‘hep.ini’, and then run ./hep-add-user.py to create a user.

Finally, start Hep by running ./hep.py.

This week: a tour of Hep CVS.

Posted by Abe on Monday, April 7, 2003 @ 3:01 pm

Steven Noels (http://blogs.cocoondev.org/stevenn/archives/000835.html) is thinking of ditching Hep, which he says “is a bit undermaintained”. Ouch!

Of course Steven can’t be blamed for thinking Hep is undermaintained, as it’s been months since the last major release, and lately I haven’t been posting any updates about the current state of development. But in fact, the reason things have been so quiet around here is that I’ve been working on Hep like crazy.

And as of today, the code is in CVS.

I’m excited about this code. So starting today I’m going to be writing about some of the new features available in the CVS version of Hep, features that will be in version 0.4 when it comes out.

CREATE INDEX, save time.

Posted by Abe on Sunday, April 6, 2003 @ 6:44 pm

This week I learned something new about the value of indexing database tables.

The background: I’m working on a data warehouse project for my employer. The database is Microsoft SQL Server.

The problem: Update table ‘A’ (containing around 600K address records) with the contents of table ‘B’ (containing the results of running the 600K addresses through address cleaning software). Both are throwaway tables (temporary not in the SQL Server sense, just in the I’ll-drop-them-when-I’m-done-with-them sense) created by importing .csv files into SQL Server through DTS. Both tables contain a two-field key that can be used to join the matching records.

Since the tables were only going to be used once, by me, I didn’t think to create indexes on them. I just ran the update statement… and waited for 20 minutes, during which time the tempdb system database grew to around 800 MB (what SQL Server was putting in there, I don’t know). Finally, I got impatient and cancelled the transaction.

Then, out of curiosity, I created indexes on both tables (which took less than a minute) and ran the update statement again. It finished in about 3 minutes!

The moral of the story: Even if you’re creating a table that will only ever have a single query run against it, it may still be worth your time to create some indexes. I will remember this in the future. Still, I have to wonder: Shouldn’t the database be smart enough to create a temporary index in situations like this?
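
For the curious, here is a rough sketch of the same pattern. It is not the code I ran: it uses SQLite through Python’s standard sqlite3 module as a stand-in for SQL Server, and the table names, column names, index names and row counts are all made up for illustration. Rather than timing anything, it asks the database for its query plan, since SQLite will sometimes build a transient index of its own and that would muddy a stopwatch comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical stand-ins for tables 'A' and 'B', each with a two-field key.
cur.execute("CREATE TABLE a (key1 INTEGER, key2 INTEGER, address TEXT)")
cur.execute("CREATE TABLE b (key1 INTEGER, key2 INTEGER, clean_address TEXT)")

keys = [(i // 100, i % 100) for i in range(5000)]
cur.executemany(
    "INSERT INTO a VALUES (?, ?, ?)",
    [(k1, k2, f"raw {k1}-{k2}") for k1, k2 in keys],
)
cur.executemany(
    "INSERT INTO b VALUES (?, ?, ?)",
    [(k1, k2, f"clean {k1}-{k2}") for k1, k2 in keys],
)

# Index both tables on the join key, as in the story above.
# (In this particular statement only the index on b gets probed.)
cur.execute("CREATE INDEX idx_a_key ON a (key1, key2)")
cur.execute("CREATE INDEX idx_b_key ON b (key1, key2)")

update_sql = """
UPDATE a
SET address = (
    SELECT b.clean_address FROM b
    WHERE b.key1 = a.key1 AND b.key2 = a.key2
)
WHERE EXISTS (
    SELECT 1 FROM b
    WHERE b.key1 = a.key1 AND b.key2 = a.key2
)
"""

# The last column of each plan row is the human-readable description.
plan = cur.execute("EXPLAIN QUERY PLAN " + update_sql).fetchall()
plan_text = "\n".join(str(row[-1]) for row in plan)

cur.execute(update_sql)
conn.commit()

updated = cur.execute(
    "SELECT COUNT(*) FROM a WHERE address LIKE 'clean %'"
).fetchone()[0]

print(plan_text)
print("rows updated:", updated)
```

With the index in place, the plan should show each lookup into b going through idx_b_key instead of scanning the whole table once per row of a. SQL Server’s planner and syntax differ, so treat this as an illustration of the idea rather than a recipe.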
