growing a database

I’ve been doing database driven websites for about seven years now, some of my own and some at work. Below I list some tips that have saved me a lot of trouble. Nothing complicated, and generally intended for small to medium sized databases…

Break up the tables

I have a database that stores information about dogs taken into a rescue group. Of course it started out with the basic information: name of dog, where obtained (shelter, owner, etc), who is fostering the dog, where a picture of the dog is, a descriptive “biography” of the dog, and a comment log noting things about the dog, or other relevant information. Something like:

CREATE TABLE Foster_Dogs (
  id MEDIUMINT(8) NOT NULL AUTO_INCREMENT,
  name CHAR(30) NOT NULL,
  source ENUM ("Shelter", "Owner", "Stray", "Vet", "Rescue"),
  intake DATE,
  foster_home MEDIUMINT(8),
  picture VARCHAR(60),
  bio TEXT,
  comments TEXT
  /* and actually much, much more...sigh */
);
 

Of course all kinds of situations have come up that I never thought of when first putting this together. The group started kenneling some dogs in addition to fostering them. Polls, vet bills, sponsorships, etc. In short, more information as time went on.

The first few times, I modified the entire “Foster dog” table to add on new fields, crossing my fingers each time that I woudn’t honk the database. Of course I always backed things up, but restoring in the case of mistakes was always a time consuming chore.

Then I hit on a much better solution. I already had a unique ID for each foster dog, because I knew I could not guarantee unique names for each dog (indeed the first two years alone saw over 20 dogs named “Buddy”). So I created a new table altogether.

As a simple example, some of the fostered dogs are cross indexed with a free advertising group to look for more potential adoptors. But the group wants to be able to remove such dogs from the other group when they have been adopted. So the simplest solution is a new table, consisting of two columns: the dog’s ID, and the name of the group it is cross-listed with. The original foster table is absolutely untouched. And it’s a snap to add this information to the edit pages seamlessly. The volunteers don’t know how the structure is underneath, it all comes up on the display pages, plus they can search on “which dogs are currently listed in which groups” and get the list — quickly.

Quantify select fields!

This bit me in the butt several times. Not so much now with the new tables approach above, but sometimes a table really needs to be altered in some way. In which case, a SELECT statement of the form:

SELECT * FROM Foster_Dogs...

is going to hurt when the wrong values are shoved into the wrong fields. No matter how painful it is to list (although yet again with the micro table method above, this is also alleviated), list them explicitly! That way even if the columns are reordered, the results from the query are always ordered the same way.

No, I don’t know why I didn’t always add new fields at the ends of records :-O Some things are best forgotten, I think…

Use ID’s Everywhere

When coming up with some object on which data will be stored, always set up unique ID’s for the table. Both MySQL and Postgresql have functions for unique, autoincrementing numbers so that I never even have to try and track or generate the numbers. With the unique ID’s I can then mix and match other items.

For example a table that pairs up volunteer and foster dog ID’s along with start/end timestamps might be all I need at this stage to have complete records of who has fostered which dog when (even multiple times, if I consider each element in this table to be a “Foster_Interval” and give this a unique ID.

To recap, that would be a definition something like this:

CREATE TABLE Foster_Intervals (
  id MEDIUMINT(8) NOT NULL AUTO_INCREMENT,
  dog_id MEDIUMINT(8) NOT NULL,
  vol_id MEDIUMINT(8) NOT NULL,
  start_time TIMESTAMP NOT NULL,
  end_time TIMESTAMP
);

The unique id guarantees that any instance of a volunteer fostering a dog will be retained. If the same volunteer were to foster the same dog multiple times, that could overwrite the record so I only had information on the latest fostering stint, not all of them.

Now the query to pull out currently fostered dogs might be done this way:

SELECT Foster_Dogs.name, Volunteers.Firstname, Volunteers.Lastname, Foster_Intervals.start_time
  FROM Foster_Dogs, Volunteers, Foster_Intervals
  WHERE Foster_Intervals.dog_id=Fosters.id AND
            Foster_Intervals.vol_id=Volunteers.id AND
            end_time=NULL
  ORDER BY Foster_Intervals.start_time

A simple table, but pulling together a good deal of info on top of other already constructed tables.

Record those tables

Yes, a dump is generally a good way to get the structures back and both MySQL and Postgresql have ways for doing that. But I’ve had a few occasions where a table got deleted and then good luck with reconstructing the table. So for convenience’s sake and for backup purposes, I maintain a file with the original table declarations in it. In some cases, I’ve even written scripts that automatically dump table declarations and check them into a version control system. I run these scripts before doing anything with the database and this way I have a record of the changes over time to the database.

Other ideas?

I find that sitting down and mapping things out helps me to form a good database structure. Of coruse, there will be different considerations depending on the (eventual) size of the database: much larger structures may start to slow down too much with too many small tables but tables with too much aggregated information are at risk of becoming corrupted over time.

I’d be curious to hear of any other tips folks have acquired from their experience, too.

del.icio.us:growing a database  digg:growing a database

Leave a Comment

Bad Behavior has blocked 1093 access attempts in the last 7 days.