Using Distinct for Deletes
J. Scott Johnson on 2002 June 03
J. Scott Johnson on 2002 June 03
I recently encountered a tricky SQL problem that had me scratching my head until I finally put it together. What I was writing was essentially a url logging system. I wanted to write it fast since I was leaving for the weekend so I used an INSERT statement that DID NOT check for duplicates first
. This had the expected effect of, for a given url, N, duplicating it in the database every time the logging system ran. Adding a simple DISTINCT to my query prevented this from being an issue on the retrieval side but I knew that, someday, I'd have to get that data whipped into shape. Here's how I did it.Step 1: Look at the Table
Below is our table description which I created with this SQL command:
DESCRIBE opml_instantoutlines;
Field Type Null Key Default Extra
io_id int(11) PRI NULL auto_increment
who char(50) YES NULL
iourl char(125) YES NULL
blogurl char(125) YES NULL
type char(2) YES NULL
date datetime YES NULL
ownerName char(100) YES NULL
ownerEmail char(100) YES NULL
title char(100) YES NULL
dateCreated char(50) YES NULL
dateModified char(50) YES NULL
preview char(100) YES NULL
iostatus char(10) YES NULL
size int(11) YES NULL
12 rows selected.
Step 2: BACKUP YOUR TABLE
If you've been lucky enough to never delete data accidentally then I take my hat off to
you. I've never lost data but I don't believe in flying without a parachute, either. I always backup my tables before playing around with complicated SQL.
REMEMBER: There is no undo in mySQL!
Below is the SQL to backup our table. A quick trip to the New Riders mySQL book (excellent) gave me the query I needed. What we are going to do is copy the contents of this table, in full, into a new table, opml_test.
CREATE TABLE opml_test SELECT * FROM opml_instantoutlines;
This creates a new table by selecting the contents of the old table into it. Now we can operate on our table opml_test without worries.
Step 3: Using Distinct
The distinct command in sql is interesting. It lets you select only the values that are unique in a column (i.e. they are distinct). Here's an example
SELECT DISTINCT(iourl) FROM opml_test;
This will return a set of urls that are unique. You can also use unique with count or:
SELECT COUNT(DISTINCT iourl) FROM opml_test;
This counts the total number of distinct urls. You would think that doing our delete would be a piece of cake with:
DELETE FROM opml_test WHERE distinct(iourl);
Nope. Doesn't work. SQL Syntax error. It's simply not legal. We're going to have to think a bit on this. Hmm...
Step 4: Deleting without Actually Deleting
After spending a bunch of time in SQL manuals without getting very far, I thought about what I was trying to do. Here's the goal:
Create a table of entries where there is one entry per unique url
Note that I didn't say that I wanted to delete. I actually don't care if I delete -- as long as I get my result. Since I know how to select the distinct results, could I use that to create my table? So I tried
CREATE TABLE opml_test1 SELECT distinct(iourl) from opml_test;
Perfect! This created a table with 1 column of all the unique urls. Now all I had to do was either change my PHP scripts to use the new table or rename the tables. I chose to use SQL to delete the old tables and rename the new table with:
DROP TABLE opml_instantoutlines;
CREATE TABLE opml_instantoutlines SELECT distinct(iourl) from opml_test1;
DROP TABLE opml_test1;
DROP TABLE opml_test;
Conclusion
Learning SQL is tricky. The way you solve problems isn't always what you think it will be but SQL is an astonishingly powerful tool that is well worth learning.
About the Author
Scott Johnson is a high tech veteran having founded his first software company at 19 (in 1987, long before the "dot com"). That company, NTERGAID, made and shipped hypertext tools before the web was even conceived. After running NTERGAID successfully, Scott sold the company to Dataware where Scott led Enterprise Knowledge Management products. After Dataware, Scott moved to Mascot Network in charge of Product Management, Product Marketing. He's now available for consulting work.
