Using Distinct for Deletes
J. Scott Johnson on 2002 June 03
    

I recently encountered a tricky SQL problem that had me scratching my head until I finally put it together. What I was writing was essentially a url logging system. I wanted to write it fast since I was leaving for the weekend so I used an INSERT statement that DID NOT check for duplicates first

. This had the expected effect of, for a given url, N, duplicating it in the database every time the logging system ran. Adding a simple DISTINCT to my query prevented this from being an issue on the retrieval side but I knew that, someday, I'd have to get that data whipped into shape. Here's how I did it.


Step 1: Look at the Table

Below is our table description which I created with this SQL command:




DESCRIBE opml_instantoutlines;

Field        Type        Null    Key    Default    Extra
io_id        int(11)        PRI    NULL        auto_increment
who        char(50)    YES        NULL
iourl          char(125)    YES        NULL
blogurl        char(125)    YES        NULL
type        char(2)    YES        NULL
date        datetime    YES        NULL
ownerName    char(100)    YES        NULL
ownerEmail    char(100)    YES        NULL
title        char(100)    YES        NULL
dateCreated    char(50)    YES        NULL
dateModified    char(50)    YES        NULL
preview        char(100)    YES        NULL
iostatus    char(10)    YES        NULL
size        int(11)    YES        NULL
 
12 rows selected.






Step 2: BACKUP YOUR TABLE

If you've been lucky enough to never delete data accidentally then I take my hat off to
you. I've never lost data but I don't believe in flying without a parachute, either. I always backup my tables before playing around with complicated SQL.

REMEMBER: There is no undo in mySQL!

Below is the SQL to backup our table. A quick trip to the New Riders mySQL book (excellent) gave me the query I needed. What we are going to do is copy the contents of this table, in full, into a new table, opml_test.



CREATE TABLE opml_test SELECT * FROM opml_instantoutlines;



This creates a new table by selecting the contents of the old table into it. Now we can operate on our table opml_test without worries.

Step 3: Using Distinct

The distinct command in sql is interesting. It lets you select only the values that are unique in a column (i.e. they are distinct). Here's an example



SELECT DISTINCT(iourl) FROM opml_test;



This will return a set of urls that are unique. You can also use unique with count or:



SELECT COUNT(DISTINCT iourl) FROM opml_test;



This counts the total number of distinct urls. You would think that doing our delete would be a piece of cake with:



DELETE FROM opml_test WHERE distinct(iourl);



Nope. Doesn't work. SQL Syntax error. It's simply not legal. We're going to have to think a bit on this. Hmm...



Step 4: Deleting without Actually Deleting

After spending a bunch of time in SQL manuals without getting very far, I thought about what I was trying to do. Here's the goal:

Create a table of entries where there is one entry per unique url

Note that I didn't say that I wanted to delete. I actually don't care if I delete -- as long as I get my result. Since I know how to select the distinct results, could I use that to create my table? So I tried



CREATE TABLE opml_test1 SELECT distinct(iourl) from opml_test;



Perfect! This created a table with 1 column of all the unique urls. Now all I had to do was either change my PHP scripts to use the new table or rename the tables. I chose to use SQL to delete the old tables and rename the new table with:



DROP TABLE opml_instantoutlines;

CREATE TABLE opml_instantoutlines SELECT distinct(iourl) from opml_test1;

DROP TABLE opml_test1;

DROP TABLE opml_test;




Conclusion

Learning SQL is tricky. The way you solve problems isn't always what you think it will be but SQL is an astonishingly powerful tool that is well worth learning.


About the Author

Scott Johnson is a high tech veteran having founded his first software company at 19 (in 1987, long before the "dot com"). That company, NTERGAID, made and shipped hypertext tools before the web was even conceived. After running NTERGAID successfully, Scott sold the company to Dataware where Scott led Enterprise Knowledge Management products. After Dataware, Scott moved to Mascot Network in charge of Product Management, Product Marketing. He's now available for consulting work.
Tentatively planning to Open Soon! (no dates ...) // Doing heavy development now...