[Pdx-pm] [pdxpug] duplicate records

Austin Schutz tex at off.org
Wed Oct 18 21:30:05 PDT 2006


On Wed, Oct 18, 2006 at 06:17:00PM -0700, Thomas J Keller wrote:
> The problem is that the primary keys are unique.
> So SELECT DISTINCT * FROM original_table;
> finds all the records, including the ones that are identical except  
> for the id.
> 
> So the problem remains. I actually have 194 unique records and 12  
> that are the same except for their id's.
> 
> Here's the new table:
> passwords=# SELECT * FROM foobar;
> password_id | username | password | object | url_address |  
> description | status
> -------------+----------+----------+--------+------------- 
> +-------------+--------

	Ok, so let's look for those and delete them.
Let's start with the select:

SELECT max(b.id) as b_id, COUNT(a.id) as a_count from foobar a, foobar b where
  a.username = b.username,
  a.password = b.password,
	etc. etc.
GROUP BY a.id

	That should return the higher id of the different ids and how many of
them there are in the table which have identical data with differing ids. Now
stick in in a subselect:

SELECT DISTINCT(b_id) FROM (
SELECT max(b.id) as b_id, COUNT(a.id) as a_count from foobar a, foobar b where
  a.username = b.username,
  a.password = b.password,
        etc. etc.
GROUP BY a.id
) my_subselect WHERE
 a_count = 2;

	Now you have the ids where there is identical junk. The distinct part
(not strictly necessary) is because you will have the id show up twice for
each dupe row.

DELETE FROM foobar WHERE id in (
... all that crap above.
)

	Well, works in oracle. Should work here too I would imagine.


	Austin


More information about the Pdx-pm-list mailing list