Relational Database Technologies: Duplicate Rows in a primary key Table.

Back again, getting very less time for blogging :)

"ERROR: could not create unique index
DETAIL: Table contains duplicated values."

This error is thrown out by Postgres when it encounters duplicate rows in a primary key table by failing any of these command REINDEX or CREATE UNIQUE INDEX.

Why duplicate rows exists in a table ?

Not sure exactly :) nor any proved explainations out...
Two thing to my mind.

Firstly, it might be delayed index creation or if you have shared sequences in a database, sharing on two different Primary key Tables might be the cause while restoring the data into table (pg_restore). Secondly, if any huge transaction is taking place on that table and at the backend someone has abruptly stopped the instance, which might also fail the index(primary key) to point to the right row.

How to fix it ?

Well,as common practice, when we encounter a duplicate rows in a table (despite of any reason), we first filter the duplicate rows and delete them, and later by doing REINDEX should fix the issue.

Query for finding duplicate rows:

select count(*),primary_column from table_name group by primary_column having count(*) > 1;

Even after deleting the duplicate rows REINDEX or CREATE UNIQUE INDEX fails, it means your index is not cleaned properly. Above query might not be giving 100% result oriented output what you are expecting, because the query is going to pick the index which is already corrupted with duplicate rows. See the explain plan below.

postgres=# explain select count(*),id from duplicate_test group by id having count(*) > 1;
                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=0.00..5042.90 rows=99904 width=4)
   Filter: (count(*) > 1)
   ->  Index Scan using duplicate_test_pkey on duplicate_test  (cost=0.00..3044.82 rows=99904 width=4)
(3 rows)

We need to catch CTID of duplicate rows from the main table and delete with conditional statement as CTID + PRIMARY KEY VALUE.

I have played a bit with pg_catalogs to voilate Primary Key Table to reproduce the scenario with similar error. (Please don't it)

postgres=# create unique index idup on duplicate_test(id);
ERROR:  could not create unique index "idup"
DETAIL:  Key (id)=(10) is duplicated.

My Table Definition & Data:

postgres=# \d duplicate_test
Table "public.duplicate_test"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer | not null
 name   | text    |
Indexes:
    "duplicate_test_pkey" PRIMARY KEY, btree (id)

postgres=# select * from duplicate_test ;
 id |  name
----+---------
 10 | Raghav    ---Duplicate
 20 | John H
 30 | Micheal
 10 | Raghav    ---Duplicate
(4 rows)

Now, lets fix this....

Step 1. Create a new table from effected table by pulling only two column values CTID and PRIMARY KEY.

postgres=# CREATE TABLE dupfinder AS SELECT ctid AS tid, id FROM duplicate_test;
SELECT 4

Step 2. Now, lets run the duplicate finder query with CTID to get the exact duplicates.

postgres=# select * from dupfinder x where exists (select 1 from dupfinder y where x.id = y.id and x.tid != y.tid);
  tid  | id
-------+----
 (0,1) | 10
 (0,5) | 10
(2 rows)

Step 3. On above result, now you can delete one row from main table(effected table) with CTID.

postgres=# delete from duplicate_test where ctid='(0,5)' and id=10;
DELETE 1

Step 4. Now, your REINDEX or CREATE UNIQUE INDEX will be successful.

postgres=# create unique index idup on duplicate_test(id);
CREATE INDEX

postgres=# select * from duplicate_test ;
 id |  name
----+---------
 10 | Raghav
 20 | John H
 30 | Micheal
(3 rows)

Step 5. Don't forget to do immediate VACUUM ANALYZE on the table to update the system catalogs as well CTID movement.

Please do share your comments.

--Raghav

6 comments :

Anonymous said...: You're the man! Thank you!; October 11, 2013 at 4:17 AM
Unknown said...: Awesome one...; June 17, 2014 at 3:17 AM
Anonymous said...: Thank you for taking the time to make this information available. Your kindness saved me a lot of time!; September 30, 2014 at 9:17 PM
Anonymous said...: Thanks a lot for this entry, we got the same problem in our production database recently, and this helped to solve the issue.

Best regards.; October 16, 2014 at 12:10 AM
Anonymous said...: This comment has been removed by a blog administrator.; November 3, 2015 at 8:53 PM
Unknown said...: Thank you very much for this explanation. I had the exact problem on a production environment. THANK YOU!!; December 14, 2016 at 6:17 PM

Relational Database Technologies

Sunday, April 8, 2012

Duplicate Rows in a primary key Table.

6 comments :

Post a Comment