DevHeads.net

Help with cleaning up data

I am given a table: ICD9X10 which is a maping of ICD9 codes to
ICD10 codes. Unfortunately the table contains duplicate entries
that I need to remove.

CREATE TABLE `ICD9X10` (
`id` smallint(6) NOT NULL AUTO_INCREMENT,
`icd9` char(8) NOT NULL,
`icd10` char(6) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `icd9` (`icd9`,`id`),
UNIQUE KEY `icd10` (`icd10`,`id`)
) ENGINE=InnoDB AUTO_INCREMENT=671 DEFAULT CHARSET=ascii

id icd9 icd10
25 29182 F10182
26 29182 F10282
27 29182 F10982

I just can't think of a way to write a querey to delete the
duplicates. Does anyone have a suggestion ?

bill

Comments

Re: Help with cleaning up data

By bill at 03/30/2014 - 07:18

On 3/29/2014 2:26 PM, william drescher wrote:

Thanks for all the suggestions. I learned a lot, which is the
most important part of the exercise.

bill

RE: Help with cleaning up data

By David Lerer at 03/29/2014 - 23:18

Bill, here is one approach:

The following query will return the id's that should NOT be deleted:
Select min (id) from icd9x10 group by icd9, icd10

Once you run it and happy with the results then you subquery it in a DELETE statement. Something like:
Delete from icd9x10 A where A.id not in (Select min (B.id) from icd9x10 B group by B.icd9, B.icd10).

I have not tested it (sorry it is a weekend here...), but I hope it will lead you into the right direction.

David.

David Lerer | Director, Database Administration | Interactive | 605 Third Avenue, 12th Floor, New York, NY 10158
Direct: (646) 487-6522 | Fax: (646) 487-1569 | <a href="mailto: ... at univision dot net"> ... at univision dot net</a> | <a href="http://www.univision.net" title="www.univision.net">www.univision.net</a>

I am given a table: ICD9X10 which is a maping of ICD9 codes to
ICD10 codes. Unfortunately the table contains duplicate entries
that I need to remove.

CREATE TABLE `ICD9X10` (
`id` smallint(6) NOT NULL AUTO_INCREMENT,
`icd9` char(8) NOT NULL,
`icd10` char(6) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `icd9` (`icd9`,`id`),
UNIQUE KEY `icd10` (`icd10`,`id`)
) ENGINE=InnoDB AUTO_INCREMENT=671 DEFAULT CHARSET=ascii

id icd9 icd10
25 29182 F10182
26 29182 F10282
27 29182 F10982

I just can't think of a way to write a querey to delete the
duplicates. Does anyone have a suggestion ?

bill

Re: Help with cleaning up data

By Fran Garcia at 03/29/2014 - 15:15

Hi Bill,

How big is your table? It seems to me that you might want to change your
unique keys to something like (icd9, icd10), thus guaranteeing that every
mapping will exist only once in your table. You could create a new table
with that constraint and copy all your data to it:

CREATE TABLE `ICD9X10_2` (
`id` smallint(6) NOT NULL AUTO_INCREMENT,
`icd9` char(8) NOT NULL,
`icd10` char(6) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `icd9_icd10` (`icd9`,`icd10`)
) ENGINE=InnoDB DEFAULT CHARSET=ascii

INSERT IGNORE INTO ICD9X10_2 SELECT * FROM ICD9X10; -- This will skip the
duplicates

-- Once you've checked the new table and it looks fine to you, you can swap
them:
RENAME TABLE ICD9X10 TO ICD9X10_old, ICD9X10_2 TO ICD9X10;

Or, alternatively, you can also directly alter your table by adding that
unique index like this:
ALTER IGNORE TABLE ICD9X10 ADD UNIQUE KEY (ICD9, ICD10);

Hope that helps

2014-03-29 18:26 GMT+00:00 william drescher < ... at techservsys dot com>:

Re: Help with cleaning up data

By Carsten Pedersen at 03/29/2014 - 15:17

On 29-03-2014 19:26, william drescher wrote:
<a href="http://bit.ly/1hKCVHi" title="http://bit.ly/1hKCVHi">http://bit.ly/1hKCVHi</a>