MySQL Query for incrementing field from duplicates in database

martinkaeser · Nov 2, 2020

First, hi to all
I've the following problem:
I have in an existing MySQL-DB duplicates of records and i need to increment the column "proofNr" n+1 for every entry. Please refer the last row in the following resultset.

mysql> SELECT * FROM `korrektur` WHERE `auftrag`= 140802;
+--------+---------+------------+-------------+----------------+------------+-------------+-----------+---------+-----------+
| ID | auftrag | ausgang | ausgang_von | ausgang_medium | eingang | eingang_von | druckfrei | proofNr |
+--------+---------+------------+-------------+----------------+------------+-------------+-----------+---------+-----------+
| 187649 | 140802 | 2020-09-01 | 29 | 1 | 2020-09-08 | 29 | NULL | | 1 | -> OK
| 187783 | 140802 | 2020-09-08 | 29 | 1 | 2020-10-08 | 29 | NULL | | 1 | -> 2
| 188247 | 140802 | 2020-10-08 | 29 | 1 | 2020-10-19 | 29 | NULL | | 1 | -> 3
| 188395 | 140802 | 2020-10-19 | 29 | 1 | NULL | NULL | NULL | | 1 | -> 4
+--------+---------+------------+-------------+----------------+------------+-------------+-----------+---------+-----------+
4 rows in set (0,03 sec)

I hope you see what I mean. Does anyone have an idea? I would appreciate every hint or tip.
Thank you in advance!
kaeser

feherke · Nov 2, 2020

Hi

As the solution would be to use [tt]row_number(), it also needs window functions :

Code:

[b]select[/b]
[teal]*,[/teal]
row_number[teal]()[/teal] [b]over[/b] w proofNr_correct

[b]from[/b] korrektur

[b]window[/b] w [b]as[/b] [teal]([/teal]partition [b]by[/b] auftrag[teal])[/teal]

Or if your goal is to update the table with those values, then common table expressions may also come handy :

Code:

[b]with[/b] cte [b]as[/b] [teal]([/teal]
    [b]select[/b]
    ID,
    row_number[teal]()[/teal] [b]over[/b] w proofNr_correct

    [b]from[/b] korrektur

    [b]window[/b] w [b]as[/b] [teal]([/teal]partition [b]by[/b] auftrag[teal])[/teal]
[teal])[/teal]
[b]update[/b] korrektur
[b]join[/b] cte [b]on[/b] cte[teal].[/teal]ID [teal]=[/teal] korrektur[teal].[/teal]ID

[b]set[/b] korrektur[teal].[/teal]proofNr [teal]=[/teal] cte[teal].[/teal]proofNr_correct

Note that all auftrag groups are handled in a single query. If still need it only for 140802, just put back the [tt]where[/tt] clause. ( No, you can not use [tt]row_number()[/tt] without all that stuff. )

Of course, assuming you have MySQL 8.0, as the used features were introduced there.

Feherke.
feherke.github.io

martinkaeser · Nov 6, 2020

First of all thank you feherke. And sorry about my late answer: I had a bad toothache [evil]

I adopted your example to:

Code:

WITH cte AS (SELECT ID, ROW_NUMBER() OVER w proofNr_correct FROM korrektur_copy WINDOW w AS (PARTITION BY ID)) UPDATE korrektur_copy JOIN cte ON cte.ID = korrektur_copy.ID SET korrektur_copy.proofNr = cte.proofNr_correct WHERE auftrag = 140802;

So far all goes well. But the result set says the following:

Code:

Query OK, 0 rows affected (0.59 sec)
Rows matched: 4  Changed: 0  Warnings: 0

As you can see: nothing happend to the database!
Where do I make the mistake? I can't find it out.
Thanks for any help.
kaeser

feherke · Nov 6, 2020

Hi

kaeser said:
Where do I make the mistake?

Here : [tt][teal]([/teal]PARTITION BY [highlight]ID[/highlight][teal])[/teal][/tt]. That [tt]partition by[/tt] is roughly like a [tt]group by[/tt]. When using window functions, the data set is split on chunks by the [tt]partition by[/tt] expression, then the functions are applied for each chunk separately, so each will have for example its own [tt]min()[/tt] or [tt]max()[/tt] value. And similarly in our case [tt]row_number()[/tt] will number each chunk's rows separately, restarting from 1 for each. As semantically id fields used to contain unique identifiers, partitioning by that will result chunks with 1 row each, making the renumbering from 1 for each chunk a pretty much pointless operation while the old proofNr values were also 1.

Hmm... Wondering how efficient my above explanation was. ( Knowing my skills, probably not at all. ) Let us see in examples.

I used this test data :

[pre]
test> select [teal]*[/teal] from korrektur_copy[teal];[/teal]
+--------+---------+------------+-------------+----------------+------------+-------------+-----------+---------+
| ID | auftrag | ausgang | ausgang_von | ausgang_medium | eingang | eingang_von | druckfrei | proofNr |
+--------+---------+------------+-------------+----------------+------------+-------------+-----------+---------+
| 187649 | 140802 | 2020-09-01 | 29 | 1 | 2020-09-08 | 29 | NULL | 1 |
| 187650 | 140803 | 2020-09-01 | 29 | 1 | 2020-09-08 | 29 | NULL | 1 |
| 187783 | 140802 | 2020-09-08 | 29 | 1 | 2020-10-08 | 29 | NULL | 1 |
| 187784 | 140803 | 2020-09-08 | 29 | 1 | 2020-10-08 | 29 | NULL | 1 |
| 188247 | 140802 | 2020-10-08 | 29 | 1 | 2020-10-19 | 29 | NULL | 1 |
| 188248 | 140803 | 2020-10-08 | 29 | 1 | 2020-10-19 | 29 | NULL | 1 |
| 188395 | 140802 | 2020-10-19 | 29 | 1 | NULL | NULL | NULL | 1 |
| 188396 | 140803 | 2020-10-19 | 29 | 1 | NULL | NULL | NULL | 1 |
+--------+---------+------------+-------------+----------------+------------+-------------+-----------+---------+
8 rows in set (0.00 sec)
[/pre]

My SQL statement with the [tt]partition by[/tt] auftrag separates the data set into 2 chunks of 4 rows each...

[pre]
test> select auftrag[teal],[/teal] count[teal](*)[/teal] from korrektur_copy group by auftrag[teal];[/teal]
+---------+----------+
| auftrag | count(*) |
+---------+----------+
| 140802 | 4 |
| 140803 | 4 |
+---------+----------+
2 rows in set (0.00 sec)
[/pre]

... and [tt]row_number()[/tt]s each chunk from 1 to 4 :

[pre]
test> select
-> id[teal],[/teal] auftrag[teal],[/teal]
-> row_number[teal]()[/teal] over w proofnr_correct
->
-> from korrektur_copy
->
-> window w as [teal]([/teal]partition by auftrag[teal]);[/teal]
+--------+---------+-----------------+
| id | auftrag | proofnr_correct |
+--------+---------+-----------------+
| 187649 | 140802 | 1 |
| 187783 | 140802 | 2 |
| 188247 | 140802 | 3 |
| 188395 | 140802 | 4 |
| 187650 | 140803 | 1 |
| 187784 | 140803 | 2 |
| 188248 | 140803 | 3 |
| 188396 | 140803 | 4 |
+--------+---------+-----------------+
8 rows in set (0.00 sec)
[/pre]

Your SQL statement with the [tt]partition by[/tt] id separates the data set into 8 chunks of 1 row each...

[pre]
test> select id[teal],[/teal] count[teal](*)[/teal] from korrektur_copy group by id[teal];[/teal]
+--------+----------+
| id | count(*) |
+--------+----------+
| 187649 | 1 |
| 187650 | 1 |
| 187783 | 1 |
| 187784 | 1 |
| 188247 | 1 |
| 188248 | 1 |
| 188395 | 1 |
| 188396 | 1 |
+--------+----------+
8 rows in set (0.00 sec)
[/pre]

... and [tt]row_number()[/tt]s each chunk from 1 to 1 :

[pre]
test> select
-> id[teal],[/teal] auftrag[teal],[/teal]
-> row_number[teal]()[/teal] over w proofnr_correct
->
-> from korrektur_copy
->
-> window w as [teal]([/teal]partition by id[teal]);[/teal]
+--------+---------+-----------------+
| id | auftrag | proofnr_correct |
+--------+---------+-----------------+
| 187649 | 140802 | 1 |
| 187650 | 140803 | 1 |
| 187783 | 140802 | 1 |
| 187784 | 140803 | 1 |
| 188247 | 140802 | 1 |
| 188248 | 140803 | 1 |
| 188395 | 140802 | 1 |
| 188396 | 140803 | 1 |
+--------+---------+-----------------+
8 rows in set (0.01 sec)
[/pre]

Feherke.
feherke.github.io

martinkaeser · Nov 6, 2020

Hi Feherke
you gave me the right idea und the understanding.

Code:

WITH cte AS (SELECT ID, ROW_NUMBER() OVER w proofNr_correct FROM korrektur_copy WINDOW w AS (PARTITION BY auftrag)) UPDATE korrektur_copy JOIN cte ON cte.ID = korrektur_copy.ID SET korrektur_copy.proofNr = cte.proofNr_correct WHERE auftrag = 140802;

The change of the «PARTITION BY» keyword did the trick.

Code:

+--------+---------+------------+-------------+----------------+------------+-------------+-----------+---------+
| ID     | auftrag | ausgang    | ausgang_von | ausgang_medium | eingang    | eingang_von | druckfrei | proofNr |
+--------+---------+------------+-------------+----------------+------------+-------------+-----------+---------+
| 187649 |  140802 | 2020-09-01 |          29 |              1 | 2020-09-08 |          29 | NULL      |       1 |
| 187783 |  140802 | 2020-09-08 |          29 |              1 | 2020-10-08 |          29 | NULL      |       2 |
| 188247 |  140802 | 2020-10-08 |          29 |              1 | 2020-10-19 |          29 | NULL      |       3 |
| 188395 |  140802 | 2020-10-19 |          29 |              1 | NULL       |        NULL | NULL      |       4 |
+--------+---------+------------+-------------+----------------+------------+-------------+-----------+---------+

Code:

Query OK, 3 rows affected (0.93 sec)
Rows matched: 4  Changed: 3  Warnings: 0

Thank you very much!
Greetings
kaeser

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

MySQL Query for incrementing field from duplicates in database

martinkaeser

Technical User

feherke

Programmer

martinkaeser

Technical User

feherke

Programmer

martinkaeser

Technical User

Similar threads

Part and Inventory Search

Sponsor