Find data in comma-delimited lists 1

katehol · May 14, 2008

I need to see if a certain integer appears in a column that contains multiple comma-delimited lists.

eg SELECT * FROM TableA
WHERE TableA.id IN (SELECT list FROM TableB)

...where 'TableB.list' is a comma-delimited list, and there are multiple rows in this TableB table, therefore meaning that there are multiple comma-delimited lists to search for the TableA.id.

Any help would be much appreciated.

Thanks

SQLSister · May 14, 2008

There are FAQS on dealing with comma delimted lists, but in this case, what you really need is to change the structure. This is not a database structure that will ever be good to query and will cause performance problems as it grows.

Comma delimited lists should never be stored in a database table except a work table bringing in data from an outside source. The first step is to normalize your data. Truly.

"NOTHING is more important in a database than integrity." ESquared

katehol · May 14, 2008

I know that a comma delimited list is not ideal, and it is on the very long list of changes that will be done some time in the future. For the moment though, I was hoping someone would be able to help me with my current query.

gmmastros · May 14, 2008

I understand about priorities. SQLSister is right though. Until you fix this problem, all of your queries are going to be a lot slower than they need to be.

However....

Suppose you have these two tables

[tt][blue]TableA
Id Data
-- ----
1 1
2 8
3 10[/blue][/tt]

And

[tt][blue]TableB
Data
---------
1,2,3,4,5
1,3,5,7,9
10[/blue][/tt]

If I understand correctly, you want a result set like this...

[tt][blue]
ID A.Data B.data
-- ------ ---------
1 1 1,2,3,4,5
1 1 1,3,5,7,9
3 10 10
[/blue][/tt]

Well, consider this...

Code:

[COLOR=blue]Declare[/color] @TableA [COLOR=blue]Table[/color](Id [COLOR=blue]Int[/color], Data [COLOR=blue]Int[/color])
[COLOR=blue]Insert[/color] [COLOR=blue]Into[/color] @TableA [COLOR=blue]Values[/color](1,1)
[COLOR=blue]Insert[/color] [COLOR=blue]Into[/color] @TableA [COLOR=blue]Values[/color](2,8)
[COLOR=blue]Insert[/color] [COLOR=blue]Into[/color] @TableA [COLOR=blue]Values[/color](3,10)

[COLOR=blue]Declare[/color] @TableB [COLOR=blue]Table[/color](Data [COLOR=blue]VarChar[/color](200))

[COLOR=blue]Insert[/color] [COLOR=blue]Into[/color] @TableB [COLOR=blue]Values[/color]([COLOR=red]'1,2,3,4,5'[/color])
[COLOR=blue]Insert[/color] [COLOR=blue]Into[/color] @TableB [COLOR=blue]Values[/color]([COLOR=red]'1,3,5,7,9'[/color])
[COLOR=blue]Insert[/color] [COLOR=blue]Into[/color] @TableB [COLOR=blue]Values[/color]([COLOR=red]'10'[/color])


[COLOR=blue]Select[/color] *
[COLOR=blue]From[/color]   @TableA [COLOR=blue]As[/color] A
       [COLOR=blue]Inner[/color] [COLOR=blue]Join[/color] @TableB [COLOR=blue]As[/color] B
         [COLOR=blue]On[/color] [COLOR=red]','[/color] + B.Data + [COLOR=red]','[/color] Like [COLOR=red]'%,'[/color] + [COLOR=#FF00FF]Convert[/color]([COLOR=blue]VarChar[/color](20), A.Data) + [COLOR=red]',%'[/color]

As you can see, I created some table variables so that I could test the query. If this works for you, and you'd like me to explain it, just ask.

Oh, uh... Do you see the way the two tables are joined? There is no possible way this query is going to use any indexes, so, at best, you will get a table/index scan, which is bound to be slow.

Make sense?

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom

sblocher · May 14, 2008

Can you give a couple example rows of data in each table?

The right solution may depend on what the data looks like and how much data there is (especially in TableA). Might be able to create a udf and do this with one query. Or perhaps a cursor to run thru each row in TableA looking for the rows in table B that contain that value, put all these rows in a temp table (with some sort of key), then remove the dupes?

All depends on the data as to which approach i might take.

SQLDenis · May 14, 2008

Unless you have inline SQL all over the place it would be faster to redesign this thing now once and for all. Really how long would it take? 2 hours maybe. It has been 2 hours since you posted the question, get the point.

There are tricks to do this with a number table but it just doesn't make sense. Who designed(if you can call it that) this horrible mess maybe Jeff Sturgeon?

Stop pressing the F5/CTRL + R keys in hope for an answer and redesign now

Denis The SQL Menace
--------------------
SQL Server Code,Tips and Tricks, Performance Tuning
SQLBlog.com, Google Interview Questions

sblocher · May 14, 2008

SQLDenis, do you live in the real world? For those of us that do, we are constantly working with and integrating to databases and systems that we don't "own" and thus do not have the luxury of redesigning as you suggest.

Perhaps you could provide sample queries that katehol could use to put the data from TableA and TableB into a new set of tables as you would if you could redesign them. Then show how easy a query would be to get the desired results.

katehol's could then use your solution -- using #temp tables or table variables -- to get the results she wants

gmmastros · May 14, 2008

sblocher,

I encourage you to re-read katehol's followup post. In it, she says....

I know that a comma delimited list is not ideal, and it is on the very long list of changes that will be done some time in the future.

I interpret Denis's response as.... Don't wait for the future, fix it now.

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom

katehol · May 14, 2008

I've got to agree with sblocher. My database is integrated with a java application, so it's not a simple matter of just redesigning the database table.

For the moment, I would like to be able to do the following:

TableA
id name
1 abc
2 bcd
3 cde
4 def

TableB
id list
1 1,2,3,4,5
2 3,2,7,8
3 2,6,7,8

I would like to be able to check whether the list in TableB contains any of the ids from TableA.

So, basically I want to be able to do...

Code:

SELECT * FROM TableA
WHERE TableA.id IN (SELECT TableB.list FROM TableB)

...but obviously I can't do that as the list is a comma-delimited list...

Does that make sense?

And sorry redesigning isn't an option, but this is something that can't simply be done. Comments like "Really how long would it take? 2 hours maybe. It has been 2 hours since you posted the question, get the point." don't really help me, as it's not just a database sitting by itself - there are a load more factors that need to be considered.

Thanks for any help you can give me.

sblocher · May 14, 2008

Good point, George. katelhol suggests that she has control of the design. So i'll amend my comments to add: "And sometimes business priorities do not allow revising existing designs as part of a defined project."

But my point is appropriate, I think.

Good discussion, although maybe not an appropriate forum?

gmmastros · May 14, 2008

katehol,

Based on your sample data, what are the expected results?

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom

katehol · May 14, 2008

Sorry - I would like to have only the rows from TableA show if they exist in the list in TableB.. So, my example wasn't that useful...

Slightly amended tables are:

TableA
id name
1 abc
2 bcd
3 cde
4 def

TableB
id list
1 1,2,3,
2 3,2,7,8
3 10,6,7,8

So I would like the result to show rows 1-3 of my TableA. Not row 4, as the id 4 is not mentioned in TableB.

Thanks

PS Sorry to have started a heated debate! Didn't realise I would. In future I will mention that even though I am in control of the DB design, I still have to work along side our developers to get schema designs built into our application...

gmmastros · May 14, 2008

katehol,

Take a look at this....

You'll notice that I am using table variables. You can copy/paste this code to a query window and run it without any modifications. Based on your sample data and desired results, this query works. If you are satisfied with it, you will obviously need to change it to work with your tables instead of the table variables.

Code:

[COLOR=blue]Declare[/color] @TableA [COLOR=blue]Table[/color](id [COLOR=blue]int[/color], [COLOR=blue]Name[/color] [COLOR=blue]VarChar[/color](20))
[COLOR=blue]Insert[/color] [COLOR=blue]into[/color] @TableA [COLOR=blue]Values[/color](1,[COLOR=red]'abc'[/color])
[COLOR=blue]Insert[/color] [COLOR=blue]into[/color] @TableA [COLOR=blue]Values[/color](2,[COLOR=red]'bcd'[/color])
[COLOR=blue]Insert[/color] [COLOR=blue]into[/color] @TableA [COLOR=blue]Values[/color](3,[COLOR=red]'cde'[/color])
[COLOR=blue]Insert[/color] [COLOR=blue]into[/color] @TableA [COLOR=blue]Values[/color](4,[COLOR=red]'def'[/color])


[COLOR=blue]Declare[/color] @TableB [COLOR=blue]Table[/color](Id [COLOR=blue]Int[/color], List [COLOR=blue]VarChar[/color](200))
[COLOR=blue]Insert[/color] [COLOR=blue]Into[/color] @TableB [COLOR=blue]Values[/color](1,[COLOR=red]'1,2,3,'[/color])
[COLOR=blue]Insert[/color] [COLOR=blue]Into[/color] @TableB [COLOR=blue]Values[/color](2,[COLOR=red]'3,2,7,8'[/color])
[COLOR=blue]Insert[/color] [COLOR=blue]Into[/color] @TableB [COLOR=blue]Values[/color](3,[COLOR=red]'10,6,7,8'[/color])

[COLOR=blue]Select[/color] [COLOR=#FF00FF]Distinct[/color] A.* 
[COLOR=blue]From[/color]   @TableA [COLOR=blue]As[/color] A
       [COLOR=blue]Inner[/color] [COLOR=blue]Join[/color] @TableB [COLOR=blue]As[/color] B
         [COLOR=blue]On[/color] [COLOR=red]','[/color] + B.List + [COLOR=red]','[/color] Like [COLOR=red]'%,'[/color] + [COLOR=#FF00FF]Convert[/color]([COLOR=blue]VarChar[/color](10), A.Id) + [COLOR=red]',%'[/color]

If you have any questions about how this works, let me know and I will explain it for you.

Make sense?

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom

SQLDenis · May 14, 2008

>>SQLDenis, do you live in the real world? For those of us that do, we are constantly working with and integrating to databases and systems that we don't "own" and thus do not have the luxury of redesigning as you suggest.

No I live purely in [URL unfurl="true"]http://secondlife.com/[/url]

However I work(ed) with a lot of vendors and their data
for example
New York City Transit
Metro North Rail Road
Long Island Rail Road
New Jersey Transit
Reuters
Marketwatch
Wall Street Journal
Bloomberg
Perfact
Fleet
Bank Of America
Citibank
Goldman Sachs

none of these vendors had a table like that.
If you have a 3rd party do work for you (we occasional do) then review their code/design and give them guidelines. If they create a table which looks like an array they shouldn't be paid period.

If you have a vendor that stores data like that import it and normalize it or find a different vendor

Understanding Coupling, Cohesion and Encapsulation is a must in todays 'real' world.........sadly a lot of developers do not. This design flaw can be easy fixed if you use stored procedures (it's not a big deal).

the kludge answer that was given (nothing against George) will work however there is a magic threshold and you might not hit that for 2 years but when it hit your query will slow down to a crawl
How much $$$$$ to fix it 2 years later, is this thing documented, how many apps use it? Is it using inline SQL and is the app an application that has to be installed?

so the answer is yes George's solution will work as long as your dataset is small after that.....the pain will start and tylenol won't help

BTW, why would you ever use a cursor for this? Besides calling a stored proc for every row in a table almost everything can be solved set based

Denis The SQL Menace
--------------------
SQL Server Code,Tips and Tricks, Performance Tuning
SQLBlog.com, Google Interview Questions

sblocher · May 14, 2008

EXCELLENT solution, George. And about as simple and efficient as you can get! An ingenious way to take the commas out of the picture! I don't think i would ever have thought about that approach. I am adding this to my "tool kit" of helpful query examples, with one small improvement in case the comma-delimeted values contain extra spaces anywhere:

Select Distinct A.*
From TableA As A
Inner Join TableB As B
On ',' + replace(B.List,' ','') + ',' Like '%,' + Convert(VarChar(10), A.Id) + ',%'

gmmastros · May 14, 2008

Denis,

I agree that the code I posted is a kludge. If you read my post in this thread dated 14 May 08 10:30, I said "all of your queries are going to be a lot slower than they need to be".

I sincerely hope that katehol views this "kludge" solution as temporary. Performance will be a problem. If not now, then in the future.

sblocher,

There is a fundamental problem with this code. While I honestly believe it will (or can be made to) return the correct data. There will be performance problems with it. You see... no matter what you do, you will NOT be able to cause this query to use an index seek. It simply cannot be done because the query is not [google]SQL Server Sargable[/google]. I encourage you (and katehol) to do a little research on saragable queries. With small tables, everything is going to be 'fast enough', even if you are doing table scans. But when the tables get large, suddenly performance becomes a problem and by then, you will have people screaming at you to fix it immediately. In my opinion, the time to think about performance is when you create a query, not years later when it is too late.

I would NEVER allow this code to see the inside of my database. It's a temporary patch that should exist ONLY until the normalization problems are corrected. The sooner this occurs, the less time you will spend on 'throw away' queries, and the more time you will have for getting 'real work' done.

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom

SQLDenis · May 14, 2008

Another problem is that people don't test with some real size data when developing
fortunately for me I have a 600GB development DB to play around with

But then again over half of developers don't use source control and even less than that do unit testing

Denis The SQL Menace
--------------------
SQL Server Code,Tips and Tricks, Performance Tuning
SQLBlog.com, Google Interview Questions

damber · May 14, 2008

I wonder how that table got like that in the first place ?

Maybe someone came to Tek-Tips and said that the business 'just-had-to-do-it-that-way', as is so common it seems. They probably thought that it would be cheaper and faster (to develop) to have lots of direct SQL access from the java application to those tables too, with little understanding of logical application separation & abstraction.

I think we all understand the pressures of getting things done for a deadline. But throwing good money after bad (or building more poor design on top of old poor design) just makes the problem harder next time round.. as someone will have to fix your code as well as all the previous stuff. I bet they'll just say "well, this project isn't here to fund this kind of stuff".

At the very least, you should flag this up to your managers as a serious design flaw within the application, and one that needs a separate project to rectify.

=======================================
LessThanDot - The IT Community of the 21st Century

A smile is worth a thousand kind words. So smile, it's easy!

katehol · May 14, 2008

gmmastros and sblocher

Thank you both so much for your help. Inheriting an old legacy system with quirks like this can be really hard. And although I know some parts need changing (which, in time, they will), time is often of the essence and I appreciate it that you both understood this.

So thanks - the solution you provided worked a treat.

katehol · May 14, 2008

I forgot to say...

SQLDenis - congratulations on working for so many great companies. So kind of you to let us all know how great you are and that you have never had to take over a database that wasn't perfect in every way. Lucky you...

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Find data in comma-delimited lists 1

Technical User

Programmer

Technical User

Programmer

IS-IT--Management

Programmer

IS-IT--Management

Programmer

Technical User

IS-IT--Management

Programmer

Technical User

Programmer

Programmer

IS-IT--Management

Programmer

Programmer

Programmer

Technical User

Technical User

Similar threads

Log in

Part and Inventory Search

Sponsor