Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations SkipVought on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

How to identify and return characters in a field

Status
Not open for further replies.

trpnbillie

Programmer
Oct 8, 2002
28
0
0
US
Greetings all!

Does anybody know how to easily scan a field for a list of specific characters and then return them? I have one simple table with one field: "StringList". My table looks like this:

"StringList"
-----------
John Doe
XYZ
aeixyz
try try

I want to scan the entire "StringList" field for vowels and then report on them. For example, I want my recordset to look like this:

"StringList" "AnyVowels" "WhichVowels"
------- ----------- -------------
John Doe Yes o, e
aeixyz Yes a, e, i

I am able to do the easy part of identifying and selecting only the records that have vowels by doing this:

DECLARE @vowels varchar(1000)
SET @vowels = '%[' + char(97) + char(101) + char(105) + char(111) + char(117) + ']%'
SELECT StringList, 'Yes'
FROM MyTable
WHERE patindex(@vowels,StringList) > 1

but i don't know how to return the specific vowels for each field. (i.e. I don't know how to put values under the "WhichVowels" field in my recordset). I assume that i can report on the FIRST vowel, but is it possible to report on all vowels?

Is it not possible?

I thank you for any insight!
 
You could try using the Except() Function. I just found it earlier today in Books Online when I was researching another problem.

However, I'm not quite sure where it would fit in your current code.

Question. Is your code above a stored procedure, a user defined function or an adhoc query that you're running?

If it's a UDF, you might look up table-valued functions in Books Online.

Excuse my lack of definates. I haven't gotten into either of these programming areas yet myself. Can someone elaborate on what I just posted (or at least correct me if I'm mistaken)?



Catadmin - MCDBA, MCSA
Remember, "Work" is a four letter word. And you know what your mother told you about using four letter words!
 
Thank you for your reply! The code now is an ad-hoc query but eventually (if i figure out how to do it) would be in a stored procedure.

Also, i forgot to mention that the vowels were used for simplicity. My real life problem does not involve vowels; instead it involves a rather large list of illegal characters. So I am looking for a solution in which i can scan the list at once (using the wildcard variable) and not search by individual character.

Thank you! I will look up the Except() function!
 
Put all of your special characters into a table that has only one column char(1). Then cross join it with StringList and filter on rows WHERE StringList like '%'+SpecialChar+'%'.
This will return matches, but only 1 on each row. If you have to have them concatenated into one column there's a way to do that too. Let me know.
I think this cross product shouldn't be a problem even with a large table because of the filter.
-Karl

[red] Cursors, triggers, user-defined functions and dynamic SQL are an axis of evil![/red]
[green]Life's uncertain...eat dessert first...www.deerfieldbakery.com[/green]
 
Wow. Thank you so much. The cross join indeed does the job. If you have time to share how to roll them up and concatenate them into one row (instead of separate rows), that would be awesome.

Again, i thank you for taking my headache away!!! :)
 
I know of an easy way to do it with a function, but I've forgotten the better way to do it without a function. Let me think on it for awhile.
BTW, try a test for me. Maybe it's faster to do an INNER JOIN ON (CharIndex(SpecialChar,StringList)>0). Let me know which is faster and how large the row count of both tables.
-Karl

[red] Cursors, triggers, user-defined functions and dynamic SQL are an axis of evil![/red]
[green]Life's uncertain...eat dessert first...www.deerfieldbakery.com[/green]
 
Two things. One, code I wrote to find strange characters (only the first instance of each one will be listed), but it also has the problem of showing many rows.

Code:
create table #thetable (
   thefield varchar(50))
insert into #thetable
   select 'two' + char(13) + char(10) + 'lines' union
   select 'some other strange characters at the end ' + char(2) + char(11) + char(17) + char(27) + char(129) + char(144)

create table #t (
      num int,
      ch char(1)
   )
declare @c int
   set @c = 0
while @c<256 begin
   if char(@c) not like '[ 0-Z]'
      insert into #t
         select @c, Char(@c)
      set @c=@c+1
end
select
      Pos = charindex(ch,thefield),
      AsciiCode = num,
      Chr = SubString(thefield,charindex(ch,thefield),1),
      Value = thefield
   FROM #thetable
      cross join #t
   WHERE
      charindex(ch,thefield) > 0

So you could do a complicated summation query to put them together again, but you have to do something to assign each row an ordinal. I'm assuming table #badrows has your bad rows in it, but it could be a derived table, although that would be messy because you'd have to list it twice. Yukon will fix that with common table expressions, but also that method won't be necessary because of the rank() function (or whatever it is).

Code:
SELECT B1.*, Num = Count(*)
INTO #badrows2
FROM #badrows B1
INNER JOIN #badrows B2 on B1.key = B2.key and B1.charpos >= B2.charpos

SELECT
B.key,
B.whateverfields,
SubString(
  Max(CASE Num WHEN 1 THEN ',' + charfound ELSE '' END)
  + Max(CASE Num WHEN 2 THEN ',' + charfound ELSE '' END)
  + Max(CASE Num WHEN 3 THEN ',' + charfound ELSE '' END)
  + Max(CASE Num WHEN 4 THEN ',' + charfound ELSE '' END)
  + Max(CASE Num WHEN 5 THEN ',' + charfound ELSE '' END)
  ...
  + Max(CASE Num WHEN 99 THEN ',' + charfound ELSE '' END)
  {up to maximum number of chars you want to see at once}
, 2, 8000)
FROM #badrows2

See how nasty that is, though?

Another option is, pick all the allowed characters, and stick them in a big thingie:

Code:
SELECT badchars =
  ... Replace(Replace(Replace(InspectedColumn, 'a', ''), 'b', ''), 'c', '') ...

and keep doing that until they're all in there. This method looks horrible in code but might turn out to be the fastest because you're using an optimized system function. And you could write your own function to create the query, or do it in Excel. In fact, I'll do it. It will take about 15 seconds.

Code:
Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Column, 'a', ''), 'b', ''), 'c', ''), 'd', ''), 'e', ''), 'f', ''), 'g', ''), 'h', ''), 'i', ''), 'j', ''), 'k', ''), 'l', ''), 'm', ''), 'n', ''), 'o', ''), 'p', ''), 'q', ''), 'r', ''), 's', ''), 't', ''), 'u', ''), 'v', ''), 'w', ''), 'x', ''), 'y', ''), 'z', ''), ' ', ''), '0', ''), '1', ''), '2', ''), '3', ''), '4', ''), '5', ''), '6', ''), '7', ''), '8', ''), '9', '')

That should convey the idea.

In cells A2 and down, put a b c d and so on.
In cell B1 put your column name.
In cell B2 put this expression and paste it down to the bottom of your column 1 list. Copy and paste the final entry.
Code:
="Replace(" & B1 & ", '" & A2 & "', '')"
 
Thanks everybody! I tried the inner join as well and it worked with the same speed - though my tables are very small at this point!

Something else and very neat that i found was this: A recursive update! It works for my purpose and hopefully it can be useful for somebody else too. Check it out below!

I would still be interested in knowing if there is an easier way to roll records up and concatenate the bad characters (Using the JOIN method) if anyone knows...

DECLARE @illegalchars varchar(1000), @iUpdatedCount int
DECLARE @MyTable TABLE (id int, MyString varchar(100), escapedString varchar(100), whatvowels varchar(100))

SET @illegalchars = '%[' + char(97) + char(101) + char(105) + char(111) + char(117) + ']%'

INSERT INTO @MyTable
SELECT 1, 'hello there hello', NULL, NULL
INSERT INTO @MyTable
SELECT 2, 'try try try', NULL, NULL
INSERT INTO @MyTable
SELECT 3, 'tst test', NULL, NULL

update @MyTable
set escapedstring = MyString

SELECT @iUpdatedCount = 1

WHILE @iUpdatedCount > 0
BEGIN
UPDATE @MyTable
SET escapedstring = replace(escapedstring, substring(escapedstring, patindex(@illegalchars, escapedstring), 1), '-'),
whatVowels = ISNULL(whatVowels, '') + ';' + substring(escapedstring, patindex(@illegalchars, escapedstring), 1)
WHERE patindex(@illegalchars,escapedstring) > 1

SELECT @iUpdatedCount = @@ROWCOUNT
END

select * from @myTable
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top