Smart questions
Smart answers
Smart people
INTELLIGENT WORK FORUMS
FOR COMPUTER PROFESSIONALS

Member Login

Come Join Us!

Are you a
Computer / IT professional?
Join Tek-Tips now!
  • Talk With Other Members
  • Be Notified Of Responses
    To Your Posts
  • Keyword Search
  • One-Click Access To Your
    Favorite Forums
  • Automated Signatures
    On Your Posts
  • Best Of All, It's Free!

Join Tek-Tips
*Tek-Tips's functionality depends on members receiving e-mail. By joining you are opting in to receive e-mail.

LINK TO THIS FORUM!

Add Stickiness To Your Site By Linking To This Professionally Managed Technical Forum.
Just copy and paste the
code below into your site.

Partner With Us!

"Best Of Breed" Forums Add Stickiness To Your Site
Partner Button
(Download This Button Today!)

Feedback

"...Just wanted to let you know that I registered today, and your site is fantastic. I found solutions to problems that I have been encountering for months!..."

Geography

Where in the world do Tek-Tips members come from?

query for match in text with "interruptions" (regex related)Helpful Member! 

OsakaWebbie (Programmer)
16 Jun 12 7:48
Yeah, I know the subject line is cryptic... I have a database that keeps song lyrics with chords interspersed within square brackets, like this:

CODE --> syntax

[D]Thank you, [D/F#]O my [G]Fa[D]ther, for [G]giving us [A]Your [D A]Son
And [D]leav[A/E]ing [D/F#]Your [G]Spirit [D/F#]till the [Em]work on [A]earth is [D]done
I allow the user to search for text in the lyrics, so I have to somehow get past the chords. The code in use has been there for quite a while, and I thought it worked, but I recently discovered that it doesn't always. Since MySQL doesn't have a way to remove the bracketed content on the fly using regex, I was attacking it from the opposite direction, using PHP to create a regex expression from the search string, like this (search string is "leaving" in this example):

CODE --> MySQL

WHERE LOWER(Lyrics) REGEXP '(\\[[^\\]]*\\])?l(\\[[^\\]]*\\])?e(\\[[^\\]]*\\])?a(\\[[^\\]]*\\])?v(\\[[^\\]]*\\])?i(\\[[^\\]]*\\])?n(\\[[^\\]]*\\])?g'
That technique seems to work if the chord(s) within the matching section of lyrics are only a single letter (e.g. "o my father", "and leav", and "work on earth" work fine), but if there are multiple characters inside the brackets, the matching record is not returned. I thought the "*" in my regex would take care of that, but it doesn't seem to. Does someone see what I'm missing, or have a better approach?

It would be cleaner with something like "WHERE LOWER(REGEXP_REPLACE(Lyrics,'(\\[[^\\]]*\\])','')) LIKE '%leaving%'", but although there seem to be two offerings of a UDF for regex replace out there (here and here ), both are very old and not maintained, and I don't know if they will even work with newer versions of MySQL. Plus, one requires recompiling MySQL and the other slightly less invasive but still requires compiling itself and has dependencies. If you know that one or both of those do work okay, or know of something else, let me know. I would prefer to use what is already a part of the engine, though, if I can figure out how.
r937 (TechnicalUser)
16 Jun 12 10:09
possibly not what you want to hear, but the solution is to store the lyrics in two columns, one with chords and one pure text

r937.com | rudy.ca
Buy my new book Simply SQL from Amazon

feherke (Programmer)
16 Jun 12 10:28
Hi

Indeed, regular expression based replace would be ideal.

Even regular expression based matching would be helpful, if it would provide either the matched substring or the match position.

But this can also be solved with the regular string functions, just abit lengthier :

CODE --> MySQL

delimiter $$
create function stripchord(what text)
returns text
deterministic
no sql
begin
declare ch varchar(1);
declare i,start integer default -1;

if what is null then
return null;
end if;

set i=1;
l00p: loop
set ch=mid(what,i,1);
if ch='\\' then
set what=concat(left(what,i-1),mid(what,i+1));
set i=i+1;
elseif ch='[' then
set start=i;
elseif ch=']' and start!=-1 then
set what=concat(left(what,start-1),mid(what,i+1));
set i=start;
set start=-1;
end if;

set i=i+1;
if i>length(what) then
leave l00p;
end if;
end loop;

return what;
end
$$
delimiter ;

select
*

from osakawebbie

where lower(stripchord(lyrics)) like '%father%';
Tested with MySQL 5.5.25.

Some notes :
  • removes substrings between [ and ]
  • from enclosed [ ] pairs removes only the inner one
  • leaves single and unbalanced [ and ] unchanged
  • leaves backslash escaped \[ and \] unchanged
  • removes backslashes

CODE --> mysql

mysql> select * from osakawebbie;
+------------------------------------------------------------------------+
| lyrics |
+------------------------------------------------------------------------+
| zero] one[two[three]four]five [six] \[seven] [eight\] \[nine\] \\ [ten |
+------------------------------------------------------------------------+
1 row in set (0.04 sec)

mysql> select stripchord(lyrics) from osakawebbie;
+-------------------------------------------------------+
| stripchord(lyrics) |
+-------------------------------------------------------+
| zero] one[twofour]five [seven] [eight] [nine] \ [ten |
+-------------------------------------------------------+
1 row in set (0.00 sec)

Feherke.
http://feherke.github.com/

OsakaWebbie (Programmer)
17 Jun 12 1:07
That was beautiful! I had some confusion at first because when I copy/pasted from the forum, the linefeeds disappeared, and I didn't realize that they matter (in regular queries linefeeds are not necessary) - I was getting no errors, but no function either. But when I put the linefeeds back in, it worked like a charm. I haven't tested it completely, but so far it's doing exactly what it should.

Your caveats are fine - there are not supposed to ever be nested brackets, unbalanced brackets, etc., and if there are by mistake, it's fine that it doesn't strip anything. I also see uses for this function in other places in my code where currently I'm doing the stripping in PHP - it will get executed a lot!
OsakaWebbie (Programmer)
18 Jun 12 10:17
Upon further inspection, I'm not as elated as I was. It works if you are patient, but it's really slow. Is there anything I can do to speed it up?

I tried adding a couple indexes that probably should have been there all along, but it didn't help much if at all, and that's not surprising - the bottleneck is dismantling the Lyrics text, which has nothing to do with indexing.

FYI, there are 449 rows in the table, and the Lyrics column totals less than 250kB of text. A query like "SELECT * FROM pw_song WHERE LOWER(stripchord(Lyrics)) LIKE '%foundation%'" (which will return no rows) takes about 10 seconds.
Helpful Member!  feherke (Programmer)
18 Jun 12 10:45
Hi

Quote (OsakaWebbie)

It works if you are patient, but it's really slow. Is there anything I can do to speed it up?
Sorry to hear that. I am afraid there is no way to considerably speed it up. However, given that "there are not supposed to ever be nested brackets, unbalanced brackets, etc.", we can cut off some of the operations and use locate() instead of looping character by character. This should improve the performance abit, but probably not enough :

CODE --> MySQL

delimiter $$
create function stripchord(what text)
returns text
deterministic
no sql
begin
declare start,stop integer default 1;

if what is null then
return null;
end if;

l00p: loop

set start=locate('[',what,start);
if start=0 then
leave l00p;
end if;

set stop=locate(']',what,start);
if stop=0 then
leave l00p;
end if;

set what=concat(left(what,start-1),mid(what,stop+1));

end loop;

return what;
end
$$
delimiter ;

Feherke.
http://feherke.github.com/

OsakaWebbie (Programmer)
18 Jun 12 12:11
You were skeptical, but actually that made a lot of difference - my simple test query (single test, so numbers are rough) went from 9.89sec to 0.56sec, and my full interface (more complex queries) responds well also. You get another star for that - thanks!
feherke (Programmer)
19 Jun 12 2:26
Hi

Grr ! Actually the later function was the one I wrote first, but considered it too rigid and then I wrote the "improved" one I posted first. Sorry for making you run extra rounds.

In meantime I was thinking more on this. For the simple case of lower(stripchord(lyrics)) like '%father%' this approach is an overkill : stripchord() removes all chords ( 15 in your example ) before attempting to like, while would be enough to remove until the first match ( 4 in your example, when searching for "father" ). This could be solved with a function which does the likeing itself after each n chord removals, like stripchord(lower(lyrics),'%father%'). But this is affected by the way your PHP script is building up the where clause from the search expression entered by the used. So if you would like to try such further improving attempt, I need to know more about the building of the query condition. ( Note that the above described approach may be a dead end due to MySQL's poor language features. )

Feherke.
http://feherke.github.com/

OsakaWebbie (Programmer)
19 Jun 12 5:30
Thanks, but that's okay. I do have some complex criteria that can get added to the query depending on the user's situation - it might or might not have any conflict with your idea of bowing out at the first match, but I don't mind the current performance, so it's probably not worth tweaking it further. Plus, I have other uses for a function that can return a whole song's lyrics only (currently doing that in PHP, but this is nice and clean).

So you can save your extra effort for the next problem I have! wink

Reply To This Thread

Posting in the Tek-Tips forums is a member-only feature.

Click Here to join Tek-Tips and talk with other members!

Back To Forum

Close Box

Join Tek-Tips® Today!

Join your peers on the Internet's largest technical computer professional community.
It's easy to join and it's free.

Here's Why Members Love Tek-Tips Forums:

Register now while it's still free!

Already a member? Close this window and log in.

Join Us             Close