Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations SkipVought on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

REGEXP replace or equivalent?

Status
Not open for further replies.

OsakaWebbie

Programmer
Feb 11, 2003
628
0
0
JP
Is there any function (used to massage the output in a SELECT) that works like PHP's ereg_replace, i.e. finds all occurrences of a "regular expression" match and replaces it with a specified string?

Here's the background. I have database data (song lyrics and other accompanying info) that I output in various formats, using specifications that are themselves stored in the database - that way I have the maximum flexibility to customize formats without having to change the actual code. I am now adding the capability of including guitar chords with the lyrics - how it is stored in the DB is like: "Happy [G]Birthday to [D7]you..." (sorry for the silly example, but it's a song everybody knows!). When I just display it on the screen for viewing song information, I use PHP, so it's no problem. If the user wants to see the chords, I replace "[" and "]" with appropriate HTML & CSS to display them correctly (the chords go above the letter they precede - CSS is great for this!). If they want only lyrics, I replace "[whatever]" with nothing, which requires regexp rather than a simple string replace. That's in PHP, so no problem. But when I make a song set designed for printing, I do everything in SQL because of all those flexible formats that I don't want to hardcode. The replace() function in MySQL is used very heavily in these outputs to turn plain text into nice HTML with CSS classes, but I can't seem to find a way in MySQL to combine the flexibility of REGEX with the functionality of replace(). Can someone help? The server is currently using MySQL 3.23 (a bit old, I know, but it's not my machine, so I can't upgrade it).

For those of you who like to see the nitty-gritty code, here is what I use now (no consideration for chords):
Code:
SELECT concat('<p class=lyrics>',replace(replace(replace(replace(replace(Lyrics,'  ',' &nbsp;'),'\r\n','\n'),'\r','\n'),'\n','</p>\n<p class=lyrics>'),'<p class=lyrics></p>','<p class=smallspace>&nbsp;</p>'),'</p>\n')
It's hard to read, but it works. (The triple replace for various end of line characters is to cover all combinations for different operating systems - if I could do regexp replace, that could also be simplified.) For the case where the user does want the chords, I can do it by adding two more replace() calls - "[" to "<span class=chord>" and "]" to "</span>". But what about when the user does not want the chords? I need to remove the opening and closing bracket and the unknown content between them - the regular expression would be "\[[^\[]*\]", but if I can't use regexp, I'm stumped.
 
Unfortunately, MySQL only allows regular-expressions to be used for matching, not replacing. You would have to use program code to do regular-expression replacements.
 
Hmm, a part of me thinks the idea of dusting off my C skills to do a little "real programming" sounds like fun (does that statement show my age?), but in reality, I don't have time to tackle that. On the web, it seems that everybody is offering the same set of tools, which is $150 shareware and possibly only works with MySQL version 5.1 anyway.

I'll just rethink my paradigm, letting the choice of chords vs. no chords be a separate selection from which "format set" (which defines which fields are displayed, the SQL to get them, and the CSS to make them pretty) to use. That way I can do what I need in script.
 
Revisiting this thread with a related question...

I got my formatted output code working, but now that chords are being added to my database at a furious rate, I'm in another pickle. This might be solved with the search-type REGEXP in the where clause, but I can't get my brain wrapped around how to do it.

I allow the users to search for text within the lyrics, but now that the lyrics might look like "Happy [G]Birthday to [D7]you...", my current simple code "WHERE Lyrics like '%".$text."%'" will no longer reliably find all the results. Is there a way to do it using REGEXP with \[[^]]*\] (the regexp that matches any chord) somehow incorporated into it? BTW, there is no guarantee that chords will only occur outside of words - sometimes they precede a syllable that's not the first in the word, like "al[Em]mighty God...".
 
Hmm, a possibility just occurred to me, although hopefully there is still a simpler way. I could theoretically put the regexp for an optional chord in between every character in my search string, like this example for the word "almighty":
Code:
a[\[[^]]*\]]{0,1}l[\[[^]]*\]]{0,1}m[\[[^]]*\]]{0,1}i[\[[^]]*\]]{0,1}g[\[[^]]*\]]{0,1}h[\[[^]]*\]]{0,1}t[\[[^]]*\]]{0,1}y
But it's messy - is there a better way?
 
One more addendum. I decided to try my messy idea, but it didn't work - I get no results at all, no matter how simple my search string. Sorry if you don't know PHP, but this is the PHP statement I used to construct the where clause:
Code:
$where = "LOWER(Lyrics) REGEXP '".ereg_replace("(.{1})","[\[[^]]*\]]{0,1}\\1",strtolower($lyrics_search))."'";
I know I only need the chord gibberish between the characters, not before the first one, but this was an easy way to code it, and it should work anyway. Does someone see a flaw, or have an completely different way of going about this?
 
Okay, I discovered that there should have been () instead of [] in one spot (changing it to "(\[[^]]*\]){0,1}\\1"). Now, at least it gives results if there are no chords involved. But it still can't find the songs where a chord is in the middle of the text that would otherwise match.
 
I'm sorry that I previously had too many addendums in a row with minor changes, making the thread confusing, but does anyone have any thoughts on my query? (If I don't hear something soon, I'll assume this thread has become too messy and start over on a new thread.)
 
It's not really a MySQL issue any more by the look of things. If you re-post it in the PHP forum, the experts there could probably help.
 
It's definitely still an SQL issue - sorry if I wasn't clear.

Okay, here's a real example. First of all, this is the contents of the Lyrics field for one record:
Code:
I [D]love You, Lord, and I [Em]lift my [D]voice
To [G]worship [Em]You, O my [Asus]soul, re[A]joice
Take [D]joy, my King, in [Em]what You [D]hear
May it be a [G]sweet, [D/F#]sweet [Em]sound [A]in Your [G  D]ear
And when I ask my code to search the lyrics for the word "rejoice", my latest attempt at the SQL is this:
Code:
SELECT various fields... FROM pw_song WHERE LOWER(Lyrics) REGEXP '(\[[^]]*\]){0,1}r(\[[^]]*\]){0,1}e(\[[^]]*\]){0,1}j(\[[^]]*\]){0,1}o(\[[^]]*\]){0,1}i(\[[^]]*\]){0,1}c(\[[^]]*\]){0,1}e' ORDER BY OrigTitle
It does not find the song (although it does find songs that do not have anything in brackets in the middle of the word). Can someone see why? Or do you know an easier way to do this kind of search, to consider anything in [brackets] as invisible somehow?
 
Sorry, your code is just too complicated to examine, so I'm not going to start. You could however narrow down the possibilities by saying[tt]
WHERE lyrics LIKE "%r%e%j%o%i%c%e%"[tt]
If you're lucky this might return just the one record, but if not then you can use a PHP regex filter to isolate the one required.
 
That would probably return most of the records in my database! Remember, the field Lyrics contains the lyrics for an entire song (the four-line song I posted is much shorter than average). I'll make up an example: "Praise Jesus, lift up your voices" would contain the entire match for "rejoice" in just in one phrase, so think how easy it would be if there were 5 or 20 times that much text in every song. What would be the point is doing something in MySQL just to have to do it again in PHP? I might as well pull in the whole database and then pick it apart in PHP - not at all what I want to do.

My code is not complicated, just long, because I am repeating the same thing between each letter of the search text (I thought I made that clear, but that was several posts ago). The problem is probably that '(\[[^]]*\]){0,1}' doesn't actually work for matching a set of brackets with something in the middle of them. But the intent was:

\[ A literal opening bracket
[^]]* Zero or more occurrences of any character except a closing bracket (so that it doesn't swallow up more than one set of brackets)
\] A literal closing bracket
{0,1} Either zero or one occurrences of the whole thing, i.e there can be one set or not

Isn't there anyone out there who is good with REGEXP who can see what is wrong with this expression?
 
Just a closing comment for anyone who finds this thread through a search: I started a new thread for clarity, and the puzzle was solved there (thread436-1221147), but the short of it is that I needed to double my backslashes to get them through the interface to MySQL. Thanks again to Tony for hanging in there with me!
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top