OsakaWebbie
Programmer
I use mb_split (in a UTF-8 environment that can contain Japanese text) to split song lyrics stored in a database into stanzas. It doesn't always find the pattern when it should, and whether it does seems to depend on the character just before the blank line. This should be really simple:
PHP:
// Split the lyrics into stanzas wherever a blank line (LF, optional whitespace, LF) occurs
$stanzas = mb_split("\n\s*\n",rtrim($song->Lyrics));
Normally it works and recognizes the blank line. But if the character before it is the Japanese character "く" (U+304F, which UTF-8 encodes as the bytes E3 81 8F), the blank line is not recognized and the two stanzas are not split. Other characters probably cause the same problem, but U+304F is the only one I have identified so far. For example, in the song whose line endings are shown below, the second blank line is recognized, but not the first one, which comes right after a line ending in "く".
I thought perhaps something about that character made mb_split treat the next byte as part of the same character. But I checked my data, and it actually contains a CR before each LF, so even if the CR got swallowed, the regex only looks for LF anyway and should still work. I can't figure out what the problem could be. Does anyone have an idea?
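Since the question hinges on exactly which bytes sit between "く" and the next stanza, a byte-level dump shows what mb_split actually receives. This is a minimal sketch, assuming the PHP source file itself is saved as UTF-8; the helper name `hexdump_fragment` is hypothetical, and the literal fragment stands in for a substring of the real lyrics.

```php
<?php
// Hypothetical helper: format the raw bytes of a string as space-separated hex pairs.
function hexdump_fragment(string $s): string
{
    // bin2hex gives "e3818f...", str_split cuts it into 2-char byte pairs.
    return strtoupper(implode(' ', str_split(bin2hex($s), 2)));
}

// Assumed fragment from the lyrics: "く", the CR LF ending its line,
// and the CR LF of the blank line that follows.
echo hexdump_fragment("く\r\n\r\n"), PHP_EOL; // E3 81 8F 0D 0A 0D 0A
```

The dump shows that "く" ends with the byte 0x8F and is followed directly by 0D 0A (CR LF), so whether the LF is seen as a separate character depends on how the regex engine groups those bytes.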
If you would like a little more context, here is the result of the query I used to examine the line ending bytes:
SQL:
SELECT REPLACE(REPLACE(Lyrics,CHAR(10),'{LF}\n'),CHAR(13),'{CR}') FROM pw_song WHERE SongID=468
Code:
[D]荒野の[D/A]果[A]て[D]に 夕日は[D/A]落[A]ち[D]て{CR}{LF}
[D]妙(たえ)なる[D/A A]調[D]べ 天(あめ)より[D/A A]響[D]く{CR}{LF}
{CR}{LF}
[D Bm G A D G]グローー[A]リア、[D]イン [A/C#]エク[D]セル[G]シス [D/A]デ[A]オ{CR}{LF}
[D Bm G A D G]グローー[A]リア、[D]イン [A/C#]エク[D]セル[G]シス [D/A A]デ[D]オ{CR}{LF}
{CR}{LF}
[D]羊を[D/A A]守[D]る 野辺の[ D/A]牧[A D]人{CR}{LF}
[D]天なる[D/A A]歌[D]を 喜び[D/A]聞[A]き[D]ぬ{CR}{LF}
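One hypothesis worth ruling out, which nothing in this post confirms: mb_split uses mbstring's own regex encoding (the one reported and set by `mb_regex_encoding()`), and if that is not UTF-8 in this environment, the bytes are grouped into characters differently. For example, in EUC-JP the byte 0x8F (the last byte of "く" in UTF-8) is the single-shift byte that introduces a three-byte character, so it could absorb the CR and LF that follow it. The sketch below uses a made-up sample string modelled on the lyrics above. It prints the current regex encoding, forces UTF-8 before calling mb_split, and shows `preg_split` with the `/u` modifier as a PCRE-based alternative that does not depend on mbstring settings.

```php
<?php
// Made-up sample modelled on the lyrics above: a line ending in "く",
// a blank line, a second stanza, another blank line, a third stanza (CRLF throughout).
$lyrics = "天(あめ)より響く\r\n\r\nグローリア\r\n\r\n羊を守る";

// Option 1 (needs mbstring with regex support): report the regex encoding
// mb_split would use, then set it to UTF-8 instead of relying on the default.
if (function_exists('mb_split')) {
    echo 'mb_regex_encoding() before: ', mb_regex_encoding(), PHP_EOL;
    mb_regex_encoding('UTF-8');
    $mbStanzas = mb_split("\n\s*\n", rtrim($lyrics));
    echo 'mb_split stanzas: ', count($mbStanzas), PHP_EOL;
}

// Option 2: PCRE in UTF-8 mode (/u), independent of mbstring's settings.
// Like the original pattern, this splits on the LFs, so every stanza except
// the last keeps the CR that preceded its final LF.
$stanzas = preg_split('/\n\s*\n/u', rtrim($lyrics));
echo 'preg_split stanzas: ', count($stanzas), PHP_EOL; // preg_split stanzas: 3
```

If the first line printed shows an encoding other than UTF-8 (for example EUC-JP), that would support the hypothesis. Calling `mb_regex_encoding('UTF-8')` once during setup, or switching to `preg_split` with `/u`, would then be the natural fix to try.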