LovecraftHP
Programmer
Hi,
I've been working on a program that, among other things, should split verbs into an onset-nucleus-coda structure (with the help of this forum, thanks guys). It's almost finished now, except for a couple of bugs.
Sample code:
BEGIN { FS="," ; fs="[" ; d="=" }
{
    printf $1 FS $2 FS $3
    n = split($4, a, fs)
    split($5, b, fs)
    for (x=(n-2); x<=n; x++) {
        f=z=0
        k=j=l=""
        for (y=1; y<length(a[x]); y++) {
            p=substr(a[x],y,1)
            q=substr(b[x],(y+z),1)
            r=substr(b[x],y,(z+2))
            s=substr(a[x],(y+1),1)
            if (p=="V") {
                k=k q
                f=1
            }
            if (p=="C") {
                if (r!="dZ"||r!="tS") {
                    if (f==0) j=j q
                    if (f==1) l=l q
                }
                if (r=="dZ"||r=="tS") {
                    if (f==0) {
                        j=r
                        z++
                    }
                    else if (f==1) {
                        l=r
                    }
                }
            }
        }
        printf FS (j?j:d) FS (k?k:d) FS (l?l:d)
    }
    printf FS $8 "." RS
}
Used on this data sample:
acknowledge,acknowledged,611,[VC][CV][CVC],[@k][nO][lIdZ],[VC][CV][CVCC],[@k][nO][lIdZd],@+d,318
adjudge,adjudged,15,[V][CVC],[@][dZVdZ],[V][CVCC],[@][dZVdZd],@+d,483
challenge,challenged,545,[CV][CVCC],[tS&][lIndZ],[CV][CVCCC],[tS&][lIndZd],@+d,6955
change,changed,8239,[CVVCC],[tSeIndZ],[CVVCCC],[tSeIndZd],@+d,6997
judge,judged,718,[CVC],[dZVdZ],[CVCC],[dZVdZd],@+d,24555
Produces this output:
acknowledge,acknowledged,611,=,@,k,n,O,=,l,I,dZ,@+d.
adjudge,adjudged,15,=,=,=,=,@,=,dZ,V,d,@+d.
challenge,challenged,545,=,=,=,tS,&,=,l,I,dZ,@+d.
change,changed,8239,=,=,=,=,=,=,tS,eI,nd,@+d.
judge,judged,718,=,=,=,=,=,=,dZ,V,d,@+d.
When it should be:
adjudge,adjudged,15,=,=,=,=,@,=,dZ,V,dZ,@+d.
change,changed,8239,=,=,=,=,=,=,tS,eI,ndZ,@+d.
judge,judged,718,=,=,=,=,=,=,dZ,V,dZ,@+d.
So when both sounds fall within the same syllable (not in different ones, as in "challenge"), the second tS or dZ gets shortened to a plain t or d. Would anyone be so kind as to help? Thanks.
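For what it's worth, here is one possible fix, sketched under a few assumptions and checked only against the five sample lines. Reading the code, r is taken from position y with length z+2, while q is taken from position y+z, so once an onset tS or dZ has bumped z, r no longer lines up with the current segment (at the coda of "judge" it holds "VdZ", which never equals "dZ"). Also, r!="dZ"||r!="tS" is always true, so the single character q gets added every time. The sketch below keeps a separate read position in the phoneme string instead. The shell function name split_syllables and the variable names pos, seg, onset, nucleus, coda and seen_vowel are my own, and I'm assuming that a tS or dZ sitting under a C always counts as a single consonant.

```sh
# Hypothetical wrapper: reads the comma-separated verb data on stdin.
split_syllables() {
    awk '
    BEGIN { FS = ","; d = "=" }
    {
        printf "%s", $1 FS $2 FS $3
        n = split($4, a, "[")          # CV skeletons, e.g. "CVC]"
        split($5, b, "[")              # phoneme strings, e.g. "dZVdZ]"
        for (x = n - 2; x <= n; x++) { # always emit three syllable slots
            seen_vowel = 0
            pos = 1                    # read position in b[x]
            onset = nucleus = coda = ""
            for (y = 1; y < length(a[x]); y++) {
                p = substr(a[x], y, 1)
                seg = substr(b[x], pos, 2)
                # Assumption: only dZ and tS under a C are two-character segments.
                if (p != "C" || (seg != "dZ" && seg != "tS"))
                    seg = substr(seg, 1, 1)
                pos += length(seg)
                if (p == "V") { nucleus = nucleus seg; seen_vowel = 1 }
                else if (seen_vowel) coda = coda seg
                else onset = onset seg
            }
            printf "%s", FS (onset != "" ? onset : d) FS (nucleus != "" ? nucleus : d) FS (coda != "" ? coda : d)
        }
        printf "%s\n", FS $8 "."
    }'
}
```

Run as split_syllables < verbs.txt (file name made up), this should give the three lines you listed under "When it should be". One difference from your current output: the coda of "challenge" comes out as ndZ rather than dZ, because segments are appended instead of being overwritten by l=r. That matches what you expect for "change", but adjust it if that isn't what you want.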