Hello,
I've got two kinds of text files. The first kind of file contains English sentences and phrases (one on each line). Every word is followed by its part-of-speech tag. The second kind of file contains the French translations of these English sentences (again one sentence on each line). The first French sentence is the translation of the first sentence in the English file, etc. For instance:
File 1:
He/3PS is/BEZ eating/VBG cheese/NN with/IN a/Art\ yellow/Adj spoon/NN
John/NN took/VBD his/Pr dog/NN with/IN him/Pr
She/Pr was/BED a/Art girl/NN with/IN a/Art big/Adj and/Conj lovely/Adj smile/NN
File 2:
Il/3PS mange/VBS du/Art fromage/NN avec/IN une/Art\ cuillère/NN jaune/Adj
Jean/NN prennait/VBD son/Pr chien/NN avec/IN lui/Pr
Elle/Pr était/BED une/Art fille/NN avec/IN un/Art grand/Adj et/Conj beau/Adj sourire/NN
(The backslashes "\" only indicate that the lines do not stop there! Each line holds exactly one sentence, so the backslashes do not belong to the input.)
I need to write a script that combines these two files into the following third file:
is,BEZ,eating,VBG,cheese,NN,with,IN,a,Art,yellow,Adj,\ spoon,NN,avec
took,VBD,his,Pr,dog,NN,with,IN,him,Pr,=,=,=,=,avec
was,BED,a,Art,girl,NN,with,IN,a,Art,big,Adj,and,Conj,avec
So from the English file, I'm only interested in the preposition (in this case "with") and the three words (if any) that precede it and the three words (if any) that follow it. If fewer than three words precede or follow the preposition, the empty fields should be filled with a "=".
Words and tags should be separated by commas. From the French file, I'm only interested in the prepositions. These should be placed at the end of each line in the new file. In this way it is easy to see how each English preposition is translated into French. Every line should eventually contain 15 fields.
Is it possible to do this with gawk?
Febri
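To make the requested transformation concrete, here is a minimal sketch of the extraction rule in Python (not gawk), written only to pin down the logic. Several things in it are assumptions rather than part of the post: the function names `split_tokens`, `combine` and `combine_files` are made up for illustration, the preposition is taken to be the first token tagged `IN` on each line, missing words before the preposition are padded on the left, and a French line without an `IN` token yields "=" in the last field.

```python
PAD = ("=", "=")  # filler for a missing word/tag pair


def split_tokens(line):
    """Split 'word/TAG word/TAG ...' into a list of (word, tag) pairs."""
    pairs = []
    for token in line.split():
        # Split on the last slash so a word containing "/" keeps its text.
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs


def combine(english_line, french_line, prep_tag="IN", window=3):
    """Build one comma-separated output line from an aligned sentence pair.

    Returns None when the English line has no token tagged prep_tag.
    Assumption: only the first preposition on each line is used.
    """
    en = split_tokens(english_line)
    fr = split_tokens(french_line)

    # Position of the first English preposition, if any.
    idx = next((i for i, (_, tag) in enumerate(en) if tag == prep_tag), None)
    if idx is None:
        return None

    before = en[max(0, idx - window):idx]
    after = en[idx + 1:idx + 1 + window]

    # Assumption: pad on the left before the preposition, on the right after it.
    before = [PAD] * (window - len(before)) + before
    after = after + [PAD] * (window - len(after))

    # First French preposition, or "=" if the line has none (assumption).
    fr_prep = next((word for word, tag in fr if tag == prep_tag), "=")

    fields = [item for pair in before + [en[idx]] + after for item in pair]
    fields.append(fr_prep)
    return ",".join(fields)


def combine_files(english_path, french_path):
    """Yield output lines for two files that are aligned line by line."""
    with open(english_path, encoding="utf-8") as en_file, \
         open(french_path, encoding="utf-8") as fr_file:
        for en_line, fr_line in zip(en_file, fr_file):
            result = combine(en_line, fr_line)
            if result is not None:
                yield result
```

With the default window of three words, each output line has 3 × 2 + 2 + 3 × 2 + 1 = 15 fields, which matches the count asked for above. Sentences with several prepositions would need a loop over every `IN` position instead of the single lookup used here.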