
Change content of files

Status
Not open for further replies.

merang (Programmer)
Aug 27, 2014
Hi all, I have a question. I want to read an input file and write it to a new output file.
The content of my input file is as follows (not limited to these items; there can be hundreds of items):

Device                          Type       Year  Status      Company
electronic/trend/latest/mp3     ipod       2012  secondhand  apple
electronic/trend/latest/phone   Samsungs5  2014  secondhand  samsung
electronic/trend/latest/laptop  EliteBook  2011  secondhand  hp
electronic/trend/latest/iphone  iphone6    2014  new         apple
electronic/trend/latest/phone   Samsungs5  2014  new         samsung
electronic/trend/latest/monitor xpro       2012  secondhand  dell

First, I need to remove entries with a duplicate path + device and keep only the last occurrence (in this example Samsungs5 appears twice with the same path and device, so only the last occurrence is kept).

Second, I need to display the result in the following format:


Path                     Device   Type       Status
electronic/trend/latest  mp3      ipod       secondhand
electronic/trend/latest  laptop   EliteBook  secondhand
electronic/trend/latest  iphone   iphone6    new
electronic/trend/latest  phone    Samsungs5  new
electronic/trend/latest  monitor  xpro       secondhand

The Year and Company columns and the duplicate item are removed, and only the last occurrence of Samsungs5 is displayed.

thanks in advance :)
 
Hi merang,
I would use a hash (a Tcl array) keyed by the type, so that only one line per type is kept and a later line overwrites an earlier one. Here is an example:

merang.tcl
Code:
# input file
set fname "merang.txt"
set input_file [open $fname "r"]
# output file
set new_fname "merang_result.txt"
set output_file [open $new_fname "w"]

while { [gets $input_file line] != -1 } {
  # skip empty lines
  if {$line=={}} { continue }
  # return a list with the substrings matched by the regex
  set line_list [regexp -all -inline {\S+} $line]
  # extract fields from list
  foreach {path_device type year status company} $line_list {}
  # extract path and device from path_device
  set path_device_list [split $path_device "/"]
  set path [join [lrange $path_device_list 0 end-1] "/"]
  set device [lindex $path_device_list end]
  # create output line
  set out_line "$path $device $type $status"

  # create or overwrite hash entry
  set myhash($type) $out_line

  # write line to the screen
  puts "input_line:   '$line'"
  puts "path_device = '$path_device'"
  puts "path        = '$path'"
  puts "device      = '$device'"
  puts "type        = '$type'"
  puts "year        = '$year'"
  puts "status      = '$status'"
  puts "company     = '$company'"
  puts "*"
  puts "output line:  '$out_line'"
  puts "***"
}

# print hash entries to the screen and in the file
puts "\nHash entries:"
foreach key [array names myhash] {
  # print all hash values key => list
  puts "$key => '$myhash($key)'"
  puts $output_file $myhash($key)
}

# close files
close $input_file
close $output_file

Now when I have this input file
merang.txt
Code:
electro/trend/latest/mp3     ipod      2012 secondhand apple
electro/trend/latest/phone   Samsungs5 2014 secondhand samsung
electro/trend/latest/laptop  EliteBook 2011 secondhand hp
electro/trend/latest/iphone  iphone6   2014 new        apple 
electro/trend/latest/phone   Samsungs5 2014 new        samsung
electro/trend/latest/monitor xpro      2012 secondhand dell

and run the script above, it prints some output to the screen and produces the output file merang_result.txt:
Code:
C:\_mikrom\Work>tclsh merang.tcl
input_line:   'electro/trend/latest/mp3     ipod      2012 secondhand apple'
path_device = 'electro/trend/latest/mp3'
path        = 'electro/trend/latest'
device      = 'mp3'
type        = 'ipod'
year        = '2012'
status      = 'secondhand'
company     = 'apple'
*
output line:  'electro/trend/latest mp3 ipod secondhand'
***
input_line:   'electro/trend/latest/phone   Samsungs5 2014 secondhand samsung'
path_device = 'electro/trend/latest/phone'
path        = 'electro/trend/latest'
device      = 'phone'
type        = 'Samsungs5'
year        = '2014'
status      = 'secondhand'
company     = 'samsung'
*
output line:  'electro/trend/latest phone Samsungs5 secondhand'
***
input_line:   'electro/trend/latest/laptop  EliteBook 2011 secondhand hp'
path_device = 'electro/trend/latest/laptop'
path        = 'electro/trend/latest'
device      = 'laptop'
type        = 'EliteBook'
year        = '2011'
status      = 'secondhand'
company     = 'hp'
*
output line:  'electro/trend/latest laptop EliteBook secondhand'
***
input_line:   'electro/trend/latest/iphone  iphone6   2014 new        apple '
path_device = 'electro/trend/latest/iphone'
path        = 'electro/trend/latest'
device      = 'iphone'
type        = 'iphone6'
year        = '2014'
status      = 'new'
company     = 'apple'
*
output line:  'electro/trend/latest iphone iphone6 new'
***
input_line:   'electro/trend/latest/phone   Samsungs5 2014 new        samsung'
path_device = 'electro/trend/latest/phone'
path        = 'electro/trend/latest'
device      = 'phone'
type        = 'Samsungs5'
year        = '2014'
status      = 'new'
company     = 'samsung'
*
output line:  'electro/trend/latest phone Samsungs5 new'
***
input_line:   'electro/trend/latest/monitor xpro      2012 secondhand dell'
path_device = 'electro/trend/latest/monitor'
path        = 'electro/trend/latest'
device      = 'monitor'
type        = 'xpro'
year        = '2012'
status      = 'secondhand'
company     = 'dell'
*
output line:  'electro/trend/latest monitor xpro secondhand'
***

Hash entries:
ipod => 'electro/trend/latest mp3 ipod secondhand'
xpro => 'electro/trend/latest monitor xpro secondhand'
iphone6 => 'electro/trend/latest iphone iphone6 new'
EliteBook => 'electro/trend/latest laptop EliteBook secondhand'
Samsungs5 => 'electro/trend/latest phone Samsungs5 new'

merang_result.txt
Code:
electro/trend/latest mp3 ipod secondhand
electro/trend/latest monitor xpro secondhand
electro/trend/latest iphone iphone6 new
electro/trend/latest laptop EliteBook secondhand
electro/trend/latest phone Samsungs5 new
 
Thank you for such a good explanation, sir. One question: if some of the paths are electro/trend/latest/phone and other paths are electro/trend/latest/tech/hardware/phone, can the code still be used, sir? I want the device to be just phone.

Thanks in advance :)
 
IMO, yes, it can be used, because $device is set to the last element of the path (e.g. electro/../../phone) after splitting it on "/", i.e.:
Code:
set device [lindex $path_device_list end]
But you can easily try it yourself...

For example, when I add a line with a longer path to the input file
Code:
electro/trend/latest/mp3     ipod      2012 secondhand apple
electro/trend/latest/phone   Samsungs5 2014 secondhand samsung
electro/trend/latest/laptop  EliteBook 2011 secondhand hp
electro/trend/latest/iphone  iphone6   2014 new        apple 
electro/trend/latest/phone   Samsungs5 2014 new        samsung
electro/trend/latest/monitor xpro      2012 secondhand dell
electro/trend/latest/tech/hardware/phone Samsungs5 2014 newest samsung
I get this output
Code:
electro/trend/latest mp3 ipod secondhand
electro/trend/latest monitor xpro secondhand
electro/trend/latest iphone iphone6 new
electro/trend/latest laptop EliteBook secondhand
electro/trend/latest/tech/hardware phone Samsungs5 newest
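
The path splitting can also be tried on its own, without any input file. This is only a small sketch: the proc name split_path_device is made up for illustration and is not part of the script above, but the split, lrange and lindex calls are the same ones the script uses.

```tcl
# Hypothetical helper (not part of merang.tcl): split "a/b/c/dev"
# into the path "a/b/c" and the device "dev".
proc split_path_device {path_device} {
    set parts [split $path_device "/"]
    # everything except the last element is the path
    set path [join [lrange $parts 0 end-1] "/"]
    # the last element is the device, however long the path is
    set device [lindex $parts end]
    return [list $path $device]
}

puts [split_path_device "electro/trend/latest/phone"]
# prints: electro/trend/latest phone
puts [split_path_device "electro/trend/latest/tech/hardware/phone"]
# prints: electro/trend/latest/tech/hardware phone
```

Note that in the output file above the line with status "new" is gone, because the hash key is $type and both lines have the type Samsungs5.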
 
Oh, I see. Sir, which part of the code detects the same path and device, removes the duplicate and keeps only the last occurrence? FYI, the code should remove a duplicate if and only if both the path and the device are the same.

In the above example we have 3 Samsung lines, right:

electro/trend/latest/phone Samsungs5 2014 secondhand samsung

electro/trend/latest/phone Samsungs5 2014 new samsung

electro/trend/latest/tech/hardware/phone Samsungs5 2014 newest samsung

The program should treat only the first two as duplicates and keep the last occurrence of them, while the line that you added just now should stay. Does the code do that, sir?
 
Is set line_list [regexp -all -inline {\S+} $line] the code that detects the duplicate? Do I need to modify it to meet the purpose of this program?

Or do I need to come up with a new condition? Could you give an example, sir?

Thanks in advance.
 
To get 2 phones with different paths, we need to modify the source.
So far we used $type as the hash key; now we will use $path_device.
So please modify this line:
Code:
# create or overwrite hash entry
set myhash($path_device) $out_line

Then we get for this input
Code:
electro/trend/latest/mp3     ipod      2012 secondhand apple
electro/trend/latest/phone   Samsungs5 2014 secondhand samsung
electro/trend/latest/laptop  EliteBook 2011 secondhand hp
electro/trend/latest/iphone  iphone6   2014 new        apple 
electro/trend/latest/phone   Samsungs5 2014 new        samsung
electro/trend/latest/monitor xpro      2012 secondhand dell
electro/trend/latest/tech/hardware/phone Samsungs5 2014 newest samsung
the following output
Code:
electro/trend/latest iphone iphone6 new
electro/trend/latest monitor xpro secondhand
electro/trend/latest laptop EliteBook secondhand
electro/trend/latest mp3 ipod secondhand
electro/trend/latest/tech/hardware phone Samsungs5 newest
electro/trend/latest phone Samsungs5 new
You see there are now two different phones in the output. Is this what you needed?
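
As the output shows, array names returns the keys in no particular order. If the order from the original question is wanted (each entry at the position of its last occurrence), one possible approach is a dict instead of an array, because a dict keeps insertion order. The following is only a sketch under that assumption: it needs Tcl 8.5 or newer, and keep_last is a made-up name, not part of the script above.

```tcl
# Sketch: keep only the last value for each key, ordered by last occurrence.
# keep_last is a hypothetical helper; pairs is a flat list: key value key value ...
proc keep_last {pairs} {
    set seen [dict create]
    foreach {key value} $pairs {
        # Setting an existing key keeps its old position, so remove it
        # first; the following dict set then appends it at the end.
        if {[dict exists $seen $key]} {
            dict unset seen $key
        }
        dict set seen $key $value
    }
    return [dict values $seen]
}

set pairs {
    electro/trend/latest/mp3    "electro/trend/latest mp3 ipod secondhand"
    electro/trend/latest/phone  "electro/trend/latest phone Samsungs5 secondhand"
    electro/trend/latest/laptop "electro/trend/latest laptop EliteBook secondhand"
    electro/trend/latest/phone  "electro/trend/latest phone Samsungs5 new"
}
foreach out_line [keep_last $pairs] {
    puts $out_line
}
# prints the mp3 line, then the laptop line, then the phone line with status "new"
```

In the script itself this would mean replacing the set myhash(...) line with the dict unset / dict set pair and looping over the dict values at the end.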
 
Yes, it answers my question, sir. I have modified the code with my own code to fulfill the purpose of my question.

For the header I use another command to set the header. Is it possible to take the header from the input file instead?

Device                       Type  Year  Status      Company
electronic/trend/latest/mp3  ipod  2012  secondhand  apple

The expected output:

Path                  Device  Type     Status
electro/trend/latest  iphone  iphone6  new

thanks :)
 
Hi,
I have referred to your code and tried my own code, and it works well. I see that this code is hardcoded. If another input file has additional columns besides the previous ones (Device Type Year Status Company Price), how do I modify the previous code so that it is no longer hardcoded (more robust or universal)? I mean that for any input file, regardless of added columns, I should get the same output without changing the code.

I think I should use different arrays to access each element. I would like to know your opinion, and see an example if possible. Thanks in advance.
 
IMO, when you have more fields in the line, you only need to change the field extraction method, that is, replace this loop
Code:
# extract fields from list 
foreach {path_device type year status company} $line_list {}
with another method.
 
I looked at it: use lindex and extract only the fields you need, like this:
Code:
...
# extract fields from list 
#foreach {path_device type year status company} $line_list {}
set path_device [lindex $line_list 0]
set type [lindex $line_list 1]
set year [lindex $line_list 2]
set status [lindex $line_list 3]
set company [lindex $line_list 4]
...

now for the modified input file (with longer lines)
Code:
electro/trend/latest/mp3     ipod      2012 secondhand apple   
electro/trend/latest/phone   Samsungs5 2014 secondhand samsung 111  and other junk
electro/trend/latest/laptop  EliteBook 2011 secondhand hp 222 and other junk
electro/trend/latest/iphone  iphone6   2014 new        apple 333  and other junk
electro/trend/latest/phone   Samsungs5 2014 new        samsung 123 and other junk
electro/trend/latest/monitor xpro      2012 secondhand dell 223 and other junk
electro/trend/latest/tech/hardware/phone Samsungs5 2014 newest samsung 323 and other junk

the program delivers the same result as before:
Code:
electro/trend/latest iphone iphone6 new
electro/trend/latest monitor xpro secondhand
electro/trend/latest laptop EliteBook secondhand
electro/trend/latest mp3 ipod secondhand
electro/trend/latest/tech/hardware phone Samsungs5 newest
electro/trend/latest phone Samsungs5 new
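
Why the foreach line had to be replaced can be seen in a small experiment. This is only an illustrative sketch using one of the longer lines above; lassign is shown as a possible alternative to the lindex calls and assumes Tcl 8.5 or newer.

```tcl
set line_list {electro/trend/latest/phone Samsungs5 2014 new samsung 123 and other junk}

# foreach takes five elements per iteration, so a nine-element list runs the
# (empty) body twice and the variables keep the values of the last iteration.
foreach {path_device type year status company} $line_list {}
puts "foreach: path_device='$path_device' type='$type'"
# prints: foreach: path_device='123' type='and'

# lassign assigns the leading elements once and returns the unassigned rest.
set rest [lassign $line_list path_device type year status company]
puts "lassign: path_device='$path_device' type='$type' rest='$rest'"
# prints: lassign: path_device='electro/trend/latest/phone' type='Samsungs5' rest='123 and other junk'
```

Both lindex and lassign only look at the leading fields, so extra fields at the end of a line are ignored.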
 
Yes, because the loop is hardcoded, it is fixed for those particular elements only. It is hard to change the elements in the loop every time there are changes in the input file.

To make the code more robust, can you give me an example of the other method that you mentioned before?
 
Hmm, let's say the price is not added at the end but in any position. Then I need to change the example that you gave in the previous post, because it assigns <set year [lindex $line_list 2]>. I would need to modify it to something like <set price [lindex $line_list 2]> so that I can print out the price, right? Using this method I still need to change the code every time the input file is changed (correct me if I am wrong).

If we use a multidimensional array such as <set myhash($path_device, $type, $year, $status, $company) $out_line>, is it possible to do it like this so that we can access any element easily?
 
Hi merang,
I posted some examples to help you get started with solving your problem.
Now, I hope you understand how the program works and are able to make such small changes yourself.

"I still need to change the code every time the input file is changed (correct me if I am wrong)"

No, but you should first analyze exactly what types of input lines are possible and which fields are relevant for your output.
Then either modify the source I posted so that it satisfies your needs, or write your own.
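
One possible direction, if the input file always starts with a header line like the one in the original question, is to look up each column by its name in the header instead of by a fixed position. The following is only a sketch under that assumption: parse_rows is a made-up name, the Price column and its values are invented sample data, and it needs Tcl 8.5 or newer for dict.

```tcl
# Sketch: find columns by header name so field positions are not hardcoded.
# parse_rows is a hypothetical helper; lines is a list of input lines whose
# first element is the header, wanted is the list of column names to keep.
proc parse_rows {lines wanted} {
    set header [regexp -all -inline {\S+} [lindex $lines 0]]
    set rows {}
    foreach line [lrange $lines 1 end] {
        set fields [regexp -all -inline {\S+} $line]
        # skip empty lines
        if {[llength $fields] == 0} { continue }
        set row [dict create]
        foreach name $wanted {
            set idx [lsearch -exact $header $name]
            if {$idx < 0} { error "column '$name' not found in header" }
            dict set row $name [lindex $fields $idx]
        }
        lappend rows $row
    }
    return $rows
}

# Invented sample data: a Price column inserted in the middle.
set lines {
    "Device Type Price Year Status Company"
    "electro/trend/latest/mp3 ipod 99 2012 secondhand apple"
    "electro/trend/latest/iphone iphone6 699 2014 new apple"
}
foreach row [parse_rows $lines {Device Type Status}] {
    puts "[dict get $row Device] [dict get $row Type] [dict get $row Status]"
}
# prints: electro/trend/latest/mp3 ipod secondhand
# prints: electro/trend/latest/iphone iphone6 new
```

Each row is a dict of field name and value, so any field can be accessed by name, and the Device value can still be split into path and device as before.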
 