Hey peers...
well... the Q... all i need to do is write a parser that would be used to convert data from existing microsoft excel/word files to a SQL database...
this is what i have to do and am just starting off! i thought i'd gather a few inputs from you guys before actually getting into making the LLD!
It can be assumed that *The structure of the EXCEL files would be consistent...* meaning i wud not have fall into the infinite loop of supporting 'n' different structures!
There are also different templates that each of the EXCEL file would fall in... all that i would have to do is make the template part parametric in my parser and parse the EXCEL/WORD files based on the template-structure they have.
but the EXCEL files are not gonna have a simple table-kinda arch. I cud have excel files this way:
here a simple template model (well this is just an example!):
ID CNTRY_NAME CNTRY_CONT ID CNTRY_NAME CNTRY_CONT ID CNTRY_NAME CNTRY_CONT
1 aaa aaaa 4 ddd dddd 7 ggg gggg
2 bbb bbbb 5 eee eeee 8 hhh hhhh
3 ccc cccc 6 fff ffff 9 iii iiii
I am looking at bottling up an XML file for each of the different templates wherein I would be mapping data in specific Word/Excel file positions, to their corresponding location in the Database:
My parser would then parse the EXCEL/WORD files based on the template and move the data into the corresponding locations.
Now, if this approach gets finalized, all i am left with the of how to actually read the Word/Excel files. I have kinda cornered 2 ways of doing it...
1. USE THE MS EXCEL/WORD TYPE LIBRARIES:
Well, this doesnt really need much explanation. i might record the excel columns co-ords in the XML and then just open/read through the EXCEL file.
2. Use Ado.net To Retrieve Data From A Microsoft Excel Workbook with VB.Net:
The Excel Worksheets can be accessed with ADO.Net through the Jet OLEDB provider. The Microsoft Jet database engine can access data in other database file formats, such as Excel workbooks, through installable Indexed Sequential Access Method (ISAM) drivers.
This way, one would be able to open EXCEL documents almost like databases and each of the Worksheets can further be accessed as seperate Tables. The one big advantage of this method would be that I might not have to keep my EXCEL worksheet open all the time.
I would just open them and move them to a dataset and do all the manipulation with the disconnected dataset, thereby involving lesser memory usage.
What I'd like you guys to do is validate the approaches and help me nail down one of the approaches. I also invite criticism, but then it would really hold good if the blast is supported with a strong backup/work-around!
catchya guys on your says...
Sham
Sham aka boolean... be practical/be straight... true/false?!
well... the Q... all i need to do is write a parser that would be used to convert data from existing microsoft excel/word files to a SQL database...
this is what i have to do and am just starting off! i thought i'd gather a few inputs from you guys before actually getting into making the LLD!
It can be assumed that *The structure of the EXCEL files would be consistent...* meaning i wud not have fall into the infinite loop of supporting 'n' different structures!
There are also different templates that each of the EXCEL file would fall in... all that i would have to do is make the template part parametric in my parser and parse the EXCEL/WORD files based on the template-structure they have.
but the EXCEL files are not gonna have a simple table-kinda arch. I cud have excel files this way:
here a simple template model (well this is just an example!):
ID CNTRY_NAME CNTRY_CONT ID CNTRY_NAME CNTRY_CONT ID CNTRY_NAME CNTRY_CONT
1 aaa aaaa 4 ddd dddd 7 ggg gggg
2 bbb bbbb 5 eee eeee 8 hhh hhhh
3 ccc cccc 6 fff ffff 9 iii iiii
I am looking at bottling up an XML file for each of the different templates wherein I would be mapping data in specific Word/Excel file positions, to their corresponding location in the Database:
My parser would then parse the EXCEL/WORD files based on the template and move the data into the corresponding locations.
Now, if this approach gets finalized, all i am left with the of how to actually read the Word/Excel files. I have kinda cornered 2 ways of doing it...
1. USE THE MS EXCEL/WORD TYPE LIBRARIES:
Well, this doesnt really need much explanation. i might record the excel columns co-ords in the XML and then just open/read through the EXCEL file.
2. Use Ado.net To Retrieve Data From A Microsoft Excel Workbook with VB.Net:
The Excel Worksheets can be accessed with ADO.Net through the Jet OLEDB provider. The Microsoft Jet database engine can access data in other database file formats, such as Excel workbooks, through installable Indexed Sequential Access Method (ISAM) drivers.
This way, one would be able to open EXCEL documents almost like databases and each of the Worksheets can further be accessed as seperate Tables. The one big advantage of this method would be that I might not have to keep my EXCEL worksheet open all the time.
I would just open them and move them to a dataset and do all the manipulation with the disconnected dataset, thereby involving lesser memory usage.
What I'd like you guys to do is validate the approaches and help me nail down one of the approaches. I also invite criticism, but then it would really hold good if the blast is supported with a strong backup/work-around!
catchya guys on your says...
Sham
Sham aka boolean... be practical/be straight... true/false?!