Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

ETL without forcing relational data into XML

Status
Not open for further replies.

JoinerX

MIS
Sep 23, 2004
4
CA
Has anyone come across an ETL product that does not require you to force your data into the hierarchical XML model?

I have found one but for what we want to do with the data (a lot of conditional processing and it will be a continuous flow for the next year) being able to get and manipulate the data in a more abstract manner than XML/tree models allow would be helpful.

I'm sure there are products but I have to add another catch, the budget is US$50K. We will have 8 continual data feeds (each proprietary, most of them are flat files) and if they drop, we have a big problem. Once we have implemented the solution, we may get occassional format changes but nothing major.

I think our budget will probably have to treble but any suggestions would be great.

Thanks
 
Good question, well, it's currently SQL Server 2000, a new consultant is talking Oracle as he believes that Oracle, combined with a new structure and reduced query set with caching will reduce a few bottlenecks and give us some other benefits. I'm not aware of all the details.

Does that help?
 
I don't think any of the major ETL products put the data into XML unless you want to. They all pull it relationally and process it in their own representations. Or are you specifically talking about getting data via their "on demand" features? Even there, most interface with messaging products like Tibco, Vitria, MQ and do not require XML unless you are using a web services standard.
 
Thanks for the feedback mm0. I don't think we can get a Tibco etc. solution in our price range.

We have a number of different forms in differing databases plus a continual (intermittent but jumps from 50-600 per drop every day) flow of documents in HTML and a couple of proprietary formats that we must a) scan for certain data; b) perform processes on and modify dependant on the data found; and c) place in the new database structure (whatever that may be when the DBA and consultant make their mind up ;).

We could probably do what is needed with a combo of scripts and maybe a couple of in-house DLL's. My concern is that no-one has experience in these matters and time is running out.

What I want is to know how long is the piece of string, I want it all for an impossible price and I don’t want to have to learn or maintain anything. Easy!

In all seriousness I suspect our need is for a higher budget and a rethink of how we are approaching the problem. Having a few newbies like me looking around may not be the answer either.

NB. If it does help anyone, the processing is mostly, ‘if X contains Y then send to process 2 if not send to folder 1’ nothing to complex but we will have to do this as the documents come in before they hit the database.

JoinerX


 
Hi JoinerX,

Looks like your budget precludes any ETL solutions, and you have to create a custom programming solution.

You could try Perl or Python for doing the entire up-front processing, prior to putting the massaged data in the database. These (and other) scripting languages are powerful enough to avoid programming DLLs as part of the solution.

Perl for example, runs at blazing speed as far as scanning documents for patterns is concerned, can parse HTML and can write to your database of choice using DBI and DBD modules. If the volume of data is very large, you could use the native database utility to load the tables e.g. SQL*Loader for Oracle.

I have used a mix of Perl and Access databases in a shoestring setup (which nevertheless had to analyze ~ 5 million records for complex patterns) successfully. I have also used a mix of Perl and SQL*Plus/PL/SQL in a big DW setup where the job in question was simply too complicated to be done with the (expensive) ETL tool we were using ... : ) ...

Python is more elegant than Perl, but Perl has the edge in pattern scanning (or so I have gathered - I am no Larry Wall/Guido van Rossum). Both languages are free, have detailed tutorials on the net, and are supported by a large on-line community who are generous with tips, scripts, URLs, you-name-it ...

Best wishes ...
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top