Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

datawarehouse interview demo 1

Status
Not open for further replies.

sqlwarehouse

Programmer
Sep 9, 2011
4
GB
hello there,
i am very new to this forum....i have got an interview offer related to datawarehouse jobs... they asked me to give 10 minutes presentation on how would i establish the operational(i.e service) requirements for the data warehouse?

any idea guys ..operational requirements ..please ...Thank you very much
 
The main operation of a data warehouse is to load data from source systems to the warehouse and mart. Think of all the events that can occur. There are many.

How will data be obtained? Will the warehouse link into source systems and pull the data out? If so, what kind of systems are they? Cobol systems with data stored in ISAM/VSAM files. Perhaps Orcale or SQL Server systems. Will you use an ETL tool to get at those file structures? Perhaps it is easier to ask the source system people who are knowledgeable about their systems to create you flat file feeds. That is my preference, since I do not have to get bogged down figuring out where all the data is for each source. If files are sent to you, where are they sent? For example, I have all my sources FTP their flat files to a UNIX directory. Sounds easy, but there are problems associated with that. What if somebody mistakenly sends a file and issues a delete command? What stops him from deleting other files sent by other people? What if your load process starts processing the file before the FTP transmission completes. How do you check if the file is still "hot" and how do you ignore it? What happens if the file contains too many errors to process? Do you continue processing the next file? What if you receive or process a file in the wrong order. Your parent/child relationships between tables will be violated. That will produce incomplete end-user reports when inner joins between tables drop records. How do you enforce parent/child relationships if at all? How do you handle records that were once sent to the warehouse but have since been deleted from the source system? Deletions are a huge challenge since sources may be unable to send deleted records. One way to handle this is to accept a full feed from the source. Any records in the warehouse that are missing from the feed represent implicit deletions. The problem with full feeds is that they can be huge which will give you performance issues. Incremental feeds are preferable if they can provide deletions. Incremental feeds can be problematic too. An incremental feed should contain all data that has changed from the previous time data was sent. So the previous date and time needs to be tracked. It is a bad idea to assume the previous time was yesterday. Also, how do you identify records that have changed? Typically the time stamp on the source record is used, but is it reliable? Perhaps someone can make changes or do maintenance without updating the time stamp. Then there are challenges and quirks associated with data modeling, such a slowly changing dimensions. Do know what Type 1, 2 and 3 are. I have been asked this twice.

I can go on and on. In fact I wrote a 100 page document that discusses these issues and how they are resolved.

I hope this helps you get started. Good luck in the interview

 
hi dkyrtata, tks a lot for ur help...must admit ..have never done dw job before...i wanto get a job..so applied ,,iam developer..and used sql for backend purpose..created sps,views,dml triggers,udfs & job schedules ...etc... but never worked on dw side... dont want to miss th opportunity.... would u recommend any books ..or just practical approach ..i can quickly learn ...i just want to give the presentation on how would i establish the operational(i.e service) requirements for the data warehouse....which i have no clue now ..sorry to bother you ..much appreciated
 
Be aware that the two DW gurus are Ralph Kimball and Bill Inmon, (there is one other, Claudia something I think). They take two different approaches to DW. So you may be asked to briefly describe their methodologies. Most DWs take the ideas from both methodologies as DW is not an exact science.

Occasionally, I reread parts of Kimball's book, "The Data Warehouse Etl Toolkit". He describes all the possible problems and things you should do in a warehouse. It leaves me wonder how any DW project can ever deliver anything to its end users when there is just so many things to do. DW projects are huge and often lead to expensive failures - probably more often than not.

 
Claudia Imhoff. More recent developments center around Dan Linstedts 'Datavault' approach for corporate DW development

Ties Blom

 
anyone ....about operational requirements for the dataware house....

how about these.. can i say these following are operational requirements ....

-Requirement Gathering
-Physical Environment Setup
-Data Modeling
-ETL
-OLAP Cube Design
-Front End Development
-Report Development
-Performance Tuning
-Query Optimization
-Quality Assurance
-Rolling out to Production
-Production Maintenance
-Incremental Enhancements
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top