Trying to create a master table from two other tables

Hi everyone,
I want to create a master table from these two tables:
eng_jobs.dbf with 49,598 records
eng_xref.dbf with 4,886 records

I want the master table (named "junkmaster" in my SELECT SQL) to have all the fields from the two tables mentioned above, and the records that match:
Code:
where eng_jobs.job_no=eng_xref.job_no

The junkmaster cursor will have 151 fields.
Please correct me if the following SELECT SQL is wrong.

Code:
Select * from eng_jobs t1;
	left Outer Join eng_xref t2;
	on t1.job_no = t2.job_no;
	where t1.job_no=t2.job_no and t1.draw_no=t2.layout,AND t1.sht=t2.sht;
	into Cursor junkmaster readwrite && or into table junkmaster

Also, when running the above SELECT SQL, I am getting the following message, so the "junkmaster" cursor cannot be created. Any suggestion for that?
Here is the message: "File c:\users\admin.M100\appdata\local\00009tr1002i.tmp is too large"
Any help or suggestion is very appreciated.
Thanks
 
Hi Ernesto,

The message means what it says: VFP tables are limited to 2 GB, and your output file is larger than that.

The fact that you are using a left outer join means that all 49,598 records in eng_jobs will be present in the output file, even if they don't have a matching record in eng_xref. Furthermore, those that do match eng_xref will have multiple records in the output file - one for each match. Is that what you want?

So that's potentially 49,598 records, each multiplied by up to 4,886 matches, times 151 fields, times the length of those fields. That could easily exceed 2 GB.

So, the question is: Do you really need 151 fields in the output file? Can you set up a second table for the fields used less frequently? Or can you reduce the size of the fields (for example, if you have a large character field with around 200 characters, consider making that a memo field instead, which takes up much less space in the DBF).
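For example, a wide character field could be converted to a memo field like this (just a sketch; PART_DESC is a hypothetical field name here, and you should test this on a copy of the table first):

Code:
* Sketch: convert a wide character field to a memo field.
* PART_DESC is a hypothetical field name - substitute your own.
* ALTER TABLE needs exclusive access; existing text is carried over.
Use eng_jobs Exclusive
Alter Table eng_jobs Alter Column part_desc M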

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
Hi Mike,
I actually need to have the table with all the fields; later I will remove just about 4 or 5 fields. For now, the problem is not with the fields, but it was when I had the SELECT as below:

Code:
Select * from eng_jobs t1;
	left Outer Join eng_xref t2;
	on t1.job_no = t2.job_no;
	where t1.job_no=t2.job_no ;
	into Cursor junkmaster readwrite && or into table junkmaster
So I actually need all the records in eng_jobs and the ones that match in eng_xref. Is the SELECT SQL below wrong?
Code:
Select * from eng_jobs t1;
	left Outer Join eng_xref t2;
	on t1.job_no = t2.job_no;
	where t1.job_no=t2.job_no and t1.draw_no=t2.layout,AND t1.sht=t2.sht;
	into Cursor junkmaster readwrite && or into table junkmaster
Because, you know, not every record in eng_jobs has a matching record in eng_xref, but the records in eng_xref do have matching records in eng_jobs, even though the two tables do not match one-to-one. Sorry, it is a little bit difficult to explain. I only need one table with all the records needed. Right now my junkmaster table has about 34,981 records, so it looks fine, but I believe some other records that need to be there are missing, since eng_jobs has 49,598 records, so junkmaster should have more records. Maybe the WHERE in the SELECT is dictating that, or the LEFT OUTER JOIN should not be like that.
Thanks a lot in advance.
 
I suspect your main problem is the comma in the WHERE clause before AND.

However, there's also no reason to have the same condition in both the JOIN clause and the WHERE clause. Remove t1.job_no=t2.job_no from the WHERE clause; it's redundant.
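In other words, something like this (a sketch; I'm assuming you do want to keep the draw_no and sht tests):

Code:
* Sketch: comma removed, redundant job_no condition dropped from WHERE
Select * from eng_jobs t1 ;
	left Outer Join eng_xref t2 ;
	on t1.job_no = t2.job_no ;
	where t1.draw_no = t2.layout and t1.sht = t2.sht ;
	into Cursor junkmaster readwrite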

Tamar
 
You typically have an xref table connecting two tables in an n:m relationship. The problem remains as Mike identified it first: you can have too many pairs. Actually, you can only have a real head table with a strict hierarchy going from 1 top record to n details, each having m details, also called a 1:n:m relation. That's not the same as n:m (this isn't math of quotients); it's rather 1:n:n*m vs n:m.

I wonder what your third table is; maybe the xref table even makes a self-relationship between two eng_jobs records.

It's okay the way it is; don't go for denormalization of the data. All you need is a query or view on your data that takes a portion as necessary, filtering for one person or any one value, the fraction of data you need at a time. That's why you have normalized data: you only put it together as necessary at runtime.

Bye, Olaf.

Olaf Doschke Software Engineering
 
What might actually help you would be introducing IDs in the xref table and additional fields/properties added to those records, or a third table referenced by a third foreign key in the xref table. n:m references don't need to stay pure links of data; they can be entities, too. E.g. contacts could be mail, phone, Skype, LinkedIn profile, address, or contact persons, so a contact table can link a person to many other tables and even back to other persons. You could choose between having foreign keys for all of them, or just a single one per contact plus a contact type. And then you could also collect all the data of the same contact type into several tables.
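As a rough sketch of what such an xref-as-entity table could look like (all names here are hypothetical, purely for illustration):

Code:
* Hypothetical sketch: the xref row becomes an entity with its own ID
* and extra properties; all names are made up for illustration.
Create Table eng_xref2 Free ( ;
	xref_id   I, ;
	job_no    C(10), ;
	layout    C(10), ;
	sht       C(4), ;
	xref_type C(1), ;
	notes     M)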

You don't get forward by collecting data in the way you want to browse it, but by splitting data up into logical units that are easily relatable to each other on the one side, with queries/joins putting them together in one way or the other on the other side. In NoSQL databases, what you want to do is done: instead of joining data, all related data is collected in documents, which can each differ, have no 1:1 same structure, and can even nest data as XML or JSON within such documents. That may be a case for switching to that database paradigm, but I doubt it. You just have a more complex relationship between data and can't wrap your head around it. Please, this is really the fundamental thing to learn about relational databases: you store the atomic pieces of data separately, to be able to put them together in different points of view. You don't store them in those points of view; you have views or queries that do so at runtime, not at "design time".

And it can be faster in certain conditions, but what NoSQL fans often overlook is that fetching documents stored in denormalized form also takes time; it isn't free either. You're spared finding the join and putting together the 1K of one side and the 2K of the other side, but you still read 3K or much more (redundantly copied) in documents. Your data is already in the better form. If you need to put it together at runtime, that's a good sign for relational databases, as DBFs are, not a bad sign.

Bye, Olaf.

Olaf Doschke Software Engineering
 
Ernesto, you say that you only want the records from eng_jobs that have matching records in eng_xref. In that case, you should be using an inner join not an outer join. In other words, remove the LEFT OUTER from your SELECT.
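That is (a sketch based on your query):

Code:
* Sketch: an inner join keeps only the eng_jobs records
* that have a matching record in eng_xref
Select * from eng_jobs t1 ;
	Inner Join eng_xref t2 ;
	on t1.job_no = t2.job_no ;
	into Cursor junkmaster readwrite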

That will reduce the number of records in the output table. But the underlying problem will remain, that is, you still have an enormous record structure in the output table. 151 is a very large number of fields, and you really need to consider whether you need all of them.

It might help if you could let us know what you will be using the output table for.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
It will help to know more about the nature of the involved tables and data, just their real-world meanings. Merging details, as you see, gives you more data, not less. That's even true with inner joins: you can still have multiple matches to one record, even from both sides. Mike is correct that going from an outer to an inner join reduces the overall result, but it only removes the records filling just one side. That removes data, and in the sense of restructuring the database, that can't be a goal just because it reduces space. In any case, you multiply data, and that's not a sign of creating a head table; it's a sign of creating a detail table. A head table would aggregate data and collect all the common values of an entity.

SQL gives you every type of grouping and aggregation on multiple levels, at hand as you need them, all based on the same atomic data. If you need to join data to get a more meaningful result than just a list of atomic things, that's a good sign for your database structure, not a bad one.
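For example (a sketch; the field names are assumed from your earlier posts):

Code:
* Sketch: aggregate the detail side up to one row per job
Select t1.job_no, Count(t2.layout) as n_layouts ;
	from eng_jobs t1 ;
	left Outer Join eng_xref t2 on t1.job_no = t2.job_no ;
	group by t1.job_no ;
	into Cursor jobsummary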

Bye, Olaf.



Olaf Doschke Software Engineering
 
Ok Mike, Olaf,
As always, thanks so much. After analyzing and asking the boss why he wants this, he said: I just need a table (junkmaster) that contains all the fields from table1 and table2, and that has in it all the records that were in table1 and table2. I told him these tables have info that is linked together by the fields t1.job_no, t1.draw_no, t2.layout, t1.sht and t2.sht, where table1=t1 and table2=t2, for your reference. If I do any kind of join, even what I did, even what you are suggesting, I will leave out a bunch of records that will never match in table1, as table2 does not have any ID, as Olaf mentioned. So this was a bad design from the beginning; in fact, these tables are FoxPro DOS tables. So I was lazy and was doing the SELECT below:
Code:
Select * from eng_jobs t1;
	left Outer Join eng_xref t2;
	on t1.job_no = t2.job_no;
	where t1.job_no=t2.job_no and t1.draw_no=t2.layout,AND t1.sht=t2.sht;
	into Cursor junkmaster readwrite
So this is not going to solve my problem, because they want all records from eng_jobs and eng_xref in the new table "junkmaster".
So I have no other choice but to manually list each field in the SELECT, create an empty cursor, and then at the end use APPEND FROM for each table. Of course, I need to eliminate the field names from eng_xref that are also found in eng_jobs; otherwise I will get field_a, field_b and so on for the common fields. Now I use APPEND FROM, and later I ORDER BY so the data is more organized.

The whole thing is they want to have all this data in a CSV file or Excel file, as they want to provide this data to a software vendor, to see if they can use this data in the new application. That is the whole reason. For me it is not feasible this way; I would rather have less data and just make sure I have the linked data from both tables in the master table. In my opinion it makes sense to have at least a record with the correct data from both tables in line. But to have it the way they want, I did this, as below:
Code:
Select  t1.JOB_NO, t1.REL_CODE, t1.DRAW_NO, t1.BALL_NO, t1.PART_NO, t1.REVISION, t1.DRAW_SIZE, t1.PART_DESC,;
	t1.PART_DESC2, t1.PART_DESC3, t1.PART_DESC4, t1.PART_TYPE, t1.VEN_PARTNO, t1.MFG, t1.QTY, t1.BM_QTY, t1.MATERIAL,;
	t1.SIZE_A, t1.SIZE_B, t1.SIZE_C, t1.SIZE_D, t1.SIZE_E, t1.SIZE_F, t1.SIZE_G, t1.FINISH, t1.OUT_TREAT, t1.Memo,;
	t1.OPERATOR, t1.LATH_OPER, t1.MILL_OPER, t1.PMILL_OPER, t1.CNC_OPER, t1.GRIND_OPER, t1.SHEET_OPER, ;
	t1.WELD_OPER, t1.RUB_OPER, t1.LATH_SET, t1.MILL_SET, t1.PMILL_SET, t1.CNC_SET, t1.GRIND_SET,;
	t1.SHEET_SET, t1.WELD_SET, t1.RUB_SET, t1.LATH_RUN, t1.MILL_RUN, t1.PMILL_RUN, t1.CNC_RUN, t1.GRIND_RUN, ;
	t1.SHEET_RUN, t1.WELD_RUN, t1.RUB_RUN, t1.STAGE, t1.LATH, t1.MILL, t1.PMILL, t1.CNC, t1.GRIND, t1.Sheet,;
	t1.WELD, t1.RUB, t1.LATHSET_E, t1.MILLSET_E, t1.PMILLSET_E, t1.CNCSET_E, t1.GRINDSET_E, t1.SHEETSET_E, ;
	t1.WELDSET_E, t1.RUBSET_E, t1.LATHRUN_E, t1.MILLRUN_E, t1.PMILLRUN_E, t1.CNCRUN_E, t1.GRINDRUN_E, ;
	t1.SHEETRUN_E, t1.RUBRUN_E, t1.WELDRUN_E, t1.ASM_EST, t1.ASM_RUN, t1.TRACKING, t1.REWORKED, t1.SCH_DATE, ;
	t1.COMMENTS, t1.ASM_ORDER, t1.RELEASED, t1.BEFORE_BC, t1.BM_PRT, t1.STD_PRT, t1.PURCH_PRT, t1.EST_DONE, ;
	t1.ENG, t1.SCH, t1.PRT, t1.REV, t1.NREL_0, t1.SHT, t1.SPARE, t1.MYPK, t1.DSUBMIT, t1.DCLOSED, t1.MARKED_BY,;
	t1.XTICK, t1.IMPORTED, t1.HIGHLITE,  t2.NOTES, t2.Start, t2.NO_LOTS, t2.MACH_TYPE, ;
	t2.FLOOR_PLAN, t2.ELEC_SPEC, t2.Paint, t2.INK, t2.LEXAN, t2.HEAD_QTY, t2.HEAD_TYPE, t2.Reference, t2.REF_JOBS, ;
	t2.PPM, t2.DIE_TYPE, t2.DIE_NUM, t2.PROJ_ENG, t2.CUST_NAME, t2.Close, t2.SHIP, t2.ENG_ELEC, ;
	t2.ENG_MECH, t2.MS_REWORK, t2.GRAPHICS, t2.ASM_PULL, t2.ASM_REG, t2.ASM_ELEC, t2.ASM_DEBUG,;
	t2.ASM_REWORK, t2.ASM_ACCEPT, t2.ASM_CRATE, t2.ASSY_REL, t2.SKIRTING, t2.LAYOUT, ;
	t2.Descrip, t2.DWG, t2.ASM_TOOL,  t2.COMPLAI, t2.REV_D, t2.DATE_REL,  ;
	t2.NOTAPPLIC ;
	from eng_jobs t1 ;
	Left Outer Join eng_xref t2 On t2.JOB_NO = t1.JOB_NO;
	WHERE .F.;
	INTO Cursor junk Readwrite  
Select junk
Append From Dbf('eng_jobs')
Append From Dbf('eng_xref')
Select * from junk order by job_no, draw_no, ball_no, layout into Cursor junkresult
Browse Normal && just to see the data
Copy To mycsv Type csv
If you have something that can improve it, let me know. This runs really fast, and junkresult now has all the data from each table, ordered. I will probably remove some fields that I think won't be needed, but this is what solved my problem, and it's the way they want it, even though it's against my opinion.

Thanks a lot to all of you for trying your best; I am always open to new suggestions, as my plan is to continue learning. My idea is to go from VFP to .NET, but I don't find an easy way to learn it, nor how to convert from VFP 9.0 to .NET, as I will need to learn either C#, VB.NET, C++ or F#. If I am not good in VFP, how am I going to be able to go from VFP to .NET? I would be very happy to at least do forms and grids and compile to EXE files in .NET, using at least C# or C++.
 
>they want to provide this data to a software vendor, to see if they can utilize this data in the new application

Well, as I am a software vendor myself, I can tell you that the simplest thing you should do is provide two CSV files.

And, mainly, explain what the tables and fields mean.

It's not helpful for you to pre-join the data; that can be done once it is read into SQL Server, for example, and without the 2 GB restriction.

More generally, yes, CSV is a good data exchange format. That doesn't mean you put all the data into one CSV file :)
Let me assume your boss knows even less than you about databases and data normalization. Well, then leave that to the experts, your contracted software vendor; it's their expertise. As your current app handles this data in two files, that's what they should get and make the best of in the future system. That's not your job or your boss's job and decision to make.

Just one thing to note: when you apply a WHERE clause on t1 and t2 fields to an outer join, you effectively make it an inner join, as no field of t1 or t2 compares equal to NULL. So any record with no match is not in the final table, and you have less data in the one Excel sheet or CSV than you have in the single tables.
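If you really wanted to keep the unmatched rows despite such filters, the WHERE clause would have to let the NULLs through explicitly; a sketch:

Code:
* Sketch: keep unmatched eng_jobs rows by letting NULLs through
Select * from eng_jobs t1 ;
	left Outer Join eng_xref t2 on t1.job_no = t2.job_no ;
	where (t1.draw_no = t2.layout and t1.sht = t2.sht) ;
	or IsNull(t2.job_no) ;
	into Cursor junkmaster readwrite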

Bye, Olaf.


Olaf Doschke Software Engineering
 
Ernesto, I'm not sure that you have solved the problem. Your main output file is now a CSV. That's fine. But that's not a solution to the problem of your original DBF being larger than 2 GB. The point is that if the DBF was more than 2 GB, the CSV might be as well. And it's not just DBFs that are subject to the 2 GB limit. Any file processed by VFP has the same limitation, including text files.

In general, a CSV will usually be smaller than a DBF containing the same data. That's because your character fields won't be padded with extra spaces to make up the width of the fields. But as your data grows, you might well reach a point where that no longer applies. It comes back once more to the issue of design.

If your new code is working, that's fine. But keep in mind that the problem might arise again in the future.

Mike



__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
It's no solution, as it skips data just like an inner join does.

I repeat my recommendation in short: Keep it in two single DBFs exported to two CSV files.

If the vendor can't work with DBFs (to me that translates to not even being willing to use the VFP OLE DB Provider to query data), then that's the simplest and easiest way to conserve all the data: load it into any database server and go from there about restructuring it.

Creating CSV has its problems with any FPT-related fields (memo, blob, ...), so you might use my faq184-7770
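For completeness, the plain export would just be this sketch (remember that COPY TO ... TYPE CSV does not export memo fields, hence the FAQ above):

Code:
* Sketch: export each table to its own CSV file
* Note: memo fields are not exported by COPY TO ... TYPE CSV
Use eng_jobs
Copy To eng_jobs_export Type csv
Use eng_xref
Copy To eng_xref_export Type csv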

Bye, Olaf.

Olaf Doschke Software Engineering
 
Ok Mike and Olaf,
I don't know much, but the boss knows less, even though he pretends to know more than me in front of everyone. I don't want to talk about the boss, but he told me we just need one master file. I told him we should have two tables, or I can create two CSV files or two Excel files and let the vendor deal with that. Guess what? That vendor is imaginary as far as I know; I don't even know who it is or what kind of software it will be. The only thing I know is he mentioned the back end is MSSQL. I told him the same thing: you just need two files. But I do what my superior asks; if he does not want to listen, fine with me. Later, most likely, I will have to do it again.

With respect to the "join", well, as you can see, this is an empty cursor, just with a structure that has all the fields from eng_jobs and eng_xref without the duplicated or common fields from each table. So the data went in as it was in eng_jobs and also as it was in eng_xref.

Now, whether I understand the point of the "JOIN" even for an empty cursor is a different matter, but I understand that if I use the JOIN in the SELECT SQL to send data from both tables to the resulting "junkmaster", I know I won't have all the records combined, and a lot of them will be missing.

Note:
Sometimes my comprehension doesn't help me much, as my English is very poor. I got to the USA when I was about 35 years old, so sometimes it is hard for me to interpret some things correctly; sorry for that. I know you guys are trying your best, as I see you explaining things in different ways. I am not smart either; I simply never give up.
Again, thanks a lot, and if there is any other way to make me understand, you are welcome to try it; I really appreciate your suggestions and knowledge.
 
Two appends won't join data, but union it. There's no data loss this way, but it's worse than two CSV files. In the upper portion of your result file you have data from eng_jobs and empty fields that eng_xref records could fill in - but won't. That still costs disk space. In the lower portion, you have eng_xref data and also empty fields where eng_jobs data would fit. Overall this costs more disk space than the two separate files. The second append doesn't start at row one and detect that the empty fields there could be used; instead you end up with RECCOUNT("eng_jobs")+RECCOUNT("eng_xref") records.

Your earlier join on the same job_no can cause even more disk space usage and an overflow exceeding 2 GB, most likely because both tables have several rows for the same job_no. A join will then produce every combination: e.g. if you have 2 rows in eng_jobs with the same job_no and 4 rows in eng_xref, you might expect to get 4 records overall, but each record of eng_jobs is combined with all 4 records of eng_xref, so you get 8 records overall.
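You can check that for yourself; a sketch of such a diagnostic query:

Code:
* Sketch: find job_no values that occur more than once in eng_xref
Select job_no, Count(*) as nrows ;
	from eng_xref ;
	group by job_no ;
	having Count(*) > 1 ;
	into Cursor dupcheck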

And these are just two of several cases that might be the reason for what you get. Without the data and the meaning of the tables and fields, it will be hard to explain why you get what you get. You still haven't told us anything about that.

What's okay with your latest code is that, as long as the data now fits, you at least have all the data in a single file. You don't put data belonging together into the same row, but that can also be done later in MSSQL. It would still likely cause misconceptions, as the vendor (or the future vendor you may or may not hire) will see this data in one table and have no clue it was split before. So an essential piece of information is lost: which data comes from where. If it's a lot of data you might notice the difference between the top and bottom rows, but in MSSQL you typically only browse the top rows of a table, so that stays hidden.

If you only get this approved that way, you should do yourself and the future software vendor helping you with the conversion a favor and note down the number of records from eng_jobs and eng_xref. Otherwise it's easily overlooked that this isn't joined data.

It will perhaps become more visible through your final ordering of the data, but that also makes it harder to eventually split it again or correctly join the data.

Bye, Olaf.

Olaf Doschke Software Engineering
 
Olaf, very well explained, and understood as well.

Yes, I have the concept very clear: two appends won't join the data; the only thing it does is put the data in the same table without any linking to each other. That is the reason for that code. Yes, with the code I have, I filled the resulting cursor with data from both tables, and the fields were filled accordingly. Of course, for fields that belong to eng_jobs, data from eng_xref won't get filled in, and so on, but the corresponding fields will be filled from each table; there are no joins at all.

Yes, I can insist on having both tables in separate CSVs, but of course, as you said, it will be easily overlooked that this isn't joined data, for the time being.
Thanks
Ernesto
 
I guess you understood the inverse of my concern. Data put together that way into one CSV will be imported into one MSSQL table, and that way it does NOT become obvious that this isn't really joined. When you keep two separate CSV files and import them into two separate MSSQL tables, it will be obvious that this can either stay that way, or be split otherwise, or combined. A data analysis process looking for same field names will likely identify that there is a relationship. So it would be better to insist on two CSV files. You will only get this approved if your boss looks at it shallowly and sees you have fulfilled his requirement to combine the data. But it's stupid.

Bye, Olaf.

Olaf Doschke Software Engineering
 