Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations SkipVought on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

How to find the last input by product in a table which contains different products 4

Status
Not open for further replies.

german12

Programmer
Nov 12, 2001
563
0
16
DE
The table looks like this

Prdukt_ealipu.jpg


The columns from left to right are
product, price, date of input, vendor
The list is sorted by product and their date of input


I would like to find in this sample
only the products with their last input (the youngest input by product)

In this sample it would be
record no 3 (Hefe)
record no 7 (Heringssalat)
record no 9 (Honig)

and only that must to be shown in a new view.

Thanks for the correct select/scan command.

Klaus




Peace worldwide - it starts here...
 
If you are able to use SQL in FoxPro
Code:
create or replace table MY_TABLE (
  NAME char(25),
  PRICE decimal(5, 2),
  PDATE date
)
;

insert into MY_TABLE 
values
    ('HEFE 42 G', 0.09, '2022-03-31'),
    ('HEFE 42 G', 0.13, '2022-09-03'),
    ('HEFE 42 G', 0.15, '2023-07-26'),
    ('HERINGSALAT ROT MSC', 1.09, '2022-03-29'),
    ('HERINGSALAT ROT MSC', 1.09, '2022-05-02'),
    ('HERINGSALAT ROT MSC', 1.19, '2022-05-31'),
    ('HERINGSALAT ROT MSC', 1.29, '2023-09-02'),    
    ('HONIG FEINCREMIG GLAS', 2.29, '2022-03-29'),
    ('HONIG FEINCREMIG GLAS', 2.99, '2022-08-25')
;

ksnip_20230910-003740_azmt8s.png


then a select like this would work

Code:
with LAST_ARTICLE_DATE(NAME, PDATE) as (
  select NAME, max(PDATE)
  from MY_TABLE
  group by NAME
)
select
  t1.NAME,
  t1.PRICE,
  t1.PDATE
from
  MY_TABLE t1
  inner join 
  LAST_ARTICLE_DATE t2
  on t1.NAME = t2.NAME and t1.PDATE = t2.PDATE
;
ksnip_20230910-004049_qrjtdr.png
 
Translated to capabilities of VFP SQL:

Code:
Select * From Product_Prices;
Inner join (select NAME, max(PDATE) as max_pdate;
  from Product_Prices;
  group by NAME) as Latest_Prices;
On Product_Prices.Name = latest_prices.Name AND Product_Prices.Pdate = Latest_Prices.max_pdate

To avoid some problems and have a better performance, it is advisable to not store the data you have in this way. There should be a products table that has an id per productname, then a vendors table with id and vendor name. When you group by an id, no matter if it is a 4 byte integer or a longer guid, that's faster done than grouping by a long char field like the name is. besides, it's in general always advisable to store your data in a normalized database design.

our product_prices table then has a productid instead of productname and a vendorid instead of vendorname. For (humanreadable) display you always can join in the names and show them instead of ids, but ids are better performing and also don't have problems with comparisons in different collations and codepages chracter fields could have. It simple takes less times to group by data in less bytes, to make the explanation short.

Then this becomes your query:
Code:
Select ;
Products.name as Product_Name,;
Product_Prices.Price,;
Product_Prices.Price_Date,;
Vendors.Name as Vendor_Name;
From Product_Prices;
Inner join (select product_id, vendor_id, max(price_date) as Max_price_date;
  from Product_Prices;
  group by product_id, vendor_id) as Latest_Prices;
On Product_Prices.product_id = latest_prices.product_id ;
AND Product_Prices.vendor_id = latest_prices.vendor_id ; 
AND Product_Prices.price_date = Latest_Prices.Max_price_date;
Left Join Products on Product_Prices.Product_id = Products.Id;
Left Join Vendors On Product_Prices.Vendor_id = Vendors.Id

Or, if you're only interested in the latest price and remove all records of other vendors with an older price, then don't group by vendors, only group by products:
Code:
Select ;
Products.name as Product_Name,;
Product_Prices.Price,;
Product_Prices.Price_Date,;
Vendors.Name as Vendor_Name;
From Product_Prices;
Inner join (select product_id, max(price_date) as Max_price_date;
  from Product_Prices;
  group by product_id) as Latest_Prices;
On Product_Prices.product_id = latest_prices.product_id ;
AND Product_Prices.price_date = Latest_Prices.Max_price_date;
Left Join Products on Product_Prices.Product_id = Products.Id;
Left Join Vendors On Product_Prices.Vendor_id = Vendors.Id

Another idea would be to store the latest price you know of a product per vendor in a table current_vendor_prices. And store historical prices separately in an archive table product_price_history. Both tables have the same structure product_id, vendor_id, price and price_date. And whenever you have a newer price for a product and vendor you take the record you have in current_vendor_prices and add it to product_price_history, then update the price and date in the current_vendor_prices table and use that table instead of a query.

So just using a current_vendor_prices will show you the result as per the first query without actually needing to query, you just eed to join in the product and vendornames, but not need to group by anything, as you only have latest prices records anyway. And if you then only want the latest prices independent from the vendor, you can still use a group by query again:

Code:
Select ;
Products.name as Product_Name,;
Product_Prices.Price,;
Product_Prices.Price_Date,;
Vendors.Name as Vendor_Name;
From Current_Product_Prices;
Inner join (Select Product_id, MAX(price_date) as Max_price_date;
  From Current_Product_Prices;
  Group By Product_Id) as Latest_Prices;
On Product_Prices.Product_id = Latest_Prices.Product_id ;
And Product_Prices.Price_date = Latest_Prices.Max_price_date;
Left Join Products on Product_Prices.Product_id = Products.Id;
Left Join Vendors On Product_Prices.Vendor_id = Vendors.Id

And again, if you're always only interested in this list independent from vendors, you can only store those records in Current_Product_Prices and whenever you have a new price input, no matter from which vendor, you store the current record in the history table and update, price, price date and vendor_id in the current price record.

It's a matter of preferences and taste how exactly you structure your data, but at a minimum keep separate tables for the separate lists, like vendors and products, and in tables combining data use IDs as references to these basic data lists. Without deep-diving into database design theory you can easily think of these separate lists that make sense and which become providers of names of vendors, products, etc. for human readable display on the one side and simpler, faster, not error-prone processing by integer IDs or guids on the other side.

Chriss
 
On the other hand, when the table "is sorted by product and their date of input" as stated above, then it could be done without SQL too.
Simply read the table in a loop line by line and if the name change, then print the previous line.
 
Klaus,

if you like the idea of a current price list, no matter if it includes only one record per product or per combination of product and vendor, then one more interesting topic will be an update trigger you can do in a DBC. That will run before the actual update is done and so in that moment, you can store the current record in the history table and then simply return .t. to allow the update.

Here's a little demonstration of that using a copy of the Northwind database:
Code:
* before runing this, create a new directory and change into it with CD 
Copy File (_samples+"Northwind\*.*") To .\*.*

* open the database copy exclusive
Open Database .\northwind.dbc exclusive

Use products
Copy To products_history.dbf DATABASE northwind for .f. && same structure, but empty
Alter Table products_history alter column productid int Add Column Until T
Close Tables all

Text To lcTriggercode noshow    
    Procedure Products_UpdateDelete()
       
       Local CurrentID
       CurrentID = productid
       
       Insert Into products_history Select *, Datetime() as Until from products where productid = m.CurrentID
       Return .T.
    EndProc
EndText

StrToFile(lcTriggercode,GetEnv("TEMP")+'\triggers.txt',.f.)

Append Procedures from (GetEnv("TEMP")+'\triggers.txt')
Compile Database northwind

Delete Trigger On Products for Update
Delete Trigger On Products for Delete
Create Trigger on Products for Update as Products_UpdateDelete()
Create Trigger on Products for Delete as Products_UpdateDelete()

Close Databases All
Open Database .\northwind.dbc Shared
Set Multilocks on
CursorSetProp("Buffering",5,0)
Update products set unitsinstock=unitsinstock-3 where productid=26

Select products_history
Browse fields productid, productname, unitsinstock, until

This will a) copy the northwind database from the VFP samples subdirectory into the current directory (therefore first create a new directory and change into it), then adds a table product_history based on products with just an additional field "until" that will store the datetime() until the record was valid when a products record is updated or deleted.

The interesting code mainly is the update trigger, you can also modify a database and go into the proecedures manually and then code in there, no need to understand the append procedures command.

Any change you make to any product record now will trigger the routine in the dbc and that will copy the current record into history with the current datetime put into the "until" field. That denotes until which datetime that record was this way. For comparison you can verify that the products record in the original northwind database has the value that was stored into products_history.

In this case you could use it to get a history of prices, units unitsinstock, but also any other fields of the products table. And the trigger code is quite short. Nice, isn't it?

It's important to notice one detail: When the trigger starts you don't get hold of the old value, so at best you have to turn on buffering, which my demo program does. A trigger always starts with the workarea that caused it, that means it has the products.dbf open in it and it is on the record that is affected by the insert/update or delete - whichever trigger type is used. With or without buffering, you access the already changed record in the current workarea, but the dbf still has the old record, because your code now mainly decides whether that change will be allowed or not by your return value. To get at the dbf fields values the select from products does that, as the dbf still has the old values, even though a scatter would be simpler and suffice but it wouldn't read the old values and we need the old values for the history.

Yet another reason to always work with buffering, but that's not the point. You see that it's actually only a few lines of code in triggers which maintains the history of data and allows you to concentrate on the latest current data only in your main tables. Any "historical" values can be queried from the history table. That's the main point, avoid putting data you usually want to filter out, just store what you actually use mainly and put anything else separately. That's how I would handle any kind of data instead of mangling old and new data in one table and then always needing filtering. The "historic" data is rarely of interest, maybe to query price inflation but usually, you want to only know the latest price you have.

There are more advanced ways, to only store the actually changed fields, for example, but this way, having a table copy with just the added "until" field is enough for simple purposes.

Chriss
 
Well, and to mention the best benefit of all of this:

Once these triggers are set up you only maintain the latest prices in your main table and get the rest done automatically.

It just works, no matter if you use UPDATEs, REPLACEs or even open up a BROWSE window of the dbf and change a price and vendor there, the trigger automatically will move the current record into history.

If you use table buffering it needs a TABLEUPDATE() to trigger the update trigger, but that's no extra effort, that just is needed anyway when working with buffering.

And, well, of course, I assume so far you did always only add records by append blank or insert. Well, now if you have a product and vendor already in your list, you have to update the price, not add a new record, so at best you make the combination of productid and vendorid a candiadate index that disallows two records of the same combination and get used to maintaining prices by updating existing records unless there is a new combination of product and vendor for which you of course still need to insert a first price. Anyway, as the old data seems to be valuable to you anyway, the history stores that, but there is no work for having the latest prices list, that's simply in the main dbf.

Chriss
 
Mikrom and Chriss,
Thank you very much for your valuable contributions, each of which deserves and has received a star from my side.
It's not just the program code that helps me a lot. The recommendations for better file structure, storage of historical data and the use of triggers are also interesting new things that I didn't know before. I'll go through all of this in detail again after our vacation.
Thanks again.

Klaus

Peace worldwide - it starts here...
 
Hallo Klaus,

das würde auch gehen:

SELECT NAME, PRICE, MAX( PDATE ) AS MAX_PDATE FROM myTable GROUP BY NAME INTO CURSOR myCursor

Grüße
Thomas
 
Sorry, Thomas, but think about what PRICE you get with that.

VFP9 would not execute this and throw the error "GROUP BY clause is invalid.", you can SET SQLBEHAVIOR 70 to overcome this, but look into an example:
Code:
Create Cursor myCursor (name c(20), price y, pdate d)
Insert into myCursor values ('Milk',$0.24,Date(1905,6,1))
Insert into myCursor values ('Milk',$2.24,Date(2023,9,20))
Insert into myCursor values ('Milk',$0.69,Date(1973,9,20))
SET ENGINEBEHAVIOR 70
SELECT NAME, PRICE, MAX( PDATE ) AS MAX_PDATE FROM myCursor GROUP BY NAME INTO CURSOR result
BROWSE
I see a record with the right max date, from 2023, but the price of $0.69 from the record of 1973.

That's not what Klaus needs. To get the full record of a product grouped by names and with max pdate, you first need to determine that list of only name and max_pdate and then join back all other fields of the table, otherwise you don't always get the data from the same record that has the max(pdate).

It also throws an error with VFPs engine behaviors 80 or 90, only VFP7 or older versions allow this group by and that's not a feature, it's a bug that was resolved with VFP8.

Chriss
 
Hello Chris,

you are absolutely right, this only works with SET ENGINEBEHAVIOR 70.

Regards
Thomas
 
Thomas, even if that query runs with SET ENGINEBEHAVIOR 7, the results you get are likely to be wrong. The Price field is chosen randomly from the group (though, in VFP, "randomly" here means from the last member) and may not come from the record in the group withe maximum date. That's why the behavior was changed.

Tamar
 
Tamar said:
"randomly" here means from the last member

That's where Thomas has one point. That is undocumented, but known, so when you arrange data in chronological order, which it normally has, and look for the price at the max date, then you get the right record. The point is that this is a requirement important to know and to comply with, to get correct results. And then I'm not sure what exactly happens with the latest price per vendor and per other group defining values in random order in the table. So I'd rather not depend on order of data for the right query result.

But if I'd make the right order the main point then I'd use an index for that, and I would maybe go back to the other idea of mikrom:
mikrom said:
When the table "is sorted by product and their date of input" as stated above, then...read the table in a loop line by line and if the name change, then print the previous line.
I don't know, but I doubt using the way VFP7 allows wrong group by is faster than doing it that way with a scan, so I don't see fit for using the enginebehavior 70 to get something done unconventionally.

I'd recommend maintaining the current prices and having an archive of the price history by using triggers, anyway. Even if it has more effort in programming, it gives you what you need with neither SQL nor scans to produce that result, just triggers that maintain data in that form at all times. (Yes, and they use code whenever data changes, but then you have what you need with no more effort).

Chriss
 
Back from vacation (Baltic Sea) I would like to thank everyone who helped me
I tested the SQL statements from mikrom and Chriss and was very happy with the result.
There were no problems with VFP 9.
I was also convinced by the information about better normalization of data.
I will definitely pay more attention to this in the future.

Thanks - I'm glad to know this forum.


Klaus

Peace worldwide - it starts here...
 
Hello Tamar
Hello Chris,

thank you for your comments and again something learned.

Thomas
 
With a properly sorted table there is no need to use SQL for a simple report like you need.
There is a technique called "Control Break Processing" or in German known as "Gruppenwechsel" or "Normierte Programmierung". With it you can create more sophisticated reports with totals subtotals
In the past i posted some examples in several forums here on Tek-Tips:
 
For example if we have this CSV file with sorted data

sorted_table.txt
Code:
HEFE 42 G;0.09;2022-03-31
HEFE 42 G;0.13;2022-09-03
HEFE 42 G;0.15;2023-07-26
HERINGSALAT ROT MSC;1.09;2022-03-29
HERINGSALAT ROT MSC;1.09;2022-05-02
HERINGSALAT ROT MSC;1.19;2022-05-31
HERINGSALAT ROT MSC;1.29;2023-09-02   
HONIG FEINCREMIG GLAS;2.29;2022-03-29
HONIG FEINCREMIG GLAS;2.99;2022-08-25

then this simple loop will do the job
sorted_table_report.py
Code:
# to get the last input of the product,
# the table need to be sorted by name and date

table = open("sorted_table.txt")

line_nr = 0
for line in table:
  line_nr = line_nr + 1
  # get fields
  name, price, date = line[:-1].split(";")

  # if not on first line and the name changed,
  # then print the previous line
  if line_nr > 1 and name != name_saved:
    print("%s; %s; %s" % (name_saved, price_saved, date_saved))

  # save fields
  name_saved = name
  price_saved = price
  date_saved = date

# at end print last record, if the file was not empty
if line_nr > 0:
  print("%s; %s; %s" % (name_saved, price_saved, date_saved))
#
print("Done. %d lines processed." % line_nr)

Instead of pseudocode I used rather python and this is the output of the script
Code:
$ python3 sorted_table_report.py
HEFE 42 G; 0.15; 2023-07-26
HERINGSALAT ROT MSC; 1.29; 2023-09-02
HONIG FEINCREMIG GLAS; 2.99; 2022-08-25
Done. 9 lines processed.



 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top