
Metering Data Volume

stanlyn
Hi,

Any ideas on the best approach for collecting data throughput and/or metering it as a user uses the system? The system is a document management system where the data and images are stored in SQL remotely. Our new billing model will be based on a bucket system, where the user purchases a bucket of X size to be consumed within a defined period. Therefore, I need to journal outgoing query sizes and their returned result sizes, as well as the image sizes that are actually returned.

Also, I'm wondering if there is some sort of control/software that I can place in the app so that all data flowing in and out of the app passes through it, and that I can, of course, communicate with.

This topic is more about discussing the pros and cons of how to do this, so please, if you see an issue, let's talk about it...

Thanks,
Stanley
 
Also, how do I get the name of the underlying table (like _4JW0IPKO6.TMP) that is created by a command like this...
SELECT * FROM users INTO CURSOR 'curUsers', as I could then get the size of the .tmp file.

Or, is there an easy way to get the size of the in-memory cursor?

Thanks,
Stanley
 
Code:
SELECT * FROM pers INTO CURSOR tmp
? DBF("tmp")
will give you the name of the actual tmp file.

Mike Gagnon

If you want to get the best response to a question, please check out FAQ184-2483 first.
 
I'd say if you use a cloud provider, you get traffic information in a log file, so all you'd need to find out is where that log is. Drive/storage usage should also be no problem.

In regard to cursor size: RECSIZE()*RECCOUNT() gives you the size of all constant-width fields; if you have memo/general/blob fields, you need to iterate them.
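As a minimal sketch of that iteration (the helper name is mine; it assumes VFP9, where LEN() also returns byte counts for blob values):

Code:
* Rough sketch: approximate the data bytes held by an open cursor.
FUNCTION CursorDataSize(tcAlias)
	LOCAL lnSize, lnField, lnFieldCount, lnOldRec
	LOCAL ARRAY laFields[1]
	SELECT (tcAlias)
	lnOldRec = RECNO()
	lnSize = RECSIZE() * RECCOUNT()      && fixed-width portion of all records
	lnFieldCount = AFIELDS(laFields)     && column 1 = name, column 2 = type
	SCAN
		FOR lnField = 1 TO lnFieldCount
			IF INLIST(laFields[lnField, 2], "M", "W")   && memo or blob
				lnSize = lnSize + LEN(EVALUATE(laFields[lnField, 1]))
			ENDIF
		ENDFOR
	ENDSCAN
	IF BETWEEN(lnOldRec, 1, RECCOUNT())
		GO lnOldRec                      && restore the record pointer
	ENDIF
	RETURN lnSize
ENDFUNC

Then ? CursorDataSize("curUsers") would give an approximation of the fetched data volume (it ignores header and FPT block overhead).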

The file name of a cursor is simply DBF(alias), so in this case DBF('curUsers'). You'll notice a file with a very similar name, one off in the last letter. This is related, but not the cursor itself.

Look into %TEMP% with the order set to descending date/time before you start VFP. Notice that when you start VFP you get two TMP files; that's a DBC, which stores all DBC features of cursors, like default values or long field names. Cursors are tables of this DBC, but the cursors themselves stay in RAM. The next difficulty: even if those files did measure cursor size, VFP holds them open exclusively internally, so you get no second handle to act on them.
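If you just want to watch what shows up there, here's a quick sketch (ADIR() columns 1 and 2 are file name and size):

Code:
* List the TMP files currently visible in %TEMP%:
LOCAL lnCount, lnI
lnCount = ADIR(laTmpFiles, ADDBS(GETENV("TEMP")) + "*.TMP")
FOR lnI = 1 TO lnCount
	? laTmpFiles[lnI, 1], laTmpFiles[lnI, 2]  && name, size in bytes
ENDFOR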

If you configure a DSN, you can specify a log file for long-running queries, but also a log file for performance monitoring data, which includes columns for bytes sent and read. This log file is a text file and can, of course, be manipulated by your customer, but it would have the information you need.

Bye, Olaf.

Olaf Doschke Software Engineering
 
Hi mgagnon,

Yes, that is what I thought and tried, but it is not working for me... It is showing the name of the source table and NOT the cursor...

SELECT * FROM users INTO CURSOR curTest
?DBF('curTest') reports "d:\folder1\data\users.dbf"

Thanks,
Stanley

VFP9sp2
 
[screenshot]
 
That's happening when the cursor is just a filter to the original DBF.

Instead do:
Code:
SELECT * FROM users INTO CURSOR crsTest NOFILTER
? DBF('crsTest')

Besides, FSIZE() returns field size under VFP's default settings; you would need SET COMPATIBLE ON (an unfortunate setting you should keep off) to make it return a file size. And it won't find the TMP file anyway, as it doesn't get written to disk.
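To illustrate (a small sketch using the users table from the earlier examples):

Code:
SELECT * FROM users INTO CURSOR crsTest NOFILTER
* With the default SET COMPATIBLE OFF, FSIZE() takes a FIELD name
* and returns that field's width in bytes:
? FSIZE(FIELD(1), "crsTest")
* Only with SET COMPATIBLE ON does FSIZE(cFileName) return a file
* size -- and even then it fails for the cursor's TMP file, which
* never gets written to disk.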

This filter behavior will never happen to a cursor you fetch from a remote server via SQLExec(), a remote view, or a CursorAdapter.

Bye, Olaf.

Olaf Doschke Software Engineering
 
Stanlyn,

Running
Code:
SELECT * FROM naw INTO CURSOR tmp
? DBF("tmp")

I get:
C:\users\koenp\appdata\local\00000QLFT001N.TMP


Apart from that, why do you need the internal name? You can always address a cursor by its given alias, in this case "tmp". (Don't get confused: in INTO CURSOR tmp, tmp is the alias.)

So maybe the example
Code:
SELECT * FROM naw INTO CURSOR myCursor
? DBF("myCursor")
is less confusing, since here the alias is myCursor and .TMP is the extension of the cursor's temp file.

Regards,

koen
 
Koen, the idea is to get the file size, but try [tt]FILE(DBF("myCursor"))[/tt]; this will be .F., so Stanlyn doesn't have that easy way to determine cursor byte size. And RECSIZE()*RECCOUNT() will not reflect the size of memo fields (and other types stored in the secondary FPT file, which you also won't find on the HDD).

You could always do
Code:
SELECT alias
COPY TO mydata TYPE DBF WITH CDX

And then determine the file size. But you're doing what VFP avoids just to get the file size: writing the cursor data to the HDD.

Bye, Olaf.

Olaf Doschke Software Engineering
 
Hi,

Olaf, adding NOFILTER points to a non-existent file. The TMP file it points to is NOT on the disk (see screenshot). Explorer is sorted by name. You can also see the command window commands I've been running, along with the results that ? prints. Note that FILE() also reports the .tmp file is not found.

Stanley

[screenshot]


A better image can be found here, as engineering.com is reducing the resolution too much...
 
If that cannot be trusted, then copying it to a DBF looks like one way. Is there any other way, akin to what FLUSH does with updated records still not written to disk?
 
The strange behavior of copying curTest using "COPY TO mydata FOXPLUS" destroys its source cursor "curTest". Why is that, since it is only copying and not moving?

If it has to close the cursor before it can be copied, then isn't the handle gone, therefore stopping the copy? It appears to just rename it without anything going to disk, as I see no references to either curTest, the *.tmp from curTest as reported by DBF(), or the "COPY TO" table or cursor.

I was hoping to avoid copying the structure and populating it by iterating, which may be the only way.
 
I might have missed something here (I haven't been following this thread in detail). But instead of SELECTing into a cursor, why don't you SELECT into a physical table? In other words, [tt]INTO TABLE xxx[/tt] rather than [tt]INTO CURSOR xxx[/tt]. There would be no effect on performance, and you can be sure that the physical table will remain in existence until you explicitly delete it.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
Thanks Mike, that should work, will try tomorrow...
 
I don't know why you used COPY TO ... FOXPLUS when I suggested TYPE DBF.

Let me test. OK, I see TYPE DBF doesn't work; there is no such type, and the reason is that it's the default: when you just do COPY TO mydata, you get a mydata.DBF and FPT (when necessary). The same works with TYPE FOXPLUS. Of the resulting DBF/DBT files, the DBT is larger than VFP's FPT in a case I tested, but creating the files doesn't change any work area. In both cases, the cursor remains open. I don't know what you're doing, stanlyn. Closing the form and data session? Do you have a USE afterward?

And I only second Mike in the aspect of getting a real DBF right away. But again, a cursor has a performance advantage, as it is in memory, and since you say you get data from a remote database, there is no option of SQLEXEC() generating a DBF file instead of a cursor.

If you go with what SQLEXEC(), a CursorAdapter, or a remote view gives you as a cursor, one other very general way would be determining SYS(1016) before and after the query.
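A minimal sketch of that idea (gnHandle stands for an already-open connection handle, and the query is just an example; SYS() functions return character strings, hence the VAL()):

Code:
LOCAL lnBefore, lnAfter
lnBefore = VAL(SYS(1016))  && user object memory in use before the query
SQLEXEC(gnHandle, "SELECT * FROM users", "crsUsers")
lnAfter = VAL(SYS(1016))
? lnAfter - lnBefore       && rough measure of the fetched cursor's memory footprint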

Bye, Olaf.

Olaf Doschke Software Engineering
 
Code:
SELECT alias
COPY TO mydata TYPE DBF WITH CDX

Tried that; it does not work, as there is no TYPE DBF. However, FOXPLUS did create the cursor, and when copying it using the COPY TO command, I get a zero-size set of files. You have to close the table before it gets written to disk. Still, FSIZE() would not work either, so I did what I've done in the past and it works, but with maybe a lot more overhead than I wanted. Here it is...

Code:
Clear
Set Safety Off

Set Procedure To d:\stanlyn\Server\programs\masterprocs.prg
Set Library To d:\stanlyn\Server\misc\Phdbase.v50

If !Used('documents')
	Use 'documents' In 0
Endif

Select * From documents Into Table 'temp' Where Recno() < 100
Use In Select('temp')  && close the table so its files are flushed to disk

* Initialize, so the final sum works even if a file is missing (e.g. no FPT)
lnDbfSize = 0
lnFptSize = 0

For x = 1 To 2
	If x = 1
		ext = 'dbf'
	Else
		ext = 'fpt'
	Endif

	lcFile = 'temp.' + ext

	aa = Adir(MyArray, lcFile)

	If aa > 0
		If x = 1
			lnDbfSize = MyArray(1,2)
			? lnDbfSize
		Else
			lnFptSize = MyArray(1,2)
			? lnFptSize
		Endif
	Endif

	Erase (lcFile)  && parentheses make it a name expression; ERASE lcFile would look for a file literally named "lcFile"
Next

? lnDbfSize + lnFptSize

Set Safety On

Does anyone know a way to do this completely in memory? SSDs do help, but an in-memory solution would be much faster, unless we use the NVMe U.2 drives, but that's a different story...

Thanks, Stanley
 
Back in the "old days" we used what we called a RAM Disk, a virtual drive in memory. Google RAMDISK and you will find many solutions.
 
stanlyn said:
using the copy to command, I get a zero size set of files

I didn't; I got the files with size, but I made the cursor with a NOFILTER clause, which is important to get the data separate and not just a filter.

Please simply do this; it queries all of browser.dbf, a table in Home(), and then copies that to the HDD:

Code:
LOCAL lcPathAndFileStem
lcPathAndFileStem = ADDBS(GETENV("TEMP")) + SYS(2015)
Select * From browser Into Cursor crsTemp Nofilter
* SELECT crsTemp && not necessary, as the newly generated cursor automatically is selected
Copy To (lcPathAndFileStem)
* ADIR() as a statement needs an assignment (or a leading =)
lnFileCount = Adir(laDBFSize, lcPathAndFileStem + ".*")
? laDBFSize[1,1], laDBFSize[1,2]
? laDBFSize[2,1], laDBFSize[2,2]
? laDBFSize[1,2] + laDBFSize[2,2]

A RAM disc would be the least overhead you can have: keep the time short and keep things in memory. It's unlikely your cursors get to the 2GB limit, but you need to reserve enough memory for the RAM disc to be able to copy out any cursor for measuring its size.

Another idea would be using INTO ARRAY to measure data size:
Code:
Select * from browser into Cursor crsTemp Nofilter
Select * from crsTemp into array laSize
 
Create Cursor crsArrayStore (mArray M)
Append Blank 
Save To Memo mArray ALL LIKE laSize
? Len(mArray)

The data SAVE creates is obviously much larger than the original cursor itself, because the array name is repeated for every array element: the name "LASIZE" appears RECCOUNT()*FCOUNT() times, followed by a letter for the element type and then the value itself, and a lot of spaces are used as separators for all this. It still would be a measure of the size of the cursor; you could account for the size of the metadata.

And measuring the file sizes is also an unfair measure, as it adds a header size that doesn't contribute to the data traffic but is generated locally.

If I were you, I wouldn't use any of this, as it just means making copies of what you have just to measure its size; you're degrading the performance of your application, and it's still not acting on buffered data. You should go for the traffic measurements you can get from your web host/cloud provider and split your overall traffic into percentages you calculate by means of rough estimates, like the number of queries or RECCOUNT()*average record size.
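For example, a back-of-the-envelope sketch of that apportioning (all figures here are invented for illustration):

Code:
* Estimate one customer's share of the provider-reported traffic:
LOCAL lnEstimate, lnShare, lnBilledBytes
lnEstimate    = 1200 * (100 * 512)        && queries * (avg records * avg record size)
lnShare       = lnEstimate / 500000000    && divided by the sum of all customers' estimates
lnBilledBytes = lnShare * 480000000       && applied to the provider's measured traffic bytes
? lnBilledBytes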

As this seems to be about document management, the essential size of the traffic also seems to me to be in one memo/blob field. You could simply use the LEN() of that. Besides, if you used an MSSQL backend with FILESTREAM and file tables, you could even simply measure the size of the directories MSSQL creates and manages.

Bye, Olaf.

Olaf Doschke Software Engineering
 
Olaf,

Looks like the best I can hope for is getting the image size coming from the MSSQL blob (easy) and getting the size of the result cursor from their query. I will be pulling down the images on demand, and not when the query results are returned. There is a parent/child field that links the main table to the image table.

I see no way of going the ISP route and separating each customer's traffic in near real time, as I'm accumulating this usage data into a separate, non-metered SQL table. What will be metered is query return size, image size if viewed, and query strings sent and results received. I will need this level of granularity to support a customer's billing inquiry.

Currently the bottleneck is the ISP link (50gb). My workstation is an i7 quad gen2 with SSDs. The MSSQL server is on the same GB LAN and is an i7 six-core with NVMe U.2 drives for SQL and a standard SSD for boot. The query above returns:

100 records: about 2 seconds, 78,002K for the dbf and 141,888K for the fpt
1,000 records: 2 seconds, 771,902K for the dbf and 1,441,728K for the fpt
10,000 records: 2.5 seconds, 7,710,902K for the dbf and 16,685,046K for the fpt
100,000 records: about 13 seconds, 77,100,902K for the dbf and 156,894,528K for the fpt

Note there are no images being returned here, but the OCR'ed text is in the mix. The record count returned will be limited to maybe 100 per fetch.

Gotcha, my customers will have a full mix of machines, processors and CPU configs, which could have a dramatic effect on these numbers.

Tore,
I'll be testing the ram disk setup next...

 