archiving data

Nigel Gomm · Jul 26, 2021

Background.
500 or so individual customer databases on cloud hosted mysql(Mariadb) servers all of identical structure. One of the tables in each database holds binarydata (as blob) - JPGs & PDFs for the most part; attached to journal entries showing user activity stretching back years.

The Problem.
Those binarydata tables are getting very very large (>100gb) and i'm eating the cost of ever increasing disk capacity; most of the PDFs especially will probably never be viewed again but it's that "probably" is the problem - they have to be available because sometimes a user needs to look back over the history of their dealings with their customers (landlords and tenants in this instance).
I'd like to archive old files to cheaper online storage (e.g.

https://aws.amazon.com/glacier/).

There are existing connectors for mysql->AWS but they operate on a table level and i'd prefer to work record by record.

The solution.
So i'm planning on running housekeeping jobs that scan for older records, keep the record with filename but move the file itself to AWS or whatever and mark the record as archived. If the user later tries to access that file they see a placeholder and i automatically trigger the background retrieval of the file and its re-insertion into the table.

Any better ideas or similar experiences?

n

Chris Miller · Jul 26, 2021

I never was in your position but what you plan sounds plausible. I'd store a ref like a URL in the original databases, not just the file name. Maybe write your own service checking out a file from some cloud storage, perhaps decrypt it) and provide it back.

Chriss

Nigel Gomm · Jul 26, 2021

Chris,

I've started sketching exactly that...

n

Mike Lewis · Jul 26, 2021

Nigel,

I had a similar requirement once. I needed to store copies (PDFs) of invoices going back many years. The newer invoices had to be easily accessible, but the older they were, the less likely they would be needed.

I took a similar approach to what you are suggesting. I set up a separate directory for each year's invoices. In the main table, I stored the invoice number (which formed part of the PDF filename) and the invoice date (which pointed to the directory for the relevant year).

At the end of each year's accounting period, I moved the oldest directory to the archive, leaving the last five years on the local server. I set a flag against each of the relevant invoices to say they had been archived. That flag, together with the invoice number and date, enabled me to uniquely locate the PDF.

In fact, I went a bit further. In the main table, I also stored certain search terms: customer name^[1], invoice amount, etc. That enabled the user to locate an invoice even if they didn't know the invoice number or date.

This was all about ten years ago. As far as I know, it is still working - at least, the client hasn't complained. So, yes, your plan looks feasible.

[1] Why store the customer name rather than the customer ID? Because the invoices has to be retained after the customer might have been deleted from the customer table.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

archiving data

Nigel Gomm

Programmer

Chris Miller

Programmer

Nigel Gomm

Programmer

Mike Lewis

Programmer

Similar threads

Part and Inventory Search

Sponsor