Replication over EXTREMELY slow links 2

Hello,

I have a unique situation. One of my customers has 3 sites (soon to be 4) spread out between North America and West Africa. The geology dept needs to use a special 3D application for site surveys and modeling in the mining field. The application uses a SQL Server DB.

We have Citrix implemented for an accounting package (which also uses SQL) at the HQ, and that works fine. But the geology dept does many manipulations and simulations that can't work in Citrix, so we are thinking of putting SQL servers at the remote offices and one larger one at HQ. I need the SQL server at each remote site to replicate its DB to the HQ SQL server. Each remote site will have its own data, and HQ must have the data of all the sites (in separate DBs).

There's just one HUGE catch: the links between the remote sites and HQ are 128kbits (satellite links with over 800ms ping times). Worse, the usable bandwidth is less than 5kbits, because each Citrix session uses about 20kbits, and with half a dozen users plus local browsing, there isn't much left for anything else. After speaking to the head of the geology dept, they said the files that are modified can vary between 500MB and 1.5GB.

What possible solutions do I have for DB replication (not realtime, but as frequent as possible)? I don't know much about SQL and how it works...

Thanks
akwong
 
Question: when the remote sites replicate up to HQ, do they have to merge their data with the HQ database, or does HQ just need a copy of each site's database, with those DBs kept separate?

I hope it's the latter. If so, you can use snapshot replication to do a once-a-day replication. Snapshot is the best type to use when you have very slow or non-persistent connections on your network. Merge replication (if the data needs to be merged with HQ's db) is marginally more difficult to accomplish than a simple snapshot replication, so it may or may not work over such a bad connection. You can give it a shot, but if the connections go down, you basically have to start from scratch anyway.

FYI: Snapshot replication replaces the previous "picture" of the database with an all new picture.
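
If you end up going the snapshot route, the guts of the setup look something like this in T-SQL (just a rough sketch -- the database, publication, table and server names here are made up, and in practice you'd drive all of this through Enterprise Manager's replication wizards):

Code:
-- Run at the remote site's publisher, in the publication database.
-- Assumes a Distributor has already been configured.
USE GeoDB;
-- Mark the database as publishable.
EXEC sp_replicationdboption @dbname = N'GeoDB', @optname = N'publish', @value = N'true';
-- Create a snapshot publication and its Snapshot Agent.
EXEC sp_addpublication @publication = N'GeoDB_Snapshot', @repl_freq = N'snapshot', @status = N'active';
EXEC sp_addpublication_snapshot @publication = N'GeoDB_Snapshot';
-- Publish each table as an article (one call per table).
EXEC sp_addarticle @publication = N'GeoDB_Snapshot', @article = N'Assays',
    @source_owner = N'dbo', @source_object = N'Assays';
-- Push the subscription to HQ, into that site's own database.
EXEC sp_addsubscription @publication = N'GeoDB_Snapshot', @subscriber = N'HQSQL01',
    @destination_db = N'GeoDB_SiteA', @subscription_type = N'push';

You'd then schedule the Snapshot and Distribution Agents for overnight, when the link is (theoretically) quieter.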

Merge replication inserts new data amongst the old data and tracks the different publishers (remote sites) with a unique identifier that includes a priority or timestamp. If two sites change the same piece of data, or if HQ and a remote site change the same piece, they "fight it out" for who did it first or who has precedence. You'll lose one piece of data and the other piece will be slipped into the database. If your connection breaks or times out during this process, there is the potential to lose both changes. Yikes!
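
For reference, the way merge replication does that tracking is with a uniqueidentifier column flagged ROWGUIDCOL on every published table. If a table doesn't already have one, setup adds it for you automatically, the same as if you'd run something like this (table name made up):

Code:
-- Merge replication's row-tracking column, added at setup if absent.
ALTER TABLE dbo.Assays
    ADD rowguid uniqueidentifier ROWGUIDCOL NOT NULL
    CONSTRAINT DF_Assays_rowguid DEFAULT NEWID();

That extra 16 bytes per row is also worth remembering when you're guessing at database sizes.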

BTW, you'll want to read up on merge replication to verify I have it right, because I use transactional replication, not merge, and I can't remember if the GUID is a timestamp thing or a priority thing.

Does this help you out?



Catadmin - MCDBA, MCSA
"The only stupid question is the one that *wasn't* asked.
 
Thanks Catadmin. That does help. I'm waiting for more info from the software supplier and also waiting for the hardware (if approved). I hope to have the hardware (just for the HQ) in the next 2 weeks (40 000$ worth of hardware) and to do some tests with a demo version of the application and an evaluation copy of SQL 2005. If it doesn't work to our satisfaction, we'll have a large file server, which is needed anyway. However, if everything goes well, then we'll start ordering the hardware and software for the remote sites. This whole project is in the range of 200 000$, so we need to get everything just right.

From what I can see now, the DB received from each remote site can overwrite the previous copy of that DB stored at HQ. The snapshot replication method would be the way to go, and all DBs received from the remote sites would be stored individually. What worries me is the sheer size of the DBs from SQL; I have NO idea how large the DBs will be. If the DBs become too big, then we'll have to start doing merges and only send the updates. Does SQL do a good job at compressing its data? Do we just send the transaction logs, or what?

The geology dept said that the 3D modeling is basically done by extracting large amounts of data and generating millions of small cubes, which are all attached together to make a global picture. You can then click on any part of the model and get extra info from it. They are using a combination of MapInfo, DataMine and Excel currently but want to migrate towards DataShed. What he told me is that the output files are between 500 megs and 1.5 gig, but I realized this morning that he's not talking about the DB size, just the actual output file. I'm waiting for more info from the software supplier right now, but if the same DB is replicated, then either the remote site user or the user at HQ can regenerate the same output file.

Thanks
akwong
 
Akwong,

I'm glad I could help. SQL's compression rate isn't the greatest on its own. As a matter of fact, my company just invested in SQL LiteSpeed to send compressed SQL backups across our WAN and restore them. That might be a better option for you than replication: a daily backup, file transfer and restore, using a third-party utility that speeds things up.

The caveat is that you have to have LiteSpeed on both ends of the data transfer to be able to restore the backed-up db. Also, if you install Native Command Support, you don't have to redo all your backup jobs in the LiteSpeed "SQL"; you can get LiteSpeed to support the jobs you already have scripted in T-SQL on the SQL Server machine.
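
Underneath it all, the pattern is just backup, copy, restore. Natively (i.e., without LiteSpeed's compression) it would look something like this -- paths, database names and logical file names are all invented here, so adjust to taste:

Code:
-- At the remote site: take a full backup to a single file.
BACKUP DATABASE GeoDB TO DISK = N'D:\Backups\GeoDB_full.bak' WITH INIT;

-- ...copy GeoDB_full.bak across the link, then at HQ:
RESTORE DATABASE GeoDB_SiteA FROM DISK = N'D:\Incoming\GeoDB_full.bak'
WITH MOVE N'GeoDB'     TO N'E:\Data\GeoDB_SiteA.mdf',
     MOVE N'GeoDB_log' TO N'E:\Data\GeoDB_SiteA_log.ldf',
     REPLACE;  -- overwrite HQ's previous copy of this site's db

LiteSpeed swaps those native commands for its own extended stored procedures that write much smaller, compressed backup files, but the overall flow is the same.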



Catadmin - MCDBA, MCSA
"The only stupid question is the one that *wasn't* asked.
 
Hi akwong. The database, especially if you are using a DataShed/Maxwell data model, will almost definitely be over 1GB after a short time. Have you spoken to the guys at Maxwells (DataShed) about the replication? Send me an email if you would like a specific contact - details on our website.
 
Thanks Catadmin, I'll look into LiteSpeed also.

yetanotherjo: I have spoken to DataShed tech support and so far they haven't addressed the issue of low bandwidth. They have mentioned snapshot; they asked if we can upgrade the line, but the cost is WAY too high; they talked about entering data through Citrix into the HQ copy of DataShed, but this doesn't allow the remote sites to generate their own views/modeling; and they talked about using MS Access, but I don't think MS Access is tough enough to handle that much data.

Since you seem familiar with DataShed/Maxwell, can you tell me if I'm on the right track? The data that needs to be sent from the remote site to the HQ is only the data stored inside SQL? Then, from either end, the users can generate the same models?

thanks
akwong
 
Yes, I'm very familiar with them.

Did they suggest using MS Access for the main database(s)?? You are probably better off with SQL Server from a database security/functionality point of view, although DS unfortunately still doesn't seem to make much use of the SQL engine when crunching data.

Yes again, you need to have the same SQL database at both sites if they're going to independently work with the same data. If day-to-day data management is being done on site, it makes more sense for the database to live there and be backed up to the HQ, especially if the Citrix link is prone to being slow/unstable. This does, however, mean you need to have at least one DS license at each site, which can up the cost alarmingly.

If you can get away with just passing a backup between sites rather than using merge replication, then this is a better option, and something like LiteSpeed sounds like the way to go.

I assume the modelling is being done in DataMine?
 
The license issue for DataShed is not a problem. We have already figured that into our costs for each site, and if we can solve the replication problem, then we can proceed without problem. Is MS Access strong enough to handle huge amounts of data? I always thought of MS Access for small applications...

Yes modelling is being done in DataMine.
 
Now I understand where your $200 000 budget is going lol!

You have a 1GB database size limit in Access. Once it gets to ~800MB, performance becomes painfully slow when running big queries. This is very easy to reach with a geological database, especially if it's got a normalised assay system (I assume you will be using a normalised assay system?).

The other issue with using Access for your main database arises if you're using a normalised assay system - it's a pain to maintain, as you need to have a separate assay table for each sample table... SQL is much cleaner.
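
To illustrate what I mean by normalised (table and column names invented here, not DataShed's actual model), instead of one wide column-per-element table, you store one row per element per sample:

Code:
CREATE TABLE dbo.Samples (
    SampleID  int PRIMARY KEY,
    HoleID    varchar(20) NOT NULL,
    DepthFrom decimal(8,2),
    DepthTo   decimal(8,2)
);
CREATE TABLE dbo.Assays (
    SampleID int NOT NULL REFERENCES dbo.Samples(SampleID),
    Element  varchar(10) NOT NULL,   -- e.g. 'Au', 'Al2O3'
    Value    decimal(12,4),
    Units    varchar(10),            -- e.g. 'ppm', '%'
    PRIMARY KEY (SampleID, Element)
);

Clean in SQL Server; in Access you end up duplicating that assay table for every sample table, which is where the maintenance pain comes in.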
 
Where are you based, akwong? I'm in Perth - it's just after midnight, so if I start talking gobbledegook, tell me to go away and come back after some sleep.
 
Akwong,

I don't recommend using Access for replication over such a huge distance. In fact, Access is usually only useful for small (read: fewer than 4) connection groups. And the amount of data it can hold is relatively small. As Yetanotherjo posted, once it reaches a certain size, your functionality as well as your speed goes down the tubes.

I recommend trying to upsize everyone to SQL (if you're going to stick with a Microsoft product). Get away from Access if your databases are really that large and still growing.



Catadmin - MCDBA, MCSA
"The only stupid question is the one that *wasn't* asked.
 
Yeah, that's what I thought about Access. In my old programming days, I designed a POS system running on Access, and from my research, as soon as you have 6 or so connections to the DB, it gets slow and data corruption can occur. SQL is the way to go. Now, SQL Enterprise or SQL Standard? SQL 2000 or SQL 2005...

Normalised assay... dunno... I'll have to ask the geology experts; I'm IT. My geology dude already has 300GB of data on an external HD sitting on his desk, and that's a small sample of all the projects. We have data spread out on different PCs at the mine sites. There's no backup, as it doesn't fit on the old servers there (which will also need to be replaced; another 60K$ easy). Having local SQL servers with LTO drives there will enable us to do backups locally. Then having that data sent to HQ enables us to analyze the data from our end. However, having SQL servers there increases the complexity of support.

I'll be ordering a dual Xeon 3.2GHz with 2 x 73GB HDs in RAID 1 for the OS, 4GB RAM (upgradeable to 12GB), and a SAN with 10 or 12 x 146GB 15K RPM HDs in RAID 5 with one hot spare. I would have preferred 300GB HDs, but they only come in 10K RPM. With 10 x 146GB, I should have about 1.17TB of space. I can go up to 14 HDs in this SAN and, if necessary (waiting for confirmation), add a second SAN if the SCSI controller can support it.

The HQ is located in North America so we are pretty far apart but I'm used to working all hours of the day. I offer tech support for over a dozen companies (a few in the mining industry) which are spread out across the globe. I'm on-call 24/7/365...

In your opinion, is it necessary to follow a SQL course to properly support DataShed? We're thinking of having one or two people from the mine sites and from HQ go through a SQL course.
 
I don't know anything about DataShed. However, if these people are going to be querying SQL Server, I do advise having them go through a T-SQL course of some sort. The SQL Server developer courses use more T-SQL and discuss querying more than the admin courses do, because the admin courses are more for IT people maintaining the server. However, there might be plain T-SQL query courses, which would be your best option.

Standard versus Enterprise? Enterprise offers more bells & whistles than Standard does. First of all, you can do clustering in Enterprise (great disaster recovery support), but you can't in Standard. Enterprise also offers you the option of bigger DB sizes, more CPUs, and hyper-threading; I don't think Standard offers hyper-threading. Just because Enterprise offers these options, though, doesn't mean you'll want to buy Enterprise.
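
Incidentally, once you've got your evaluation copy installed, you can check exactly which edition and build you're running with a one-liner (SERVERPROPERTY works on both 2000 and 2005):

Code:
SELECT SERVERPROPERTY('Edition')        AS Edition,
       SERVERPROPERTY('ProductVersion') AS Version,
       SERVERPROPERTY('ProductLevel')   AS ServicePack;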

SQL Server 2000 Books Online is downloadable from: -- Open it up, go to Index and search with "SQL Server, editions listed" for more information on the editions.

SQL Server 2005 Books Online is downloadable from:

Or you can compare SQL Server 2005 edition features on the following link:


SQL Server 2000 versus SQL Server 2005? Well, SQL Server 2000 has all of its bugs identified, and everyone knows everything about it. SQL Server 2005 has a lot of new stuff that hasn't been out long enough for people to find the good/bad/ugly about it. There are some great features (or so I've heard) in 2k5 that aren't available in 2000. And MS is going to drop support for 2000 in about 3-5 years, so the question for you and your company is: is it worth it to invest in SQL Server 2000 at this late date if we're going to have to upgrade to SQL Server 2005 in the next few years anyway?

This is the point where you should start doing a lot of research. Go to Microsoft's site, by all means, but make sure to do a LOT of Googling too. Find out stuff from other sites that MS doesn't talk about on theirs. MS loves its products and doesn't want to diss them. In fact, anyone who posts on their newsgroups about complaints or problems usually gets pressured out or shouted down (something that happens on any software company's website, not just MS), so make sure you go to other sites to get both sides of the story.

There are some people who have already installed SQL Server 2005, so if you Google something like 'SQL Server 2005 Reviews', you should get a decent amount of hits. Once you've done your research on the pros and cons of each edition, you should have a better idea of which to buy.



Catadmin - MCDBA, MCSA
"The only stupid question is the one that *wasn't* asked.
 
Catadmin pretty much covered the SQL version issues. Last I heard, DataShed hadn't done much, if any, testing on SQL 2005. I have successfully linked a 2005 Express database to DS but haven't done much testing.

Is that 300GB of data going into DataShed?

As regards the training required, it depends on who has what responsibilities. DataShed can be set up nicely for low-level users, but the administrator(s) need a good understanding of what is going on. Who is going to be maintaining the databases - will there be a dedicated DBA? If the geos are doing most of the data admin themselves, at least one of them will need a basic understanding of SQL Server admin. T-SQL will be required for anyone who is going to have permission to set up extensions. A good grasp of Access is also required to manage the extensions database.

What training have DataShed recommended?
 
akwong,
Just happened across this post as I had a question to post also...

I agree with all the suggestions above and don't really have anything to add as far as the SQL part goes.

But... I think everyone glossed over the EXTREMELY slow connection part. You said usable bandwidth is 5kbs? According to my math, 5000 / 10 (approx. bits per byte including overhead) = 500Bbs (Bytes per second). If you have a 500MB file to transfer, divide 500MB by 500 bytes and you're looking at 1 million seconds, or ~11.57 days, to transfer the file. The chances of an uninterrupted transfer would be slim at best. Compression would be paramount, as would some system of updating rather than replacing the whole DB.
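
If you want to sanity-check my numbers, the arithmetic is easy enough to run anywhere -- here it is as a throwaway SQL query, though a calculator does just as well:

Code:
SELECT (500.0 * 1000 * 1000)  -- 500MB expressed in bytes
     / (5000.0 / 10)          -- ~500 bytes/sec usable after overhead
     / 86400.0                -- seconds per day
     AS days_to_transfer;     -- comes out to ~11.57 days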

If I didn't understand correctly, let me know, but I've run into this kind of thing before, and it just didn't seem like much was said about the mechanics of the actual file transfer, which turned out to be the bigger problem in my case.

Hope this helps,
Ken
 
Hmmm. I didn't notice the 5kbs. I was looking at the 128kbs. YIKES!

Ken's right, akwong. You're going to have to invest in upgrading network connectivity between the offices. I don't know what the chances of getting a dedicated 56k or T1 line between them are, but you should look for some better solution to network connectivity before you upgrade your software and find out that things still don't work.



Catadmin - MCDBA, MCSA
"The only stupid question is the one that *wasn't* asked.
 
akwong,

Fat-finger error... that should have been "= 500Bps", not "= 500Bbs". Sorry for any confusion.

If there's no cost-effective way to increase your connection speed, there are less attractive (although workable) solutions. One would be to put a system in place to detach the DBs, copy them to DVD-R, physically mail the discs back and forth, and re-attach at the other end. With dual-layer you'd be good to ~9GB.
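
The mechanics of that are simple enough to script -- something along these lines (database name and paths invented; sp_detach_db/sp_attach_db are the standard procs for this in SQL 2000 and 2005):

Code:
-- At the site: detach, burn the .mdf/.ldf to DVD, then re-attach locally.
EXEC sp_detach_db @dbname = N'GeoDB';
EXEC sp_attach_db @dbname    = N'GeoDB',
                  @filename1 = N'D:\Data\GeoDB.mdf',
                  @filename2 = N'D:\Data\GeoDB_log.ldf';

-- At HQ: attach the mailed copy under a per-site name.
EXEC sp_attach_db @dbname    = N'GeoDB_SiteA',
                  @filename1 = N'E:\Incoming\GeoDB.mdf',
                  @filename2 = N'E:\Incoming\GeoDB_log.ldf';

A plain BACKUP/RESTORE of a copied backup file would work just as well and avoids taking the db offline, but either way the disc in the mail is your "bandwidth".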

If a T1 is going to cost you > $1000 a month, that would pay for a lot of overnight or two-day international posts. But then again, these sites might be too remote even for that?

Wish I had more to offer, but as Catadmin said, and I agree, your biggest problem right now isn't the servers and SQL and DataShed, etc.; it's getting a WAN set up to handle the bandwidth, or coming up with another solution to get the data back and forth. The rest of it should fall into place once you have that foundation to work from.

Kind regards and good luck!
Ken
 
Upgrading links to T1, ha ha! Maybe in a century! I heard a 64k VSAT link runs at 1500$ USD per month with limited bandwidth, plus equipment rental and, I think, a licensing fee. They don't even have electricity there... they use generators. I travelled to one of the mine sites a few months ago. After a 14-hour plane ride (with a layover at the airport), I had another 15 hours of off-road truck ride through mountain terrain! Other than the main city where the plane lands, there's no telephone service, no electricity and no running water! It's ironic: the population of these areas is literally standing on ore/bauxite and other natural resources, but they are extremely poor. The electricity is out of order because people take the copper wires! Well, at least the first one gets electrocuted and trips the breakers; then the next one leaves with the wires.

I also forgot to mention that on top of the Citrix traffic, there are 4 VoIP lines, which take priority over all other traffic. The 5kbits of usable bandwidth is not constant for "official business": during the day there's probably even less available, and while usage of the link usually drops at night, the problem is that the employees take the downtime to chat on MSN with their families. We plan on putting an ISA server there (we're still running Proxy 2.0) to control bandwidth usage.

I'll keep all your suggestions in mind. Either way, I'm waiting for the PO for the hardware at HQ. Then we'll be able to do some testing. I just hope we sort everything out in time for my next trip there... Or else we'll have to plan for a second trip to each site.

Yetanotherjo: I asked about "normalised assay" and they said not yet, but it's something they plan on having with DS in the future. In the DS package that we got, they do offer training for IT (it wasn't specified what kind). They also require onsite training at the mine site. I have another client that also had DS installed, about a year ago, but only at the remote site. The onsite IT guy ended up taking an extra SQL class, but he's more of a geology data analyst than IT. I'm not sure if he took the class to support SQL as IT or to develop skills to use SQL for queries and such.
 
Sounds like you have lots of worms in that can, akwong. Good luck wrestling them. The other guys who've participated in this discussion are probably better off giving you suggestions regarding data transfer. I guess the main consideration is how up to date the HQ copy needs to be. Obviously as recent as possible is always nice, but is, e.g., a regular weekly mail-out of a backup going to be sufficient?

With regard to the database, to clarify my earlier waffle: I'd suggest the main power user/administrator at each site has sufficient SQL Server admin knowledge to be able to effectively maintain the database, make any structural changes required, add users, etc. They will also need to be able to design views, triggers and stored procedures in the SQL Server database, and queries, forms and reports in the Access extensions database.
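
For example, a site power user might maintain little reporting views along these lines (schema invented for illustration, borrowing the Samples/Assays sketch from earlier in the thread):

Code:
CREATE VIEW dbo.vw_HighGradeIntervals AS
SELECT s.HoleID, s.DepthFrom, s.DepthTo, a.Element, a.Value, a.Units
FROM dbo.Samples s
JOIN dbo.Assays a ON a.SampleID = s.SampleID
WHERE a.Element = 'Al2O3' AND a.Value >= 40.0;  -- e.g. flag high-grade bauxite intervals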

The run-of-the-mill users will only need DataShed training, with content depending on their user level, as you don't want to give everyone carte blanche to design queries within the database. It helps if they have enough knowledge of what the database can do to be able to sensibly request improvements, but they don't need to be able to actually do the setup.
 