Options for Distributed/Grid Databases

BNPMike · Dec 9, 2008

I'm researching desktop grids and wondering what options there are for running a database across many nodes. The sorts of applications you normally read about (e.g. SETI) don't follow a typical corporate transactional model.

I've used Oracle 10g in the past, which I think is a good candidate, but it's expensive of course. MySQL seems to run its cluster in memory, which might be limiting, although replication might achieve a similar effect. Do you know of other packages that could be spread across a number of nodes?

ousoonerjoe · Dec 10, 2008

Don't forget MSDE. It's the 'portable' version of MS SQL Server. Oh, and it's free as long as you stay within a 2 GB database size (if I remember correctly).

However, SETI and Rosetta read flat files containing the segment data, process it and write the results to another file. Once finished, the client connects to the central server and uploads the results file.
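As a rough illustration of that cycle (read a flat file, process it locally, write a results file, then upload), here is a minimal Python sketch. The file format, the `crunch` computation and the list standing in for the central server are all hypothetical, not what SETI or Rosetta actually use.

```python
import json
import tempfile
from pathlib import Path


def crunch(values):
    """Hypothetical stand-in for the real computation: a sum of squares."""
    return sum(v * v for v in values)


def process_work_unit(input_path: Path, output_path: Path) -> dict:
    # Read the segment data from a flat file (assumed format: one number per line).
    values = [float(line) for line in input_path.read_text().splitlines() if line.strip()]
    # Process it entirely on this machine; no database connection is involved.
    result = {"unit": input_path.stem, "result": crunch(values)}
    # Write the result to a separate file.
    output_path.write_text(json.dumps(result))
    return result


def upload_results(output_path: Path, server_inbox: list) -> None:
    # Only now contact the central server. A plain list stands in for the
    # server here; a real client would make a network call instead.
    server_inbox.append(json.loads(output_path.read_text()))


if __name__ == "__main__":
    work_dir = Path(tempfile.mkdtemp())
    unit = work_dir / "unit42.txt"
    unit.write_text("1\n2\n3\n")
    inbox = []
    process_work_unit(unit, work_dir / "unit42.result")
    upload_results(work_dir / "unit42.result", inbox)
    print(inbox)
```

The point of the pattern is that the only shared, transactional step is the final upload; everything else runs against local files.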

MDXer · Dec 11, 2008

MSDE is being phased out and replaced by the lighter, free editions of SQL Server 2005 and 2008: Express and Compact.

MS SQL 2008 Versions

BNPMike · Dec 11, 2008

For a grid application you need a database that is spread across several nodes, maybe anything from 4 to 32, but it still has to behave as if it were on one server. I don't know much about SQL Server; does it work in a grid manner?

Olaf Doschke · Dec 12, 2008

It depends very much on what you want and need to do. Just spreading the load across many desktop stations only works if each one has the full data, and then the problem is keeping them all in sync.

If each station only holds part of the data and you send a query out into "the cloud", you need either a central server that knows which station holds which data, or a P2P protocol that floods the network with the query, which can make things slow again.
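The two lookup strategies can be compared with a small sketch, assuming each station holds a dictionary of rows. `Station`, `Directory` and `flood_lookup` are hypothetical names; the point is only the message count: the directory needs at most two messages per lookup, while flooding needs one per station, so it gets slower as the grid grows.

```python
from dataclasses import dataclass, field


@dataclass
class Station:
    name: str
    rows: dict = field(default_factory=dict)  # the partial data held by this desktop


class Directory:
    """Hypothetical central server that only records which station holds which key."""

    def __init__(self):
        self.location = {}

    def register(self, key, station):
        self.location[key] = station

    def lookup(self, key):
        # One message to the directory, plus one to the owning station if known.
        station = self.location.get(key)
        if station is None:
            return None, 1
        return station.rows.get(key), 2


def flood_lookup(stations, key):
    # No directory: the query is broadcast to every station, because the
    # sender cannot know in advance which one (if any) holds the key.
    messages = 0
    found = None
    for station in stations:
        messages += 1
        if key in station.rows:
            found = station.rows[key]
    return found, messages
```

With four stations and the key on one of them, the directory answers in two messages and flooding in four; the directory, however, becomes a central component that every station depends on.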

If you distribute data to be crunched locally and then commit a final result (as SETI does), any file-based database might be an option, besides the small SQL Server editions.

Bye, Olaf.

BNPMike · Dec 12, 2008

You're right Olaf.

I'm thinking you either need some sort of replication arrangement where each node constantly propagates its updates (using some kind of optimistic locking scheme), or one where each node calls a single database. If that database has to run on expensive dedicated servers, then you're not getting the full benefit of a workstation grid. Oracle can appear as a single database while running on many low-cost servers. Whether it would work on workstations connected only by an office network I'm not sure; it may not be able to cope with the higher latency, at least in its current form.
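A minimal sketch of optimistic locking with version numbers, assuming every row carries a version that is bumped on each write. An update is accepted only if it was based on the current version; otherwise the writer has to re-read and retry. `Replica`, `propagate` and `VersionConflict` are hypothetical, and the sketch runs in a single process, so it ignores the network failures and partial delivery that a real replication scheme has to handle.

```python
class VersionConflict(Exception):
    """Raised when an update is based on a stale read."""


class Replica:
    """One node's copy of the data; every row carries a version number."""

    def __init__(self):
        self.rows = {}  # key -> (version, value)

    def read(self, key):
        # Missing rows are treated as version 0 with no value.
        return self.rows.get(key, (0, None))

    def apply(self, key, expected_version, value):
        current_version, _ = self.read(key)
        if current_version != expected_version:
            raise VersionConflict(
                f"{key}: expected v{expected_version}, found v{current_version}")
        self.rows[key] = (current_version + 1, value)
        return current_version + 1


def propagate(replicas, key, expected_version, value):
    """Push one update to every replica, rejecting it if any copy has moved on."""
    # Check every replica before touching any, so a rejected update leaves
    # no replica half-updated (true only within this single process).
    for replica in replicas:
        current_version, _ = replica.read(key)
        if current_version != expected_version:
            raise VersionConflict(
                f"{key}: expected v{expected_version}, found v{current_version}")
    for replica in replicas:
        replica.apply(key, expected_version, value)
    return expected_version + 1


if __name__ == "__main__":
    nodes = [Replica(), Replica(), Replica()]
    seen_by_a, _ = nodes[0].read("stock:widget")
    seen_by_b, _ = nodes[1].read("stock:widget")
    propagate(nodes, "stock:widget", seen_by_a, 10)  # accepted: all copies now at v1
    try:
        propagate(nodes, "stock:widget", seen_by_b, 7)  # B read v0, so this is stale
    except VersionConflict as err:
        print("rejected:", err)
```

No locks are held while a node works; conflicts are detected only at write time, which suits workloads where two nodes rarely update the same row.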

The management processes for distributing work and checking that it is being serviced already exist, but so far I haven't seen the conventional transactional-database element solved.

Olaf Doschke · Dec 12, 2008

The management processes of distributing work and checking it is being serviced, already exists

So I assume it's more like SETI, or distributed calculations of weather, raytracing, or something comparable?

Is it designed in such a way that clients "check out" data for processing and then "check it back in" once they are finished?

Or do other clients need access to the same data too, i.e. is access non-exclusive?
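If the answer is check-out/check-in, a coordinator can hand out exclusive leases on work units. The sketch below assumes a lease timeout, so that units taken by a station that disappears are eventually reissued; `WorkQueue`, `check_out` and `check_in` are hypothetical names, and time is passed in explicitly to keep the example deterministic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    worker: str
    expires_at: float


class WorkQueue:
    """Hypothetical central coordinator handing out exclusive work units."""

    def __init__(self, unit_ids, lease_seconds=60.0):
        self.pending = list(unit_ids)
        self.leases = {}    # unit_id -> Lease
        self.results = {}   # unit_id -> result
        self.lease_seconds = lease_seconds

    def _reclaim_expired(self, now):
        for unit_id, lease in list(self.leases.items()):
            if lease.expires_at <= now:
                # The worker went silent: put the unit back for someone else.
                del self.leases[unit_id]
                self.pending.append(unit_id)

    def check_out(self, worker, now) -> Optional[str]:
        self._reclaim_expired(now)
        if not self.pending:
            return None
        unit_id = self.pending.pop(0)
        self.leases[unit_id] = Lease(worker, now + self.lease_seconds)
        return unit_id

    def check_in(self, unit_id, worker, result, now) -> bool:
        lease = self.leases.get(unit_id)
        # Only the current lease holder, within its lease, may check the unit in.
        if lease is None or lease.worker != worker or lease.expires_at <= now:
            return False
        del self.leases[unit_id]
        self.results[unit_id] = result
        return True
```

Because each unit has exactly one holder at a time, no locking between clients is needed; the trade-off is that a late result from a station whose lease expired is discarded.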

Bye, Olaf.

BNPMike · Dec 12, 2008

Oracle works by dividing up all the blocks equally amongst the nodes; what is divided is not the data but lock control. Incoming requests are directed to a specific node depending on the load messages the nodes send to each other. That node 'knows' which other node owns the block(s) it wants and asks it for control. It then behaves as normal and returns control to the owner node when it has finished its transaction. In the meantime the data is locked as in a single-CPU situation, i.e. any other node asking for access to those blocks will be denied by the owning node.
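The scheme described above can be modelled in a few lines. This is a toy, single-process sketch of that description, not Oracle's actual implementation (the real mechanism is considerably more involved, for instance distinguishing shared from exclusive access). The mod-N mapping of blocks to owning nodes, the least-loaded routing rule and the names `Cluster`, `acquire` and `release` are all assumptions for illustration.

```python
class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        # For the blocks this node owns: block_id -> id of the node holding the lock.
        self.lock_holder = {}


class Cluster:
    """Toy peer cluster: no master node, lock control spread across all peers."""

    def __init__(self, n_nodes):
        self.nodes = [Node(i) for i in range(n_nodes)]

    def route(self, loads):
        # Direct an incoming request to the node reporting the lowest load.
        return min(range(len(self.nodes)), key=lambda i: loads[i])

    def master_of(self, block_id):
        # Lock control (not the data) is divided equally: block b -> node b mod N.
        return self.nodes[block_id % len(self.nodes)]

    def acquire(self, requester, block_ids):
        # Ask each block's owning node for control; deny if another node holds any.
        for b in block_ids:
            holder = self.master_of(b).lock_holder.get(b)
            if holder is not None and holder != requester:
                return False
        for b in block_ids:
            self.master_of(b).lock_holder[b] = requester
        return True

    def release(self, requester, block_ids):
        # Hand control back to the owning nodes once the transaction finishes.
        for b in block_ids:
            owner = self.master_of(b)
            if owner.lock_holder.get(b) == requester:
                del owner.lock_holder[b]
```

In a four-node cluster, block 10 is controlled by node 2; while node 0 holds blocks 10 and 11, a request from node 3 for block 11 is denied until node 0 releases them.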

The data would be typically on a SAN but I guess you could virtualise that as well.

For Oracle Grid there is no 'master' node; it's a peer arrangement. I think this would work if
1) the heartbeats could be configured to tolerate sufficient latency (it expects a dedicated LAN between the servers), and
2) you don't get killed on per-CPU pricing.
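Point 1) is essentially a failure-detector timeout. A minimal sketch, assuming each node records when it last heard from each peer and suspects any peer that has been silent for longer than `timeout`. `HeartbeatMonitor` is hypothetical and is not Oracle's configuration interface.

```python
class HeartbeatMonitor:
    """Hypothetical failure detector: a peer is suspected once silent longer than `timeout`."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}  # peer -> time its last heartbeat arrived

    def heartbeat(self, peer, now):
        self.last_seen[peer] = now

    def suspected(self, now):
        # Any peer whose last heartbeat is older than the timeout is presumed dead.
        return sorted(p for p, t in self.last_seen.items() if now - t > self.timeout)
```

A larger timeout tolerates slow or congested office-network links without falsely evicting a node, at the cost of noticing a genuinely failed node later. For example, with a 1-second timeout a peer whose heartbeat is delayed by 1.5 seconds is already suspected, whereas with a 3-second timeout it is not.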
