
New to SAN


shpshftr (Technical User), Oct 21, 2002
Hello everyone. I have two ProLiant DL580s connecting through a Fibre Channel switch to a SAN. The two servers are clustered. I rebuilt one of the servers the other day and now it cannot connect to the SAN. The other server connects and gets its usual drive mappings, but the newly rebuilt one does not. I am very new to SAN and Fibre Channel, so I checked some documentation and I think I may have to set the World Wide Name on the new server to match that of the old one and the SAN. Does anyone know where you set the WWN? Also, Windows sees the SAN in Device Manager and Compaq Array Manager, but it can't actually connect to it or see it in Disk Management. If anyone has any advice on what could be wrong, or general suggestions/tips for configuring an FC SAN, any help would be greatly appreciated. Thanks in advance.
 
Well, not so easy. There is a port WWN and a node WWN, and their implementation is not 100% universal across products. The node WWN is tied to that specific piece of hardware.

It could be a zoning issue. There is hard and soft zoning. Soft is where the switch is looking for a specific WWN; hard is where it is configured for a specific port. But there, too, it is not 100% consistent across all SAN implementations.
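To make the soft/hard distinction concrete, here's a tiny Python sketch of how a switch decides zone membership under each scheme. The WWNs and port numbers are made up; real zoning lives in the switch configuration, not in code like this.

# Illustrative only: soft (WWN-based) vs. hard (port-based) zoning.
# The WWNs and port numbers below are invented for the example.

soft_zone = {
    "cluster_zone": {
        "50:06:0b:00:00:c2:62:00",   # storage controller port WWN
        "50:06:0b:00:00:c2:62:01",   # server A HBA port WWN
        "50:06:0b:00:00:c2:62:02",   # server B HBA port WWN
    }
}

hard_zone = {
    "cluster_zone": {0, 1, 4},       # physical switch port numbers
}

def soft_zone_allows(zone_name, wwn):
    """Soft zoning: membership is keyed on the WWN that logs in."""
    return wwn in soft_zone[zone_name]

def hard_zone_allows(zone_name, switch_port):
    """Hard zoning: membership is keyed on the physical switch port."""
    return switch_port in hard_zone[zone_name]

# A rebuilt server with the same HBA keeps its WWN, so soft zoning still matches:
print(soft_zone_allows("cluster_zone", "50:06:0b:00:00:c2:62:02"))  # True
# But if the HBA (or the cable/port) changed, the lookup fails:
print(soft_zone_allows("cluster_zone", "50:06:0b:00:00:c2:62:99"))  # False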

Bottom line, for my 2 cents' worth: start at the switch zoning configuration.
 
Might be a good place to start. The thing is, nothing has changed except this one server. The ports, switch, SAN, and other server are all configured as they used to be. Unfortunately this cluster is extremely mission critical, so I can't really modify any settings except on the newly rebuilt server. I believe we're using soft zoning. Where is the setting for the WWN? In the controller properties? Thanks a lot for your help.
 
When you say "SAN", what are you referring to? It sounds to me like you're talking about the storage system, which is also connected to the SAN. This is an area of confusion for those who are just starting out, but a SAN is the whole network, and nodes attach to that network. The nodes can be servers and storage (disk, tape) systems. The actual network transport mechanism is Fibre Channel, which the switches handle.
The WWN is fixed on the host bus adapter (the PCI card in the server). It cannot be changed (easily at least). It's like a MAC address on a regular NIC.
Usually, access to disks from the storage system (what you probably call the SAN) is regulated by what is commonly known as LUN masking. In HP terminology, it's called Selective Storage Presentation. This is set on the storage system: essentially, logical volumes are assigned to WWNs, thus enabling access to the LUN(s).
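As a rough illustration (not actual SSP syntax; the WWNs and LUN numbers below are invented), LUN masking boils down to a lookup table kept on the storage controller:

# Conceptual sketch of LUN masking / Selective Storage Presentation.
# The storage controller only presents a LUN to HBAs whose WWN is listed.
# WWNs and LUN numbers are made up for illustration.

presentation = {
    0: {"50:06:0b:00:00:c2:62:01", "50:06:0b:00:00:c2:62:02"},  # quorum disk
    1: {"50:06:0b:00:00:c2:62:01", "50:06:0b:00:00:c2:62:02"},  # shared data disk
}

def visible_luns(hba_wwn):
    """Return the LUNs the storage system would present to this HBA."""
    return sorted(lun for lun, allowed in presentation.items() if hba_wwn in allowed)

# Same HBA after a server rebuild -> same WWN -> same LUNs presented:
print(visible_luns("50:06:0b:00:00:c2:62:02"))  # [0, 1]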
Since this is done on the storage system, and the only thing that has changed is the server, I don't think it's an SSP issue. I also don't think it's got anything to do with the zoning in the switches. Zoning would not be affected by a server rebuild.
Since you say you can see the "SAN" in Compaq Array Manager, I will assume you have an MSA1000 storage system. It's the only storage system that can be managed that way. In any case, I'd check what driver you're using for the host bus adapter (under SCSI adapters). Do not use what Windows uses by default, check the other server for details.

Also, you mention this is a cluster. You might very well be in a situation where the remaining server has claimed the disks for itself. This is how clustering in Windows (and most other OSes) works. It's called a SCSI reservation. This is most likely the case, and if so, you will have trouble seeing the disks from the newly rebuilt server. Have you tried adding the new server as a new node to the cluster?
Have you evicted the old server? Obviously it's the same physical box, but Windows clustering doesn't know that.

In Windows Disk Manager, do you see disk volumes as "missing, unreadable" or anything like that? That would suggest the server is seeing something, but since the other server is keeping the disks locked, it can't access them.
This is how it looks after a failover in the cluster, but since this is a brand-new build, you might not see the disks at all in Disk Manager, since the server has never "seen" them.
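If it helps, here's a toy Python model of that reservation behaviour. The node and disk names are made up, and the real mechanism lives in the storage stack and drivers rather than in application code; it just shows why the rebuilt node gets locked out:

# Toy model of a SCSI reservation on a shared cluster disk.
# Purely conceptual: in a real cluster the reservation is held at the
# storage/driver level, not by code like this.

class SharedDisk:
    def __init__(self, name):
        self.name = name
        self.reserved_by = None

    def reserve(self, node):
        if self.reserved_by in (None, node):
            self.reserved_by = node
            return True
        return False                      # reservation conflict

    def read(self, node):
        if self.reserved_by not in (None, node):
            raise PermissionError(f"{self.name} is reserved by {self.reserved_by}")
        return f"{node} read {self.name}"

quorum = SharedDisk("Q:")
quorum.reserve("NODE-A")                  # surviving node holds the disk
print(quorum.read("NODE-A"))              # fine
try:
    quorum.read("NODE-B")                 # freshly rebuilt node is locked out
except PermissionError as err:
    print(err)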

My course of action would be to:
a) Check the driver on the host bus adapter
b) Check the SSP on the storage system to be sure
c) Try to re-add the node in the cluster

Hope that helps
/charles
 
It sounds to me like you have not configured your HBA (Host Bus Adapter) properly.

What is the make and model of your HBA?

Is this a new HBA or an existing one that you are using for the rebuild?
 
The adapter is a Compaq StorageWorks RA4000 controller. It's not new; it's the same one that was always there. There are two for redundancy. I would think it would have held its config information. I can see the drives in Device Manager, just not in Disk Management or Windows Explorer. Thanks for your help.
 
Also, I did evict the old server from the cluster, then I joined the new server to the cluster. In cluster management it shows the drives as resources for the old server, but nothing in resources for the new server.
 
Ah, you have an RA4000 storage system. The RA4000 controller is the actual controller in the storage box. This is _not_ the Host Bus Adapter (HBA). The HBA is installed in your server as a PCI card. Since the RA4000/4100 is discontinued, I can't locate any updated support documentation just now, but looking in my archives I see that the HBA is called "Compaq StorageWorks Fibre Channel Host Adapter/P" or "Compaq StorageWorks Fibre Channel Host Adapter/E".
The correct drivers should be present on any recent Smart Start CD.

You also mention you have two RA4000 for redundancy. By this I assume you mean you have two RA4000 controllers in the same physical box, not two separate physical boxes. This means you have two paths to the storage system, and therefore you _must_ install the SecurePath multipath driver if you haven't done so already.
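Roughly speaking, the multipath driver presents the two paths as one logical disk and fails over between them. This is just a conceptual Python sketch (the controller names are invented, and it is not SecurePath's actual interface):

# Rough sketch of what a multipath driver does conceptually: two physical
# paths (one per controller) are presented as a single logical disk, with
# failover if the active path dies. Names are invented for illustration.

class MultipathDisk:
    def __init__(self, paths):
        self.paths = list(paths)          # e.g. one path per controller
        self.active = 0                   # index of the path currently in use

    def io(self, request):
        for _ in range(len(self.paths)):
            path = self.paths[self.active]
            if path["up"]:
                return f"{request} via {path['name']}"
            # active path is down: fail over to the next one
            self.active = (self.active + 1) % len(self.paths)
        raise IOError("all paths to the storage system are down")

disk = MultipathDisk([
    {"name": "controller-A", "up": True},
    {"name": "controller-B", "up": True},
])
print(disk.io("read LUN 1"))              # goes via controller-A
disk.paths[0]["up"] = False               # simulate losing one controller
print(disk.io("read LUN 1"))              # fails over to controller-B

Without a multipath driver, Windows would see the same LUNs once per path, which is why the driver is required in a dual-controller setup.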
/charles
 
I have installed Secure Path as well, but that didn't seem to change anything. Is there any special configuration for Secure Path that you know of? Thanks
 
Although you can define agent access and password for SecurePath, these settings are not necessary for the actual functionality.
You haven't mentioned the OS yet, so I'll just assume it's Windows. I have seen this behaviour in similar situations. It all comes back to the fact that SecurePath on the newly installed server can't "see" the LUNs, since they are locked by the other server in the cluster (SCSI reservation). You could try just doing a failover of a group with disk resources in it in the cluster and see if they appear on the new server. However, I doubt the failover will work; I have had bad experiences with these configurations. The only reliable way for me has been to completely shut down both nodes and boot up the new server first so it can access the disks. I realize this is exactly what you don't want to do, and there might be another way; however, I'm not privy to any such method. You'll just have to schedule a short maintenance window and send the bill to Bill. :)
/charles
 
To rejoin a server to a cluster and get it to access the shared drives afterwards, you need to run a utility on each server, which you can download from Microsoft's site. The file I have is called clusterrecovery.msi. There's also a similar utility in the Server Resource Kit.

You will need to shut down both servers, bring one up, run the utility, shut it down, bring up the other, run the utility, shut it down, then bring up both.
 
Is this cluster Active/Active or Active/Passive? If it's Active/Passive, then your newly rebuilt node will not "see" the disks, since your active node has control of them. To prove this, you will need to fail over, which will cause downtime of several seconds to several minutes if configured correctly. Go into Cluster Administrator in Windows and move your resource group from your active node to the passive one. This is from my experience using Dell PowerVault 220S units and Windows 2000 Active/Passive configurations.
 
Hi,
is this still an issue? I'm jumping in a bit late on this thread. I would suggest, as a matter of course, first determining where in the IO chain the connectivity to the storage is falling down. I would first check to see if the HBA has successfully logged into the SAN. This can be checked on most switches by looking at a port view; it should give details of the WWPN which has logged in, and also whether it is an F port. This is a good place to start: if you do not see a successful login from the HBA, then concentrate your efforts on the server and the cabling infrastructure to the switch, and also check that the switch port is unblocked.
If the HBA has logged in, verify your zoning, and if that's all okay, verify that your storage device is logging in.
Once you know where the issue lies in the IO chain, you will find the problem easier to debug.
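If it helps to structure that, here's a rough Python skeleton for walking the IO chain in order and stopping at the first failure. The check functions are just placeholders for things you verify by hand on the switch, in the zoning config, and on the storage system; none of them are real APIs:

# Skeleton for walking the IO chain in order and stopping at the first failure.
# Each check stands in for something you verify manually; replace the return
# values with what you actually observe.

def hba_logged_in_to_fabric():
    # Switch port view: is the HBA's WWPN logged in, and is it an F port?
    return True

def zoning_includes_hba_and_storage():
    # Are the HBA WWPN and the storage port in the same active zone?
    return True

def storage_logged_in_to_fabric():
    # Does the storage controller port also show a fabric login?
    return True

checks = [
    ("HBA fabric login (server, cabling, switch port)", hba_logged_in_to_fabric),
    ("Zoning (HBA and storage in the same zone)", zoning_includes_hba_and_storage),
    ("Storage device fabric login", storage_logged_in_to_fabric),
]

for description, check in checks:
    if not check():
        print(f"Problem is at or before: {description}")
        break
else:
    print("Fabric side looks fine; look higher up (LUN masking, driver, cluster).")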
all the best,
Colin,
 