ServeRAID 6M - Windows 2003 - Disks re-synch ALWAYS on failover...


iPhone (Programmer, CA) · Mar 11, 2009
Hi All,

We have a new IBM ServeRAID 6M setup on Windows Server 2003. It is a two-node cluster that was set up recently.

The problem is that when it fails over... all clustered disks re-synch.

Does anyone have experience with this?

John
 
hi,

Which type of storage have you attached to the hosts?

The way clustering is achieved with IBM ServeRAID is quite different from other solutions.

Typically the host has just a plain SCSI card, not a RAID one: the RAID controller is in the storage enclosure. It is the storage that knows the array layout and applies the RAID policy
(e.g. disks 0 and 1 in RAID-1, disks 2, 3, 4 in RAID-5).

With ServeRAID cards (they are not simple SCSI adapters but also have RAID capability), BOTH cards MUST know the same ARRAY LAYOUT.

When a failover happens (suppose you reboot host2), a SCSI bus reset is sent and host1 takes control of the virtual disks. When host2 comes back up, its ServeRAID card sees the disks as lost at the BIOS level, because the OS has not loaded yet and the IBM ServeRAID Windows device driver is not started. Once the OS comes up and the IBM cluster component starts, everything returns to normal.
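
In rough pseudo-code, the requirement on the two cards looks like this (a toy Python sketch; the names are illustrative, not IBM's implementation):

    # Toy sketch of the dual-card requirement; illustrative only.
    ARRAY_LAYOUT = {                       # both cards must agree on this
        "vol1": {"raid": 1, "disks": [0, 1]},
        "vol2": {"raid": 5, "disks": [2, 3, 4]},
    }

    class ServeRaidCard:
        def __init__(self, name, scsi_id, layout):
            self.name, self.scsi_id = name, scsi_id
            self.layout = dict(layout)     # each card keeps its own copy
            self.owned = set()             # volumes this card currently exports

    def failover(survivor, failed):
        """SCSI bus reset: the survivor takes over the failed node's volumes."""
        if survivor.layout != failed.layout:
            raise RuntimeError("cards disagree on array layout - takeover unsafe")
        survivor.owned |= failed.owned
        failed.owned.clear()

    host1 = ServeRaidCard("ServerA", scsi_id=7, layout=ARRAY_LAYOUT)
    host2 = ServeRaidCard("ServerB", scsi_id=6, layout=ARRAY_LAYOUT)
    host1.owned, host2.owned = {"vol1"}, {"vol2"}

    failover(host1, host2)                 # reboot host2: host1 takes it all
    print(sorted(host1.owned))             # ['vol1', 'vol2']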

I don't know your situation, whether you are in production or in a test phase. Tell us also the BIOS level of the card, the version of the Windows device driver, and the type/model of the servers.

Some years ago I wrote something about this in faq491-3942. The IBM link inside it is no longer available.

You can download CD version 6.0 (version 7 sometimes does not work with ServeRAID 6-series cards on some servers).

Also follow the link.

I hope this is useful to you,
ciao
vittorio
 
Hi VictorV,

Thank you very much for your comments. I am especially interested in the point you make about BOTH cards needing to know the configuration of the array sub-system.

Are you saying that, when configuring, both ServeRAID cards should have the physical disks, arrays, and logical disks 'logically defined'? The documentation refers ONLY to Server A in this regard and does not mention Server B as requiring this setup.

I am reading your material carefully... thanks,
John
 
hi John,
while you read the docs and browse the sites,
please send us the missing info:

- storage type and disk layout (5HDs: 0,1=RAID1 2-4=RAID5)
- servers Model-Type (IBM 3650 9999-4HG, HP DL380...)
- ServerRAID FW Level
- ServerRAID Device Driver version
- Is this a fresh setup, or are you in production with user data?
- Did you install everything yourselves, or are you the customer?

vic
 
Hi VictorV,

New clean installation... it is to be a test environment for MS Exchange stuff.

- 2 IBM xSeries 336
- 2 × integrated LSI RAID (Controller 1, 2 channels)
- 2 × ServeRAID 6M (Controller 2, 2 channels)
- EXP 400
- 5 Disks, 3 Logical, 2 RAID-1, 1 RAID-0
- Q Disk Quorum is on Logical Disk 18 GB (RAID-1)
- M Disk (Exchange) on Logical Disk 70 GB (RAID-1)
- N Disk (RAID-0)

- ServeRAID Device Drivers are 7.12.11
- ServeRAID ROM flashed to 7.12.13
- ServeRAID Application = 9.0

- IBM did the initial installation in Japan.
- We flattened it and did a reinstall for technical reasons.
- We are highly technical OS-level developers working on satellite and cellular network software... meaning that we should be able to do this.


We are using Channel 2 on the 6M controller. I read somewhere that Microsoft Clustering has difficulty with LUN identification on the second channel, but I cannot find that document now.

 
The IBM CDs that came with it... and with which it was running correctly... are 7.12.12.
 
Good question...

1: 3 x 70 GB disks came in the EXP400
2: 2 x 18 GB disks were taken out of an xSeries 335

When we received this from our partners in Japan only the 3 (70 GB) disks were in it... we added the others ourselves.

I have not seen the EXP400 complain about the disks.
 
Sorry, maybe I misunderstood your question.

Yes, the configuration of hardware that was working was in fact 7.12.12... and these are the CDs that came with the system.

We did not use those CDs but instead downloaded a marginally newer version (v7.12.13) from the IBM website.
 
Hi VictorV,

I notice that you make the statement "Choose copy configuration from disks" with regard to Server B.

This implies that BOTH cards (A and B) need to have the logical disk definitions in their settings. The copy-from-disk must accomplish something similar to re-entering the configuration on Server B, but there is probably additional information that comes with it.

This could be the difference from what we have done.

Could you confirm that this is what you mean?

Thanks,
John
 
One other perceived difference... the IBM help wizard says:

"Enter a merge-group number for the shared logical drive. This merge-group number must be unique from the partner controller (that is, the merge-group number for any logical drive belonging to the partner controller cannot be the same merge-group number)."

--

Does this mean that if I have merge-group numbers 1, 2, 3 for three logical drives on Server A... then on Server B the merge-group numbers must NOT be 1, 2, 3?

--

As we have not recently explicitly 'configured' Server B for logical drives, we have not validated this statement; but using the ServeRAID application it is apparent that on BOTH nodes, A and B, the merge groups are the same set: 1, 2, 3.
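
To make my question concrete, here is the check implied by a literal reading of the quoted rule (a toy Python sketch of my understanding, not the ServeRAID API):

    def check_merge_groups(groups_a, groups_b):
        """Merge-group numbers that collide between the two controllers."""
        return sorted(set(groups_a) & set(groups_b))

    # What the ServeRAID application shows on our two nodes today:
    print(check_merge_groups({1, 2, 3}, {1, 2, 3}))   # [1, 2, 3] -> all collide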

--

John
 
hi,

As you have noticed, the Cluster button in ServeRAID Manager is active only if you boot from the ServeRAID CD.

The merge-group number is, I believe, a sort of LUN.

Step A: boot server1 from the CD and configure the disk arrays (I think 1 RAID-1 and 1 RAID-5); check that the adapter SCSI ID is 7; select your 2 volumes as shared (Cluster button). I don't remember if you can number the volumes (in any case, check that the RAID-1 is vol1 and the RAID-5 is vol2); enter ServerA as the server name and ServerB as the partner name.

Step B: let the disks synchronize; then shut down Srv1.

Step C: boot Srv2 from the CD and import the information from disk.
(This operation is normally used when your card fails while the disks are good: the disks hold the correct array layout, while the replacement card, fresh from Adaptec, knows nothing about it.) It is as if you had migrated the storage from Srv1 to another, identical, Srv2.

Step D: the previous operation copied everything to the card, but two things are wrong and have to be changed:

1) The card's SCSI ID is 7, while the SCSI ID of the 2nd card must be different: set it to 6.

Warning: no disk can have SCSI ID 6 or 7: the corresponding slots in the EXP400 MUST be empty
(you probably have no problem, because with 5 disks you have used 0, 1, 2, 3, 4).

2) You have to swap ServerA and ServerB between the server-name and partner-name fields.
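
In other words, the end state should satisfy something like this (a toy Python sketch; the field names are illustrative, not the real ServeRAID configuration format):

    # Toy sanity check of the two-card settings after steps A-D.
    cards = {
        "ServerA": {"scsi_id": 7, "servername": "ServerA", "partnername": "ServerB"},
        "ServerB": {"scsi_id": 6, "servername": "ServerB", "partnername": "ServerA"},
    }
    drive_ids = [0, 1, 2, 3, 4]          # SCSI IDs used by the 5 drives

    def validate(cards, drive_ids):
        a, b = cards["ServerA"], cards["ServerB"]
        assert a["scsi_id"] != b["scsi_id"], "adapter IDs must differ (7 and 6)"
        for card in (a, b):
            assert card["scsi_id"] not in drive_ids, "no drive may use ID 6 or 7"
        assert (a["servername"], a["partnername"]) == \
               (b["partnername"], b["servername"]), "names must be mirrored"

    validate(cards, drive_ids)           # passes for the layout above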

Now I have to leave,
see you later

ciao
vittorio
 
It is supposed to re-synch on failover. This is done to ensure data integrity. When a failover occurs, the controller taking control of the other node's logical drives cannot know whether all the stripes are coherent. If there are incoherent stripes (parity doesn't match data) and a drive subsequently fails... the rebuild operation will rebuild data based on incorrect parity and thus rebuild incorrect data. The re-synch closes that exposure.
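
To illustrate the exposure with a toy byte-level RAID-5 model (a sketch only; real controllers work on whole stripes):

    def parity(chunks):
        """XOR parity across the data chunks of one stripe."""
        p = 0
        for c in chunks:
            p ^= c
        return p

    data = [0x11, 0x22, 0x44]            # one stripe's data chunks
    stale_parity = 0x00                  # coherent parity would be 0x77

    # Drive 1 (holding 0x22) fails; rebuild uses the surviving data
    # plus the stale parity and reconstructs the wrong value:
    print(hex(data[0] ^ data[2] ^ stale_parity))   # 0x55, not the stored 0x22

    # The post-failover resync recomputes parity first, so a later
    # rebuild recovers the true data:
    fresh = parity(data)
    assert data[0] ^ data[2] ^ fresh == data[1]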
 
Hi Vittorio,

I believe I understand the process. There is only ONE outstanding question that I have.

The ServeRAID 6M does support 'Import from Disk' in MS Windows 200x cluster mode.

So I am wondering how I create the Logical Disks on Server B. The choices that I can see would be:

1: Manually re-create as performed previously on Server A.
2: Some other method of import or copy...

But I do not see how #2 can be achieved, and I am not sure whether approach #1 is valid either. I mean, the second controller may need to 'copy the data' in some way, as opposed to using a local configuration that is coincidentally the SAME as on Server A.

--


QUESTION:

Can you be specific about how to configure Server B relative to Server A?

--

I have discovered that the IBM documentation did not say to wait for/ensure that Server A had synchronized before rebooting. I believe that this was the problem (or one of them) that we experienced.

--

Thanks,
John
 
OK, I imported the Server A configuration via "Copy from Disks", but this could only be performed from the controller's BIOS ServeRAID application (that is, NOT the IBM Support CD; there was no functionality of this sort on the CD).

--

The Cluster is installed... and re-synching now...

--

Question:
When the second node of the cluster is JOINING an existing node... does THAT node require ownership of the DISKs? I decided to leave ownership of the disks with the operational cluster node (A)... while (B) was joining. [But there is no reference in your procedure or IBM's on whether the joining node needs ownership of the disks during this process.]

--

Question:

Does EACH node have to re-synch the disks?

Is there ONE disk cluster allocation table, or does each controller node have its own?

I am trying to understand how much 'synchronizing' of the disks is required. I cannot find this aspect covered anywhere in the Adaptec/ServeRAID documentation. The cluster looks fully functional, but I believe the RAID may not be, as I would have thought that synchronizing the disks would be performed only once.

I do not think that the Microsoft cluster creation wizard knows about the ServeRAID system, which is why it cannot find 'disks' that qualify as cluster devices. When running the wizard, the EXP400 disks never flash... which suggests to me that Microsoft does not know they exist.

--

I will attempt to apply any thoughts that you have.

Thanks,
John

 
Action performed to cause the failover... was to stop the C32 node.


Node C32 (Machine Name)
--------------------------------------

Log - System

10:13:50: The Cluster Service failed to bring the Resource Group "Cluster Group" completely online or offline.
10:13:50: The Cluster Service is attempting to bring online the Resource Group "Cluster Group".
10:13:49: Cluster node C32 was removed from the active server cluster membership.
Cluster service may have been stopped on the node, the node may have failed,
or the node may have lost communication with the other active server cluster nodes.



Node C31 (Machine Name)
--------------------------------------

Log - System
10:15:38: Cluster service could not join an existing server cluster and could not form a new server cluster. Cluster service has terminated.

10:13:50: Cluster resource 'IPSHA Disk Q:' in Resource Group 'Cluster Group' failed.
10:13:50: The Cluster Service failed to bring the Resource Group "Cluster Group" completely online or offline.
10:13:50: Cluster resource 'IPSHA Disk Q:' in Resource Group 'Cluster Group' failed.
10:13:50: The Cluster Service failed to bring the Resource Group "Cluster Group" completely online or offline.
10:13:50: Cluster resource 'IPSHA Disk Q:' in Resource Group 'Cluster Group' failed.
10:13:50: The Cluster Service failed to bring the Resource Group "Cluster Group" completely online or offline.
10:13:49: The Cluster Service is attempting to bring online the Resource Group "Cluster Group".

Log - Application

01:53:17: Synchronize complete: controller 2, logical drive 2.
10:58:18: Synchronize complete: controller 2, logical drive 1.
10:15:23: Synchronize failed: controller 2, logical drive 2 [66].
10:15:23: Logical drive is offline: controller 2, logical drive 1.
10:15:21: Defunct drive - User accepted: controller 2, channel 2, SCSI ID 0 (FRU Part # 19K1467)..

Notice that a Synchronize is attempted (10:15:23) immediately after the logical drive goes offline.
I would have thought that an attempt would be made to bring an offline drive back online.
 
hi,
Even if you have imported (or copied) the configuration from disk using the embedded BIOS, you still have to change the SCSI ID to 6 (you can probably use the BIOS again for that); but to set host=ServerB and partner=ServerA in the cluster menu, you need to use the ServeRAID CD. (I am sure there is also a menu entry like "copy cfg from disk"; the label is probably not exactly that, but that is the sense of it.)

All these operations have to be done BEFORE beginning the MSCS setup.

bye
victor
 
Yes, I believe I have a good understanding of SCSI devices. I do not see any SCSI conflicts... I think ServeRAID would report them in some way.

--

I just read a comment from Microsoft about DISK SCAN (BIOS controlled) on reboot. They suggest turning this off.

--
 
Question:

I have only two (2) clustered drives in this instance. Quorum on Q: and Mail Disk on M:.

Now... the Q drive can be seen from BOTH nodes, while the M drive can only be seen from the active node... using the Computer Management app.

The M: drive is in a resource group by itself.
Is this a problem... does this disk need to be part of another group? MS Exchange is not installed yet.
 
Correction... no... both disks are 'unreadable' on the passive node.

I would think that this is normal. The state may have taken a while to change... in the user interface.
 
Also, the Q disk must be in a separate resource group: the Cluster Group.

About the SCSI IDs, I don't understand whether you have set one to 6 and the other to 7.
I have a PDF in my archive about cluster configuration on the 6M.
You can find 3 PDFs in the 'books' folder of the ServeRAID support CD.

On the web I have found the equivalent document for the 4x series, but the content is the same:
ftp://ftp.software.ibm.com/systems/support/system_x/19k6408.pdf, see "Controller considerations" on page 55.

bye
vic
 