
How to check the progress of a "mirrorvg"


zaxxon
Hello,

I started a mirrorvg on a VG that is about 1.8 TB in size. The box has 4 CPUs, each 1.45 GHz (not very busy), and lots of free RAM. The disks in the corresponding VG are in a Clariion CX700 disk subsystem, connected via FC and using EMC PowerPath, which has a large read/write cache as well (1 GB/2 GB). The corresponding LUN is a RAID 5 group, slow at writing, yes.

The mirrorvg has now been running for about 2-3 days and smit is not back yet, still showing "running"; I started it in the foreground. The box has had plenty of hours with close to zero traffic in which to complete its mirroring.

There is also some traffic (not really much, at most about 20 MB per second) visible with iostat on the relevant disks (hdiskpower), but my question is:

Is there any way to check how far along the "mirrorvg" is? Any command regarding the VG tells me that the VG is locked. So I have no clue how to tell when my smit window will come back with status "ok" and all will be done...

Thanks in advance for any help.

laters
zaxxon
 
try:

lsvg VG


and look at how many STALE (not yet synced) PPs are left.

Running:

lsvg -l VG

you will see which LVs are synced and which are not. It will also show you the mirroring status, i.e. whether the LVs are mirrored.
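
For example (a sketch; "datavg" is a placeholder for your VG name):

Code:
# summary count of stale physical partitions in the VG
lsvg datavg | grep "STALE PPs"
# per-LV view: LV STATE shows open/syncd when mirrored and in sync,
# open/stale while partitions still need syncing
lsvg -l datavg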
 
Hi ogniemi,

as I said, I always get an error/warning when running the usual ls* commands against the VG. Example:

Code:
root@srarhv05:/repos> lsvg tsmdata2vg
0516-1201 lsvg: Warning: Volume group tsmdata2vg is locked. This command
        will continue retries until lock is free.  If lock is inadvertent
        and needs to be removed, execute 'chvg -u tsmdata2vg'.

laters
zaxxon
 
I am also not sure what will happen to the data that has to be mirrored if I cancel the job that is still "running" in smit.
If there were no problem in cancelling it and restarting it with the "background" option instead of my stupid "foreground" option, then I would not care that much what it does in the background, and I wouldn't have to fear what will happen if my ssh session collapses...

laters
zaxxon
 
You really don't need to worry about your ssh session failing. The mirroring will not hurt your good copy of the data. If it fails you can just restart the process and it will pretty much pick up where it left off. The only way I know of to see how far the mirroring has gotten is to look through the currently running processes for the sync process (can't remember the exact name of the process right now) running on an LV. The mirroring seems to go sequentially through the VG in the same order that the LVs are displayed when you run lsvg -l VGname.


Jim Hirschauer
 
I may be wrong, but if your ssh session 'collapses', the background job will die too. You can circumvent this by preceding the background job with nohup, so that the job isn't killed if the original session is interrupted.
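
For example (a sketch; syncvg as the resync command and the VG name are taken from this thread):

Code:
# run the resync detached from the terminal, logging output to a file,
# so a dropped ssh session cannot kill it
nohup /usr/sbin/syncvg -v tsmdata2vg > /tmp/syncvg.out 2>&1 &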
 
Thanks Jim, good to know the original data won't be harmed. I can see syncd running, but it doesn't have a big load, nor can I see which LV it is currently processing. It's no big problem so far, since I am happy the original data will not be corrupted :)
I will wait until next week if the VG is still locked; maybe it just needs its time, since 1.8 TB is a lot of data.


Thanks Ken for the info :)

laters
zaxxon
 
zaxxon,

I am not talking about the syncd process. It is another process that will show up as a child process of your smit session. It is a low-level process and I just can't remember its name right now. If you look for child processes of your smit session you should be able to find it.
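
Something like this should turn it up (a sketch; replace 12345 with your smit session's actual PID):

Code:
# list the children of the smit session (field 3 of ps -ef is the PPID)
ps -ef | awk '$3 == 12345'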


Jim Hirschauer
 
the processes during PP syncing are:

Code:
root 21018 1 0 14:55:19 pts/1 0:00 /bin/ksh /usr/sbin/syncvg -v data
root 22600 21018 5 14:55:20 pts/1 0:00 lresynclv -l 00c0e39400004c00000001021c1d03d4
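
So a quick check boils down to something like (a sketch, using the process names above):

Code:
ps -ef | grep -E "syncvg|lresynclv" | grep -v grep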
 
I guess you mean something like:

Code:
/bin/ksh /etc/rmlvcopy lvtsmdata2 1

and maybe:

Code:
putlvodm -k 005c717c00004c0000000106c2b594e5 -X 0

?

Since I have only one LV in the entire VG, it does not really matter which LV is currently being processed, but in general it is a handy thing to look for these processes, yes. And thanks again.

laters
zaxxon
 
Hi,

When your VG is locked you can use this command:

lsvg -L -l rootvg

and then check the number of stale PPs; that way you can get an idea of the time needed.
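
For example (a sketch; VG name from this thread):

Code:
# -L skips the lock check, so this returns even while mirrorvg holds the VG lock
lsvg -L tsmdata2vg | grep "STALE PPs"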
 
Uh oh... I just checked for some processes:

Code:
root@srarhv05:/tmp/reusch> ps -ef| grep -iE "vg|lv|sync"
    root  344308       1   0   Oct 30      - 15:49 /usr/sbin/syncd 60
    root  463096       1   0   Oct 30      -  0:00 /usr/bin/rsync --daemon
    root  631032 1253600   0   Nov 01      -  0:02 lsvg -p tsmdata2vg
    root  634988  708666   0 16:18:24  pts/4  0:00 grep -iE vg|lv|sync
    root  827494  843852   0 14:58:09  pts/3  0:00 /bin/ksh /etc/rmlvcopy lvtsmdata2 1
    root  843852  647356   0   Oct 31  pts/3  0:00 /bin/ksh /usr/sbin/mirrorvg tsmdata2vg
    root 1364074 1339620   0 06:05:27      -  0:00 lsvg -p tsmdata2vg
    root 1417344  827494   0 14:58:10  pts/3  0:00 putlvodm -k 005c717c00004c0000000106c2b594e5 -X 0

Looks like it is still trying to mirror, but it is running an rmlvcopy? The start time of the rmlvcopy could be the time when I cancelled the smit job by closing the session, since ctrl+d didn't work... not sure though.

I am not totally sure what is going on at the moment..

Here is the "lsvg -L -l", btw:

Code:
root@srarhv05:/tmp/reusch> lsvg -L -l tsmdata2vg
tsmdata2vg:
LV NAME             TYPE       LPs   PPs   PVs  LV STATE      MOUNT POINT
lvtsmdata2          jfs2       1781  3562  2    open/stale    /cmarchiv2/tsmdata/tsmdata2
loglv02             jfs2log    1     2     2    open/stale    N/A

laters
zaxxon
 
I also just found some commands "hanging", like some lsvg, prtconf and so on. I killed them; they were started by cron during the night and were just for some reporting.

Do you think it is a good idea to kill one, some, or all of these processes:
Code:
    root  827494  843852   0 14:58:09  pts/3  0:00 /bin/ksh /etc/rmlvcopy lvtsmdata2 1
    root  843852  647356   0   Oct 31  pts/3  0:00 /bin/ksh /usr/sbin/mirrorvg tsmdata2vg
    root 1364074 1339620   0 06:05:27      -  0:00 lsvg -p tsmdata2vg
    root 1417344  827494   0 14:58:10  pts/3  0:00 putlvodm -k 005c717c00004c0000000106c2b594e5 -X 0

.. and start over with a syncvg?

laters
zaxxon
 
...you can use lsvg -L rootvg and watch the stale PP count increase or decrease.
Once I wrote a small script that launches this command, lsvg -L rootvg, every 10 minutes.
After one hour you have an estimate of the rate at which PPs are copied.
Example: in 1 hour, 300 PPs have been mirrored.
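
A minimal sketch of such a loop (ksh; the VG name and interval are examples):

Code:
#!/bin/ksh
# log a timestamped stale-PP count every 10 minutes; after an hour,
# the differences between entries give the PPs-mirrored-per-hour rate
while true
do
    echo "$(date): $(lsvg -L tsmdata2vg | grep 'STALE PPs')"
    sleep 600
done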

try it

Cheers
Adimstec
 
The processes I listed above were all remnants of my dead smit session. I killed the session's process and after that all of its children. There was still a zombie left (putlvodm...) that I couldn't kill even with SIGKILL, so I had to reboot.
The result was 563 stale PPs, each 1024 MB in size. I started a syncvg and it ran over the whole night; there are now 283 stale PPs left, and it is still running.

Thanks everyone for the advice - you helped me a lot!

laters
zaxxon
 
zaxxon, just out of curiosity, how long did it take to mirror that 1.8 TB VG? And also, why were you mirroring to a RAID device? Were you just trying to move the data to the Clariion?
 
Yes, at first it seems ridiculous that we mirror onto another RAID-secured disk. The problem is that this data is TSM (Tivoli Storage Manager) data that is very crucial for our company and may not get lost. It was originally stored on large HP SureStore WORM libraries and was moved within TSM over to a large disk storage pool. If the "primary" Clariion CX700 fails, and I don't mean that only a disk in the RAID 5 array fails or something minor like that, all the data would be lost, since our backup to WORM (VolSafe) tapes is not working right now.
I had a TSM scheduled copy job (backup to a copy storage pool) running before setting up the mirror, but the mirror was chosen because TSM was awfully slow doing the backup of the primary disk pool; setting up the mirror via AIX was much faster and keeps the copy permanently consistent, rather than relying on schedules running from time to time.

The time it took to get the mirror going was about 3-4 days, if I remember correctly, including the sync of the stale PPs (1024 MB PP size).
I will have to mirror another 1.8 TB - if I don't forget, I can give more accurate timing info then ;)

laters
zaxxon
 