AIX connection problem - help needed

Status
Not open for further replies.

MCubitt

Programmer
Mar 14, 2002
1,081
GB
We're running an AIX 5.1 server with Oracle 9.2.0.1.0 running on it and several databases therein.

The AIX appears to log itself out of our network every now and again.

I am unable to telnet in (Connection to host lost.) and even going to the physical server, I see the following message:
Action: StartDTscreenBlank
An attempt to start a new process on host "Servername" failed.
To continue, you may need to stop an unneeded process on this host.

I am unable to close the message window.

If I attempt to start a terminal session on the server I get the message again.

I am forced to Exit -> Logout and restart.

This is a weekly event, but not at the same time or day each week (though it tends to be overnight).

Are there any error logs or system logs I can check? Has anyone experience of this problem, and a resolution?

 
Bi:
Thanks. I will do that.
I use SMIT sometimes, is it the same as SMITTY? Looks like it. Presumably just an alias on the command name.
There is no core file in /usr, but good suggestion. I did a listing of all LARGE files in /usr by size. No idea what's what at the moment so STILL waiting for maintenance man.

Using
find /usr -xdev -type f -size +1000 -ls | sort +6n -r > /usrfiles.txt
the files were...

< SIZE > < DATE > < PATH & FILENAME >
42823736 Sep 17 2002 /usr/websm/pc_client/setup.exe
39794033 Apr 17 2001 /usr/lpp/db2_07_01/lib/libdb2e.a
33980416 Aug 27 20:33 /usr/tivoli/tsm/client/ba/bin/dsmsched.log
32027648 Feb 8 2002 /usr/lpp/X11/lib/X11/fonts/TrueType/mtsansds.ttf
30325492 Feb 8 2002 /usr/lpp/X11/lib/X11/fonts/TrueType/tnrwt_k.ttf
29833720 Feb 8 2002 /usr/lpp/X11/lib/X11/fonts/TrueType/tnrwt_j.ttf
29813484 Feb 8 2002 /usr/lpp/X11/lib/X11/fonts/TrueType/tnrwt_s.ttf
29373140 Feb 8 2002 /usr/lpp/X11/lib/X11/fonts/TrueType/tnrwt_t.ttf
28672000 Aug 11 18:57 /usr/tivoli/tsm/client/ba/bin/nohup.out
28567840 Feb 8 2002 /usr/lpp/X11/lib/X11/fonts/TrueType/mtsansdt.ttf
27311812 Feb 8 2002 /usr/lpp/X11/lib/X11/fonts/TrueType/mtsansdk.ttf
26209312 Feb 8 2002 /usr/lpp/X11/lib/X11/fonts/TrueType/mtsansdj.ttf
17153908 May 21 2002 /usr/java130/jre/bin/libjitc_g.a
13865695 Apr 5 2001 /usr/IMNSearch/lib/libite.a
11850677 Sep 10 2002 /usr/sbin/rsct/bin/hagsd
11614146 Feb 20 2003 /usr/lpp/X11/lib/R6/Motif2.1/libXm.a
10285403 Apr 13 2001 /usr/lpp/bos/AIX_file_list
10055680 Aug 5 12:22 /usr/lib/objrepos/inventory
9902649 Apr 17 2001 /usr/lpp/db2_07_01/lib/libdb2.a
9640617 Apr 17 2001 /usr/lpp/db2_07_01/lib/db2_36.o
9580421 Apr 17 2001 /usr/lpp/db2_07_01/lib/db2.o
9203206 Aug 28 2002 /usr/sbin/ikedb
8547478 Sep 20 2002 /usr/lpp/bos.alt_disk_install/boot_images/bosboot.disk.chrp
8547478 Sep 20 2002 /usr/lpp/bos.alt_disk_install/boot_images/chrp_5.1.0_boot
8524931 Apr 17 2001 /usr/lpp/db2_07_01/cfg/mq/ma0f_ax.tar.Z
8315137 Oct 16 2002 /usr/lib/boot/unix_64
7998508 Oct 16 2002 /usr/lib/boot/unix_mp
7918047 Apr 17 2001 /usr/lpp/db2_07_01/doc/en_US/db2s71en.tar.Z
7799810 Apr 18 2002 /usr/lpp/OpenGL/lib/PPC/libglpipe_PPC64++.a
7740131 Nov 20 2001 /usr/lpp/OpenGL/lib/PPC/libglpipe_PPC+.a
7587478 Nov 11 2001 /usr/ldap/web/cgi-bin/ldacgie.exe
7533259 May 21 2002 /usr/java130/jre/lib/rt.jar
7524627 Oct 8 2001 /usr/ldap/java/lib/rt.jar
7170040 Aug 29 2002 /usr/ccs/lib/libp/libc.a
7165788 Jul 11 2002 /usr/lpp/OpenGL/lib/PPC/libglpipe_PPC++.a
6993939 Sep 20 2002 /usr/lpp/bos.alt_disk_install/boot_images/bosboot.disk.rspc
6993939 Sep 20 2002 /usr/lpp/bos.alt_disk_install/boot_images/rspc_5.1.0_boot
6921956 Jul 11 2002 /usr/lpp/OpenGL/lib/P1/libglpipe++.a
6921222 Sep 20 2002 /usr/lpp/bos.alt_disk_install/boot_images/bosboot.disk.rs6k
6921222 Sep 20 2002 /usr/lpp/bos.alt_disk_install/boot_images/rs6k_5.1.0_boot
6609447 Aug 29 2002 /usr/ccs/lib/libc.a
6609447 Aug 29 2002 /usr/lib/libs.a
6486086 Apr 5 2001 /usr/IMNSearch/lib/libcxxbase.a
6478769 Jul 11 2002 /usr/lpp/OpenGL/lib/P1/libglrassft++.a
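
For anyone repeating this listing: the +6n key in the sort above is the older positional syntax. A minimal sketch of the same idea using the POSIX -k key form follows. The function name largest_files is made up for illustration, and it assumes ls -l prints the size in bytes in its fifth column. The -size +1000 test counts 512-byte blocks, so it matches files larger than roughly 500 KB.

```shell
# largest_files DIR [MIN_BLOCKS]
# Lists regular files under DIR larger than MIN_BLOCKS 512-byte blocks
# (default 1000), biggest first, without crossing into other filesystems.
largest_files() {
    dir=$1
    min_blocks=${2:-1000}
    find "$dir" -xdev -type f -size +"$min_blocks" -exec ls -l {} + |
        sort -k5,5nr
}

# Example: largest_files /usr > /tmp/usrfiles.txt
```

Writing the report to /tmp rather than / keeps it out of the root filesystem.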

 
Looks like this

/usr/tivoli/tsm/client/ba/bin/dsmsched.log

could do with a trim. Is it a text file?

'file dsmsched.log'

Alex
 
Generally, you don't want to get rid of anything in /usr -- that's where all your software is. I was even hesitant about suggesting you look for a core file and remove it.

And you are sure you don't have any more room left in rootvg? (lsvg rootvg) Even adding one or two physical partitions may help.

And yes, smit and smitty are the same thing -- smit is graphical, smitty is not.

 
Yes, we are unsure if we can trim that file. Clearly a log file that is rather large is a prime candidate.

/usr/tivoli/tsm/client/ba/bin/dsmsched.log: commands text

It seems to log every file backed up!


I am FTPing to my PC to further interrogate the file.
 
I cleared the log file and am still running very high:
/dev/hd2 7471104 66448 100% 69294 8% /usr

The results from lsvg rootvg:

VOLUME GROUP: rootvg VG IDENTIFIER: 005d87ba00004c00000000f37bda8054
VG STATE: active PP SIZE: 32 megabyte(s)
VG PERMISSION: read/write TOTAL PPs: 1084 (34688 megabytes)
MAX LVs: 256 FREE PPs: 296 (9472 megabytes)
LVs: 11 USED PPs: 788 (25216 megabytes)
OPEN LVs: 10 QUORUM: 1
TOTAL PVs: 2 VG DESCRIPTORS: 3
STALE PVs: 0 STALE PPs: 0
ACTIVE PVs: 2 AUTO ON: yes
MAX PPs per PV: 1016 MAX PVs: 32
LTG size: 128 kb AUTO SYNC: no
HOT SPARE: no

 
I would think you could trim at least from the top of the log file.

There should be a configuration file (dsm.opt, I believe) somewhere for TSM that will allow you to change the location of that log file. Although it is the default to put it in /usr, I personally would put it somewhere else.

Are you the one in charge of backups too? The TSM Client User's manual says you can change the location of that file with the option:

schedlogname /path/to/log/dir/dsmsched.log

Then I'd also check to see that the change was made in dsm.opt.
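
Trimming from the top amounts to keeping only the last N lines. A rough sketch follows, assuming nothing is writing to the log while it runs (ideally the TSM scheduler is stopped first). The function name trim_log and the 1000-line default are illustrative choices, not anything TSM provides. The temporary copy goes to /tmp (or TMPDIR) because /usr has no room to spare, and the result is written back into the existing file so ownership and permissions stay as they were.

```shell
# trim_log FILE [KEEP_LINES]
# Keeps only the last KEEP_LINES lines of FILE (default 1000).
trim_log() {
    log=$1
    keep=${2:-1000}
    tmp="${TMPDIR:-/tmp}/trim_log.$$"
    # Save the tail elsewhere, then overwrite the original in place.
    if tail -n "$keep" "$log" > "$tmp"; then
        cat "$tmp" > "$log"
    fi
    rm -f "$tmp"
}

# Example: trim_log /usr/tivoli/tsm/client/ba/bin/dsmsched.log 5000
```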

 
Bi:
In actual fact we are not currently backing up our server (yes, shocking I know - I just found out.) Our server admin people (who are responsible for backups) tell me they are waiting for a decision to be made on exactly what, when and how.

The server is not (yet) a production server but is being used for piloting and testing.

I cleared the log file right down so the problem of pointing the log file elsewhere can be handled later.

Anyway, even with that log file gone the device is 100% full.



 
Our messages crossed.

You've got 296 free physical partitions in rootvg, so unless those free partitions are all on one disk and your LVs are mirrored, you should be able to extend the filesystem.

Type lsvg -p rootvg to find the hdisks that contain rootvg.

Then type lspv <hdisk name> for each of the disks where hdisk name is the name of the one of the disks you found with the lsvg command.

As long as you have at least 2 free physical partitions, you should be able to run lvw's suggested command:

chfs -a size=+100000 /usr

But this filesystem will probably fill up quickly again until you move that log file.
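
As a rough check on the "2 free physical partitions" figure: chfs sizes are given in 512-byte blocks, and the filesystem grows in whole physical partitions, so the request is rounded up to the next multiple of the 32 MB PP size shown in the lsvg output. A small sketch of the arithmetic (the function name pps_needed is made up for illustration):

```shell
# pps_needed BLOCKS PP_MB
# Prints how many physical partitions of PP_MB megabytes are needed
# to hold BLOCKS 512-byte blocks, rounding up.
pps_needed() {
    blocks=$1
    pp_mb=$2
    bytes=$((blocks * 512))
    pp_bytes=$((pp_mb * 1024 * 1024))
    echo $(( (bytes + pp_bytes - 1) / pp_bytes ))
}

pps_needed 100000 32   # prints 2
```

So +100000 asks for about 49 MB and should cost two PPs (64 MB). If hd2 is mirrored, each copy would need its own two PPs.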
 
Yes, it's 100% full, but you still have some bytes left.
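
To put numbers on that, take the df line posted earlier (7471104 total, 66448 free) and assume the counts are 512-byte blocks, which is the AIX default for df. The helper name fs_usage is hypothetical:

```shell
# fs_usage TOTAL_BLOCKS FREE_BLOCKS
# Prints percent used and megabytes free, given 512-byte block counts.
fs_usage() {
    awk -v total="$1" -v free="$2" 'BEGIN {
        printf "%.1f%% used, %.1f MB free\n",
               (total - free) * 100 / total, free * 512 / 1048576
    }'
}

fs_usage 7471104 66448   # prints: 99.1% used, 32.4 MB free
```

df shows that as 100%, but there is still about one physical partition's worth of space in the filesystem.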

Usually, with AIX, as long as you have some room in /usr, you don't have to worry too much because it will automatically extend the filesystem when you install software through smit or with installp.

But you get in trouble if you install software that doesn't use smit to install and puts the software into /usr anyway.

I bet every sys admin reading this forum can tell a horror story about backups. I work in a place where the person holding the purse strings didn't want to spend money on tapes for backing up my systems because the servers were "just development".

True enough, but we develop software -- so development is our production!
 
Oh, and you'll get in full filesystem trouble when that log builds up again.
 
lsvg -p rootvg gave me:
rootvg:
PV_NAME   PV STATE   TOTAL PPs   FREE PPs   FREE DISTRIBUTION
hdisk0    active     542         296        79..00..00..108..109
hdisk1    active     542         0          00..00..00..00..00

lspv on hdisk0:
PHYSICAL VOLUME: hdisk0 VOLUME GROUP: rootvg
PV IDENTIFIER: 005d87ba6df8c255 VG IDENTIFIER 005d87ba00004c00000000f37bda8054
PV STATE: active
STALE PARTITIONS: 0 ALLOCATABLE: yes
PP SIZE: 32 megabyte(s) LOGICAL VOLUMES: 11
TOTAL PPs: 542 (17344 megabytes) VG DESCRIPTORS: 2
FREE PPs: 296 (9472 megabytes) HOT SPARE: no
USED PPs: 246 (7872 megabytes)
FREE DISTRIBUTION: 79..00..00..108..109
USED DISTRIBUTION: 30..108..108..00..00

and on hdisk1:
PHYSICAL VOLUME: hdisk1 VOLUME GROUP: rootvg
PV IDENTIFIER: 005d87ba7fb593ab VG IDENTIFIER 005d87ba00004c00000000f37bda8054
PV STATE: active
STALE PARTITIONS: 0 ALLOCATABLE: yes
PP SIZE: 32 megabyte(s) LOGICAL VOLUMES: 10
TOTAL PPs: 542 (17344 megabytes) VG DESCRIPTORS: 1
FREE PPs: 0 (0 megabytes) HOT SPARE: no
USED PPs: 542 (17344 megabytes)
FREE DISTRIBUTION: 00..00..00..00..00
USED DISTRIBUTION: 109..108..108..108..109


So only hdisk0 has free space, right?
 
aarrgh. yes.

Are we sure the logical volume for /usr is mirrored? Do an lslv -m hd2 | more. If you get two columns of figures, you're mirrored and you are SOL for now, unless you know of any "junk" filesystems that could be deleted to free up disk space.
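
The reason mirroring matters: each copy of a mirrored logical volume has to grow by the same number of PPs, and with the usual strict allocation each copy lives on its own disk. What counts is then the free space on the emptiest disk, not the volume group total. Here is a sketch that pulls that number out of lsvg -p output, assuming the five-column layout posted above (min_free_pps is a made-up helper name):

```shell
# Reads lsvg -p style output on stdin and prints the smallest FREE PPs value.
# Only rows whose 3rd and 4th fields are numeric (the disk rows) are counted.
min_free_pps() {
    awk '$3 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ {
             if (min == "" || $4 + 0 < min) min = $4 + 0
         }
         END { print min }'
}

# On the AIX box itself: lsvg -p rootvg | min_free_pps
```

With the figures posted for hdisk0 (296) and hdisk1 (0), this prints 0, which is why a mirrored /usr could not be extended.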

Check this FAQ for a script that will produce a report for you so you can easily see where everything is: faq52-2441

You don't have to post the results, but I think it will help you see what you have, where it is, and if you can make room by deleting something.

 
Take a look at this file: /usr/tivoli/tsm/client/ba/bin/nohup.out

You should be able to delete or move that one -- those kinds of files are produced when you use nohup on a command. That one has an August timestamp, so what it was nohupping has long since passed. You'll just gain about 28 MB, but it's better than nothing.

 
Woohoo! Good suggestion Bi. /usr is now at 99% full - so at least there is some free space for the system.

I ran the disk tools and am sure they're helpful to people who understand them. Still waiting for maintenance man but at least we have regained 1%. The system has not had a problem so far. Mind you, the last time was Monday morning and I have now scheduled a weekly restart Sunday mornings. So we will see.

Thanks guys and gals for all your help.
 
Good! I think you are going to have to add another disk to rootvg sometime because you might not have enough room in /usr to add any more software or maintenance level patches.

It is too bad the tivoli software was not installed in its own filesystem named /usr/tivoli. You could still do that, but it would involve having to move the tivoli directory tree to a temporary holding place, creating a filesystem named /usr/tivoli and then copying the tivoli directory tree back.
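
The steps above could look roughly like the sketch below. It only prints the commands, it does not run them. Everything in it is an assumption to check against the man pages on your own system: the crfs flags are the JFS form as I remember them for AIX 5.1, the size (in 512-byte blocks) is a placeholder to replace after measuring the tree with du, and the helper name plan_own_fs is made up. TSM would need to be stopped first.

```shell
# plan_own_fs DIR VG BLOCKS
# Prints (does not run) the commands for giving DIR its own filesystem
# of BLOCKS 512-byte blocks in volume group VG.
plan_own_fs() {
    dir=$1
    vg=$2
    blocks=$3
    cat <<EOF
mv $dir $dir.old
mkdir -p $dir
crfs -v jfs -g $vg -m $dir -a size=$blocks -A yes
mount $dir
cd $dir.old && tar cf - . | (cd $dir && tar xpf -)
# only after checking the copy:
rm -rf $dir.old
EOF
}

plan_own_fs /usr/tivoli rootvg 262144
```

Renaming the tree to /usr/tivoli.old keeps it on the same filesystem, so no extra space is needed for the holding place. The space in /usr is only freed once the old copy is removed.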

Good luck!
 
don't start having a party until you have that environmental error fixed. it probably is a bad fan but that depends on your machine type. a PSU failure can be a total pain.

since this is still in development, maybe you can find some time to break out your apps into separate filesystems, as bi suggested. that way, if TSM goes nuts at least it won't break your OS or any other app when it fills up /opt/tivoli (or whatever you call the fs). of course you will at least need to tar up your apps and move them, so i would suggest using the same directory names as you have now.

it is also a really great idea to put your applications in a separate volume group, which means disks other than hdisk0 and 1 in your case.

i do realize that you probably have no idea what i am talking about =) but i or someone else can give you a few hints on using the LVM to create filesystems.

IBM Certified -- AIX 4.3 Obfuscation
 
Thank you Bi & Yegoley.

The maintenance guy was the one who installed and maintains Tivoli. So hopefully he will be able to sort out the issues we have.

Thanks for all your help, it's appreciated.
 