EDD007 - 11C Unable to EDD

absinthium · Nov 22, 2011

Hi All,

Major issue with 11C running 23.35

I have been unable to run an EDD since February. It fails at the CONFIG step, when the dump tries to write to disk.

~~~~~~~
DB SEQ NUM = 3823
CONFIG
EDD007
~~~~~~~

Looks like we can't write to c:/u as LD 117 update data doesn't work either.

There are obviously a lot of changes in RAM at the moment and an INI will lose 9 months of config changes.

I cannot mkdir in c:/u/ or c:/p/ either.

Here is the DAT output:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
DATABASE ISSUE DATE(d/m/y)/TIME SIZE(recs) SEQNO
Main 2335 17/02/2011 at 01:05:56 84 3562
Secondary database not accessible.
IntBackup 2335 17/02/2011 at 01:05:56 84 3562
PCMCIA database not accessible.
Current external backup is on PCMCIA drive B
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Is there any way to bring this switch back or is this unrecoverable?

Firebird Scrambler · Nov 23, 2011

It sounds like the daughterboard is corrupted. I would try to get into "pdt". If you see errors such as being unable to access the C drive, or "ERROR - Unable to create Access File", then the board is corrupted.

Doing a dump gives TEMU0131 :

(TEMU0131 Problem setting previous database file's information.)

>LD 43
EDD000

.EDD

DB SEQ NUM = 2572
CONFIG
EDD007
.
.
EDD000

TEMU131 Problem setting previous database files information.

CIOD157 INFO: CMDU 0 is ACTIVE, RDUN is ENABLED

Entering pdt gives an access error and shows there's no file structure in the /u directory
------------------------------------------------------------------------------------------------------------------------
PDT: login on /sdi/tty1
Password:
PDT in Progress. Please Wait....

** ERROR - Unable to create Access File.
pdt>
pdt> cd /u
pdt> ll

Directory of '/u':

SIZE DATE TIME NAME
---------- ----------- -------- ------------
512 Jul-30-2008 12:28:58 SMP_DB <DIR>

pdt> cd ..

The suggestion was that a re-install would need to be done. However, the notes below may help.

Dump failure EDD007
The EDD failed at the ALARM_MGT point in the Dump
Associated errors point to problems with the files in the smp_db directory as below :

Extract from the EDD
-------------------------------
PLUGIN
CPND
CPND NM
GPHT
SPECIFIC DATA
HI
ALARM_MGT
EDD007

BUG9011 Can't create/open Alarm Management Database file /u/smp_db/smpconf.tmp.

BUG9011 Can't create/open Alarm Management Database file /u/smp_db/smpserv.tmp.

BUG9008 Error occurred writing Alarm Management Database /u/smp_db.

Navigating to the smp_db directory in pdt and listing its contents showed errors in the directory structure.
pdt> cd smp_db
pdt> ll

Directory of '/u/smp_db':

SIZE DATE TIME NAME
---------- ----------- -------- ------------

pdt>

This is what the directory structure should look like :

pdt> ll

Directory of '/u/smp_db':

SIZE DATE TIME NAME
---------- ----------- -------- ------------
512 Aug-14-2007 15:50:48 . <DIR>
512 Aug-14-2007 15:50:48 .. <DIR>

pdt>

To overcome the problem I deleted the smp_db directory and re-created it as below.
The smp_db directory is off the 'u' directory.

Directory of '/u' showing the smp_db dir

SIZE DATE TIME NAME
---------- ----------- -------- ------------
67 Nov-30-2027 00:03:22 COPYLOOP.DAT
512 Nov-30-2027 00:01:18 LOADWARE <DIR>
1114 Mar-08-2006 19:06:14 KEYCODE
512 Aug-13-1998 15:15:04 DB <DIR>
512 Aug-13-1998 15:15:04 RPT <DIR>
512 Aug-13-1998 15:15:04 PATCH <DIR>
512 Aug-13-1998 15:15:04 SMP_DB <DIR>

pdt>

So, from the 'u' directory, you can re-create the smp_db dir using these commands:
'rmdir smp_db' to remove the dir
'mkdir smp_db' to reprovide the directory.

After this the directory structure looked correct :
pdt> cd /u/smp_db
pdt> ll

Directory of '/u/smp_db':

SIZE DATE TIME NAME
---------- ----------- -------- ------------
512 Aug-14-2007 15:50:48 . <DIR>
512 Aug-14-2007 15:50:48 .. <DIR>

After this pdt change, an EDD was able to access the smp_db dir for read/write functions and the dump completed.

Also look at this below.

Dump fails at the same point with an EDD007:

CPND
GPHT
SPECIFIC DATA
HI
EDD00007

There were no hardware information files within the /U/DB/HI directory and we were unable to copy files to it. Once the HI directory was removed and re-provided we could then copy the relevant files from /P/HIDIR to /U/DB/HI. The data dump was then successful.

I also found the item below, which requires a patch to be fitted at 22.46 and 23.47.

Patch 11848 fixes a problem where EDD007 is output during the midnight dump.
Error Description

CONDITION:
a) Option 11c running 24.04f, daughterboard NTDK81 is equipped
b) Patch MPLR11502 bv82247 is in service
c) Ld 43 in midnight routine.

ACTION:
1) Dump is performed during midnight routine

EXPECTED RESPONSE:
1) Dump is performed with no problems

ACTUAL RESPONSE:

1) EDD 007 is printed, no dump because Flash write failure

DEFECT CAUSE:
The c: drive block driver will become unstable under heavy load, as often seen during MIDN EDDs. Various error messages are printed to the console; however, the bug message BUG 6347 'Logical block mapping is invalid' is the most interesting.
The workings of the c: drive are complex, so for the rest of this explanation an understanding of the following things is assumed:
-properties of Flash EEPROM
-the IO subsystem as implemented by VxWorks
-the Dos File System as it is implemented by VxWorks
-the c: drive block driver (ssDrv) and its interface to the system and inner workings/mechanisms
-the basic theories behind synchronization of concurrent access to shared resources

Symptom one:
Under high load the c: drive will obviously take longer to service read and write requests than it does under normal load. Under high load the packer needs to run almost constantly to keep the free track pool within threshold values, which in turn increases the wait time for lower priority processes such as the tRpt task (for example).
When the tRpt task times out waiting on the admin semaphore it will try to 'un-stick' it. The problem arises when the low priority task enters the ssdrvStuckSemaphore function: the high priority task may have already given back the semaphore (that's what the packer does to relieve hogging), so it (tRpt) will get a 0x0 back for the task, which will pass its criteria, and it will then semMGiveForce the admin semaphore! This isn't a big problem unless another high priority task grabs the semaphore again, and tRpt won't know any different. Under these circumstances two tasks could then be working within their critical regions and messing up the logical block map and any other really important data structures!
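
To make the interleaving described above easier to follow, here is a rough replay of it in Python. This is only a toy model and not the ssDrv code: the AdminSemaphore class, the task numbers and the replay_symptom_one function are all made up for illustration, and the steps are run in a fixed order rather than with real tasks.

```python
# Toy, single-threaded replay of the "Symptom one" interleaving.
# Everything here is hypothetical; it only mirrors the order of events
# described in the patch notes, not the real VxWorks driver.

PACKER, T_RPT, OTHER = 1, 2, 3  # made-up task ids (0 means "no owner")


class AdminSemaphore:
    """Minimal mutex that remembers which task owns it."""

    def __init__(self):
        self.owner = 0

    def take(self, task_id):
        if self.owner != 0:
            return False
        self.owner = task_id
        return True

    def give(self, task_id):
        if self.owner == task_id:
            self.owner = 0

    def give_force(self):
        # Stand-in for a forced give: releases no matter who owns it.
        self.owner = 0


def replay_symptom_one():
    sem = AdminSemaphore()
    in_critical = set()

    # 1. The packer holds the semaphore; tRpt times out waiting for it.
    sem.take(PACKER)
    in_critical.add(PACKER)

    # 2. The packer gives the semaphore back to relieve hogging.
    in_critical.discard(PACKER)
    sem.give(PACKER)

    # 3. tRpt, now in its stuck check, reads the owner and sees 0.
    owner_seen = sem.owner

    # 4. Another high priority task grabs the semaphore.
    sem.take(OTHER)
    in_critical.add(OTHER)

    # 5. tRpt acts on its stale reading: owner 0 passes the check,
    #    so it forces the semaphore free and takes it for itself.
    if owner_seen == 0:
        sem.give_force()
    if sem.take(T_RPT):
        in_critical.add(T_RPT)

    return in_critical  # tasks now inside the critical region


if __name__ == "__main__":
    print(sorted(replay_symptom_one()))
```

The point of the replay is that the returned set holds two task ids at once, which is exactly the situation a mutex is supposed to rule out.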

Symptom two:
The packer process would panic and call packtrack with threshold one if it did not reclaim a track right away. This would essentially 'hose' the system (to use the most technical term possible), leading to a higher probability of seeing 'Symptom one'. One can now easily see that this would cause a cascading effect where the entire system would become really really 'hosed'.

Symptom three:
The mechanism to recover erasable tracks that were not recovered by an erase task (because of timeouts, BERR or whatever) was preventing the packtrack function from actually packing tracks, as it would attempt to erase the same track over and over until the actual erase was complete. In high load situations this allowed the drive to drop to dangerously low levels of free tracks, often resulting in a 'no free tracks' bug.
Other minor errors existed that are both not interesting and not worth mentioning.

SOLUTION:
Symptom one:
The ssdrvStuckSemaphore function was modified to do a taskLock before checking anything, to ensure that the snapshot it takes to do its checking is the same when it actually performs the check! As well, an additional check is performed to not let task IDs of 0x0 get through...
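
The shape of that fix can be sketched roughly as follows, in Python rather than the real driver code. The names here (AdminSemaphore, stuck_semaphore_check, holder_looks_stuck) are hypothetical, a threading.Lock is only standing in for taskLock/taskUnlock, and the real criteria for deciding a holder is stuck are not given in the patch notes, so they are passed in as a function.

```python
import threading

# Stand-in for taskLock()/taskUnlock(): while this is held, the check and
# the forced give happen as one step. This is an assumption for the sketch.
_task_lock = threading.Lock()


class AdminSemaphore:
    """Minimal mutex that remembers which task owns it (0 = no owner)."""

    def __init__(self, owner=0):
        self.owner = owner

    def give_force(self):
        self.owner = 0


def stuck_semaphore_check(sem, holder_looks_stuck):
    """Force the semaphore free only if a real holder still looks stuck.

    holder_looks_stuck is a hypothetical callback taking a task id and
    returning True when that task should be considered stuck.
    """
    with _task_lock:
        owner = sem.owner           # snapshot taken under the lock
        if owner == 0:
            return False            # task id 0x0 is never let through
        if not holder_looks_stuck(owner):
            return False
        sem.give_force()            # acted on while the snapshot is valid
        return True
```

With the snapshot and the forced give inside the same locked region, the case where the holder has already given the semaphore back simply returns without forcing anything.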

Symptom two:
This was changed to take a gentler approach, using a basic linear equation of sorts to adjust the threshold value being used to call packtrack. The limits used for low free tracks and high free tracks were used slightly differently to also make the packer less aggressive.
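
The patch notes do not give the actual equation, so the following is only a guess at what a "basic linear equation" for the threshold could look like. The function name, the limits and the threshold range are all invented numbers; the one thing taken from the text is that threshold one is the most aggressive setting, so the sketch only drops to it when free tracks are at or below the low limit.

```python
def pack_threshold(free_tracks, low_limit=4, high_limit=16,
                   min_threshold=1, max_threshold=32):
    """Scale the packtrack threshold linearly with the free track count.

    All parameter values are hypothetical. At or below low_limit the
    packer is as aggressive as possible (min_threshold); at or above
    high_limit it is as relaxed as possible (max_threshold); in between
    the threshold moves in a straight line.
    """
    if high_limit <= low_limit:
        raise ValueError("high_limit must be greater than low_limit")
    if free_tracks <= low_limit:
        return min_threshold
    if free_tracks >= high_limit:
        return max_threshold
    span = high_limit - low_limit
    step = (free_tracks - low_limit) * (max_threshold - min_threshold) // span
    return min_threshold + step


if __name__ == "__main__":
    for free in (2, 4, 10, 16, 20):
        print(free, pack_threshold(free))
```

Compared with jumping straight to threshold one whenever a track is not reclaimed, a ramp like this only becomes fully aggressive when the free track pool is genuinely low.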

Symptom three:
To prevent this, extra checks were put into place to check if an erase was in progress using the per-chip state variable. This variable wasn't always reliable because it was updated before the update to the accounting records was complete, resulting in the same phenomenon as above. Therefore some changes were made to where the state was updated, to create a more accurate picture of what was happening for the erase.

I think it's wise to obtain as many listings as possible, such as DNB, TNB etc. I have done a listing script file for use with Procomm Plus that would help.

There is another thing that might help below.

Patch 11199.........22.46 and 23.47.

Error messages during Datadump (EDD007), no access to PDT, no access to Overlays 117 and 135. A manual INI solves the problem. In marginal condition, it is also reported that the problem can only be solved by a Reload.
Memory leakage problem: It is possible for PDT to lose 8K of memory for every invalid login if an interactive shell is used. If a user types the incorrect password twice and the user is denied entry to PDT, the system will lose memory. This is one of many scenarios that can result in a memory leak within the cpsPdtShell in pdtShell.c. The memory leak comes from the ledOpen routine, which allocates a history of size 20 * 408 = 8160, i.e. roughly 8K. There are places which return without doing a ledClose (which frees up the history).
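
As a quick back-of-envelope check on those figures, the leak per failed login and the number of failed logins needed to lose a given amount can be worked out as below. The function names are made up, and the sketch assumes the 408 is in the same unit as the "8K" quoted in the patch notes.

```python
import math

HISTORY_ENTRIES = 20   # from the patch notes
ENTRY_SIZE = 408       # from the patch notes (unit assumed to match "8K")
LEAK_PER_BAD_LOGIN = HISTORY_ENTRIES * ENTRY_SIZE  # 20 * 408 = 8160


def leaked(bad_logins):
    """Total memory lost after the given number of invalid PDT logins."""
    return bad_logins * LEAK_PER_BAD_LOGIN


def bad_logins_to_leak(total):
    """Smallest number of invalid logins that loses at least `total`."""
    return math.ceil(total / LEAK_PER_BAD_LOGIN)


if __name__ == "__main__":
    print(LEAK_PER_BAD_LOGIN)
    print(leaked(10))
    print(bad_logins_to_leak(1024 * 1024))
```

So on those numbers it takes on the order of a hundred or so bad logins to lose a megabyte, which fits with this being a slow problem that builds up between INIs.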

Patches 11199,11843,11848,11965.......22.16 22.46 23.47.

A new occurrence of a C-Drive (Software Daughter Board) lockup problem has been recently experienced in two different European countries. When the problem occurs the C-Drive is no longer available for use. The Flashrom where the C-Drive is located must be erased and the software reinstalled. Contact the MOC for help in doing this. As a workaround solution, the daughterboard (NTSK06) can be replaced by a new one, then the Meridian Software reinstalled via PCMCIA.
Bad Patch: Feedback from the field was that a problem started to occur with deployment of patch mplr11140. Investigation by the Technology group confirmed that this patch might have to be improved. The problem manifests itself in the form of:
- Error message EDD007 during Datadump
- No access to PDT
- No access to overlays 117 and 135
- BUG 6347, BUG 6351, BUG 6348, BUG 6363, BUG 6364
- Memory leakage (You can lose 8K of memory every time an invalid password is input when trying to access pdt. There are other scenarios that can result in a memory leak within the cpsPdtShell in pdtShell.c)
- Z-drive backup errors
- System freeze
Affected sites
Option 11C running Rls 22 or Rls 23 market Releases, and Rls 24 pre-market Release.
Preventative action
Install the following suite of patches:
mplr11199 (Cures Memory Leak)
mplr11843 (System freeze due to amd flash driver's blocking under certain hardware failure conditions).
mplr11848 (Various C: Drive errors mostly seen as BUG 6347. Affects entire system, reported during datadumps, when the datadump fails you will see EDD 007. Replaces patch mplr11502)
mplr11965 (Cures the Z drive problems)

All the best

Firebird Scrambler
Meridian 1 / Succession and BCM / Norstar Programmer in the UK

http://www.linkedin.com/in/davidbromley

If it's working, then leave it alone!.

absinthium · Nov 23, 2011

Wow, thanks. Comprehensive!

I have PDT access without error. I cannot mkdir on the c: in any folder level.

My main concern is the stuff in memory that hasn't been committed to disk.

I think the only way out may be a full TNB,CDB,SLT etc... dump and re-key after reinstallation.

I can view the file system successfully and cat files in c:/u/ to read things like INET.DB. It appears I can read from c: but just can't write to it.

Is there any way to dump the config DB and REC from RAM to remote disk (laptop) or even PCMCIA?

bhassell2000 · Nov 23, 2011

Can you do a database archive in LD 143?

Thanks,

Buddy

Linked in Profile

http://www.linkedin.com/in/buddyhassell

KCFLHRC · Nov 23, 2011

Have you tried the EDD CLR command?

absinthium · Nov 23, 2011

I haven't attempted an archive yet as my main concern is dumping the memory to disk.

EDD CLR returns the same EDD007 error.

bhassell2000 · Nov 23, 2011

If you do an archive you can save the database to a PCMCIA card, and it's better than losing everything.

Thanks,

Buddy

Firebird Scrambler · Nov 23, 2011

As Buddy has stated, go into LD 143 from TTY 0 and use the upgrade command.

Select 3 Utilities and follow the example below.

>ld 143
CCBR000
.upgrade

SOFTWARE INSTALLATION PROGRAM
************************************
Verify
Security ID: 10239999
************************************

Main Cabinet Software Installation Main Menu :
1. New Install or Option 11/11E Upgrade - From Software DaughterBoard
2. System Upgrade
3. Utilities
4. New System Installation - From Software Delivery Card
[q]uit, [h]elp or [?], <cr> - redisplay

Enter Selection : 3

Utilities Menu :
1. Restore Backed Up Database
2. Archive Database Utilities
3. Install Archived Database
4. Review Upgrade Information
5. Clear Upgrade Information
6. Undo Installation
7. Flash Boot ROM Utilities
8. Current Installation Summary
9. Change 3900 series set languages.
10. IP FPGA Utilities
[q]uit, [p]revious, [m]ain menu, [h]elp or [?], <cr> - redisplay

Enter Selection : 2

Customer Database Archives:
1. List customer databases.
2. Remove customer database.
3. Archive a customer database.
[q]uit, [p]revious, [m]ain menu, [h]elp or [?], <cr> - redisplay

Enter Selection : 3

I can't guarantee that this would work, but it's worth a try.

All the best

Firebird Scrambler · Nov 23, 2011

After you have pressed 3 as shown in my last post, you should see the following. The archive can take up to a minute to complete.

Enter Selection : 3

Enter a Customer name for your customized data :
24_11_11

Customer database created: 24_11_11

Copying database from primary drive to 24_11_11

Archive copy completed.

Customer Database Archives:
1. List customer databases.
2. Remove customer database.
3. Archive a customer database.
[q]uit, [p]revious, [m]ain menu, [h]elp or [?], <cr> - redisplay

Enter Selection : 1

Customer Database Archives available:
1. 24_11_11
2. SITEA

Customer Database Archives:
1. List customer databases.
2. Remove customer database.
3. Archive a customer database.
[q]uit, [p]revious, [m]ain menu, [h]elp or [?], <cr> - redisplay

Enter Selection : Q
Are you sure? (y/n/[a]bort) : Y

.****
>logo

All the best
