Please help…
My company currently has 4 locations on our frame relay network that connect to our main office. Each remote location has a frame relay circuit with a 512K Port speed. The main office has a full T1. The frame is provided by AT&T and all circuits have a CIR that equals the port speed.
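For reference, the combined remote port speeds can be compared with the main office T1. This is a minimal sketch of that arithmetic, not a diagnosis. It assumes that "512K" means 512 kbps and that a full T1 port carries 24 channels of 64 kbps (1536 kbps of payload); neither figure is stated above.

```python
# Assumptions (not stated in the post): "512K" means 512 kbps, and a full T1
# frame relay port carries 24 DS0 channels x 64 kbps = 1536 kbps of payload.
REMOTE_SITES = 4
REMOTE_PORT_KBPS = 512
T1_PAYLOAD_KBPS = 24 * 64


def oversubscription_ratio(sites, port_kbps, hub_kbps):
    """Ratio of the combined remote port speeds to the hub port speed."""
    return (sites * port_kbps) / hub_kbps


aggregate_kbps = REMOTE_SITES * REMOTE_PORT_KBPS
ratio = oversubscription_ratio(REMOTE_SITES, REMOTE_PORT_KBPS, T1_PAYLOAD_KBPS)

print("Aggregate remote port speed: {} kbps".format(aggregate_kbps))
print("Main office T1 payload:      {} kbps".format(T1_PAYLOAD_KBPS))
print("Ratio: {:.2f}".format(ratio))
```

A ratio above 1 only means the remote sites could offer more traffic than the T1 can carry if they all sent at full rate at the same moment. On its own it says nothing about whether that actually happens.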
The traffic at all remote locations typically consists of a few SSH sessions to an AIX host at the Main Office, browser-based applications, and the occasional print job generated at the Main Office and sent to a remote printer. Occasionally during the day an imaging application is used to capture "Tickets" locally on a Windows XP Pro system and transmit them to a Windows 2003 Server at the Main Office.
Starting about three weeks ago, one location (Location X) started having errors with the imaging application. When saving a batch after capturing or making corrections, users would occasionally receive errors from the application. The application did not generate an error consistently, nor was the error always the same. The messages seemed to indicate a communication problem with the main server.
The following list has the most common messages:
"The semaphore timeout period has expired"
"The specified network name is no longer available"
"The path is too deep"
I was initially unable to recreate the problem outside the imaging application. I soon discovered, however, that if I copied files to the Main Office, the copies would occasionally fail with the same error messages that the imaging application was generating. Interestingly, I did not experience the same problems copying files from the Main Office.
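The upload-only failure can be exercised with a repeatable copy test, so that the failure rate and the error text are recorded rather than observed by hand. This is a minimal sketch, assuming Python is available on the client PC. The function name `repeated_copy` and the paths in the usage comment are hypothetical placeholders, not the actual server or share names.

```python
import os
import shutil


def repeated_copy(src_file, dest_dir, attempts=20):
    """Copy src_file into dest_dir several times.

    Returns a list of (attempt_number, error_text) for each failed copy.
    """
    failures = []
    for attempt in range(1, attempts + 1):
        target = os.path.join(dest_dir, "copytest_{}.bin".format(attempt))
        try:
            shutil.copyfile(src_file, target)
        except OSError as exc:
            # Network errors reported by the OS during the copy end up here.
            failures.append((attempt, str(exc)))
        finally:
            # Clean up the test file; ignore errors if it was never created
            # or the remote side is unreachable.
            try:
                os.remove(target)
            except OSError:
                pass
    return failures


# Hypothetical usage; both paths are placeholders:
# failures = repeated_copy(r"C:\temp\sample.tif", r"\\mainserver\share\test")
# for attempt, error in failures:
#     print(attempt, error)
```

Running the same test with source and destination swapped would give a like-for-like comparison of the two directions.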
In an attempt to isolate the problem I have taken the following steps:
Updated all drivers and software on PC at Location X – Errors
Used a PC from another location, known to operate without errors, at Location X – Errors
Used the PC from Location X at another site – No Errors
Replaced Network Switch, Router and CSU – Errors
Recertified network drops and replaced patch cords – Errors
Moved PC to certified network drop in another area of the building – Errors
Installed Voltage Regulating UPS at Location X and monitored for power anomalies (None found) – Errors
Reviewed differences between Location X 1841 router and routers at other locations, updated to same version of IOS – Errors
Corrected issue with minor T1 errors at Main Office – Errors
Corrected issue with high number of Frame Relay BECNs on Router at Main Office by implementing traffic shaping for remote VCs – Errors
Based on these steps I came to the following conclusions:
As all hardware, with the exception of the Telco Equipment (Smart Jack), has been replaced, the issue is not Network Equipment related.
As the PC in question operates correctly at an identically configured location, the issue is not software or driver related.
As no other location is experiencing the problem, the equipment and configurations at the Main Office are not the cause.
This has left me investigating the Frame Relay circuit. I had AT&T test the T1 and the associated frame relay circuit. They are not reporting any problems. I have checked the statistics on both routers (including CSU stats) and am not seeing any obvious errors.
After using Ethereal to capture traffic from both the server and the client I discovered a number of "TCP Duplicate ACK", "TCP Retransmission" and "TCP Previous Segment Lost" conditions. These error conditions do not seem to occur at another location.
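To put numbers on the difference between sites, the flagged conditions can be tallied from a text export of each capture's packet summary. This is a minimal sketch. The label strings are an assumption about how the analyzer words these conditions in the summary line and may need adjusting to match the actual export. The file names in the usage comment are hypothetical.

```python
from collections import Counter

# Assumed label text; the exact wording in the packet summary can differ
# between analyzer versions, so adjust these to match the export.
LABELS = ("TCP Dup ACK", "TCP Retransmission", "TCP Previous segment lost")


def tally_tcp_conditions(lines, labels=LABELS):
    """Count summary lines that mention each label (case-insensitive)."""
    counts = Counter({label: 0 for label in labels})
    for line in lines:
        lowered = line.lower()
        for label in labels:
            if label.lower() in lowered:
                counts[label] += 1
    return counts


# Hypothetical usage; the export file names are placeholders:
# with open("location_x_summary.txt") as f:
#     print(tally_tcp_conditions(f))
# with open("other_site_summary.txt") as f:
#     print(tally_tcp_conditions(f))
```

Comparing the counts per direction (client to server versus server to client) for captures of similar length would show whether the losses line up with the upload-only copy failures.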