
FTP Issues

Status
Not open for further replies.

DrB0b

IS-IT--Management
May 19, 2011
1,425
US
Hello all,
We are trying to figure out an FTP issue with three of our servers. Two are on 2008 and one is on 2016, and they all have the same issue, which cropped up about 4 weeks ago. To my knowledge nothing has changed network-wise. About a week ago I built the new 2016 FTP server just to see whether it was a config or 2008 version issue, and it has the same issue: randomly dropped connections from clients. It works perfectly about 80% of the time, and then clients randomly receive a connection timeout and can neither traverse folders nor retrieve files. I'm looking through the logs but am not 100% sure what I'm looking for. If someone has a functioning FTP server and would post 30 lines of the log file, I would be very appreciative so I have something to compare against. Obviously remove the public IPs. I'll post a couple of connections from a time it supposedly didn't work right and see if anyone sees anything. I tried googling a standard FTP log and had no luck.

This is supposed to be a time that they had an issue. I don't see anything wrong with the transaction.

2018-06-05 15:45:02 x.x.x.128 FTP2\CcFirearms x.x.x.22 21 PASS *** 230 0 0 faaf1373-de56-4f7c-ac3d-2f207f057dfe /
2018-06-05 15:45:02 x.x.x.128 FTP2\CcFirearms x.x.x.22 21 PASV - 227 0 0 faaf1373-de56-4f7c-ac3d-2f207f057dfe -
2018-06-05 15:45:02 x.x.x.128 FTP2\CcFirearms x.x.x.22 21 CWD Inventory/CcFirearms 250 0 0 faaf1373-de56-4f7c-ac3d-2f207f057dfe /Inventory/ClassicFirearms
2018-06-05 15:45:02 x.x.x.128 FTP2\CcFirearms x.x.x.22 21 TYPE A 200 0 0 faaf1373-de56-4f7c-ac3d-2f207f057dfe -
2018-06-05 15:45:02 x.x.x.128 FTP2\CcFirearms x.x.x.22 25188 DataChannelOpened - - 0 0 faaf1373-de56-4f7c-ac3d-2f207f057dfe -
2018-06-05 15:45:02 x.x.x.128 FTP2\CcFirearms x.x.x.22 25188 DataChannelClosed - - 0 0 faaf1373-de56-4f7c-ac3d-2f207f057dfe -
2018-06-05 15:45:02 x.x.x.128 FTP2\CcFirearms x.x.x.22 21 RETR liveinv.csv 226 0 0 faaf1373-de56-4f7c-ac3d-2f207f057dfe /Inventory/ClassicFirearms/liveinv.csv
2018-06-05 15:45:05 x.x.x.128 FTP2\CcFirearms x.x.x.22 21 QUIT - 221 0 0 faaf1373-de56-4f7c-ac3d-2f207f057dfe -
2018-06-05 15:45:05 x.x.x.128 FTP2\CcFirearms x.x.x.22 21 ControlChannelClosed - - 0 0 faaf1373-de56-4f7c-ac3d-2f207f057dfe -

As you can see, we have the passive ports set to 23000 - 26000, as evident from port 25188 in the middle of the connection. I'm in the process of wiresharking the incoming connections to see if I can see anything additional.
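For anyone wanting to check a whole log file rather than eyeball it, here is a minimal Python sketch that confirms every data channel in the log opened on a port inside the configured passive range. It assumes the 13-column layout seen in the lines above (date, time, client IP, user, server IP, port, method, argument, status, two sub-status columns, session ID, full path); the column names, `parse_line`, and `data_ports_outside_range` are my own labels for illustration, not anything IIS defines.

```python
from collections import namedtuple

# Column names are assumptions based on the sample lines in this thread.
LogEntry = namedtuple(
    "LogEntry",
    "date time client_ip user server_ip port method arg "
    "status win32 substatus session path",
)


def parse_line(line):
    """Split one log line into a LogEntry, or return None if it does not fit."""
    if line.startswith("#"):
        return None  # header/comment lines
    parts = line.split()
    if len(parts) != 13 or not parts[5].isdigit():
        return None
    entry = LogEntry(*parts)
    return entry._replace(port=int(entry.port))


def data_ports_outside_range(lines, low=23000, high=26000):
    """Return DataChannelOpened entries whose port is outside [low, high]."""
    bad = []
    for line in lines:
        entry = parse_line(line)
        if entry is None or entry.method != "DataChannelOpened":
            continue
        if not (low <= entry.port <= high):
            bad.append(entry)
    return bad
```

An empty result only says the server picked ports from the expected range; it does not show whether the client was able to reach them.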

Learning - A never ending quest for knowledge usually attained by being thrown in a situation and told to fix it NOW.
 
If all three servers started having problems at the same time, it really does point to outside the servers, to some network infrastructure bringing traffic to them. Especially with them being on two different versions of Windows Server. It's the "three different servers at the same time" clue that sticks with me.

I would also look at patch and update schedules on the servers. Something like a patch could possibly mess up your FTP config, or a network driver, or who knows what. The fact that all three started having problems at the same time means you look for them to be patched about the same time. Or, was something installed on all servers at that time?

Also, if you can narrow down as close as you can to when it started, the Event logs may hold some hints. It's a lot to go through because there's a lot of noise, but if it is a server side problem, a lot of times there are Events logged that often lead to a solution.

And, Wireshark is a good place to start to gather more data.

 
Thanks SamBones,
The servers have WU turned off atm. Not good practice, I know, but we are in the process of upgrading their OS anyway. I'm of the mindset that it is likely a networking issue as well, but I cannot for the life of me think of what has changed. I've created new firewall rules to allow all ports to pass through to our "test" FTP server from known good IPs, and we are still having the issue. I tried disabling the Windows Firewall just in case. Essentially the server sits on a switch that is connected to the firewall, so it only has one hop before it goes out to the world. I have analyzed some Wireshark data, and basically the connection just seems to stop between the two devices without any error or reason. Grasping at straws and starting to run out of them....

Learning - A never ending quest for knowledge usually attained by being thrown in a situation and told to fix it NOW.
 
The only thing I notice in the logs is that when an issue happens, the client sends PASV to enter passive mode and it seems to just stall there. As seen in the log excerpt above, it works 80% of the time but will randomly stop, usually on that call. Not sure if that is helpful at all.

Also, internally all FTP calls seem fine. External seems to be the kicker.
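To pull the stalled attempts out of a large log, something like the following Python sketch could help. It assumes, based on the behavior described above, that a failed attempt shows up as a PASV entry that is never followed by a DataChannelOpened entry for the same session ID, and that the log uses the same 13-column layout as the excerpt earlier in the thread. `stalled_pasv_sessions` is a hypothetical helper name.

```python
def stalled_pasv_sessions(lines):
    """Map session ID -> timestamp of a PASV that never got a data channel.

    Assumes whitespace-separated lines with 13 columns, where column 7 is
    the FTP method and column 12 is the session ID.
    """
    pending = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 13:
            continue  # skip headers and anything that does not match
        date, time, method, session = parts[0], parts[1], parts[6], parts[11]
        if method == "PASV":
            pending[session] = f"{date} {time}"
        elif method == "DataChannelOpened":
            # The client connected to the passive port, so PASV succeeded.
            pending.pop(session, None)
    return pending
```

Comparing the client IPs and timestamps of the sessions it returns against the working ones is a quick way to see whether the failures cluster around particular clients or times of day.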

Learning - A never ending quest for knowledge usually attained by being thrown in a situation and told to fix it NOW.
 
Well, IMO, on a server you should never have automatic updates turned on. In my place of work all server patches are researched and vetted before being applied manually to the servers. Yes, it's a lot of work, but you don't lose servers for days at a time due to something stupid being pushed to them without your knowledge.

I think looking at firewalls is probably a dead end. If you're getting the initial connection, negotiating the log in, and starting the transfer, then the ports you need are there. Plus, since it's working "most" of the time (80% I think you said), then it's most likely NOT a firewall issue.

Have you noticed it being more likely to fail if the file is very large? Or at a specific time of day? I don't know if you have the means to collect those kinds of stats, but they might point toward possible problem gear. What I'm thinking with this one: most Windows network services do automatic throttling. That is, they will start a file transfer at a pretty high rate, then, if it keeps going, back it off to a slower sustained rate. Things like routers and load balancers (especially load balancers) have to maintain session information for the connection. They can't maintain it forever, so they have a "time to live" (TTL) you can specify. I have seen a load balancer drop my app's connections when a very long-running process was transferring something. I'm trying to be somewhat vague because these details vary widely from vendor to vendor and product to product. But the bottom line is that I have seen connections to things like FTP getting dropped mid-transfer because a load balancer TTL was too short.

I don't know how your connections between client and server are set up, but I would start looking there. If you have that kind of gear in the data center where the servers are, start with your network guys. Just ask them what gear is between your servers and the outside, and if there is a configurable TTL, what it's set to. After checking all that, I would contact your ISP, or however you are connected to your clients, and ask them what the network and routing values are, specifically the TTL on anything that maintains a session. Usually you can't talk to the actual network guys at your ISP, so you might get nothing but marketing mumbo-jumbo, but you still need to try.

Hmm, I just went up and looked at the trace messages you posted, and it's all within a few seconds. Given that, I'm not liking my TTL idea as much, but I'll still post it in case it triggers an idea or insight.


And the reason I asked about file size is, obviously, that large files take longer and could be hitting some kind of throttling. The reason I asked about time of day is that if it's mostly occurring during, say, a one-hour window mid-day, then that may be a peak traffic time, which slows everything down. That makes it more likely that you're hitting a low TTL somewhere.
 
So, a little Wireshark info. I can see that in the 20% failure window, it always fails when FTP passive mode is requested. The client asks for passive mode and the server responds, but then there is never a reply from the client. The server sends 6 more ACK requests, which are never answered, and then it closes the connection. Then the client tries again; some of the time it works normally and other times it fails again. See here with IPs redacted:
[Screenshot: Untitled_pagrkd.png, Wireshark capture with IPs redacted]


Learning - A never ending quest for knowledge usually attained by being thrown in a situation and told to fix it NOW.
 
And I am the network dept here. Basically our topology is as follows: ISP-firewall-switch-FTP server. It is obviously more complex than that, but that is the path to and from the FTP server. The firewall is set to allow all connectivity on all ports from certain IPs, so all traffic should be passing straight through. There is no load balancer in play. The switches do not have any funky static routing either.

Learning - A never ending quest for knowledge usually attained by being thrown in a situation and told to fix it NOW.
 
And like I said above, it isn't an issue with file transfers, as it errors out just after making a successful login and moving to passive mode, as seen above. The brain-numbing thing is that 80% of the time that passive call is handled correctly........

Learning - A never ending quest for knowledge usually attained by being thrown in a situation and told to fix it NOW.
 
Unsure of what changed initially but I have found a fix. Manually setting the FTP public IP address in the IIS FTP firewall settings allowed it to talk perfectly each time. We have been going this way for about 2 hours now with no issues so I think we can put this to bed. Thanks SamBones for letting me talk through this and to confirm some thoughts.
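For anyone who lands here with similar symptoms: the 227 reply to PASV carries the IP address and port the client is told to connect to, written as six comma-separated numbers, with the port being the fifth number times 256 plus the sixth. One plausible reason the setting above helps, though it was not confirmed in this thread, is that without it the server may advertise an internal address that outside clients cannot reach. The Python sketch below decodes a 227 reply taken from a capture and checks whether the advertised address is in an RFC 1918 private range. The function names and the sample replies are made up for illustration.

```python
import ipaddress
import re

# Six comma-separated numbers in parentheses: h1,h2,h3,h4,p1,p2
PASV_RE = re.compile(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)")

# RFC 1918 private ranges, not routable from the public Internet.
PRIVATE_NETS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def parse_pasv_reply(reply):
    """Return (host, port) advertised in a '227 Entering Passive Mode' reply."""
    match = PASV_RE.search(reply)
    if match is None:
        raise ValueError("no (h1,h2,h3,h4,p1,p2) group found in reply")
    nums = [int(g) for g in match.groups()]
    if any(n > 255 for n in nums):
        raise ValueError("each number in a PASV reply must be 0-255")
    host = ".".join(str(n) for n in nums[:4])
    port = nums[4] * 256 + nums[5]
    return host, port


def advertises_private_address(reply):
    """True if the reply tells the client to connect to an RFC 1918 address."""
    host, _ = parse_pasv_reply(reply)
    addr = ipaddress.ip_address(host)
    return any(addr in net for net in PRIVATE_NETS)
```

For example, a reply of `227 Entering Passive Mode (192,168,1,22,98,100)` decodes to 192.168.1.22 port 25188, which an external client would not be able to reach unless something in the path rewrites the address.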

Learning - A never ending quest for knowledge usually attained by being thrown in a situation and told to fix it NOW.
 
Awesome! Glad I could help, even if it was just as a sounding board. [bigsmile]

 