FTP List parse 'challenge'

Status
Not open for further replies.

AndyGroom

Programmer
May 23, 2001
972
GB
I wrote an FTP package ages ago but some long-standing bugs are making it unusable. The problem is parsing the information returned by the LIST command because there is no hard and fast rule on how it should be formatted.

For any given LIST of information I need to extract the filename, date and size.

Common responses to the LIST command are shown below. Although it is not apparent from these examples, filenames often contain spaces and may contain date information (e.g. "Financial Report July 2010.xls"), and as far as I know there is no way to establish the OS of the server in order to home in on the right format. Any suggestions on how to parse it?

Code:
    ' UNIX-style listing
    ' "-rw-r--r--   1 root     other        531 Jan 29 03:26 README"
    ' "dr-xr-xr-x   2 root     other        512 Apr  8  1994 etc"
    ' "dr-xr-xr-x   2 root     512 Apr  8  1994 etc"
    ' "lrwxrwxrwx   1 root     other          7 Jan 25 00:17 bin -> usr/bin"
    ' "-rw-r--r--   1 root     root       46508 Dec  8  2006 2006shoot.jpg"
    ' UNIX ls does not show the year for dates in the last six months. So we have to guess the year.
    
    ' Microsoft's FTP servers for Windows:
    ' "----------   1 owner    group         1803128 Jul 10 10:18 ls-lR.Z"
    ' "d---------   1 owner    group               0 May  9 19:45 Softlib"
    
    ' WFTPD for MSDOS:
    ' "-rwxrwxrwx   1 noone    nogroup      322 Aug 19  1996 message.ftp"
    
    ' NetWare:
    ' "d [R----F--] supervisor            512       Jan 16 18:53    login"
    ' "- [R----F--] rhesus             214059       Oct 20 15:27    cx.exe"
    
    ' NetPresenz for the Mac:
    ' "-------r--         326  1391972  1392298 Nov 22  1995 MegaPhone.sit"
    ' "drwxrwxr-x               folder        2 May 10  1996 network"
    
    ' MultiNet (some spaces removed from examples)
    ' "00README.TXT;1      2 30-DEC-1996 17:44 [SYSTEM] (RWED,RWED,RE,RE)"
    ' "CORE.DIR;1          1  8-SEP-1996 16:09 [SYSTEM] (RWE,RWE,RE,RE)"
  
    ' and non-MultiNet VMS:
    ' "CII-MANUAL.TEX;1  213/216  29-JAN-1996 03:33:12  [ANONYMOU,ANONYMOUS]   (RWED,RWED,,)"
    
    ' MSDOS format
    ' 04-27-00  09:09PM       <DIR>          licensed
    ' 07-18-00  10:16AM       <DIR>          pub
    ' 04-14-00  03:47PM                  589 readme.htm

- Andy
___________________________________________________________________
If you think nobody cares you're alive, try missing a couple of mortgage payments
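
On the year-guessing point in the UNIX notes above: one common heuristic is to assume the current year and fall back to an earlier one if that would put the file in the future. Below is a minimal sketch in Python (the thread contains no code of its own, so the language, the function name `guess_listing_date`, the `now` parameter and the one-day slack for clock differences are all assumptions made for illustration).

```python
from datetime import datetime, timedelta

# Month abbreviations as they appear in UNIX-style listings.
MONTHS = {name: i for i, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}


def guess_listing_date(month, day, year_or_time, now):
    """Turn the 'Jan 29 03:26' or 'Apr  8  1994' fields into a datetime.

    `now` is the client's idea of the current time; the server's clock
    and time zone are unknown, so the result is only a best guess.
    """
    mon = MONTHS[month.title()]
    if ":" not in year_or_time:
        # The listing gave an explicit year but no time of day.
        return datetime(int(year_or_time), mon, int(day))

    hour, minute = map(int, year_or_time.split(":"))
    # Try the current year first, then step back until the date is
    # valid and not in the future (one day of slack is an assumption).
    for year in range(now.year, now.year - 9, -1):
        try:
            candidate = datetime(year, mon, int(day), hour, minute)
        except ValueError:  # e.g. Feb 29 in a non-leap year
            continue
        if candidate <= now + timedelta(days=1):
            return candidate
    raise ValueError("no plausible year found")
```

Passing `now` in explicitly, rather than calling `datetime.now()` inside the function, keeps the guess repeatable when testing against saved listings.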
 
Why not issue NLST, followed by MDTM and SIZE for each entry in the returned list?
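
A rough sketch of that approach, using Python's standard `ftplib` purely for illustration (the helper names `parse_mdtm` and `list_via_nlst` are made up here, and it assumes the server supports the optional MDTM and SIZE commands, which not every server does):

```python
from datetime import datetime
from ftplib import error_perm


def parse_mdtm(reply):
    """Parse a '213 YYYYMMDDHHMMSS[.sss]' MDTM reply into a datetime."""
    stamp = reply.split()[1]
    # Ignore optional fractional seconds after the first 14 digits.
    return datetime.strptime(stamp[:14], "%Y%m%d%H%M%S")


def list_via_nlst(ftp):
    """Return (name, size, modified) tuples for the current directory.

    `ftp` is a connected, logged-in ftplib.FTP object. Size or date is
    None when the server refuses the command, as it may for directories.
    """
    names = ftp.nlst()
    # Some servers only answer SIZE in binary mode, and nlst() leaves
    # the connection in ASCII mode, so switch explicitly.
    ftp.voidcmd("TYPE I")
    entries = []
    for name in names:
        try:
            size = ftp.size(name)
        except error_perm:
            size = None
        try:
            modified = parse_mdtm(ftp.sendcmd("MDTM " + name))
        except error_perm:
            modified = None
        entries.append((name, size, modified))
    return entries


# Usage (hypothetical host; not run here):
#   from ftplib import FTP
#   ftp = FTP("ftp.example.com")
#   ftp.login()
#   for name, size, modified in list_via_nlst(ftp):
#       print(name, size, modified)
```

The cost is two extra round trips per entry, which can be slow on large directories.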
 
NLIST just returns "Unknown Command" on Unix with Pure-FTPd as the FTP server (I don't currently have other platforms to try with).

- Andy
 
You almost have to do that (NLST plus), or else compile a set of parsing templates for every host type you want to support.
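
To give an idea of what a set of parsing templates might look like, here is a minimal sketch in Python covering only the UNIX-style and MSDOS-style samples from the first post. The regular expressions, the names `UNIX_RE`, `DOS_RE` and `parse_list_line`, and the shape of the returned dict are all assumptions for illustration; NetWare, VMS and other hosts would each need a template of their own.

```python
import re

# UNIX-style: type + 9 permission chars, then anything, then
# "<size> <Mon> <day> <year-or-time> <name>". The lazy ".*?" makes the
# leftmost size/date group win, so dates inside the filename are left alone.
UNIX_RE = re.compile(
    r"^(?P<type>[-dlbcps])[-rwxsStT]{9}\s+"
    r".*?\s(?P<size>\d+)\s+"
    r"(?P<month>[A-Za-z]{3})\s+(?P<day>\d{1,2})\s+"
    r"(?P<when>\d{4}|\d{1,2}:\d{2})\s"
    r"(?P<name>.+)$"
)

# MSDOS-style: "MM-DD-YY  HH:MMAM  <DIR>|size  name".
# Leading spaces in a filename are lost with this template.
DOS_RE = re.compile(
    r"^(?P<date>\d{2}-\d{2}-\d{2,4})\s+"
    r"(?P<time>\d{1,2}:\d{2}\s?[AP]M)\s+"
    r"(?P<size><DIR>|\d+)\s+"
    r"(?P<name>.+)$"
)


def parse_list_line(line):
    """Return a dict of raw fields, or None if no template matches."""
    line = line.rstrip("\r\n")
    m = UNIX_RE.match(line)
    if m:
        name = m["name"]
        if m["type"] == "l" and " -> " in name:
            # Symbolic link: keep the link name, drop the target.
            name = name.split(" -> ", 1)[0]
        return {"style": "unix", "name": name,
                "is_dir": m["type"] == "d",
                "size": int(m["size"]),
                "date": f'{m["month"]} {m["day"]} {m["when"]}'}
    m = DOS_RE.match(line)
    if m:
        is_dir = m["size"] == "<DIR>"
        return {"style": "dos", "name": m["name"], "is_dir": is_dir,
                "size": None if is_dir else int(m["size"]),
                "date": f'{m["date"]} {m["time"]}'}
    return None  # unknown format: caller should fall back or report it
```

Returning `None` for anything unrecognised, rather than guessing, makes it obvious when a host needs a new template.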

I work with one host that can return two wildly different LIST report formats, depending on an option set at the host: a "native" format and a Unix-like "standard" format. Some sites set one as the global default, some set the other. Some users set one as the default in their FTP settings on the host, some set the other.

These reports are meant to be read by a person, interactively. They were never meant to be parsed as part of an automated process. Many OSes do not use a filesystem naming hierarchy oriented around folders and files, and their listings can get tricky. Many of them don't have a native concept of "current directory", let alone the ability to change it to descend a hierarchy. When their FTP server tries to simulate such a thing, the results can be very confusing.

We get lots of calls about this very thing from frustrated users who try to use GUI FTP clients. These clients tend to push the "folder tree" metaphor too far, but mostly they struggle to parse LIST reports at all.
 