
Unique sorting of char * data in linked list.

Status
Not open for further replies.

marsd

IS-IT--Management
Apr 25, 2001
2,218
US
I would like to find the most efficient
algorithm for removing duplicates from an unsorted singly
linked list whose data type is a string.

Any examples or references?
 
Can you do this in C++? Then you could use one of the many collection classes.

You could use a collection with unique keys, pull out the dups, and get them sorted.

 
No, I need a C solution.
 
hi,
I would follow this strategy:

1) Load the list into an array of structures:
{
int iOriginalPosition ; // progressive
BOOL bDuplicated; // initialized to FALSE
char szData[MAXLEN]; // your data
}

2) qsort it on the szData field

3) loop over the sorted array, flagging the duplicated records

4) qsort it on the bDuplicated field, then the iOriginalPosition field

5a) loop over the array, skipping the duplicated records

5b) when you reach the non-duplicated records, they are in the original order
and you can rebuild the linked list without dups.

bye
 
Here's something I just did 'for fun'
Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct node_tag {
    char            *string;
    struct node_tag *next;
} node;

/* allocate and free list nodes */
/* Add NULL checks to suit */
node *newnode ( char *string ) {
    node    *result = malloc( sizeof(*result) );
    if ( result ) {
        result->next = NULL;    /* malloc does not zero memory */
        result->string = malloc( strlen(string)+1 );
        if ( result->string ) {
            strcpy( result->string, string );
        }
    }
    return result;
}
void delnode ( node *n ) {
    free( n->string );
    free( n );
}

/* list append */
node *append ( node *head, char *string ) {
    node    *new = newnode( string );
    if ( head == NULL ) {
        head = new;
    } else {
        node    *temp = head;
        while ( temp->next != NULL ) temp = temp->next;
        temp->next = new;
    }
    return head;
}

/* delete duplicates from a sub-list
 * return the shortened list
 */
node *del_sublist_dups ( node *head, char *match ) {
    node *this = head;  /* runs down the list */
    node *prev = NULL;  /* trails behind this by one node */
    while ( this != NULL ) {
        node *next = this->next;    /* the next node to link to and goto */
        if ( strcmp( this->string, match ) == 0 ) {
            /* delete the node */
            if ( prev == NULL ) {
                /* the first node being a special case */
                head = next;
            } else {
                /* otherwise make the list skip the current node */
                prev->next = next;
            }
            delnode( this );
        } else {
            prev = this;
        }
        this = next;
    }
    return head;
}

/* delete dups in a list
 * NB: the head element cannot change
 */
void del_dups ( node *head ) {
    while ( head != NULL ) {
        head->next = del_sublist_dups( head->next, head->string );
        head = head->next;
    }
}

void print_list ( node *list ) {
    while ( list != NULL ) {
        printf( "%s ", list->string );
        list = list->next;
    }
    printf( "\n" );
}

void free_list ( node *list ) {
    while ( list != NULL ) {
        node *next = list->next;
        delnode( list );
        list = next;
    }
}

int main (void) {
    node *list = NULL;
    list = append( list, "the" );
    list = append( list, "the" );   /* example at the head of a sub-list */
    list = append( list, "cat" );
    list = append( list, "sat" );
    list = append( list, "on" );
    list = append( list, "the" );   /* example in the middle of a sub-list */
    list = append( list, "mat" );
    print_list( list );
    del_dups( list );
    print_list( list );
    free_list( list );
    return 0;
}

Despite all the operations in victorv's solution, it is still an O(N log(N)) solution, whereas mine is I think O(N*N/2)

--
 
I just needed a start. Got a handle on it now, thanks.
 
Thing that appeals to me on the basis of a good 10 seconds' thought is to run through the list once creating a hash-key for each string, and inserting the hash-key in a list you make as you go along, already sorted. Then at the end you can check the sorted hash-key list for duplicates, and where you find them, check the corresponding strings really are duplicates before rejection.
This shouldn't be too bad a solution because the first and last steps are linear, and creating the hash-key list shouldn't be worse than NlogN (i.e. you can binary search it for insertion place, and it starts off small so the first searches are not long).
 
Hi,
Why not start at the first item in the list and compare it to each successive item, removing dups as you go along, until you get to the end of the list? Then do the same with the second item, and so on.
If the list does not need to be sorted, this seems the easiest way.



Pappy
You learn something new everyday.
 
I used a simple recursive search of the list,
finding duplicates based on a unique index number
in each list 'link'. Very simple and fast.
The sorting is a secondary concern, though it
is definitely the more problematic of the two
operations.
 
Pappy, possible answer: because that approach would get slower as the square of the number of items in the list, which means it might look very nice on a small test set and grind to a horrible halt when a real user tries it on a real, larger-than-envisaged data set. It all depends on the final purpose and context. You might be right.
 