Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

implementing a hash table

Status
Not open for further replies.

hpseries

Programmer
Feb 1, 2008
1
FR
Hi all,

I want to implement a sort of hash table in C. I have an array which stores 4 different characters at different positions. I need to make the 4 characters the key and their positions the value in hash table. Like shown below.

int arr[1000] = {a, a, t, g, t , t, a, c, c, g, a, t, g, c }

My hash will be like this:

a: 0, 1, 6, 10
c: 7, 8, 13
g: 3, 9, 12
t: 2, 4, 5, 11

Can anyone help me that which data structures to use. arr is going to contain DNA sequences so any data structures suggested will have to be allocated emory dynamically.
 
How are you going to use it? Are you going to look for

1) all as, cs, gs and ts
2) some sequence of as, cs, gs and ts
3) whether an a, c, g or t is in a particular position

In some cases a hash table may not be the correct structure to use.
 
Single character keys aren't a workable method. Either a tree of a linked list would work. A tree would be better.
 
Rereading your original message, I don't think you mean a hash table at all...

Do you just want 4 tables, each containing the positions of A, C, T, G in order?

If so, there are all sorts of possible structures. If you know your sequence is never going to be more than 1000 characters (as defined) then you could even just make 4 arrays of integers, or a 2d array 4 by 1000.

Linked lists are instantly expandable, but slower than arrays. If you need something that can be matched to any sequence entered, of any length, but which won't be expanded very often, just count the occurances of each base, and allocate an array big enough for the situation dynamically.

<an extra bit:> A conventional hash-table (to my understanding anyway) is a single-dimensioned array of information, accessed via a "key". The key is a number representing the situation for which you are looking, but is not necessarily unique to that situation. For example, if you are looking for chess positions in a game, the key represents the position on the board, but always there will be many more positions than can be stored, and typically there will be more positions than can be coded in a single key. Therefore a hash table is a quick way to retrieve data about a position, but not a totally safe way. You may occasionally get the data for the wrong position (but of course you minimise the risk by (a) choosing a sensibly large table, and (b) storing a very large spare key with each piece of data, so if you get the wrong spare key back, you know you got the wrong data).
 
On second thought you could use a chained hashtable for this, but the buckets would get big quickly.

Quick example:
Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define IMEMBERS 200
#define HASHLEN 211

struct hash_el_ {
    int pos[IMEMBERS];
    struct hash_el_ *nxt;
};

typedef struct hash_ {
    int numelements;
    struct hash_el_ *head;
    struct hash_el_ *tail;
} HASH;

void walktable(HASH * h)
{
    int y = 0;
    struct hash_el_ *tmp;
    for (y = 0; y < HASHLEN; y++) {
	if (h[y].numelements > 0) {
	    tmp = h[y].head;
	    printf("Node at indice %d has %d elements.\n", y,
		   h[y].numelements);
	    while (tmp != NULL) {
		printf
		    ("Next list element for table at %p, indice %d = %p\n",
		     &h[y], y, tmp);
		tmp = tmp->nxt;
	    }
	}
    }
}

int h1(int a)
{
    return (a + 31) % HASHLEN;
}

HASH *init_table(void)
{
    int i = 0;
    HASH *node;
    node = malloc(sizeof(*node) * HASHLEN);
    if (node == NULL) {
	return NULL;
    }

    for (i = 0; i < HASHLEN; i++) {
	node[i].numelements = 0;
	node[i].head = NULL;
	node[i].tail = NULL;
    }
    return node;
}

struct hash_el_ *append_table(HASH * h, int k)
{
    int p = 0, hval = 0;
    struct hash_el_ *new, *pt;

    hval = h1(k);
    if (h[hval].numelements == 0) {
	if ((new = malloc(sizeof(new))) == NULL) {
	    return NULL;
	}
	h[hval].head = new;
	h[hval].tail = new;
	new->nxt = NULL;
	h[hval].numelements++;
	return new;
    } else {
	if ((new = malloc(sizeof(new))) == NULL) {
	    return NULL;
	}
	new->nxt = NULL;
	h[hval].tail->nxt = new;
	h[hval].tail = new;
	h[hval].numelements++;
	return new;
    }
}

void destroy_table(HASH * h)
{
    int y = 0;
    struct hash_el_ *pre, *tmp;

    for (y = 0; y < HASHLEN; y++) {
	if (h[y].head != NULL) {
	    tmp = pre = h[y].head;
	    while (tmp != NULL) {
		tmp = tmp->nxt;
		free(pre);
		pre = tmp;
	    }
	}
    }
    free(h);
}


void addstuff(HASH * h)
{
    int i;
    for (i = 'a'; i < 'e'; i++) {
	printf("append_table returned %p\n", append_table(h, i));
    }
}

int main(void)
{
    int i = 0;
    HASH *newtable;

    if ((newtable = init_table()) == NULL) {
	return 1;
    }
    for (i = 0; i < 10; i++) {
	addstuff(newtable);
    }
    walktable(newtable);
    destroy_table(newtable);
    return 0;
}
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top