
Poor List performance - append() is too slow

Status
Not open for further replies.

ITVictim

Programmer
Jul 22, 2005
I have a pretty basic problem: I've got around 15000 (small) objects which I am trying to store. I originally just appended them to a list, but this took 100% of the CPU and basically didn't finish. On probing, it looks as though the time it takes to append something to a list goes up exponentially with the length of the list, and once the length is more than 10,000 or so it becomes a problem.

This is presumably because Python uses linked lists, but there doesn't appear to be an alternative - the array module can only be used for simple data types, not objects.

Can anyone advise?

Mike
 
Running Python 2.4 under Linux, I can do a million appends in a little over a second on my 1.8GHz AMD, and 10 million takes 11 seconds. Hardly exponential, and hardly the poor performance you're getting.

Code:
import time
start = time.time()

items = []                  # avoid shadowing the built-in name 'list'
for thing in range(10000000):
    items.append( thing )

print len( items )
print time.time() - start

output:
Code:
10000000
11.5938689709

Can you isolate the problem to less than ten lines of code and post it along with timings?
 
Eric,

Thank you for your post. I've experimented with simpler cases, trying to isolate the problem, and I think the key factor is the size of the object you are appending. For small objects such as integers, the list has to be huge before the problem shows up. In my case, I was appending fairly large objects to the list (roughly a page of text in Emacs each).

I've got a simple example here, and the times - which definitely aren't linear:

Code:
import time

class DummyObject:
    def __init__(self):
        self.directory = "C://"   # store the path as an instance attribute

i = 10000000                      # vary i to reproduce the timings below

start = time.time()
items = []                        # avoid shadowing the built-in 'list'
for _ in range(i):
    items.append(DummyObject())

print len( items )
print time.time() - start

The times on my machine are:

Code:
      i    time (s)
  10000     0.03100
 100000     0.42199
1000000    10.8439
2000000    36.8440
5000000    crashes after approximately 5 minutes with a MemoryError

In my application, I've actually replaced the list with a dictionary of dictionaries: the 24,000 elements are split across 10 dictionary entries, the largest of which is a dictionary of 7,500 BDF card objects. The performance of this is OK, although not as good as I would have expected from LISP, which is where I 'came from'.
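Roughly, the layout looks like the sketch below; the card names and the BDFCard class are simplified placeholders rather than my real code:

Code:
class BDFCard:
    """Placeholder for a parsed Nastran bulk data card."""
    def __init__(self, card_id, fields):
        self.card_id = card_id
        self.fields = fields

# One outer entry per card type; each inner dict is keyed by card id.
cards = {}
cards.setdefault("GRID", {})[1001] = BDFCard(1001, [0.0, 0.0, 0.0])
cards.setdefault("CQUAD4", {})[2001] = BDFCard(2001, [1001, 1002, 1003, 1004])

# Lookup is then two key accesses rather than a scan of a 24,000-item list.
element = cards["CQUAD4"][2001]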

Am I just expecting too much comparing Python to a specialised list processing language such as LISP?

Regards,

Mike
 
It sounds like the issue is not the list management, but the memory allocation routines. When you append an item to a list, I believe you are making a copy of the object.

Even if not, what if you take the list append() out of the mix and see whether the problem is memory allocation outside of the list? I ran my little test appending a string rather than a small int, and it increased the running time.

I also tried adding objects to the list and, as expected, the time to instantiate the object took a serious toll on the execution time.
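One rough way to separate the two costs is to time instantiation and appends in isolation; this is just a sketch using a stand-in class like the one in your post:

Code:
import time

class DummyObject:
    def __init__(self):
        self.directory = "C://"

n = 1000000

# Time instantiation alone, discarding the objects.
start = time.time()
for _ in range(n):
    DummyObject()
print "instantiate only:", time.time() - start

# Time appends alone, reusing one pre-built object.
obj = DummyObject()
items = []
start = time.time()
for _ in range(n):
    items.append(obj)
print "append only:     ", time.time() - start

If object creation really is the cost, the append-only loop should come out far cheaper than the instantiation loop.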

I guess I don't have an answer for you. I, myself, wouldn't imagine designing a program that stored hundreds of thousands of objects in a list; I would expect the performance to be abysmal.

 
Unfortunately, needs must when the devil drives, as they say. The necessity is to read in a Nastran bulk data file that contains 24,000 elements, so I'm going to be handling a lot of data whatever happens.

You may well be right about the memory management. I'm going to have to look into this some more...
 
I always mocked Perl programmers when the first thing they did in their program was slurp in an entire file (there was even a function called slurp(), as if to encourage bad programming practices from the get-go). I'd immediately create a file that was too large to fit in memory and show them how their script went down in flames.
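If reading the bulk data file itself becomes a concern, iterating over it line by line keeps memory flat; the filename and the crude card grouping below are only placeholders, not a real Nastran parser:

Code:
# Process the deck one line at a time instead of reading it all into memory.
cards = {}
f = open("model.bdf")                      # placeholder filename
for line in f:
    line = line.rstrip()
    if not line or line.startswith("$"):   # skip blanks and comment cards
        continue
    card_type = line[:8].strip()           # fixed-field card name
    cards.setdefault(card_type, []).append(line)
f.close()

print len(cards), "card types read"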
 
Mike,

Sorry to join the party so late, but it sounds like you have an interesting practical problem, and I would like to give it a shot.

Several things come to mind:
-> What do you need the list for - to do a lookup?
-> Further to the point above, do you *need* the list to be in-memory? If not, you could use ZODB, Python's free standalone object database, to store the 24,000 elements (IMHO!) - see the sketch below
-> What about PyTables? That sounds like it is in line with what you want
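For example, a minimal sketch of the ZODB route might look like this; the filename, the 'cards' key, and the sample data are purely illustrative:

Code:
# Minimal ZODB sketch (assumes the ZODB package is installed).
from ZODB.FileStorage import FileStorage
from ZODB.DB import DB
import transaction

storage = FileStorage("bulkdata.fs")       # illustrative filename
db = DB(storage)
conn = db.open()
root = conn.root()

# A real app would probably use a PersistentMapping or a BTree here.
root["cards"] = {}
root["cards"]["GRID"] = {1001: "..."}      # store whatever objects you need
transaction.commit()

conn.close()
db.close()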

Best wishes,

- Problemsolver
 
