
How to get better at preventing crashes?


rlawrence (Programmer), joined Sep 14, 2000
I have a few users of our software who complain about frequent crashes. I'm not talking about repeatable program errors. I'm talking about fatal errors where Windows says it has to shut the application down. Then you restart the program, perform the very same task, and it works just fine.

One major culprit has been Norton software, but I have already asked about that and received some good suggestions. While Norton may be a major cause, it's not the only one. When we get support calls for something like this, the general approach is to look for conflicts on the machine or network where our software is installed. Usually we can come up with a plausible solution, or at least an approach to finding the conflict, but it's certainly annoying to our users when this happens. And of course, to the user it looks like our application is the problem.

I'm wondering what other developers are doing in the way of programming and/or testing practices to try to minimize this. Are others experiencing similar behaviors? Our application is a fairly significant business management system--including order entry, accounting, inventory, etc. So, it's pretty substantial, and thus can consume significant resources on a machine.

I wasn't quite sure where to ask this question, but this forum seemed like the best fit. This is a pretty general coding/best practices issue. Any thoughts would be appreciated.

Thanks,

Ron Lawrence
 
Hi Ron,

I recently read an article about Industrial Light and Magic. On their campus they have a cinema where they give their special effects a final test in the place they will ultimately be consumed: on the silver screen!

What does that have to do with your question? Well, it shows that one best practice is to run the final test in the environment where the application will actually live.

So if you have such a valuable customer, you might do extensive testing at their company, within their network, on one of their machines. Maybe not for free...

We have such a customer. They have a development department themselves, so they understand these demands and keep every database twice: once for production, once for testing. The test data is simply replicated production data, so even buggy data gets tested.

Of course it would be too expensive and time-consuming to do every test there. But apart from all the unit tests and so on that we do here at our own site beforehand, the final test is done there, following a script of test steps that covers all the functionality.

Of course you can't do that kind of testing for off-the-shelf or downloadable software bought by individual customers all over the world; they are comparable to the final cinema audience. Maybe if there are millions of them. But for custom-made, business-critical software, your customer should have a budget for that, to prevent the even higher costs incurred when employees can't work because of failures.

Bye, Olaf.
 
Good advice from Olaf, but there are some things out of our control - the unpredictable C0000005 errors, for example. I've hit this several times over the years, and each time it's been due to a different cause. Microsoft have reduced this problem over the years - are you running VFP 9?

If you want some extreme advice on preventing bugs, try the famous article about how NASA's shuttle software team works. It's almost ten years old but reading it still makes me feel very inadequate.

Geoff Franklin
 
I read the article and found it very interesting. The one thing they don't suffer from is the intervention of humans: you know, the type of people who press F3 10 times in succession to see what happens.


Keith
 
The one thing they don't suffer from is the intervention of humans

The other thing is that they've also got a decent budget for testing.

the type of people who press F3 10 times in succession to see what happens.

You've been watching me <g>.

Geoff Franklin
 
Yes, it's a good article. It points out many things that we already try to do. It also assumes NASA's budget. The reality is that we don't have those kinds of resources to do that kind of ritualistic planning and/or testing. On the other hand, I think our coding practices are pretty sound. Still, there is certainly room for improvement, and I guess that's what I'm trying to get at.

What I'm wondering is whether anyone in this community has their arms around a few basic parameters: things that go a little beyond the "quick software" solutions. Perhaps another way to think about it: if you were to categorize all the failures that cause applications (particularly FoxPro applications) to fail in the field, what would be the top five causes?

Given that we're talking about operating system failures (generally I think of these as some sort of resource shortage, probably memory), what is it about FoxPro applications that is likely to cause Windows to puke?

Some things that we have already found over the years:

If an index (CDX or IDX) file has been damaged, unbelievable things can happen. Of course, what causes the CDX to be damaged to begin with?

If Norton is running, watch out. That may be one of the causes of a damaged CDX. Norton seems to get between the application and the hard drive.

What I'm wondering about now is whether there is any common way of anticipating, say, a shortage of available memory for an operation.
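Something like the following pre-flight check is the kind of thing I have in mind. This is only a sketch: the thresholds are purely illustrative, and it relies on SYS(1001), which reports the memory available to VFP's memory manager, and DISKSPACE(), which reports free bytes on the default drive.

* Sketch of a pre-flight resource check before a heavy operation.
* The 16 MB / 50 MB thresholds are illustrative guesses, not VFP rules.
FUNCTION EnoughResources
    IF VAL(SYS(1001)) < 16 * 1024 * 1024
        MESSAGEBOX("Memory is running low. Please close other " + ;
            "applications before continuing.", 48, "Warning")
        RETURN .F.
    ENDIF
    IF DISKSPACE() < 50 * 1024 * 1024
        MESSAGEBOX("Disk space is running low.", 48, "Warning")
        RETURN .F.
    ENDIF
    RETURN .T.
ENDFUNC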

Thanks for your thoughts. Does anyone have any other thoughts along these lines?

Ron
 
Does anyone have any other thoughts along these lines?

It's difficult to know what to say but I'll start the discussion with some thoughts on data handling:

I always open a table when I need it and close it as soon as possible. I know it slows things down but it reduces the period when the table and index can be corrupted by network problems.
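In outline the pattern looks like this (the table name is just an example):

* Open late, close early. USE ... IN 0 grabs an unused work area.
LOCAL llWasOpen
llWasOpen = USED("customer")
IF NOT llWasOpen
    USE customer IN 0 SHARED
ENDIF
SELECT customer
* ... read or update the rows needed for this one operation ...
IF NOT llWasOpen
    USE IN customer     && close again the moment we're done
ENDIF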

It took me a long time to realise that work area is a global variable with all that that entails. I now specify work area explicitly in commands like SEEK.

The same goes for the record number. I'll pass a primary key across and SEEK it at the start of a procedure rather than assume that I'm on the right record. And I always check whether it's been found.
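As a sketch, a self-contained lookup ends up looking like this (table, tag, and field names are invented for the example):

FUNCTION GetCustomerName
    LPARAMETERS tnCustID
    * SEEK() takes the alias and tag explicitly, so neither the current
    * work area nor the current order matters. Always test the result.
    IF USED("customer") AND SEEK(tnCustID, "customer", "cust_id")
        RETURN ALLTRIM(customer.cust_name)
    ENDIF
    RETURN ""
ENDFUNC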

Ignore all the above if coding for outright speed.

Every table has an integer as a surrogate primary key.

I'll avoid memo fields if at all possible. It's just something else to go wrong.

Give the user a utility to compact the data files and recreate (not rebuild) the indexes.
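Something along these lines, with illustrative table and tag names. The point is DELETE TAG ALL plus INDEX ON, which rebuilds the CDX from nothing, whereas REINDEX trusts a possibly damaged index header:

* Maintenance sketch: requires exclusive use of the table.
USE customer EXCLUSIVE
PACK                             && remove deleted rows, pack the memo file
DELETE TAG ALL                   && throw the old CDX away completely
INDEX ON cust_id TAG cust_id     && recreate each tag from its expression
INDEX ON UPPER(cust_name) TAG cust_name
USE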

Geoff Franklin
 
Concerning C0000005 errors, I could mostly trace the error to corrupt indexes or to the incorrect release of object references. I'd always look first at the stack info the C5 error message gives and see whether there really is some fault at that line.

But there are errors you get even less of a grip on: errors that you only detect if you check for data inconsistencies. Even very strict referential integrity rules may not prevent such inconsistencies. As these errors don't show up as messages, they are very hard to find, and the more complex the system gets, the harder they become. Detecting this kind of error early demands detailed test cases and expected results. Unfortunately, often enough it's hard to get that kind of information from a customer.
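As a minimal sketch of such a test (table and field names invented), a query for orphaned child rows that the triggers should have made impossible:

* Consistency test: child rows whose parent no longer exists.
SELECT o.order_id, o.cust_id ;
    FROM orders o ;
    WHERE o.cust_id NOT IN (SELECT cust_id FROM customer) ;
    INTO CURSOR curOrphans
IF _TALLY > 0
    * Inconsistency found: log it and compare against expected results.
ENDIF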

Bye, Olaf.
 
I always open a table when I need it and close it as soon as possible. I know it slows things down but it reduces the period when the table and index can be corrupted by network problems.

Interesting. I am in the process of a major architectural change. One of the changes is to move from a model where all tables are open to each table being open only as long as necessary for the operation at hand.

It took me a long time to realise that work area is a global variable with all that that entails. I now specify work area explicitly in commands like SEEK.

How do you think that specifying the work area helps with reliability?

The same goes for the record number. I'll pass a primary key across and SEEK it at the start of a procedure rather than assume that I'm on the right record. And I always check whether it's been found.

With your first practice, I think this must be essential. I have been experimenting with retaining the record number and relocating the record rather than performing a seek. In the end, it may turn out that this introduces more harm than good. It's a localized routine, however, so if I want to change it, I can change it in one place.
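In rough form, the routine amounts to this (the key variable and all the names are placeholders), with a key-based SEEK as the fallback:

* Remember the position, relocate later. GOTO by record number is fast
* but only valid while the record still exists; SEEK is the safe fallback.
LOCAL lnSaveRec
lnSaveRec = RECNO("customer")
* ... other processing that moves the record pointer ...
IF BETWEEN(lnSaveRec, 1, RECCOUNT("customer"))
    GOTO lnSaveRec IN customer
ELSE
    = SEEK(lnCustID, "customer", "cust_id")   && lnCustID saved earlier
ENDIF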

Ignore all the above if coding for outright speed.

Speed is an interesting topic by itself. As you have pointed out, reliability and speed may be at odds. Unless you work for NASA, reliability is probably the better choice in a lot of cases.

Every table has an integer as a surrogate primary key.

I'm probably not as zealous about this, but I understand and agree with the practice. A notable exception: As I work out the relationships in a database, frequently there are tables that resolve many-to-many relationships between entities.

I'll avoid memo fields if at all possible. It's just something else to go wrong.

Hmmm... The memo field seems like quite a useful feature. If the nature of the data I am storing is truly variable length, I wouldn't hesitate to use a memo field. I have noticed, however, that like index files, memo files can get clobbered if something goes wrong. I've had to provide a utility with my application that uses FoxPro's own ability to recognize and fix a damaged memo file.

Give the user a utility to compact the data files and recreate (not rebuild) the indexes.

Yup. Referring to your last point, and to this general topic, packing and regenerating indices (not reindexing) are the two main tools I have to test the overall health of the user's database.

Thanks Geoff for taking the time to respond seriously. I'm not sure that we've uncovered anything revolutionary, but it's good review for myself and for others I hope.

I'm still digging though. I have a couple hundred installations--most of which I never hear about. There are a few, however, that seem to have constant problems. Clearly, there are issues that are specific to the installation. I'm searching for things I can do in my software that will make it more resilient to those kinds of issues.

Ron
 
How do you think that specifying the work area helps with reliability?

It doesn't help reliability as such but it does prevent me making stupid mistakes in maintenance.

A typical scenario is to list the customers in a grid on one form and pop up the full details in a separate form. All works well, but there are (at least) two ways I can break this in maintenance.

One way is to add a drop-down list of some sort to the Customer Detail form and to populate it from a cursor in the form's Init. All of a sudden I'm in the wrong work area. I know I could avoid this by driving the drop-down from an array instead but if I use an array then I've got to write code for the case where the SQL selects no records and creates no array.
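A sketch of the guard I mean (the cursor name and SQL are invented for the example):

* Save the work area before the SELECT changes it, restore it after.
LOCAL lnOldArea
lnOldArea = SELECT()
SELECT type_id, type_name FROM lookup_types INTO CURSOR curTypes
* ... point the drop-down's RowSource at curTypes ...
SELECT (lnOldArea)      && back to the work area the form expects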

The other way is to realise that I need to show the customer details from another form which is showing a list of invoices and of course I don't have the right table open. I've then got to write code to open the Customer table and Seek the CustID in this situation so I might as well put that code into the Customer Detail form from day 1. I then know I've got a form which is self-contained and can be used anywhere in the app. I can even call it twice from the browse screen and display details for two customers at once.

I wouldn't hesitate to use a memo field.
I used to worry about wasting disk space when I defined a C(255) field for comments which were going to be variable length but it doesn't seem so important these days. Not having a memo field means I don't have to remember to add the MEMO clause to commands like SCATTER and I don't have to worry about memo bloat.
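For anyone following along, this is the clause in question; leaving it off silently skips the memo fields (the table name is just an example):

USE customer SHARED
SCATTER MEMO NAME loCust      && without MEMO, memo fields are omitted
* ... edit properties of loCust ...
GATHER NAME loCust MEMO       && the same clause is needed writing back
USE IN customer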

Thanks Geoff for taking the time to respond seriously
A pleasure. I'm having a day off and it's good to sit back and think.


Geoff Franklin
 
Olaf,

Thanks for your thoughts...

Concerning C0000005 errors, I could mostly trace the error to corrupt indexes or to the incorrect release of object references. I'd always look first at the stack info the C5 error message gives and see whether there really is some fault at that line.

I have to admit that I don't know how to interpret the stackinfo. What do you look for?

But there are errors you get even less of a grip on: errors that you only detect if you check for data inconsistencies...

Are you referring to referential integrity? As a hedge against these types of errors, I have found myself providing a slew of data integrity tests. How do you handle such problems? Even after several years, I'm still finding that I really should have more tests. These types of problems don't typically cause a crash--though I believe that a crash may result in referential integrity problems.

Best wishes,

Ron
 
rlawrence said:
I have to admit that I don't know how to interpret the stackinfo. What do you look for?

Well, how about the line numbers? Within the message you should find something like "called by - <object>.<method> line 4"...

Do you include debug info in your EXE? I think it's worth the extra size for the line numbers you get out of error handling. If you don't, there's probably not much you can read out of a C5 error message.
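A minimal sketch of an error handler that makes use of those line numbers. It assumes the EXE was built with debug info, the log-file name is an assumption, and STRTOFILE's additive flag (1) is VFP 9 syntax:

* ERROR(), MESSAGE(), PROGRAM() and LINENO() are evaluated at the
* moment of the error, so the handler receives the real source line.
ON ERROR DO errlog WITH ERROR(), MESSAGE(), PROGRAM(), LINENO()

PROCEDURE errlog
    LPARAMETERS tnError, tcMessage, tcProgram, tnLine
    STRTOFILE(TTOC(DATETIME()) + "  error " + TRANSFORM(tnError) + ;
        " in " + tcProgram + " at line " + TRANSFORM(tnLine) + ;
        ": " + tcMessage + CHR(13) + CHR(10), "errlog.txt", 1)
ENDPROC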

Concerning the other errors: they needn't be referential integrity errors alone. Integrity can be all right, but the data can still be contradictory. Or data within a long hierarchy (1:n:m:o:p) can have a "referential bug" that can't be caught by any single table trigger alone.

You are right about data integrity tests. Unfortunately, it's mostly only after you come across an error that you know which test you had forgotten to provide. I'd put a smiley here if it weren't so sad. A highly normalized set of tables is normally the best assurance that you don't have contradictory data, but that's only half the solution to this kind of problem.

Bye, Olaf.
 
Based on your description, I'd be very suspicious of hardware. Check the network cables. Make sure the machines in question have a clean, stable power source. Etc., etc.

Also, the wiki has lots of good stuff on C5's and corruption and so forth. Here are a few links:



Tamar
 