PLEASE HELP ME WITH ANN PROBLEM

Status
Not open for further replies.

drepac

Programmer
Jan 9, 2006
4
CA
Hi Guys,

I kindly ask for your help with my data mining (DM) project. The project is in the field of agriculture, and its objective is to find the "optimal values" of the operating conditions that affect the outcome of an animal production (chicken broilers in my case), i.e. the amount of meat produced (the weight). To do so, I have to use historical data from previous productions as my training dataset. The length of a production cycle is typically around 44 days. For each production, a data acquisition system stores the real-time and historical data of hundreds of parameters. These parameters are sensor measurements of all the operating conditions (current temperature, set point temperature, humidity, static pressure, etc.), and these are what I refer to as the inputs. The operating costs and the production outcome are what I refer to as the outputs. The operating cost is computed indirectly from parameters like water consumption, feed consumption, heater/cooling runtimes, and lighting runtime; the outcome of a production is defined by parameters like animal mortality and conversion factor (the amount of feed in lb needed to produce 1 lb of meat). So the main objective of this project is to find the set of “optimal daily values” (1 value/day) for the inputs that would minimize the operating cost and conversion factor outputs.
The biggest problem I am facing right now is the following: the historical data that I have in the DB are time series, one for each measured parameter. Some of these time series follow some kind of cyclic pattern (e.g. daily water/feed consumption), while others follow an increasing/decreasing trend (animal weight, total heater run time, total water/feed consumption, etc.). My goal is to come up with a model that suggests a set of curves for the optimal daily values throughout the production cycle, one curve for each measured input/output parameter. This model would allow the farmer to monitor his production closely on a daily basis to make sure his production parameters follow the “optimal curves” suggested by my model. I have looked at ANNs and I think they might be the solution to my problem, since they can model problems with multiple inputs and outputs (am I wrong?), but I could not figure out a way to model the inputs/outputs as time series (an array of values for each parameter). As far as I know, all kinds of classifiers accept only single-valued samples.
One approach would be to create one classifier per day (e.g. for day 1: extract a single value for each parameter, use these values as a training sample, and repeat this for all previous productions to construct the training set). The problem with this approach is that 44 or so classifiers would have to be constructed (hard to manage), and, if I am not mistaken, each resulting ANN would be some kind of “typical average” of the training data rather than the “optimal values” leading to the best production outcome.
Another approach would be to find a way to feed in the inputs and outputs as time series (an array of 44 daily values for each input/output parameter). In this case, there would be only one resulting ANN, and each training sample would be a set of arrays, one per parameter, as opposed to the single daily parameter values of the first approach. The problem is, I could not find any classifier that would allow me to do that.
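One common way to get a time series into a model that only accepts fixed-length samples is to concatenate the series into one flat vector per production cycle. The sketch below only illustrates the shapes involved; the dict layout, the parameter names and the numbers are all made up for the example and are not taken from the actual database.

```python
# Hypothetical layout: each production cycle is a dict mapping a parameter
# name to its list of 44 daily values. All names and values are illustrative.
DAYS = 44


def flatten_cycle(cycle, parameter_names, days=DAYS):
    """Concatenate the daily series of the given parameters into one flat vector."""
    vector = []
    for name in parameter_names:
        series = cycle[name]
        if len(series) != days:
            raise ValueError(f"{name}: expected {days} values, got {len(series)}")
        vector.extend(series)
    return vector


# Made-up numbers, only to show how one cycle becomes one training sample.
cycle = {
    "temperature": [30.0 - 0.2 * d for d in range(DAYS)],
    "humidity": [60.0 for _ in range(DAYS)],
    "feed_consumption": [10.0 * (d + 1) for d in range(DAYS)],
}

x = flatten_cycle(cycle, ["temperature", "humidity"])  # 2 * 44 = 88 inputs
y = flatten_cycle(cycle, ["feed_consumption"])         # 1 * 44 = 44 outputs
print(len(x), len(y))
```

With this layout one production cycle is exactly one training sample, so the number of samples equals the number of recorded cycles while the number of inputs grows with every added parameter.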

Another issue that I have is the amount of data. While a single production cycle could represent 1-2GB of data, the length of the production cycle (44 days) makes it difficult to collect historical data for hundreds of production cycles, as I could gather data for no more than 7 full cycles/year. Fortunately, a farm can have many production units (5-10 barns/site in big sites), so this makes it possible to have 40-70 cycles/yr. My question is: would this be enough to come up with an acceptably accurate model, or is it necessary to have hundreds of samples?

Thanks for taking the time to read this lengthy e-mail. I really appreciate your help and thank you in advance.

Cheers.
 
I think your model building would be better suited for ARMA or ARIMA (AutoRegressive (Integrated) Moving Average) type models. I'd look at Time Series Data models such as Box-Jenkins, etc. SAS offers some of these models. Many of these Time Series models permit cyclic activities as well as linear and non-linear trends.
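To give a feel for what such a model does, here is a drastically simplified sketch in the spirit of ARIMA(1,1,0): difference a trending series once, fit an AR(1) to the differences by ordinary least squares, and forecast one step ahead. The function names and the synthetic series are assumptions made for the illustration; a real analysis would use a proper time series package (SAS or similar) and would also handle the moving-average and seasonal parts.

```python
def difference(series):
    """First difference: turns a cumulative/trending series into increments."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    prev, curr = series[:-1], series[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    cov = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    var = sum((p - mean_prev) ** 2 for p in prev)
    phi = cov / var
    c = mean_curr - phi * mean_prev
    return c, phi


def forecast_next(series):
    """One-step forecast of the original series from an AR(1) on its differences."""
    diffs = difference(series)
    c, phi = fit_ar1(diffs)
    return series[-1] + c + phi * diffs[-1]


# Synthetic cumulative series whose increments follow d[t] = 2 + 0.5 * d[t-1].
diffs_true = [10.0]
for _ in range(9):
    diffs_true.append(2.0 + 0.5 * diffs_true[-1])
cumulative = [0.0]
for d in diffs_true:
    cumulative.append(cumulative[-1] + d)

c, phi = fit_ar1(difference(cumulative))
next_value = forecast_next(cumulative)
print(round(c, 6), round(phi, 6), round(next_value, 6))
```

Because the synthetic increments follow the AR(1) rule exactly, the fit should recover the constant and coefficient used to generate them; real sensor data will of course be noisier.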

Data Mining is useful to identify factors which might be worthy of consideration in model building. You have really gone beyond data mining and into model building.

I would build the model based on 50-70% of your data sets. Then use the other 30-50% to validate the model.
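A minimal sketch of that split, assuming each production cycle is kept whole (all readings of one cycle go to the same side) and is identified by a hypothetical ID string. The 70% fraction and the fixed seed are just example choices.

```python
import random


def split_cycles(cycle_ids, train_fraction=0.7, seed=0):
    """Shuffle whole production cycles and split them into build/validation sets."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    ids = list(cycle_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    cut = round(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


# Hypothetical IDs for 50 recorded cycles.
cycle_ids = [f"cycle_{i:02d}" for i in range(50)]
train_ids, validation_ids = split_cycles(cycle_ids, train_fraction=0.7)
print(len(train_ids), len(validation_ids))
```

Splitting by cycle rather than by individual reading keeps readings from the same production out of both sets at once, which would otherwise make the validation look better than it is.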

-------------------------
The trouble with doing something right the first time is that nobody appreciates how difficult it was - Steven Wright
 
Hi John and thanks for the reply,

Maybe I was not precise enough last time, so I am going to describe things more clearly with examples.

Since the produced model will be used as part of an alarm/flagging system, I will have to produce a curve for each of the parameters of interest using 4 values/day (one every 6 h) over the 44 days, so that any abnormal behaviour can be flagged and corrected ASAP. So the whole curve would have 4*44=176 values. E.g. for the water consumption curve: Day 1: 12AM=65 Gal, 6AM=150, 12PM ... Day 44: 6PM=1500 Gal. I would have to come up with similar curves for each of the parameters of interest (inputs/outputs). Now, as far as ANNs are concerned, do I have to produce 176 of these ANNs, one for each predicted value? ANN1: input1 (temperature value Day1@12AM), input2 (humidity value Day1@12AM)... output1 (feed consumption value Day1@12AM), output2 (heater runtime value Day1@12AM)... and train the ANN with the 50-60 samples (Day1@12AM) from previous productions. This would produce an ANN for predicting the value of each parameter at Day1@12AM for future productions, and so on for the other time points. This would be quite intensive computationally, so I am wondering if there is a better way: maybe feed in all 176 values of each time series in one shot, to have something like input1 (temperature values 1-176), input2 (humidity values 1-176)... output1 (feed consumption values 1-176), output2 (heater runtime values 1-176)..., producing only one ANN which would predict the 176 values of all parameters for future productions?
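One possible way to avoid 176 separate networks, sketched here only as an idea: make the time step itself an extra input, so that every reading of every cycle becomes one training row for a single model. The helper names, the dict layout and the numbers below are assumptions for illustration, not something taken from the data acquisition system.

```python
READINGS_PER_DAY = 4                 # 12AM, 6AM, 12PM, 6PM
DAYS = 44
STEPS = READINGS_PER_DAY * DAYS      # 4 * 44 = 176 values per curve


def step_index(day, reading):
    """Map day 1..44 and reading 0..3 to a position 0..175 on the curve."""
    if not (1 <= day <= DAYS and 0 <= reading < READINGS_PER_DAY):
        raise ValueError("day or reading out of range")
    return (day - 1) * READINGS_PER_DAY + reading


def to_training_rows(cycles, input_names, output_names):
    """Build one (inputs, outputs) row per time step of every cycle.

    The first input is the time step scaled to [0, 1], so a single model
    can learn how the outputs depend on both time and operating conditions.
    """
    rows = []
    for cycle in cycles:
        for step in range(STEPS):
            inputs = [step / (STEPS - 1)] + [cycle[n][step] for n in input_names]
            outputs = [cycle[n][step] for n in output_names]
            rows.append((inputs, outputs))
    return rows


# Two made-up cycles, each a dict of 176-value series (illustrative only).
cycles = [
    {
        "temperature": [30.0 - 0.05 * s for s in range(STEPS)],
        "humidity": [60.0 + k for _ in range(STEPS)],
        "feed_consumption": [5.0 * (s + 1) for s in range(STEPS)],
    }
    for k in range(2)
]

rows = to_training_rows(cycles, ["temperature", "humidity"], ["feed_consumption"])
print(step_index(44, 3), len(rows))
```

With 50-60 cycles this layout gives thousands of rows instead of 50-60 samples per network, although rows from the same cycle are strongly correlated and should not be counted as independent samples.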

I would really appreciate your help as I am really stuck at this.

Cheers.
 