What do you mean by "evaluation criteria"? You seem to imply that you want to test the "strength" of the firewall as opposed to full product selection criteria, which would include things like:
- scalability
- resilience
- performance
- vendor 'trustworthiness'
- vendor stability & longevity
- vendor support quality
- pricing models
- interoperability
- manageability
- ease of use
- deployment models
- etc
If you want to test the quality of the security, you probably need to target common exploits for the technology stack (e.g. for that product, its product range, or the underlying OS if it is based on Linux, for example) and then create a test plan for any rulesets you would commonly define. Also black-box test the whole setup in a test lab (e.g. a mini network for this PoC) by putting common services, operating systems and apps behind the firewall and then trying to penetrate them without any prior knowledge of the firewall. There are quite a few tools for scanning, probing and testing, and things like the Metasploit Framework may help too.
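As a rough sketch of what a test plan for a ruleset could look like, here is a small Python script that compares which TCP ports are actually reachable through the firewall against what the ruleset is supposed to allow. The function names (probe_tcp, check_ruleset) and the shape of the plan are my own assumptions for illustration, not part of any firewall product or testing tool, and a real scanner will do far more than this.

```python
import socket


def probe_tcp(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    A refused, filtered or timed-out connection all count as False,
    so this cannot tell a closed port from a dropped packet.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ruleset(host, expected, probe=probe_tcp):
    """Compare observed reachability with what the ruleset should allow.

    expected maps port -> True (should be reachable) or False (should be blocked).
    Returns a list of (port, expected, observed) tuples, one per mismatch,
    ordered by port number.
    """
    mismatches = []
    for port, should_be_open in sorted(expected.items()):
        observed = probe(host, port)
        if observed != should_be_open:
            mismatches.append((port, should_be_open, observed))
    return mismatches
```

You would run it from outside the firewall against a lab host, with something like check_ruleset("192.0.2.10", {22: False, 80: True, 443: True}), where the address and the ports are just placeholders. An empty list means the firewall behaved as the plan says for those ports. The probe is passed in as a parameter so you can swap in a different check (UDP, a proper scanner, etc.) without changing the comparison.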
As a PoC, if you have the funds, why not set up a range of servers in a mini network/lab, ask the firewall vendors to 'secure' it with their firewall whilst allowing the required services, then get an external security testing company to try to get past the firewall and into your systems, with a focus on the firewall? There will be many weaknesses other than the firewall of course, so you would need to filter out what is relevant. But a fun, competitive activity either way.