Using the Jakarta Regexp package, I am implementing a profanity filter for an application I am working on. Yes, I could have done this using normal string checks, but this will be used by a file load process that may run it 10,000+ times in a single file load. So because regular expressions are faster, I have gone that route.
What I have so far is an XML file that contains the list of profane words that we need to check for. The util reads in the XML file and builds a pipe-separated string of the words (it's set up to read the XML file only once).
This then gets added to the regular expression string before the RE object gets created, and it all works great.
Code:
profane1|profane2|profane3|...|profane30
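For illustration, the step that builds the pipe-separated string could look roughly like the sketch below. It uses java.util.regex rather than Jakarta Regexp, and the class name, method names, and the surrounding `\b` word boundaries are all assumptions made for the example, not the actual util.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper: turns a word list into "w1|w2|..." and compiles it.
final class WordListPattern {

    // Quote each word so regex metacharacters in it are treated literally,
    // then join the quoted words with '|'.
    static String toAlternation(List<String> words) {
        return words.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
    }

    // Assumption: whole-word, case-insensitive matching. The real regex
    // around the word list is not shown in this post.
    static Pattern build(List<String> words) {
        return Pattern.compile("\\b(?:" + toAlternation(words) + ")\\b",
                Pattern.CASE_INSENSITIVE);
    }
}
```

Compiling the pattern once and reusing it for every check is what keeps the 10,000+ calls per file load cheap.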
The problem came up when we tried to add a list of exceptions. Some names that are actually valid were rejected by the filter. Instead of changing the util to pass those names, they want to support a list of exceptions. So I added the exceptions to a different element of the same XML file and build a similar pipe-separated list of exceptions.
Code:
exception1|exception2|exception3|...|exception10
What I am having trouble with is how to tell the regular expression that the string it is checking should not match anything in the exception list.
Basically String s is not in (exception list).
I have tried
Code:
(\\b[^(exception1|exception2)]\\b) | rest of current regex.
(\\b[^(exception1|exception2)]\\b)(rest of regex)
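One likely reason these attempts fail is that `[^...]` is a character class: it excludes single characters, not whole words. To make the goal concrete, here is a minimal sketch of "s is not in the exception list" written as a negative lookahead. It assumes java.util.regex, since Jakarta Regexp may not support lookahead at all, and the class name, method names, and sample words are hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical filter: profane unless the whole input is an exception.
final class ExceptionAwareFilter {
    private final Pattern pattern;

    // Assumption: both lists are non-empty.
    ExceptionAwareFilter(List<String> profaneWords, List<String> exceptions) {
        String profane = alternation(profaneWords);
        String allowed = alternation(exceptions);
        // ^(?!(?:allowed)$) fails when the entire input equals an exception;
        // .*(?:profane) then looks for any profane word anywhere in the input.
        String regex = "^(?!(?:" + allowed + ")$).*(?:" + profane + ")";
        this.pattern = Pattern.compile(regex,
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    }

    // Quote each word and join with '|'.
    private static String alternation(List<String> words) {
        return words.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
    }

    boolean isProfane(String s) {
        return pattern.matcher(s).find();
    }
}
```

With the profane list `darn|heck` and the exception list `Darnell|Heckman`, `isProfane("Darnell")` is false while `isProfane("darn it")` is true. The exception only applies when the whole string equals an exception, so "Darnell Smith" would still be flagged.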
If anyone has any ideas on how to do this using regex I would really appreciate it.