Using Regular Expressions in Check Constraint - no more articles on Obama, McCain, Palin or Biden 20188367001

Using Regular Expressions in Check Constraint – no more articles on Obama, McCain, Palin or Biden

Recently I was working on an RSS feed aggregator – a SQL Merge statement that through the use of httpuritype and XMLTable scans multiple RSS feeds and merges the news items from those feeds into a central RSS_ARCHIVE table. I soon discoveredthat the archive would become flooded with stories on the candidates for the US elections. So I decided I would ban those articles from the RSS_ARCHIVE table: I would not allow any article that had the name of one of the four contenders (President and Vice President) in its title. And the easiest way to enforce such a record level constraint obviously is a Check Constraint.

My first attempt at this check constraint was something old fashioned like:

alter table rss_archive
add constraint rss_ban_stories_chk check 
( instr(lower(title), 'obama')<> -1 and instr(lower(title), 'mcain')<> -1 
  and instr(lower(title), 'palin')<> -1 and instr(lower(title), 'biden')<> -1
)

However, I soon discovered that it would become fairly horrible. So I sat down and browsed a little on the internet about regular expressions. One of those beasts you encounter over and over again and at least for me have still not become an intuitive item in my toolbox. And after some time I found the syntax I needed. That allowed me to create the Check Constraint in a much more elegant way – especially for someone more fluent in regular expressions:

alter table rss_archive
add constraint rss_ban_stories_chk check 
( NOT REGEXP_LIKE(title,'(obama)|(mccain)|(palin)|(biden)' ,'i'))
/

This basically says: the result of trying to find one of the strings obama, mcacin, palin and biden in the title – evaluated in a case insensitive manner (that’s the last parameter to the REGEXP_LIKE operator) should not be true.