Latest CSE Seminar Topic on Tool for Fast Regular Expression Matching

This paper discusses DOTSTAR, a tool for fast regular expression matching. DOTSTAR compiles regular expressions into a compact automaton through a sequence of progressively more manageable intermediate representations. The resulting automaton can search the input in a single pass, without any backtracking.

Overview: 

Regular expressions are a common choice for defining configurable rules for data parsing, and they are used for online data filtering in many data-intensive applications. However, increasing data rates and the complexity of rule sets with hundreds of expressions make fast matching difficult.

DOTSTAR addresses this problem with a software tool chain that compiles large sets of user-provided regular expressions first into a sequence of intermediate representations and then into a compact automaton. It targets two problems, amnesia and acalculia. Amnesia is the inability to follow the progress of multiple partial matches, whereas acalculia is the inability to count the number of occurrences of a sub-expression. DOTSTAR avoids state explosion by using counting and status bits, and it reduces memory requirements by eliminating amnesia.
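The status-bit idea can be illustrated with a small sketch. This is not DOTSTAR's implementation; it only shows why remembering "the first half of this rule has been seen" in one bit per rule avoids encoding that fact in the automaton's states. It assumes every rule has the simplified form prefix.*suffix with literal, non-empty halves, and the function name match_prefix_suffix_rules is hypothetical. Plain string slicing stands in for the literal-matching automaton.

```python
def match_prefix_suffix_rules(rules, text):
    """Match rules of the form  prefix.*suffix  in one left-to-right scan.

    rules: list of (prefix, suffix) pairs of non-empty literal strings
           (an assumed, simplified rule shape).
    Returns (rule_index, end_index) for every position where a suffix ends
    after the rule's prefix has already been seen completely.
    """
    seen = [False] * len(rules)  # one status bit per rule
    matches = []
    for i in range(len(text)):
        for r, (prefix, suffix) in enumerate(rules):
            start = i + 1 - len(suffix)  # where a suffix ending at i would start
            if start < 0:
                continue
            # Set the status bit once the prefix has ended at or before
            # `start`, so prefix and suffix never overlap in the input.
            if start >= len(prefix) and text[start - len(prefix):start] == prefix:
                seen[r] = True
            # The bit replaces the states a DFA would need to remember the prefix.
            if seen[r] and text[start:i + 1] == suffix:
                matches.append((r, i))
    return matches


rules = [("ab", "cd"), ("ab", "bc")]
print(match_prefix_suffix_rules(rules, "xxabyycd"))
```

With n such rules, a single combined DFA may need to track every subset of prefixes seen so far, while this sketch keeps n bits next to the scanner. Counting constraints (the acalculia problem) could be handled in the same spirit by keeping a small counter instead of a bit, though that is not shown here.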

The five elements of DOTSTAR are the classification engine, expander, compiler, compactor and runtime engine. DOTSTAR can accommodate complex regular expressions of the POSIX standard. For NIDS (network intrusion detection system) regex sets, DOTSTAR also compares well with FPGA/accelerator implementations.

Conclusions: 

DOTSTAR offers an automaton that recognizes both small and large regular expression sets. Like an Aho-Corasick automaton, it detects every exact match of every regular expression, including overlapping matches, in a single pass. DOTSTAR efficiently parses both large and small sets of user-provided regular expressions at very high matching speed, and it is also suitable for acceleration. The results reported for DOTSTAR are better than those of the Regexlib and Boost tools. The experimental evaluation covers three categories: XML tokenization, SMTP parsing and network intrusion detection.
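For readers unfamiliar with the Aho-Corasick automaton mentioned above, the following is a minimal sketch of the classic algorithm for literal patterns only. It is not DOTSTAR itself, which handles full regular expressions; the function names build_automaton and find_all are chosen here for illustration, and patterns are assumed to be non-empty strings. It shows how all matches, including overlapping ones, are reported in a single pass without backtracking over the input.

```python
from collections import deque


def build_automaton(patterns):
    """Build the goto, failure and output tables for a set of literal patterns."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # fail[state] -> state to fall back to on a mismatch
    out = [[]]    # out[state] -> indices of patterns ending at this state
    for idx, pat in enumerate(patterns):
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(idx)
    # Breadth-first traversal so that shallower states are finished first.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the fallback state (overlaps).
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(patterns, text):
    """Return (start_index, pattern) for every match, reading text once."""
    goto, fail, out = build_automaton(patterns)
    state = 0
    hits = []
    for pos, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for idx in out[state]:
            hits.append((pos - len(patterns[idx]) + 1, patterns[idx]))
    return hits


print(find_all(["he", "she", "his", "hers"], "ushers"))
```

In the input "ushers" the patterns "she", "he" and "hers" overlap, and all three are reported because each state carries the outputs of its failure state. The input pointer only ever moves forward, which is the single-pass property the paper claims for the DOTSTAR automaton as well.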
