Regular expressions and performance

Regular expressions are really powerful, but sometimes they can be quite slow, especially when you have to deal with large amounts of data.

I have a lot of strings in the database that have this format …text…||…anothertext…||..anothertext… and we need to split them using || as the separator. Since the basic string.Split overload takes only char separators, we used a regular expression to parse the text (string.Split also has an overload that takes a string[] of separators, which would have worked here).

@"(>|\||^|(\.\.\.))(?<prev>.+?)(<|\||$|(\.\.\.,))"

This regular expression was a leftover from some old code: it splits strings with tags like <xxx>…text</xxx>, and it was left in place even though the string format is now much simpler. We recently noticed that creating Excel reports was slow, and I verified that most of the time is spent parsing this string (which is called previews).

image

In this test, 20 seconds are wasted in the function that uses that old regex. So I simply rewrote it using IndexOf in a while loop.
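The original replacement code is not shown here, so what follows is only a minimal sketch of what an IndexOf-based while loop for the || separator could look like. The class name SeparatorSplitter and the method name SplitOnDoublePipe are hypothetical, chosen just for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class SeparatorSplitter
{
    // Hypothetical helper, not the code from the original project.
    // Splits the input on the literal "||" separator without using a regex.
    public static List<string> SplitOnDoublePipe(string input)
    {
        const string separator = "||";
        var parts = new List<string>();

        if (string.IsNullOrEmpty(input))
            return parts;

        int start = 0;
        int index;

        // Find each occurrence of the separator, starting after the previous one.
        while ((index = input.IndexOf(separator, start, StringComparison.Ordinal)) >= 0)
        {
            parts.Add(input.Substring(start, index - start));
            start = index + separator.Length;
        }

        // Whatever follows the last separator is the final part.
        parts.Add(input.Substring(start));
        return parts;
    }
}
```

For this simple format, the built-in call input.Split(new[] { "||" }, StringSplitOptions.None) should give the same parts; the explicit loop is shown only because it mirrors the approach described above.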

image

Thanks to dotTrace I verified that the most expensive function is now ManageModifiedRowContent. The moral is: pay attention to regular expressions, because they can be slow. If you need to process large amounts of data, do not use a regex for simple tasks.

alk.

Tags: Regular Expression