Differences in Regular Expressions Between Programming Languages
While building the Intact Case libraries, I used regular expressions to convert between camelCase and snake_case. I had to write separate conversion patterns for each programming language, because PHP, Ruby, and JavaScript implement different regular expression specifications. In this article, "JavaScript" means ECMAScript 5 (current as of July 2015).
Pattern | Kind | PHP | Ruby | JavaScript |
---|---|---|---|---|
(?=) | positive lookahead | ✔ | ✔ | ✔ |
(?!) | negative lookahead | ✔ | ✔ | ✔ |
(?<=) | positive lookbehind | ✔ | ✔ | none |
(?<!) | negative lookbehind | ✔ | ✔ | none |
(?>) | atomic grouping | ✔ | ✔ | none |
?? | lazy matching | ✔ | ✔ | ✔ |
*? | lazy matching | ✔ | ✔ | ✔ |
+? | lazy matching | ✔ | ✔ | ✔ |
?+ | possessive matching | ✔ | ✔ | none |
*+ | possessive matching | ✔ | ✔ | none |
++ | possessive matching | ✔ | ✔ | none |
JavaScript (ECMAScript 5) has no lookbehind, no atomic grouping, and no possessive matching (?+, *+, ++), so complicated match patterns need workarounds.
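
As one illustration of such a workaround, a lookbehind can often be replaced by capturing the preceding character and writing it back in the replacement string. The sketch below assumes a simple camelCase input; `camelToSnake` is a hypothetical helper written for this example and is not taken from Intact Case.

```javascript
// Hypothetical helper (an assumption for illustration, not Intact Case code).
// With lookbehind one could write (?<=[a-z0-9])([A-Z]); ES5 has no lookbehind,
// so the preceding character is captured as $1 and emitted again.
function camelToSnake(input) {
  return input
    .replace(/([a-z0-9])([A-Z])/g, "$1_$2") // insert "_" between lower/digit and upper
    .toLowerCase();                          // then lowercase the whole string
}

console.log(camelToSnake("intactCaseLibrary")); // intact_case_library
```

Because the captured character is consumed by the match, this approach is not a full substitute for lookbehind when matches need to overlap.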
A Regular Expression Bug
While creating the Intact Case libraries, I ran into cases where the match results in some programming languages differed from what I expected. After investigating, I found that the cause was a difference in how each regular expression specification handles alternation ("or" conditions).
Consider what result to expect when a string is replaced using the match pattern below. JavaScript has no (?<=), so the pattern does not use it. The goal is to insert "@" at the following positions:
- Top of string
- After "Abc"
Match pattern | /^|(Abc)/g |
---|---|
Replace string | "$1@" |
Haystack string | AbcAbcAbc |
Expected result | @Abc@Abc@Abc@ |
Code
PHP | preg_replace('/^|(Abc)/', '$1@', 'AbcAbcAbc'); |
---|---|
JavaScript | "AbcAbcAbc".replace(/^|(Abc)/g, "$1@"); |
Ruby | 'AbcAbcAbc'.gsub(/^|(Abc)/) { "#{$1}@" } |
Here are the results for each programming language. JavaScript and Ruby did not produce the expected result.
PHP | @Abc@Abc@Abc@ |
---|---|
JavaScript | @AbcAbc@Abc@ |
Ruby | @AbcAbc@Abc@ |
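
The JavaScript result can be traced by listing each match together with its start index. The sketch below is an illustration only: `listMatches` and `insertMarkers` are hypothetical names, and the two-pass replacement is just one possible workaround, not necessarily the approach used in Intact Case. In JavaScript, the empty match of `^` at index 0 is followed by a one-character advance, so "Abc" is never tried again at index 0.

```javascript
// Collect every match of a global pattern with its start index.
function listMatches(pattern, haystack) {
  var matches = [];
  var m;
  pattern.lastIndex = 0;
  while ((m = pattern.exec(haystack)) !== null) {
    matches.push({ index: m.index, text: m[0] });
    // exec() leaves lastIndex unchanged after an empty match, so step
    // forward by one character, as a global replace() does internally.
    if (m[0] === "") {
      pattern.lastIndex += 1;
    }
  }
  return matches;
}

// Matches: "" at index 0, "Abc" at index 3, "Abc" at index 6.
// The first "Abc" (index 0) is skipped, which explains "@AbcAbc@Abc@".
var found = listMatches(/^|(Abc)/g, "AbcAbcAbc");
console.log(found);

// Possible workaround (an assumption, not the author's fix): avoid the
// alternation and handle the two positions in separate passes.
function insertMarkers(haystack) {
  return haystack
    .replace(/(Abc)/g, "$1@") // "@" after every "Abc"
    .replace(/^/, "@");       // "@" at the top of the string
}

console.log(insertMarkers("AbcAbcAbc")); // @Abc@Abc@Abc@
```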