Behavior of OR Conditions
This section explains how /^|(Abc)/ behaves. In the descriptions below, ↑ marks the position where the next match attempt starts.
Condition | Pattern | Meaning |
---|---|---|
First condition | ^ | Matches at the start of the string |
Second condition | (Abc) | Matches "Abc" |
With OR conditions (alternation), once one condition matches, the second and later conditions are not tried. In /^|(Abc)/, when the first condition "^" matches, the second condition "(Abc)" is skipped and processing moves on to the next match.
The matched "^" has zero width, so the pointer from which the next match starts does not move. Handled naively, this would loop forever, matching at the same position every time.
Match pattern at every step: /^|(Abc)/

# | Before the match | After the match | Pointer movement |
---|---|---|---|
1 | AbcAbcAbc | @AbcAbcAbc | Moves 0 steps. |
2 | @AbcAbcAbc | @@AbcAbcAbc | Moves 0 steps. |
3 | @@AbcAbcAbc | @@@AbcAbcAbc | Moves 0 steps. |
… | … | … | … |
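The loop in the table can be reproduced in JavaScript with a global regular expression and `exec`, which does not advance past a zero-width match on its own. This is only a minimal sketch: the cap of three attempts and the variable names are illustrative choices, not something taken from the discussion above.

```javascript
// Global regex: exec() resumes from re.lastIndex on every call.
const re = /^|(Abc)/g;
const subject = "AbcAbcAbc";
const attempts = [];

// Cap at three attempts; without the cap this loop would never end.
for (let i = 0; i < 3; i++) {
  const m = re.exec(subject);
  // "^" wins at index 0 with width 0, so lastIndex stays at 0.
  attempts.push({ index: m.index, width: m[0].length, next: re.lastIndex });
}

console.log(attempts);
```

Every recorded attempt starts at index 0, has width 0, and leaves the next starting position at 0, which is the situation the table describes.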
JavaScript and Ruby
To avoid this endless loop, JavaScript and Ruby appear to process the match as follows.
Match pattern at every step: /^|(Abc)/

# | Before the match | After the match | Pointer movement |
---|---|---|---|
1 | AbcAbcAbc | @AbcAbcAbc | Moves 1 step. |
2 | @AbcAbcAbc | @AbcAbc@Abc | Moves to the end of the match. |
3 | @AbcAbc@Abc | @AbcAbc@Abc@ | Moves to the end of the match. |
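The steps in the table correspond to what `String.prototype.replace` does with the `g` flag. The sketch below assumes the replacement string `$1@` (the captured text, if any, followed by `@`); that replacement is inferred from the strings in the table, not stated in the text.

```javascript
// Assumed replacement: "$1@" = captured "Abc" (empty when "^" matched) plus "@".
const jsResult = "AbcAbcAbc".replace(/^|(Abc)/g, "$1@");

// After the zero-width "^" match, the next attempt starts one step later,
// so the first "Abc" is never matched.
console.log(jsResult); // "@AbcAbc@Abc@"
```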
Because Ruby's regular expressions support lookbehind, the expected result can be obtained with the following pattern.
'AbcAbcAbc'.gsub(/^|(?<=Abc)/) { '@' }  # => "@Abc@Abc@Abc@"
In JavaScript engines without lookbehind support, this is difficult to solve with regular-expression tricks alone. If a conditional is allowed in the replacement function, the expected result can be obtained, although the solution is not very elegant.
"AbcAbcAbc".replace(/^(Abc)|(Abc)/g,
  function(all, br1, br2) {
    // br1 is set only for the "Abc" at the start of the string.
    return br1 ? ("@" + br1 + "@") : (br2 + "@");
  }
);
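JavaScript engines that implement lookbehind assertions (added to the language in ES2018) can use the same pattern as the Ruby example. The sketch below runs only under that assumption; engines without lookbehind reject `(?<=...)` with a SyntaxError.

```javascript
// Requires lookbehind support (ES2018 or later).
const lookbehindResult = "AbcAbcAbc".replace(/^|(?<=Abc)/g, "@");

console.log(lookbehindResult); // "@Abc@Abc@Abc@"
```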
Solution in PHP
PHP appears to avoid the infinite match by a different method from JavaScript and Ruby.
Match pattern at every step: /^|(Abc)/

# | Before the match | After the match | Pointer movement |
---|---|---|---|
1 | AbcAbcAbc | @AbcAbcAbc | Moves 0 steps. |
2 | @AbcAbcAbc | @Abc@AbcAbc | "^" fails to match. Moves to the end of the match. |
3 | @Abc@AbcAbc | @Abc@Abc@Abc | Moves to the end of the match. |
4 | @Abc@Abc@Abc | @Abc@Abc@Abc@ | Moves to the end of the match. |
You can check this behavior with the following code.
preg_replace('/(?<=aaa)|(?=bbb)/', '@', 'aaabbb');
When zero-width matches occur in succession, the replacement could in principle happen twice at the same position. In "aaabbb", both (?<=aaa) and (?=bbb) match at the same position, between "aaa" and "bbb". If both matches succeeded, the result would be:
aaa@@bbb
In fact only one @ is inserted (the result is "aaa@bbb"), so it appears that the second alternative, (?=bbb), does not match.
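The behavior described in this section (after a zero-width match, try again at the same position but reject another zero-width match, and only then move one step) can be emulated in JavaScript. This is a sketch of that strategy, not PHP's actual implementation: `replaceWithRetry` is a hypothetical helper, and it relies on ES2018 lookbehind to force a non-empty match.

```javascript
// Hypothetical helper emulating "retry at the same position, non-empty only".
function replaceWithRetry(input, source, replacer) {
  const search = new RegExp(source, "g"); // ordinary left-to-right search
  let out = "";
  let pos = 0;       // where the next match attempt starts
  let copied = 0;    // how much of the input is already in `out`
  let retry = false; // true right after a zero-width match

  while (pos <= input.length) {
    let m;
    if (retry) {
      // Sticky match at `pos`; the trailing lookbehind demands that the
      // match ends after `pos`, so a zero-width match is rejected.
      const nonEmpty = new RegExp(
        `(?:${source})(?<=^[\\s\\S]{${pos + 1},})`, "y");
      nonEmpty.lastIndex = pos;
      m = nonEmpty.exec(input);
      retry = false;
      if (m === null) { pos += 1; continue; } // nothing here: move one step
    } else {
      search.lastIndex = pos;
      m = search.exec(input);
      if (m === null) break;
    }
    out += input.slice(copied, m.index) + replacer(m);
    copied = m.index + m[0].length;
    pos = copied;
    retry = m[0].length === 0;
  }
  return out + input.slice(copied);
}

const abcResult = replaceWithRetry("AbcAbcAbc", "^|(Abc)", (m) => (m[1] || "") + "@");
const lookaroundResult = replaceWithRetry("aaabbb", "(?<=aaa)|(?=bbb)", () => "@");
console.log(abcResult, lookaroundResult);
```

With "AbcAbcAbc" the retry at position 0 skips "^" and matches "Abc", as in step 2 of the table. With "aaabbb" both alternatives are zero-width at the same position, so the retry fails and only one @ is inserted.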