The Cusp of Helix

Intact Case

Behavior of OR Conditions

This section explains the behavior of /^|(Abc)/. The tables below track the position at which each match attempt starts.

First condition   ^      Matches at the start of the string
Second condition  (Abc)  Matches "Abc"

In an OR condition, once the first alternative matches, the second and subsequent alternatives are not tried. In /^|(Abc)/, when the first alternative "^" matches, the second alternative "(Abc)" is skipped and processing moves on to the next match attempt.

The matched "^" has zero width, so the starting pointer for the next match does not move. Left as is, this would produce an infinite loop of matches.

#  Match pattern  Starting pointer  Move pointer
1  /^|(Abc)/      AbcAbcAbc         @AbcAbcAbc (move 0 steps)
2  /^|(Abc)/      @AbcAbcAbc        @@AbcAbcAbc (move 0 steps)
3  /^|(Abc)/      @@AbcAbcAbc       @@@AbcAbcAbc (move 0 steps)
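
The stalled pointer can be reproduced with a small sketch. The function naiveStarts and its step limit are hypothetical names introduced here for illustration; they model a matcher that never advances after a zero-width match and are not how any real engine works.

```javascript
// Hypothetical naive matcher: it never advances after a zero-width match.
// maxSteps caps the loop so the demonstration terminates.
function naiveStarts(input, pattern, maxSteps) {
  const re = new RegExp(pattern, "y"); // sticky: match exactly at lastIndex
  const starts = [];
  let pos = 0;
  for (let step = 0; step < maxSteps && pos <= input.length; step++) {
    re.lastIndex = pos;
    const m = re.exec(input);
    if (m === null) {
      pos += 1; // no match here, try the next position
      continue;
    }
    starts.push(pos);
    pos += m[0].length; // zero-width match: pos stays where it was
  }
  return starts;
}

console.log(naiveStarts("AbcAbcAbc", "^|(Abc)", 3)); // [0, 0, 0]
```

Every attempt starts at index 0 again, which is the loop shown in the table.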

JavaScript and Ruby

To avoid this endless loop, JavaScript and Ruby appear to work as follows.

When the match width is zero, the match start pointer advances one step.

#  Match pattern  Starting pointer  Move pointer
1  /^|(Abc)/      AbcAbcAbc         @AbcAbcAbc (move 1 step)
2  /^|(Abc)/      @AbcAbcAbc        @AbcAbc@Abc (move to after the match)
3  /^|(Abc)/      @AbcAbc@Abc       @AbcAbc@Abc@ (move to after the match)
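
The table can be checked directly in JavaScript. This is a minimal sketch; the replacement string "$1@" is an assumption inferred from the table, since the text does not state which replacement was used.

```javascript
// Assumption: the replacement "$1@" is inferred from the table above.
const input = "AbcAbcAbc";
const pattern = /^|(Abc)/g;

// Start index of every match found by a global scan.
const starts = [...input.matchAll(pattern)].map((m) => m.index);
console.log(starts); // [0, 3, 6]

// An unmatched group expands to the empty string in the replacement.
const replaced = input.replace(pattern, "$1@");
console.log(replaced); // "@AbcAbc@Abc@"
```

The first "Abc" is never matched: after the zero-width match at index 0 the scan resumes at index 1, where neither alternative can match until index 3.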

Because Ruby's regular expressions support lookbehind, the expected result can be obtained with the following pattern.

'AbcAbcAbc'.gsub(/^|(?<=Abc)/) { '@' }
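
For comparison, the same pattern can be tried in JavaScript. This sketch assumes an engine that implements the lookbehind assertions added in ES2018; older engines reject the pattern with a syntax error.

```javascript
// Assumes ES2018 lookbehind support in the engine.
// Every match is zero-width, so nothing from the input is consumed.
const marked = "AbcAbcAbc".replace(/^|(?<=Abc)/g, "@");
console.log(marked); // "@Abc@Abc@Abc@"
```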

In JavaScript engines without lookbehind support, it is difficult to solve this with the regular expression alone. If an "if"-style condition is allowed in the replacement function, the expected result can be obtained, although the solution is not very elegant.

"AbcAbcAbc".replace(/^(Abc)|(Abc)/g, function(all, br1, br2) { return br1 ? ("@" + br1 + "@") : (br2 + "@"); } );

Solution in PHP

PHP appears to avoid the infinite loop by a different method from JavaScript and Ruby.

If a new match does not advance the pointer beyond the position of the previous match, that match is treated as a failure.

#  Match pattern  Starting pointer  Move pointer
1  /^|(Abc)/      AbcAbcAbc         @AbcAbcAbc (move 0 steps)
2  /^|(Abc)/      @AbcAbcAbc        "^" fails to match; @Abc@AbcAbc (move to after the match)
3  /^|(Abc)/      @Abc@AbcAbc       @Abc@Abc@Abc (move to after the match)
4  /^|(Abc)/      @Abc@Abc@Abc      @Abc@Abc@Abc@ (move to after the match)
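
The rule described above can be emulated in JavaScript to see that it yields the table's final string. This is only a sketch of the described strategy, not PHP's actual implementation: replaceWithRetry is a hypothetical helper, and it assumes the pattern is a simple top-level alternation whose branches are passed as separate strings.

```javascript
// Hypothetical emulation of the retry rule described above; not PCRE itself.
// `alternatives` lists the branches of the alternation in order.
function replaceWithRetry(input, alternatives, replacer) {
  const branches = alternatives.map((src) => new RegExp(src, "y"));
  // Try each branch at `pos`; optionally reject zero-width results.
  const matchAt = (pos, allowEmpty) => {
    for (const re of branches) {
      re.lastIndex = pos;
      const m = re.exec(input);
      if (m !== null && (allowEmpty || m[0].length > 0)) return m;
    }
    return null;
  };
  let out = "";
  let pos = 0;
  let lastWasEmpty = false;
  while (pos <= input.length) {
    // After a zero-width match, retry at the same position but
    // treat another zero-width match as a failure.
    const m = matchAt(pos, !lastWasEmpty);
    if (m === null) {
      if (pos < input.length) out += input[pos]; // copy one character
      pos += 1;
      lastWasEmpty = false;
      continue;
    }
    out += replacer(m);
    pos += m[0].length;
    lastWasEmpty = m[0].length === 0;
  }
  return out;
}

const result = replaceWithRetry(
  "AbcAbcAbc",
  ["^", "(Abc)"],
  (m) => (m[1] || "") + "@"
);
console.log(result); // "@Abc@Abc@Abc@"
```

Unlike the advance-by-one strategy, the retry at index 0 lets "(Abc)" match the first "Abc", so no occurrence is skipped.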

You can test this behavior with the following code.

preg_replace('/(?<=aaa)|(?=bbb)/', '@', 'aaabbb');

If consecutive zero-width matches were all accepted, the replacement would happen twice at the same position. In "aaabbb", both (?<=aaa) and (?=bbb) match at the same position. If both matches succeeded, the result would be as follows.

aaa@@bbb

In fact only one @ is inserted, so it appears that the second alternative, (?=bbb), does not match.
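
The claim that both alternatives can match at the same position can be checked separately for each branch. The sketch below runs in JavaScript rather than PHP, so it illustrates the patterns themselves and not PCRE's internals; matchesAt is a hypothetical helper, and lookbehind support (ES2018) is assumed.

```javascript
const text = "aaabbb";

// Sticky regexes match only at lastIndex, so each branch can be probed alone.
const matchesAt = (source, pos) => {
  const re = new RegExp(source, "y");
  re.lastIndex = pos;
  return re.test(text);
};

console.log(matchesAt("(?<=aaa)", 3)); // true
console.log(matchesAt("(?=bbb)", 3)); // true

// Combined in one alternation, only one zero-width match is accepted at index 3.
const combined = text.replace(/(?<=aaa)|(?=bbb)/g, "@");
console.log(combined); // "aaa@bbb"
```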