Ambiguity in $replace #168

deepilla · 2018-03-02T16:17:24Z

The $replace function can replace matches of a regular expression within a string with another string. The replacement string uses the syntax $N to refer to matching groups of the regex, where N is the position of the matching group.

For example, this expression drops the suffix from ordinal numbers and prefixes the resulting cardinal numbers with a hash sign.

$replace("1st, 2nd, 3rd and 4th", /([0-9]+)(?:st|nd|rd|th)\b/, "#$1")

returns:

"#1, #2, #3 and #4"

The "$1" in the replacement string represents the first submatch of the regex (a run of one or more digits in this case).

There's room for ambiguity in this syntax. An expression like "$10" could refer either to the tenth submatch of a regex or to the first submatch followed by a literal zero. In fact, the same expression behaves differently depending on the number of submatches present. With ten or more, "$10" evaluates to the tenth submatch. With fewer than ten, "$10" evaluates to the first submatch followed by a zero. Note that with ten or more submatches there is no way to express the latter: the first submatch followed by a literal zero.

Go's regular expression library has some similar functionality but with a modification that avoids this issue. In the Go version, "$10" always refers to the tenth submatch, regardless of how many submatches actually exist. To get the first (or any) submatch followed by a digit, you wrap it in braces, e.g.

${1}0

With this approach the intent of the expression is clear and there are no unrepresentable configurations. Perhaps something similar could be implemented in JSONata.
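For reference, a small Go program showing this behaviour with the standard `regexp` package and the regex from the example above. The `ordinal` variable and `expand` helper are just names picked for this sketch.

```go
package main

import (
	"fmt"
	"regexp"
)

// Same pattern as the JSONata example: one capturing group (the digits).
var ordinal = regexp.MustCompile(`([0-9]+)(?:st|nd|rd|th)\b`)

// expand applies a replacement template to the sample input.
func expand(template string) string {
	return ordinal.ReplaceAllString("1st, 2nd, 3rd and 4th", template)
}

func main() {
	// "$1" is the first submatch.
	fmt.Println(expand("#$1")) // #1, #2, #3 and #4

	// Braces delimit the group number, so the zero is literal text.
	fmt.Println(expand("#${1}0")) // #10, #20, #30 and #40

	// "$10" always means group 10. It does not exist here, so Go
	// substitutes an empty string instead of falling back to group 1.
	fmt.Println(expand("#$10")) // #, #, # and #
}
```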

andrew-coleman · 2018-03-05T14:12:06Z

This behaviour aligns with the XPath function fn:replace, which defines those rules. I guess the assumption is that you would know the number of parenthesized sub-expressions in the regular expression, and so this wouldn't be ambiguous.

andrew-coleman · 2018-04-18T11:28:16Z

Closing as per previous comment - I don't think this is ambiguous as discussed.

andrew-coleman closed this as completed Apr 18, 2018
