Ambiguity in $replace #168

Closed
deepilla opened this issue Mar 2, 2018 · 2 comments

@deepilla
Contributor

deepilla commented Mar 2, 2018

The $replace function can replace matches of a regular expression within a string with another string. The replacement string uses the syntax $N to refer to matching groups of the regex, where N is the position of the matching group.

For example, this expression drops the suffix from ordinal numbers and prefixes the resulting cardinal numbers with a hash sign.

$replace("1st, 2nd, 3rd and 4th", /([0-9]+)(?:st|nd|rd|th)\b/, "#$1")

returns:

"#1, #2, #3 and #4"

The "$1" in the replacement string represents the first submatch of the regex (a run of one or more digits in this case).

There's room for ambiguity in this syntax. An expression like "$10" could refer either to the tenth submatch of a regex or to the first submatch followed by a literal zero. In fact, the same expression behaves differently depending on the number of submatches present. With ten or more, "$10" evaluates to the tenth submatch. With fewer than ten, "$10" evaluates to the first submatch followed by a zero. Note that with ten or more submatches there is no way to express the latter: the first submatch followed by a literal zero cannot be written at all.
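To make the rule concrete, here is a minimal Python sketch of the digit-trimming behaviour described above. It is not JSONata's actual implementation: the function name `expand_trimming` and the `groups` list (whole match at index 0, submatches after it) are assumptions made for illustration, and escaping of a literal `$` is left out.

```python
import re


def expand_trimming(template, groups):
    """Expand $N references in template (hypothetical helper).

    groups[0] is the whole match; groups[1:] are the submatches.
    """
    last = len(groups) - 1  # highest valid submatch index

    def substitute(m):
        digits = m.group(1)
        literal = ""
        # While N is out of range and has at least two digits, treat its
        # last digit as literal text and retry with the shorter number.
        while int(digits) > last and int(digits) > 9:
            literal = digits[-1] + literal
            digits = digits[:-1]
        n = int(digits)
        # A single-digit N that is still out of range expands to nothing.
        value = (groups[n] or "") if n <= last else ""
        return value + literal

    return re.sub(r"\$([0-9]+)", substitute, template)


one_group = ["1st", "1"]
ten_groups = ["abcdefghij"] + list("abcdefghij")

print(expand_trimming("#$10", one_group))   # first submatch, then "0"
print(expand_trimming("#$10", ten_groups))  # tenth submatch
```

The same template yields different results depending only on how many submatches the regex has, which is the ambiguity in question.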

Go's regular expression library offers similar functionality, but with a modification that avoids this issue. In the Go version, "$10" always refers to the tenth submatch, regardless of how many submatches actually exist. To get the first (or any) submatch followed by a digit, you wrap the index in braces, e.g.

${1}0

With this approach the intent of the expression is clear and there are no unrepresentable configurations. Perhaps something similar could be implemented in JSONata.
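The brace rule can be sketched the same way. This is a simplified Python model of the behaviour described above, not Go's code: `expand_braced` and its `groups` argument are hypothetical names, only numeric references are handled, and an out-of-range index expands to the empty string.

```python
import re


def expand_braced(template, groups):
    """Expand $N and ${N}; N is always the full run of digits.

    groups[0] is the whole match; groups[1:] are the submatches.
    """

    def substitute(m):
        # Group 1 holds the braced form ${N}, group 2 the bare form $N.
        n = int(m.group(1) or m.group(2))
        # Out-of-range references expand to the empty string.
        return (groups[n] or "") if n < len(groups) else ""

    return re.sub(r"\$(?:\{([0-9]+)\}|([0-9]+))", substitute, template)


one_group = ["1st", "1"]
ten_groups = ["abcdefghij"] + list("abcdefghij")

print(expand_braced("#$10", one_group))     # no tenth submatch: empty
print(expand_braced("#${1}0", one_group))   # first submatch, then "0"
print(expand_braced("#${1}0", ten_groups))  # still first submatch, then "0"
```

Under this rule the meaning of a template no longer depends on the number of submatches, and "first submatch followed by a zero" stays expressible even with ten or more.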

@andrew-coleman
Member

This behaviour aligns with the XPath function fn:replace, which defines those rules. I guess the assumption is that you would know the number of parenthesized sub-expressions in the regular expression, and so this wouldn't be ambiguous.

@andrew-coleman
Member

Closing as per the previous comment. I don't think this is ambiguous, as discussed.
