Due to scenarios wherein a formatted link ended up as part of a larger
raw link after parsing, change the containment check to an overlap check
and add appropriate tests for these edge cases.
Testing with #6542 surfaced a crash scenario, caused by formatted links
that had URLs in the display text, for example
[mean example - https://osu.ppy.sh](https://osu.ppy.sh)
In that case the outer Markdown link would get picked up once, and then
reduced to the link text when looking for other links, leading to it
being picked up again the second time when the raw link is found.
Add a check in the raw link parsing path that ensures that the found
URL is not a part of a bigger, pre-existing link.
Extend the Markdown parsing regex to allow parsing so-called inline
links. Within the parenthesis () part of the Markdown URL syntax,
introduce a new capturing group:
(
\s+ // whitespace between actual URL and inline title
(?<title> // start of "title" named group
"" // opening double quote (doubled inside @ string)
(
[^""] // any character but a double quote
| // or
(?<=\\) // the next character should be preceded by a \
"" // a double quote
)* // zero or more times
"" // closing double quote
)
)? // the whole group is optional
This allows for parsing the inline links as-provided by web. Correctness
is displayed by the passing tests.
Allow users to put both balanced round parentheses, as well as
unbalanced escaped ones, in old style link text. The implementation
is the same as for Markdown and new style links, except for swapping
all instances of
\[\]
to
\(\)
for obvious reasons (different type of parenthesis requiring escaping).
Tests also included.
Allow users to put both balanced brackets, as well as unbalanced
escaped ones, in Markdown link text. The implementation is the exact
same as in the case of new format links.
For completion's sake, tests also included.
For the sake of readability, consistency and to make further changes
easier, introduce named groups (?<text>) and (?<url>) to all link
parsing regexes which have parts containing the desired link text
and (optionally) URL.
The introduction of the named groups additionally simplifies
handleMatches() and makes all calls to it consistent.
For users to be able to add square brackets inside of links using
the new format, the regular expression used for parsing those links
contained a balancing group, which can be used for matching pairs
of tokens (in this case, opening and closing brackets, in that order).
However, this means that users could not post links with unmatched
brackets inside of them (ie. ones that contain single brackets, or
a closing bracket and then an opening one). Allow for escaping opening
and closing brackets using the backslash character.
The change substitutes this old fragment of the regex in the display
text group:
[^\[\]]* // any character other than closing/opening bracket
for this one:
(((?<=\\)[\[\]])|[^\[\]])*
The second pattern in the alternative remains the same; the first one
performs the escaping, as follows:
(
(?<=\\) // positive lookbehind expression:
// this match will succeed, if the next expression
// is preceded by a single backslash
[\[\]] // either an opening or closing brace
)
Since the entire display group is matched, unfortunately the lookbehind
expression does not actually strip the backslashes, so they are
manually stripped in handleMatches.
As demonstrated in the unit tests attached, this also allows balanced
brackets to be mixed with escaped ones.