emphasis mixed in text and link description breaks link parsing #96

twouters · 2021-01-11T13:21:28Z

The emphasis in the following example takes precedence over link detection while I think it should be the other way around?

text containing _emphasis start breaks when followed by [link text with emphasis_ end](https://example.com)

GFM results in (as expected): text containing _emphasis start breaks when followed by link text with emphasis_ end

simple test to show unexpected behavior:

        it("should properly parse mixed emphasis and links", function() {
            var parsed6 = inlineParse("text containing _emphasis start breaks when followed by [link text with emphasis_ end](https://example.com)");
            validateParse(parsed6, [{
                content: "text containing ",
                type: "text"
            },{
                content: [{
                    "content": "emphasis start breaks when followed by ",
                    "type": "text"
                },{
                    "content": "[link text with emphasis",
                    "type": "text"
                }],
                type: "em"
            },{
                "content": " end",
                "type": "text"
            },{
                "content": "]",
                "type": "text"
            },{
                "content": "(",
                "type": "text"
            },{
                content: [{
                    type: "text",
                    content: "https://example.com"
                }],
                type: "link",
                target: "https://example.com",
                title: undefined
            },{
                "content": ")",
                "type": "text"
            }]);
        });

The text was updated successfully, but these errors were encountered:

twouters · 2021-01-11T13:25:27Z

Additional test to show expected behavior:

        it("should properly parse mixed emphasis and links", function() {
            var parsed6 = inlineParse("text containing _emphasis start breaks when followed by [link text with emphasis_ end](https://example.com)");
            validateParse(parsed6, [{
                content: "text containing _emphasis start breaks when followed by ",
                type: "text"
            },{
                content: [{
                    "content": "link text with emphasis",
                    "type": "text"
                },{
                    "content": "_ end",
                    "type": "text"
                }],
                type: "link",
                target: "https://example.com",
                title: undefined
            }]);
        });

ariabuckles · 2021-01-12T10:13:33Z

Hi @twouters, thanks for mentioning this!

I took a look at this example and the example you have at Sorunome/slack-markdown#3 . Thanks for the concrete examples, they're really helpful!

My initial thoughts are:

text with _em before [link with em_ in it](http://example.com) doesn't seem well-formed to me. I want simple-markdown to match reasonable behaviour for most well-formed examples, but generally don't try to match other implementations for conflicts like this, as matching other implementations with a different parsing algorithm is quite hard.
- If you want that to parse in the way you expect, you can escape the _ with a backslash: text with \_em before [link with em_ in it](http://example.com)
- What's the context in which you're getting this text? do you have control over the upstream message? Am I missing a compelling reason where this is a common use case?
- If this is a common workflow for you, I might be able to help come up with a specialized em rule that meets your use-case
text with _em before an autolink like <http://example.com/some_link> looks ~well-formed to me: we probably shouldn't be parsing emphasis if the closing _ doesn't have a space/newline/EOF after it
- This is actually fixed in this library see src/index.js line 1609-1612. But slack-markdown has a custom em rule that doesn't handle this case: https://github.com/Sorunome/slack-markdown/blob/master/index.js#L150
- I think if slack-markdown changes the regex to /^\b_(\S(?:\\[\s\S]|[^\\])*?\S|\S)_\b/, it should work?

Does solving problem 2 there meet your needs? If so, I'd be happy to help you make a pull request to slack-markdown to fix it!

If you need problem 1 solved, could you talk me through the situation that's generating that sort of message?

twouters · 2021-01-12T11:43:38Z

Hi,

I don't have control over the upstream message, since they are automated messages from monitoring services which we receive on Slack (those messages are then bridged to Matrix using Soru's slack-markdown as one of many other components.)

I did indeed notice the missing \b in slack-markdown and submitted a PR before reporting the issue here, since it didn't fix all my usecases.

While I'm typing this, I remember that I didn't check if the word boundary is missing for bold tags too (* in slack's formatting).
(This might sound unrelated at first sight, but I noticed that the monitoring messages contain bold tags too)
If that's the case, I think this report might be a bit too far fetched and might not have as big as an impact as I thought.

ariabuckles · 2021-01-28T22:51:30Z

Sorry for the slow response!

I think bold tags in simple-markdown don't have the same \b guards (because at the time, asterisks in input were much less common/problem-causing than underscores), so it that might be causing the problems. But I'd be happy to accept fixes to that if that would help your usecase!

twouters mentioned this issue Jan 11, 2021

messages where markdown formatting overlaps slack formatting are rendered incorrectly Sorunome/slack-markdown#3

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

emphasis mixed in text and link description breaks link parsing #96

emphasis mixed in text and link description breaks link parsing #96

twouters commented Jan 11, 2021 •

edited

Loading

twouters commented Jan 11, 2021 •

edited

Loading

ariabuckles commented Jan 12, 2021

twouters commented Jan 12, 2021

ariabuckles commented Jan 28, 2021

emphasis mixed in text and link description breaks link parsing #96

emphasis mixed in text and link description breaks link parsing #96

Comments

twouters commented Jan 11, 2021 • edited Loading

twouters commented Jan 11, 2021 • edited Loading

ariabuckles commented Jan 12, 2021

twouters commented Jan 12, 2021

ariabuckles commented Jan 28, 2021

twouters commented Jan 11, 2021 •

edited

Loading

twouters commented Jan 11, 2021 •

edited

Loading