Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

emphasis mixed in text and link description breaks link parsing #96

Open
twouters opened this issue Jan 11, 2021 · 4 comments
Open

emphasis mixed in text and link description breaks link parsing #96

twouters opened this issue Jan 11, 2021 · 4 comments

Comments

@twouters
Copy link

twouters commented Jan 11, 2021

The emphasis in the following example takes precedence over link detection while I think it should be the other way around?

text containing _emphasis start breaks when followed by [link text with emphasis_ end](https://example.com)

GFM results in (as expected): text containing _emphasis start breaks when followed by link text with emphasis_ end

simple test to show unexpected behavior:

        it("should properly parse mixed emphasis and links", function() {
            var parsed6 = inlineParse("text containing _emphasis start breaks when followed by [link text with emphasis_ end](https://example.com)");
            validateParse(parsed6, [{
                content: "text containing ",
                type: "text"
            },{
                content: [{
                    "content": "emphasis start breaks when followed by ",
                    "type": "text"
                },{
                    "content": "[link text with emphasis",
                    "type": "text"
                }],
                type: "em"
            },{
                "content": " end",
                "type": "text"
            },{
                "content": "]",
                "type": "text"
            },{
                "content": "(",
                "type": "text"
            },{
                content: [{
                    type: "text",
                    content: "https://example.com"
                }],
                type: "link",
                target: "https://example.com",
                title: undefined
            },{
                "content": ")",
                "type": "text"
            }]);
        });
@twouters
Copy link
Author

twouters commented Jan 11, 2021

Additional test to show expected behavior:

        it("should properly parse mixed emphasis and links", function() {
            var parsed6 = inlineParse("text containing _emphasis start breaks when followed by [link text with emphasis_ end](https://example.com)");
            validateParse(parsed6, [{
                content: "text containing _emphasis start breaks when followed by ",
                type: "text"
            },{
                content: [{
                    "content": "link text with emphasis",
                    "type": "text"
                },{
                    "content": "_ end",
                    "type": "text"
                }],
                type: "link",
                target: "https://example.com",
                title: undefined
            }]);
        });

@ariabuckles
Copy link
Owner

Hi @twouters, thanks for mentioning this!

I took a look at this example and the example you have at Sorunome/slack-markdown#3 . Thanks for the concrete examples, they're really helpful!

My initial thoughts are:

  1. text with _em before [link with em_ in it](http://example.com) doesn't seem well-formed to me. I want simple-markdown to match reasonable behaviour for most well-formed examples, but generally don't try to match other implementations for conflicts like this, as matching other implementations with a different parsing algorithm is quite hard.
    • If you want that to parse in the way you expect, you can escape the _ with a backslash: text with \_em before [link with em_ in it](http://example.com)
    • What's the context in which you're getting this text? do you have control over the upstream message? Am I missing a compelling reason where this is a common use case?
    • If this is a common workflow for you, I might be able to help come up with a specialized em rule that meets your use-case
  2. text with _em before an autolink like <http://example.com/some_link> looks ~well-formed to me: we probably shouldn't be parsing emphasis if the closing _ doesn't have a space/newline/EOF after it

Does solving problem 2 there meet your needs? If so, I'd be happy to help you make a pull request to slack-markdown to fix it!

If you need problem 1 solved, could you talk me through the situation that's generating that sort of message?

@twouters
Copy link
Author

Hi,

I don't have control over the upstream message, since they are automated messages from monitoring services which we receive on Slack (those messages are then bridged to Matrix using Soru's slack-markdown as one of many other components.)

I did indeed notice the missing \b in slack-markdown and submitted a PR before reporting the issue here, since it didn't fix all my usecases.

While I'm typing this, I remember that I didn't check if the word boundary is missing for bold tags too (* in slack's formatting).
(This might sound unrelated at first sight, but I noticed that the monitoring messages contain bold tags too)
If that's the case, I think this report might be a bit too far fetched and might not have as big as an impact as I thought.

@ariabuckles
Copy link
Owner

Sorry for the slow response!

I think bold tags in simple-markdown don't have the same \b guards (because at the time, asterisks in input were much less common/problem-causing than underscores), so it that might be causing the problems. But I'd be happy to accept fixes to that if that would help your usecase!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants