-
-
Notifications
You must be signed in to change notification settings - Fork 726
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Some emoji are not rendered as images #548
Comments
DiscordChatExporter/DiscordChatExporter.Core/Markdown/MarkdownParser.cs Lines 151 to 160 in d01ed13
The current emoji matcher appears to be very crude. Unfortunately, it turns out that emoji codepoints are a mess. The unicode values for Discord's current version of emoji can be found here. The most glaring issue with the current matcher seems to be the omission of all the emoji that can be expressed in a single byte (except those from \u2600 to \u26FF). Other discrepancies include:
I'm in favor of some support being added for the missed single-byte emoji since there are quite a few of those, but it looks like it'd be very difficult to get an entirely accurate mapping. It really doesn't help that Unicode's official files are quite messy (and the one I linked isn't even ordered fully by codepoint despite claiming that it is). I think at this point it's a matter of deciding how much is worth the effort. The first bullet point above is trivial to correct, but the second and third are both a significant amount of hardcoding to fix a relatively small issue. |
Yes, that matcher used to be much more greedy, which resulted in lots of false matches. The export would show broken images for random unicode sequences that the parser assumed were emoji but actually weren't.
That's probably the case.
I was unable to come up with a reliable way to match emoji in the same way that Discord does. It's worth noting that Discord's emoji support is particularly extensive, for example it supports compound emoji where multiple separate emoji can be combined to render a different one (e.g. an emoji for man, woman, and a child, renders the emoji for family). Overall I just thought it wasn't worth the effort because not getting some emoji was not as bad as incorrectly matching unrelated text and breaking the export. After all, the emoji still gets rendered, just not using the Twitter's/Discord's image set. |
With #549 and #599 closed, we have the |
Since this issue has recently popped up again: I doubt it could be useful since this is a JS library, but I think this concept could technically be what we need: https://www.npmjs.com/package/twemoji-parser?activeTab=readme It basically checks a series of characters for emojis and returns the found emojis, even with links to the Twemoji cdn. Here's an example: This
returns
Oh wait, I just noticed that this is probably the same as
I will send this comment anyways, no harm in doing so. U+1F609 There is also this database https://emojibase.dev/docs/datasets, but I think it just contains what an to Emoji 14.0 updated version of our emoji list contains. Lastly, it's also worth noting that our emoji list needs an update as well since it's missing emojis from Emoji 14.0 and 15.0, which will soon be released on Discord. This would also prevent #599 from becoming an issue again. (There seems to be a problem with that anyways, but that needs a bit more investigating from my side.) Getting that list or updating the current one probably won't be as easy as last time, I couldn't find the emoji.json on the new Discord APKs. This seems like a good alternative, it's up to date with Emoji 14.0 and just formatted a bit differently than our list. |
@CanePlayz was this issue fixed by your earlier PR, by any chance? Also, if you can update the emoji list, that would be nice too :) |
Unfortunately not, it just fixed false positives. Emoji codepoints for some emojis are just too complex for our regex. For example these two: render as this:
Yeah, I will do that in the coming days. |
Ah yes, the compound emoji are not handled properly. But in your original post, you pointed out that "⏩" didn't render as an emoji – is that not fixed now? Along with other cases pointed out by @96-LB. As for the compound emoji (skin color, family) not working, I think we can just let it be. Unless there's an easy way to fix it that I missed somehow. |
No, unfortunately, it's not. And in case I'm not missing something integral, I don't know why it should be. My pull request just narrowed down which Unicode sequences we render as emojis. But all the ones that are now rendered were previously rendered as well.
Yeah, if it's too complex that's totally fine. |
Yeah it makes sense. I just thought I'd ask to make sure. |
Turns out that's a bit more complex since not only does the APK not have that file anymore, but the more up-to-date list I provided earlier is not complete either. I will continue looking for alternatives though. |
It will take some time but I will add them manually. 👍 |
No worries |
Is this still relevant? 🙂 |
I guess yes, but if those emojis which consist of multiple Unicode symbols are too complex to be caught by our regex, it's fine if you close it. |
Some emojis aren't shown in their Discord style as they are in Discord but instead in the Windows style (I assume the style depends on the system you're on, but for me (in the Chromium Edge with Windows) it's displayed like that).
Instead of:
It's shown like that:
The text was updated successfully, but these errors were encountered: