Check Links Against Mozilla's Public Suffix List

moderation

#1

I very often see Twitch turn two words joined by an accidental period into a link. For example, Twitch automatically converts “this.example” into a link in chat.

.example is not a valid TLD.

I also very often see bots set to automatically purge or time out users who post links in Twitch chat, even when the link cannot possibly be valid.

To alleviate this, PhantomBot should check links generated by Twitch against Mozilla’s Public Suffix List (https://publicsuffix.org/list/public_suffix_list.dat). It should not purge or time out a user who posts something that Twitch converts into a link when the link’s suffix does not match any entry in this list.

Conversely, if an apparent link is posted in Twitch chat, its domain suffix matches an entry in this list, and the user has chosen to have the bot automatically moderate links, the bot should take action.
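A minimal Python sketch of the proposed check, assuming the list has already been downloaded as text. It follows the list’s published format (`//` comments, one rule per line, `*.` wildcard rules, `!` exception rules) but only answers the yes/no question raised here: does any suffix of the linked hostname match a rule? It does not compute registrable domains, and converting internationalized names between Unicode and punycode is left out. The names `parse_psl`, `has_listed_suffix`, `should_moderate` and the hand-written `SAMPLE_PSL` excerpt are hypothetical, not part of PhantomBot.

```python
def parse_psl(text: str) -> frozenset[str]:
    """Parse Public Suffix List text into a set of rules."""
    rules = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("//"):
            continue  # skip blank lines and comments
        # Only the text up to the first whitespace on a line is the rule.
        rules.add(line.split()[0].lower())
    return frozenset(rules)


def has_listed_suffix(hostname: str, rules: frozenset[str]) -> bool:
    """True if some suffix of hostname matches an exact, wildcard or exception rule."""
    labels = hostname.lower().rstrip(".").split(".")
    if len(labels) < 2 or "" in labels:
        return False  # a bare word or a malformed host like "a..b" is not a link
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in rules or "!" + suffix in rules:
            return True
        # A wildcard rule such as "*.ck" matches any single label in front of "ck".
        if i + 1 < len(labels) and "*." + ".".join(labels[i + 1:]) in rules:
            return True
    return False


def should_moderate(hostname: str, rules: frozenset[str], link_moderation_enabled: bool) -> bool:
    """Act only if link moderation is on AND the suffix exists in the list."""
    return link_moderation_enabled and has_listed_suffix(hostname, rules)


# Hypothetical excerpt written in the list's format (not the real file).
SAMPLE_PSL = """\
// ===BEGIN ICANN DOMAINS===

com
uk
co.uk
*.ck
!www.ck
"""
```

For example, with `rules = parse_psl(SAMPLE_PSL)`, `should_moderate("google.com", rules, True)` is true while `should_moderate("this.example", rules, True)` is false, so the accidental “this.example” link would be left alone.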


#2

Twitch does not tell us what is a link and what isn’t. There are so many bad links out there that I know that list doesn’t contain most of them. PhantomBot has its own regex pattern to detect links, as do all the other bots out there. Having to call an API for every message sent in chat would slow down the bot’s moderation system far too much, and even if we cached a list like this, we would have to update it often.

EDIT:

I will look into it, though, to see if I can find a library for it and whether it works well. Thinking about it more, it sounds like a good feature that no other bot has.
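On the performance concern: the list does not have to be fetched per message. A hedged sketch, assuming a local cache file and a weekly refresh interval (both arbitrary choices, not something PhantomBot or publicsuffix.org prescribes): the list is downloaded at most once per interval, and a stale copy keeps being used if a download fails. Once the text is parsed into an in-memory set of rules, checking a message costs only a few hash lookups. `load_psl_cached`, `fetch_psl` and `MAX_AGE_SECONDS` are hypothetical names.

```python
import os
import time
import urllib.request
from typing import Callable

PSL_URL = "https://publicsuffix.org/list/public_suffix_list.dat"
MAX_AGE_SECONDS = 7 * 24 * 60 * 60  # assumption: refresh roughly once a week


def fetch_psl(url: str = PSL_URL) -> str:
    """Download the list text (requires network access)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def load_psl_cached(
    cache_path: str,
    fetch: Callable[[], str] = fetch_psl,
    max_age: float = MAX_AGE_SECONDS,
) -> str:
    """Return the list text, re-downloading only when the cache is missing or old."""
    cached = None
    try:
        with open(cache_path, encoding="utf-8") as f:
            cached = f.read()
        if time.time() - os.path.getmtime(cache_path) < max_age:
            return cached  # fresh enough: no network call at all
    except OSError:
        pass  # no cache file yet; fall through to a download

    try:
        text = fetch()
    except OSError:
        # urllib's URLError and socket timeouts are OSError subclasses.
        if cached is not None:
            return cached  # site unreachable: keep using the stale copy
        raise

    with open(cache_path, "w", encoding="utf-8") as f:
        f.write(text)
    return text
```

Making `fetch` a parameter keeps the network call out of the hot path and lets the refresh logic be tested without downloading anything.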


#3

Thank you Scania! :smiley: The list I linked above is the most well known and widely used one.

I’m also talking to Twitch about this, because their chat incorrectly creates a link out of any two words separated by a period, whether or not the suffix is valid in global DNS.

Anyway, I’m looking forward to hearing what you and the team think of the feasibility of this feature. I hadn’t even thought of it as something that could make PhantomBot stand out above the rest, but it’s definitely a feature that no other bot seems to have.

Might even be a good one if you can make it work well.


#4

I can’t seem to reproduce a “fake” link with our bot. Our regex pattern already checks that the URL has a top-level domain. Are you using Twitch’s link moderation by any chance?

If I type google.notdomain in my chat nothing happens, but when I use google.com I get timed out.
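A sketch of the kind of check described here, where a chat token only counts as a link if its last label is a known top-level domain. This is not PhantomBot’s actual regex; the pattern, `find_links`, and the tiny `KNOWN_TLDS` set are hypothetical stand-ins (a real bot would load a full TLD list instead).

```python
import re

# Hypothetical, deliberately tiny TLD set for illustration only.
KNOWN_TLDS = frozenset({"com", "net", "org", "tv", "uk"})

# Optional scheme, one or more "label." parts, then an alphabetic last label.
HOST_RE = re.compile(
    r"\b(?:https?://)?"
    r"((?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+"
    r"([a-z]{2,63}))\b",
    re.IGNORECASE,
)


def find_links(message: str, tlds: frozenset[str] = KNOWN_TLDS) -> list[str]:
    """Return hostnames in message whose last label is a known TLD."""
    links = []
    for match in HOST_RE.finditer(message):
        host, tld = match.group(1), match.group(2)
        if tld.lower() in tlds:  # "google.notdomain" is matched but dropped here
            links.append(host.lower())
    return links
```

With this filter, `find_links("google.notdomain")` returns an empty list while `find_links("google.com")` returns `["google.com"]`, which is the behavior described above.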


#5

Welp, I spoke before I tested. PhantomBot is already intelligent enough to block links with valid domains and to ignore links with invalid domains. Also, I now have a separate account, apart from my main and bot accounts, for testing this stuff.

Thanks Scania!! Your work is greatly appreciated :smiley:

+1 point to the almighty PhantomBot.