-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update func isDomainAllowed #420
base: master
Are you sure you want to change the base?
Conversation
like scrapy. if c.Visit("http://www.xxxx.com.cn/"), colly.AllowedDomains("xxxx.com.cn") should be work for main domain. so if we request 'www.xxxx.com.cn' and AllowedDomains are 'xxxx.com.cn'. This means that we have access to all subdomains under the top-level domain name
@@ -693,6 +693,11 @@ func (c *Collector) isDomainAllowed(domain string) bool { | |||
if c.AllowedDomains == nil || len(c.AllowedDomains) == 0 { | |||
return true | |||
} | |||
for _, d2 := range c.AllowedDomains { | |||
if strings.Contains(domain, d2) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend to use "HasSuffix" method instead of "Contains". Does that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah. "HasSuffix" is well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HasSuffix sounds better for me, could you change it? Also, please update the docstring with the new behavior.
This is a great change, I was just looking for a good solution for this issue. 👍 |
Any update about this? ... |
I dunno, this is a pretty breaking change. Depending on your use case, you might want this behaviour, or not. In any case, one can do arbitrary origin check of the URL just before calling |
like scrapy. if c.Visit("http://www.xxxx.com.cn/"), colly.AllowedDomains("xxxx.com.cn") should be work for main domain. so if we request 'www.xxxx.com.cn' and AllowedDomains are 'xxxx.com.cn'. This means that we have access to all subdomains under the top-level domain name