Fingerprint for initial request is not saved on redirects #9
Comments
The reason for this issue was that the URL I yielded a [...] Maybe this should be documented in the Wiki, and/or the http/https difference for the same page could be ignored with an option.
I don't understand the issue or the behavior you want to be documented.
I think this could be added to a FAQ or a Wiki to spare users tedious debugging sessions. When the URL scraped from a page differs from the final URL only because the server redirects to the HTTPS version of the page, deltafetch will process it again, which is not obvious. Maybe the reason why a page is not cached could also be logged in debug mode. What do you think?
Hello @mrueegg, you are right that when requests are redirected, the deltafetch middleware stores the fingerprint of the redirected (final) request, not that of the starting request.
The logs show that the saved fingerprint is the one for the last hop of the redirects:
The original fingerprint for http://docs.scrapy.org, [...] So the issue is confirmed.
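To make the mismatch concrete, here is a minimal, self-contained sketch of the behavior described above. It does not use Scrapy or deltafetch: `toy_fingerprint`, `record_response` and `should_skip` are hypothetical names, the SHA-1 of method and URL is only a simplified stand-in for the real request fingerprint, and the redirect target `https://docs.scrapy.org/` is assumed for illustration.

```python
import hashlib

def toy_fingerprint(method: str, url: str) -> str:
    # Simplified stand-in for a request fingerprint (assumption: the real
    # one is computed differently, but is likewise sensitive to the URL).
    return hashlib.sha1(f"{method} {url}".encode("utf-8")).hexdigest()

# Stand-in for the .db file that holds the keys of already-scraped pages.
seen = set()

def record_response(final_url: str) -> None:
    # Only the request of the last redirect hop is recorded.
    seen.add(toy_fingerprint("GET", final_url))

def should_skip(request_url: str) -> bool:
    # A new request is skipped only if its own fingerprint is already stored.
    return toy_fingerprint("GET", request_url) in seen

# First run: the spider yields the http URL, the server redirects to https
# (assumed target), and only the final hop is stored.
record_response("https://docs.scrapy.org/")

# Second run: the spider yields the same http URL again.
print(should_skip("http://docs.scrapy.org/"))   # False: fetched again
print(should_skip("https://docs.scrapy.org/"))  # True: would be skipped
```

Because the stored key belongs to the https request, the lookup for the original http request never matches, so the page is requested on every run.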
The case can be handled with a custom 'deltafetch_key':
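The snippet originally posted here is not preserved, so the following is only a sketch of one way to do it, assuming the key is passed through the request's meta dict under 'deltafetch_key'. The helpers `scheme_insensitive_key` and `deltafetch_meta` are hypothetical names, and the spider usage is shown as a comment because it needs Scrapy and scrapy-deltafetch installed.

```python
from urllib.parse import urlsplit

def scheme_insensitive_key(url: str) -> str:
    """Hypothetical helper: build a key that ignores http vs https."""
    parts = urlsplit(url)
    path = parts.path or "/"          # treat "" and "/" as the same page
    key = parts.netloc.lower() + path
    if parts.query:
        key += "?" + parts.query
    return key

def deltafetch_meta(url: str) -> dict:
    # Meta dict to attach to a request so the lookup key no longer depends
    # on the request fingerprint (and therefore not on the URL scheme).
    return {"deltafetch_key": scheme_insensitive_key(url)}

# Usage inside a spider callback (assumption: Scrapy is available and
# `parse_page` is your own callback):
#     yield scrapy.Request(url, callback=self.parse_page,
#                          meta=deltafetch_meta(url))
```

This only helps if the redirected request still carries the same meta, so that the key stored for the final hop equals the key looked up for the starting request; check this against your redirect setup before relying on it. A key that drops the scheme also treats http and https variants as the same page, which may not be what you want for every site.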
Hi,
I have a spider that makes use of FormRequest, item loaders and Request.
Here's an example for a FormRequest:
Here for an item loader:
And here for a request:
Deltafetch is enabled and creates a .db file, but on every spider run Scrapy makes all page requests again, so no delta processing is achieved.
Any ideas? Thanks.