Improve Batch processing v2 #3192

Open
4 tasks
Jkd-eth opened this issue Jan 21, 2025 · 0 comments

Jkd-eth commented Jan 21, 2025

User Story:
As a data engineer
I want to set up an internal batch scoring of addresses
so that I can process large datasets efficiently for the data team and provide results

Acceptance Criteria:
GIVEN the current system of batch processing
WHEN the batch processing runs and there are errors
THEN there is an easy way to reprocess those errors quickly and efficiently and provide the consolidated results

This is a continuation of #3184

The batch process is working well and results come back fairly quickly (400k in ~24-30 hrs). However, it's inconvenient for the Engineers & Data Scientists to have to manually pull and re-run the errors.

Update the DB connection to switch from a dedicated connection to a connection pool
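
As a rough illustration of the pooling change, here is a minimal settings sketch. It assumes the service runs on PostgreSQL with Django 5.1 or later and psycopg 3 installed with its pool extra, none of which is confirmed in this issue. The database name, environment variable names, and pool sizes are placeholders, not values from the project.

```python
import os

# Hypothetical Django settings fragment. Assumes Django >= 5.1, PostgreSQL,
# and psycopg 3 with the "pool" extra. All names and sizes are placeholders.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": os.environ.get("DB_NAME", "batch_scoring"),
        "USER": os.environ.get("DB_USER", ""),
        "PASSWORD": os.environ.get("DB_PASSWORD", ""),
        "HOST": os.environ.get("DB_HOST", "localhost"),
        "PORT": os.environ.get("DB_PORT", "5432"),
        "OPTIONS": {
            # Passed through to psycopg_pool.ConnectionPool; workers borrow
            # a connection from the pool instead of holding a dedicated one.
            "pool": {
                "min_size": 2,
                "max_size": 4,
                "timeout": 10,
            },
        },
    }
}
```

If the project is on an older Django version or a different database, an external pooler in front of the database would be the alternative to evaluate.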

Can we add the following once a batch process has run:

  • New UI in Django (see error messages, retry, re-run batch)
  • Keep all records in the DB (store each address in DB)
  • Automatically ‘re-run’ after the batch completes so that only the error records are re-processed (e.g. if a single network has failed, only that network is re-run)
  • Download the finished consolidated file with all wallets with a click of a button
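
The re-run and download items above could be sketched roughly as follows. This is an illustration under assumptions, not the existing implementation: the `AddressRecord` shape, the `score_network` callable, and the per-network result layout are all hypothetical, and the real batch job may store results differently.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AddressRecord:
    """One stored address (hypothetical shape, one row per address)."""
    address: str
    # network name -> score for lookups that succeeded
    results: Dict[str, float] = field(default_factory=dict)
    # network name -> error message for lookups that failed
    errors: Dict[str, str] = field(default_factory=dict)


def retry_failed(
    records: List[AddressRecord],
    score_network: Callable[[str, str], float],
    max_passes: int = 3,
) -> List[AddressRecord]:
    """Re-run only the (address, network) pairs that are still in error."""
    for _ in range(max_passes):
        pending = [(r, n) for r in records for n in list(r.errors)]
        if not pending:
            break  # nothing left to retry
        for record, network in pending:
            try:
                record.results[network] = score_network(record.address, network)
                del record.errors[network]
            except Exception as exc:
                # Keep the latest message so it can be shown in the UI.
                record.errors[network] = str(exc)
    return records


def to_csv(records: List[AddressRecord], networks: List[str]) -> str:
    """Build the consolidated file: one row per address, one column per network."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["address", *networks, "errors"])
    for r in records:
        scores = [r.results.get(n, "") for n in networks]
        writer.writerow([r.address, *scores, ";".join(sorted(r.errors))])
    return buf.getvalue()
```

Successful results are never recomputed, so a retry pass costs only as much as the number of failed pairs. In a Django view, the string returned by `to_csv` could be served through an `HttpResponse` with `content_type="text/csv"` to back the one-click download.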

Product & Design Links:

Tech Details:

Open Questions:

Notes/Assumptions:

Jkd-eth moved this to Prioritized in Passport New Jan 21, 2025