Projects dependency map #933

Open
wmalpica opened this issue Aug 11, 2020 · 0 comments

We have a lot of big projects that we are currently working on. Almost all of this work will be based on the UCX branch, because that branch already contains a lot of changes all over the code base which we want to keep.

There are three key projects, and each of them has certain common requirements. In order of priority they are:
Communication C++ layer: #926
Batch Job Executor: #807
Avoid recalculating nodes in a graph: #696

The communication project needs to be completed first so that we can merge the UCX branch into our main development branch.
The batch job executor project is also key because it will enable us to really scale and manage our memory for TPCx-BB.
These two projects should be tackled in parallel while properly managing their dependencies.

The batch executor will require a lot of refactoring of the code base and reorganizing how the kernels execute their work.
To make some of that refactoring easier, "Barriers Required for Distributed execution" #923 should be done first.

Additionally, the code of all the kernels should be refactored so that all of their work can be executed through a run_batch function. All of this refactoring can then be tested and validated to make sure it doesn't break anything. Once the kernel code has been refactored, we can actually create the batch executor. Parts of the code refactoring may be small projects in and of themselves. For example, calculating a partition plan for order by should be refactored into its own kernel. The join kernel will also require some non-trivial refactoring.
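
To make the target shape of this refactoring concrete, here is a minimal sketch of what "all work goes through run_batch" could look like. Everything in it is an assumption for illustration: `Batch`, `Kernel`, `FilterKernel`, `ScaleKernel` and `BatchExecutor` are hypothetical names, a batch is just a vector of ints, and the run_batch signature is a guess rather than the project's real interface.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for one batch of data; not the project's table type.
using Batch = std::vector<int>;

// Hypothetical kernel interface: every kernel does its work in run_batch.
class Kernel {
public:
    virtual ~Kernel() = default;
    // Consumes one input batch and produces one output batch.
    virtual Batch run_batch(const Batch& input) = 0;
};

// Example kernel: keeps only the values greater than a threshold.
class FilterKernel : public Kernel {
public:
    explicit FilterKernel(int threshold) : threshold_(threshold) {}
    Batch run_batch(const Batch& input) override {
        Batch out;
        for (int value : input) {
            if (value > threshold_) out.push_back(value);
        }
        return out;
    }
private:
    int threshold_;
};

// Example kernel: multiplies every value by a constant factor.
class ScaleKernel : public Kernel {
public:
    explicit ScaleKernel(int factor) : factor_(factor) {}
    Batch run_batch(const Batch& input) override {
        Batch out;
        out.reserve(input.size());
        for (int value : input) out.push_back(value * factor_);
        return out;
    }
private:
    int factor_;
};

// Minimal executor: pushes each input batch through the kernels in order.
// Because it only ever calls run_batch, it needs no kernel-specific logic.
class BatchExecutor {
public:
    void add_kernel(std::unique_ptr<Kernel> kernel) {
        assert(kernel != nullptr);
        kernels_.push_back(std::move(kernel));
    }
    std::vector<Batch> execute(const std::vector<Batch>& inputs) {
        std::vector<Batch> results;
        results.reserve(inputs.size());
        for (const Batch& input : inputs) {
            Batch current = input;
            for (auto& kernel : kernels_) {
                current = kernel->run_batch(current);
            }
            results.push_back(std::move(current));
        }
        return results;
    }
private:
    std::vector<std::unique_ptr<Kernel>> kernels_;
};
```

The point of the sketch is the ordering described above: once every kernel exposes run_batch, each kernel can be tested on its own, and the executor can be added afterwards as a thin loop. Kernels such as join or the order by partition plan would likely need a richer interface than this one-in, one-out signature.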

In the first implementation of the batch executor, we can continue to use the same cache data and the same ownership paradigm that we have with blazingTable. Cache data and blazingTable can change afterwards, when we make the changes to avoid recalculating nodes in a graph. The new data ownership paradigm is defined in this project:
#924

wmalpica added the "Design" label and removed the "? - Needs Triage" label on Aug 11, 2020.