This repository has been archived by the owner on Jan 11, 2021. It is now read-only.

Support reading & writing Arrow in encoder/decoders. #191

Open
2 tasks
sunchao opened this issue Nov 13, 2018 · 4 comments

Comments

@sunchao
Owner

sunchao commented Nov 13, 2018

To read into Arrow format, we need to add a get_spaced method to the decoder (borrowed from the C++ version) that leaves gaps for null values in the result value buffer. The encoders need the same treatment.
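
The spacing step can be sketched as follows. This is only an illustration of the idea, not the parquet-rs or C++ implementation: `space_out` and `bit_is_set` are hypothetical names, and the sketch assumes the validity bitmap is LSB-first (Arrow's layout) and that the first non-null-count slots of the buffer already hold the densely decoded values.

```rust
/// Returns true if bit `i` is set in an LSB-first bitmap.
fn bit_is_set(bits: &[u8], i: usize) -> bool {
    bits[i / 8] & (1 << (i % 8)) != 0
}

/// Hypothetical helper (not an existing parquet-rs API).
/// On entry, the leading slots of `buffer` hold the non-null values back to back.
/// On return, each value sits at the index of its set bit in `valid_bits`,
/// and null slots hold `T::default()`. Returns the number of non-null values.
fn space_out<T: Copy + Default>(
    buffer: &mut [T],
    num_values: usize,
    valid_bits: &[u8],
) -> usize {
    let num_non_null = (0..num_values)
        .filter(|&i| bit_is_set(valid_bits, i))
        .count();
    let mut src = num_non_null;
    // Walk backwards so a dense value is never overwritten before it is moved.
    for dst in (0..num_values).rev() {
        if bit_is_set(valid_bits, dst) {
            src -= 1;
            buffer[dst] = buffer[src];
        } else {
            buffer[dst] = T::default();
        }
    }
    num_non_null
}
```

For example, with a buffer of `[10, 20, 30, 0, 0]` and valid bits `0b0001_0101`, the three decoded values end up at indices 0, 2 and 4. Working in place from the back avoids a second buffer, at the cost of requiring the dense values to be decoded first.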

Subtasks:

@sunchao sunchao self-assigned this Nov 13, 2018
@sunchao sunchao mentioned this issue Nov 13, 2018
6 tasks
@sadikovi
Collaborator

We have a similar thing in the record reader.

@sunchao
Owner Author

sunchao commented Nov 13, 2018

> We have a similar thing in the record reader.

Hmm... do you mean record/reader.rs? I couldn't find anything related there. This is at the encoding level, though, so we'll need to add a new method to Encoder and Decoder.

@sadikovi
Collaborator

How will you add it to the encoder or decoder? They don’t have any information about null values; they only encode or decode non-null values.

If I am not mistaken, it is here: https://github.com/sunchao/parquet-rs/blob/master/src/record/triplet.rs#L310

Let me know if this is not what you had in mind, and I will delete my comments.

@sunchao
Owner Author

sunchao commented Nov 13, 2018

The interface will be similar to the one here. The valid_bits will be computed from the def/rep levels and passed to the call. See here for an example.
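
As a rough sketch of how valid_bits might be derived, the snippet below handles only the simplest case: a flat (non-nested) column, where a value is present exactly when its definition level equals the column's maximum definition level. Repetition levels are ignored here, the function name `valid_bits_from_def_levels` is hypothetical, and the LSB-first bit order is an assumption taken from Arrow's validity bitmap layout.

```rust
/// Hypothetical helper (not an existing parquet-rs API).
/// Builds an LSB-first validity bitmap for a flat column and returns it
/// together with the number of null slots.
fn valid_bits_from_def_levels(def_levels: &[i16], max_def_level: i16) -> (Vec<u8>, usize) {
    // One bit per level, rounded up to whole bytes.
    let mut bits = vec![0u8; (def_levels.len() + 7) / 8];
    let mut null_count = 0;
    for (i, &level) in def_levels.iter().enumerate() {
        if level == max_def_level {
            // Value is defined: mark slot `i` as valid.
            bits[i / 8] |= 1 << (i % 8);
        } else {
            // A lower definition level means the value is null at this slot.
            null_count += 1;
        }
    }
    (bits, null_count)
}
```

The caller would then decode `def_levels.len() - null_count` values and pass the bitmap along with the buffer to the spaced decode call. Nested columns would need the repetition levels as well, which this sketch does not attempt.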
