This repository has been archived by the owner on Jan 11, 2021. It is now read-only.

parquet_derive for the new RecordWriter trait #197

Closed
wants to merge 17 commits into from
2 changes: 1 addition & 1 deletion .gitignore
@@ -1,3 +1,3 @@
Cargo.lock
/target
target
**/*.rs.bk
5 changes: 5 additions & 0 deletions .travis.yml
@@ -44,6 +44,11 @@ script:
- cargo build
- cargo test
- cargo doc --no-deps
- cd parquet_derive_test
- cargo fmt --all -- --check
- cargo test
- cargo doc --no-deps
- cd ..

after_success:
- if [ "$TRAVIS_RUST_VERSION" == "nightly" ]; then
3 changes: 3 additions & 0 deletions Cargo.toml
@@ -46,3 +46,6 @@ num-bigint = "0.2"
[dev-dependencies]
lazy_static = "1"
rand = "0.5"

[workspace]
members = ["parquet_derive", "parquet_derive_test"]
3 changes: 3 additions & 0 deletions appveyor.yml
@@ -28,3 +28,6 @@ install:
build: false
test_script:
- cargo test --verbose --jobs 4
- cd parquet_derive_test
- cargo test
- cd ..
14 changes: 14 additions & 0 deletions parquet_derive/Cargo.toml
@@ -0,0 +1,14 @@
[package]
name = "parquet_derive"
version = "0.1.0"
authors = ["Xavier Lange <xrlange@gmail.com>"]
edition = "2018"

[lib]
proc-macro = true

[dependencies]
proc-macro2 = "0.4"
quote = "0.6.10"
syn = { version = "0.15.22", features = ["extra-traits"] }
parquet = { path = ".." }
91 changes: 91 additions & 0 deletions parquet_derive/README.md
@@ -0,0 +1,91 @@
<!---
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# parquet_derive-rs

A crate for automatically deriving `RecordWriter` for arbitrary, _simple_ structs. It does not generate writers for nested structures, but it works well for shallow, flat structs.

## Usage
Add this to your Cargo.toml:
```toml
[dependencies]
parquet = "0.4"
parquet_derive = "0.4"
```

and this to your crate root:
```rust
extern crate parquet;
#[macro_use] extern crate parquet_derive;
```

Example usage of deriving a `RecordWriter` for your struct:

```rust
use parquet::file::writer::{FileWriter, SerializedFileWriter};
use parquet::record::RecordWriter;

#[derive(ParquetRecordWriter)]
struct ACompleteRecord<'a> {
pub a_bool: bool,
pub a_str: &'a str,
pub a_string: String,
pub a_borrowed_string: &'a String,
pub maybe_a_str: Option<&'a str>,
pub magic_number: i32,
pub low_quality_pi: f32,
pub high_quality_pi: f64,
pub maybe_pi: Option<f32>,
pub maybe_best_pi: Option<f64>,
}

// Initialize your parquet file (`file`, `schema` and `props` are assumed to be set up beforehand)
let mut writer = SerializedFileWriter::new(file, schema, props).unwrap();
let mut row_group = writer.next_row_group().unwrap();

// Build up your records
let chunks = ...

// The derived `RecordWriter` takes over here
chunks.write_to_row_group(&mut row_group);

writer.close_row_group(row_group).unwrap();
writer.close().unwrap();
```
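Conceptually, a derived `RecordWriter` has to turn a slice of rows into one column per struct field before each column can be handed to a typed Parquet column writer. The std-only sketch below mimics just that transposition for a hypothetical three-field struct; `Row`, `Columns` and `to_columns` are illustrative names, not part of `parquet` or `parquet_derive`, and no Parquet I/O happens here.

```rust
/// Hypothetical flat record, standing in for a struct that derives `ParquetRecordWriter`.
struct Row<'a> {
    a_bool: bool,
    a_str: &'a str,
    magic_number: i32,
}

/// One vector per field: the column-oriented shape a Parquet writer consumes.
#[derive(Debug, PartialEq)]
struct Columns<'a> {
    a_bool: Vec<bool>,
    a_str: Vec<&'a str>,
    magic_number: Vec<i32>,
}

/// Transpose rows into columns with one pass per field, similar in spirit
/// to the per-field code a derive macro could generate.
fn to_columns<'a>(rows: &[Row<'a>]) -> Columns<'a> {
    Columns {
        a_bool: rows.iter().map(|r| r.a_bool).collect(),
        a_str: rows.iter().map(|r| r.a_str).collect(),
        magic_number: rows.iter().map(|r| r.magic_number).collect(),
    }
}

fn main() {
    let rows = [
        Row { a_bool: true, a_str: "a", magic_number: 1 },
        Row { a_bool: false, a_str: "b", magic_number: 2 },
    ];
    let cols = to_columns(&rows);
    // Each column preserves the original row order.
    // prints: [true, false] ["a", "b"] [1, 2]
    println!("{:?} {:?} {:?}", cols.a_bool, cols.a_str, cols.magic_number);
}
```

Doing one pass per field keeps each column contiguous, which is what a per-column writer wants, at the cost of iterating the rows once per field.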

## Features
- [X] Support writing `String`, `&str`, `bool`, `i32`, `f32`, `f64`
Collaborator

Byte arrays and unsigned variants?

Contributor Author

Good call on the unsigned variants. To be clear, to satisfy that requirement I need to add `u32`/`u64`, right? Parquet does not handle 8- or 16-bit values, right?

And when it comes to "byte arrays", are you talking about fixed-size arrays like `let arr: [u8; 16] = [0; 16];`? Or do you mean `let vec: Vec<u8> = vec![0; 16];`? I haven't used either of those in parquet-rs, so I need a little guidance.

Collaborator

By "byte arrays" I meant the Parquet type `ByteArray`, not just strings.

Contributor Author

@sadikovi how do I write unsigned variants? I don't see a column writer for that.

- [ ] Support writing dictionaries
- [ ] Support writing logical types like timestamp
- [X] Derive definition_levels for `Option`
- [ ] Derive definition levels for nested structures
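For a top-level `Option<T>` field the maximum definition level is 1: a present value gets level 1, a missing one gets level 0, and only the present values are passed on as data. The std-only sketch below computes both vectors; `split_optional` is a hypothetical helper, not the code `parquet_derive` actually generates. Levels are `i16`, the level type parquet-rs column writers accept.

```rust
/// Split an optional column into (definition levels, non-null values).
/// Assumes a flat, top-level OPTIONAL column, so the max definition level is 1.
fn split_optional<T: Copy>(column: &[Option<T>]) -> (Vec<i16>, Vec<T>) {
    let mut def_levels = Vec::with_capacity(column.len());
    let mut values = Vec::new();
    for item in column {
        match item {
            Some(v) => {
                def_levels.push(1); // value is defined
                values.push(*v);
            }
            None => def_levels.push(0), // null: record the level, emit no value
        }
    }
    (def_levels, values)
}

fn main() {
    let maybe_values = [Some(3.0_f32), None, Some(2.5)];
    let (levels, values) = split_optional(&maybe_values);
    // prints: levels = [1, 0, 1], values = [3.0, 2.5]
    println!("levels = {:?}, values = {:?}", levels, values);
}
```

The values vector can be shorter than the levels vector; a reader reconstructs the nulls by walking the levels.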

## Requirements
- Same as `parquet-rs`

## Test
Testing a `*_derive` crate requires an intermediate crate. Go to `parquet_derive_test` and run `cargo test` for unit tests.

## Docs
To build documentation, run `cargo doc --no-deps`.
To compile and view in the browser, run `cargo doc --no-deps --open`.

## License
Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0.