Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support serde::Serialize and serde::Deserialize for the engine #624

Closed
azoyan opened this issue Dec 24, 2024 · 13 comments · Fixed by #628
Closed

Support serde::Serialize and serde::Deserialize for the engine #624

azoyan opened this issue Dec 24, 2024 · 13 comments · Fixed by #628
Assignees
Labels
area: library Improvements to the library API quality area: parser Related to `rsonpath-syntax` type: feature New feature or request
Milestone

Comments

@azoyan
Copy link
Contributor

azoyan commented Dec 24, 2024

Is it possible to implement traits Serialization and Deserialization?

If it difficult may be provide API to inner jsonpath query that stored in MainEngine?

@azoyan azoyan added the type: feature New feature or request label Dec 24, 2024
@github-actions github-actions bot added the acceptance: triage Waiting for owner's input label Dec 24, 2024
Copy link

Tagging @V0ldek for notifications

@V0ldek
Copy link
Member

V0ldek commented Dec 24, 2024

Do you mean serializing and deserializing the MainEngine?

So the MainEngine consists of the Automaton, which is the compiled query, and the SimdConfiguration.

  1. The SimdConfiguration is not something one should serialize since it's not portable by design, it contains the SIMD capabilities of the host CPU.
  2. The Automaton can be serialized, most likely automatically derived by serde-derive.

So to serialize one would serialize only the Automaton; and for deserialize one would deserialize the Automaton and use simd::configure() to get a new configuration applicable to the host CPU.

Support for serialization must be behind a new serde optional feature though, to not bloat dependencies unnecessarily. It will have to reach rsonpath-syntax as well (to serialize JsonUInt and JsonString and such). At that point it's probably worth to make the entire JsonPathQuery serializable.

I'd expect some basic proptests to make sure serialization round-trips. Could you tell me what format you're planning on serializing into?

@V0ldek
Copy link
Member

V0ldek commented Dec 24, 2024

One more thing is versioning – we don't expect the Automaton to be binary-compatible between different versions of rsonpath so serialization needs to save an additional version tag. We'll need a snapshot test that will tell us when the binary payload changes.

@azoyan
Copy link
Contributor Author

azoyan commented Dec 24, 2024

We use engine like this: we convert paths from Vec<String> to Vec<RsonpathEngine>

let paths: Vec<String> = vec![
    "$.personal.details.contact.information.phones.home".to_string(),
    "$.personal.details.contact.information.phones.office".to_string(),
];
let engines: Vec<RsonpathEngine> = paths
    .iter()
    .map(|path| {
        let a = Automaton::new(&rsonpath_syntax::parse(path).unwrap()).unwrap();
        RsonpathEngine::from_compiled_query(a)
    })
    .collect();

And I thought it would be a good idea. Deserialize engine from jsonpath, and serialize to path.

But maybe I'm missing something. And this view is one narrow use case among many. Then good way will be provide API just about jsonpathquery ($.personal.details.contact.information.phones.home). I will implement Serialize/Deserialize for my own wrapper using this information.

@V0ldek
Copy link
Member

V0ldek commented Dec 25, 2024

I see. The engine doesn't preserve the underlying query, so you'd need to hold it alongside.

The main reason to make the engine serialize/deserialize would be to save on compilation time. Parsing and compiling an engine is relatively fast, but I'm pretty sure deserializing it from a binary format would be much faster, so it's something that might be worth supporting.

For example, a quick microbenchmark tells me cloning an engine with your query is ~10x faster than recompiling it. Deserialization from a binary payload would presumably be closer to a clone than recompilation.

In any case, a parsed JsonPathQuery can already be converted to string, so this code does work:

let query_str_1 = "$.personal.details.contact.information.phones.home";
let query_1 = rsonpath_syntax::parse(query_str_1).unwrap();
let automaton_1 = Automaton::new(&query_1).unwrap();

let query_str_2 = query_1.to_string();
let query_2 = rsonpath_syntax::parse(&query_str_2).unwrap();
let automaton_2 = Automaton::new(&query_2).unwrap();
assert_eq!(query_1, query_2);
assert_eq!(automaton_1, automaton_2);

Just note that the string representation is not preserved as we convert to a canonical representation, so:

println!("{query_str_2}");

prints

$['personal']['details']['contact']['information']['phones']['home']

@azoyan
Copy link
Contributor Author

azoyan commented Dec 25, 2024

I realize in your code:
main.rs

impl MainEngine {
    pub fn get_jsonpath_strings(&self) -> Vec<String> {
        self.automaton.get_jsonpath_strings()
    }
}

automation.rs

impl Automation {
    #[must_use]
    #[inline(always)]
    pub fn get_jsonpath_strings(&self) -> Vec<String> {
        self.states
            .iter()
            .flat_map(|state| state.member_transitions.iter().map(|(pattern, _)| pattern.unquoted()))
            .map(|pattern| std::str::from_utf8(pattern).unwrap().to_string())
            .collect()
    }
}

I am satisfied with this result:

let paths = vec![
    "$.personal.details.contact.information.phones.home".to_string(),
    "$.personal.details.contact.information.phones.office".to_string(),
];
let engines: Vec<RsonpathEngine> = paths
    .iter()
    .map(|path| {
        let a = Automaton::new(&rsonpath_syntax::parse(path).unwrap()).unwrap();
        RsonpathEngine::from_compiled_query(a)
    })
    .collect();

let s = engines.get(0).unwrap().get_jsonpath_strings();
println!("{:?}", s)
["personal", "details", "contact", "information", "phones", "home"]

Is possible to add API like this?

@V0ldek
Copy link
Member

V0ldek commented Dec 25, 2024

I don't like it, it's not obvious what this function would do for more complex queries. Does $['a', 'b'] return ["a", "b"]? What about nested stuff with filters, like:

$.a[?@.b[?@.c]].d

Is the semantics literally "return the labels from left to right as in the query"? That sounds like a syntactic property, and the Automaton doesn't need to preserve those.

For syntactic properties you have the JsonPathQuery. If you pass it along with the engine you can do this:

use rsonpath_syntax::prelude::*;

let paths = vec![
    "$.personal.details.contact.information.phones.home".to_string(),
    "$.personal.details.contact.information.phones.office".to_string(),
];
let engines: Vec<(JsonPathQuery, RsonpathEngine)> = paths
    .iter()
    .map(|path| {
        let q = rsonpath_syntax::parse(path).unwrap();
        let e = RsonpathEngine::compile_query(&q).unwrap();
        (q, e)
    })
    .collect();

let s: Vec<&str> = engines[0]
    .0
    .segments()
    .iter()
    .filter_map(|s| match s {
        Segment::Child(ss) => match ss.first() {
            Selector::Name(str) => Some(str.unquoted()),
            _ => None,
        },
        _ => None,
    })
    .collect();
println!("{:?}", s);
["personal", "details", "contact", "information", "phones", "home"]

If something like this is useful we could add it to the rsonpath-syntax public API as a left-to-right gathering of all JsonStrings in the query. But it doesn't fit into a method on the engine as that one has no responsibility to preserve the syntax.

@azoyan
Copy link
Contributor Author

azoyan commented Dec 25, 2024

@V0ldek in #625 i make more natural function name.

I use natural names literally as in code

@azoyan
Copy link
Contributor Author

azoyan commented Dec 25, 2024

Or may be rename to quoted_jsonstring_patterns

@V0ldek
Copy link
Member

V0ldek commented Dec 26, 2024

Okay, so continuing the discussion on serialization/deserialization, I can implement it easily. If you could please tell me what target format will you plan to use so that I can add snapshot tests specifically for that it'd be nice.

@azoyan
Copy link
Contributor Author

azoyan commented Dec 26, 2024

Let me tell you how we use it.

All we need is from jsonpathquery compile and match incoming json.

We store jsonpath as parameter config yaml

jsonpath:
  - $.phones.home
  - $.phones.office

And then we check incoming JSON if is matched or not.

Thats why I need to serialize/deserialize to something that looks like jsonpath.

Thats why I use unqoted_segment: I can manually join('.") and make original string

@azoyan
Copy link
Contributor Author

azoyan commented Dec 26, 2024

I think you can implement serialize / deserialize as you see fit.

For my tasks i wrote wrapper around Engine and store original jsonpathquery in the string =)

@V0ldek V0ldek changed the title Serde or API for Jsonpath queries Support serde::Serialize and serde::Deserialize for the engine Dec 29, 2024
@V0ldek V0ldek moved this from Todo to Committed in Active rq development Dec 29, 2024
@V0ldek V0ldek added this to the lib v1.0.0 milestone Dec 29, 2024
@V0ldek V0ldek self-assigned this Dec 29, 2024
@github-actions github-actions bot added acceptance: go ahead Reviewed, implementation can start and removed acceptance: triage Waiting for owner's input labels Dec 29, 2024
@V0ldek V0ldek added area: library Improvements to the library API quality area: parser Related to `rsonpath-syntax` labels Dec 29, 2024
V0ldek added a commit that referenced this issue Dec 30, 2024
Implemented `serde::Serialize` and `serde::Deserialize`
for `MainEngine` in rsonpath-lib, and `JsonPathQuery`
in rsonpath-syntax with all of its constituent substructs.
The `serde` dependency is guarded behind an optional feature.

To properly proptest this feature `rsonpath-lib` needs access
to the arbitrary query generation from `rsonpath-syntax`.
The cleanest way to do this is to extract the logic to a separate
crate, giving rise to `rsonpath-syntax-proptest`.

The serialization format is not stable, as the `Automaton`
is expected to evolve. Thus, serialization includes a version
and deserialization will fail if the version disagrees.

Ref: #624
V0ldek added a commit that referenced this issue Dec 30, 2024
Added snapshot tests based on `insta` to both the rsonpath-syntax
and rsonpath-lib serialization features.

Ref: #624
@github-project-automation github-project-automation bot moved this from Committed to Merged in Active rq development Dec 31, 2024
@github-actions github-actions bot removed the acceptance: go ahead Reviewed, implementation can start label Dec 31, 2024
@V0ldek V0ldek moved this from Merged to Released in Active rq development Dec 31, 2024
@V0ldek
Copy link
Member

V0ldek commented Dec 31, 2024

This was released in 0.9.4.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: library Improvements to the library API quality area: parser Related to `rsonpath-syntax` type: feature New feature or request
Projects
Status: Released
Development

Successfully merging a pull request may close this issue.

2 participants