Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ProtobufList documentation #3128

Open
randomizedcoder opened this issue Jan 20, 2025 · 4 comments
Open

ProtobufList documentation #3128

randomizedcoder opened this issue Jan 20, 2025 · 4 comments
Labels

Comments

@randomizedcoder
Copy link

Company or project name

No response

Describe the issue

G'day,

I'm not sure this is a issue, but maybe the documentation for protobuflist seems to be missing "repeated"?

https://clickhouse.com/docs/en/interfaces/formats#protobuflist
protobuflist

syntax = "proto3";
message Envelope {
  message MessageType {
    string name = 1;
    string surname = 2;
    uint32 birthDate = 3;
    repeated string phoneNumbers = 4;
  };
  MessageType row = 1;                            // <------------ NOT "repeated" ???
};

The original PR ClickHouse/ClickHouse#35152 has "repeated".
protobufListPR

// schemafile.proto
message Envelope {
  message MessageType {
    uint32 colA = 1;
    string colB = 2;
  }
  repeated MessageType mt = 1;                 // <----------------- "repeated"
}

The original issue that started this also has "repeated". issue16436

It would also be amazing if the protobuflist docs linked to a working example.

Maybe this is an example, but it doesn't have "row"?
https://github.com/ClickHouse/ClickHouse/blob/677b1cbfe76d0326df4b739bb0ad359af125ed87/tests/queries/0_stateless/format_schemas/02240_protobuflist_format_persons_syntax2.proto#L1

This is another example, and it does have "repeated". This example is also confusing, because it's unclear if the message type "Person" needs to be repetitively defined. To compile this protobuf, you do NOT need to define "Person" twice, but maybe you do for Clickhouse?
https://github.com/ClickHouse/ClickHouse/blob/677b1cbfe76d0326df4b739bb0ad359af125ed87/tests/queries/0_stateless/format_schemas/02240_protobuflist1_format_persons.proto#L68

:) oh great. The integration test doesn't cover ProtobufList. Maybe this is why I'm struggling. Maybe it doesn't work anymore?
https://github.com/ClickHouse/ClickHouse/blob/677b1cbfe76d0326df4b739bb0ad359af125ed87/tests/integration/test_storage_kafka/test_produce_http_interface.py#L137
https://github.com/ClickHouse/ClickHouse/blob/677b1cbfe76d0326df4b739bb0ad359af125ed87/tests/integration/test_storage_kafka/test_produce_http_interface.py#L172
...

While I'm here the Protobuf docs reference to some random GEODE project, but then doesn't really answer the question about how to add the length delimiting

See also [how to read/write length-delimited protobuf messages in popular languages](https://cwiki.apache.org/confluence/display/GEODE/Delimiting+Protobuf+Messages).

It might improve the docs to link directly to the golang protodelim MarshalTo
protodelim.MarshalTo
https://pkg.go.dev/google.golang.org/protobuf@v1.36.3/encoding/protodelim#MarshalTo

Thanks

Additional context

No response

@Blargian Blargian transferred this issue from ClickHouse/ClickHouse Jan 22, 2025
@Blargian Blargian added the P1 label Jan 22, 2025
@Blargian
Copy link
Member

Hi @randomizedcoder, thanks for opening this issue! Examples in formats is something we recognise as needing improvement (in some cases there aren't even examples) that we will be tackling in the near future, so I will add this to the list of formats we need to look at.

@randomizedcoder
Copy link
Author

@Blargian Thanks for the reply. Do you have any reason to believe ProtobufList works?

@Blargian
Copy link
Member

@Blargian Thanks for the reply. Do you have any reason to believe ProtobufList works?

@antaljanosbenjamin maybe you could help us out here?

@antaljanosbenjamin
Copy link
Member

I think @rschu1ze is the best to answer this, however I will try to answer it.

I think the fields in the Envelope message doesn't matter. We have tests that uses ProtobufList without any fields, just as a submessage.

I think it depends on the message is embedded in the message Envelope and it doesn't care about the fields. In the code we use the FindMessageTypeByName, which doesn't care about the fields at all.

Now let's answer the questions:

I'm not sure this is a issue, but maybe the documentation for protobuflist seems to be missing "repeated"?

No. The example should work, you can even delete the declared field as long as you keep the embedded message. However because of the extra field, it is super confusing, so it definitely needs improvements, thus thanks for bringing this up!

Maybe this is an example, but it doesn't have "row"?

Yes, that works as our serializer doesn't care about the fields.

To compile this protobuf, you do NOT need to define "Person" twice, but maybe you do for Clickhouse?

The message is only duplicated to make our test scripts simpler.

Do you have any reason to believe ProtobufList works?

Based on the tests I have high confidence in ProtobufList.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants