[opt](exec)lazy deserialize pblock in VDataStreamRecvr::SenderQueue … #46768

Draft
wants to merge 1 commit into
base: branch-3.0

Mryange
Contributor

@Mryange Mryange commented Jan 10, 2025

…(#44378)
#44378
Previously, a pblock (serialized block) was deserialized immediately after the RPC request was received and then placed into the data_queue. Deserialization therefore ran inside RPC processing and took significant time there, hurting overall performance. The new approach defers deserialization until getBlock is called. This has the following advantages:

  1. Reduces time spent in the RPC handling phase.
  2. Memory for deserialization is allocated in the execution thread, improving cache locality
    and reducing contention on memory resources.
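
The deferred approach described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `SerializedBlock`, `Block`, `serialize`, `deserialize`, and `LazySenderQueue` are hypothetical stand-ins for the pblock, the in-memory block, and the sender queue, and the byte-copy "deserialization" only imitates the real cost.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <deque>
#include <mutex>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a serialized block (the "pblock" in the text).
struct SerializedBlock {
    std::string bytes; // raw int32 values in host byte order
};

// Hypothetical stand-in for a deserialized, in-memory block.
struct Block {
    std::vector<int32_t> values;
};

// Helper used only to build inputs for this sketch.
inline SerializedBlock serialize(const Block& block) {
    SerializedBlock out;
    out.bytes.resize(block.values.size() * sizeof(int32_t));
    if (!block.values.empty()) {
        std::memcpy(out.bytes.data(), block.values.data(), out.bytes.size());
    }
    return out;
}

// The expensive step: allocates the destination buffer and copies the data.
inline Block deserialize(const SerializedBlock& pblock) {
    assert(pblock.bytes.size() % sizeof(int32_t) == 0);
    Block block;
    block.values.resize(pblock.bytes.size() / sizeof(int32_t));
    if (!block.values.empty()) {
        std::memcpy(block.values.data(), pblock.bytes.data(), pblock.bytes.size());
    }
    return block;
}

class LazySenderQueue {
public:
    // RPC thread: only moves the serialized payload into the queue.
    void add_block(SerializedBlock pblock) {
        std::lock_guard<std::mutex> lock(_mutex);
        _queue.push_back(std::move(pblock));
    }

    // Execution thread: pops under the lock, then deserializes outside it,
    // so the allocation happens on the thread that will read the block.
    std::optional<Block> get_block() {
        SerializedBlock pblock;
        {
            std::lock_guard<std::mutex> lock(_mutex);
            if (_queue.empty()) {
                return std::nullopt;
            }
            pblock = std::move(_queue.front());
            _queue.pop_front();
        }
        return deserialize(pblock);
    }

    // Number of blocks still waiting in serialized form.
    std::size_t pending() const {
        std::lock_guard<std::mutex> lock(_mutex);
        return _queue.size();
    }

private:
    mutable std::mutex _mutex;
    std::deque<SerializedBlock> _queue;
};
```

In this sketch `add_block` does constant work per call, and the cost of `deserialize` is paid only when the consumer calls `get_block`. One trade-off worth noting is that the queue now holds serialized payloads, so any memory accounting on the queue would have to be based on serialized size rather than deserialized size.
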
  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

…pache#44378)

Previously, a `pblock` (serialized block) was deserialized immediately
after the RPC request was received and then placed into the `data_queue`.
Deserialization therefore ran inside RPC processing and took significant
time there, hurting overall performance.
The new approach defers deserialization until `getBlock` is called. This
has the following advantages:
1. Reduces time spent in the RPC handling phase.
2. Memory for deserialization is allocated in the execution thread,
   improving cache locality and reducing contention on memory resources.

- Test <!-- At least one of them must be included. -->
    - [ ] Regression test
    - [ ] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [x] No need to test or manual test. Explain why:
        - [x] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason <!-- Add your reason?  -->

- Behavior changed:
    - [x] No.
    - [ ] Yes. <!-- Explain the behavior change -->

- Does this need documentation?
    - [x] No.
    - [ ] Yes. <!-- Add document PR link here. eg: apache/doris-website#1214 -->

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR should merge into -->

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@Mryange
Contributor Author

Mryange commented Jan 10, 2025

run buildall

@Mryange Mryange marked this pull request as draft January 10, 2025 08:20
@doris-robot

TPC-H: Total hot run time: 40957 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 008b3ffdbd4956caac51b30ee192907baea2f55f, data reload: false

------ Round 1 ----------------------------------
q1	17584	7612	7286	7286
q2	2068	182	162	162
q3	10582	1088	1235	1088
q4	10570	704	686	686
q5	7772	2865	2814	2814
q6	236	150	143	143
q7	986	614	595	595
q8	9365	1953	2026	1953
q9	6613	6416	6390	6390
q10	7044	2338	2341	2338
q11	478	265	257	257
q12	396	211	201	201
q13	17766	3015	3004	3004
q14	259	219	211	211
q15	571	513	522	513
q16	683	612	612	612
q17	963	540	525	525
q18	7350	6760	6802	6760
q19	1399	1115	1135	1115
q20	453	215	197	197
q21	4216	3224	3116	3116
q22	1079	991	1007	991
Total cold run time: 108433 ms
Total hot run time: 40957 ms
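
The two totals above seem to follow from the rows like this: the cold total is the sum of the first timing column, and the hot total is the sum, per query, of the faster of the two hot runs (which is also what the last column holds). That column reading is an assumption inferred from the numbers themselves, not from the benchmark scripts. The sketch below recomputes both totals; `Totals`, `sum_rounds`, and `kTpchRound1` are hypothetical names introduced for illustration.

```cpp
#include <algorithm>
#include <cstdint>
#include <sstream>
#include <string>

struct Totals {
    int64_t cold_ms = 0;
    int64_t hot_ms = 0;
};

// Assumed row layout: <query> <cold> <hot run 1> <hot run 2> <best hot>,
// separated by whitespace. Only the first four fields are read.
inline Totals sum_rounds(const std::string& table) {
    Totals totals;
    std::istringstream lines(table);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::string query;
        int64_t cold = 0, hot1 = 0, hot2 = 0;
        if (!(fields >> query >> cold >> hot1 >> hot2)) {
            continue; // skip blank or malformed lines
        }
        totals.cold_ms += cold;
        totals.hot_ms += std::min(hot1, hot2);
    }
    return totals;
}

// Round 1 rows copied from the TPC-H report above.
static const char kTpchRound1[] =
    "q1 17584 7612 7286 7286\n"
    "q2 2068 182 162 162\n"
    "q3 10582 1088 1235 1088\n"
    "q4 10570 704 686 686\n"
    "q5 7772 2865 2814 2814\n"
    "q6 236 150 143 143\n"
    "q7 986 614 595 595\n"
    "q8 9365 1953 2026 1953\n"
    "q9 6613 6416 6390 6390\n"
    "q10 7044 2338 2341 2338\n"
    "q11 478 265 257 257\n"
    "q12 396 211 201 201\n"
    "q13 17766 3015 3004 3004\n"
    "q14 259 219 211 211\n"
    "q15 571 513 522 513\n"
    "q16 683 612 612 612\n"
    "q17 963 540 525 525\n"
    "q18 7350 6760 6802 6760\n"
    "q19 1399 1115 1135 1115\n"
    "q20 453 215 197 197\n"
    "q21 4216 3224 3116 3116\n"
    "q22 1079 991 1007 991\n";
```

Calling `sum_rounds(kTpchRound1)` yields a cold total of 108433 ms and a hot total of 40957 ms, matching the two totals reported for Round 1.
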

----- Round 2, with runtime_filter_mode=off -----
q1	7454	7225	7262	7225
q2	327	240	227	227
q3	3039	2946	2974	2946
q4	2046	1816	1813	1813
q5	5732	5713	5699	5699
q6	219	138	137	137
q7	2260	1823	1820	1820
q8	3328	3547	3449	3449
q9	8929	8853	8841	8841
q10	3614	3546	3542	3542
q11	591	491	492	491
q12	830	611	647	611
q13	8761	3166	3172	3166
q14	309	279	270	270
q15	582	531	536	531
q16	728	681	676	676
q17	1849	1587	1616	1587
q18	8200	7666	7696	7666
q19	1651	1489	1676	1489
q20	2123	1846	1879	1846
q21	5397	5389	5390	5389
q22	1166	1070	1022	1022
Total cold run time: 69135 ms
Total hot run time: 60443 ms

@doris-robot

TPC-DS: Total hot run time: 198642 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 008b3ffdbd4956caac51b30ee192907baea2f55f, data reload: false

query1	1311	926	915	915
query2	6278	2153	2110	2110
query3	10955	4451	4248	4248
query4	66537	28864	23466	23466
query5	4937	452	465	452
query6	439	190	179	179
query7	5556	324	314	314
query8	322	253	235	235
query9	8998	2685	2680	2680
query10	446	269	255	255
query11	17287	15202	15836	15202
query12	160	104	103	103
query13	1512	449	443	443
query14	9836	7879	7236	7236
query15	200	184	177	177
query16	7123	484	471	471
query17	1088	585	584	584
query18	1887	349	326	326
query19	207	167	162	162
query20	131	115	111	111
query21	215	114	115	114
query22	4768	4637	4576	4576
query23	35079	34502	34654	34502
query24	6298	2932	2970	2932
query25	562	437	417	417
query26	668	170	170	170
query27	1930	365	367	365
query28	4181	2503	2511	2503
query29	699	461	432	432
query30	233	161	178	161
query31	1008	828	835	828
query32	71	61	60	60
query33	437	295	278	278
query34	900	512	527	512
query35	829	775	712	712
query36	1064	963	973	963
query37	122	77	71	71
query38	4096	4127	4158	4127
query39	1517	1492	1518	1492
query40	206	111	101	101
query41	49	55	46	46
query42	115	101	102	101
query43	535	494	490	490
query44	1180	841	853	841
query45	187	176	168	168
query46	1124	741	730	730
query47	2032	1870	1920	1870
query48	502	387	400	387
query49	756	406	391	391
query50	844	425	436	425
query51	7414	7276	7084	7084
query52	99	89	87	87
query53	256	197	186	186
query54	555	458	462	458
query55	76	79	78	78
query56	258	243	241	241
query57	1227	1100	1106	1100
query58	236	212	227	212
query59	3224	3004	2983	2983
query60	273	259	259	259
query61	118	118	110	110
query62	770	664	684	664
query63	224	194	195	194
query64	1377	695	681	681
query65	3284	3203	3213	3203
query66	707	307	306	306
query67	15752	15661	15614	15614
query68	4218	576	587	576
query69	433	274	273	273
query70	1189	1142	1177	1142
query71	375	250	252	250
query72	6403	4070	4051	4051
query73	759	342	345	342
query74	10274	9079	8912	8912
query75	3346	2687	2613	2613
query76	2037	1187	1071	1071
query77	555	308	293	293
query78	10513	9557	9549	9549
query79	1982	593	611	593
query80	1327	438	453	438
query81	527	239	240	239
query82	1205	119	114	114
query83	165	153	143	143
query84	279	80	85	80
query85	968	311	303	303
query86	421	282	300	282
query87	4700	4384	4403	4384
query88	3777	2402	2358	2358
query89	409	296	294	294
query90	1938	190	192	190
query91	204	172	150	150
query92	67	51	60	51
query93	2389	551	547	547
query94	878	303	303	303
query95	365	252	258	252
query96	611	288	272	272
query97	3363	3201	3179	3179
query98	220	217	196	196
query99	1631	1308	1315	1308
Total cold run time: 320795 ms
Total hot run time: 198642 ms

@doris-robot

ClickBench: Total hot run time: 33.83 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 008b3ffdbd4956caac51b30ee192907baea2f55f, data reload: false

query1	0.03	0.04	0.03
query2	0.07	0.03	0.03
query3	0.23	0.07	0.07
query4	1.63	0.10	0.10
query5	0.52	0.50	0.52
query6	1.13	0.72	0.72
query7	0.02	0.02	0.01
query8	0.04	0.03	0.04
query9	0.56	0.50	0.50
query10	0.56	0.54	0.55
query11	0.14	0.11	0.13
query12	0.14	0.11	0.11
query13	0.61	0.60	0.59
query14	3.06	2.94	2.92
query15	0.88	0.83	0.83
query16	0.38	0.37	0.38
query17	1.04	1.01	0.99
query18	0.24	0.23	0.23
query19	1.95	1.90	1.95
query20	0.01	0.02	0.01
query21	15.37	0.61	0.60
query22	2.61	2.97	2.27
query23	17.02	1.01	0.84
query24	3.82	1.13	1.18
query25	0.22	0.12	0.11
query26	0.52	0.14	0.14
query27	0.05	0.04	0.06
query28	9.90	1.11	1.07
query29	12.60	3.31	3.26
query30	0.26	0.06	0.06
query31	2.86	0.39	0.38
query32	3.23	0.46	0.47
query33	3.01	3.03	3.04
query34	16.87	4.48	4.49
query35	4.57	4.58	4.57
query36	0.67	0.52	0.50
query37	0.10	0.07	0.06
query38	0.05	0.03	0.03
query39	0.03	0.02	0.02
query40	0.15	0.13	0.12
query41	0.08	0.02	0.02
query42	0.03	0.02	0.02
query43	0.03	0.03	0.04
Total cold run time: 107.29 s
Total hot run time: 33.83 s
