Add some tsfile-related tools #14766

Open · wants to merge 4 commits into base: master

jt2594838 (Contributor) commented Jan 23, 2025

  1. TsFileRewriteOverPrecisedI64Scan
    Rewrite Int64 chunks that can be converted to Int32 losslessly.

  2. TsFileRewriteSmallRangeI64Scan
    Rewrite Int64 chunks whose value range is no larger than Int32.MAX_VALUE, replacing each value with its difference from the first value in the chunk.

  3. TsFileStatisticScan
    It prints:
    a. the number of series of each data type;
    b. the number of Int64 chunks that are over-precised (can be converted to Int32 losslessly), just-precised (cannot be converted to Int32 losslessly), small range (range not larger than Int32.MAX_VALUE), or large range (range larger than Int32.MAX_VALUE);
    c. the number of points of each data type;
    d. the total chunk size of each data type;
    e. the average number of distinct binary values in STRING and TEXT chunks.
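
The chunk categories listed above can be sketched in plain Java. This is only an illustration of the definitions as stated in the description, not the code in this PR: the class `Int64ChunkClassifier` and its method names are hypothetical, and it operates on a bare `long[]` rather than on TsFile chunks.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the PR: classifies the values of one Int64 chunk.
final class Int64ChunkClassifier {

  /** True if every value survives a round trip through int (the "over-precised" case). */
  static boolean isOverPrecised(long[] values) {
    for (long v : values) {
      if (v != (int) v) {
        return false;
      }
    }
    return true;
  }

  /** True if max - min is no larger than Integer.MAX_VALUE (the "small range" case). */
  static boolean isSmallRange(long[] values) {
    long min = Arrays.stream(values).min().orElse(0L);
    long max = Arrays.stream(values).max().orElse(0L);
    // max >= min, so an overflowing subtraction wraps to a negative number.
    long range = max - min;
    return range >= 0 && range <= Integer.MAX_VALUE;
  }

  /** Replaces each value with its difference from the first value in the chunk. */
  static int[] toDeltasFromFirst(long[] values) {
    if (!isSmallRange(values)) {
      throw new IllegalArgumentException("range is larger than Integer.MAX_VALUE");
    }
    int[] deltas = new int[values.length];
    for (int i = 0; i < values.length; i++) {
      // Safe: |values[i] - values[0]| <= max - min <= Integer.MAX_VALUE.
      deltas[i] = Math.toIntExact(values[i] - values[0]);
    }
    return deltas;
  }

  public static void main(String[] args) {
    long base = 1_700_000_000_000L;
    long[] chunk = {base, base + 1_000, base - 500};
    System.out.println("over-precised: " + isOverPrecised(chunk));
    System.out.println("small range:   " + isSmallRange(chunk));
    System.out.println("deltas:        " + Arrays.toString(toDeltasFromFirst(chunk)));
  }
}
```

In this sketch a chunk of millisecond timestamps is not over-precised, because the values do not fit in Int32, but it is small range, so the deltas do fit. Reading the original values back requires keeping the first value of the chunk alongside the deltas.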

codecov bot commented Jan 23, 2025

Codecov Report

Attention: Patch coverage is 5.89812% with 351 lines in your changes missing coverage. Please review.

Project coverage is 39.16%. Comparing base (79b0807) to head (d38e080).
Report is 10 commits behind head on master.

Files with missing lines Patch % Lines
...ache/iotdb/db/tools/utils/TsFileStatisticScan.java 0.00% 134 Missing ⚠️
...db/tools/utils/TsFileRewriteSmallRangeI64Scan.java 0.00% 112 Missing ⚠️
.../tools/utils/TsFileRewriteOverPrecisedI64Scan.java 0.00% 105 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #14766      +/-   ##
============================================
- Coverage     39.22%   39.16%   -0.06%     
  Complexity      193      193              
============================================
  Files          4443     4450       +7     
  Lines        282425   283271     +846     
  Branches      34849    34949     +100     
============================================
+ Hits         110784   110946     +162     
- Misses       171641   172325     +684     

@Override
protected void onFileEnd() throws IOException {
  writer.endFile();

Collaborator

Does the last chunk group call writer.endChunkGroup()?

Contributor Author

Fixed
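
The fix itself is not shown in this thread. As a rough illustration of the question above, the sketch below closes any chunk group that is still open before ending the file. `RecordingWriter` is a stand-in that only records calls; it is not the real TsFile writer, and `RewriteScanSketch`, `onDevice`, and `startChunkGroup` are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in writer that records the order of calls. NOT the real TsFile writer class.
final class RecordingWriter {
  final List<String> calls = new ArrayList<>();
  private boolean chunkGroupOpen = false;

  void startChunkGroup(String device) {
    if (chunkGroupOpen) {
      throw new IllegalStateException("previous chunk group was never ended");
    }
    chunkGroupOpen = true;
    calls.add("startChunkGroup(" + device + ")");
  }

  void endChunkGroup() {
    chunkGroupOpen = false;
    calls.add("endChunkGroup");
  }

  void endFile() {
    if (chunkGroupOpen) {
      throw new IllegalStateException("last chunk group was never ended");
    }
    calls.add("endFile");
  }
}

// Hypothetical scan skeleton: one chunk group per device, closed lazily.
final class RewriteScanSketch {
  private final RecordingWriter writer;
  private boolean inChunkGroup = false;

  RewriteScanSketch(RecordingWriter writer) {
    this.writer = writer;
  }

  void onDevice(String device) {
    if (inChunkGroup) {
      writer.endChunkGroup(); // close the previous device's group
    }
    writer.startChunkGroup(device);
    inChunkGroup = true;
  }

  void onFileEnd() {
    if (inChunkGroup) {
      writer.endChunkGroup(); // the last group has no following device to close it
      inChunkGroup = false;
    }
    writer.endFile();
  }
}
```

The point of the sketch is that a group closed only when the next device starts leaves the final group open, so the end-of-file hook has to close it explicitly.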

Quality Gate failed

Failed conditions
3 Security Hotspots
31.4% Duplication on New Code (required ≤ 5%)

See analysis details on SonarQube Cloud


private boolean rewriteInt64ChunkAligned(Chunk chunk) throws IOException {
  AlignedChunkReader chunkReader =
      new AlignedChunkReader(currTimeChunk, Collections.singletonList(chunk));

shuwenwei (Collaborator) commented Jan 24, 2025

Only one value column is passed here. If any value in this column is null, that row may be skipped.

Contributor Author

This is to find out how much space can be saved by rewriting Int64 to Int32. If nulls are removed, the comparison will not be as precise.

Collaborator

Suppose the aligned series is (time, s1, s2, s3) and we want to rewrite the chunks of s1, and suppose one row is (1, null, 1, 1). When only (time, s1) is provided to AlignedChunkReader, this row is not returned because all of its values are null, so the new chunk written by ValueChunkWriter may not align with the original time chunk.

Collaborator

To retain these null values, TableChunkReader should be used here.
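
The alignment concern in this thread can be illustrated with plain arrays, without any TsFile classes. The two methods below are stand-ins that only mimic the behaviours described by the reviewer (dropping rows whose selected values are all null versus keeping them); the class and method names are hypothetical and nothing here calls the real readers.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of the row-skipping concern; no TsFile APIs are used.
final class AlignedNullRowSketch {

  /** Mimics a reader that drops rows whose only selected value is null. */
  static List<Long> readSkippingAllNullRows(long[] times, Long[] column) {
    List<Long> out = new ArrayList<>();
    for (int i = 0; i < times.length; i++) {
      if (column[i] != null) {
        out.add(column[i]);
      }
    }
    return out;
  }

  /** Mimics a reader that returns one entry per timestamp, nulls included. */
  static List<Long> readKeepingNulls(long[] times, Long[] column) {
    List<Long> out = new ArrayList<>();
    for (int i = 0; i < times.length; i++) {
      out.add(column[i]); // null marks "no value at this timestamp"
    }
    return out;
  }

  public static void main(String[] args) {
    long[] times = {1L, 2L, 3L};   // shared time chunk for s1, s2, s3
    Long[] s1 = {null, 10L, 20L};  // s1 is null at time 1

    List<Long> skipped = readSkippingAllNullRows(times, s1);
    List<Long> kept = readKeepingNulls(times, s1);

    // The skipped list is shorter than the time chunk, so a value chunk built
    // from it would pair 10 with time 1 instead of time 2.
    System.out.println("timestamps: " + times.length);
    System.out.println("skipping nulls: " + skipped);
    System.out.println("keeping nulls:  " + kept);
  }
}
```

When the rewritten value chunk is paired with the original time chunk, the two must contain the same number of rows, which is why the nulls need to be preserved during the read.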
