Add some tsfile-related tools #14766
base: master
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #14766 +/- ##
============================================
- Coverage 39.22% 39.16% -0.06%
Complexity 193 193
============================================
Files 4443 4450 +7
Lines 282425 283271 +846
Branches 34849 34949 +100
============================================
+ Hits 110784 110946 +162
- Misses 171641 172325 +684
@Override
protected void onFileEnd() throws IOException {
  writer.endFile();
Does the last chunk group call writer.endChunkGroup()?
Fixed
Quality Gate failed
private boolean rewriteInt64ChunkAligned(Chunk chunk) throws IOException {
  AlignedChunkReader chunkReader =
      new AlignedChunkReader(currTimeChunk, Collections.singletonList(chunk));
There is only one value column here. If any value in this column is null, that row may be skipped.
This is to find out how much space can be saved by rewriting Int64 as Int32. If nulls are removed, the comparison will not be as precise.
Suppose the columns are (time, s1, s2, s3) and we want to rewrite the chunks of s1, and one row is (1, null, 1, 1). When only (time, s1) is provided to AlignedChunkReader, this row will not be returned because all of its values are null, so the new chunk written by ValueChunkWriter may not be aligned with the original time chunk.
To retain these null values, TableChunkReader should be used here.
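The alignment concern raised in this thread can be illustrated with a toy model. This is a sketch only: it uses plain arrays instead of the TsFile reader classes, and the class and method names (NullAlignmentSketch, readSkippingAllNullRows, readKeepingNulls) are hypothetical. It assumes, as the reviewer describes, that a reader given only (time, s1) drops rows where s1 is null.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, not the TsFile API: one time column and one nullable value column.
final class NullAlignmentSketch {

  /** Mimics a reader that skips a row when every selected value (here only s1) is null. */
  static List<Long> readSkippingAllNullRows(long[] times, Long[] s1) {
    List<Long> values = new ArrayList<>();
    for (int i = 0; i < times.length; i++) {
      if (s1[i] != null) {
        values.add(s1[i]);
      }
    }
    return values;
  }

  /** Mimics a reader that returns one entry per timestamp and keeps nulls in place. */
  static List<Long> readKeepingNulls(long[] times, Long[] s1) {
    List<Long> values = new ArrayList<>();
    for (int i = 0; i < times.length; i++) {
      values.add(s1[i]); // null stays at the position of its timestamp
    }
    return values;
  }
}
```

With times (1, 2, 3) and s1 values (null, 7, 8), the first reader yields two values for three timestamps, so a value chunk written from it no longer lines up with the original time chunk; the second reader yields three entries and preserves the positions.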
TsFileRewriteOverPrecisedI64Scan
Rewrite Int64 chunks that can be converted to Int32 losslessly.
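A minimal sketch of what "can be converted to Int32 losslessly" could mean in practice, assuming the check is simply that every value lies within the Int32 range. The class and method names are hypothetical and are not taken from the PR.

```java
// Hypothetical helper, not the PR's code: checks and performs a lossless Int64 -> Int32 narrowing.
final class OverPrecisedCheck {

  /** Returns true when every value fits in a signed 32-bit integer. */
  static boolean fitsInInt32(long[] values) {
    for (long v : values) {
      if (v < Integer.MIN_VALUE || v > Integer.MAX_VALUE) {
        return false;
      }
    }
    return true;
  }

  /** Narrows each value to int; Math.toIntExact throws ArithmeticException if a value does not fit. */
  static int[] narrow(long[] values) {
    int[] out = new int[values.length];
    for (int i = 0; i < values.length; i++) {
      out[i] = Math.toIntExact(values[i]);
    }
    return out;
  }
}
```

Using Math.toIntExact rather than a plain cast means a value outside the range fails loudly instead of being silently truncated.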
TsFileRewriteSmallRangeI64Scan
Rewrite Int64 chunks whose ranges are no larger than Int32.MAX_VALUE by replacing each value with its difference from the first value in the chunk.
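The small-range rewrite can be sketched as follows, assuming "range" means max minus min over the chunk. The names are hypothetical and the sketch works on plain arrays rather than TsFile chunks. If the range is at most Int32.MAX_VALUE, every difference from the first value has an absolute value no larger than the range, so it fits in an Int32.

```java
// Hypothetical sketch, not the PR's code: delta-from-first encoding for small-range Int64 data.
final class SmallRangeDelta {

  /** Returns true when max - min is no larger than Integer.MAX_VALUE. */
  static boolean isSmallRange(long[] values) {
    if (values.length == 0) {
      return false; // nothing to rewrite
    }
    long min = values[0];
    long max = values[0];
    for (long v : values) {
      min = Math.min(min, v);
      max = Math.max(max, v);
    }
    try {
      return Math.subtractExact(max, min) <= Integer.MAX_VALUE;
    } catch (ArithmeticException overflow) {
      return false; // the range does not even fit in a long
    }
  }

  /** Replaces each value with its difference from the first value. */
  static int[] encode(long[] values) {
    int[] deltas = new int[values.length];
    for (int i = 0; i < values.length; i++) {
      deltas[i] = Math.toIntExact(Math.subtractExact(values[i], values[0]));
    }
    return deltas;
  }

  /** Restores the original values from the first value and the deltas. */
  static long[] decode(long first, int[] deltas) {
    long[] values = new long[deltas.length];
    for (int i = 0; i < deltas.length; i++) {
      values[i] = first + deltas[i];
    }
    return values;
  }
}
```

Note that decoding needs the first value to be stored somewhere alongside the deltas; where the PR keeps it is not stated here.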
TsFileStatisticScan
It prints:
a. the number of series of each data type;
b. the number of Int64 chunks that are over-precised (can be converted to Int32 losslessly), just-precised (cannot be converted to Int32 losslessly), small range (range no larger than Int32.MAX_VALUE), or large range (range larger than Int32.MAX_VALUE);
c. the number of points of each data type;
d. the total chunk size of each data type;
e. the average number of distinct binary values in STRING and TEXT chunks.
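Item e can be read as the mean number of distinct values per chunk. That reading is an assumption, and the sketch below models each chunk as a plain list of strings rather than using the TsFile types; the class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.List;

// Hypothetical sketch: average number of distinct values per chunk.
final class DistinctValueStats {

  /** Returns the mean distinct-value count across chunks, or 0.0 when there are no chunks. */
  static double averageDistinctPerChunk(List<List<String>> chunks) {
    if (chunks.isEmpty()) {
      return 0.0;
    }
    long totalDistinct = 0;
    for (List<String> chunk : chunks) {
      totalDistinct += new HashSet<>(chunk).size(); // a set keeps one copy of each value
    }
    return (double) totalDistinct / chunks.size();
  }
}
```

A low average relative to the chunk's point count would suggest that the values repeat heavily within each chunk.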