[feat][storage] Added the auto creation of Jaeger ILM/ISM policy for ES/OS #6604

Open
wants to merge 13 commits into
base: main

Manik2708
Contributor

Which problem is this PR solving?

Part of: #4708

Description of the changes

  • Adds automatic creation of a default Jaeger ILM/ISM policy. This is also the first step toward introducing data streams in Jaeger.

How was this change tested?

  • E2E and Unit Tests

Checklist

Signed-off-by: Manik2708 <mehtamanik96@gmail.com>
Signed-off-by: Manik2708 <mehtamanik96@gmail.com>
Signed-off-by: Manik2708 <mehtamanik96@gmail.com>
Signed-off-by: Manik2708 <mehtamanik96@gmail.com>
Signed-off-by: Manik2708 <mehtamanik96@gmail.com>
@Manik2708
Contributor Author

Some notes for the reviewer:

  1. I tried my best to keep this PR small, but the generated mocks make it large.
  2. I have not written any unit tests yet; the changes have only been tested end to end. Once the approach looks good, I will complete the code coverage.

codecov bot commented Jan 25, 2025

Codecov Report

Attention: Patch coverage is 8.77193% with 208 lines in your changes missing coverage. Please review.

Project coverage is 95.08%. Comparing base (6d4d7c4) to head (228c68e).

Files with missing lines                        Patch %   Lines
plugin/storage/es/ilm/policymanager.go          0.00%     84 Missing ⚠️
pkg/es/wrapper/wrapper.go                       0.00%     66 Missing ⚠️
plugin/storage/es/ilm/createpolicy.go           18.75%    23 Missing and 3 partials ⚠️
plugin/storage/es/factory.go                    31.57%    10 Missing and 3 partials ⚠️
plugin/storage/es/spanstore/writer.go           23.52%    13 Missing ⚠️
plugin/storage/es/dependencystore/storage.go    0.00%     3 Missing ⚠️
plugin/storage/es/samplingstore/storage.go      0.00%     3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6604      +/-   ##
==========================================
- Coverage   96.05%   95.08%   -0.98%     
==========================================
  Files         364      366       +2     
  Lines       20750    20972     +222     
==========================================
+ Hits        19932    19941       +9     
- Misses        622      828     +206     
- Partials      196      203       +7     
Flag Coverage Δ
badger_v1 9.36% <0.00%> (-0.52%) ⬇️
badger_v2 1.73% <0.00%> (-0.10%) ⬇️
cassandra-4.x-v1-manual 14.24% <0.00%> (-0.79%) ⬇️
cassandra-4.x-v2-auto 1.72% <0.00%> (-0.10%) ⬇️
cassandra-4.x-v2-manual 1.72% <0.00%> (-0.10%) ⬇️
cassandra-5.x-v1-manual 14.24% <0.00%> (-0.79%) ⬇️
cassandra-5.x-v2-auto 1.72% <0.00%> (-0.10%) ⬇️
cassandra-5.x-v2-manual 1.72% <0.00%> (-0.10%) ⬇️
elasticsearch-6.x-v1 18.26% <3.98%> (-0.98%) ⬇️
elasticsearch-7.x-v1 18.33% <3.98%> (-0.98%) ⬇️
elasticsearch-8.x-v1 18.50% <3.98%> (-0.99%) ⬇️
elasticsearch-8.x-v2 1.73% <0.00%> (-0.10%) ⬇️
grpc_v1 10.60% <0.00%> (-0.59%) ⬇️
grpc_v2 7.61% <0.00%> (-0.43%) ⬇️
kafka-3.x-v1 9.64% <0.00%> (-0.54%) ⬇️
kafka-3.x-v2 1.73% <0.00%> (-0.10%) ⬇️
memory_v2 1.73% <0.00%> (-0.10%) ⬇️
opensearch-1.x-v1 18.39% <4.42%> (-0.98%) ⬇️
opensearch-2.x-v1 18.39% <4.42%> (-0.98%) ⬇️
opensearch-2.x-v2 1.73% <0.00%> (-0.10%) ⬇️
tailsampling-processor 0.46% <0.00%> (-0.03%) ⬇️
unittests 93.88% <8.77%> (-0.97%) ⬇️


Signed-off-by: Manik2708 <mehtamanik96@gmail.com>
Signed-off-by: Manik2708 <mehtamanik96@gmail.com>
@@ -219,18 +222,6 @@ func TestCreateTemplateError(t *testing.T) {
require.Error(t, err, "template-error")
}

func TestILMDisableTemplateCreation(t *testing.T) {
Contributor Author

This test is no longer needed, since template creation is now independent of use_ilm.

var wantbytes []byte
fileSuffix := fmt.Sprintf("-%d", tt.esVersion)
wantbytes, err = FIXTURES.ReadFile("fixtures/" + templateName + fileSuffix + ".json")
require.NoError(t, err)
want := string(wantbytes)
assert.Equal(t, want, got)
require.NoError(t, json.Unmarshal(wantbytes, &wantObj))
Contributor Author

Not related to this PR, but changing the mapping was very difficult due to this test. In this test strings are being compared rather than json objects.
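
For illustration, here is a minimal sketch of comparing a fixture and a rendered template as decoded JSON values instead of raw strings, so that whitespace and key order no longer matter. The helper name jsonEqual and the sample documents are hypothetical and not part of this PR; testify's assert.JSONEq performs the same kind of comparison inside a test.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// jsonEqual reports whether two JSON documents are semantically equal,
// ignoring whitespace and object key order. (Hypothetical helper.)
func jsonEqual(a, b []byte) (bool, error) {
	var av, bv any
	if err := json.Unmarshal(a, &av); err != nil {
		return false, fmt.Errorf("decode first document: %w", err)
	}
	if err := json.Unmarshal(b, &bv); err != nil {
		return false, fmt.Errorf("decode second document: %w", err)
	}
	return reflect.DeepEqual(av, bv), nil
}

func main() {
	// Same content, different formatting and key order.
	want := []byte(`{"settings": {"index.number_of_shards": 5}, "mappings": {}}`)
	got := []byte("{\n  \"mappings\": {},\n  \"settings\": {\"index.number_of_shards\": 5}\n}")

	equal, err := jsonEqual(want, got)
	if err != nil {
		panic(err)
	}
	fmt.Println(equal)
}
```

With this kind of comparison, a fixture can be reformatted or have its keys reordered without failing the test, while a changed value still does.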

@@ -155,12 +155,14 @@ func (s *storageExt) Start(_ context.Context, host component.Host) error {
s.telset.Logger,
)
case cfg.Elasticsearch != nil:
//nolint: contextcheck
Contributor Author

I really don't know how to pass the context here. Or do I need to suppress the linter like this?

"actions": {
"rollover": {
"max_age": "1d"
}
Contributor Author

I have not added a delete phase yet; I am currently unsure whether to use delete in the policy or max_retain_age in the data stream.

@Manik2708 Manik2708 marked this pull request as draft January 26, 2025 03:15
Signed-off-by: Manik Mehta <mehtamanik96@gmail.com>
Signed-off-by: Manik2708 <mehtamanik96@gmail.com>
Signed-off-by: Manik2708 <mehtamanik96@gmail.com>
@Manik2708 Manik2708 marked this pull request as ready for review January 26, 2025 03:52
Signed-off-by: Manik2708 <mehtamanik96@gmail.com>
Signed-off-by: Manik2708 <mehtamanik96@gmail.com>
@Manik2708
Contributor Author

I have been studying the change made in PR #6567. The archive distinction is now based only on the index prefix (please correct me if I am wrong), so we don't need to manage the archive differently here!

@yurishkuro
Member

correct, no special logic for archiving

@@ -0,0 +1,76 @@
service:
Member

why do we need new configs? Let's change the main ones

}
],
"ism_template": {
"index_patterns": ["*jaeger-*"],
Member

this feels too broad, it could catch unexpected indices this way. I think it should be parameterized with the index prefix of the actual store
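
A rough sketch of what parameterizing the pattern could look like, reusing the {{ .IndexPrefix }} placeholder that the existing templates already use. The fragment, the policyParams type, and renderISMTemplate are hypothetical names for illustration; the real policy file and its rendering code may differ.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// ismTemplateFragment is a hypothetical fragment of the ISM policy in which
// the index pattern is scoped to the store's own prefix instead of "*jaeger-*".
const ismTemplateFragment = `"ism_template": {"index_patterns": ["{{ .IndexPrefix }}jaeger-*"]}`

// policyParams carries the values substituted into the policy template.
type policyParams struct {
	IndexPrefix string
}

// renderISMTemplate fills in the index prefix (including its separator).
func renderISMTemplate(indexPrefix string) (string, error) {
	tmpl, err := template.New("ism").Parse(ismTemplateFragment)
	if err != nil {
		return "", fmt.Errorf("parse template: %w", err)
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, policyParams{IndexPrefix: indexPrefix}); err != nil {
		return "", fmt.Errorf("render template: %w", err)
	}
	return buf.String(), nil
}

func main() {
	out, err := renderISMTemplate("prod-")
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

With the prefix "prod-", the pattern becomes "prod-jaeger-*", so indices that belong to other stores or tenants are not picked up by the policy.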

// NewPolicyManager creates the policy manager with appropriate version and prefixedIndexNameWithSeparator.
// prefixedIndexNameWithSeparator is the prefix with separator. For example if index prefix is jaeger-main
// and policy manager is called for span indices, then prefixedIndexNameWithSeparator will be jaeger-main-jaeger-span-
func NewPolicyManager(cl func() es.Client, prefixedIndexNameWithSeparator string) *PolicyManager {
Member

@yurishkuro yurishkuro Jan 26, 2025

why does everyone need to call New & Init? Can't we have just a single public method MaybeCreatePolicy?
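
One possible shape for such a single entry point, as a sketch only: the policyClient interface, the fakeClient, and the policy name are hypothetical and stand in for whatever the real client exposes.

```go
package main

import (
	"context"
	"fmt"
)

// policyClient is a hypothetical, minimal view of the storage client.
type policyClient interface {
	PolicyExists(ctx context.Context, name string) (bool, error)
	CreatePolicy(ctx context.Context, name string) error
}

// MaybeCreatePolicy creates the named policy only when it is missing and
// reports whether it created anything. Callers need no separate New/Init step.
func MaybeCreatePolicy(ctx context.Context, c policyClient, name string) (bool, error) {
	exists, err := c.PolicyExists(ctx, name)
	if err != nil {
		return false, fmt.Errorf("check policy %q: %w", name, err)
	}
	if exists {
		return false, nil
	}
	if err := c.CreatePolicy(ctx, name); err != nil {
		return false, fmt.Errorf("create policy %q: %w", name, err)
	}
	return true, nil
}

// fakeClient is an in-memory stand-in used only for this example.
type fakeClient struct{ policies map[string]bool }

func (f *fakeClient) PolicyExists(_ context.Context, name string) (bool, error) {
	return f.policies[name], nil
}

func (f *fakeClient) CreatePolicy(_ context.Context, name string) error {
	f.policies[name] = true
	return nil
}

func main() {
	c := &fakeClient{policies: map[string]bool{}}
	// The second call finds the policy and does nothing.
	for i := 0; i < 2; i++ {
		created, err := MaybeCreatePolicy(context.Background(), c, "jaeger-ilm-policy")
		if err != nil {
			panic(err)
		}
		fmt.Println(created)
	}
}
```

Because the function is idempotent, every store can call it on startup without coordinating with the others.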

,"lifecycle": {
"name": "{{ .ILMPolicyName }}",
"rollover_alias": "{{ .IndexPrefix }}jaeger-dependencies-write"
{{- if .IsOpenSearch }}
Member

I am not following why any of these templates need to change, please explain

Contributor Author

This distinction is required for the ISM policy. When an ISM policy is applied, we do not need to specify the policy name in the template, only the rollover alias!

serviceWriter: serviceOperationStorage.Write,
spanConverter: dbmodel.NewFromDomain(p.AllTagsAsFields, p.TagKeysAsFields, p.TagDotReplacement),
spanServiceIndex: getSpanAndServiceIndexFn(p, writeAliasSuffix),
spanAndServicePrefixFn: p.getSpanAndServicePrefixFn,
Member

why does this need to be a lambda?

Contributor Author

The span writer can't access the params, and we need the index prefix, which is only accessible through the params.

Member

the params are in scope right here. Your argument doesn't stand.

Contributor Author

OK, are you suggesting doing it the same way as getSpanAndServiceIndexFn?

Member

I am saying there is no point in this indirection. The getSpanAndServiceIndexFn has a very good reason to exist because the exact index names were very different depending on the settings. Your indirection always returns the same thing.
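
To make the point concrete, here is a sketch in which the prefixes are computed once, where the params are in scope, and stored on the writer as plain strings rather than behind a function field. The indexPrefix type below is a hypothetical stand-in for the store's real prefix type, and the field names are illustrative.

```go
package main

import "fmt"

// indexPrefix is a hypothetical stand-in for the store's index prefix type.
type indexPrefix string

// apply prepends the prefix and a "-" separator when the prefix is non-empty.
func (p indexPrefix) apply(name string) string {
	if p == "" {
		return name
	}
	return fmt.Sprintf("%s-%s", p, name)
}

// spanWriter keeps the already-computed prefixes as values, since they never
// change after construction.
type spanWriter struct {
	spanIndexPrefix    string
	serviceIndexPrefix string
}

// newSpanWriter resolves the prefixes at construction time.
func newSpanWriter(prefix indexPrefix) *spanWriter {
	return &spanWriter{
		spanIndexPrefix:    prefix.apply("jaeger-span-"),
		serviceIndexPrefix: prefix.apply("jaeger-service-"),
	}
}

func main() {
	w := newSpanWriter("jaeger-main")
	fmt.Println(w.spanIndexPrefix)
	fmt.Println(w.serviceIndexPrefix)
}
```

For the prefix jaeger-main this yields jaeger-main-jaeger-span-, matching the example in the NewPolicyManager doc comment, without any extra indirection.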

@@ -104,6 +107,25 @@ func (s *SpanWriter) CreateTemplates(spanTemplate, serviceTemplate string, index
return nil
}

func (s *SpanWriter) InitializePolicyManager() error {
spanPrefix := s.spanAndServicePrefixFn(spanIndexBaseName)
Member

why can't you directly call p.IndexPrefix.Apply here?

Contributor Author

The writer can't access the params.

@@ -142,6 +142,9 @@ type Configuration struct {
// latest adaptive sampling probabilities.
AdaptiveSamplingLookback time.Duration `mapstructure:"adaptive_sampling_lookback"`
Tags TagsAsFields `mapstructure:"tags_as_fields"`
// IsOpenSearch stores whether the backend is of opensearch type or not.
// If kept empty, jaeger will automatically identify the distinction.
IsOpenSearch bool `mapstructure:"is_open_search"`
Member

we never needed this, why now? We can automatically determine the backend based on the info it returns from the initial handshake.
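
As a sketch of that kind of detection: the example below assumes the backend's root endpoint response carries a version.distribution field set to "opensearch" on OpenSearch and omits it on Elasticsearch. The isOpenSearch helper and the abbreviated payloads are hypothetical; how Jaeger actually inspects the handshake may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// rootInfo holds the subset of the root endpoint response used here.
type rootInfo struct {
	Version struct {
		Number       string `json:"number"`
		Distribution string `json:"distribution"`
	} `json:"version"`
}

// isOpenSearch inspects the root response instead of relying on a config flag.
// Assumption: only OpenSearch sets version.distribution to "opensearch".
func isOpenSearch(rootResponse []byte) (bool, error) {
	var info rootInfo
	if err := json.Unmarshal(rootResponse, &info); err != nil {
		return false, fmt.Errorf("decode root response: %w", err)
	}
	return strings.EqualFold(info.Version.Distribution, "opensearch"), nil
}

func main() {
	// Abbreviated, illustrative payloads.
	responses := []string{
		`{"version": {"distribution": "opensearch", "number": "2.11.0"}}`,
		`{"version": {"number": "8.12.0"}}`,
	}
	for _, body := range responses {
		osBackend, err := isOpenSearch([]byte(body))
		if err != nil {
			panic(err)
		}
		fmt.Println(osBackend)
	}
}
```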

}
}
} else {
policyExists, err := client.IsmPolicyExists(ctx, DefaultIsmPolicy)
Member

there is no need for this bifurcation for ILM/ISM at this level.

Contributor Author

Then what would be the correct level? I understood your point about creating a single client method that automatically determines whether the backend is OS or ES and creates the policy, but we still need to distinguish between ilm-policy.json and ism-policy.json.

Member

try to minimize bifurcation.


// writeAliasName returns write alias name of the index
func (p *PolicyManager) writeAliasName() string {
return p.prefixedIndexNameWithSeparator + writeAliasSuffix
Member

we can reuse IndexPrefix type that provides concatenation capabilities

return nil
}

func (p *PolicyManager) getJaegerIndices(ctx context.Context) ([]client.Index, error) {
Member

what is this function doing and why? I thought the whole point of ILM policy was that we don't have to worry about individual indices

return nil
}

func (p *PolicyManager) createIndexIfNotExists(ctx context.Context, index string) error {
Member

we never needed this before, what changed?

}

// CreateIlmPolicy calls the internal XPackIlmPutLifecycle service
func (c ClientWrapper) CreateIlmPolicy() es.XPackIlmPutLifecycle {
Member

The point of this client is to abstract the differences between ES7/ES8/OS. I think there should be just a single function CreatePolicy and let it figure out internally how to implement it.
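
A sketch of how one method could hide the ILM/ISM split. It only builds the request rather than sending it, and every type and name here is hypothetical. The paths assume Elasticsearch's _ilm/policy/<name> endpoint and OpenSearch's _plugins/_ism/policies/<name> endpoint.

```go
package main

import "fmt"

// backend identifies which search engine the client talks to.
type backend int

const (
	elasticsearch backend = iota
	openSearch
)

// policyRequest describes the HTTP call the wrapper would issue.
type policyRequest struct {
	Method string
	Path   string
	Body   string
}

// policyBodies carries both policy documents; the wrapper picks one.
type policyBodies struct {
	ILM string
	ISM string
}

// clientWrapper is a hypothetical stand-in for the real client wrapper.
type clientWrapper struct{ backend backend }

// createPolicyRequest is the single entry point: callers never branch on
// ILM vs ISM themselves.
func (c clientWrapper) createPolicyRequest(name string, bodies policyBodies) policyRequest {
	if c.backend == openSearch {
		return policyRequest{
			Method: "PUT",
			Path:   fmt.Sprintf("_plugins/_ism/policies/%s", name),
			Body:   bodies.ISM,
		}
	}
	return policyRequest{
		Method: "PUT",
		Path:   fmt.Sprintf("_ilm/policy/%s", name),
		Body:   bodies.ILM,
	}
}

func main() {
	bodies := policyBodies{ILM: `{"policy": {}}`, ISM: `{"policy": {}}`}
	for _, b := range []backend{elasticsearch, openSearch} {
		req := clientWrapper{backend: b}.createPolicyRequest("jaeger-policy", bodies)
		fmt.Println(req.Method, req.Path)
	}
}
```

The choice between ilm-policy.json and ism-policy.json still has to happen somewhere, but in this layout it happens once inside the wrapper instead of in every caller.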

@Manik2708
Contributor Author

@yurishkuro A point just came to mind: I think this is not the correct approach (your questions about the need for getJaegerIndices and createIndexIfNotExists prompted this). The purpose of those functions is to create the aliases if they do not exist, because these read/write aliases would be used as rollover aliases. Along with this, we need to apply the aliases and the policy to older indices, but I no longer think that can be fully automated. Instead, I can think of the following:

  1. Against empty storage, we can create an initial rollover index and then the aliases that will be used as rollover aliases (as done in this PR, see createIndexIfNotExists).
  2. For older data, we can give the user steps for manually applying the policy and rollover aliases to the indices.

Does this make sense to you?

@yurishkuro
Member

I think this is what you should've started from - discuss the desired behavior and backwards compatibility, before writing code. Describe user workflows

@Manik2708
Contributor Author

@yurishkuro Sorry for the churn. I took inspiration from the init of es-rollover, but I will dig deeper to find the best possible solution.

2 participants