Job doesn't work with Opensearch #110

Open
dnsppv opened this issue Nov 3, 2021 · 5 comments

dnsppv commented Nov 3, 2021

Describe the bug
The dependencies view on the "System Architecture" tab in the Jaeger UI does not work with OpenSearch 1.1.0 as the backend. After switching to Elasticsearch 7.10.1, it works without any problems.

To Reproduce
Steps to reproduce the behavior:

  1. Run opensearch container in Docker: docker run --detach --name opensearch -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" -e "plugins.security.disabled=true" opensearchproject/opensearch:1.1.0
  2. Run Jaeger: docker run --detach --name jaeger --link opensearch --env SPAN_STORAGE_TYPE=opensearch --env ES_SERVER_URLS=http://opensearch:9200 -p 6831:6831/udp -p 6832:6832/udp -p 5778:5778 -p 16686:16686 -p 14268:14268 -p 14250:14250 jaegertracing/all-in-one:1.27 --es.num-replicas=1 --es.num-shards=1
  3. Run example app: docker run --detach --name hotrod --link jaeger -p8080-8083:8080-8083 -e JAEGER_AGENT_HOST="jaeger" jaegertracing/example-hotrod:1.27 all
  4. Click the buttons in the HotROD UI to generate trace data in the database
  5. Run the dependencies job (it exits with an error): docker run --rm --link opensearch --env STORAGE=elasticsearch --env ES_NODES=http://opensearch:9200 jaegertracing/spark-dependencies

Expected behavior
The job stores its result in the jaeger-dependencies-YYYY-MM-DD index, and the Jaeger UI displays it.
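
A minimal sketch of how the expected result could be checked after the job runs. It assumes OpenSearch is reachable on localhost:9200 with security disabled, as in step 1; the helper names `daily_index` and `index_exists` are made up for this illustration. The index naming follows the job's own log line (jaeger-span-2021-11-03, jaeger-dependencies-2021-11-03).

```python
from datetime import date, datetime, timezone
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def daily_index(prefix: str, day: date) -> str:
    """Build a per-day index name such as jaeger-dependencies-2021-11-03."""
    return f"{prefix}-{day.isoformat()}"


def index_exists(base_url: str, index: str, timeout: float = 5.0) -> bool:
    """Return True if a HEAD request for the index answers 200, False on 404."""
    request = Request(f"{base_url.rstrip('/')}/{index}", method="HEAD")
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except HTTPError as err:
        if err.code == 404:
            return False
        raise  # any other status is unexpected, so surface it


if __name__ == "__main__":
    # Assumption: the job was run for today's date in UTC, as in the log above.
    today = datetime.now(timezone.utc).date()
    name = daily_index("jaeger-dependencies", today)
    state = "exists" if index_exists("http://localhost:9200", name) else "is missing"
    print(f"{name} {state}")
```

With the failure described in this issue, the script would be expected to report the index as missing, since the job exits before writing anything.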

Screenshots

$ docker run --rm --link opensearch --env STORAGE=elasticsearch --env ES_NODES=http://opensearch:9200 jaegertracing/spark-dependencies
21/11/03 12:27:36 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/11/03 12:27:36 INFO ElasticsearchDependenciesJob: Running Dependencies job for 2021-11-03T00:00Z, reading from jaeger-span-2021-11-03 index, result storing to jaeger-dependencies-2021-11-03
Exception in thread "main" org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: invalid map received dynamic_templates=[{span_tags_map={path_match=tag.*, mapping={ignore_above=256, type=keyword}}}, {process_tags_map={path_match=process.tag.*, mapping={ignore_above=256, type=keyword}}}]
        at org.elasticsearch.hadoop.serialization.dto.mapping.FieldParser.parseField(FieldParser.java:146)
        at org.elasticsearch.hadoop.serialization.dto.mapping.FieldParser.parseMapping(FieldParser.java:88)
        at org.elasticsearch.hadoop.serialization.dto.mapping.FieldParser.parseIndexMappings(FieldParser.java:69)
        at org.elasticsearch.hadoop.serialization.dto.mapping.FieldParser.parseMappings(FieldParser.java:40)
        at org.elasticsearch.hadoop.rest.RestClient.getMappings(RestClient.java:321)
        at org.elasticsearch.hadoop.rest.RestClient.getMappings(RestClient.java:307)
        at org.elasticsearch.hadoop.rest.RestRepository.getMappings(RestRepository.java:293)
        at org.elasticsearch.hadoop.rest.RestService.findPartitions(RestService.java:252)
        at org.elasticsearch.spark.rdd.AbstractEsRDD.esPartitions$lzycompute(AbstractEsRDD.scala:79)
        at org.elasticsearch.spark.rdd.AbstractEsRDD.esPartitions(AbstractEsRDD.scala:78)
        at org.elasticsearch.spark.rdd.AbstractEsRDD.getPartitions(AbstractEsRDD.scala:48)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
        at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
        at org.apache.spark.Partitioner$$anonfun$4.apply(Partitioner.scala:75)
        at org.apache.spark.Partitioner$$anonfun$4.apply(Partitioner.scala:75)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
        at scala.collection.immutable.List.map(List.scala:285)
        at org.apache.spark.Partitioner$.defaultPartitioner(Partitioner.scala:75)
        at org.apache.spark.rdd.RDD$$anonfun$groupBy$1.apply(RDD.scala:691)
        at org.apache.spark.rdd.RDD$$anonfun$groupBy$1.apply(RDD.scala:691)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
        at org.apache.spark.rdd.RDD.groupBy(RDD.scala:690)
        at org.apache.spark.api.java.JavaRDDLike$class.groupBy(JavaRDDLike.scala:243)
        at org.apache.spark.api.java.AbstractJavaRDDLike.groupBy(JavaRDDLike.scala:45)
        at io.jaegertracing.spark.dependencies.elastic.ElasticsearchDependenciesJob.run(ElasticsearchDependenciesJob.java:236)
        at io.jaegertracing.spark.dependencies.elastic.ElasticsearchDependenciesJob.run(ElasticsearchDependenciesJob.java:212)
        at io.jaegertracing.spark.dependencies.DependenciesSparkJob.run(DependenciesSparkJob.java:54)
        at io.jaegertracing.spark.dependencies.DependenciesSparkJob.main(DependenciesSparkJob.java:40)
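
One possible reading of the trace, offered as an assumption and not confirmed anywhere in this thread: the mapping parser seems to expect every entry it visits to be a map, while the `dynamic_templates` value in the error message is a list. The sketch below only illustrates that shape mismatch. `non_map_entries` is a hypothetical stand-in and is not es-hadoop code.

```python
# dynamic_templates value copied from the error message above,
# rewritten as Python literals.
MAPPINGS_FROM_ERROR = {
    "dynamic_templates": [
        {"span_tags_map": {
            "path_match": "tag.*",
            "mapping": {"ignore_above": 256, "type": "keyword"},
        }},
        {"process_tags_map": {
            "path_match": "process.tag.*",
            "mapping": {"ignore_above": 256, "type": "keyword"},
        }},
    ],
}


def non_map_entries(mappings: dict) -> list:
    """Return the keys whose values are not maps.

    Hypothetical stand-in for a parser that requires every entry it walks
    to be a map; it does not reproduce the real FieldParser logic.
    """
    return [key for key, value in mappings.items() if not isinstance(value, dict)]


print(non_map_entries(MAPPINGS_FROM_ERROR))
```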

Version (please complete the following information):

  • OS: Ubuntu 20.04
  • Jaeger version: 1.27
  • Deployment: Docker
  • Backend: Opensearch 1.1.0

dnsppv added the bug label Nov 3, 2021

@Jakob3xD

Is there any update on this issue? Jaeger-spark is the last system blocking me from upgrading to OpenSearch.

@pavolloffay
Member

No updates, I am not planning to work on this at least for now.

If somebody has free cycles to contribute this feature I am happy to review and approve.

@YumeNoTenshi

Hi fellas,
I've faced the same issue and got it working by updating the elasticsearch-spark-20_2.12 dependency to version 7.16.3. However, this version of the connector performs a compatibility check on the ES version, which can be worked around with a setting in OpenSearch: https://opensearch.org/docs/latest/clients/agents-and-ingestion-tools/index/
In the end, it works with a self-built container and the workaround setting in OpenSearch.
Since this looks like a workaround, I can't provide an MR. The target solution, in my opinion, would be to wait until Spark provides an OpenSearch client, as OpenSearch would (theoretically) not stay compatible with Elasticsearch.
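
The comment does not name the setting. Assuming it is the `compatibility.override_main_response_version` cluster setting described on the linked OpenSearch page, a rough sketch of applying it could look like this. The helper names and the localhost:9200 URL are assumptions for illustration, and the setting may not be available in every OpenSearch version.

```python
import json
from urllib.request import Request, urlopen


def override_version_settings(enabled: bool = True) -> dict:
    """Body for PUT _cluster/settings that toggles the version override flag."""
    return {
        "persistent": {
            "compatibility": {"override_main_response_version": enabled}
        }
    }


def apply_settings(base_url: str, body: dict, timeout: float = 5.0) -> dict:
    """Send the settings body to the cluster and return the parsed JSON reply."""
    request = Request(
        f"{base_url.rstrip('/')}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Assumption: OpenSearch listens on localhost:9200 with security disabled.
    print(apply_settings("http://localhost:9200", override_version_settings()))
```

Passing `enabled=False` builds the body that would switch the override off again.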

tronda commented Jun 7, 2023

Hey. Just like @shnurok672, I've made a custom version in which I updated the ElasticSearch-Spark dependency to the latest version and adjusted the OpenSearch settings, and with that the dependency job works. We are getting a lot of warnings in the logs while it runs:

23/06/06 23:00:08 WARN RestClient: Could not verify server is Elasticsearch! Invalid main action response body format [tag].
23/06/06 23:00:08 WARN RestClient: Could not verify server is Elasticsearch! Invalid main action response body format [build_flavor].
23/06/06 23:00:08 WARN RestClient: Could not verify server is Elasticsearch! ES-Hadoop will require server validation when connecting to an Elasticsearch cluster if that Elasticsearch cluster is v7.14 and up.

I see that the OpenSearch project has released its own fork of the ElasticSearch-Spark library. I guess this could be used to create an OpenSearch version of the Spark job. I've tried to get the tests in this project to work, but without success. Contributing to this project is a bit difficult since the tests are complex to get into.
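
A job that supported both backends would need some way to tell them apart. As a hedged sketch only: the root endpoint (GET /) of OpenSearch 1.x reports `version.distribution` as "opensearch", while Elasticsearch 7.x reports a `version.build_flavor` field, which is the field named in the warnings above. `backend_flavor` is a hypothetical helper, and the sample responses are trimmed illustrations, not captured output.

```python
def backend_flavor(root: dict) -> str:
    """Guess the backend from a parsed GET / response body."""
    version = root.get("version", {})
    if version.get("distribution") == "opensearch":
        return "opensearch"
    if "build_flavor" in version:
        return "elasticsearch"
    return "unknown"


# Trimmed, illustrative root responses (assumptions, not real captures).
OPENSEARCH_ROOT = {"version": {"distribution": "opensearch", "number": "1.1.0"}}
ELASTICSEARCH_ROOT = {"version": {"number": "7.10.1", "build_flavor": "default"}}

print(backend_flavor(OPENSEARCH_ROOT))
print(backend_flavor(ELASTICSEARCH_ROOT))
```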

tronda commented Nov 10, 2023

We have a fork in which I have updated the Elasticsearch client library dependency. We run a custom build based on these changes.
https://github.com/DIPSAS/spark-dependencies
