NChan no longer works with clustered Redis in AWS Elasticache #662
What version of Nchan are you running? |
Currently on 1.2.10, and our resultant config looks something along the lines of:
|
1.2.10 is over 2 years and 10 releases behind. Please try the latest version (1.3.6). Elasticache should work just fine. |
Sorry, I think I might've made things a bit confusing. 1.2.10 is working fine. We are facing these issues when we try to upgrade to 1.3.6. We are seeing this issue only when using Elasticache with clustered Redis. Our other service that uses non-clustered Redis is working fine with 1.3.6. I think I misunderstood what you meant by "what version are you running", sorry! |
Hi @slact, have you had a chance to look into this further? |
What is your ElastiCache configuration? Is TLS enabled? AUTH? |
No TLS or AUTH in this case. Just a cluster with a couple of shards and replicas running on Redis 7. |
Strange. I have no problem whatsoever using ElastiCache, any version, on any modern Nchan version. Please try the following (separately):
Let me know which of these work, if any |
How is nchan discovering nodes from ElastiCache? I would like to simulate it locally with DNS-based HAProxy setup. |
@slact Sorry for the delay in getting back to you. I tried these separately as you suggested:
You mentioned you had no issues using ElastiCache. Were you using the configuration endpoint or that of the individual nodes? The issue is specific to the configuration endpoint, because it uses some form of round-robin DNS. The configuration endpoint worked with NChan until v1.2.15. |
Yeah, I had no problem using the shared config endpoint with round-robin DNS. Please try the following: set the logging level to 'debug', and grep through the log for anything with |
closing this as stale |
Reopened due to popular demand |
Thanks for reopening the issue; I've picked this up from @mkdewidar. Since his last comment we have been using the suggested workaround of directly targeting a node instead of the configuration endpoint. However, we recently discovered that if the targeted node happens to fail over, a connection to Redis is never re-established, even after the cluster recovers (targeting the configuration endpoint, as recommended by AWS, does not have this problem). This makes the workaround less than ideal. Separately, we've found that when using the configuration endpoint, the “IP address connects to more than one server. Is Redis behind a proxy?” connection failures eventually stop after 10+ minutes and a successful connection is established. The debug log, filtered to entries containing "redis", is as follows:
|
For context, our Redis configuration for the above log is 1 shard, 2 nodes, cluster mode enabled, TLS enabled, engine version 7.0.7. We are now on NChan 1.3.7. |
Hi,
It seems that starting from v1.2.15 (technically, v1.2.13 but that was withdrawn), NChan can no longer be used with AWS Elasticache Redis clusters. It fails to establish connections with the cluster citing (with debug logs enabled):
Elasticache Redis clusters are not behind a proxy, though there is some sort of DNS load balancing that happens. Clients use DNS to resolve a fixed hostname (called the "configuration endpoint") to any one of the cluster's nodes, and then discover the IP addresses of the other nodes in the cluster using standard Redis cluster commands.
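As a rough illustration of the discovery step described above, the following Python sketch parses the text reply of the standard CLUSTER NODES command into a list of node addresses. It is not Nchan code; the function name parse_cluster_nodes and the sample reply (node IDs, IP addresses, hostname) are made up for illustration, and the field layout assumed is the one documented for Redis 7.

```python
from typing import Dict, List


def parse_cluster_nodes(reply: str) -> List[Dict[str, object]]:
    """Parse the text reply of the Redis CLUSTER NODES command."""
    nodes: List[Dict[str, object]] = []
    for line in reply.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or malformed lines
        node_id, address, flags, master_id = fields[:4]
        # The address looks like "ip:port@cport" and may carry a trailing
        # ",hostname" on Redis 7; keep only the "ip:port" part.
        endpoint = address.split(",", 1)[0].split("@", 1)[0]
        host, _, port = endpoint.rpartition(":")
        nodes.append({
            "id": node_id,
            "host": host,
            "port": int(port),
            "flags": flags.split(","),
            "master_id": None if master_id == "-" else master_id,
            "slots": fields[8:],  # slot ranges, present only on primaries
        })
    return nodes


# Illustrative reply only: IDs, IPs and the hostname are invented.
SAMPLE_REPLY = """\
aaaa 10.0.0.1:6379@16379 myself,master - 0 1426238316232 1 connected 0-16383
bbbb 10.0.0.2:6379@16379,node-2.example slave aaaa 0 1426238317239 1 connected
"""

if __name__ == "__main__":
    for node in parse_cluster_nodes(SAMPLE_REPLY):
        print(node["host"], node["port"], node["flags"])
```

A client that reached any one node through the configuration endpoint could issue CLUSTER NODES there and connect to the remaining nodes by the IP addresses in the reply, without further DNS lookups.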
From what I can tell, the root of the issue here seems to be that as part of the Redis TLS support changes, the pubsub connection now connects to Redis by cp->hostname, rather than cp->peername, in node_connector_callback for the REDIS_NODE_CMD_CONNECTING state. I applied the following patch and tested and it seemed to fix it, however I don't know enough about TLS or NChan to know if this is a reliable solution or not.
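Separately from the patch, here is a toy Python simulation of why the hostname-versus-peername distinction could matter with round-robin DNS. It is not Nchan code and not the patch: the RoundRobinDNS class, both connect_pair_* functions, the hostname cfg.example and the IP addresses are all hypothetical, and real DNS rotation is not guaranteed to alternate this strictly.

```python
from itertools import cycle
from typing import Dict, Iterator, List, Tuple


class RoundRobinDNS:
    """Toy resolver: each lookup of a name returns the next address in turn."""

    def __init__(self, records: Dict[str, List[str]]) -> None:
        self._cycles: Dict[str, Iterator[str]] = {
            name: cycle(addrs) for name, addrs in records.items()
        }

    def resolve(self, name: str) -> str:
        return next(self._cycles[name])


def connect_pair_by_hostname(dns: RoundRobinDNS, hostname: str) -> Tuple[str, str]:
    """Open command and pubsub connections, resolving the hostname each time."""
    cmd_ip = dns.resolve(hostname)
    pubsub_ip = dns.resolve(hostname)  # a second lookup may reach another node
    return cmd_ip, pubsub_ip


def connect_pair_by_peername(dns: RoundRobinDNS, hostname: str) -> Tuple[str, str]:
    """Open the pubsub connection to the address the first connection reached."""
    cmd_ip = dns.resolve(hostname)
    pubsub_ip = cmd_ip  # reuse the already-resolved peer address
    return cmd_ip, pubsub_ip


if __name__ == "__main__":
    records = {"cfg.example": ["10.0.0.1", "10.0.0.2"]}
    print(connect_pair_by_hostname(RoundRobinDNS(records), "cfg.example"))
    print(connect_pair_by_peername(RoundRobinDNS(records), "cfg.example"))
```

In this simulation, connecting by hostname puts the two connections on different nodes, which would be consistent with a check that reports one address connecting to more than one server; connecting by the resolved peer address keeps both on the same node.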