The number of clients connected to nodes in the Redis cluster fluctuates every one hour #3142

Open
xingzhang8023 opened this issue Jan 27, 2025 · 1 comment
Labels
status: waiting-for-feedback We need additional information before we can continue

Comments

xingzhang8023 commented Jan 27, 2025

Bug Report

Current Behavior

The number of clients connected to each Redis cluster node fluctuates once every hour. The idle timeout of the Redis service is set to 1 hour. The more connections there are in the connection pool, the larger the amplitude of the fluctuation.

[Image: chart attached to the issue, not reproduced here]

Input Code

    // connection pool
    @Bean(destroyMethod = "closeAsync")
    @ConditionalOnProperty(name = "spring.redis.cluster.nodes")
    AsyncPool<ProxyConnection<String, String>> redisClusterConnectionPool(RedisClusterClient redisClusterClient,
        BoundedPoolConfig config) {
        Supplier<CompletionStage<ProxyConnection<String, String>>> connectionSupplier =
            () -> redisClusterClient.connectAsync(StringCodec.UTF8).thenApply(ProxyConnectionClusterImpl::new);

        return createConnectionBoundedAsyncPool(config, connectionSupplier);
    }
    
    private BoundedAsyncPool<ProxyConnection<String, String>> createConnectionBoundedAsyncPool(
        BoundedPoolConfig config, Supplier<CompletionStage<ProxyConnection<String, String>>> connectionSupplier) {
        BoundedAsyncPool<ProxyConnection<String, String>> redisConnectionPool
            = AsyncConnectionPoolSupport.createBoundedObjectPool(connectionSupplier, config, false);
        RedisConnectionPoolMetric.buildRedisPoolMetric(redisConnectionPool);
        return redisConnectionPool;
    }
    // client configuration
    private ClusterClientOptions getClientOptions() {
        ClusterTopologyRefreshOptions clusterTopologyRefreshOptions = ClusterTopologyRefreshOptions.builder()
            .enableAllAdaptiveRefreshTriggers()
            .adaptiveRefreshTriggersTimeout(
                Duration.ofSeconds(redisBaseProperty.getTopologyAdaptiveRefreshInSecond()))
            .enablePeriodicRefresh(Duration.ofSeconds(redisBaseProperty.getTopologyPeriodicRefreshInSecond()))
            .build();
        ClusterClientOptions.Builder clientOptionBuilder = ClusterClientOptions.builder()
            .protocolVersion(getRedisProtocolVersion())
            .topologyRefreshOptions(clusterTopologyRefreshOptions)
            .timeoutOptions(getClientTimeoutOptions())
            .socketOptions(SocketOptions.builder()
                .keepAlive(SocketOptions.KeepAliveOptions.builder()
                    .enable(redisBaseProperty.getCustomKeepalive())
                    .count(redisBaseProperty.getTcpKeepCNT())
                    .idle(Duration.ofSeconds(redisBaseProperty.getTcpKeepIdleInSecond()))
                    .interval(Duration.ofSeconds(redisBaseProperty.getTcpKeepIntervalInSecond()))
                    .build())
                .tcpUserTimeout(SocketOptions.TcpUserTimeoutOptions.builder()
                    .enable()
                    .tcpUserTimeout(Duration.ofSeconds(redisBaseProperty.getTcpUserTimeoutSeconds()))
                    .build())
                .build());
        return clientOptionBuilder.build();
    }

Expected behavior/code

Environment

  • Lettuce version(s): 6.3.0.RELEASE
  • Redis version: 5.0.9

Possible Solution

The client triggers the Watchdog to reconnect. During reconnection, RoundRobinSocketAddressSupplier is used to select the next Redis node to connect to.
The node list of RoundRobinSocketAddressSupplier is fixed during initialization. If no node change occurs later, the list is never rebuilt, so its order stays the same.

Could the partitions be reordered each time they are obtained? That way, the client connections would stay relatively balanced across the nodes.

class RoundRobinSocketAddressSupplier implements Supplier<SocketAddress> {

    private static final InternalLogger logger = InternalLoggerFactory.getInstance(RoundRobinSocketAddressSupplier.class);

    private final Supplier<Partitions> partitions;

    private final Function<Collection<RedisClusterNode>, Collection<RedisClusterNode>> sortFunction;

    private final ClientResources clientResources;

    private final RoundRobin<RedisClusterNode> roundRobin;

    @SuppressWarnings({ "unchecked", "rawtypes" })
    public RoundRobinSocketAddressSupplier(Supplier<Partitions> partitions,
            Function<? extends Collection<RedisClusterNode>, Collection<RedisClusterNode>> sortFunction,
            ClientResources clientResources) {

        LettuceAssert.notNull(partitions, "Partitions must not be null");
        LettuceAssert.notNull(sortFunction, "Sort-Function must not be null");

        this.partitions = partitions;
        this.roundRobin = new RoundRobin<>(
                (l, r) -> l.getUri() == r.getUri() || (l.getUri() != null && l.getUri().equals(r.getUri())));
        this.sortFunction = (Function) sortFunction;
        this.clientResources = clientResources;
        resetRoundRobin(partitions.get());   // init 
    }

    @Override
    public SocketAddress get() {

        Partitions partitions = this.partitions.get();
        if (!roundRobin.isConsistent(partitions)) {
            resetRoundRobin(partitions);
        }

        RedisClusterNode redisClusterNode = roundRobin.next();
        return getSocketAddress(redisClusterNode);
    }

    protected void resetRoundRobin(Partitions partitions) {
        roundRobin.rebuild(sortFunction.apply(partitions));
    }

    protected SocketAddress getSocketAddress(RedisClusterNode redisClusterNode) {

        SocketAddress resolvedAddress = clientResources.socketAddressResolver().resolve(redisClusterNode.getUri());
        logger.debug("Resolved SocketAddress {} using for Cluster node {}", resolvedAddress, redisClusterNode.getNodeId());
        return resolvedAddress;
    }

}
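
The suspected effect can be sketched with a small toy model. This is not Lettuce code: `ReconnectBalanceSketch`, `NodeSupplier`, and `peakConnectionsOnOneNode` are hypothetical names, and the model assumes that every pooled connection owns its own round-robin supplier, that all suppliers were initialized with the same node order, and that all connections reconnect at the same moment (for example, after the server-side idle timeout). Under those assumptions, a fixed order sends every reconnect to the same node, while reshuffling the order on each pass spreads them out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Toy model, not Lettuce code: each pooled connection owns one round-robin supplier. */
public class ReconnectBalanceSketch {

    /** Round-robin over a node list; optionally reshuffles the list before every pass. */
    static final class NodeSupplier {

        private final List<String> nodes;
        private final Random random; // null means: keep the initial order forever
        private int offset = 0;

        NodeSupplier(List<String> initialOrder, Random random) {
            this.nodes = new ArrayList<>(initialOrder);
            this.random = random;
        }

        String next() {
            if (offset == 0 && random != null) {
                // Hypothetical "reorder on each pass" variant of the supplier.
                Collections.shuffle(nodes, random);
            }
            String node = nodes.get(offset);
            offset = (offset + 1) % nodes.size();
            return node;
        }
    }

    /**
     * Simulates `rounds` simultaneous reconnects of `connections` pooled connections and
     * returns the largest number of connections that landed on a single node in any round.
     */
    static int peakConnectionsOnOneNode(List<String> nodes, int connections, int rounds,
            boolean shuffle, long seed) {
        Random random = new Random(seed);
        List<NodeSupplier> suppliers = new ArrayList<>();
        for (int i = 0; i < connections; i++) {
            suppliers.add(new NodeSupplier(nodes, shuffle ? random : null));
        }
        int peak = 0;
        for (int round = 0; round < rounds; round++) {
            Map<String, Integer> counts = new HashMap<>();
            for (NodeSupplier supplier : suppliers) {
                counts.merge(supplier.next(), 1, Integer::sum);
            }
            for (int count : counts.values()) {
                peak = Math.max(peak, count);
            }
        }
        return peak;
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("node-a", "node-b", "node-c");
        System.out.println("fixed order peak:    " + peakConnectionsOnOneNode(nodes, 30, 6, false, 42L));
        System.out.println("shuffled order peak: " + peakConnectionsOnOneNode(nodes, 30, 6, true, 42L));
    }
}
```

In this model the fixed-order peak equals the number of pooled connections, which would match the observation that a larger pool produces a larger swing. Whether the real client behaves this way depends on how the sort function orders nodes for each connection, which the model does not capture.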

Additional context

Contributor

ggivo commented Jan 31, 2025

Hi, @xingzhang8023

Interesting one. After a brief look at the Lettuce code and reading "Connection Count for a Redis Cluster Connection Object", I assume the flapping amplitude might relate to the number of 'default' connections opened.

Can you elaborate on the chart with the example metrics provided?
For example, how were those metrics gathered, and what do the different lines represent? What was the Redis cluster setup when the chart was generated, and what was the corresponding BoundedPoolConfig (e.g. minIdle/maxIdle/maxTotal)?
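
For reference, the three pool limits mentioned above are set on the `BoundedPoolConfig` builder. A minimal sketch follows; the class name `PoolConfigExample` and the numeric values are placeholders for illustration only, not the reporter's settings, and the code needs Lettuce on the classpath.

```java
import io.lettuce.core.support.BoundedPoolConfig;

public class PoolConfigExample {

    // Placeholder values for illustration only; not taken from the issue.
    static BoundedPoolConfig examplePoolConfig() {
        return BoundedPoolConfig.builder()
                .minIdle(4)    // idle connections the pool tries to keep available
                .maxIdle(16)   // idle connections kept before surplus ones are closed
                .maxTotal(16)  // upper bound on connections managed by the pool
                .build();
    }
}
```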

tishun added the status: waiting-for-feedback label (We need additional information before we can continue) and removed the status: waiting-for-triage label on Feb 3, 2025