tidb节点重启后,服务无法重连

news/2024/5/19 14:53:39 标签: tidb

大家好,我是烤鸭:

    前几天遇到tidb节点重启后服务无法重连,确切地说是两个服务,一个可以正常重连,一个不行。

问题复现

由于线上执行慢SQL,导致TiDB 单个节点宕机重启。

其中A服务的3个节点和B服务的1个节点开始报错,重启后B服务恢复,A服务还在报错。

使用的组件版本:

A服务:tomcat-jdbc-9.0.71.jar
B服务:HikariCP-4.0.3.jar

数据库配置:

A服务:

spring.xxx-tidb-data-source.url=jdbc:mysql://xxx.xxx.xxx:6000/xxxA?useUnicode=true&characterEncoding=utf8&zeroDateTimeBehavior=convertToNull&useSSL=false&serverTimezone=GMT%2B8

B服务:

spring.datasource.url=jdbc:mysql://xxx.xxx.xxx:6000/xxxB?useSSL=false&autoReconnect=true

可以看到两边的配置和连接池不同。

报错日志

无法重连服务的报错日志:

Caused by: com.mysql.cj.exceptions.ConnectionIsClosedException: No operations allowed after connection closed.
	at sun.reflect.GeneratedConstructorAccessor139.newInstance(Unknown Source)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:61)
	at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:105)
	at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:151)
	at com.mysql.cj.NativeSession.checkClosed(NativeSession.java:762)
	at com.mysql.cj.jdbc.ConnectionImpl.checkClosed(ConnectionImpl.java:569)
	at com.mysql.cj.jdbc.ConnectionImpl.prepareStatement$original$McwhO8C0(ConnectionImpl.java:1580)
	... 125 common frames omitted
Caused by: com.mysql.cj.exceptions.CJCommunicationsException: The last packet successfully received from the server was 259,130,913 milliseconds ago. The last packet sent successfully to the server was 259,130,914 milliseconds ago.may or may not be greater than the server-side timeout (the driver was unable to determine the value of either the 'wait_timeout' or 'interactive_timeout' configuration values from the server.. You should consider either expiring and/or testing connection validity before use in your application, increasing the server configured values for client timeouts, or using the Connector/J connection property 'autoReconnect=true' to avoid this problem.
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:61)
	at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:105)
	at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:151)
	at com.mysql.cj.exceptions.ExceptionFactory.createCommunicationsException(ExceptionFactory.java:167)
	at com.mysql.cj.protocol.a.NativeProtocol.readMessage(NativeProtocol.java:520)
	at com.mysql.cj.protocol.a.NativeProtocol.checkErrorMessage(NativeProtocol.java:700)
	at com.mysql.cj.protocol.a.NativeProtocol.sendCommand(NativeProtocol.java:639)
	at com.mysql.cj.protocol.a.NativeProtocol.sendQueryPacket(NativeProtocol.java:987)
	at com.mysql.cj.NativeSession.execSQL(NativeSession.java:666)
	at com.mysql.cj.jdbc.ClientPreparedStatement.executeInternal(ClientPreparedStatement.java:930)
	at com.mysql.cj.jdbc.ClientPreparedStatement.execute$original$mDKgOeFa(ClientPreparedStatement.java:371)
	at com.mysql.cj.jdbc.ClientPreparedStatement.execute$original$mDKgOeFa$accessor$UdYpLG4x(ClientPreparedStatement.java)
	at com.mysql.cj.jdbc.ClientPreparedStatement$auxiliary$BAL6Kq1Z.call(Unknown Source)
	at org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.InstMethodsInter.intercept(InstMethodsInter.java:86)
	at com.mysql.cj.jdbc.ClientPreparedStatement.execute(ClientPreparedStatement.java)
	at sun.reflect.GeneratedMethodAccessor261.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.tomcat.jdbc.pool.StatementFacade$StatementProxy.invoke(StatementFacade.java:118)

问题猜想

找到变量了,就顺着猜测查一下。

  • 数据库配置参数 autoReconnect=true
  • 连接池 tomcat-jdbc 和 HikariCP

报错日志都来源于 CJCommunicationsException 这个类,其实调用的是 ExceptionFactory.createLinkFailureMessageBasedOnHeuristics,报错日志来源于配置文件。

我保留了一部分一会可能用到的变量

LocalizedErrorMessages.properties

CommunicationsException.2=\ is longer than the server configured value of 
CommunicationsException.3=''wait_timeout''
CommunicationsException.4=''interactive_timeout''
CommunicationsException.5=may or may not be greater than the server-side timeout 
CommunicationsException.6=(the driver was unable to determine the value of either the 
CommunicationsException.7=''wait_timeout'' or ''interactive_timeout'' configuration values from 
CommunicationsException.8=the server.
CommunicationsException.11=. You should consider either expiring and/or testing connection validity 
CommunicationsException.12=before use in your application, increasing the server configured values for client timeouts, 
CommunicationsException.13=or using the Connector/J connection property ''autoReconnect=true'' to avoid this problem.
CommunicationsException.20=Communications link failure

...

Connection.0=Unable to connect to database.
Connection.1=Cannot connect to MySQL server on {0}:{1}.\n\nMake sure that there is a MySQL server running on the machine/port you are trying to connect to and that the machine this software is running on is able to connect to this host/port (i.e. not firewalled). Also make sure that the server has not been started with the --skip-networking flag.\n\n
Connection.2=No operations allowed after connection closed.
Connection.3=Can''t call commit when autocommit=true
Connection.4=Communications link failure during commit(). Transaction resolution unknown.
Connection.5=Unknown Java encoding for the character set with index ''{0}''. Use the ''customCharsetMapping'' property to force it.
Connection.6=Unknown character set index ''{0}'' received from server. The appropriate client character set can be forced via the ''characterEncoding'' property.
Connection.7=Can''t map {0} given for characterSetResults to a supported MySQL encoding.
Connection.8=Unable to use encoding: {0}

看看源码

从报错日志入手,根据链接状态和上次发送、接收时间判断,拼一些异常原因。

可以看出报错日志有几个组成部分:

1、上次发送packet的时间距离现在超过8小时,getLastPacketSentTime > 28800秒(8小时)

2、当前链接已关闭:No operations allowed after connection closed

ExceptionFactory

public static String createLinkFailureMessageBasedOnHeuristics(PropertySet propertySet, ServerSession serverSession,
            PacketSentTimeHolder packetSentTimeHolder, PacketReceivedTimeHolder packetReceivedTimeHolder, Throwable underlyingException) {
        long serverTimeoutSeconds = 0;
        boolean isInteractiveClient = false;

        long lastPacketReceivedTimeMs = packetReceivedTimeHolder == null ? 0L : packetReceivedTimeHolder.getLastPacketReceivedTime();
        long lastPacketSentTimeMs = packetSentTimeHolder.getLastPacketSentTime();
        if (lastPacketSentTimeMs > lastPacketReceivedTimeMs) {
            lastPacketSentTimeMs = packetSentTimeHolder.getPreviousPacketSentTime();
        }

        if (propertySet != null) {
            isInteractiveClient = propertySet.getBooleanProperty(PropertyKey.interactiveClient).getValue();

            String serverTimeoutSecondsStr = null;

            if (serverSession != null) {
                serverTimeoutSecondsStr = isInteractiveClient ? serverSession.getServerVariable("interactive_timeout")
                        : serverSession.getServerVariable("wait_timeout");
            }

            if (serverTimeoutSecondsStr != null) {
                try {
                    serverTimeoutSeconds = Long.parseLong(serverTimeoutSecondsStr);
                } catch (NumberFormatException nfe) {
                    serverTimeoutSeconds = 0;
                }
            }
        }

        StringBuilder exceptionMessageBuf = new StringBuilder();

        long nowMs = System.currentTimeMillis();

        if (lastPacketSentTimeMs == 0) {
            lastPacketSentTimeMs = nowMs;
        }

        long timeSinceLastPacketSentMs = (nowMs - lastPacketSentTimeMs);
        long timeSinceLastPacketSeconds = timeSinceLastPacketSentMs / 1000;

        long timeSinceLastPacketReceivedMs = (nowMs - lastPacketReceivedTimeMs);

        int dueToTimeout = DUE_TO_TIMEOUT_FALSE;

        StringBuilder timeoutMessageBuf = null;

        if (serverTimeoutSeconds != 0) {
            if (timeSinceLastPacketSeconds > serverTimeoutSeconds) {
                dueToTimeout = DUE_TO_TIMEOUT_TRUE;

                timeoutMessageBuf = new StringBuilder();
                timeoutMessageBuf.append(Messages.getString("CommunicationsException.2"));
                timeoutMessageBuf.append(Messages.getString(isInteractiveClient ? "CommunicationsException.4" : "CommunicationsException.3"));
            }
		  // 上次发送成功的时间距离现在超过8小时
        } else if (timeSinceLastPacketSeconds > DEFAULT_WAIT_TIMEOUT_SECONDS) {
            dueToTimeout = DUE_TO_TIMEOUT_MAYBE;
			
            timeoutMessageBuf = new StringBuilder();
            timeoutMessageBuf.append(Messages.getString("CommunicationsException.5"));
            timeoutMessageBuf.append(Messages.getString("CommunicationsException.6"));
            timeoutMessageBuf.append(Messages.getString("CommunicationsException.7"));
            timeoutMessageBuf.append(Messages.getString("CommunicationsException.8"));
        }

        if (dueToTimeout == DUE_TO_TIMEOUT_TRUE || dueToTimeout == DUE_TO_TIMEOUT_MAYBE) {

            exceptionMessageBuf.append(lastPacketReceivedTimeMs != 0
                    ? Messages.getString("CommunicationsException.ServerPacketTimingInfo",
                            new Object[] { Long.valueOf(timeSinceLastPacketReceivedMs), Long.valueOf(timeSinceLastPacketSentMs) })
                    : Messages.getString("CommunicationsException.ServerPacketTimingInfoNoRecv", new Object[] { Long.valueOf(timeSinceLastPacketSentMs) }));

            if (timeoutMessageBuf != null) {
                exceptionMessageBuf.append(timeoutMessageBuf);
            }

            exceptionMessageBuf.append(Messages.getString("CommunicationsException.11"));
            exceptionMessageBuf.append(Messages.getString("CommunicationsException.12"));
            exceptionMessageBuf.append(Messages.getString("CommunicationsException.13"));

        } else {
            //
            // Attempt to determine the reason for the underlying exception (we can only make a best-guess here)
            //
            if (underlyingException instanceof BindException) {
                String localSocketAddress = propertySet.getStringProperty(PropertyKey.localSocketAddress).getValue();
                boolean interfaceNotAvaliable;
                try {
                    interfaceNotAvaliable = localSocketAddress != null && NetworkInterface.getByName(localSocketAddress) == null;
                } catch (SocketException e1) {
                    interfaceNotAvaliable = false;
                }
                exceptionMessageBuf.append(interfaceNotAvaliable ? Messages.getString("CommunicationsException.LocalSocketAddressNotAvailable")
                        : Messages.getString("CommunicationsException.TooManyClientConnections"));
            }
        }

        if (exceptionMessageBuf.length() == 0) {
            // We haven't figured out a good reason, so copy it.
            exceptionMessageBuf.append(Messages.getString("CommunicationsException.20"));

            if (propertySet.getBooleanProperty(PropertyKey.maintainTimeStats).getValue() && !propertySet.getBooleanProperty(PropertyKey.paranoid).getValue()) {
                exceptionMessageBuf.append("\n\n");
                exceptionMessageBuf.append(lastPacketReceivedTimeMs != 0
                        ? Messages.getString("CommunicationsException.ServerPacketTimingInfo",
                                new Object[] { Long.valueOf(timeSinceLastPacketReceivedMs), Long.valueOf(timeSinceLastPacketSentMs) })
                        : Messages.getString("CommunicationsException.ServerPacketTimingInfoNoRecv", new Object[] { Long.valueOf(timeSinceLastPacketSentMs) }));
            }
        }

        return exceptionMessageBuf.toString();
    }

再根据上边的两个猜测,找一下对应的源码。

猜想1:autoReconnect 这个参数是有用的

网上很多说这个参数只针对mysql5以下,这个倒是没找到,官网关于这个参数的说明。
https://dev.mysql.com/doc/connectors/en/connector-j-connp-props-high-availability-and-clustering.html#cj-conn-prop_autoReconnect

如果链接断开的情况下,配置了这个参数可以发起重试。如果不配置的话,就会一直报错。

在这里插入图片描述

猜想2:HikariCP 会自动重连,tomcat-jdbc不会

可以看到 tomcat-jdbc 执行 invoke,getConnetion一气呵成,在默认配置下,也没有任何校验,如果因为第三方导致的链接断开,就麻烦了。

public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        if (compare(ISCLOSED_VAL,method)) {
            return Boolean.valueOf(isClosed());
        }
        if (compare(CLOSE_VAL,method)) {
            if (connection==null)
             {
              return null; //noop for already closed.
            }
            PooledConnection poolc = this.connection;
            this.connection = null;
            pool.returnConnection(poolc);
            return null;
        } else if (compare(TOSTRING_VAL,method)) {
            return this.toString();
        } else if (compare(GETCONNECTION_VAL,method) && connection!=null) {
            return connection.getConnection();
        } else if (method.getDeclaringClass().isAssignableFrom(XAConnection.class)) {
            try {
                return method.invoke(connection.getXAConnection(),args);
            }catch (Throwable t) {
                if (t instanceof InvocationTargetException) {
                    throw t.getCause() != null ? t.getCause() : t;
                } else {
                    throw t;
                }
            }
        }
        if (isClosed()) {
          throw new SQLException("Connection has already been closed.");
        }
        if (compare(UNWRAP_VAL,method)) {
            return unwrap((Class<?>)args[0]);
        } else if (compare(ISWRAPPERFOR_VAL,method)) {
            return Boolean.valueOf(this.isWrapperFor((Class<?>)args[0]));
        }
        try {
            PooledConnection poolc = connection;
            if (poolc!=null) {
                return method.invoke(poolc.getConnection(),args);
            } else {
                throw new SQLException("Connection has already been closed.");
            }
        }catch (Throwable t) {
            if (t instanceof InvocationTargetException) {
                throw t.getCause() != null ? t.getCause() : t;
            } else {
                throw t;
            }
        }
    }

在这里插入图片描述

HikariCP

HikarPool.getConnection 会校验当前是否存活,如果不存活,会关闭当前链接,并会重新创建连接。

在这里插入图片描述

问题解决

当天的解决方案比较简单,重启服务就行。不过如果想彻底解决这个问题的话。

  • 升级连接池,tomcat-jdbc 升级 druid 或者 HikariCP ,springboot 2.0以上默认是 HikariCP

  • 修改配置,如果还使用 tomcat-jdbc :

    增加以下配置:

    # 连接存活检查
    spring.datasource.tomcat.validation-query='SELECT 1'
    # 指定多少ms执行一次连接校验.	
    spring.datasource.tomcat.validation-interval=3000
    # 指定连接校验查询的超时时间
    spring.datasource.tomcat.validation-query-timeout=-1
    
    或者连接地址增加 autoReconnect=true
    

总结

有些问题不经常遇到,即便是预发1比1的环境可能也没办法避免。除了在事故处理的时候可以完善流程,就只能尽量保持架构统一,尤其是老项目。业务侧增加全方面中间件的监控,尤其是存活检测这种。坑是无穷无尽的,踩完了记得遇到相同的跳过去。

相关文章

https://zhuanlan.zhihu.com/p/361130333?utm_id=0

https://blog.csdn.net/zhangxin09/article/details/124901850

https://blog.csdn.net/u014398624/article/details/47746473


http://www.niftyadmin.cn/n/5362505.html

相关文章

MAC word删除空白页

问题&#xff1a;MAC word删除空白页 解决&#xff1a; option删除键

在哪里申请Geo Trust证书

毫无疑问作为Symantec和Digicert的子品牌&#xff0c;证书安全性能方面毋庸置疑&#xff0c;也是目前相当多企业、单位的首选证书选择之一、其较高的性价比也是大众愿意相信Geo Trust的原因之一。 GeoTrust是全球知名的数字证书颁发机构之一&#xff0c;成立于1995年&#xff…

【QT+QGIS跨平台编译】之二十三:【Proj+Qt跨平台编译】(一套代码、一套框架,跨平台编译)

文章目录 一、proj介绍二、文件下载三、文件分析四、pro文件五、编译实践一、proj介绍 PROJ(原名Proj.4)是一个开源的地理信息系统(GIS)软件库,用于执行地图投影和地理坐标转换。这个库提供了广泛的地图投影和坐标转换功能,使其成为处理地理空间数据时不可或缺的工具。P…

电脑/机顶盒/ps3/4/连接老电视(只有AV、S-Video接口)解决方案之HDMI转AV/S-Video转换器HAV

HDMI转AV/S-Video转换器功能 01、将HDMI高清信号经过视频处理转换成AV、S-VIDEO(PAL/NTSC)的视频信号输出 02、将HDMI数字音频&#xff0c;经过DAC数模芯片处理转成模拟立体声输出 03、采用先进的视频处理技术&#xff0c;对图像的亮度&#xff0c;对比度及色彩进行增强处理 04…

用函数求最小公倍数和最大公约数(c++题解)

题目描述 输入两个正整数m和n&#xff0c;求其最大公约数和最小公倍数。 提示&#xff0c;求最大公约数用一个函数实现。本题求最大公约数必须用高效算法&#xff0c;如辗转相除法&#xff0c;朴素算法要超时。 输入格式 第1行&#xff1a;两个非整数&#xff0c;值在0&…

MySQL中如何将字符串替换

在MySQL中&#xff0c;你可以使用REPLACE函数来进行字符串替换操作。REPLACE函数接受三个参数&#xff1a;源字符串&#xff0c;要替换的子字符串&#xff0c;以及替换后的字符串。以下是REPLACE函数的基本语法&#xff1a; REPLACE(str, search_str, replace_str)str: 要进行…

问题:在顺序图中,如果一个对象在接收到消息时还没有被激活,那么这条消息将会激活个对象。 #其他#知识分享

问题&#xff1a;在顺序图中&#xff0c;如果一个对象在接收到消息时还没有被激活&#xff0c;那么这条消息将会激活个对象。 参考答案如图所示

基于WordPress开发微信小程序1:搭建Wordpress

2年前&#xff0c;在知乎上提问&#xff1a;多数公司为什么宁愿自研也不用wordpress二次开发建站&#xff1f; - 知乎 (zhihu.com)&#xff0c;收到了&#xff0c;很多回答 自己打算做一下提升&#xff0c;便有了自己基于wordpress开发微信小程序的想法 项目定位 基于wordpre…