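Alibaba second-round interview: how do you release gracefully with Nacos as the registry?
Hi everyone, I'm Jun Ge.
Today let's talk about how to do a graceful release when Nacos is the service registry.
Like other registries, Nacos is used as shown in the figure below:
After a Service Provider starts, it registers with Nacos Server. A Service Consumer pulls the service list from Nacos Server and, using some load-balancing algorithm, picks one Service Provider to send each request to.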
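To make the register / pull / choose flow concrete, here is a minimal in-memory sketch. It is not the Nacos API: `RegistrySketch` and all of its methods are hypothetical names made up for illustration, and round-robin is just one possible selection algorithm.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical in-memory stand-in for a registry; not the Nacos API.
final class RegistrySketch {
    private final Map<String, List<String>> services = new ConcurrentHashMap<>();
    private final AtomicInteger counter = new AtomicInteger();

    // Provider side: publish "ip:port" under a service name.
    void register(String service, String address) {
        services.computeIfAbsent(service, k -> new CopyOnWriteArrayList<>()).add(address);
    }

    // Provider side: withdraw the address again.
    void deregister(String service, String address) {
        List<String> list = services.get(service);
        if (list != null) {
            list.remove(address);
        }
    }

    // Consumer side: pull a snapshot of the current service list.
    List<String> pull(String service) {
        return new ArrayList<>(services.getOrDefault(service, Collections.<String>emptyList()));
    }

    // Consumer side: round-robin over the pulled list; null when no provider is listed.
    String choose(String service) {
        List<String> list = pull(service);
        if (list.isEmpty()) {
            return null;
        }
        return list.get(Math.floorMod(counter.getAndIncrement(), list.size()));
    }
}
```

A consumer can only choose an address that is in the list at the moment it pulls, which is why both halves of a graceful release come down to controlling when an address enters and leaves that list.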
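1. Graceful release requirements
A graceful release means that once a Service Provider is online (registered with Nacos) it can receive and process requests normally, and once it has been stopped it receives no more requests. That gives two requirements:
- Graceful online: until the Service Provider has finished deploying, Service Consumers should not see its address in the service list;
- Graceful offline: after the Service Provider goes offline, Service Consumers no longer see its address in the service list.
Solve these two problems and you have a graceful release.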
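2. Setting up the environment
The environment is there so that we can read the Nacos logs and use them to track down the corresponding source code. The setup used in this article is shown below:
2.1 Starting the provider
Start the springboot-provider application so that it registers with Nacos. The startup log is as follows: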
2023-06-11 18:58:10,120 [main] [INFO] com.alibaba.nacos.client.naming - [BEAT] adding beat: BeatInfo{port=8083, ip='192.168.31.94', weight=1.0, serviceName='DEFAULT_GROUP@@springboot-provider', cluster='DEFAULT', metadata={management.endpoints.web.base-path=/actuator, management.port=18082, preserved.register.source=SPRING_CLOUD, management.address=127.0.0.1}, scheduled=false, period=5000, stopped=false} to beat map.
2023-06-11 18:58:10,121 [main] [INFO] com.alibaba.nacos.client.naming - [REGISTER-SERVICE] public registering service DEFAULT_GROUP@@springboot-provider with instance: Instance{instanceId='null', ip='192.168.31.94', port=8083, weight=1.0, healthy=true, enabled=true, ephemeral=true, clusterName='DEFAULT', serviceName='null', metadata={management.endpoints.web.base-path=/actuator, management.port=18082, preserved.register.source=SPRING_CLOUD, management.address=127.0.0.1}}
2023-06-11 18:58:10,133 [main] [INFO] com.alibaba.cloud.nacos.registry.NacosServiceRegistry - nacos registry, DEFAULT_GROUP springboot-provider 192.168.31.94:8083 register finished
2023-06-11 18:58:10,221 [main] [INFO] org.springframework.boot.web.embedded.tomcat.TomcatWebServer - Tomcat initialized with port(s): 18082 (http)
2023-06-11 18:58:10,222 [main] [INFO] org.apache.coyote.http11.Http11NioProtocol - Initializing ProtocolHandler ["http-nio-127.0.0.1-18082"]
2023-06-11 18:58:10,223 [main] [INFO] org.apache.catalina.core.StandardService - Starting service [Tomcat]
2023-06-11 18:58:10,223 [main] [INFO] org.apache.catalina.core.StandardEngine - Starting Servlet engine: [Apache Tomcat/9.0.21]
2023-06-11 18:58:10,239 [main] [INFO] org.apache.catalina.core.ContainerBase.[Tomcat-1].[localhost].[/] - Initializing Spring embedded WebApplicationContext
2023-06-11 18:58:10,239 [main] [INFO] org.springframework.web.context.ContextLoader - Root WebApplicationContext: initialization completed in 99 ms
2023-06-11 18:58:10,268 [main] [INFO] org.springframework.boot.actuate.endpoint.web.EndpointLinksResolver - Exposing 22 endpoint(s) beneath base path '/actuator'
2023-06-11 18:58:10,336 [main] [INFO] org.apache.coyote.http11.Http11NioProtocol - Starting ProtocolHandler ["http-nio-127.0.0.1-18082"]
2023-06-11 18:58:10,340 [main] [INFO] org.springframework.boot.web.embedded.tomcat.TomcatWebServer - Tomcat started on port(s): 18082 (http) with context path ''
2023-06-11 18:58:10,342 [main] [INFO] boot.Application - Started Application in 7.051 seconds (JVM running for 7.874)
2023-06-11 18:58:10,358 [main] [INFO] com.alibaba.nacos.client.config.impl.ClientWorker - [fixed-39.105.183.91_8848] [subscribe] springboot-provider.properties+DEFAULT_GROUP
2023-06-11 18:58:10,359 [main] [INFO] com.alibaba.nacos.client.config.impl.CacheData - [fixed-39.105.183.91_8848] [add-listener] ok, tenant=, dataId=springboot-provider.properties, group=DEFAULT_GROUP, cnt=1
2023-06-11 18:58:10,359 [main] [INFO] com.alibaba.nacos.client.config.impl.ClientWorker - [fixed-39.105.183.91_8848] [subscribe] springboot-provider-dev.properties+DEFAULT_GROUP
2023-06-11 18:58:10,359 [main] [INFO] com.alibaba.nacos.client.config.impl.CacheData - [fixed-39.105.183.91_8848] [add-listener] ok, tenant=, dataId=springboot-provider-dev.properties, group=DEFAULT_GROUP, cnt=1
2023-06-11 18:58:10,360 [main] [INFO] com.alibaba.nacos.client.config.impl.ClientWorker - [fixed-39.105.183.91_8848] [subscribe] springboot-provider+DEFAULT_GROUP
2023-06-11 18:58:10,360 [main] [INFO] com.alibaba.nacos.client.config.impl.CacheData - [fixed-39.105.183.91_8848] [add-listener] ok, tenant=, dataId=springboot-provider, group=DEFAULT_GROUP, cnt=1
2023-06-11 18:58:10,639 [RMI TCP Connection(1)-192.168.31.94] [INFO] org.apache.catalina.core.ContainerBase.[Tomcat].[localhost].[/] - Initializing Spring DispatcherServlet 'dispatcherServlet'
2023-06-11 18:58:10,839 [com.alibaba.nacos.client.naming.updater] [INFO] com.alibaba.nacos.client.naming - [BEAT] adding beat: BeatInfo{port=8083, ip='192.168.31.94', weight=1.0, serviceName='DEFAULT_GROUP@@springboot-provider', cluster='DEFAULT', metadata={management.endpoints.web.base-path=/actuator, management.port=18082, preserved.register.source=SPRING_CLOUD, management.address=127.0.0.1}, scheduled=false, period=5000, stopped=false} to beat map.
2023-06-11 18:58:10,840 [com.alibaba.nacos.client.naming.updater] [INFO] com.alibaba.nacos.client.naming - modified ips(1) service: DEFAULT_GROUP@@springboot-provider@@DEFAULT -> [{"instanceId":"192.168.31.94#8083#DEFAULT#DEFAULT_GROUP@@springboot-provider","ip":"192.168.31.94","port":8083,"weight":1.0,"healthy":true,"enabled":true,"ephemeral":true,"clusterName":"DEFAULT","serviceName":"DEFAULT_GROUP@@springboot-provider","metadata":{"management.endpoints.web.base-path":"/actuator","management.port":"18082","preserved.register.source":"SPRING_CLOUD","management.address":"127.0.0.1"},"ipDeleteTimeout":30000,"instanceHeartBeatInterval":5000,"instanceHeartBeatTimeOut":15000}]
2023-06-11 18:58:10,841 [com.alibaba.nacos.client.naming.updater] [INFO] com.alibaba.nacos.client.naming - current ips:(1) service: DEFAULT_GROUP@@springboot-provider@@DEFAULT -> [{"instanceId":"192.168.31.94#8083#DEFAULT#DEFAULT_GROUP@@springboot-provider","ip":"192.168.31.94","port":8083,"weight":1.0,"healthy":true,"enabled":true,"ephemeral":true,"clusterName":"DEFAULT","serviceName":"DEFAULT_GROUP@@springboot-provider","metadata":{"management.endpoints.web.base-path":"/actuator","management.port":"18082","preserved.register.source":"SPRING_CLOUD","management.address":"127.0.0.1"},"ipDeleteTimeout":30000,"instanceHeartBeatInterval":5000,"instanceHeartBeatTimeOut":15000}]
我們?cè)倏聪?Nacos 的日志,這里看的文件 naming-server.log,日志如下:
2023-06-11 18:58:09,723 INFO Client connection 192.168.31.94:51885#true connect
2023-06-11 18:58:10,105 INFO Client change for service Service{namespace='public', group='DEFAULT_GROUP', name='springboot-provider', ephemeral=true, revision=1}, 192.168.31.94:8083#true
2023-06-11 18:58:18,204 INFO Client connection 192.168.31.94:60850#true disconnect, remove instances and subscribers
After springboot-provider has started successfully, the Nacos console shows the following:
2.2 Taking the provider offline
After the service goes offline, the Nacos log shows:
2023-06-11 19:01:03,375 INFO Client connection 192.168.31.94:51885#true disconnect, remove instances and subscribers
2023-06-11 19:01:05,048 INFO [AUTO-DELETE-IP] service: Service{namespace='public', group='DEFAULT_GROUP', name='springboot-provider', ephemeral=true, revision=2}, ip: {"ip":"192.168.31.94","port":8083,"healthy":false,"cluster":"DEFAULT","extendDatum":{"management.endpoints.web.base-path":"/actuator","management.port":"18082","preserved.register.source":"SPRING_CLOUD","management.address":"127.0.0.1","customInstanceId":"192.168.31.94#8083#DEFAULT#DEFAULT_GROUP@@springboot-provider"},"lastHeartBeatTime":1686481231604,"metadataId":"192.168.31.94:8083:DEFAULT"}
2023-06-11 19:01:05,048 INFO Client remove for service Service{namespace='public', group='DEFAULT_GROUP', name='springboot-provider', ephemeral=true, revision=2}, 192.168.31.94:8083#true
2023-06-11 19:01:08,379 INFO Client connection 192.168.31.94:8083#true disconnect, remove instances and subscribers
2.3 Calling the service
In springboot-consumer, run a unit test that uses a FeignClient to call the method below:
@FeignClient(value = "springboot-provider", configuration = FeignMultipartSupportConfig.class)
public interface FeignAsEurekaClient {
    @PostMapping("/employee/save")
    String saveEmployeebyName(@RequestBody Employee employee);
}
The log is as follows:
2023-06-11 19:15:47,694 [main] [INFO] org.springframework.test.context.transaction.TransactionContext - Began transaction (1) for test context [DefaultTestContext@5bf0d49 testClass = TestFeignAsEurekaClient, testInstance = boot.service.TestFeignAsEurekaClient@10683d9d, testMethod = testPostEmployByFeign@TestFeignAsEurekaClient, testException = [null], mergedContextConfiguration = [WebMergedContextConfiguration@5b7a5baa testClass = TestFeignAsEurekaClient, locations = '{}', classes = '{class boot.Application, class boot.Application}', contextInitializerClasses = '[]', activeProfiles = '{}', propertySourceLocations = '{}', propertySourceProperties = '{org.springframework.boot.test.context.SpringBootTestContextBootstrapper=true, server.port=0}', contextCustomizers = set[org.springframework.boot.test.context.filter.ExcludeFilterContextCustomizer@166fa74d, org.springframework.boot.test.json.DuplicateJsonObjectContextCustomizerFactory$DuplicateJsonObjectContextCustomizer@588df31b, org.springframework.boot.test.mock.mockito.MockitoContextCustomizer@0, org.springframework.boot.test.web.client.TestRestTemplateContextCustomizer@7fad8c79, org.springframework.boot.test.autoconfigure.properties.PropertyMappingContextCustomizer@0, org.springframework.boot.test.autoconfigure.web.servlet.WebDriverContextCustomizerFactory$Customizer@10b48321], resourceBasePath = 'src/main/webapp', contextLoader = 'org.springframework.boot.test.context.SpringBootContextLoader', parent = [null]], attributes = map['org.springframework.test.context.web.ServletTestExecutionListener.activateListener' -> false]]; transaction manager [org.springframework.jdbc.datasource.DataSourceTransactionManager@693676d]; rollback [true]
2023-06-11 19:15:47,941 [main] [INFO] com.netflix.config.ChainedDynamicProperty - Flipping property: springboot-provider.ribbon.ActiveConnectionsLimit to use NEXT property: niws.loadbalancer.availabilityFilteringRule.activeConnectionsLimit = 2147483647
2023-06-11 19:15:47,962 [main] [INFO] com.netflix.loadbalancer.BaseLoadBalancer - Client: springboot-provider instantiated a LoadBalancer: DynamicServerListLoadBalancer:{NFLoadBalancer:name=springboot-provider,current list of Servers=[],Load balancer stats=Zone stats: {},Server stats: []}ServerList:null
2023-06-11 19:15:47,969 [main] [INFO] com.netflix.loadbalancer.DynamicServerListLoadBalancer - Using serverListUpdater PollingServerListUpdater
2023-06-11 19:15:48,064 [main] [INFO] com.netflix.config.ChainedDynamicProperty - Flipping property: springboot-provider.ribbon.ActiveConnectionsLimit to use NEXT property: niws.loadbalancer.availabilityFilteringRule.activeConnectionsLimit = 2147483647
2023-06-11 19:15:48,064 [main] [INFO] com.netflix.loadbalancer.DynamicServerListLoadBalancer - DynamicServerListLoadBalancer for client springboot-provider initialized: DynamicServerListLoadBalancer:{NFLoadBalancer:name=springboot-provider,current list of Servers=[192.168.31.94:8083],Load balancer stats=Zone stats: {unknown=[Zone:unknown; Instance count:1; Active connections count: 0; Circuit breaker tripped count: 0; Active connections per server: 0.0;]
},Server stats: [[Server:192.168.31.94:8083; Zone:UNKNOWN; Total Requests:0; Successive connection failure:0; Total blackout seconds:0; Last connection made:Thu Jan 01 08:00:00 CST 1970; First connection made: Thu Jan 01 08:00:00 CST 1970; Active Connections:0; total failure count in last (1000) msecs:0; average resp time:0.0; 90 percentile resp time:0.0; 95 percentile resp time:0.0; min resp time:0.0; max resp time:0.0; stddev resp time:0.0]
]}ServerList:com.alibaba.cloud.nacos.ribbon.NacosServerList@24d998ba
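Note that OpenFeign is used here, and it relies on Ribbon for load balancing, so we also have to take into account how often Ribbon refreshes its local service list. According to the source code, the refresh period is 30s, as shown below:
Ribbon's cache-refresh logic looks like this: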
public synchronized void start(final UpdateAction updateAction) {
    if (isActive.compareAndSet(false, true)) {
        final Runnable wrapperRunnable = new Runnable() {
            @Override
            public void run() {
                //...
            }
        };
        scheduledFuture = getRefreshExecutor().scheduleWithFixedDelay(
                wrapperRunnable,
                initialDelayMs,
                refreshIntervalMs, // defined as 30s here
                TimeUnit.MILLISECONDS
        );
    } //...
}
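The practical consequence of a periodic refresh is a window in which the consumer still routes to an address that has already left the registry. The sketch below shows that window without any Ribbon classes: `CachedServerList` is a hypothetical name, and `refresh()` stands in for what the scheduled task would do once per interval.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical consumer-side cache; Ribbon's real classes are not used here.
final class CachedServerList {
    private final Supplier<List<String>> source; // e.g. a call to the registry
    private volatile List<String> cached = new ArrayList<>();

    CachedServerList(Supplier<List<String>> source) {
        this.source = source;
        refresh(); // initial load
    }

    // What the scheduled task would do once per refresh interval.
    void refresh() {
        cached = new ArrayList<>(source.get());
    }

    // What the load balancer reads on every request: the cached copy, not the registry.
    List<String> servers() {
        return cached;
    }
}
```

Until the next `refresh()` runs, `servers()` keeps returning the old list, so with a 30s interval a provider that has just been removed can keep receiving traffic for up to roughly 30s.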
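3. Graceful release
As section 1 said, a graceful release has two requirements: graceful online and graceful offline.
The Nacos client and server interact by long polling. When the server receives a client request, it first checks whether its local service list differs from the client's copy (by comparing MD5 values). If it does, the server responds immediately; otherwise it parks the request in the long-polling queue. If the service list changes while the request is parked, the client is notified at once; otherwise the server responds when the hold time expires. The code is as follows: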
//LongPollingService.java
public void addLongPollingClient(HttpServletRequest req, HttpServletResponse rsp, Map<String, String> clientMd5Map,
        int probeRequestSize) {
    String str = req.getHeader(LongPollingService.LONG_POLLING_HEADER);
    int delayTime = SwitchService.getSwitchInteger(SwitchService.FIXED_DELAY_TIME, 500);
    // Add delay time for LoadBalance, and one response is returned 500 ms in advance to avoid client timeout.
    long timeout = -1L;
    if (isFixedPolling()) {
        //...
    } else {
        timeout = Math.max(10000, Long.parseLong(str) - delayTime); // 29.5s
        long start = System.currentTimeMillis();
        List<String> changedGroups = MD5Util.compareMd5(req, rsp, clientMd5Map);
        if (changedGroups.size() > 0) {
            // The service list has changed: respond to the client right away
            generateResponse(req, rsp, changedGroups);
            return;
        } //...
    }
    String ip = RequestUtil.getRemoteIp(req);
    //..
    // Must be called by http thread, or send response.
    final AsyncContext asyncContext = req.startAsync();
    // AsyncContext.setTimeout() is incorrect, Control by oneself
    asyncContext.setTimeout(0L);
    String appName = req.getHeader(RequestUtil.CLIENT_APPNAME_HEADER);
    String tag = req.getHeader("Vipserver-Tag");
    // The service list has not changed: park the request in the long-polling queue
    ConfigExecutor.executeLongPolling(
            new ClientLongPolling(asyncContext, clientMd5Map, ip, probeRequestSize, timeout, appName, tag));
}
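As the server code above shows, with a 30s client timeout the request is parked for at most 29.5s; the response goes out 0.5s early so that the client does not time out first. Because this is long polling, the server notifies the client as soon as the list changes, so the effect on a graceful release is very small.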
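The compare-then-park idea can be reduced to a few lines. This is a simplified sketch under stated assumptions, not Nacos code: `LongPollSketch` and its methods are hypothetical, it tracks a single MD5 value, and the timeout handling itself is left out (only the arithmetic from the excerpt is reproduced in `holdTimeoutMs`).

```java
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical single-key long-polling server; a sketch, not Nacos code.
final class LongPollSketch {
    private String serverMd5;
    private final Queue<CompletableFuture<String>> waiting = new ConcurrentLinkedQueue<>();

    LongPollSketch(String initialMd5) {
        this.serverMd5 = initialMd5;
    }

    // Same arithmetic as the excerpt: hold slightly less than the client timeout, at least 10s.
    static long holdTimeoutMs(long clientTimeoutMs, int delayMs) {
        return Math.max(10000, clientTimeoutMs - delayMs);
    }

    // Stale client: answer immediately. Up-to-date client: park the request.
    synchronized CompletableFuture<String> poll(String clientMd5) {
        if (!serverMd5.equals(clientMd5)) {
            return CompletableFuture.completedFuture(serverMd5);
        }
        CompletableFuture<String> pending = new CompletableFuture<>();
        waiting.add(pending);
        return pending;
    }

    // A change wakes every parked request right away.
    synchronized void publish(String newMd5) {
        serverMd5 = newMd5;
        CompletableFuture<String> parked;
        while ((parked = waiting.poll()) != null) {
            parked.complete(newMd5);
        }
    }
}
```

With a 30000 ms client timeout and a 500 ms delay, `holdTimeoutMs` gives 29500 ms, which is the 29.5s in the text; a real server would also complete each parked request when that time runs out.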
After the service list changes, the client uses a separate thread to notify the registered listeners:
public void startInternal() {
    executor.schedule(() -> {
        while (!executor.isShutdown() && !executor.isTerminated()) {
            try {
                listenExecutebell.poll(5L, TimeUnit.SECONDS);
                //...
                executeConfigListen();
            } catch (Throwable e) {
                //...
            }
        }
    }, 0L, TimeUnit.MILLISECONDS);
}
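3.1 Graceful online
The problem with going online is that requests can arrive after the Service Provider has registered with Nacos but before it has finished initializing. The main cause is that the Service Provider registers with Nacos as soon as it starts, while the interfaces it exposes may not be ready yet.
The fix is to turn off automatic registration: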
spring.cloud.nacos.discovery.registerEnabled=false
Then register manually in code once the service has finished initializing:
Properties setting8 = new Properties();
String serverIp8 = "127.0.0.1:8848";
setting8.put(PropertyKeyConst.SERVER_ADDR, serverIp8);
setting8.put(PropertyKeyConst.USERNAME, "nacos");
setting8.put(PropertyKeyConst.PASSWORD, "nacos");
// Create the naming client from the properties above, then register this instance
NamingService inaming8 = NacosFactory.createNamingService(setting8);
inaming8.registerInstance("springboot-provider", "192.168.31.94", 8083);
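The important part is the ordering: every warm-up step finishes first, and registration comes last. Here is a minimal sketch of that gate. `GracefulStartup` is a hypothetical helper, the warm-up steps are placeholders, and the `register` callback is where a call like `registerInstance` above would go.

```java
import java.util.List;

// Hypothetical startup gate: warm up first, register last.
final class GracefulStartup {
    private final List<Runnable> warmUps; // e.g. load caches, open connection pools
    private final Runnable register;      // e.g. a call to registerInstance(...)

    GracefulStartup(List<Runnable> warmUps, Runnable register) {
        this.warmUps = warmUps;
        this.register = register;
    }

    // Returns true if the instance was registered.
    boolean start() {
        try {
            for (Runnable warmUp : warmUps) {
                warmUp.run();
            }
        } catch (RuntimeException e) {
            // An instance that failed to initialize is never registered.
            return false;
        }
        register.run();
        return true;
    }
}
```

In a Spring Boot application, one place such a gate could be triggered is a listener for `ApplicationReadyEvent`; that wiring is an assumption and is not shown here.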
3.2 Graceful offline
A service goes offline in one of two ways: a normal shutdown or a failure.
3.2.1 Normal shutdown
Nacos uses heartbeats to decide whether an instance is alive. The heartbeat period is 5s. If Nacos Server receives no heartbeat for 15s it marks the instance unhealthy, and only after 30s without a heartbeat does it remove the instance. These intervals can be tuned:
# Heartbeat interval: 5s -> 1s
spring.cloud.nacos.discovery.metadata.preserved.heart.beat.interval=1000
# Heartbeat timeout (instance marked unhealthy): 15s -> 3s
spring.cloud.nacos.discovery.metadata.preserved.heart.beat.timeout=3000
# Delete timeout (instance removed): 30s -> 5s
spring.cloud.nacos.discovery.metadata.preserved.ip.delete.timeout=5000
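To see what the two thresholds mean for an instance that has simply stopped sending heartbeats, here is a small sketch. `HeartbeatSketch` is a hypothetical helper; the strict `>` comparison and the absence of any check-scheduling delay are simplifying assumptions.

```java
// Hypothetical helper that classifies an instance by the age of its last heartbeat.
final class HeartbeatSketch {
    enum Status { HEALTHY, UNHEALTHY, REMOVED }

    private final long heartBeatTimeoutMs; // 15000 by default, 3000 after tuning
    private final long ipDeleteTimeoutMs;  // 30000 by default, 5000 after tuning

    HeartbeatSketch(long heartBeatTimeoutMs, long ipDeleteTimeoutMs) {
        this.heartBeatTimeoutMs = heartBeatTimeoutMs;
        this.ipDeleteTimeoutMs = ipDeleteTimeoutMs;
    }

    // Status as seen by the server, given the time since the last heartbeat arrived.
    Status statusAfter(long msSinceLastBeat) {
        if (msSinceLastBeat > ipDeleteTimeoutMs) {
            return Status.REMOVED;
        }
        if (msSinceLastBeat > heartBeatTimeoutMs) {
            return Status.UNHEALTHY;
        }
        return Status.HEALTHY;
    }
}
```

With the defaults, an instance that died 10s ago still counts as healthy; with the tuned values it is already gone after 6s. Either way there is a gap between the moment the process stops and the moment the registry notices.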
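Shortening these timeouts, however, still does not guarantee that a stopped service disappears from Nacos Server immediately; it may well keep receiving requests after it has stopped. The best approach is to deregister manually: for example, expose an API endpoint and have a preStop hook call it before the service shuts down. Sample code for the endpoint: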
@GetMapping(value = "/nacos/deregisterInstance")
public String deregisterInstance() throws NacosException {
    Properties prop = new Properties();
    prop.setProperty("serverAddr", "localhost");
    // Must be the namespace the instance was registered in
    prop.put(PropertyKeyConst.NAMESPACE, "test");
    NacosNamingService client = new NacosNamingService(prop);
    // Remove this instance from the service list
    client.deregisterInstance("springboot-provider", "192.168.31.94", 8083);
    return "success";
}
When Ribbon is in use, its local service-list cache also has to be taken into account: after deregistering manually, wait another 30s before shutting the service down.
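Put together, the shutdown order is deregister, wait, stop. A minimal sketch of that sequence follows. `GracefulShutdown` is a hypothetical helper, and the two callbacks are placeholders for the deregistration call and for whatever actually stops the server.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical shutdown sequence: deregister, wait for consumer caches, then stop.
final class GracefulShutdown {
    private final Runnable deregister; // e.g. a call to the deregister endpoint
    private final Runnable stopServer; // e.g. stop the web server
    private final long drainMs;        // how long to wait for consumer caches to refresh

    GracefulShutdown(Runnable deregister, Runnable stopServer, long drainMs) {
        this.deregister = deregister;
        this.stopServer = stopServer;
        this.drainMs = drainMs;
    }

    void run() {
        deregister.run(); // 1. leave the registry first
        try {
            TimeUnit.MILLISECONDS.sleep(drainMs); // 2. keep serving while stale caches expire
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt flag, then stop anyway
        }
        stopServer.run(); // 3. only now stop accepting requests
    }
}
```

For the Ribbon case described above, `drainMs` would be 30000 to match the 30s refresh interval; during that wait the instance is still up and can answer requests from consumers whose cache has not refreshed yet.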
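3.2.2 Service failure
The second case is a service that has failed but has not stopped. Here it is hard to keep external requests from arriving. The way to handle it is to act on the service's own health-check results: for example, after three consecutive failed health checks, call the API endpoint above to take the service offline.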
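The "three consecutive failures" rule is easy to get subtly wrong, because a success in between has to reset the count. A minimal sketch is below: `FailureTracker` is a hypothetical name, the threshold is a parameter, and `onUnhealthy` is where the call to the deregister endpoint would go.

```java
// Hypothetical counter that fires a callback after N consecutive failed health checks.
final class FailureTracker {
    private final int threshold;
    private final Runnable onUnhealthy; // e.g. call the deregister endpoint
    private int consecutiveFailures;
    private boolean fired;

    FailureTracker(int threshold, Runnable onUnhealthy) {
        this.threshold = threshold;
        this.onUnhealthy = onUnhealthy;
    }

    // Feed in the result of each health check.
    synchronized void record(boolean healthy) {
        if (healthy) {
            consecutiveFailures = 0; // any success breaks the streak
            return;
        }
        consecutiveFailures++;
        if (consecutiveFailures >= threshold && !fired) {
            fired = true; // deregister only once
            onUnhealthy.run();
        }
    }
}
```

In this sketch the callback is one-shot: once the instance has been taken offline, later failures do not trigger it again, and bringing the instance back online is left to a separate step.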
4. Summary
Whichever registry you use, a graceful release comes down to going online gracefully and going offline gracefully. This article walked through how to do both with Nacos, based on how Nacos works. I hope it helps.