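Five Situations That Trigger a CMS GC in the JVM: You Probably Don't Know Them All
People often ask: why did my application trigger a CMS GC when old gen usage had not yet reached the threshold configured by CMSInitiatingOccupancyFraction? It seems baffling, and it is hard to tell where the problem lies.
In fact, CMS GC has many trigger conditions, and the CMSInitiatingOccupancyFraction threshold is only one of them. This article walks through the source code to lay out all of the conditions that trigger a CMS GC, to help you make sense of the odd CMS GC behavior you run into day to day.
Here are a few questions to get your attention.
- Why did a CMS GC run when old gen occupancy was only 50%?
- Can Metaspace usage also trigger a CMS GC?
- Why did a CMS GC run when old gen occupancy was very low?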
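Trigger conditions
CMS GC is implemented as two collectors: a foreground collector and a background collector. The foreground collector is relatively simple; the background collector is more complex and covers more cases.
The trigger conditions of each are described below.
Note: this article is based on JDK 8.
Note: this article covers only the trigger conditions of CMS GC. The details of the algorithm, and when an MSC (mark sweep compact) is performed, are out of scope.
foreground collector
The foreground collector's trigger condition is simple: generally, when an object allocation finds insufficient space, a GC is triggered directly to reclaim space immediately. The algorithm used is mark sweep, without compaction.
background collector
Before describing the background collector's trigger conditions, a word on how it works. A CMS background thread polls continuously, each time checking whether any trigger condition for the background collector is met. As soon as one is, it performs a background collection.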
    void ConcurrentMarkSweepThread::run() {
      ... // omitted
      while (!_should_terminate) {
        sleepBeforeNextCycle();
        if (_should_terminate) break;
        GCCause::Cause cause = _collector->_full_gc_requested ?
          _collector->_full_gc_cause : GCCause::_cms_concurrent_mark;
        _collector->collect_in_background(false, cause);
      }
      ... // omitted
    }
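On each pass the thread first waits for CMSWaitDuration, then calls shouldConcurrentCollect to check whether a trigger condition for the CMS background collector is met. CMSWaitDuration defaults to 2 s (2000 ms). Applications often suffer from frequent CMS GCs; look at the interval between consecutive CMS GCs, and if it is about 2 s, you can be fairly sure the background collector is responsible.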
    void ConcurrentMarkSweepThread::sleepBeforeNextCycle() {
      while (!_should_terminate) {
        if (CMSIncrementalMode) {
          icms_wait();
          if(CMSWaitDuration >= 0) {
            // Wait until the next synchronous GC, a concurrent full gc
            // request or a timeout, whichever is earlier.
            wait_on_cms_lock_for_scavenge(CMSWaitDuration);
          }
          return;
        } else {
          if(CMSWaitDuration >= 0) {
            // Wait until the next synchronous GC, a concurrent full gc
            // request or a timeout, whichever is earlier.
            wait_on_cms_lock_for_scavenge(CMSWaitDuration);
          } else {
            // Wait until any cms_lock event or check interval not to call shouldConcurrentCollect permanently
            wait_on_cms_lock(CMSCheckInterval);
          }
        }
        // Check if we should start a CMS collection cycle
        if (_collector->shouldConcurrentCollect()) {
          return;
        }
        // .. collection criterion not yet met, let's go back
        // and wait some more
      }
    }
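The wait-then-check structure of this loop can be isolated in a few lines. The following is a minimal sketch, not HotSpot code: `poll_until_triggered`, its parameters, and the `max_polls` bound are all hypothetical names introduced for illustration, and a plain sleep stands in for waiting on the CMS lock.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical stand-in for the CMS background thread's wait loop:
// sleep for `wait` between checks and return as soon as the trigger
// predicate reports true. `max_polls` exists only to keep the sketch finite.
// Returns the 1-based poll on which the predicate fired, or 0 if it never did.
int poll_until_triggered(const std::function<bool()>& should_collect,
                         std::chrono::milliseconds wait,
                         int max_polls) {
  assert(max_polls >= 0);
  for (int poll = 1; poll <= max_polls; ++poll) {
    std::this_thread::sleep_for(wait);  // plays the role of CMSWaitDuration
    if (should_collect()) {
      return poll;                      // criterion met: start a background cycle
    }
    // criterion not met yet: go back and wait some more
  }
  return 0;
}
```

Because the predicate is only evaluated after each wait, a condition that stays true produces cycles spaced roughly one wait interval apart, which is consistent with the 2 s spacing described above.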
So which conditions does shouldConcurrentCollect() check?
    bool CMSCollector::shouldConcurrentCollect() {
      // First trigger condition
      if (_full_gc_requested) {
        if (Verbose && PrintGCDetails) {
          gclog_or_tty->print_cr("CMSCollector: collect because of explicit "
                                 " gc request (or gc_locker)");
        }
        return true;
      }
      // For debugging purposes, change the type of collection.
      // If the rotation is not on the concurrent collection
      // type, don't start a concurrent collection.
      NOT_PRODUCT(
        if (RotateCMSCollectionTypes &&
            (_cmsGen->debug_collection_type() !=
              ConcurrentMarkSweepGeneration::Concurrent_collection_type)) {
          assert(_cmsGen->debug_collection_type() !=
                   ConcurrentMarkSweepGeneration::Unknown_collection_type,
                 "Bad cms collection type");
          return false;
        }
      )
      FreelistLocker x(this);
      // ------------------------------------------------------------------
      // Print out lots of information which affects the initiation of
      // a collection.
      if (PrintCMSInitiationStatistics && stats().valid()) {
        gclog_or_tty->print("CMSCollector shouldConcurrentCollect: ");
        gclog_or_tty->stamp();
        gclog_or_tty->print_cr("");
        stats().print_on(gclog_or_tty);
        gclog_or_tty->print_cr("time_until_cms_gen_full %3.7f",
          stats().time_until_cms_gen_full());
        gclog_or_tty->print_cr("free="SIZE_FORMAT, _cmsGen->free());
        gclog_or_tty->print_cr("contiguous_available="SIZE_FORMAT,
                               _cmsGen->contiguous_available());
        gclog_or_tty->print_cr("promotion_rate=%g", stats().promotion_rate());
        gclog_or_tty->print_cr("cms_allocation_rate=%g", stats().cms_allocation_rate());
        gclog_or_tty->print_cr("occupancy=%3.7f", _cmsGen->occupancy());
        gclog_or_tty->print_cr("initiatingOccupancy=%3.7f", _cmsGen->initiating_occupancy());
        gclog_or_tty->print_cr("metadata initialized %d",
          MetaspaceGC::should_concurrent_collect());
      }
      // ------------------------------------------------------------------
      // Second trigger condition
      // If the estimated time to complete a cms collection (cms_duration())
      // is less than the estimated time remaining until the cms generation
      // is full, start a collection.
      if (!UseCMSInitiatingOccupancyOnly) {
        if (stats().valid()) {
          if (stats().time_until_cms_start() == 0.0) {
            return true;
          }
        } else {
          // We want to conservatively collect somewhat early in order
          // to try and "bootstrap" our CMS/promotion statistics;
          // this branch will not fire after the first successful CMS
          // collection because the stats should then be valid.
          if (_cmsGen->occupancy() >= _bootstrap_occupancy) {
            if (Verbose && PrintGCDetails) {
              gclog_or_tty->print_cr(
                " CMSCollector: collect for bootstrapping statistics:"
                " occupancy = %f, boot occupancy = %f", _cmsGen->occupancy(),
                _bootstrap_occupancy);
            }
            return true;
          }
        }
      }
      // Third trigger condition
      // Otherwise, we start a collection cycle if
      // old gen want a collection cycle started. Each may use
      // an appropriate criterion for making this decision.
      // XXX We need to make sure that the gen expansion
      // criterion dovetails well with this. XXX NEED TO FIX THIS
      if (_cmsGen->should_concurrent_collect()) {
        if (Verbose && PrintGCDetails) {
          gclog_or_tty->print_cr("CMS old gen initiated");
        }
        return true;
      }
      // Fourth trigger condition
      // We start a collection if we believe an incremental collection may fail;
      // this is not likely to be productive in practice because it's probably too
      // late anyway.
      GenCollectedHeap* gch = GenCollectedHeap::heap();
      assert(gch->collector_policy()->is_two_generation_policy(),
             "You may want to check the correctness of the following");
      if (gch->incremental_collection_will_fail(true /* consult_young */)) {
        if (Verbose && PrintGCDetails) {
          gclog_or_tty->print("CMSCollector: collect because incremental collection will fail ");
        }
        return true;
      }
      // Fifth trigger condition
      if (MetaspaceGC::should_concurrent_collect()) {
        if (Verbose && PrintGCDetails) {
          gclog_or_tty->print("CMSCollector: collect for metadata allocation ");
        }
        return true;
      }
      return false;
    }
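As the code above shows, the background collector has five broad trigger conditions:
1. A concurrent full GC was requested
This is the case when the GC cause is _gc_locker and GCLockerInvokesConcurrent is set, or when the GC cause is _java_lang_system_gc (that is, a call to System.gc()) and ExplicitGCInvokesConcurrent is set. Either one triggers a background collection.
2. Dynamic calculation from statistics (only when UseCMSInitiatingOccupancyOnly is not set)
When UseCMSInitiatingOccupancyOnly is not set, the JVM decides dynamically, based on statistics, whether a CMS GC is needed.
The logic is: if the predicted time needed to complete a CMS GC is greater than the predicted time until the old gen fills up, a GC is started. This decision relies on statistics from past CMS GCs. Before the first CMS GC, however, no statistics have been gathered and they are invalid, so the decision is made from old gen occupancy instead.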
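The rule for this first condition is a plain boolean combination of the GC cause and two flags. A minimal sketch, assuming a hypothetical `Cause` enum and `Flags` struct that stand in for HotSpot's GC cause values and command-line flags:

```cpp
#include <cassert>

// Hypothetical mirror of the two GC causes the text mentions.
enum class Cause { GcLocker, JavaLangSystemGc, Other };

// Hypothetical holder for the two relevant JVM flags.
struct Flags {
  bool gc_locker_invokes_concurrent;    // models -XX:+GCLockerInvokesConcurrent
  bool explicit_gc_invokes_concurrent;  // models -XX:+ExplicitGCInvokesConcurrent
};

// True when the request should be served by a background (concurrent) cycle.
bool requests_concurrent_full_gc(Cause cause, const Flags& flags) {
  return (cause == Cause::GcLocker && flags.gc_locker_invokes_concurrent) ||
         (cause == Cause::JavaLangSystemGc && flags.explicit_gc_invokes_concurrent);
}
```

With both flags off, neither cause leads to a background collection through this path, which is why System.gc() only produces a concurrent cycle when ExplicitGCInvokesConcurrent is set.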
    if (!UseCMSInitiatingOccupancyOnly) {
      if (stats().valid()) {
        if (stats().time_until_cms_start() == 0.0) {
          return true;
        }
      } else {
        // We want to conservatively collect somewhat early in order
        // to try and "bootstrap" our CMS/promotion statistics;
        // this branch will not fire after the first successful CMS
        // collection because the stats should then be valid.
        if (_cmsGen->occupancy() >= _bootstrap_occupancy) {
          if (Verbose && PrintGCDetails) {
            gclog_or_tty->print_cr(
              " CMSCollector: collect for bootstrapping statistics:"
              " occupancy = %f, boot occupancy = %f", _cmsGen->occupancy(),
              _bootstrap_occupancy);
          }
          return true;
        }
      }
    }
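So at what occupancy does collection start, that is, what is the value of _bootstrap_occupancy? The answer is 50%. You may have seen this yourself: without UseCMSInitiatingOccupancyOnly, a CMS GC ran as soon as the old gen reached 50% occupancy, and it left you puzzled at the time.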
    _bootstrap_occupancy = ((double)CMSBootstrapOccupancy)/(double)100;
    // default value of the flag
    product(uintx, CMSBootstrapOccupancy, 50,
            "Percentage CMS generation occupancy at which to initiate CMS collection for bootstrapping collection stats")
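The two branches of this second condition can be put side by side in a small model. This is a simplified sketch under stated assumptions: `CmsStatsModel` and `dynamic_trigger` are hypothetical names, and the time comparison is reduced to the rule described in the text, whereas HotSpot's own estimate takes more inputs than shown here.

```cpp
#include <cassert>

// Hypothetical snapshot of what the collector knows when it polls.
struct CmsStatsModel {
  bool   valid;                    // false until the first CMS cycle has completed
  double est_cms_duration_sec;     // predicted time to finish a CMS cycle
  double est_time_until_full_sec;  // predicted time until the old gen fills up
  double occupancy;                // current old gen occupancy, 0.0 .. 1.0
};

bool dynamic_trigger(const CmsStatsModel& s,
                     bool use_cms_initiating_occupancy_only,
                     double bootstrap_occupancy = 0.50) {
  if (use_cms_initiating_occupancy_only) {
    return false;  // the whole branch is skipped
  }
  if (s.valid) {
    // Start now if a cycle would not finish before the old gen is full.
    return s.est_cms_duration_sec >= s.est_time_until_full_sec;
  }
  // No statistics yet: fall back to the bootstrap threshold (50% by default).
  return s.occupancy >= bootstrap_occupancy;
}
```

In this model an old gen at exactly 50% with no statistics triggers a cycle, while the same heap with UseCMSInitiatingOccupancyOnly set does not.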
3. The state of the old gen
    bool ConcurrentMarkSweepGeneration::should_concurrent_collect() const {
      assert_lock_strong(freelistLock());
      if (occupancy() > initiating_occupancy()) {
        if (PrintGCDetails && Verbose) {
          gclog_or_tty->print(" %s: collect because of occupancy %f / %f ",
            short_name(), occupancy(), initiating_occupancy());
        }
        return true;
      }
      if (UseCMSInitiatingOccupancyOnly) {
        return false;
      }
      if (expansion_cause() == CMSExpansionCause::_satisfy_allocation) {
        if (PrintGCDetails && Verbose) {
          gclog_or_tty->print(" %s: collect because expanded for allocation ",
            short_name());
        }
        return true;
      }
      if (_cmsSpace->should_concurrent_collect()) {
        if (PrintGCDetails && Verbose) {
          gclog_or_tty->print(" %s: collect because cmsSpace says so ",
            short_name());
        }
        return true;
      }
      return false;
    }
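From the source, there are two main categories here:
(a) Old gen occupancy is compared with a threshold, and a CMS GC is performed if it exceeds the threshold. This is the check "occupancy() > initiating_occupancy()". occupancy is clearly the current occupancy of the old gen, but what is initiating_occupancy?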
    _cmsGen->init_initiating_occupancy(CMSInitiatingOccupancyFraction, CMSTriggerRatio);
    ...
    void ConcurrentMarkSweepGeneration::init_initiating_occupancy(intx io, uintx tr) {
      assert(io <= 100 && tr <= 100, "Check the arguments");
      if (io >= 0) {
        _initiating_occupancy = (double)io / 100.0;
      } else {
        _initiating_occupancy = ((100 - MinHeapFreeRatio) +
                                 (double)(tr * MinHeapFreeRatio) / 100.0)
                                / 100.0;
      }
    }
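As you can see, when CMSInitiatingOccupancyFraction is set to a value greater than or equal to 0, the threshold is "io / 100.0".
When CMSInitiatingOccupancyFraction is negative (note that the default is -1), the threshold is "((100 - MinHeapFreeRatio) + (double)(tr * MinHeapFreeRatio) / 100.0) / 100.0". What does that come to? With the default flag values it is 92%; the calculation is omitted here. Some books and blogs say that CMSInitiatingOccupancyFraction is 92 when not configured. In fact its default is -1, so the threshold falls through to the second formula, which evaluates to 92%. The flag's own value is not 92.
(b) The remaining checks, which apply only when UseCMSInitiatingOccupancyOnly is not set
These fall into two sub-cases:
- The old gen has just been expanded to satisfy an object allocation, and the allocation succeeded; a CMS GC is then considered.
- A decision based on the CMS gen free lists. This part is fairly involved and I have not fully worked it out. Fortunately, under the default configuration it returns false, so by default this trigger can be ignored.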
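The 92% figure can be checked by hand. Assuming the JDK 8 defaults MinHeapFreeRatio = 40 and CMSTriggerRatio = 80 (these two values are not stated in the article and are supplied here as an assumption), the formula gives (100 - 40) + 80 * 40 / 100 = 60 + 32 = 92, hence 0.92. The sketch below repeats the arithmetic of init_initiating_occupancy as a free function; `initiating_occupancy` and its third parameter are hypothetical, introduced so the flag value can be passed in explicitly.

```cpp
#include <cassert>

// Same arithmetic as init_initiating_occupancy in the excerpt above.
// io = CMSInitiatingOccupancyFraction, tr = CMSTriggerRatio.
// min_heap_free_ratio is passed in rather than read from a global flag.
double initiating_occupancy(long io, unsigned long tr,
                            unsigned long min_heap_free_ratio) {
  assert(io <= 100 && tr <= 100 && min_heap_free_ratio <= 100);
  if (io >= 0) {
    return static_cast<double>(io) / 100.0;  // explicit threshold
  }
  // Derived threshold when the flag is left at its default of -1.
  return ((100 - min_heap_free_ratio) +
          static_cast<double>(tr * min_heap_free_ratio) / 100.0) / 100.0;
}
```

Calling `initiating_occupancy(-1, 80, 40)` yields roughly 0.92, while `initiating_occupancy(70, 80, 40)` yields roughly 0.70 regardless of the other two arguments.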
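4. An incremental GC may fail (pessimistic policy)
What does this mean? In a two-generation heap it mainly refers to whether a young GC will fail. If a young GC has already failed or is likely to fail, the JVM decides that a CMS GC is needed.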
    bool incremental_collection_will_fail(bool consult_young) {
      // Assumes a 2-generation system; the first disjunct remembers if an
      // incremental collection failed, even when we thought (second disjunct)
      // that it would not.
      assert(heap()->collector_policy()->is_two_generation_policy(),
             "the following definition may not be suitable for an n(>2)-generation system");
      return incremental_collection_failed() ||
             (consult_young && !get_gen(0)->collection_attempt_is_safe());
    }
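Look at the two conditions, "incremental_collection_failed()" and "!get_gen(0)->collection_attempt_is_safe()".
incremental_collection_failed() means a young GC has already failed, usually because the old gen did not have enough space to hold the promoted objects.
!get_gen(0)->collection_attempt_is_safe() asks whether promotion from the young gen is safe, by checking whether the space left in the old gen is enough for the objects a young GC would promote. How much a young GC will promote cannot be known in advance, so the remaining space is compared against two figures: the average amount promoted per young GC, and the maximum amount the current young GC could promote.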
    // av_promo is the average amount promoted per young GC; max_promotion_in_bytes is the
    // maximum that could be promoted now (the space currently used in eden + from)
    bool res = (available >= av_promo) || (available >= max_promotion_in_bytes);
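To see how the two checks combine, here is a minimal sketch with the inputs passed as plain numbers. `promotion_looks_safe` and `young_gc_may_fail` are hypothetical names chosen to avoid implying these are the HotSpot functions; the expressions follow the two excerpts above.

```cpp
#include <cassert>
#include <cstddef>

// Promotion is considered safe if the old gen's free space covers either the
// average promotion per young GC or the worst-case promotion.
bool promotion_looks_safe(std::size_t available,
                          std::size_t av_promo,
                          std::size_t max_promotion_in_bytes) {
  return (available >= av_promo) || (available >= max_promotion_in_bytes);
}

// Models the fourth trigger: a young GC has already failed, or (when the
// young gen is consulted) the next one does not look safe.
bool young_gc_may_fail(bool already_failed,
                       bool consult_young,
                       std::size_t available,
                       std::size_t av_promo,
                       std::size_t max_promotion_in_bytes) {
  return already_failed ||
         (consult_young &&
          !promotion_looks_safe(available, av_promo, max_promotion_in_bytes));
}
```

For example, with 10 MB free in the old gen, an average promotion of 20 MB and a worst case of 300 MB, promotion is not considered safe and the background collector is triggered; with 100 MB free it is considered safe because the average is covered.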
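5. The state of the metaspace
This looks mainly at the metaspace's should_concurrent_collect flag. The flag is set before the metaspace is expanded, if CMSClassUnloadingEnabled is configured, and a CMS GC then follows. That is why an application often runs a CMS GC shortly after startup, while old gen occupancy is still very low. It looks baffling, but this is exactly the cause.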
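A toy model makes it clear why old gen occupancy plays no part in this trigger. Everything below is hypothetical (`MetaspaceModel`, `before_expand`, `background_poll`) and only mirrors the behavior described in the paragraph above; it is not how HotSpot structures the code.

```cpp
#include <cassert>

// Toy model: a flag that is raised when the metaspace is about to expand.
struct MetaspaceModel {
  bool should_concurrent_collect = false;

  void before_expand(bool cms_class_unloading_enabled) {
    if (cms_class_unloading_enabled) {
      should_concurrent_collect = true;  // ask the background thread for a cycle
    }
  }
};

// Reduced version of the poll: only the occupancy check and the metaspace flag.
bool background_poll(const MetaspaceModel& ms,
                     double old_gen_occupancy,
                     double initiating_occupancy) {
  if (old_gen_occupancy > initiating_occupancy) {
    return true;
  }
  return ms.should_concurrent_collect;
}
```

In this model a heap with 5% old gen occupancy and a 92% threshold still reports true on the next poll once `before_expand(true)` has been called.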
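Summary
This article has laid out the trigger conditions of the foreground and background collectors of CMS GC. The foreground collector's trigger condition is relatively simple, while the background collector has many, grouped into five broad categories, several of which have smaller branches of their own. When UseCMSInitiatingOccupancyOnly is not set in particular, many more triggers become possible. In production it is strongly recommended to set UseCMSInitiatingOccupancyOnly, so that CMS GCs run more predictably and the cause of a GC is easier to track down.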