Troubleshooting: MPI Application rank 0 exited before MPI_Finalize() with status 2

2016-09-23  by: CAE仿真在線  Source: Internet

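This error comes from Fluent's parallel (MPI) processing. Once it appears, the whole Fluent session dies and none of the data can be saved, so it is a serious problem.

In most cases, however, the parallel mechanism itself is not at fault. The real cause is that the computation running inside one of the compute processes has failed.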


1. Cause one of "MPI_Finalize() with status 2": negative cell volume

As soon as a negative cell volume appears, the compute process cannot continue, so it raises an error and terminates.

Error at Node 3: Update-Dynamic-Mesh failed. Negative cell volume detected.

WARNING: 2 cells with non-positive volume detected.
MPI Application rank 0 exited before MPI_Finalize() with status 2


2. Cause two of "MPI_Finalize() with status 2": the solution diverges for any reason, or the velocity or the advance per time step is too large, for example an extremely large Courant number

112 more time steps

Updating solution at time level N...
Global Courant Number [Explicit VOF Criteria] : 471.06  

Error at Node 0: Global courant number is greater than 250.00   The 
velocity field is probably diverging. Please check the solution 
and reduce the time-step if necessary.

Error at Node 1: Global courant number is greater than 250.00   The 
velocity field is probably diverging. Please check the solution 
and reduce the time-step if necessary.

Error at Node 2: Global courant number is greater than 250.00   The 
velocity field is probably diverging. Please check the solution 
and reduce the time-step if necessary.

Error at Node 3: Global courant number is greater than 250.00   The 
velocity field is probably diverging. Please check the solution 
and reduce the time-step if necessary.
MPI Application rank 0 exited before MPI_Finalize() with status 2

===============Message from the Cortex Process================================

Fatal error in one of the compute processes.

==============================================================================

Error: Cortex received a fatal signal (unrecognized signal).
Error Object: ()

Error: There is no active application.
Error Object: (case-modified?)

Error: No journal response to dialog box message:'There is no active application.'
. Internally, cancelled the dialog.

Error: There is no active application.
Error Object: (rp-var-value 'physical-time-step)

Error: There is no active application.
Error Object: (rp-var-value 'delta-time-sampled)
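
For a fixed velocity field and mesh, the Courant number is proportional to the time step, so the log above gives a rough idea of how far the time step has to shrink. The following is a minimal Python sketch under that assumption, which a diverging run does not guarantee. The function name, the safety factor, and the example time step are illustrative and are not part of Fluent.

```python
def max_time_step(current_dt, courant_now, courant_limit, safety=0.5):
    """Largest time step that keeps the Courant number under safety * limit.

    Relies on Courant = u * dt / dx being linear in dt for a fixed
    velocity field and mesh. `safety` is an illustrative margin.
    """
    if current_dt <= 0 or courant_now <= 0 or courant_limit <= 0:
        raise ValueError("all inputs must be positive")
    return current_dt * (safety * courant_limit) / courant_now


# Courant number 471.06 and limit 250 are taken from the log above.
# The current time step is a made-up placeholder, not a value from the log.
dt = 1.0e-3
new_dt = max_time_step(dt, courant_now=471.06, courant_limit=250.0)
print(f"reduce time step from {dt:g} s to at most {new_dt:g} s")
```

If the field is truly diverging, shrinking the time step alone may not be enough, and the solution should be checked as the error message suggests.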

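3. Cause three of "MPI_Finalize() with status 2": memory pressure. In a parallel run, if any single compute process cannot allocate enough memory, the run terminates


Normally, error messages appear before the "MPI_Finalize() with status" line, as in the English output above, but in some cases there are none, as shown in the figure below.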



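In the run above, a dynamic mesh remeshing had just completed normally, and the error appeared as soon as the solver entered the next iteration.

I checked the machine and found that memory usage had reached 92% at that point. To test this suspicion, I immediately took a case that normally runs without problems and ran it while memory usage was above 90%.

It computed the first step, and on the second step it failed directly with "MPI_Finalize() with status 2".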
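
The memory check described above can be automated before a run is started. Below is a minimal Python sketch, assuming a Linux machine whose `/proc/meminfo` provides the `MemTotal` and `MemAvailable` fields. The function names are hypothetical, and the 90% threshold simply mirrors the experiment above rather than any documented Fluent limit.

```python
def used_fraction(meminfo_text):
    """Return the used-memory fraction from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[name.strip()] = int(parts[0])  # first token is the value
    return 1.0 - fields["MemAvailable"] / fields["MemTotal"]


def safe_to_start(meminfo_text, threshold=0.90):
    """True if memory usage is below the threshold.

    The 90% default is an assumption based on the test described in
    the text, not a limit published for Fluent.
    """
    return used_fraction(meminfo_text) < threshold


# Made-up sample values. On Linux, pass the contents of /proc/meminfo instead.
sample = """MemTotal:       16000000 kB
MemFree:          500000 kB
MemAvailable:    1280000 kB
"""
print(f"memory used: {used_fraction(sample):.0%}")
print("ok to start" if safe_to_start(sample) else "free some memory first")
```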


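Fluent's parallel MPI runs can exit abnormally with many different status values, of which -1 is the most common. Broadly, though, the errors fall into two types: script errors and errors in the physical model data.

The first type covers journal file scripts and UDF scripts. These errors generally produce -1 or another negative status. The second type covers divergence, quantities exceeding their limits, and similar situations in which abnormal physical values terminate the run.

There are many possible situations, so each case has to be analyzed individually. Start with the log written just before the failure. If it contains an error message, that message points to the root cause. If it contains none, the cause is very likely memory.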
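
The triage rule above (read the log just before the failure, and suspect memory when it reports nothing) can be sketched in Python. The regular expressions are derived only from the log excerpts quoted in this article, so other Fluent versions may word the messages differently. The function name and the 40-line lookback window are hypothetical.

```python
import re

# Patterns derived from the log excerpts quoted in this article.
FATAL = re.compile(
    r"MPI Application rank \d+ exited before MPI_Finalize\(\) with status (-?\d+)"
)
NODE_ERROR = re.compile(r"^Error at Node \d+: (.*)")


def triage(log_text, lookback=40):
    """Return (status, messages) for the first MPI exit line in the log.

    `messages` holds the distinct "Error at Node N:" texts found in the
    `lookback` lines before the exit line. Returns (None, []) if the log
    contains no MPI exit line.
    """
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        hit = FATAL.search(line)
        if not hit:
            continue
        messages = []
        for earlier in lines[max(0, i - lookback): i + 1]:
            err = NODE_ERROR.match(earlier.strip())
            if err and err.group(1).strip() not in messages:
                messages.append(err.group(1).strip())
        return int(hit.group(1)), messages
    return None, []


sample = """Error at Node 3: Update-Dynamic-Mesh failed. Negative cell volume detected.
WARNING: 2 cells with non-positive volume detected.
MPI Application rank 0 exited before MPI_Finalize() with status 2
"""
status, messages = triage(sample)
if status is None:
    print("no MPI exit line found")
elif messages:
    print(f"status {status}; reported errors:", *messages, sep="\n  ")
else:
    print(f"status {status}; no error reported beforehand, check memory usage")
```

Identical messages from several nodes are collapsed into one entry, which keeps the Courant-number case above from printing the same text four times.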


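There is one more case. During a parallel computation the compute processes communicate with each other continuously, so if the network environment changes while the computation is running, the run fails immediately. The figure below shows the communication during a computation. Network changes such as a different IP address or modified network adapter properties are not permitted while the computation is running.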






