Description
nic_link_down_check reports the following:
Alert:
Link on NIC vmnic[x] of host [x.x.x.x] is down.
NCC failure:
FAIL: One or more NICs is/are down on host x.x.x.x
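Solution
When a NIC goes down, its status is normally kept in the check_cvm_health_job_state.json file for 24 hours.
In NCC 4.3.0 and later, nic_link_down_check checks the NIC status and raises an alert. If the NIC is still down after 24 hours and has not come back up in the meantime, its entry is removed from check_cvm_health_job_state.json, and later runs of nic_link_down_check no longer alert on it.
In NCC versions earlier than 4.3.0, once a NIC goes down, nic_link_down_check keeps firing even if the alert is resolved manually in Prism, and the alert is raised again within 24 hours.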
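To see whether a recorded entry has passed the 24-hour mark, its Timestamp can be compared with the current time. The helper below is a hypothetical POSIX shell sketch (is_older_than_24h is not a Nutanix tool). It assumes Timestamp is Unix epoch seconds, which the sample values in the output below (for example 1433865608) are consistent with. Whether NCC measures its 24-hour window from this exact field is also an assumption.

```shell
#!/bin/sh
# Hypothetical helper, not part of NCC.
# $1: Timestamp from check_cvm_health_job_state.json (assumed epoch seconds)
# $2: optional "now" in epoch seconds; defaults to the current time
# Returns success (0) if at least 24 hours (86400 s) have passed since $1.
is_older_than_24h() {
  now=${2:-$(date +%s)}
  [ $(( now - $1 )) -ge 86400 ]
}
```

Example: `is_older_than_24h 1433865608 && echo "older than 24 hours"`. On a CVM, GNU `date -d @1433865608` shows the same Timestamp as a readable date.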
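Handle the alert according to the NCC version in use:
NCC 4.3.0 and later: acknowledge the alert manually. It will not be raised again.
NCC earlier than 4.3.0: upgrade NCC to the latest version, then acknowledge the alert manually.
If NCC is earlier than 4.3.0 and cannot be upgraded, use the following workaround:
- Check and confirm the NIC status manually with this command: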
nutanix@cvm$ allssh cat /home/nutanix/data/serviceability/check_cvm_health_job_state.json
The output looks similar to the following:
Executing cat /home/nutanix/data/serviceability/check_cvm_health_job_state.json on the cluster
================== x.x.x.x =================
FIPS mode initialized
Nutanix Controller VM
{
"eth3": {
"Status": "Up",
"Timestamp": 1433865608
},
"eth2": {
"Status": "Up",
"Timestamp": 1433865608
},
"eth0": {
"Status": "Down",
"Timestamp": 1433865608
}
}================== x.x.x.x =================
FIPS mode initialized
Nutanix Controller VM
{
"eth3": {
"Status": "Up",
"Timestamp": 1433869207
},
"eth2": {
"Status": "Up",
"Timestamp": 1433869207
},
"eth0": {
"Status": "Down",
"Timestamp": 1433869207
}
}================== x.x.x.x =================
FIPS mode initialized
Nutanix Controller VM
{
"eth3": {
"Status": "Up",
"Timestamp": 1433869208
},
"eth2": {
"Status": "Up",
"Timestamp": 1433869208
},
"eth0": {
"Status": "Down",
"Timestamp": 1433869208
}
}
- Delete the check_cvm_health_job_state.json file on all CVMs:
nutanix@cvm$ allssh /bin/rm /home/nutanix/data/serviceability/check_cvm_health_job_state.json
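Before deleting the file, the Down entries can be extracted on a single CVM instead of read by eye from the allssh output. The function below is a hypothetical sketch (list_down_nics is not part of NCC). It assumes the one-key-per-line layout shown in the sample output and that each NIC's "Status" line comes before its "Timestamp" line. It is a line-oriented awk filter, not a general JSON parser.

```shell
#!/bin/sh
# Hypothetical helper, not part of NCC.
# Prints "<nic> <timestamp>" for every NIC whose Status is "Down".
# $1: optional path; defaults to the state file used in the commands above.
# Assumes one key per line and "Status" preceding "Timestamp" in each NIC object.
list_down_nics() {
  awk -F'"' '
    # A line such as  "eth0": {  opens a NIC object; remember its name.
    /^[ \t]*"[^"]+":[ \t]*[{]/ { nic = $2; status = "" }
    # For  "Status": "Down",  the value is field 4 when splitting on quotes.
    $2 == "Status" { status = $4 }
    # For  "Timestamp": 1433865608  keep only the digits of field 3.
    $2 == "Timestamp" {
      ts = $3
      gsub(/[^0-9]/, "", ts)
      if (status == "Down") print nic, ts
    }
  ' "${1:-/home/nutanix/data/serviceability/check_cvm_health_job_state.json}"
}
```

For the first CVM in the sample output, running `list_down_nics` would print `eth0 1433865608`.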