A Collection of 19 Everyday K8S Troubleshooting Cases

Problem 1: Access to a K8S cluster service fails?



curl: (60) Peer's Certificate issuer is not recognized.

More details here:

curl performs SSL certificate verification by default, using a "bundle"
 of Certificate Authority (CA) public keys (CA certs). If the default
 bundle file isn't adequate, you can specify an alternate file
 using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
 the bundle, the certificate verification probably failed due to a
 problem with the certificate (it might be expired, or the name might
 not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
 the -k (or --insecure) option.

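Cause: the certificate is not recognized, for example because it is a custom (self-signed) certificate or because it has expired.

Solution: renew or replace the certificate.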
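Before replacing a certificate it can help to confirm who issued it and whether it has really expired. The helpers below are a minimal sketch that assumes `openssl` is installed and that the certificate is available as a PEM file; the function names and the file path passed to them are hypothetical, not taken from the original case.

```shell
# Print issuer, subject and validity window of a PEM certificate.
# $1 = path to the certificate file (placeholder supplied by the caller)
cert_info() {
    openssl x509 -in "$1" -noout -issuer -subject -dates
}

# Succeed (exit 0) only if the certificate stays valid for at least
# $2 more seconds; with no second argument it checks "valid right now".
cert_not_expired() {
    openssl x509 -in "$1" -noout -checkend "${2:-0}" >/dev/null
}
```

For example, `cert_info ca.crt` shows the issuer and dates, and `cert_not_expired ca.crt 86400` fails if the certificate expires within a day. If the certificate is valid but signed by a private CA, passing that CA to curl with `--cacert`, as the curl message suggests, may be enough.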

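Problem 2: Access to a K8S cluster service fails?

curl: (7) :3000; Connection refused

Cause: the port mapping is wrong. The application itself is running normally, but it cannot be reached through the service.

Solution: delete the svc, then map the port again.

kubectl delete svc nginx-deployment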
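The delete-and-remap step can be sketched as one helper. It assumes the service was created from a deployment of the same name; the function name and the port values passed to it are placeholders, since the original does not state which container port the application listens on.

```shell
# Re-create the service for a deployment with an explicit port mapping.
# $1 = deployment/service name
# $2 = port the service should listen on
# $3 = port the container actually listens on (assumption: caller knows it)
reexpose() {
    kubectl delete svc "$1" --ignore-not-found
    kubectl expose deployment "$1" --port="$2" --target-port="$3"
}
```

A call such as `reexpose nginx-deployment 3000 80` would map service port 3000 to container port 80; the value 80 is only an illustration for an nginx container.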

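Problem 3: Exposing a K8S cluster service fails?

Error from server (AlreadyExists): services "nginx-deployment" already exists

Cause: the workload has already been exposed, so a service with this name already exists.

Solution: delete the svc, then map the port again.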

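Problem 4: The service provided by the K8S cluster cannot be reached from outside the cluster?

Cause: the service type is ClusterIP, so the service is not exposed outside the cluster.

Solution: change the service type to NodePort; the service can then be reached through any node of the K8S cluster.

kubectl edit svc nginx-deployment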
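As a non-interactive alternative to `kubectl edit`, the same change can be made with a patch. This is a sketch under the assumption that the service has a single port; the function name is hypothetical.

```shell
# Switch a service to type NodePort and print the node port that was assigned.
# $1 = service name
to_nodeport() {
    kubectl patch svc "$1" -p '{"spec":{"type":"NodePort"}}'
    # Reads the first port entry only (assumption: one port per service).
    kubectl get svc "$1" -o jsonpath='{.spec.ports[0].nodePort}'
}
```

After `to_nodeport nginx-deployment`, the service should answer on the printed port at the IP of any cluster node.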

Problem 5: pod status is ErrImagePull?

readiness-httpget-pod   0/1   ErrImagePull   0   10s

Cause: the image cannot be pulled:

Warning  Failed  59m (x4 over 61m)  kubelet, k8s-node01  Error: ErrImagePull

Solution: switch to an image that can be pulled.
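To see why the pull fails and then swap the image, something like the following may be useful. It is a sketch only: the function names are hypothetical, and the replacement image is whatever reference the caller knows to be pullable from the node.

```shell
# List the events recorded for one pod; pull failures show up here.
# $1 = pod name
pod_events() {
    kubectl get events --field-selector "involvedObject.name=$1"
}

# Point one container of an existing pod at a different image.
# $1 = pod name, $2 = container name, $3 = new image reference
swap_pod_image() {
    kubectl set image "pod/$1" "$2=$3"
}
```

For the pod above this might look like `swap_pod_image readiness-httpget-pod readiness-httpget-container <image>`, where `<image>` is left for the reader to fill in.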

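Problem 6: After creating init containers (initC), the pod status is abnormal?

NAME        READY   STATUS     RESTARTS   AGE
myapp-pod   0/1     Init:0/2   0          20s

Cause: the logs show that the pod is stuck in initialization; the pod details then pinpoint why creation fails: the init containers have not finished running.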

Error from server (BadRequest): container "myapp-container" in pod "myapp-pod" is waiting to start: PodInitializing

waiting for myservice

Server:    10.96.0.10
Address:   10.96.0.10:53

** server can't find ...: NXDOMAIN

*** Can't find ...: No answer
*** Can't find ...: No answer
*** Can't find ...: No answer
*** Can't find ...: No answer
*** Can't find ...: No answer

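Solution: create the corresponding service so that its name is registered in the cluster's CoreDNS server; CoreDNS can then resolve the domain name that the pod's init containers look up while they run.

NAME        READY   STATUS            RESTARTS   AGE

myapp-pod   0/1     Init:1/2          0          27m
myapp-pod   0/1     PodInitializing   0          28m
myapp-pod   1/1     Running           0          28m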
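Creating the missing service and waiting for the pod can be sketched as below. The service name `myservice` comes from the log output; the port number, the timeout and the function name are assumptions for illustration, and a pod with two init containers may need one such service per name it waits for.

```shell
# Create a ClusterIP service so its name resolves in CoreDNS,
# then wait until the pod reports Ready.
# $1 = service name, $2 = port (assumed identical for port and targetPort),
# $3 = pod name
unblock_init() {
    kubectl create service clusterip "$1" --tcp="$2:$2"
    kubectl wait --for=condition=Ready "pod/$3" --timeout=120s
}
```

A call such as `unblock_init myservice 80 myapp-pod` returns once the pod is Ready, or fails when the timeout expires.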

Problem 7: A pod with a health probe is in CrashLoopBackOff?

readiness-httpget-pod   0/1   CrashLoopBackOff   1   13s
readiness-httpget-pod   0/1   Completed          2   20s
readiness-httpget-pod   0/1   CrashLoopBackOff   2   31s
readiness-httpget-pod   0/1   Completed          3   42s
readiness-httpget-pod   0/1   CrashLoopBackOff   3   53s

Cause: a problem with the image; the container keeps exiting and fails to restart.

Events:
  Type     Reason     Age                From                 Message
  ----     ------     ----               ----                 -------
  Normal   Pulling    56m                kubelet, k8s-node01  Pulling image "/library/mylandmarktech/myapp:v1"
  Normal   Pulled     56m                kubelet, k8s-node01  Successfully pulled image "/library/mylandmarktech/myapp:v1"
  Normal   Created    56m (x3 over 56m)  kubelet, k8s-node01  Created container readiness-httpget-container
  Normal   Started    56m (x3 over 56m)  kubelet, k8s-node01  Started container readiness-httpget-container
  Normal   Pulled     56m (x2 over 56m)  kubelet, k8s-node01  Container image "/library/mylandmarktech/myapp:v1" already present on machine
  Warning  Unhealthy  56m                kubelet, k8s-node01  Readiness probe failed: Get
                      (x4 over 56m)      kubelet, k8s-node01  Back-off restarting failed container
  Normal   Scheduled  50s                default-scheduler    Successfully assigned default/readiness-httpget-pod to k8s-node01

Solution: switch to a different image.

Problem 8: Pod creation fails?

readiness-httpget-pod   0/1   Pending             0   0s
readiness-httpget-pod   0/1   Pending             0   0s
readiness-httpget-pod   0/1   ContainerCreating   0   0s
readiness-httpget-pod   0/1   Error               0   2s
readiness-httpget-pod   0/1   Error               1   3s
readiness-httpget-pod   0/1   CrashLoopBackOff    1   4s
readiness-httpget-pod   0/1   Error               2   15s
readiness-httpget-pod   0/1   CrashLoopBackOff    2   26s
readiness-httpget-pod   0/1   Error               3   37s
readiness-httpget-pod   0/1   CrashLoopBackOff    3   52s
readiness-httpget-pod   0/1   Error               4   82s

Cause: a problem with the image prevents the container from starting.

[root@k8s-master01 ~] 30: *1 open() "/usr/share/nginx/html/" failed (2: No such file or directory), client: 10.244.2.1, server: localhost, request: "GET //1.1", host: "10.244.2.25:80"
10.244.2.1 - - [11/Jun/2021:07:10:14 +0000] "GET //1.1" 404 153 "-" "kube-probe/1.15" "-"
10.244.2.1 - - [11/Jun/2021:07:10:17 +0000] "GET //1.1" 404 153 "-" "kube-probe/1.15" "-"

Events:
  Type     Reason     Age                   From                 Message
  ----     ------     ----                  ----                 -------
  Normal   Pulled     64m                   kubelet, k8s-node01  Container image "/library/nginx" already present on machine
  Normal   Created    64m                   kubelet, k8s-node01  Created container readiness-httpget-container
  Normal   Started    64m                   kubelet, k8s-node01  Started container readiness-httpget-container
  Warning  Unhealthy  59m (x101 over 64m)   kubelet, k8s-node01  Readiness probe failed: HTTP probe failed with statuscode: 404
  Normal   Scheduled  8m16s                 default-scheduler    Successfully assigned default/readiness-httpget-pod to k8s-node01

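Solution: enter the container and create the resource that the YAML definition refers to, that is, the file the readiness probe requests.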
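Creating the probed file inside the running container might look like the sketch below. The file name is elided in the log above, so the path is passed in as an argument; the function name and the `ok` file content are illustrative assumptions.

```shell
# Create a small file inside the pod's container so that the HTTP
# readiness probe stops receiving 404.
# $1 = pod name
# $2 = absolute path of the file the probe requests (placeholder)
create_probe_file() {
    # The path is handed to the inner shell as its own $1, which avoids
    # quoting problems if the path contains spaces.
    kubectl exec "$1" -- sh -c 'echo ok > "$1"' sh "$2"
}
```

A file created this way disappears when the container is recreated, so for a lasting fix the file would have to be part of the image or the probe path adjusted in the YAML.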

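Problem 10: Pod creation fails?

error: error validating "": error validating data: ValidationError([0]):: got "string", expected "map"; if you choose to ignore these errors, turn validation off with --validate=false

Cause: the content of the yml file is wrong: it contains Chinese (full-width) characters.

Solution: correct the myregistrykey content.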

Problem 11: The status of the kube-flannel-ds-amd64-ndsf7 plugin pod is Init:0/1?

Troubleshooting approach: kubectl -n kube-system describe pod kube-flannel-ds-amd64-ndsf7

helm install
Error: This command needs 1 argument: chart name

[root@k8s-master01 hello-world] helm upgrade joyous-wasp ./
UPGRADE FAILED
ROLLING BACK
Error: render error in "hello-world/templates/": template: hello-world/templates/:14:35: executing "hello-world/templates/": can't evaluate field image in type interface {}
Error: UPGRADE FAILED: render error in "hello-world/templates/": template: hello-world/templates/:14:35: executing "hello-world/templates/": can't evaluate field image in type interface {}

Cause: the yaml file is wrong; according to the error, the template at line 14 refers to a field image that cannot be evaluated.

Solution: fix the yaml file.

Problem 21: etcd fails to start?

[root@k8s-master01 ~] systemctl status etcd
● : loaded (/usr/lib/systemd/system/; enabled; vendor preset: disabled)
   Active: activating (start) since Wed 2021-07-14 09:53:03 CST; 1min 6s ago
     Docs: (etcd)
   CGroup: //
           └─39692 /usr/local/bin/etcd --config-file=/etc/etcd/
:54:09 k8s-master01 etcd[39692]: rejected connection from "192.168.0.108:46168" (error "remote error: tls: bad certificate", ServerName "")
Jul 14 09:54:09 k8s-master01 etcd[39692]: rejected connection from "192.168.0.108:46166" (error "remote error: tls: bad certificate", ServerName "")
Jul 14 09:54:09 k8s-master01 etcd[39692]: rejected connection from "192.168.0.108:46170" (error "remote error: tls: bad certificate", ServerName "")
Jul 14 09:54:09 k8s-master01 etcd[39692]: rejected connection from "192.168.0.108:46172" (error "remote error: tls: bad certificate", ServerName "")
Jul 14 09:54:09 k8s-master01 etcd[39692]: rejected connection from "192.168.0.108:46176" (error "remote error: tls: bad certificate", ServerName "")
Jul 14 09:54:09 k8s-master01 etcd[39692]: rejected connection from "192.168.0.108:46174" (error "remote error: tls: bad certificate", ServerName "")
Jul 14 09:54:09 k8s-master01 etcd[39692]: rejected connection from "192.168.0.108:46178" (error "remote error: tls: bad certificate", ServerName "")
Jul 14 09:54:09 k8s-master01 etcd[39692]: rejected connection from "192.168.0.108:46180" (error "remote error: tls: bad certificate", ServerName "")
Jul 14 09:54:10 k8s-master01 etcd[39692]: rejected connection from "192.168.0.108:46182" (error "remote error: tls: bad certificate", ServerName "")
Jul 14 09:54:10 k8s-master01 etcd[39692]: rejected connection from "192.168.0.108:46186" (error "remote error: tls: bad certificate", ServerName "")

Solution: kill the process occupying port 2379, then restart etcd.
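The kill-and-restart step could be scripted roughly as follows. This sketch assumes a systemd host with `ss` (iproute2) and `fuser` (psmisc) installed; the function name is hypothetical. Since the log excerpt reports TLS errors rather than a bind failure, it seems prudent to look at the `ss` output first and confirm that a stale process really holds the port.

```shell
# Show which process listens on the given TCP port, kill it,
# then restart etcd and print its status.
# $1 = port number (2379 in this case)
free_port_and_restart_etcd() {
    ss -lntp "sport = :$1"        # who is listening on the port?
    fuser -k "$1/tcp"             # kill that process (sends SIGKILL by default)
    systemctl restart etcd
    systemctl status etcd --no-pager
}
```

Usage would be `free_port_and_restart_etcd 2379`, run as root on the affected master node.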

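Problem 22: A svc used as a reverse proxy fails on cross-domain access?

Connecting to externalname (183.232.231.172:80)
wget: server returned error: HTTP/1.1 403 Forbidden

Cause: the pod's cross-domain request was refused by Baidu.

Solution: adjust the access policy (details omitted).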

Published on 2025-04-18