Pod Health Checks


1. Concept

Health checks come in two flavors: Liveness and Readiness. The two probes share exactly the same configuration and parameters; they differ only in what happens when a probe fails. A liveness failure restarts the container, while a readiness failure marks the container as unavailable (not ready to receive traffic).

2. Example

fxx2@kube-node-1:~/yaml$ cat liveness.yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness
spec:
  restartPolicy: OnFailure  # Always: restart the pod no matter how it terminated;
                            # OnFailure: restart only when the pod exits with a non-zero code;
                            # Never: never restart the pod's containers.
  containers:
  - name: liveness
    image: busybox:latest
    imagePullPolicy: IfNotPresent
    args:
    - /bin/sh
    - -c
    - touch /tmp/liveness; sleep 10; rm -rf /tmp/liveness; sleep 30;
    readinessProbe:  # or livenessProbe
      exec:
        command:
        - cat
        - /tmp/liveness
      initialDelaySeconds: 5  # start probing 5s after the container starts
      periodSeconds: 5        # probe every 5s

3. Application in Scale-Up

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deploy
spec:
  selector:
    matchLabels:
      app: web_server
  revisionHistoryLimit: 4
  replicas: 2
  template:
    metadata:
      labels:
        app: web_server
    spec:
      containers:
      - name: nginx
        image: nginx:1.16.1
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 80
        readinessProbe:
          httpGet:        # the probe succeeds if the HTTP response code is in [200, 400)
            scheme: HTTP  # protocol: HTTP or HTTPS
            path: /path   # request path
            port: 80      # request port
          initialDelaySeconds: 10
          periodSeconds: 5

4. Application in Rolling Updates

fxx2@kube-node-1:~/yaml$ cat check.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deploy
spec:
  selector:
    matchLabels:
      app: web_server
  revisionHistoryLimit: 4
  replicas: 6
  template:
    metadata:
      labels:
        app: web_server
    spec:
      containers:
      - name: nginx
        image: nginx:1.16.1
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 80
        args:
        - /bin/sh
        - -c
        - sleep 5; touch /tmp/liveness; sleep 10000;
        readinessProbe:
          exec:
            command:
            - cat
            - /tmp/liveness
          initialDelaySeconds: 5
          periodSeconds: 5
fxx2@kube-node-1:~/yaml$ kubectl apply -f check.yaml
fxx2@kube-node-1:~/yaml$ kubectl get deploy nginx-deploy
NAME           READY   UP-TO-DATE   AVAILABLE   AGE
nginx-deploy   6/6     6            6           21m
fxx2@kube-node-1:~$ kubectl get pod
NAME                            READY   STATUS    RESTARTS   AGE
nginx-deploy-745ff9dbf7-45xt2   1/1     Running   0          19m
nginx-deploy-745ff9dbf7-55dsn   1/1     Running   0          19m
nginx-deploy-745ff9dbf7-5pcqt   1/1     Running   0          19m
nginx-deploy-745ff9dbf7-c2bhx   1/1     Running   0          19m
nginx-deploy-745ff9dbf7-m7sc9   1/1     Running   0          19m
nginx-deploy-745ff9dbf7-zpthj   1/1     Running   0          19m

Next, perform a rolling update with a faulty configuration file:

fxx2@kube-node-1:~/yaml$ cat check-error.yaml
......
    spec:
      containers:
      - name: nginx
        image: nginx:1.16.1
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 80
        args:
        - /bin/sh
        - -c
        - sleep 50000;
        readinessProbe:
          exec:
            command:
            - cat
            - /tmp/liveness
          initialDelaySeconds: 5
          periodSeconds: 5
fxx2@kube-node-1:~/yaml$ kubectl get deploy nginx-deploy
NAME           READY   UP-TO-DATE   AVAILABLE   AGE
nginx-deploy   5/6     3            5           27m
fxx2@kube-node-1:~/yaml$ kubectl get pod
NAME                            READY   STATUS    RESTARTS   AGE
nginx-deploy-64fcc74d96-cspd6   0/1     Running   0          99s
nginx-deploy-64fcc74d96-hjx4r   0/1     Running   0          99s
nginx-deploy-64fcc74d96-sk9db   0/1     Running   0          99s
nginx-deploy-745ff9dbf7-45xt2   1/1     Running   0          27m
nginx-deploy-745ff9dbf7-5pcqt   1/1     Running   0          27m
nginx-deploy-745ff9dbf7-c2bhx   1/1     Running   0          27m
nginx-deploy-745ff9dbf7-m7sc9   1/1     Running   0          27m
nginx-deploy-745ff9dbf7-zpthj   1/1     Running   0          27m

From the results above we can conclude:

The new pods never pass the readiness probe, so they stay in the not-ready state indefinitely;

The health check screens out the defective pods and keeps most of the old, working pods, so the service is not affected.

Related fields:

READY (5/6): the desired state is 6 ready pods, of which 5 currently are;

UP-TO-DATE (3): 3 new pods have been created;

AVAILABLE (5): the number of pods currently ready, i.e. the old pods.

As for why three new pods were created at once while only one old pod was destroyed, these counts are determined by the maxSurge and maxUnavailable parameters.

maxSurge: caps how far the total replica count may exceed the desired (READY) count during a rolling update. The value can be an absolute integer or a percentage; the default is 25%, rounded up.

In the example above, with 6 desired replicas the maximum pod count is 6 + 6 * 25% = 8 (rounded up), which is the "current" value shown by older versions of Kubernetes.

maxUnavailable: caps the fraction of the desired count that may be unavailable during a rolling update. The value can be an absolute integer or a percentage; the default is 25%, rounded down.

In the example above, with 6 desired replicas at least 6 - (6 * 25%) = 5 replicas must remain available (rounded down).
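The rounding rules above can be sketched in a few lines of Python (a minimal illustration; the function name is made up here, but the round-up for maxSurge and round-down for maxUnavailable match how the Deployment controller resolves percentage values):

```python
import math

def rolling_update_bounds(replicas, max_surge="25%", max_unavailable="25%"):
    """Compute the pod-count bounds enforced during a rolling update.

    Percentages are resolved against the desired replica count:
    maxSurge rounds up, maxUnavailable rounds down.
    Returns (max total pods, min available pods).
    """
    def resolve(value, round_up):
        if isinstance(value, str) and value.endswith("%"):
            fraction = int(value[:-1]) / 100 * replicas
            return math.ceil(fraction) if round_up else math.floor(fraction)
        return int(value)  # absolute integer value

    surge = resolve(max_surge, round_up=True)
    unavailable = resolve(max_unavailable, round_up=False)
    return replicas + surge, replicas - unavailable

print(rolling_update_bounds(6))  # prints (8, 5), as in the example above
```

With 6 replicas and the 25% defaults, 6 * 25% = 1.5, so maxSurge resolves to 2 (ceil) and maxUnavailable to 1 (floor), giving the bounds 8 and 5 seen in the walkthrough.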

The update proceeds as follows:

1. Create two new pods, bringing the total pod count to 8;

2. Destroy one old pod, bringing the available pod count down to 5;

3. After a pod is destroyed, create another new pod, keeping the total at 8;

4. Once a new pod passes its readiness probe, the ready count exceeds 5, so another old pod is destroyed, bringing it back to 5;

5. Destroying an old pod drops the total below 8, so another new replica is created;

6. The detailed sequence can be inspected with the describe command.
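Both parameters can also be set explicitly instead of relying on the 25% defaults. A sketch of the relevant Deployment fields (the concrete values here are chosen for illustration, matching the 6-replica example):

```yaml
spec:
  replicas: 6
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2        # total pods may reach 6 + 2 = 8 during the update
      maxUnavailable: 1  # at least 6 - 1 = 5 pods must stay available
```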
