@i-pushkin

Kubernetes: how do I recover a master node together with etcd?

The cluster was deployed with kubespray. One of the three master nodes died. Simply re-running cluster.yml did not help, so I decided to try recover-control-plane.yml, prepared as described in the documentation.
The playbook now fails right at the start:
ansible-playbook -i inventory/dev-tickeron/inventory.ini --private-key ~/.ssh/id_rsa --user tickeradmin --become --become-user=root --limit etcd,kube_control_plane -e etcd_retries=50 recover-control-plane.yml -K
BECOME password: 

PLAY [localhost] ***************************************************************
skipping: no hosts matched
[WARNING]: Could not match supplied host pattern, ignoring: kube-master

PLAY [Add kube-master nodes to kube_control_plane] *****************************
skipping: no hosts matched
[WARNING]: Could not match supplied host pattern, ignoring: kube-node

PLAY [Add kube-node nodes to kube_node] ****************************************
skipping: no hosts matched
[WARNING]: Could not match supplied host pattern, ignoring: k8s-cluster

PLAY [Add k8s-cluster nodes to k8s_cluster] ************************************
skipping: no hosts matched
[WARNING]: Could not match supplied host pattern, ignoring: calico-rr

PLAY [Add calico-rr nodes to calico_rr] ****************************************
skipping: no hosts matched
[WARNING]: Could not match supplied host pattern, ignoring: no-floating

PLAY [Add no-floating nodes to no_floating] ************************************
skipping: no hosts matched
[WARNING]: Could not match supplied host pattern, ignoring: bastion

PLAY [bastion[0]] **************************************************************
skipping: no hosts matched

PLAY [etcd[0]] *****************************************************************
Tuesday 21 September 2021  17:31:07 +0400 (0:00:00.170)       0:00:00.171 ***** 
Tuesday 21 September 2021  17:31:07 +0400 (0:00:00.184)       0:00:00.355 ***** 
Tuesday 21 September 2021  17:31:07 +0400 (0:00:00.089)       0:00:00.445 ***** 
Tuesday 21 September 2021  17:31:08 +0400 (0:00:00.098)       0:00:00.543 ***** 
Tuesday 21 September 2021  17:31:08 +0400 (0:00:00.089)       0:00:00.633 ***** 
Tuesday 21 September 2021  17:31:08 +0400 (0:00:00.100)       0:00:00.733 ***** 
Tuesday 21 September 2021  17:31:08 +0400 (0:00:00.092)       0:00:00.825 ***** 
Tuesday 21 September 2021  17:31:08 +0400 (0:00:00.085)       0:00:00.911 ***** 
Tuesday 21 September 2021  17:31:08 +0400 (0:00:00.098)       0:00:01.010 ***** 
Tuesday 21 September 2021  17:31:08 +0400 (0:00:00.088)       0:00:01.098 ***** 
Tuesday 21 September 2021  17:31:08 +0400 (0:00:00.085)       0:00:01.183 ***** 
Tuesday 21 September 2021  17:31:08 +0400 (0:00:00.094)       0:00:01.278 ***** 
Tuesday 21 September 2021  17:31:08 +0400 (0:00:00.087)       0:00:01.366 ***** 
Tuesday 21 September 2021  17:31:08 +0400 (0:00:00.085)       0:00:01.451 ***** 
Tuesday 21 September 2021  17:31:09 +0400 (0:00:00.095)       0:00:01.546 ***** 
Tuesday 21 September 2021  17:31:09 +0400 (0:00:00.083)       0:00:01.630 ***** 
Tuesday 21 September 2021  17:31:09 +0400 (0:00:00.084)       0:00:01.715 ***** 
Tuesday 21 September 2021  17:31:11 +0400 (0:00:01.958)       0:00:03.674 ***** 

TASK [kubespray-defaults : Configure defaults] *********************************
ok: [dev-kube-master01.tickeron.local] => {
    "msg": "Check roles/kubespray-defaults/defaults/main.yml"
}
Tuesday 21 September 2021  17:31:11 +0400 (0:00:00.101)       0:00:03.775 ***** 
Tuesday 21 September 2021  17:31:11 +0400 (0:00:00.488)       0:00:04.264 ***** 

TASK [kubespray-defaults : create fallback_ips_base] ***************************
ok: [dev-kube-master01.tickeron.local]
Tuesday 21 September 2021  17:31:11 +0400 (0:00:00.186)       0:00:04.450 ***** 

TASK [kubespray-defaults : set fallback_ips] ***********************************
ok: [dev-kube-master01.tickeron.local]
Tuesday 21 September 2021  17:31:12 +0400 (0:00:00.120)       0:00:04.570 ***** 
Tuesday 21 September 2021  17:31:12 +0400 (0:00:00.092)       0:00:04.663 ***** 
Tuesday 21 September 2021  17:31:12 +0400 (0:00:00.090)       0:00:04.754 ***** 

TASK [recover_control_plane/etcd : Get etcd endpoint health] *******************
fatal: [dev-kube-master01.tickeron.local]: FAILED! => {"changed": false, "cmd": ["/usr/local/bin/etcdctl", "endpoint", "health"], "delta": "0:00:05.150947", "end": "2021-09-21 13:31:24.423697", "msg": "non-zero return code", "rc": 1, "start": "2021-09-21 13:31:19.272750", "stderr": "{\"level\":\"warn\",\"ts\":\"2021-09-21T13:31:24.381Z\",\"caller\":\"clientv3/retry_interceptor.go:62\",\"msg\":\"retrying of unary invoker failed\",\"target\":\"endpoint://client-37938514-2fd4-4b88-99b0-6f931720b396/192.168.11.242:2379\",\"attempt\":0,\"error\":\"rpc error: code = DeadlineExceeded desc = latest balancer error: all SubConns are in TransientFailure, latest connection error: connection error: desc = \\\"transport: Error while dialing dial tcp 192.168.11.242:2379: connect: connection refused\\\"\"}\n{\"level\":\"warn\",\"ts\":\"2021-09-21T13:31:24.406Z\",\"caller\":\"clientv3/retry_interceptor.go:62\",\"msg\":\"retrying of unary invoker failed\",\"target\":\"endpoint://client-e0bfc18d-af74-467d-b373-f2d40732edf8/192.168.11.200:2379\",\"attempt\":0,\"error\":\"rpc error: code = DeadlineExceeded desc = latest balancer error: all SubConns are in TransientFailure, latest connection error: connection error: desc = \\\"transport: authentication handshake failed: x509: certificate signed by unknown authority\\\"\"}\n{\"level\":\"warn\",\"ts\":\"2021-09-21T13:31:24.421Z\",\"caller\":\"clientv3/retry_interceptor.go:62\",\"msg\":\"retrying of unary invoker failed\",\"target\":\"endpoint://client-b56b6e2a-36ef-4628-aa47-2dd60637275b/192.168.11.219:2379\",\"attempt\":0,\"error\":\"rpc error: code = DeadlineExceeded desc = latest balancer error: all SubConns are in TransientFailure, latest connection error: connection error: desc = \\\"transport: authentication handshake failed: x509: certificate signed by unknown authority\\\"\"}\nhttps://192.168.11.242:2379 is unhealthy: failed to commit proposal: context deadline exceeded\nhttps://192.168.11.200:2379 is unhealthy: failed to commit 
proposal: context deadline exceeded\nhttps://192.168.11.219:2379 is unhealthy: failed to commit proposal: context deadline exceeded\nError: unhealthy cluster", "stderr_lines": ["{\"level\":\"warn\",\"ts\":\"2021-09-21T13:31:24.381Z\",\"caller\":\"clientv3/retry_interceptor.go:62\",\"msg\":\"retrying of unary invoker failed\",\"target\":\"endpoint://client-37938514-2fd4-4b88-99b0-6f931720b396/192.168.11.242:2379\",\"attempt\":0,\"error\":\"rpc error: code = DeadlineExceeded desc = latest balancer error: all SubConns are in TransientFailure, latest connection error: connection error: desc = \\\"transport: Error while dialing dial tcp 192.168.11.242:2379: connect: connection refused\\\"\"}", "{\"level\":\"warn\",\"ts\":\"2021-09-21T13:31:24.406Z\",\"caller\":\"clientv3/retry_interceptor.go:62\",\"msg\":\"retrying of unary invoker failed\",\"target\":\"endpoint://client-e0bfc18d-af74-467d-b373-f2d40732edf8/192.168.11.200:2379\",\"attempt\":0,\"error\":\"rpc error: code = DeadlineExceeded desc = latest balancer error: all SubConns are in TransientFailure, latest connection error: connection error: desc = \\\"transport: authentication handshake failed: x509: certificate signed by unknown authority\\\"\"}", "{\"level\":\"warn\",\"ts\":\"2021-09-21T13:31:24.421Z\",\"caller\":\"clientv3/retry_interceptor.go:62\",\"msg\":\"retrying of unary invoker failed\",\"target\":\"endpoint://client-b56b6e2a-36ef-4628-aa47-2dd60637275b/192.168.11.219:2379\",\"attempt\":0,\"error\":\"rpc error: code = DeadlineExceeded desc = latest balancer error: all SubConns are in TransientFailure, latest connection error: connection error: desc = \\\"transport: authentication handshake failed: x509: certificate signed by unknown authority\\\"\"}", "https://192.168.11.242:2379 is unhealthy: failed to commit proposal: context deadline exceeded", "https://192.168.11.200:2379 is unhealthy: failed to commit proposal: context deadline exceeded", "https://192.168.11.219:2379 is unhealthy: failed to commit 
proposal: context deadline exceeded", "Error: unhealthy cluster"], "stdout": "", "stdout_lines": []}
...ignoring
Tuesday 21 September 2021  17:31:24 +0400 (0:00:12.357)       0:00:17.112 ***** 

TASK [recover_control_plane/etcd : Set healthy fact] ***************************
ok: [dev-kube-master01.tickeron.local]
Tuesday 21 September 2021  17:31:24 +0400 (0:00:00.265)       0:00:17.378 ***** 

TASK [recover_control_plane/etcd : Set has_quorum fact] ************************
ok: [dev-kube-master01.tickeron.local]
Tuesday 21 September 2021  17:31:25 +0400 (0:00:00.172)       0:00:17.550 ***** 
included: /home/pipneogen/Documents/Work/kubespray/roles/recover_control_plane/etcd/tasks/recover_lost_quorum.yml for dev-kube-master01.tickeron.local
Tuesday 21 September 2021  17:31:25 +0400 (0:00:00.180)       0:00:17.731 *****
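The fatal task above actually shows two different errors: the dead node (192.168.11.242) refuses the TCP connection outright, while the two surviving members (192.168.11.200 and .219) reject the TLS handshake with "certificate signed by unknown authority", i.e. etcdctl is being invoked without (or with the wrong) etcd CA and client cert. To narrow it down, the same health check can be run by hand on a surviving master. The cert paths below are kubespray defaults and an assumption; adjust to your layout. The etcdctl call is guarded so the snippet is copy-safe on a machine without etcdctl:

```shell
# Hedged manual check, to be run on a surviving control-plane node.
# Cert paths are kubespray defaults (/etc/ssl/etcd/ssl) - an assumption,
# verify what actually exists on your nodes.
export ETCDCTL_API=3
export ETCDCTL_CACERT=/etc/ssl/etcd/ssl/ca.pem
export ETCDCTL_CERT="/etc/ssl/etcd/ssl/member-$(hostname).pem"
export ETCDCTL_KEY="/etc/ssl/etcd/ssl/member-$(hostname)-key.pem"

# Query only the surviving members; the dead node will always refuse anyway.
if [ -x /usr/local/bin/etcdctl ]; then
  /usr/local/bin/etcdctl \
    --endpoints=https://192.168.11.200:2379,https://192.168.11.219:2379 \
    endpoint health
fi
```

If this succeeds with the CA/cert set but fails without them, the playbook's etcdctl call is simply missing the TLS environment and the surviving members may still be healthy.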


How do I fix this?
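In case it matters: the recovery documentation asks for the failed node to be moved into the broken_* groups. A sketch of the inventory layout I mean (hostnames and etcd_member_name below are illustrative placeholders, not my real inventory):

```ini
; Illustrative inventory fragment for recover-control-plane.yml.
; Group names follow the kubespray recovery docs; hosts are placeholders.
[kube_control_plane]
master02
master03

[etcd]
master02
master03

[broken_kube_control_plane]
master01            ; the node that died

[broken_etcd]
master01 etcd_member_name=etcd1   ; member name is a placeholder
```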