chemtech
@chemtech
Linux user, DevOps

Why is the prometheus-operator pod in CrashLoopBackOff?

I installed the latest version of Kubernetes with kubespray, then ran:
[root@node1 ~]# helm init --service-account tiller
$HELM_HOME has been configured at /root/.helm.
Warning: Tiller is already installed in the cluster.
(Use --client-only to suppress this message, or --upgrade to upgrade Tiller to the current version.)
Happy Helming!
[root@node1 ~]# kubectl create ns monitoring
namespace/monitoring created
[root@node1 ~]# helm repo add coreos https://s3-eu-west-1.amazonaws.com/coreos-charts/stable/
"coreos" has been added to your repositories
[root@node1 ~]# helm install --name prometheus-operator --namespace monitoring --set rbacEnable=false coreos/prometheus-operator
Error: timed out waiting for the condition

[root@node1 ~]# kubectl get pvc --namespace=monitoring
No resources found.


kubectl get pod --namespace=monitoring
NAME                                   READY   STATUS             RESTARTS   AGE
prometheus-operator-858dffb4cf-79wzb   0/1     CrashLoopBackOff   7          14m


Detailed description:
kubectl describe pod --namespace=monitoring prometheus-operator-858dffb4cf-79wzb 
Name:               prometheus-operator-858dffb4cf-79wzb
Namespace:          monitoring
Priority:           0
PriorityClassName:  <none>
Node:               node1/10.233.60.104
Start Time:         Fri, 12 Oct 2018 11:43:44 +0300
Labels:             app=prometheus-operator
                    operator=prometheus
                    pod-template-hash=858dffb4cf
                    release=prometheus-operator
Annotations:        <none>
Status:             Running
IP:                 10.233.104.132
Controlled By:      ReplicaSet/prometheus-operator-858dffb4cf
Containers:
  prometheus-operator:
    Container ID:  docker://0ce4ec30c86b12de2f27f55b8f90e3bc47aecd3aef0adb5aec750b9d400c7b46
    Image:         quay.io/coreos/prometheus-operator:v0.20.0
    Image ID:      docker-pullable://quay.io/coreos/prometheus-operator@sha256:88cd66e273db8f96cfcce2eec03c04b04f0821f3f8d440396af2b5510667472d
    Port:          8080/TCP
    Host Port:     0/TCP
    Args:
      --kubelet-service=kube-system/kubelet
      --prometheus-config-reloader=quay.io/coreos/prometheus-config-reloader:v0.20.0
      --config-reloader-image=quay.io/coreos/configmap-reload:v0.0.1
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Fri, 12 Oct 2018 11:54:31 +0300
      Finished:     Fri, 12 Oct 2018 11:54:31 +0300
    Ready:          False
    Restart Count:  7
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-jbqrb (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  default-token-jbqrb:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-jbqrb
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                   From                                 Message
  ----     ------     ----                  ----                                 -------
  Normal   Scheduled  14m                   default-scheduler                    Successfully assigned monitoring/prometheus-operator-858dffb4cf-79wzb to node1
  Normal   Pulling    14m                   kubelet, node1  pulling image "quay.io/coreos/prometheus-operator:v0.20.0"
  Normal   Pulled     14m                   kubelet, node1  Successfully pulled image "quay.io/coreos/prometheus-operator:v0.20.0"
  Normal   Created    13m (x5 over 14m)     kubelet, node1  Created container
  Normal   Started    13m (x5 over 14m)     kubelet, node1  Started container
  Normal   Pulled     13m (x4 over 14m)     kubelet, node1  Container image "quay.io/coreos/prometheus-operator:v0.20.0" already present on machine
  Warning  BackOff    4m47s (x47 over 14m)  kubelet, node1  Back-off restarting failed container
Accepted answers: 1
chemtech
@chemtech Question author
Linux user, DevOps
kubectl logs prometheus-operator-858dffb4cf-79wzb -n monitoring
ts=2018-10-12T09:04:34.592045358Z caller=main.go:167 msg="Unhandled error received. Exiting..." err="getting CRD: Alertmanager: customresourcedefinitions.apiextensions.k8s.io \"alertmanagers.monitoring.coreos.com\" is forbidden: User \"system:serviceaccount:monitoring:default\" cannot get resource \"customresourcedefinitions\" in API group \"apiextensions.k8s.io\" at the cluster scope"

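This looks like an RBAC problem: the operator runs as the default service account of the monitoring namespace, and that account is not allowed to get customresourcedefinitions at the cluster scope.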
kubectl -n monitoring get all -l app=prometheus-operator
NAME                                       READY   STATUS             RESTARTS   AGE
pod/prometheus-operator-858dffb4cf-79wzb   0/1     CrashLoopBackOff   15         54m

NAME                                  DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/prometheus-operator   1         1         1            0           54m

NAME                                             DESIRED   CURRENT   READY   AGE
replicaset.apps/prometheus-operator-858dffb4cf   1         1         0       54m

NAME                                          COMPLETIONS   DURATION   AGE
job.batch/prometheus-operator-create-sm-job   0/1           54m        54m
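
The error in the log names the exact permission that is missing, so it can be checked directly against the API server. A minimal sketch, assuming kubectl is run by a user who is allowed to impersonate service accounts (cluster-admin normally is); the function name can_read_crds is hypothetical:

```shell
# Hypothetical helper: ask the API server whether a service account
# may read CustomResourceDefinitions at the cluster scope.
# Usage: can_read_crds <namespace> <serviceaccount>
can_read_crds() {
  kubectl auth can-i get customresourcedefinitions.apiextensions.k8s.io \
    --as="system:serviceaccount:$1:$2"
}
```

For the cluster in the question, can_read_crds monitoring default would be expected to answer no, which matches the "forbidden" message in the operator log.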


I should try following this guide:
https://github.com/coreos/prometheus-operator/tree...
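
One thing worth trying in addition: the release was installed with --set rbacEnable=false, and the pod ended up running as the default service account. A possible fix, sketched under the assumption that this chart creates its own service account and cluster role when rbacEnable is true (not verified against the chart here), is to purge the failed release and install it again with RBAC enabled. The function name reinstall_with_rbac is hypothetical; the commands use Helm 2 syntax, as in the question:

```shell
# Hypothetical helper: remove the failed release, then install it again
# with the chart's RBAC objects enabled (assumption: rbacEnable=true makes
# the chart create a dedicated service account with the needed rights).
reinstall_with_rbac() {
  helm delete --purge prometheus-operator &&
  helm install --name prometheus-operator --namespace monitoring \
    --set rbacEnable=true coreos/prometheus-operator
}
```

After calling reinstall_with_rbac, kubectl get pod --namespace=monitoring shows whether the operator still restarts.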