maximkv25
@maximkv25
web-developer

Why does a Docker container exit with code 137?

There are several containers:
root# docker-compose ps
      Name                    Command               State                 Ports
---------------------------------------------------------------------------------------------
back_async_1       /bin/sh -c python main.py        Up      0.0.0.0:8070->8070/tcp
back_beat_1        celery -A backend beat           Up
back_celery_1      celery -A backend worker - ...   Up
back_flower_1      /bin/sh -c celery -A backe ...   Up      0.0.0.0:5555->5555/tcp
back_mysql_1       /entrypoint.sh mysqld            Up      0.0.0.0:3300->3306/tcp, 33060/tcp
back_redis_1       docker-entrypoint.sh redis ...   Up      0.0.0.0:6379->6379/tcp
back_web_1         uwsgi --ini /app/uwsgi.ini       Up      0.0.0.0:8080->8080/tcp
back_ws_server_1   /bin/sh -c python main.py        Up      0.0.0.0:8060->8060/tcp


The server has 4 GB of RAM.

back_celery_1 exits with code 137. I read that this can indicate a lack of RAM, but monitoring shows that at most 1.5 GB was in use.
The crash happens after 3 to 6 hours of normal operation. The server makes requests to social networks.
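
For reference, an exit code above 128 conventionally means the process was terminated by a signal whose number is the code minus 128, so 137 corresponds to signal 9 (SIGKILL). The kernel OOM killer uses SIGKILL, but so does `docker kill`, and so does `docker stop` once its timeout expires, so the code alone does not prove a memory problem. A minimal Python sketch of the decoding, assuming a Linux host (the helper name `describe_exit_code` is made up for illustration):

```python
import signal


def describe_exit_code(code: int) -> str:
    """Turn a container exit code into a short human-readable description."""
    if code > 128:
        # Shell convention: 128 + N means "terminated by signal N".
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    return f"exited on its own with status {code}"


print(describe_exit_code(137))
```

To see whether Docker itself recorded an out-of-memory kill, `docker inspect --format '{{.State.OOMKilled}}' back_celery_1` can be checked after the container has stopped.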

After starting all containers:
CONTAINER ID        NAME                CPU %               MEM USAGE / LIMIT     MEM %               NET I/O             BLOCK I/O           PIDS
ca8934bd9a75        back_redis_1        0.17%               1.223MiB / 1.977GiB   0.06%               1.77MB / 2.32MB     0B / 16.4kB         4
4014dd4a2ecc        back_celery_1       0.17%               517.8MiB / 1.977GiB   25.58%              1.52MB / 1.46MB     4.1kB / 4.1kB       21
d95f6fd88e96        back_web_1          0.00%               49.92MiB / 1.977GiB   2.47%               1.9kB / 2.4kB       0B / 0B             5
bda4eee57ded        back_beat_1         0.00%               42.23MiB / 1.977GiB   2.09%               7.6kB / 64kB        0B / 106kB          1
ce40f4a1b4af        back_flower_1       0.02%               46.61MiB / 1.977GiB   2.30%               938kB / 334kB       0B / 0B             7
b2d84df70cfe        back_async_1        0.01%               19.75MiB / 1.977GiB   0.98%               1.21kB / 0B         0B / 0B             2
430734923ffe        back_ws_server_1    0.02%               19.75MiB / 1.977GiB   0.98%               1.21kB / 0B         0B / 0B             2
f0c7143c246e        back_mysql_1        0.08%               201.8MiB / 1.977GiB   9.97%               94.6kB / 145kB      1.18MB / 14.6MB     30
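
As a quick sanity check, the figures above can be added up. The sketch below only does arithmetic on the values copied from the MEM USAGE / LIMIT column; the variable names are illustrative. Note that every container reports the same limit of 1.977GiB, which is what `docker stats` shows when no per-container limit is set, so Docker appears to see roughly 2 GB of memory rather than the 4 GB mentioned above. That is an inference from the table, not something the logs confirm.

```python
# Memory usage per container in MiB, copied from the docker stats table.
usage_mib = {
    "back_redis_1": 1.223,
    "back_celery_1": 517.8,
    "back_web_1": 49.92,
    "back_beat_1": 42.23,
    "back_flower_1": 46.61,
    "back_async_1": 19.75,
    "back_ws_server_1": 19.75,
    "back_mysql_1": 201.8,
}

# The limit shown for every container: 1.977 GiB, converted to MiB.
limit_mib = 1.977 * 1024

total = sum(usage_mib.values())
print(f"total at startup: {total:.1f} MiB of {limit_mib:.1f} MiB "
      f"({total / limit_mib:.0%})")
```

This is the state right after startup only. With a prefork pool of 20 worker processes, celery's share can grow considerably over several hours of work.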


These are the latest logs, but server monitoring shows that resource usage dropped at 20:50. The logs may not have been written during that whole period.
$docker logs --details back_celery_1

[2018-02-28 17:56:16,151: INFO/ForkPoolWorker-18] Task stats.views.collect_stats[7d4730f8-ab6a-446d-b516-2d8d4ba0b9c8] succeeded in 42.99442209396511s: None
 Import Error

  -------------- celery@4014dd4a2ecc v4.1.0 (latentcall)
 ---- **** -----
 --- * ***  * -- Linux-4.4.0-34-generic-x86_64-with-debian-8.9 2018-02-28 13:16:20
 -- * - **** ---
 - ** ---------- [config]
 - ** ---------- .> app:         backend:0x7f9bd22247f0
 - ** ---------- .> transport:   redis://redis:6379/0
 - ** ---------- .> results:     disabled://
 - *** --- * --- .> concurrency: 20 (prefork)
 -- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
 --- ***** -----
  -------------- [queues]
                 .> celery           exchange=celery(direct) key=celery


 [tasks]
   . CallbackNotifier
   . FB posting
   . FB token status
   . MD posting
   . MD token status
   . OK posting
   . OK token status
   . TW posting
   . TW token status
   . VK posting
   . VK token status
   . api.controllers.message.scheduled_message
   . backend.celery.debug_task
   . stats.views.collect_stats

 /usr/local/lib/python3.4/site-packages/celery/platforms.py:795: RuntimeWarning: You're running the worker with superuser privileges: this is
 absolutely not recommended!

 Please specify a different user using the -u option.

 User information: uid=0 euid=0 gid=0 egid=0

   uid=uid, euid=euid, gid=gid, egid=egid,


Celery logs containing the Redis error. About one minute after it, the celery logs stopped being written.
[2018-02-28 17:55:34,221: CRITICAL/MainProcess] Unrecoverable error: ResponseError('MISCONF Redis is configured to save RDB snapshots, but it is currently not able to persist on disk. Commands that may modify the data set are disabled, because this instance is configured to report errors during writes if RDB snapshotting fails (stop-writes-on-bgsave-error option). Please check the Redis logs for details about the RDB error.',)
 Traceback (most recent call last):
   File "/usr/local/lib/python3.4/site-packages/celery/worker/worker.py", line 203, in start
     self.blueprint.start(self)
   File "/usr/local/lib/python3.4/site-packages/celery/bootsteps.py", line 119, in start
     step.start(parent)
   File "/usr/local/lib/python3.4/site-packages/celery/bootsteps.py", line 370, in start
     return self.obj.start()
   File "/usr/local/lib/python3.4/site-packages/celery/worker/consumer/consumer.py", line 320, in start
     blueprint.start(self)
   File "/usr/local/lib/python3.4/site-packages/celery/bootsteps.py", line 119, in start
     step.start(parent)
   File "/usr/local/lib/python3.4/site-packages/celery/worker/consumer/consumer.py", line 596, in start
     c.loop(*c.loop_args())
   File "/usr/local/lib/python3.4/site-packages/celery/worker/loops.py", line 88, in asynloop
     next(loop)
   File "/usr/local/lib/python3.4/site-packages/kombu/async/hub.py", line 354, in create_loop
     cb(*cbargs)
   File "/usr/local/lib/python3.4/site-packages/kombu/transport/redis.py", line 1040, in on_readable
     self.cycle.on_readable(fileno)
   File "/usr/local/lib/python3.4/site-packages/kombu/transport/redis.py", line 337, in on_readable
     chan.handlers[type]()
   File "/usr/local/lib/python3.4/site-packages/kombu/transport/redis.py", line 714, in _brpop_read
     **options)
   File "/usr/local/lib/python3.4/site-packages/redis/client.py", line 680, in parse_response
     response = connection.read_response()
   File "/usr/local/lib/python3.4/site-packages/redis/connection.py", line 629, in read_response
     raise response
 redis.exceptions.ResponseError: MISCONF Redis is configured to save RDB snapshots, but it is currently not able to persist on disk. Commands that may modify the data set are disabled, because this instance is configured to report errors during writes if RDB snapshotting fails (stop-writes-on-bgsave-error option). Please check the Redis logs for details about the RDB error.


Redis logs after startup:
1:M 01 Mar 08:24:09.060 * Background saving started by pid 8738
 8738:C 01 Mar 08:24:09.060 # Failed opening the RDB file root (in server root dir /run) for saving: Permission denied
 1:M 01 Mar 08:24:09.160 # Background saving error
 1:C 01 Mar 08:24:16.265 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
 1:C 01 Mar 08:24:16.269 # Redis version=4.0.6, bits=64, commit=00000000, modified=0, pid=1, just started
 1:C 01 Mar 08:24:16.269 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
 1:M 01 Mar 08:24:16.270 * Running mode=standalone, port=6379.
 1:M 01 Mar 08:24:16.271 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
 1:M 01 Mar 08:24:16.271 # Server initialized
 1:M 01 Mar 08:24:16.271 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
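
One detail in the Redis log stands out: the failed save targets a file named `root` in the directory `/run`, although Redis was started with the default config, where the snapshot file is `dump.rdb`. The sketch below pulls those two values out of the log line and compares them with the expected ones. The expected directory `/data` is an assumption based on the `/data/dump.rdb` path used elsewhere in this post, and the function names are hypothetical.

```python
import re

# The failing line, copied from the Redis log above.
LOG_LINE = ("8738:C 01 Mar 08:24:09.060 # Failed opening the RDB file root "
            "(in server root dir /run) for saving: Permission denied")

PATTERN = re.compile(
    r"Failed opening the RDB file (?P<dbfilename>\S+) "
    r"\(in server root dir (?P<dir>[^)\s]+)\) for saving: (?P<reason>.+)"
)

# "dump.rdb" is the Redis default; "/data" is assumed from /data/dump.rdb.
EXPECTED = {"dbfilename": "dump.rdb", "dir": "/data"}


def parse_rdb_failure(line):
    """Extract dbfilename, dir and the failure reason, or None if no match."""
    match = PATTERN.search(line)
    return match.groupdict() if match else None


def unexpected_settings(parsed):
    """Return the settings whose values differ from EXPECTED."""
    return {key: parsed[key] for key in EXPECTED if parsed[key] != EXPECTED[key]}


info = parse_rdb_failure(LOG_LINE)
print(info)
print(unexpected_settings(info))
```

The live values can be read with `redis-cli CONFIG GET dir` and `redis-cli CONFIG GET dbfilename`. If they differ from what the deployment sets, something changed them at runtime. Since port 6379 is published on 0.0.0.0, an outside client issuing `CONFIG SET` is one possible explanation worth ruling out, though the logs here do not prove it.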


Permission settings for Redis and running as the redis user:
CMD ["chown", "redis:redis", "-R", "/etc"]
CMD ["chown", "redis:redis", "-R", "/var/lib"]
CMD ["chown", "redis:redis", "-R", "/run"]

CMD ["sudo", "chmod", "644", "/data/dump.rdb" ]
CMD ["sudo", "chmod", "755", "/etc" ]
CMD ["sudo", "chmod", "770", "/var/lib" ]
CMD ["sudo", "chmod", "770", "/run" ]
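
A note on the lines above: `CMD` does not run anything at build time. It sets the default command for the container, and when a Dockerfile contains several `CMD` instructions only the last one takes effect. So these `chown`/`chmod` calls most likely never ran as intended; build-time commands belong in `RUN`. The following Python sketch illustrates the rule on the lines from this post. It is a deliberately naive check (no line continuations, no multi-stage builds), and `cmd_instructions` is a made-up helper name.

```python
DOCKERFILE = '''\
CMD ["chown", "redis:redis", "-R", "/etc"]
CMD ["chown", "redis:redis", "-R", "/var/lib"]
CMD ["chown", "redis:redis", "-R", "/run"]

CMD ["sudo", "chmod", "644", "/data/dump.rdb" ]
CMD ["sudo", "chmod", "755", "/etc" ]
CMD ["sudo", "chmod", "770", "/var/lib" ]
CMD ["sudo", "chmod", "770", "/run" ]
'''


def cmd_instructions(dockerfile_text):
    """Return every CMD line in order, skipping blanks and comments."""
    cmds = []
    for raw in dockerfile_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.split(None, 1)[0].upper() == "CMD":
            cmds.append(line)
    return cmds


cmds = cmd_instructions(DOCKERFILE)
effective = cmds[-1]   # only the last CMD is kept by Docker
ignored = cmds[:-1]    # all earlier CMD lines are overridden

print(f"effective: {effective}")
print(f"overridden: {len(ignored)} earlier CMD line(s)")
```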


Has anyone run into something like this? What could be causing it?