Containers of a docker stack not coming up after host reboot #2188

Open
naanuswaroop opened this issue Jun 18, 2018 · 9 comments

Comments

@naanuswaroop

naanuswaroop commented Jun 18, 2018

We need to deploy a Docker stack in a CentOS VM. Our Compose file defines a stack with two services, each running one container. One of these containers connects to two external networks.

The docker-compose.yml looks like this:

version: '3'
services:
  GoOn_db:
    image: postgres
  GoOn_web:
    image: sshweb_5:new
    command: bash start.sh
    volumes:
      - .:/code
    ports:
      - "8000:8000"
      - "8022:22"
    networks:
      - external_oam_network
      - external_data_network
    depends_on:
      - GoOn_db
networks:
  external_oam_network:
    external:
      name: goon__oam
  external_data_network:
    external:
      name: goon__data

The external networks are swarm-scoped macvlan networks, created with the following commands:

docker network create --config-only --subnet 172.28.128.0/24 --gateway 172.28.128.1 -o parent=goon_data.1000 --ip-range 172.28.128.32/27 __goon__data
docker network create -d macvlan --scope swarm --config-from __goon__data goon__data

docker network create --config-only --subnet 172.28.232.0/24 --gateway 172.28.232.1 -o parent=goon_oam --ip-range 172.28.232.32/27 __goon__oam
docker network create -d macvlan --scope swarm --config-from __goon__oam goon__oam
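
The two-step pattern above (a config-only network, then a swarm-scoped macvlan network built from it) can be wrapped in a small helper. This is only a sketch: the function names `run` and `create_macvlan` and the `DRY_RUN` variable are illustrative and not part of the original setup. The docker flags are the ones used in the commands above. By default the script only prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: print (or run) the two commands that create one swarm-scoped macvlan
# network from a config-only network. All helper names here are illustrative.

# DRY_RUN=1 (the default) only prints each command; DRY_RUN=0 executes it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# create_macvlan NAME PARENT SUBNET GATEWAY IP_RANGE
create_macvlan() {
  local name="$1" parent="$2" subnet="$3" gateway="$4" ip_range="$5"
  # Step 1: the settings go into a config-only network named "__NAME".
  run docker network create --config-only \
    --subnet "$subnet" --gateway "$gateway" \
    -o "parent=$parent" --ip-range "$ip_range" "__${name}"
  # Step 2: the swarm-scoped macvlan network takes its settings from it.
  run docker network create -d macvlan --scope swarm \
    --config-from "__${name}" "$name"
}

create_macvlan goon__data goon_data.1000 172.28.128.0/24 172.28.128.1 172.28.128.32/27
create_macvlan goon__oam  goon_oam       172.28.232.0/24 172.28.232.1 172.28.232.32/27
```

Running it with `DRY_RUN=0` would execute the same four commands shown above.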

The Docker stack is created with the following command:
docker stack deploy --compose-file docker-compose.yml app

The Issue:

With the above configuration, the Docker stack comes up without problems the first time. However, if the hosting VM reboots (or crashes and comes back up), the container connected to the external networks (the GoOn_web service) fails to come up. The following errors appear in the journal logs:

Jun 13 15:00:14 localhost.localdomain dockerd[21817]: time="2018-06-13T15:00:14.393543091+05:30" level=error msg="fatal task error" error="network dm-g3ovik5qx6br is already using parent interface goon__data" module=node/agent/taskmanager node.id=enqfccpf6sn28l01f6i6grq6h service.id=6w743aksizz5b6p7u3xgqpet8 task.id=wlxpngi6faw571nkgm4f9p0c9
Jun 13 15:00:14 localhost.localdomain dockerd[21817]: time="2018-06-13T15:00:14.824590521+05:30" level=warning msg="failed to deactivate service binding for container app_GoOn_web.1.y7l2c88wrhfq54f5af4d0qio7" error="No such container: app_GoOn_web.1.y7l2c88wrhfq54f5af4d0qio7" module=node/agent node.id=enqfccpf6sn28l01f6i6grq6h
Jun 13 15:00:16 localhost.localdomain dockerd[21817]: time="2018-06-13T15:00:16.827271962+05:30" level=error msg="network goon__data remove failed: network goon__data not found" module=node/agent node.id=enqfccpf6sn28l01f6i6grq6h
Jun 13 15:00:16 localhost.localdomain dockerd[21817]: time="2018-06-13T15:00:16.827406882+05:30" level=error msg="remove task failed" error="network goon__data not found" module=node/agent node.id=enqfccpf6sn28l01f6i6grq6h task.id=y7l2c88wrhfq54f5af4d0qio7
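
To pull just the parent-interface conflicts out of a longer journal, a filter along these lines may help. It is a sketch that assumes the message wording shown in the first log line above. The function name `conflicts` is illustrative, and the systemd unit name `docker` in the usage comment is an assumption about the host.

```shell
# conflicts: read daemon log lines on stdin and print "<network> <parent>"
# for every "already using parent interface" error.
conflicts() {
  sed -n 's/.*network \([^ ]*\) is already using parent interface \([^" ]*\).*/\1 \2/p'
}

# Usage against the systemd journal (unit name "docker" is an assumption):
# journalctl -u docker --no-pager | conflicts | sort -u
```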

A second issue is that we found no way to completely remove a network together with its config-only network. We tried the following commands:

[localhost config_drive]# docker stack rm app
Removing service app_GoOn_db
Removing service app_GoOn_web
Removing network app_default
[localhost config_drive]# docker network rm goon__data
goon__data
[localhost config_drive]# docker network rm __goon__data
Error response from daemon: configuration network "__goon__data" is in use

It seems the network cleanup has a problem as well.
Please let us know if there is a workaround for this issue, or if our configuration needs changes.
Possibly related issues found on GitHub:
#1743
moby/moby#23302

The following Docker command outputs were captured after the hosting VM rebooted:

[localhost config_drive]# docker version
Client:
 Version:       17.12.0-ce
 API version:   1.35
 Go version:    go1.9.2
 Git commit:    c97c6d6
 Built: Wed Dec 27 20:10:14 2017
 OS/Arch:       linux/amd64

Server:
 Engine:
  Version:      17.12.0-ce
  API version:  1.35 (minimum version 1.12)
  Go version:   go1.9.2
  Git commit:   c97c6d6
  Built:        Wed Dec 27 20:12:46 2017
  OS/Arch:      linux/amd64
  Experimental: false


[root@localhost config_drive]# docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
1456a3ec482a        __goon__data        null                local
c637316e8a95        __goon__oam         null                local
395a87391443        bridge              bridge              local
67f95713ee03        docker_gwbridge     bridge              local
ut4qii3qdzrs        goon__data          macvlan             swarm
88zfsq41n7xo        goon__oam           macvlan             swarm
803609448d35        host                host                local
wrypoj5x9fxx        ingress             overlay             swarm
095dc2ca9729        none                null                local

[localhost config_drive]# docker stack ls
NAME                SERVICES
app                 2

[localhost config_drive]# docker service ls
ID                  NAME                MODE                REPLICAS            IMAGE               PORTS
c4kpzmc26qgk        app_GoOn_db         replicated          1/1                 postgres:latest
6w743aksizz5        app_GoOn_web        replicated          0/1                 sshweb_5:new        *:8000->8000/tcp,*:8022->22/tcp

[localhost config_drive]# docker ps --all
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS               NAMES
ce820c3596f0        postgres:latest     "docker-entrypoint.s…"   9 minutes ago       Up 9 minutes        5432/tcp            app_GoOn_db.1.o000vpvfer0moz9b1xnh05wp5
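
The `docker service ls` output above shows app_GoOn_web at 0/1 replicas. A check like the following could flag that state after a reboot. It is a sketch: the function name `under_replicated` is illustrative, and it expects one "<service> <running>/<desired>" pair per line.

```shell
# under_replicated: read "<service> <running>/<desired>" lines on stdin and
# print the services that run fewer replicas than desired.
under_replicated() {
  awk '{ split($2, r, "/"); if (r[1] + 0 < r[2] + 0) print $1, $2 }'
}

# Usage (needs a swarm manager):
# docker service ls --format '{{.Name}} {{.Replicas}}' | under_replicated
```

Fed the two services listed above, it would print only the app_GoOn_web line.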

@sprabhuonline

What is this image? Where did you get it from?
  GoOn_web:
    image: sshweb_5:new
    command: bash start.sh

@naanuswaroop
Author

Update:
A workaround seems to be to delete "/var/lib/docker/network/files/local-kv.db" and restart Docker, as mentioned in moby/moby#17669. We are currently testing this in CI. Please let us know if this workaround has any side effects.

@dobe

dobe commented Aug 28, 2018

I encountered the same issue today on Server Version 18.03.1-ce after a power failure.
We have had this issue before on other swarm clusters. Is there a plan to fix it?

@naanuswaroop
Author

naanuswaroop commented Aug 31, 2018

Update:

The workaround mentioned above (deleting the "/var/lib/docker/network/files/local-kv.db" file and restarting Docker) has now been tested for more than a fortnight, and we no longer see the original issue. Anyone facing this issue can probably use it in the meantime. This still looks like a genuine bug in Docker that needs to be fixed.
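
The workaround can be scripted roughly as follows. This is a sketch with assumptions: Docker is managed by systemd under the unit name `docker`, and the file is moved to a `.bak` copy rather than deleted, which is a precaution added here and not something the thread describes. The names `run`, `reset_network_state`, `KV_DB`, and `DRY_RUN` are illustrative. By default the script only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch of the workaround: stop Docker, set the network state file aside,
# start Docker again. Keeping a backup instead of deleting is an added
# precaution, not part of the reported workaround.

KV_DB="${KV_DB:-/var/lib/docker/network/files/local-kv.db}"

# DRY_RUN=1 (the default) only prints each command; DRY_RUN=0 executes it
# (root privileges are needed in that case).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

reset_network_state() {
  run systemctl stop docker          # unit name "docker" is an assumption
  run mv "$KV_DB" "$KV_DB.bak"       # the thread deletes the file instead
  run systemctl start docker
}

reset_network_state
```

The thread does not establish what other state this file holds, so it may be worth trying this on a test host first.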

@ozlevka-work

+1, I have also run into this problem.
Docker version 18.03.1.

@teadur

teadur commented Jun 9, 2019

+1 on 18.09.6

@promzeus

+2 on 18.09.6

@Drewster727

Drewster727 commented Sep 17, 2019

+1 on 19.03.2 with windows containers (windows server 1903 host)
I posted about my particular issue, which seems closely related to this one, in the moby repo: moby/moby#39955

@DanOrsborne

+1 on 19.03.2 with windows containers (windows server 1903 host) too
