Docker Internals: Network Implementation

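Docker Network Modes

Bridge

Bridge mode is the default for docker run. In this mode Docker allocates a network namespace for the container, configures its IP address and related settings, and attaches the container's network to a virtual bridge, docker0. The container can then communicate with other bridge-mode containers on the same host.

Host

In host mode the container shares the host's network namespace, which means it also shares the host's network interfaces. Ports used by services in the container must not conflict with ports already in use on the host.

Container

This is one of Docker's more unusual network modes. A container in this mode shares the network namespace of another container, so there is no network isolation between the containers in that namespace.

None

In this mode Docker creates no network environment for the container. Inside the container only the loopback device is available; there are no other network resources.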
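
As a rough illustration, the four modes map onto the `--network` option of `docker run`. The sketch below only prints the command lines and does not start any container. The image name `busybox` and the peer container name `web` are placeholders chosen for this example, not names from the article.

```shell
#!/bin/sh
# Print the `docker run` command line for each network mode.
# IMAGE and PEER are hypothetical placeholders, not names from the article.
IMAGE="busybox"
PEER="web"

network_flag() {
    # Map a mode name to the corresponding --network option.
    case "$1" in
        bridge)    echo "--network bridge" ;;
        host)      echo "--network host" ;;
        container) echo "--network container:${PEER}" ;;
        none)      echo "--network none" ;;
        *)         echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

for mode in bridge host container none; do
    echo "docker run --rm $(network_flag "$mode") ${IMAGE} ip addr"
done
```

Running each printed command and comparing the `ip addr` output is one way to see the difference: host mode lists the host's interfaces, while none mode lists only `lo`.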


Next, we build the mechanism behind Docker's Bridge mode by hand, step by step.


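Linux Veth and Bridge

Introduction

Both Docker's networking and Flannel's network implementation rely on Veth and Bridge. A Bridge is created on the host, and each time a container is created, a connected Veth pair (Bridge-veth <--> Container-veth) is created with it. One end is attached to the host's Bridge (docker0) and the other end is placed in the container's network namespace. You can list the veth devices attached to the Bridge with sudo brctl show.

Concepts

VETH (virtual Ethernet)

A virtual network device supported by the Linux kernel that represents a pair of virtual network interfaces. The two ends of a Veth pair can live in different network namespaces, so the pair can carry traffic between the host and a container. A packet sent into one end comes out of the other end.

Bridge

A Bridge is a Linux device that performs layer-2 (data link layer) switching, much like a physical switch. Other network devices can be attached to a Bridge. When a frame arrives, the Bridge broadcasts, forwards, or drops it based on the MAC addresses in the frame.

Network topology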

                           +------------------------+
                           |                        | iptables +----------+
                           |  br01 192.168.88.1/24  |          |          |
                +----------+                        <--------->+   eth0   |
                |          +------------------+-----+          |          |
                |                             |                +----------+
           +----+---------+       +-----------+-----+
           |              |       |                 |
           | br-veth01    |       |   br-veth02     |
           +--------------+       +-----------+-----+
                |                             |
+--------+------+-----------+     +-------+---+-------------+
|        |                  |     |       |                 |
|  ns01  |   veth01         |     |  ns02 |  veth02         |
|        |                  |     |       |                 |
|        |   192.168.88.11  |     |       |  192.168.88.12  |
|        |                  |     |       |                 |
|        +------------------+     |       +-----------------+
|                           |     |                         |
|                           |     |                         |
+---------------------------+     +-------------------------+

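br01 is the Bridge we create. It connects two Veth pairs whose other ends sit in two separate namespaces. eth0 is the host's external network interface; packets leaving the namespaces for the outside world go out through SNAT/MASQUERADE.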

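Set up the Bridge and Veth

Configure the Bridge

Create the Bridge

sudo brctl addbr br01

Bring the Bridge up

sudo ip link set dev br01 up
# Alternatively, bring it up with ifconfig
sudo ifconfig br01 up

Assign an IP address to the Bridge

sudo ifconfig br01 192.168.88.1/24 up

Create the network namespaces

Create two namespaces, ns01 and ns02: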

sudo ip netns add ns01
sudo ip netns add ns02

# List the namespaces that were created
sudo ip netns list
ns02
ns01

Set up the Veth pairs

Create two veth pairs

# Create a veth pair: ip link add [DEVICE NAME] type veth peer name [PEER NAME]
sudo ip link add veth01 type veth peer name br-veth01
sudo ip link add veth02 type veth peer name br-veth02

Attach one end of each pair (br-veth01 and br-veth02) to br01

# Attach a device to the Bridge: brctl addif [BRIDGE NAME] [DEVICE NAME]
sudo brctl addif br01 br-veth01
sudo brctl addif br01 br-veth02

# Show the attached interfaces
sudo brctl show br01
bridge name     bridge id               STP enabled     interfaces
br01            8000.321bc3fd56fd       no              br-veth01
                                                        br-veth02

Bring up the bridge-side ends of the two Veth pairs

sudo ip link set dev br-veth01 up
sudo ip link set dev br-veth02 up

Move the other end of each veth pair into its namespace

sudo ip link set veth01 netns ns01
sudo ip link set veth02 netns ns02
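
The steps so far mix `brctl`, `ifconfig`, and `ip`. On systems where bridge-utils or net-tools are not installed, the same setup can probably be expressed with iproute2 alone. The following is a minimal sketch under that assumption; the `run`, `setup_bridge`, and `attach_namespace` helpers and the `DRY_RUN` variable are names invented for this example. By default it only prints the commands.

```shell
#!/bin/sh
# iproute2-only sketch of the bridge and veth setup.
# Dry-run by default: each command is printed instead of executed.
# Set DRY_RUN=0 and run as root to apply the commands for real.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

setup_bridge() {
    # $1 = bridge name, $2 = bridge address in CIDR notation
    run ip link add name "$1" type bridge      # replaces: brctl addbr
    run ip addr add "$2" dev "$1"              # replaces: ifconfig <br> <addr>
    run ip link set dev "$1" up
}

attach_namespace() {
    # $1 = bridge, $2 = namespace, $3 = namespace-side veth, $4 = bridge-side veth
    run ip netns add "$2"
    run ip link add "$3" type veth peer name "$4"
    run ip link set dev "$4" master "$1"       # replaces: brctl addif
    run ip link set dev "$4" up
    run ip link set dev "$3" netns "$2"
}

setup_bridge br01 192.168.88.1/24
attach_namespace br01 ns01 veth01 br-veth01
attach_namespace br01 ns02 veth02 br-veth02
run ip link show master br01                   # replaces: brctl show br01
```

Printing first and applying second makes it easy to review the exact commands before anything on the host changes.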
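
Configure the Veth network inside the namespaces

The sudo ip netns exec [NS] [COMMAND] command runs a command inside a specific network namespace.

List the network devices in a network namespace: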

sudo ip netns exec ns01 ip addr
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: sit0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
    link/sit 0.0.0.0 brd 0.0.0.0
8: veth01@if7: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether d2:88:ec:62:cd:0a brd ff:ff:ff:ff:ff:ff link-netnsid 0

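The newly added veth01 is still down and has no IP address. Assign an IP address and a default route to the veth in each network namespace, using the Bridge IP as the default gateway.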

sudo ip netns exec ns01 ip link set dev veth01 up
sudo ip netns exec ns01 ifconfig veth01 192.168.88.11/24 up
sudo ip netns exec ns01 ip route add default via 192.168.88.1

sudo ip netns exec ns02 ip link set dev veth02 up
sudo ip netns exec ns02 ifconfig veth02 192.168.88.12/24 up
sudo ip netns exec ns02 ip route add default via 192.168.88.1
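
The two namespaces are configured identically apart from their names and addresses, so the commands can be generated from a small template. This sketch uses `ip addr add` in place of `ifconfig` and also brings up `lo`, which the listing above shows as DOWN. The `ns_config` function is a name made up for this example; it only writes commands to stdout.

```shell
#!/bin/sh
# Emit the per-namespace configuration as plain commands on stdout.
# Review the output, then pipe it to `sudo sh` to apply it.
ns_config() {
    # $1 = namespace, $2 = veth inside it, $3 = address (CIDR), $4 = gateway
    cat <<EOF
ip netns exec $1 ip link set dev lo up
ip netns exec $1 ip addr add $3 dev $2
ip netns exec $1 ip link set dev $2 up
ip netns exec $1 ip route add default via $4
EOF
}

ns_config ns01 veth01 192.168.88.11/24 192.168.88.1
ns_config ns02 veth02 192.168.88.12/24 192.168.88.1
```

The default route is added last because it needs the interface to be up and addressed before the gateway is reachable.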

Check that the veth in each ns has been assigned an IP

sudo ip netns exec ns01 ifconfig veth01
sudo ip netns exec ns02 ifconfig veth02

veth02: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.88.12  netmask 255.255.255.0  broadcast 192.168.88.255
        inet6 fe80::fca2:57ff:fe1c:67df  prefixlen 64  scopeid 0x20<link>
        ether fe:a2:57:1c:67:df  txqueuelen 1000  (Ethernet)
        RX packets 15  bytes 1146 (1.1 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 11  bytes 866 (866.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

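Verify networking between the namespaces

Ping ns02 from ns01 while capturing packets on the br01 bridge with tcpdump in the default namespace

# Start the capture first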
sudo tcpdump -i br01 -nn

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on br01, link-type EN10MB (Ethernet), capture size 262144 bytes

# Then ping ns02 from ns01
sudo ip netns exec ns01 ping 192.168.88.12 -c 1

PING 192.168.88.12 (192.168.88.12) 56(84) bytes of data.
64 bytes from 192.168.88.12: icmp_seq=1 ttl=64 time=0.086 ms

--- 192.168.88.12 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.086/0.086/0.086/0.000 ms

# Inspect the captured packets
16:19:42.739429 ARP, Request who-has 192.168.88.12 tell 192.168.88.11, length 28
16:19:42.739471 ARP, Reply 192.168.88.12 is-at fe:a2:57:1c:67:df, length 28
16:19:42.739476 IP 192.168.88.11 > 192.168.88.12: ICMP echo request, id 984, seq 1, length 64
16:19:42.739489 IP 192.168.88.12 > 192.168.88.11: ICMP echo reply, id 984, seq 1, length 64
16:19:47.794415 ARP, Request who-has 192.168.88.11 tell 192.168.88.12, length 28
16:19:47.794451 ARP, Reply 192.168.88.11 is-at d2:88:ec:62:cd:0a, length 28

ARP resolves the correct MAC address and the reply packet makes it back to ns01. Pinging ns01 from ns02 works as well.
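
When this check is repeated often, reading the packet-loss figure by eye gets tedious. A small filter can pull it out of ping's summary line. This is only a sketch: `loss_percent` is a helper name invented here, and it assumes the integer percentage format shown in the output above.

```shell
#!/bin/sh
# Extract the packet-loss percentage from ping's summary line on stdin.
# Assumes an integer percentage such as "0%" or "100%".
loss_percent() {
    sed -n 's/.* \([0-9][0-9]*\)% packet loss.*/\1/p'
}

# Summary line copied from the ping output above.
echo "1 packets transmitted, 1 received, 0% packet loss, time 0ms" | loss_percent

# Hypothetical usage against the live namespaces (needs root):
#   sudo ip netns exec ns01 ping -c 1 192.168.88.12 | loss_percent
```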

Run arp inside ns01

sudo ip netns exec ns01 arp

Address                  HWtype  HWaddress           Flags Mask            Iface
192.168.88.12            ether   fe:a2:57:1c:67:df   C                     veth01
192.168.88.1             ether   32:1b:c3:fd:56:fd   C                     veth01

The MAC address recorded for 192.168.88.1 is correct; it matches the one that ip link prints for br01.

ip link

6: br01: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 32:1b:c3:fd:56:fd brd ff:ff:ff:ff:ff:ff

Connecting the namespaces to the external network

Ping an external address from ns02 (114.114.114.114 in the example below)

sudo ip netns exec ns02 ping 114.114.114.114 -c 1

PING 114.114.114.114 (114.114.114.114) 56(84) bytes of data.

--- 114.114.114.114 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

The ping fails. Capture packets to see what is happening.

# Capture on the Bridge device
sudo tcpdump -i br01 -nn -vv host 114.114.114.114

tcpdump: listening on br01, link-type EN10MB (Ethernet), capture size 262144 bytes
17:02:59.027478 IP (tos 0x0, ttl 64, id 51092, offset 0, flags [DF], proto ICMP (1), length 84)
    192.168.88.12 > 114.114.114.114: ICMP echo request, id 1045, seq 1, length 64


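# Capture on the egress device
sudo tcpdump -i eth0 -nn -vv host 114.114.114.114

Only br01 shows the outgoing traffic, while the egress interface eth0 sees nothing. This indicates that IP forwarding (ip_forward) is not enabled on the host.

# Enable ip_forward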
sudo sysctl -w net.ipv4.conf.all.forwarding=1
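
Before and after changing the setting, it can help to confirm the current state. The sketch below reads the kernel's procfs switch for IPv4 forwarding, which should reflect the sysctl set above. The `forwarding_state` function and its optional file argument are made up for this example so the check can be pointed at another file.

```shell
#!/bin/sh
# Report whether IPv4 forwarding is on, based on the procfs switch.
# The optional argument only exists so the function can read a different
# file (for example in a test); it is not part of any standard tool.
forwarding_state() {
    file="${1:-/proc/sys/net/ipv4/ip_forward}"
    if [ "$(cat "$file" 2>/dev/null)" = "1" ]; then
        echo "enabled"
    else
        echo "disabled"
    fi
}

forwarding_state
```

A value set with `sysctl -w` is lost on reboot. Keeping it usually means placing the same key in a file under `/etc/sysctl.d/`.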

Capture on eth0 again

sudo tcpdump -i eth0 -nn -vv host 114.114.114.114

tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
17:11:26.517292 IP (tos 0x0, ttl 63, id 15277, offset 0, flags [DF], proto ICMP (1), length 84)
    192.168.88.12 > 114.114.114.114: ICMP echo request, id 1059, seq 1, length 64

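Now only the outgoing request appears; no reply comes back. The source address is a private address, which is not routable on the public network, so a reply addressed to it can never find its way back. The fix is to rewrite the source IP of outgoing packets to the host's own IP, which acts as the gateway for the namespaces. This can be done with either SNAT or MASQUERADE.

SNAT: requires a static IP.
MASQUERADE: works with dynamically assigned IPs, but looks up the address to use every time a packet matches the rule.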

sudo iptables -t nat -A POSTROUTING -s 192.168.88.0/24 -j MASQUERADE

# List the NAT rules
sudo iptables -t nat -nL --line-number

Chain PREROUTING (policy ACCEPT)
num  target     prot opt source               destination         

Chain INPUT (policy ACCEPT)
num  target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
num  target     prot opt source               destination         

Chain POSTROUTING (policy ACCEPT)
num  target     prot opt source               destination         
1    MASQUERADE  all  --  192.168.88.0/24      0.0.0.0/0
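
For comparison, here is how the SNAT variant and a slightly narrower MASQUERADE rule might look. This is a sketch that only prints the rules. The `! -o br01` match mirrors the way Docker excludes its own bridge so that traffic staying on the bridge is not rewritten. `172.22.36.202` is the host's eth0 address as it appears in this article's captures and must be replaced with the local address; the function and variable names are invented for this example.

```shell
#!/bin/sh
# Print candidate NAT rules for the bridge subnet; nothing is applied.
SUBNET="192.168.88.0/24"
BRIDGE="br01"
OUT_IF="eth0"
HOST_IP="172.22.36.202"   # eth0 address from the article's captures; adjust locally

masquerade_rule() {
    # Rewrite only traffic that leaves through something other than the bridge.
    echo "iptables -t nat -A POSTROUTING -s $SUBNET ! -o $BRIDGE -j MASQUERADE"
}

snat_rule() {
    # Fixed source address: suitable when the host IP is static.
    echo "iptables -t nat -A POSTROUTING -s $SUBNET -o $OUT_IF -j SNAT --to-source $HOST_IP"
}

masquerade_rule
snat_rule
```

Only one of the two rules is needed. SNAT avoids the per-packet address lookup, at the cost of hard-coding the host address.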

Ping 114.114.114.114 again

sudo ip netns exec ns02 ping 114.114.114.114 -c 1

Capture and inspect the traffic

sudo tcpdump -i eth0 -nn -vv host 114.114.114.114

tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
17:43:54.744599 IP (tos 0x0, ttl 63, id 46107, offset 0, flags [DF], proto ICMP (1), length 84)
    172.22.36.202 > 114.114.114.114: ICMP echo request, id 1068, seq 1, length 64
17:43:54.783749 IP (tos 0x4, ttl 71, id 62825, offset 0, flags [none], proto ICMP (1), length 84)
    114.114.114.114 > 172.22.36.202: ICMP echo reply, id 1068, seq 1, length 64

---

sudo tcpdump -i br01 -nn -vv
tcpdump: listening on br01, link-type EN10MB (Ethernet), capture size 262144 bytes
17:43:54.744560 IP (tos 0x0, ttl 64, id 46107, offset 0, flags [DF], proto ICMP (1), length 84)
    192.168.88.12 > 114.114.114.114: ICMP echo request, id 1068, seq 1, length 64
17:43:54.783805 IP (tos 0x4, ttl 70, id 62825, offset 0, flags [none], proto ICMP (1), length 84)
    114.114.114.114 > 192.168.88.12: ICMP echo reply, id 1068, seq 1, length 64

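Packets leaving through eth0 now carry the interface's IP as their source IP, while the packets captured on br01 still have ns02's address, 192.168.88.12, as the source IP.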


Cleanup

sudo ip netns del ns01
sudo ip netns del ns02
sudo ifconfig br01 down
sudo brctl delbr br01
sudo iptables -t nat -D POSTROUTING -s 192.168.88.0/24 -j MASQUERADE