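Problem description
Today, some time after starting Docker containers, I found that the root filesystem of every container on the host had become read-only, and the host's messages log was reporting disk-related errors.
The output of mount inside a container: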
[root@zk-1 ~]# mount
/dev/mapper/docker-253:0-4298664622-7830c39693a73c13e80cf2a22a46558b22adcc5adf94ca1c893f44ece878601e on / type ext4 (ro,relatime,stripe=16,data=ordered)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev type tmpfs (rw,nosuid,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666)
shm on /dev/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,size=65536k)
mqueue on /dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
/dev/mapper/centos-root on /nfsc type xfs (rw,relatime,attr2,inode64,noquota)
tmpfs on /run/secrets type tmpfs (rw,nosuid,nodev,noexec,relatime)
/dev/mapper/centos-root on /etc/resolv.conf type xfs (rw,relatime,attr2,inode64,noquota)
/dev/mapper/centos-root on /etc/hostname type xfs (rw,relatime,attr2,inode64,noquota)
/dev/mapper/centos-root on /etc/hosts type xfs (rw,relatime,attr2,inode64,noquota)
The errors logged on the host:
Jan 13 15:06:01 docker2 systemd: Starting Session 6 of user root.
Jan 13 15:06:01 docker2 systemd: Started Session 6 of user root.
Jan 13 15:06:31 docker2 systemd: Starting Session 7 of user root.
Jan 13 15:06:31 docker2 systemd-logind: New session 7 of user root.
Jan 13 15:06:31 docker2 systemd: Started Session 7 of user root.
Jan 13 15:07:01 docker2 systemd: Starting Session 8 of user root.
Jan 13 15:07:01 docker2 systemd: Started Session 8 of user root.
Jan 13 15:07:38 docker2 kernel: device-mapper: thin: 253:3: reached low water mark for data device: sending event.
Jan 13 15:07:44 docker2 kernel: device-mapper: thin: 253:3: switching pool to out-of-data-space (queue IO) mode
Jan 13 15:08:01 docker2 systemd: Starting Session 9 of user root.
Jan 13 15:08:01 docker2 systemd: Started Session 9 of user root.
Jan 13 15:08:44 docker2 kernel: device-mapper: thin: 253:3: switching pool to out-of-data-space (error IO) mode
Jan 13 15:08:44 docker2 kernel: EXT4-fs warning (device dm-5): ext4_end_bio:332: I/O error -28 writing to inode 1320256 (offset 13077839872 size 8388608 starting block 9269408)
Jan 13 15:08:44 docker2 kernel: Buffer I/O error on device dm-5, logical block 9269408
Jan 13 15:08:44 docker2 kernel: Aborting journal on device dm-4-8.
Jan 13 15:08:44 docker2 kernel: Buffer I/O error on device dm-5, logical block 9269409
Jan 13 15:08:44 docker2 kernel: Buffer I/O error on device dm-5, logical block 9269410
Jan 13 15:08:44 docker2 kernel: EXT4-fs error (device dm-4): ext4_journal_check_start:56: Detected aborted journal
Jan 13 15:08:44 docker2 kernel: EXT4-fs (dm-4): Remounting filesystem read-only
Jan 13 15:08:44 docker2 kernel: Buffer I/O error on device dm-5, logical block 9269411
Jan 13 15:08:44 docker2 kernel: Buffer I/O error on device dm-5, logical block 9269412
Jan 13 15:08:44 docker2 kernel: Buffer I/O error on device dm-5, logical block 9269413
Jan 13 15:08:44 docker2 kernel: Buffer I/O error on device dm-5, logical block 9269414
Jan 13 15:08:44 docker2 kernel: Buffer I/O error on device dm-5, logical block 9269415
Jan 13 15:08:44 docker2 kernel: Buffer I/O error on device dm-5, logical block 9269416
Jan 13 15:08:44 docker2 kernel: Buffer I/O error on device dm-5, logical block 9269417
Jan 13 15:08:44 docker2 kernel: EXT4-fs warning (device dm-5): ext4_end_bio:332: I/O error -28 writing to inode 1320256 (offset 13077839872 size 8388608 starting block 9269424)
Jan 13 15:08:44 docker2 kernel: EXT4-fs warning (device dm-5): ext4_end_bio:332: I/O error -28 writing to inode 1320256 (offset 13077839872 size 8388608 starting block 9269440)
Jan 13 15:08:44 docker2 kernel: EXT4-fs warning (device dm-5): ext4_end_bio:332: I/O error -28 writing to inode 1320256 (offset 13077839872 size 8388608 starting block 9269456)
Jan 13 15:08:44 docker2 kernel: EXT4-fs warning (device dm-5): ext4_end_bio:332: I/O error -28 writing to inode 1320256 (offset 13077839872 size 8388608 starting block 9269472)
Jan 13 15:08:44 docker2 kernel: EXT4-fs warning (device dm-5): ext4_end_bio:332: I/O error -28 writing to inode 1320256 (offset 13077839872 size 8388608 starting block 9269488)
Jan 13 15:08:44 docker2 kernel: EXT4-fs warning (device dm-5): ext4_end_bio:332: I/O error -28 writing to inode 1320256 (offset 13077839872 size 8388608 starting block 9269504)
Jan 13 15:08:44 docker2 kernel: EXT4-fs warning (device dm-5): ext4_end_bio:332: I/O error -28 writing to inode 1320256 (offset 13077839872 size 8388608 starting block 9269520)
Jan 13 15:08:44 docker2 kernel: EXT4-fs warning (device dm-5): ext4_end_bio:332: I/O error -28 writing to inode 1320256 (offset 13077839872 size 8388608 starting block 9269536)
Jan 13 15:08:44 docker2 kernel: EXT4-fs warning (device dm-5): ext4_end_bio:332: I/O error -28 writing to inode 1320256 (offset 13077839872 size 8388608 starting block 9269552)
Jan 13 15:08:44 docker2 kernel: Aborting journal on device dm-5-8.
Jan 13 15:08:44 docker2 kernel: EXT4-fs error (device dm-5) in ext4_da_write_end:2782: IO failure
Jan 13 15:08:44 docker2 kernel: EXT4-fs error (device dm-5): ext4_journal_check_start:56: Detected aborted journal
Jan 13 15:08:44 docker2 kernel: EXT4-fs (dm-5): Remounting filesystem read-only
Jan 13 15:08:44 docker2 kernel: EXT4-fs error (device dm-5) in ext4_do_update_inode:4504: Journal has aborted
Jan 13 15:08:44 docker2 kernel: EXT4-fs error (device dm-5): mpage_map_and_submit_extent:2229: comm kworker/u98:3: Failed to mark inode 1320256 dirty
Jan 13 15:08:44 docker2 kernel: EXT4-fs error (device dm-5) in ext4_writepages:2520: IO failure
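The lines that matter most in this log are the thin-pool state changes, which are easy to miss among the EXT4 errors. Below is a minimal sketch for pulling them out of a syslog stream. The function name thin_pool_events is made up for illustration, and the pattern is based only on the message wording seen in the log above, so other kernel versions may phrase these messages differently.

```shell
#!/bin/sh
# Print only the device-mapper thin-pool state changes from log lines on stdin.
# The pattern mirrors the wording in the log excerpt above (an assumption for
# other kernel versions).
thin_pool_events() {
    grep -E 'device-mapper: thin: .*(low water mark|switching pool)'
}
```

It could be used as thin_pool_events < /var/log/messages, assuming that is where the host writes kernel messages.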
My first reaction was to check disk usage:
[root@zk-1 ~]# df -Th
Filesystem Type Size Used Avail Use% Mounted on
/dev/mapper/docker-253:0-4298664622-7830c39693a73c13e80cf2a22a46558b22adcc5adf94ca1c893f44ece878601e
ext4 99G 49G 46G 52% /
tmpfs tmpfs 126G 0 126G 0% /dev
shm tmpfs 64M 0 64M 0% /dev/shm
tmpfs tmpfs 126G 0 126G 0% /sys/fs/cgroup
/dev/mapper/centos-root
xfs 1.7T 113G 1.6T 7% /nfsc
tmpfs tmpfs 126G 0 126G 0% /run/secrets
/dev/mapper/centos-root
xfs 1.7T 113G 1.6T 7% /etc/resolv.conf
/dev/mapper/centos-root
xfs 1.7T 113G 1.6T 7% /etc/hostname
/dev/mapper/centos-root
xfs 1.7T 113G 1.6T 7% /etc/hosts
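The root filesystem still had 46G free, which seemed very odd. After a lot of searching online I finally found an explanation.
Reference article: http://jpetazzo.github.io/2014/01/29/docker-device-mapper-resize/
Root cause
With the devicemapper storage driver in its default loopback setup, the docker service creates a 107.4G (100 GiB) sparse data file when it first starts, and everything that images and containers write afterwards is stored in that one file. Once the data written exceeds 107.4G, the pool has no free space left. As the host log above shows, the thin pool switches to out-of-data-space mode, writes fail with I/O error -28 (ENOSPC), ext4 aborts its journal, and the root filesystem of every container is remounted read-only. df inside a container still shows free space because it reports only that container's own 99G ext4 filesystem, not the shared pool underneath it.
docker info on the host: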
[root@docker2 ~]# docker info
Containers: 169
Images: 1672
Storage Driver: devicemapper
Pool Name: docker-253:0-4298664622-pool
Pool Blocksize: 65.54 kB
Backing Filesystem: xfs
Data file: /dev/loop0
Metadata file: /dev/loop1
Data Space Used: 107.4 GB
Data Space Total: 107.4 GB
Data Space Available: 0 B
Metadata Space Used: 137.4 MB
Metadata Space Total: 2.147 GB
Metadata Space Available: 2.01 GB
Udev Sync Supported: true
Deferred Removal Enabled: false
Data loop file: /var/lib/docker/devicemapper/devicemapper/data
Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
Library Version: 1.02.107-RHEL7 (2015-10-14)
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 4.2.1-1.el7.elrepo.x86_64
Operating System: CentOS Linux 7 (Core)
CPUs: 40
Total Memory: 251.9 GiB
Name: docker2.stg.1qianbao.com
ID: JMZF:IQ6H:RDBK:XNSN:W3IO:ZAQH:RRFB:XRIT:4I72:KOKD:R34K:FD5L
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Because I run a lot of containers (169 JBoss applications), this made the whole environment unusable.
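The exhaustion is visible in the "Data Space" lines of docker info. As a rough sketch, the two helpers below read docker info output on stdin, print those lines, and report whether the pool is out of space. The function names are hypothetical, and the field labels are taken from the output shown above, so they may not match other Docker versions.

```shell
#!/bin/sh
# Print the thin-pool "Data Space ..." lines from `docker info` output on stdin.
# "Metadata Space ..." lines are skipped because the label must start the line.
pool_data_space() {
    sed -n 's/^[[:space:]]*\(Data Space [A-Za-z]*: .*\)$/\1/p'
}

# Succeed (exit 0) when `docker info` output on stdin reports no free data space,
# i.e. a line of the form "Data Space Available: 0 B" as in the output above.
pool_is_full() {
    grep -Eq '^[[:space:]]*Data Space Available:[[:space:]]*0 B[[:space:]]*$'
}
```

For example: docker info 2>/dev/null | pool_is_full && echo "thin pool is out of data space".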
Solution
Stop the docker service
service docker stop
Delete everything under /var/lib/docker (this destroys all of your images and containers, so back up the images you need or push them to a registry first)
rm -rf /var/lib/docker/*
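Since the rm above is irreversible, the images worth keeping need to be saved first, which requires the docker service to be running. A minimal sketch using docker save is shown here. The function name backup_images, the destination directory, and the image names in the usage line are placeholders rather than anything from the original setup.

```shell
#!/bin/sh
# Save each named image to <dest>/<name>.tar so it can be restored with
# `docker load` after /var/lib/docker has been wiped.
# Usage: backup_images <dest-dir> <image> [<image> ...]
backup_images() {
    dest=$1
    shift
    mkdir -p "$dest" || return 1
    for image in "$@"; do
        # Replace '/' and ':' so the image reference becomes a valid file name.
        file=$(printf '%s' "$image" | tr '/:' '__')
        docker save -o "$dest/$file.tar" "$image" || return 1
    done
}
```

For example, backup_images /backup/docker-images centos:7 myrepo/jboss:latest would write centos_7.tar and myrepo_jboss_latest.tar. Each archive can later be restored with docker load -i. The destination must be outside /var/lib/docker.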
Create /var/lib/docker/devicemapper/devicemapper/data from a larger file, a disk, or a logical volume.
Using a file:
dd if=/dev/zero of=/var/lib/docker/devicemapper/devicemapper/data bs=1G count=0 seek=1000
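This creates a sparse data file with an apparent size of 1000G that takes up almost no real disk space at first. If you leave out the seek parameter and set count=1000 instead, dd really writes out a full 1000G file.
Using a disk: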
ln -s /dev/sdb /var/lib/docker/devicemapper/devicemapper/data
Using a logical volume:
ln -s /dev/mapper/centos-dockerdata /var/lib/docker/devicemapper/devicemapper/data
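The difference between count=0 with seek and a plain count, mentioned for the file-based option, can be checked at a small scale before committing to a terabyte-sized file. The sketch below assumes GNU dd and du (as on CentOS) and uses a throwaway temporary directory. The 64 MiB size and the variable names are arbitrary.

```shell
#!/bin/sh
# Small-scale comparison of a sparse file and a fully written file.
demo_dir=$(mktemp -d)

# Sparse: count=0 writes no data, seek=64 only extends the file length to 64 MiB.
dd if=/dev/zero of="$demo_dir/sparse" bs=1M count=0 seek=64 2>/dev/null

# Fully written: 64 blocks of zeros are written and flushed to disk.
dd if=/dev/zero of="$demo_dir/full" bs=1M count=64 conv=fsync 2>/dev/null

# Apparent size in bytes is the same for both files.
sparse_bytes=$(wc -c < "$demo_dir/sparse")
full_bytes=$(wc -c < "$demo_dir/full")

# Allocated space in KiB, as reported by du, is what actually differs.
sparse_kib=$(du -k "$demo_dir/sparse" | cut -f1)
full_kib=$(du -k "$demo_dir/full" | cut -f1)

echo "apparent bytes: sparse=$sparse_bytes full=$full_bytes"
echo "allocated KiB:  sparse=$sparse_kib full=$full_kib"

rm -rf "$demo_dir"
```

On most filesystems the sparse file reports close to zero allocated KiB, while the fully written one reports roughly its whole size. Filesystems that compress or deduplicate zeros may show a smaller gap.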
I went with the first method and created a 1.6T sparse file:
mkdir -p /var/lib/docker/devicemapper/devicemapper/
dd if=/dev/zero of=/var/lib/docker/devicemapper/devicemapper/data bs=1G count=0 seek=1600
Once the file is created, start the docker service
service docker start
Now look at the data pool in docker info again
[root@docker2 ~]# docker info
Containers: 169
Images: 1701
Storage Driver: devicemapper
Pool Name: docker-253:0-2355438-pool
Pool Blocksize: 65.54 kB
Backing Filesystem: xfs
Data file: /dev/loop0
Metadata file: /dev/loop1
Data Space Used: 90.9 GB
Data Space Total: 1.611 TB
Data Space Available: 1.52 TB
Metadata Space Used: 147.5 MB
Metadata Space Total: 2.147 GB
Metadata Space Available: 2 GB
Udev Sync Supported: true
Deferred Removal Enabled: false
Data loop file: /var/lib/docker/devicemapper/devicemapper/data
Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
Library Version: 1.02.107-RHEL7 (2015-10-14)
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 4.2.1-1.el7.elrepo.x86_64
Operating System: CentOS Linux 7 (Core)
CPUs: 40
Total Memory: 251.9 GiB
Name: docker2.stg.1qianbao.com
ID: JMZF:IQ6H:RDBK:XNSN:W3IO:ZAQH:RRFB:XRIT:4I72:KOKD:R34K:FD5L
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
From now on, the size of the data file determines how much space is available to all containers on the host!
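Because the pool is shared, it is worth watching how full it is rather than waiting for the next read-only remount. One possible sketch reads a thin-pool status line, as printed by dmsetup status for the pool device, and prints the data usage as a whole-number percentage. The function name is hypothetical, and it assumes the standard thin-pool status layout in which the sixth field is used data blocks over total data blocks.

```shell
#!/bin/sh
# Read a `dmsetup status <pool>` line on stdin and print data usage in percent.
# In a thin-pool status line, field 3 is the target name ("thin-pool") and
# field 6 is <used data blocks>/<total data blocks>.
thin_pool_data_pct() {
    awk '$3 == "thin-pool" {
        split($6, d, "/")
        printf "%d\n", d[1] * 100 / d[2]
    }'
}
```

With the pool name from the docker info output above, this would be run as dmsetup status docker-253:0-2355438-pool | thin_pool_data_pct, for instance from a cron job that raises an alert above a chosen threshold.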
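Separately, you can use the docker daemon's --storage-opt startup option to set the disk size each container is initialized with, for example --storage-opt dm.basesize=80G
With that setting, the root filesystem of every container has a total size of 80G once it starts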
[root@zk-1 ~]# df -Th
Filesystem Type Size Used Avail Use% Mounted on
/dev/mapper/docker-253:0-27661746-8b7f953fb4759982ad82235c27e39dfe7190b55180d63cbcf3aa2fdc6569d43a
ext4 79G 785M 74G 2% /
tmpfs tmpfs 126G 0 126G 0% /dev
shm tmpfs 64M 0 64M 0% /dev/shm
tmpfs tmpfs 126G 0 126G 0% /sys/fs/cgroup
tmpfs tmpfs 126G 0 126G 0% /run/secrets
/dev/mapper/centos-root
xfs 1.5T 338G 1.2T 23% /wls/wls81/zookeeper.out
/dev/mapper/centos-root
xfs 1.5T 338G 1.2T 23% /etc/resolv.conf
/dev/mapper/centos-root
xfs 1.5T 338G 1.2T 23% /etc/hostname
/dev/mapper/centos-root
xfs 1.5T 338G 1.2T 23% /etc/hosts
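How dm.basesize is passed depends on how the daemon is launched on a given host. As a hedged sketch, the helper below only builds the option string. The function name basesize_opts is made up, and combining it with --storage-driver=devicemapper is an assumption that matches the storage driver reported by docker info above.

```shell
#!/bin/sh
# Build the daemon options that select devicemapper and set the base device size.
# Usage: basesize_opts <size>, for example: basesize_opts 80G
basesize_opts() {
    printf '%s\n' "--storage-driver=devicemapper --storage-opt dm.basesize=$1"
}
```

On a Docker release where the daemon is started with the docker daemon subcommand, this could be used as docker daemon $(basesize_opts 80G). On other versions or packagings the same options would go wherever the daemon's arguments are configured. The base size takes effect when the base device is created, which in this walkthrough happens on the first start after /var/lib/docker was emptied.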