Troubleshooting Docker

Image testing and debugging

While we can applaud the benefits of containers, troubleshooting and effectively monitoring them currently present some complexity. Because containers run in isolation by design, their internal environment can be opaque. Effective troubleshooting has generally required shell entry into the container itself, coupled with the complication of installing additional Linux tools merely to peruse information that is twice as hard to investigate.

Typically, the available tools, methods, and approaches for meaningful troubleshooting of our containers and images have required installing additional packages in every container. This results in the following:

  • Requirements for connecting or attaching directly to the container, which is not always a trivial matter
  • Limitations on inspection of a single container at a time

Compounding these difficulties, adding unnecessary bloat to our containers with these tools is something we originally attempted to avoid in our planning; minimalism is one of the advantages we looked for in using containers in the first place. Let's take a look then at how we can reasonably glean useful information on our container images with some basic commands, as well as investigate emergent applications that allow us to monitor and troubleshoot containers from the outside.

Docker details for troubleshooting

Now that you have your image (regardless of building method) with Docker running, let's do some testing to make sure that all is copacetic with our build. While these may seem routine and mundane, it is a good practice to run any or all of the following as a top-down approach to troubleshooting.

The first two commands here, $ docker version and $ docker info, are ridiculously simple and seemingly too generic, but they provide base-level detail with which to begin any downstream troubleshooting efforts.

Docker version

First, let's confirm which versions of Docker, Go, and Git we are running:

$ sudo docker version

Docker info

Additionally, we should understand our host operating system and kernel version, as well as storage, execution, and logging drivers. Knowing these things can help us troubleshoot from our top-down perspective:

$ sudo docker info
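
Since both commands are a common starting point, it can help to capture their output in one file that we can attach to a bug report or compare against later. The following is only a minimal sketch: the function name collect_docker_baseline and the default file name are illustrative assumptions, and docker may need to be prefixed with sudo on your host, as in the commands above.

```shell
# Sketch: save `docker version` and `docker info` output to one file.
# The function name and the default file name are hypothetical.
collect_docker_baseline() {
    out_file="${1:-docker-baseline.txt}"
    {
        echo "== docker version =="
        docker version
        echo "== docker info =="
        docker info
    } > "$out_file" 2>&1
    echo "Baseline written to $out_file"
}
```

Calling collect_docker_baseline /tmp/baseline.txt would write both reports, including any warnings printed on standard error, to the given file.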

A troubleshooting note for Debian/Ubuntu

From a $ sudo docker info command, you may receive one or both of the following warnings:

WARNING: No memory limit support
WARNING: No swap limit support

You will need to add the following command-line parameters to the kernel in order to enable memory and swap accounting:

cgroup_enable=memory swapaccount=1

For these Debian or Ubuntu systems, if you use the default GRUB bootloader, those parameters can be added by editing /etc/default/grub and extending GRUB_CMDLINE_LINUX. Locate the following line:

GRUB_CMDLINE_LINUX="" 

Then, replace it with the following one:

GRUB_CMDLINE_LINUX="cgroup_enable=memory swapaccount=1" 

Then, run update-grub and reboot the host machine.
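
For scripted hosts, the edit can be sketched with sed rather than done by hand. This is an illustration under stated assumptions, not a definitive procedure: the function name add_cgroup_params is hypothetical, GNU sed (the default on Debian and Ubuntu) is assumed for the -i option, and the function only edits the file, so update-grub and the reboot remain separate manual steps.

```shell
# Sketch: append the memory and swap accounting parameters to the
# GRUB_CMDLINE_LINUX line of a GRUB defaults file. Assumes GNU sed.
add_cgroup_params() {
    grub_file="${1:-/etc/default/grub}"
    params="cgroup_enable=memory swapaccount=1"

    # Do nothing if the parameters are already present.
    if grep -q '^GRUB_CMDLINE_LINUX=.*cgroup_enable=memory' "$grub_file"; then
        return 0
    fi

    # Keep a backup copy next to the original file.
    cp "$grub_file" "$grub_file.bak"

    # Insert the parameters before the closing quote, then drop the
    # leading space that is left over when the value was empty.
    sed -i "/^GRUB_CMDLINE_LINUX=/ { s/\"[[:space:]]*\$/ $params\"/; s/=\" /=\"/; }" "$grub_file"
}
```

Run as root, add_cgroup_params with no argument would edit /etc/default/grub; passing a path lets us try the change on a copy of the file first.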

Listing installed Docker images

We also need to ensure that the container instance has actually installed your image locally. SSH into the Docker host and execute the docker images command. You should see your Docker image listed, as follows:

$ sudo docker images

What if my image does not appear? Check the agent logs and make sure that your container instance is able to contact your docker registry by curling the registry and printing out the available tags:

curl [need to add in path to registry!]
Note

What $ sudo docker images tells us: Our container image was successfully installed on the host.
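
When the image list is long, the same check can be scripted. Here is a small sketch that relies on docker images -q, which prints only the matching image IDs; the function name image_is_local is an assumption made for illustration, and docker may need a sudo prefix on your host.

```shell
# Sketch: report whether an image is present in the local image store.
# `docker images -q <name>` prints matching image IDs, or nothing at all.
image_is_local() {
    image_name="$1"
    if [ -n "$(docker images -q "$image_name")" ]; then
        echo "$image_name: present locally"
    else
        echo "$image_name: NOT found locally" >&2
        return 1
    fi
}
```

The non-zero return value makes the function usable in a larger script, for example to decide whether the registry needs to be checked next.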

Manually crank your Docker image

Now that we know our image is installed on the host, we need to know whether it is accessible to the Docker daemon. An easy way to test to make certain your image can be run on the container instance is by attempting to run your image from the command line. There is an added benefit here: we will now have the opportunity to additionally inspect application logs for further troubleshooting.

Let's take a look at the following example:

$ sudo docker run -it [need to add in path to registry/latest bin!]
Note

What $ sudo docker run <imagename> tells us: Our container image is accessible from the docker daemon and also provides accessible output logs for further troubleshooting.

What if my image does not run? Check for any running containers. If the intended container isn't running on the host, there may be issues preventing it from starting:

$ sudo docker ps

When a container fails to start, it does not log anything itself. Output from the container start process is located in /var/log/containers on the host. Here, you will find files following the naming convention of <service>_start_errors.log. These logs contain any output generated by our RUN command and are a recommended starting point for troubleshooting why your container failed to start.

Tip

Logspout (https://github.com/gliderlabs/logspout) is a log router for Docker containers that runs inside Docker. Logspout attaches to all containers on a host, then routes their logs wherever you desire.

While we can also peruse the /var/log/messages output in our attempts to troubleshoot, there are a few other avenues we can pursue, albeit a little more labor-intensive.

Examining the filesystem state from cache

As we've discussed, after each successful RUN command in our Dockerfiles, Docker caches the entire filesystem state. We can exploit this cache to examine the latest state prior to the failed RUN command.

To accomplish the task:

  • Access the Dockerfile and comment out the failing RUN command, along with any subsequent RUN commands
  • Re-save the Dockerfile
  • Re-execute $ sudo docker build and $ sudo docker run
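
The commenting-out step can be sketched with sed so that the original Dockerfile stays untouched. This is an illustration only: the function name comment_out_runs_from is hypothetical, and it assumes single-line RUN instructions (a RUN continued over several lines with trailing backslashes would still need editing by hand).

```shell
# Sketch: print a copy of a Dockerfile with every RUN instruction from
# line $2 onward commented out. Assumes single-line RUN instructions;
# RUN commands continued with a trailing backslash need manual editing.
comment_out_runs_from() {
    dockerfile="$1"
    start_line="$2"
    sed "${start_line},\$ s/^\([[:space:]]*\)RUN /\1# RUN /" "$dockerfile"
}
```

For example, comment_out_runs_from Dockerfile 3 > Dockerfile.debug writes the edited copy, which can then be built with $ sudo docker build -f Dockerfile.debug and a tag of your choosing.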

Image layer IDs as debug containers

Every time Docker successfully executes a RUN command from a Dockerfile, a new layer in the image filesystem is committed. Conveniently, you can use those layer IDs as images to start a new container.

Consider the following Dockerfile as an example:

FROM ubuntu
RUN echo 'trouble' > /tmp/trouble.txt
RUN echo 'shoot' >> /tmp/trouble.txt

If we then build from this Dockerfile:

$ docker build --force-rm -t so26220957 .

We would get output similar to the following:

Sending build context to Docker daemon 3.584 kB
Sending build context to Docker daemon
Step 0 : FROM ubuntu
 ---> b750fe79269d
Step 1 : RUN echo 'trouble' > /tmp/trouble.txt
 ---> Running in d37d756f6e55
 ---> de1d48805de2
Removing intermediate container d37d756f6e55
Step 2 : RUN echo 'shoot' >> /tmp/trouble.txt
Removing intermediate container a180fdacd268
Successfully built 40fd00ee38e1

We can then use the preceding image layer IDs to start new containers from b750fe79269d, de1d48805de2, and 40fd00ee38e1:

$ docker run --rm b750fe79269d cat /tmp/trouble.txt
cat: /tmp/trouble.txt: No such file or directory
$ docker run --rm de1d48805de2 cat /tmp/trouble.txt
trouble
$ docker run --rm 40fd00ee38e1 cat /tmp/trouble.txt
trouble
shoot
Note

We employ --rm to remove each debug container when it exits, since there is no reason to keep them around after they have run.

What happens if my container build fails? Since no image is created on a failed build, we have no final image ID to run. Instead, we can note the ID of the last successfully committed layer and run a container with a shell from that ID:

$ sudo docker run --rm -it <id_last_working_layer> bash -il

Once inside the container, execute the failing command in an attempt to reproduce the issue, fix the command and test it, and finally update the Dockerfile with the fixed command.
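
Finding the last working layer can also be sketched as a small filter over saved build output. The function name last_good_layer is an assumption for illustration, and the sketch only understands the step-by-step output format in which committed layers appear on lines of the form "---> <12-character ID>"; other builders may print something different.

```shell
# Sketch: read classic `docker build` output on stdin and print the ID of
# the last layer that was committed, i.e. the last line of the form
# " ---> <12 hex characters>". "---> Running in ..." lines are skipped.
last_good_layer() {
    sed -n 's/^[[:space:]]*---> \([0-9a-f]\{12\}\)[[:space:]]*$/\1/p' | tail -n 1
}
```

If the build output was saved to a file such as build.log, then last_good_layer < build.log prints the ID to pass to docker run as <id_last_working_layer>.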

You may also want to start a shell to explore the filesystem, try out commands, and so on:

$ docker run --rm -it de1d48805de2 bash -il
root@ecd3ab97cad4:/# ls -l /tmp
total 4
-rw-r--r-- 1 root root 4 Jul 3 12:14 trouble.txt
root@ecd3ab97cad4:/# cat /tmp/trouble.txt
trouble
root@ecd3ab97cad4:/#

Additional example

One final approach is to comment out part of the Dockerfile, from the offending line onward. We are then able to run the container, execute the remaining commands manually, and look into the logs in the normal way. Consider this example Dockerfile:

RUN trouble 
RUN shoot 
RUN debug 

If the failure is at shoot, then comment out as follows:

RUN trouble 
# RUN shoot 
# RUN debug 

Then, build and run:

$ docker build -t trouble .
$ docker run -it trouble bash
container# shoot
...grep logs...

Checking failed container processes

Even if your container runs successfully from the command line, it is worth inspecting for any failed container processes, checking for containers that are no longer running, and reviewing the container configuration.

Run the following command to check for failed or no-longer running containers and note the CONTAINER ID to inspect a given container's configuration:

$ sudo docker ps -a

Note the STATUS of the containers. Should the STATUS of any of your containers show an exit code other than 0, there could be issues with the container's configuration. For example, a bad command would result in an exit code of 127. With this information, you can troubleshoot the task definition CMD field to debug.
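
On a busy host, scanning the STATUS column by eye gets tedious. The following sketch filters the list down to containers that exited with a non-zero code. It assumes a Docker release whose docker ps supports the --format option, and the function name failed_containers is hypothetical; prefix docker with sudo if your host requires it.

```shell
# Sketch: list containers whose STATUS reports a non-zero exit code.
# Relies on `docker ps -a --format`, which here prints lines such as
# "4c01db0b339c Exited (127) 2 hours ago".
failed_containers() {
    docker ps -a --format '{{.ID}} {{.Status}}' |
        awk '$2 == "Exited" && $3 != "(0)" { print $1, $3 }'
}
```

Each output line holds a CONTAINER ID followed by its exit code in parentheses, ready to be passed to the inspection commands.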

Although somewhat limited, we can further inspect a container for additional troubleshooting details:

$ sudo docker inspect <containerId>

Finally, let's also analyze the container's application logs. Error messages for container start failures are output here:

$ sudo docker logs <containerId>
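
The inspect and logs steps can be combined into a quick post-mortem for a single container. This is a sketch rather than a definitive tool: the function name container_postmortem and the choice of 20 log lines are assumptions, while the State.ExitCode field and the --tail option come from the docker inspect and docker logs commands themselves.

```shell
# Sketch: print a container's exit code followed by the tail of its logs.
container_postmortem() {
    container_id="$1"
    exit_code="$(docker inspect --format '{{.State.ExitCode}}' "$container_id")"
    echo "container $container_id exited with code $exit_code"
    # Containers may write errors to stderr, so merge both streams.
    docker logs --tail 20 "$container_id" 2>&1
}
```

Running container_postmortem <containerId> for each ID noted from $ sudo docker ps -a gives a compact first look before digging into the full inspect output.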

Other potentially useful resources

$ sudo docker top <containerId> gives us a list of processes running inside a container.

htop can be utilized when you need a little more detail than provided by top, in a convenient, cursor-controlled interface. Note that htop is a standalone tool rather than a Docker subcommand, so it is run on the host or inside a container where it is installed. htop starts faster than top, you can scroll the list vertically and horizontally to see all processes and complete command lines, and you do not need to type the process number to kill a process or the priority value to renice a process.
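
Because docker top looks at one container at a time, a short loop can walk through every running container. This is a minimal sketch; the function name top_all_containers is an assumption, and docker may need a sudo prefix on your host.

```shell
# Sketch: run `docker top` for every running container in turn.
# `docker ps -q` prints only the IDs of the running containers.
top_all_containers() {
    for container_id in $(docker ps -q); do
        echo "== $container_id =="
        docker top "$container_id"
    done
}
```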

By the time this book goes to print, it is likely that the mechanisms for troubleshooting containers and images will have dramatically improved. Much focus is being given by the Docker community toward baked-in reporting and monitoring solutions, in addition to market forces that will certainly bring additional options to bear.

Using sysdig to debug

As with any newer technology, some of the initial complexities inherent with them are debugged in time, and newer tools and applications are developed to enhance their use. As we've discussed, containers certainly fit into this category at this time. While we have witnessed improvements in availability of official, standardized images within the Docker Registry, we are also now seeing emergent tools that help us to effectively manage, monitor, and troubleshoot our containers.

Sysdig provides application monitoring for containers [Image Copyright © 2014 Draios, Inc.]

Sysdig (http://www.sysdig.org/) is one such tool. It is an up-to-date application for system-level exploration and troubleshooting of containerized environments, and its beauty is that we are able to access container data from the outside (even though sysdig can also be installed inside a container). From a top level, what sysdig brings to our container management is this:

  • Ability to access and review processes (inclusive of internal and external PIDs) in each container
  • Ability to drill down into specific containers
  • Ability to easily filter sets of containers for process review and analysis

Sysdig provides data on CPU usage, I/O, logs, networking, performance, security, and system state. To repeat, this is all accomplishable from the outside, without a need to install anything into our containers.

We will make continued and valuable use of sysdig going forward in this book to monitor and troubleshoot specific processes related to our containers, but for now we will provide just a few examples toward troubleshooting our basic container processes and logs.

Let's dig into sysdig by getting it installed on our host to show off what it can do for us and our containers!

Single step installation

Installation of sysdig can be accomplished in a single step by executing the following command as root or with sudo:

curl -s https://s3.amazonaws.com/download.draios.com/stable/install-sysdig | sudo bash
Note

sysdig is currently included natively in the latest Debian and Ubuntu versions; however, it is recommended to run the installation anyway in order to get the latest packages.

Advanced installation

According to the sysdig wiki, the advanced installation method may be useful for scripted deployments or containerized environments. It is also easy; the advanced installation steps are listed there for RHEL and Debian systems.

What are chisels?

To get started with sysdig, we should understand some of its parlance, specifically chisels. In sysdig, chisels are little scripts (written in Lua) that analyze the sysdig event stream to perform useful actions. Events are efficiently brought to user level, enriched with context, and then scripts can be applied to them. Chisels work well on live systems, but can also be used with trace files for offline analysis. You can run as many chisels as you'd like, all at the same time. For example:

topcontainers_error chisel will show us the top containers by number of errors.

For a list of sysdig chisels:

$ sysdig -cl

Use the -i flag to get detailed information about a specific chisel.

Single container processes analysis

Using the example of a topprocs_cpu chisel, we can apply a filter:

$ sudo sysdig -pc -c topprocs_cpu container.name=zany_torvalds

These are the example results:

CPU%      Process      container.name
------------------------------------------
02.49%    bash         zany_torvalds
37.06%    curl         zany_torvalds
0.82%     sleep        zany_torvalds

Unlike with $ sudo docker top (and similar), we can determine exactly which containers we want to see processes for; the following example shows us processes from only the wordpress containers:

$ sudo sysdig -pc -c topprocs_cpu container.name contains wordpress

CPU%      Process      container.name
--------------------------------------------------
5.38%     apache2      wordpress3
4.37%     apache2      wordpress2
6.89%     apache2      wordpress4
7.96%     apache2      wordpress1
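
When the containers of interest do not share a common name fragment, several container.name terms can be joined with sysdig's or operator. The helper below is a sketch that only builds the filter string; the function name container_filter is hypothetical.

```shell
# Sketch: build a sysdig filter matching any of the named containers,
# e.g. "container.name=web or container.name=db".
container_filter() {
    filter=""
    for name in "$@"; do
        if [ -z "$filter" ]; then
            filter="container.name=$name"
        else
            filter="$filter or container.name=$name"
        fi
    done
    printf '%s\n' "$filter"
}
```

The result can be passed to a chisel as a single quoted argument, for example $ sudo sysdig -pc -c topprocs_cpu "$(container_filter wordpress1 wordpress2)".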

Other useful sysdig chisels and syntax

  • topprocs_cpu shows top processes by CPU usage
  • topcontainers_file shows top containers by R+W disk bytes
  • topcontainers_net shows top containers by network I/O
  • lscontainers will list the running containers
  • $ sudo sysdig -pc -c spy_logs prints all logs to the screen
  • $ sudo sysdig -pc -c spy_logs container.name=zany_torvalds prints logs for the container zany_torvalds

Troubleshooting - an open community awaits you

In general, most issues you may face have likely been experienced by others, somewhere and sometime before. The Docker and open source communities, IRC channels, and various search engines provide highly accessible information that is likely to answer the situations and conditions that perplex you. Make good use of the open source community (specifically, the Docker community) in getting the answers you are looking for. As with any emergent technology, in the beginning, we are all somewhat learning together!