When they start using Docker, a lot of people store files either in the container itself or on the file system of the server. In this post, we'll look at the different options for dealing with files and discuss in which cases each one is a good approach.

With Docker, as in general, there are several ways to manage files:
– directly in the container
– via a mount point on your server
– in a data container
– using a Docker volume
– using a network file system or remote storage (Amazon S3, Hadoop, …).

Files stored directly in the container:
The main advantage of this solution is its simplicity.
But there are several drawbacks:
– Persistence: the container's file system is not persistent, so when your container crashes and is replaced, or is removed, you may lose your files. In some cases that's not important: temporary files, demos, files generated by a script, ….
– Performance: as Docker uses a union file system, reading and writing files may not be as fast as expected.
– File sharing: if you want to share a file among several processes, either you run several processes in your container (not very good), or you couple two containers tightly (or put them in the same pod if you use K8S), which may not be very good either.

Files mounted from the host:
One of the advantages of this solution is also its simplicity, and it partially solves the problem of sharing a file among containers (as long as they run on the same server).
This solution has some drawbacks too:
– your container needs to run on this server,
– the container needs write (or read) access to your host, which may not be secure,
– it should be avoided in production, because you create a tight coupling between the container and the host, and you lose some of the advantages of isolation.
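
To make the host mount concrete, here is a minimal sketch that only builds the `docker run` command line (it does not execute it). The host path, container path and image name are hypothetical, chosen for illustration; `-v HOST:CONTAINER[:ro]` is the short bind-mount syntax of the docker CLI.

```python
import shlex


def bind_mount_run_command(host_dir, container_dir, image, read_only=True):
    """Build a `docker run` command that bind-mounts a host directory."""
    spec = f"{host_dir}:{container_dir}"
    if read_only:
        # ":ro" mounts the directory read-only inside the container,
        # which limits what the container can do to the host.
        spec += ":ro"
    return ["docker", "run", "--rm", "-v", spec, image]


# Hypothetical paths and image, for illustration only.
cmd = bind_mount_run_command("/srv/app/config", "/etc/app", "nginx:alpine")
print(shlex.join(cmd))
```

Note that the host path is hard-coded in the command: that is exactly the coupling between container and server described above.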

Files stored in a data container:
Some people do that, but I still don't understand why you would when you have volumes.

Files in a cloud provider's storage:
It may look like a perfect solution if your containers run in the cloud, but you create a direct link between the container and the cloud provider. In some cases you may not want that. So use it carefully, and be aware of what you are doing. Using a volume may offer you the abstraction you need.

Files in a Docker volume:
This solution may be a bit complex for beginners, as it introduces a new concept, but in fact it's pretty efficient. The full documentation is available in the Docker docs. You can see it as creating a new logical disk dedicated to your needs, where you can configure the name and, depending on the volume driver, the size and the file system used.
The main advantages of using a volume are:
– it eases file sharing among containers,
– the container does not depend on a specific path on the server where the container and the volume are stored,
– it allows you to scale: as you can choose the file system behind the volume, you can even imagine using a distributed file system like GlusterFS (have a look at the list of available volume drivers in the Docker docs). You can even use a cloud provider's storage directly.
So it's my favorite solution, as it also offers an abstraction level between your container and the file system.
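
As a rough illustration of the volume approach, the sketch below builds (without running them) the two docker CLI commands involved: `docker volume create` and `docker run --mount type=volume,...`. The volume name, mount target and image are hypothetical; `local` is Docker's built-in default driver, and a third-party driver name would go in its place if you had one installed.

```python
import shlex


def volume_create_command(name, driver="local"):
    """Build the command that creates a named volume with a given driver."""
    return ["docker", "volume", "create", "--driver", driver, name]


def run_with_volume_command(volume, target, image, read_only=False):
    """Build a `docker run` command that mounts a named volume at `target`."""
    mount = f"type=volume,source={volume},target={target}"
    if read_only:
        mount += ",readonly"
    return ["docker", "run", "--rm", "--mount", mount, image]


# Hypothetical names, for illustration only.
create_cmd = volume_create_command("app-data")
run_cmd = run_with_volume_command("app-data", "/var/lib/app", "example/app")
print(shlex.join(create_cmd))
print(shlex.join(run_cmd))
```

The container only refers to the volume by name, so nothing in the `docker run` command says where or how the data is actually stored.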

I hope this gives you some hints on how to manage files. I will try to give some performance metrics soon.

Why do we need to share a file in the Docker world? Here are some common use cases seen on several projects:

  • Generated configuration: as a container should run only one process, a problem appears when your application needs a configuration file generated at startup (a configuration file that may change over time because you are using Consul, etcd, Vault, ...). You then have a daemon process that watches for changes, updates the configuration file, and notifies the application to reload it. Having two processes in your container would be a mess: if one of the two dies, you may end up in a strange state. One way to manage this is to have two containers, the application and a sidecar, sharing a volume.
  • Logs: if you want to push your logs to a central place (ELK, ...), you may use the same sidecar container concept.
  • Database files: you may store your files in a dedicated volume. That allows you to start another psql process using the volume in read-only mode, which gives you access to the data without taking any risk with the process running your DB.
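
The sidecar use case above can be sketched as follows. This only builds the docker CLI commands, it does not run them, and it leaves out how the sidecar notifies the application. The container names, image names and volume name are all hypothetical. The sidecar mounts the shared volume read-write and the application mounts it read-only, which is the same idea as the read-only database access.

```python
import shlex


def run_detached(name, image, volume, target, read_only=False):
    """Build a detached `docker run` command mounting a shared named volume."""
    mount = f"type=volume,source={volume},target={target}"
    if read_only:
        mount += ",readonly"
    return ["docker", "run", "-d", "--name", name, "--mount", mount, image]


# Hypothetical names, for illustration only.
commands = [
    ["docker", "volume", "create", "app-config"],
    # The sidecar writes the generated configuration into the volume.
    run_detached("config-watcher", "example/config-watcher", "app-config", "/config"),
    # The application only reads it, so it gets a read-only mount.
    run_detached("app", "example/app", "app-config", "/config", read_only=True),
]

for command in commands:
    print(shlex.join(command))
```

Each container still runs a single process, and if the sidecar dies it can be restarted without touching the application.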