Dockerized Java Development

Maksym Govorischev
Lead Software Engineer @EPAM



Having said that, let’s establish the base for the subsequent discussion and briefly recap what Docker is and why we should be interested in this particular implementation of the container concept. Here are some stats that were presented during this year’s DockerCon by Docker’s CEO Ben Golub:
«Docker usage statistics: Increased adoption by enterprises and for production use»:
• There are 460K Dockerized applications, a 3,100% growth over 2 years
• Over 4 billion containers have been pulled so far
• Docker is supported by a large and fast-growing community of contributors and users
• As an example, there are 125K Docker Meetup members worldwide. That is about 40% of the population of Iceland! (Yes, the country that beat England at Euro 2016.)

As we can see, Docker shows impressive growth both in terms of adoption for development and production needs and in terms of community size, and the two do not always go hand in hand. I’d say the former is often impossible without the latter.

[Chart: Docker usage and adoption growth]

Another chart shows the Docker adoption rate depending on the tech stack. There’s an interesting correlation between how complex the development environment setup is for a particular platform and how popular Docker is among that platform’s developers. This correlation is actually the main reason to be interested in Docker for development purposes (remember, we don’t discuss production use here).
But how is Docker different from other existing approaches to simplifying development environment setup, namely the various tools that utilize virtualization technology?
The two main arguments are:
• Density – Docker allows you to pack more containers onto one machine than would be possible with virtual machines.

• User experience – this one is purely subjective, but for me personally Docker provided a better experience than tools from the virtual machine world (e.g. Vagrant). Of course, we should note that until recently the Docker engine required a virtual machine to run on the two most popular OSes used by developers.


And to establish some more context, let’s look at the main building blocks of the Docker core toolset, which you will use the most for development needs.

[Diagram: main building blocks of the Docker core toolset]
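To make these building blocks concrete, here is a minimal command-line sketch touching the pieces you will use daily: images pulled from a registry, containers run from those images, and the local cache of both (the image used here is just an example).

    # Pull an image (a read-only template) from a registry -- Docker Hub by default
    docker pull openjdk:8-jdk

    # Run a throwaway container from that image and check the JDK version inside it
    docker run --rm openjdk:8-jdk java -version

    # List containers and locally cached images
    docker ps -a
    docker images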

Besides the core elements, there are also some supporting tools we’ll talk about:
• Docker Machine – a tool that abstracts you from the details of the host system you’re running Docker on.

• Docker Compose – a tool that allows you to run a multi-container setup. If your setup contains a web server, an app server and a database, and they should be brought up and shut down together in a particular order, Docker Compose does this for you (see the sketch below).
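For illustration, a minimal docker-compose.yml for such a setup might look like the sketch below; the service names, images and ports are assumptions for the example rather than a recommendation.

    # docker-compose.yml -- hypothetical web server + app server + database setup
    version: '2'
    services:
      db:
        image: postgres:9.5
        environment:
          POSTGRES_PASSWORD: example
      app:
        image: mycompany/backend:latest   # placeholder application image
        depends_on:
          - db
        ports:
          - "8080:8080"
      web:
        image: nginx:latest
        depends_on:
          - app
        ports:
          - "80:80"

A single docker-compose up -d then brings the whole stack up in dependency order, and docker-compose down tears it down again.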

By now, you should have understood that the area where Docker gives the most benefit for development is development environment setup. Docker makes this procedure:
• Consistent – across the team, from dev to prod

• Time-saving – no need to go through a long and sophisticated installation process for each developer

• Reproducible – no more “works on my machine”

• Clean – easy to work with multiple versions of an environment without turning your local machine into a mess (see the short example below)
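As a small illustration of the last point, two versions of the same database can run side by side in containers, each on its own host port, without either of them touching the host installation (names and ports below are arbitrary examples):

    # Two PostgreSQL versions side by side, nothing installed on the host
    docker run -d --name pg95 -e POSTGRES_PASSWORD=example -p 5433:5432 postgres:9.5
    docker run -d --name pg96 -e POSTGRES_PASSWORD=example -p 5434:5432 postgres:9.6

    # Throw one away when you no longer need it -- the host stays clean
    docker rm -f pg95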

Let’s consider two of the most popular use-cases where Docker may come in handy:
1. Quickly bootstrapping a sophisticated dev environment locally.
2. Setting up a common template environment for communication between different teams (e.g. UI and back-end teams).

Use-case 1: Quickly bootstrapping a sophisticated dev environment locally

On one of my recent projects we had to choose a machine learning stack on which we would base various types of predictions and recommendations for our system. After some research we decided to give Prediction IO a try. Prediction IO is an open source Machine Learning Server built on top of a state-of-the-art open source stack.

[Diagram: the Prediction IO stack]

The thing is, to set it up locally (or on your QA environment) you need to meet quite a few prerequisites. Here’s an extract from the Prediction IO installation guide.

[Image: prerequisites from the Prediction IO installation guide]

Even assuming that one successfully installs all of these components (without making some tiny mistake in the middle of the installation steps for one of the items on this list, or breaking something already installed on the machine), this is quite a sophisticated and fragile process. Luckily, however, there’s a community Docker image for Prediction IO, and if you have Docker installed locally, you can bring the above-mentioned stack up with literally a single command (shown below).

[Image: the single Docker command that starts the Prediction IO stack]
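For illustration only, the command looks roughly like the sketch below; the image name and port mappings are placeholders and depend on which community image you pick, so check its documentation for the real values.

    # Hypothetical community image name -- substitute the image you actually use
    docker run -d --name predictionio \
        -p 7070:7070 -p 8000:8000 \
        someuser/docker-predictionio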

Pros:
• Fast and easy – no additional explanation needed, I believe: one command brings it all up.
Cons:
• Breaks the initial consistency promise – earlier we stated that Docker gives you consistency across environments. Of course, we understand that in production this stack will need to be distributed rather than run on a single machine; moreover, each of the components on the list may require its own cluster. So here we trade consistency for simplicity. However, if we adapt our Docker setup to use a clustering solution like Swarm, the consistency promise can be met as well.

Use-case 2: Setting up a common template environment for communication between different teams (e.g. UI and back-end teams)

Another example is one of EPAM’s internal projects I took part in, which used a pretty standard stack and had separate UI and Backend teams, each handling their part of the system.
[Diagram: project technology stack]

It was the first phase of the system’s development, so requirements were often fuzzy and changing, and there was no stable domain model or backend API for the UI team to rely on. Because of this, developing most features required very intensive interaction and communication between the UI and Backend teams, with many iterations to make a particular piece of functionality work.


In order to do that, each team had to be able to deliver its changes and expose/test them as part of the overall system setup.
Possible options to achieve this are:
• Full local dev environment copy – both UI and backend devs set up a full copy of the environment locally, then pull the other team’s changes, build, deploy and test them.
It’s a generally workable solution, until we consider the fact that the UI and Backend teams have totally different development stacks: UI devs are not quite comfortable setting up Maven & Co., and Java devs are generally not familiar with Webpack and Babel.


• Sync through dedicated environments
In this case each team pushes their changes to a common feature branch, all necessary artifacts are built by the CI tool and deployed to one of the pre-selected environments set up in advance. This is also a working solution (and, to be honest, the one we ended up using while polishing the Docker-based approach). However, it is more resource-consuming (it’s not always easy to get additional machines for one more environment, especially for internal projects) and requires additional communication inside and between teams (often one dedicated environment is re-used to test several different features), as well as between devs and the DevOps engineers who usually manage these dedicated environments.

• Local Docker-based environment
This last option is based on Docker and its core tools and generally fits into the following workflow.

[Diagram: Docker-based delivery workflow]

Each team pushes its changes to a feature branch; CI pulls them, builds one or more Docker images with corresponding tags and pushes them to a Docker registry. Please note: this approach works best when the unit of distribution is a Docker image and you have a Docker registry set up to host your images. In our case we had no resources to host a Docker registry and therefore had to resort to a series of hacks that significantly complicated the process.
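The CI step itself boils down to something like the following sketch; the registry address, image name and tag are placeholders:

    # Build an image for the feature branch and tag it accordingly
    docker build -t registry.example.com/myapp:feature-login-flow .

    # Push it to the shared registry so the other team can pull it
    docker push registry.example.com/myapp:feature-login-flow

    # On the other side: pull the tagged image and run it locally
    docker pull registry.example.com/myapp:feature-login-flow
    docker run -d -p 8080:8080 registry.example.com/myapp:feature-login-flow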

There are some public services and in-house solutions that provide Docker registry functionality. We’ll just name them without going into much detail:

Public services:

– Docker Hub (hub.docker.com)
– Quay.io
– Amazon Elastic Container Registry (ECR)
– Google Container Registry

In-house Docker registry options, which can be hosted inside your organization:

– Docker Registry – the registry under the hood of Docker Hub is actually an open-source product, so it can be hosted locally for free (a quick self-hosting sketch follows below).
– JFrog Artifactory – provides functionality to host Docker images.
– GitLab Container Registry – one of the aims of GitLab is to cover the full development cycle, and they have recently announced a container registry feature.
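As a sketch of the first option, the open-source registry can be run as a container itself; the commands below follow the commonly documented defaults, and the application image name is a placeholder:

    # Run the open-source Docker Registry locally on port 5000
    docker run -d -p 5000:5000 --restart=always --name registry registry:2

    # Re-tag an image against the local registry and push it there
    docker tag myapp:latest localhost:5000/myapp:latest
    docker push localhost:5000/myapp:latest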

 

Let’s also describe a couple of issues that we faced while trying to adopt the Docker-based approach, and possible solutions for them.

Issue 1: Oracle JDK License

Recently, there was quite a hot discussion about whether it’s legal to run Java in Docker containers, inspired by this post on the Takipi blog, with the results generally summarized here.

Let’s just briefly highlight the cases where you might be breaking Oracle Java’s license.
Typical violations:
• Removing some files from the JDK distribution in order to shrink its size
• Re-distributing Oracle JDK binaries as part of a Docker image
• Using smart Dockerfiles to automatically download the Oracle JDK during image build
Possible solutions:
• Use OpenJDK-based Docker images (see the Dockerfile sketch below)
• Use Azul Zulu-based Docker images
• Download the Oracle JDK and build the image locally
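As an example of the first option, a minimal Dockerfile for a Java service based on an OpenJDK image might look like the sketch below; the jar path is a placeholder for your own build artifact.

    # Dockerfile -- an OpenJDK base image avoids redistributing Oracle binaries
    FROM openjdk:8-jre

    # Copy the application artifact produced by your Maven/Gradle build (placeholder name)
    COPY target/my-service.jar /opt/app/my-service.jar

    # Run the service
    ENTRYPOINT ["java", "-jar", "/opt/app/my-service.jar"]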


Issue 2: Persistent storage management

By default, all the data your container operates on is stored in the container’s local storage and is removed together with the container. Any reasonably sophisticated use case requires data durability, so you need to think about how to make your container persist data somewhere outside itself, so that you can freely dispose of the container while keeping the data.
There’s a very detailed and informative discussion on Stack Overflow about this topic.
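The usual answer is Docker volumes (or bind mounts). A short sketch, using a PostgreSQL container as an example and assuming a volume named pgdata:

    # Create a named volume that outlives any single container
    docker volume create pgdata

    # Mount it at the path where the container keeps its data
    docker run -d --name db -e POSTGRES_PASSWORD=example \
        -v pgdata:/var/lib/postgresql/data postgres:9.6

    # The container can now be removed and recreated without losing the data
    docker rm -f db
    docker run -d --name db -e POSTGRES_PASSWORD=example \
        -v pgdata:/var/lib/postgresql/data postgres:9.6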


The bottom line: this post doesn’t pretend to be an exhaustive source of information on all the details of Docker usage; rather, it intends to share a piece of practical knowledge gained during development.