Making a container image for my school python project
What is this?
This is a simple documentation/journal/story of me trying to create a container
image for my school python project using podman
/buildah
for the first time.
I used:
podman
: The pick for me because Docker has too many features that I don't use.buildah
: Used for making the container image.- All scripts written in
fish
unless specified.
First steps
I've had experience with Docker but I found podman
and buildah
to be more of
the tool I'd like to use. Especially buildah
; the fact that I can make shell
scripts to build container images really made it look attractive to me.
So the first step definitely was to dig into using both the tools.
Choosing a base image
[Spoiler - Bad Idea]
For me, the go-to image for containers is the Alpine Linux image (due to its small size), but never did I knew that I will be swapping it out because the whole thing wouldn't even work at all!
Making the binary work in the container
So the first steps were to try out my application binary (made using
pyinstaller
) in the image. So the commands to make it were:
set id (buildah from --pull alpine)
buildah copy $id ./dist/cli /
buildah run $id -- sh
Note that I am jumping into the container's shell because I didn't want to commit an image right now. Tried to run the binary, only to get
sh: ./dist/cli: not found
I was initially confused by this error message, but later got to know that this
is a
common symptom of a dynamic link failure
, which is entirely right, because Alpine Linux uses musl-libc
, and not
glibc
, so definitely the glibc libraries will be missing there. After a bit of
searching, I found a
way to run glibc programs on Alpine linux
. So tried that, and...
/ # ./cli --help
Error relocating /tmp/_MEIAdPkMp/libz.so.1: crc32_z: symbol not found
So, definitely I was clueless about what to do. So I really had two options: switch to another image, or not containerise. Fortunately, I picked the first one - choose a different image.
Choosing another image
Since the app was working well enough in my Ubuntu 20.04 LTS distro
(WSL by the way), I decided to go for it. Checked the
20.04 tag on the
ubuntu
image. ~76 MB uncompressed disk size, not bad. So I pulled the image,
slapped the binary in, and tried ./cli --help
, it worked! So I was ready to
do the next.
Making the PDF generation work in the container
My application uses Playwright to generate the report card PDFs, based on a Jinja template filled with data which can be sourced from different places. So if you go to the website of Playwright, throw in a few clicks by navigating through the website, you will land here - the page that tells how to install system dependencies. So I pulled on a temporary Ubuntu container and tried to install python3 and pip, but...
root@35423fdec4a9:/# apt install -y python3 python3-pip
...
Need to get 72.5 MB of archives.
After this operation, 319 MB of additional disk space will be used.
319 MB? Just to install the system dependencies? I was shocked. I decided to leave this here and instead take a look in the source code of the Playwright CLI , to figure out how it figures out system dependencies. So after quite a bit of searching, I landed at this nativeDeps.ts file, which has a list of dependencies required for Playwright to function. Immediately realising this is an TypeScript file, an idea came to my mind. What if we just get the dependencies every time from this file when we start building out image?
Installing dependencies, the DIY way
If we run playwright without the dependencies installed, we will get something like this:
╔════════════════════════════════════════════════════════════╗
║ Host system is missing a few dependencies to run browsers. ║
║ Please install them with the following command: ║
║ ║
║ playwright install-deps ║
║ ║
║ <3 Playwright Team ║
╚════════════════════════════════════════════════════════════╝
So I hacked up a little Deno script to fetch the dependencies:
deno eval "import {deps} from 'https://raw.githubusercontent.com/microsoft/playwright/main/packages/playwright-core/src/utils/nativeDeps.ts'; console.log(deps['ubuntu20.04']['chromium'].join('\n'));"
Cheeky enough, I know 😉. By the way, Deno is used because the source is a TypeScript file, and I wouldn't like to install several dozen packages on Node just to get the dependencies list.
And tried installing that in the container but apt decided to greet me with an error message:
E: Unable to locate package <all-the-package-names-seperated-by-a-space>
Realising how naïve this was, I tried to look up for solutions to this. One solution that particularly stood out was somewhat like this:
- Store the list of dependencies in a file, one item on a line.
- Using
xargs
to pass the arguments to apt So I came up with these lines to do it:
# Install dependencies
deno eval "import {deps} from 'https://raw.githubusercontent.com/microsoft/playwright/main/packages/playwright-core/src/utils/nativeDeps.ts'; console.log(deps['ubuntu20.04']['chromium'].join('\n'));" > /tmp/deps.txt
buildah copy $id /tmp/deps.txt /tmp
buildah run $id -- xargs -a /tmp/deps.txt apt install -y
Which gets those packages installed, claiming ~117 MB of space in the process (not to shabby, at least better than the previous 319 MB).
The tzdata prompt problem
While doing the installation, I noticed a prompt popping up:
Configuring tzdata
------------------
Please select the geographic area in which you live. Subsequent configuration
questions will narrow this down by presenting a list of cities, representing
the time zones in which they are located.
1. Africa 4. Australia 7. Atlantic 10. Pacific 13. Etc
2. America 5. Arctic 8. Europe 11. SystemV
3. Antarctica 6. Asia 9. Indian 12. US
Geographic area:
This was really a disappointment since building container image is supposed to
be an automated process. Looking up a
solution for this on the Web, I found out
that if I set the DEBIAN_FRONTEND
environment variable to noninteractive
, I
can stop the prompt from appearing. But I still need a time zone somehow! There
was another solution for it:
buildah config --env TZ=Asia/Kolkata $id
buildah run $id -- ln -snf /usr/share/zoneinfo/\$TZ /etc/localtime
Notice the $TZ. The $ was to prevent the TZ env from the host to creep in Running this, gets the job done. No more tzdata prompt in the middle of an installation, yay!
Setting up the image's public interface
The first step was to change the working directory, since the app is being
copied to /app
.
buildah config --entrypoint '["/app/cli"]' $id
After this, I commit the image and then remove the working container:
buildah commit $id rep
buildah rm $id
Building the image
After this, I thought the thing was done. So I start the process of building the image and leave the machine alone for some time. After the build was done, I tried to run the image just to see that everything is working.
podman run -it --rm localhost/rep
And I see a very, very peculiar error:
Usage: cli [OPTIONS] COMMAND [ARGS]...
Try 'cli --help' for help.
Error: No such command 'bash'.
But I didn't pass bash as a parameter! However, digging in the logs, I see a line:
WARN[0000] cmd "bash" exists and will be passed to entrypoint as a parameter
Some more "looking up" and I get to know that I have not set the cmd
key in
the image config1. So I had two attempts to fix it:
buildah config --cmd "" $id
buildah config --cmd '[""]' $id
But they didn't work. I, getting bonkers about not getting a solution, ranbuildah config --cmd '[]' $id
and it worked! There was no "bash" being passed as a parameter!
Running the image
For this one, I use this command to run it:
podman run -it --rm -v $PWD/config:/app/config -v $PWD/out:/app/out localhost/rep
And it worked flawlessly. I was happy.
Size of the image
I also took a look at the image size just to see if our efforts have made any
significant effect. So running podman image ls
gives:
REPOSITORY TAG IMAGE ID CREATED SIZE
localhost/rep latest 4c129f4360a0 About an hour ago 494 MB
494 MB. Still OK. Could have easily been in the range of 1GB if I would
install Python to install the dependencies. I did that too, here are some
rough figures:
- With Python and pip: 1.1 GB
- With Python and pip, but deleted them afterwards plus ran
apt autoremove
: ~580 MB
Getting attracted to Alpine, again
SPOILER: BAD IDEA AGAIN! :(
As I am writing this post, another idea pops up: what if, we bundle the app into an AppImage bundle, and pop it inside an Alpine container? After all, AppImage bundles, if done correctly, can run anywhere, even in the scratch container. Also we could save more storage space. The Ubuntu image still has lots of things that are of no use to my app and we could, in theory, cut on a lot of bloat. The general process could be:
- Figure out the shared libraries of the app and the Playwright-supplied Chromium binary.
- Create an AppImage.
- Pop it in the
scratch
/alpine
imageCould be a very nice plan to do, but not now, its for another day!Figured out the shared libraries usingldd
, I made an AppImage, tried to run it on my host machine, but it failed to generate the PDFs. Maybe the Chromium binary was missing a library or two, who knows. Later I found a Docker image for running Chrome in Alpine and thought of trying it out. But first, I checked the image size, it was 282.18 MB compressed. If I consider the compression ratio to be about as half, that, plus, my binary, the total storage space would be well over the current image size. Not worth the effort.
Ending Words
This was really my first time making an image, and it was a fun adventure, span over the course of a week.
If you are interested about this project, check it out here.
Also, here's a direct link to the script that makes the container image.
Footnotes
-
( https://www.mankier.com/1/buildah-config#--cmd ) Always read your man pages when in doubt. It helps. If it doesn't, then you might need to look up elsewhere. ↩