How I Hardened SlackerNews with SecureBuild

If you’ve seen me demonstrate Replicated, you’ve probably noticed that I nearly always use SlackerNews as my demo application. Replicated built SlackerNews as a showcase application for our Platform, so my team uses it every time we do a demo or experiment.

When we launched SecureBuild, I knew I needed to adopt it for SlackerNews immediately. We had a Postgres image available, so I dropped that into my values file right away.

images:
  postgres:
    registry: images.shortrib.io
    repository: proxy/slackernews-mackerel/cve0.io/postgres
    tag: 16
    pullPolicy: IfNotPresent

I made the equivalent change to my HelmChart object and added SecureBuild as a private registry that the Replicated proxy registry could access. Being the first person from Replicated to sign up publicly for SecureBuild, I snagged the replicated username.

$ replicated registry add other \
    --username replicated \
    --password 4kVS48FrbcBATEsfIPEX4MPL3iiitLndSk0B \
    --registry https://cve0.io
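
For the HelmChart side, the values block ended up looking roughly like this. It's a sketch, so treat the chart name and version as placeholders; the image values mirror the snippet above.

apiVersion: kots.io/v1beta2
kind: HelmChart
metadata:
  name: slackernews
spec:
  chart:
    name: slackernews        # placeholder chart name
    chartVersion: 1.0.0      # placeholder version
  values:
    images:
      postgres:
        registry: images.shortrib.io
        repository: proxy/slackernews-mackerel/cve0.io/postgres
        tag: 16
        pullPolicy: IfNotPresent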

Then we needed an NGINX image. But there wasn’t one in the catalog.

Basing the SlackerNews image on a SecureBuild image

The team was busy moving the product forward and building images that our customers needed, so I wasn’t sure I’d get an NGINX image right away (I’m not as important as you folks are). But I couldn’t sit still.

I needed to figure out what else I could do. Then I heard @grant share something on a call—we’re providing access to one “language” image for every Replicated customer. I wanted to check out how that works, so I decided to use the Node.js image as the base for the application image in SlackerNews.

At first, I thought it would mean adopting a new toolchain. After all, we use a different approach to building images with SecureBuild than the Dockerfile I was using. But that was just me trying to get a gold star. The tools don’t matter as long as the foundation is secure.

So I updated my Dockerfile, swapping out the base image like this:

FROM cve0.io/node:20.19

Then I ran a build.

# the output you'd expect...
Dockerfile.web:5
--------------------
   3 |
   4 |     # Install dependencies that might be necessary
   5 | >>> RUN apk add --no-cache libc6-compat
   6 |     WORKDIR /app
   7 |

Well, that’s not so great.

It makes sense, though. We don’t have the apk command because we’re working with a distroless SecureBuild image. Maybe I don’t need it—it’s just installing the standard C library. I tried that, but it failed a little later because I had some shell logic and the SecureBuild image has no shell.

But I noticed that I specify three different images in the build: base, deps, and runner. I care most about using the SecureBuild image for the runner image. Let’s try that.

FROM node:20-alpine AS base

# set the working directory and install the library

FROM base AS deps

# build application

FROM cve0.io/node:20.19 AS runner

# run the application

That got me through the earlier stages, but the runner failed on its first RUN command.

Dockerfile.web:39
--------------------
  37 |
  38 |     # Create a group and user for running the application
  39 | >>> RUN addgroup --system --gid 1001 nodejs
  40 |     RUN adduser --system --uid 1001 nextjs
  41 |
--------------------

I hate to admit how long this took me to figure out. The parts of my brain that may once have known that RUN behaves differently with a string vs. an array just did not light up. Eventually Claude realized I had to use an array with my RUN command to avoid calling a shell (remember, the image doesn’t have one!).
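
To illustrate the difference (these lines are just an example, not my actual Dockerfile):

# Shell form: Docker wraps the command in "/bin/sh -c ...",
# which can't work when the image ships no shell at all.
RUN addgroup --system --gid 1001 nodejs

# Exec form: the binary is invoked directly, with no shell in between.
RUN ["/usr/bin/node", "--version"]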

I got past that, then realized that addgroup and adduser aren’t on the image either. That’s fine, though, since COPY can set permissions and both COPY and USER can take IDs instead of names.

# Copy the necessary files from the builder stage
COPY --from=builder --chown=1001:1001 /app/public ./public
COPY --from=builder --chown=1001:1001 /app/.next/standalone ./
COPY --from=builder --chown=1001:1001 /app/.next/static ./.next/static

# Switch to the nextjs user
USER 1001

With those changes I had an image that built, but I couldn’t get it to run.

Why won’t it run?

There’s not much more frustrating than wrestling through an image build only to find out your image won’t run. Here’s what it did:

node:internal/modules/cjs/loader:1215
  throw err;
  ^

Error: Cannot find module '/app/node'
    at Module._resolveFilename (node:internal/modules/cjs/loader:1212:15)
    at Module._load (node:internal/modules/cjs/loader:1043:27)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:164:12)
    at node:internal/main/run_main_module:28:49 {
  code: 'MODULE_NOT_FOUND',
  requireStack: []
}

This was another one that left me scratching my head and begging Claude for help. It even led me to turn to ChatGPT after Claude didn’t get me past it. Eventually, with all my searching and lots of trial and error, I realized exactly what the message meant (I’m not a Node developer, after all). Here was the culprit:

CMD ["node", "server.js"]

Remember that CMD is the command that gets run when the container starts. Another long-forgotten detail surfaced in my flailing and pleading with AIs: CMD behaves differently when an ENTRYPOINT is defined, becoming the arguments to it. But I wasn’t defining an ENTRYPOINT. Was my base image?

$ crane config cve0.io/node:20.19
{
  "architecture": "amd64",
  "author": "github.com/chainguard-dev/apko",
  "created": "1970-01-01T00:00:00Z",
  "history": [
    {
      "author": "apko",
      "created": "1970-01-01T00:00:00Z",
      "created_by": "apko",
      "comment": "This is an apko single-layer image"
    }
  ],
  "os": "linux",
  "rootfs": {
    "type": "layers",
    "diff_ids": [
      "sha256:6442be140a6047e57cda66b2bff50ee19c3207c5ef4c13a1f01e0ae4f6bbed7c"
    ]
  },
  "config": {
    "Entrypoint": [
      "/usr/bin/node"
    ],
    "Env": [
      "PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin",
      "SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt"
    ],
    "Labels": {
      "org.opencontainers.image.created": "1970-01-01T00:00:00Z"
    }
  }
}

There it is! The ENTRYPOINT is /usr/bin/node. I had to change my CMD to ["server.js"] to pass the argument to the node command rather than invoking the node command directly (which was telling node to run node).
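
In other words, the fix came down to a one-line change:

# The base image already defines ENTRYPOINT ["/usr/bin/node"],
# so CMD only needs to supply the arguments.
CMD ["server.js"]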

Returning to NGINX

I got through all of this and the team still hadn’t been able to get to my request for an NGINX image. After a bit of persistence on Slack, I was able to deputize myself onto the SecureBuild team to build the image for myself. It was quite an adventure.

A SecureBuild image is a multi-architecture image built on a distroless base, with every package built from source. It can be “dropped in” in place of an “official” image. In this case, I wanted to swap the Docker Hub “Official” NGINX image for my SecureBuild image. That meant I needed to understand what was in that image and build each and every component from source.

I initially tried to build a maximal NGINX installation (though I did leave out the perl module), since I thought that might be the best bet. That led me down the road of building lots of libraries, which meant I needed to build tools like cmake and the GNU Autotools. And that last one meant I needed to build Perl. I felt like I’d woken up in 1995, when I built all my packages from source and a big part of my job was writing CGI scripts in Perl[1].

Eventually I realized that I could scope down my NGINX build to the same modules and configuration used in the official images, and that all of the libraries I needed for that were already built[2].

I’m glad that, most of the time, I have the SecureBuild team to take care of all that for me.

Finalizing the SlackerNews image

After a quick pairing session with a SecureBuild engineer, we had an NGINX image ready to go. I updated my values file and KOTS HelmChart to use it, and SlackerNews was all set.
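
The new entry in my values file mirrored the Postgres one from earlier; the repository path and tag below are illustrative placeholders rather than the exact values.

images:
  nginx:
    registry: images.shortrib.io
    repository: proxy/slackernews-mackerel/cve0.io/nginx
    tag: "1.27"              # placeholder tag
    pullPolicy: IfNotPresent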

Only one thing left to do: I’d been ignoring those SlackerNews Dependabot alerts for a long time. SlackerNews itself had a ton of CVEs! There was no point in basing my image on a Zero-CVE NodeJS image if I had dozens of CVEs in my code. Fortunately, I wrestled most of them to the ground by removing an old dependency that brought in lots of vulnerabilities and wasn’t even used in the code anymore. That was a quick fix.

There were a few packages in the application that didn’t have a fix for their CVEs, but I got rid of dozens of them. That’s on top of the ones SecureBuild eliminated: 744 in Node, 141 in NGINX, and 221 in Postgres.

Not bad for what amounted to a few hours of work.


  1. And installing the patches that led to “a patchy server.” ↩︎

  2. I still finished up all those dependent packages because the team needed them. ↩︎
