Technical Deboning — Week One

BIND9 from First Principles

Every file, every directive, every tuning knob — from a small branch office to a national ISP handling five million queries per second. A Query Per Second (QPS) is simply one DNS lookup answered in one second; it is the standard unit of DNS throughput measurement. By the time you close this document, you are pilot in command.

Ubuntu 24.04 LTS
bind9 9.18.x
David Emiru Egwell · CTO
ISP Debone Series

Contents

00 · The BIND9 Mental Model
01 · named.conf — The Dispatcher
02 · named.conf.options — The Engine Room
03 · named.conf.local — Your Zones
04 · named.conf.default-zones — RFC Plumbing
05 · db.* Zone Files — The Data
06 · rndc.key — The Control Channel
07 · bind.keys — DNSSEC Root Anchor
08 · zones.rfc1918 — Private Reverse Zones
09 · Small ISP Setup — ~100 Queries/sec
10 · National ISP Setup — 5 Million Queries/sec
11 · Docker — Running BIND9 in Containers
12 · Defending Your Turf — DNS Security
13 · DNSSEC — Signing the Internet
14 · Final Project — Production Build
Chapter 00 · Foundation

The BIND9 Mental Model

Before touching a single file, understand the architecture. BIND9 is not one thing — it is three roles that can run on one box or ten.

Windows DNS in Server Manager is a GUI wrapping a service. BIND is a daemon — named — with a text-based configuration stack. The files on disk are the configuration. There is no registry, no AD metadata, no GUI state to diverge from the actual running config. This is power, but it requires you to hold the mental model clearly.

Three Roles BIND Can Play

Role | What It Does | Windows Equivalent | Your Use Case
Authoritative | Holds zone data. Answers definitively: "yes, this record exists in this zone and I am the source of truth." | DNS zone on Server with records | sprintug.com, sprinttz.co.tz zones
Recursive Resolver | Walks the DNS tree on behalf of clients. Caches results. Does NOT hold zone files. | DNS forwarding + cache on Server | CPE resolver for your subscriber base
Forwarder | Passes queries upstream instead of resolving itself. Less CPU, less cache independence. | DNS Forwarders tab in Server Manager | Edge resolvers pointing to your core resolvers

Most DNS confusion — for engineers coming from Windows DNS or from a decade away from the field — comes from not separating these three roles cleanly in the mind. Windows Server DNS blurs them together in one GUI, which is convenient but hides the architecture. BIND forces you to be explicit, which means you have to understand what you are building before you build it. That is not a weakness. That is the point.

Role 1 — Authoritative Nameserver
The source of truth. It does not search. It does not ask. It knows.

The authoritative server is the last stop in the DNS lookup chain. It holds the actual zone files — the database of records for a domain. When it answers a query, its response carries the aa flag (Authoritative Answer). That flag means: I am not relaying someone else's information. This is my data. It ends here.

Think of it like the land registry office. If you want to know who owns a plot in Kampala, you go to the registry — not to your neighbour's guess about it. The authoritative server is that registry for a domain. There is no higher authority to appeal to. When it says a record does not exist, that is the answer.

It has no cache. Caching would be wrong here — if you cached a stale answer and served it as authoritative truth, you would be lying. Every answer comes directly from the zone file on disk.
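The aa flag itself is a single bit in the 16-bit flags field of the 12-byte DNS message header (RFC 1035). A minimal sketch, using a hand-built header rather than a real wire capture, shows exactly where it lives:

```python
import struct

AA = 0x0400  # Authoritative Answer bit in the DNS header flags field (RFC 1035)

def is_authoritative(message: bytes) -> bool:
    """Check the aa bit in the flags field of a raw DNS message header."""
    (flags,) = struct.unpack(">H", message[2:4])  # bytes 2-3 are the flags
    return bool(flags & AA)

# Hand-built 12-byte header: ID=0x1234, flags=QR|AA (0x8400), counts mostly zero
header = struct.pack(">HHHHHH", 0x1234, 0x8400, 0, 1, 0, 0)
print(is_authoritative(header))  # True — this responder claims authority
```

Tools like dig print the same bit in their flags line; this is simply the raw view of it.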

An authoritative server runs in one of two sub-modes:

  • Primary (master): The original. Zone files live here. You edit records here. When something changes, it notifies the secondaries.
  • Secondary (slave): A read-only replica. It pulls zone data from the primary via a mechanism called a zone transfer. If the primary goes down, the secondary keeps answering. You never edit the secondary's zone files directly — the primary is always the source of truth.

In the Sprint Group context: ns1.sprintug.com and ns1.sprinttz.co.tz are your authoritative servers. The whole world can query them for records in your zones. They never perform recursion for anyone.
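The primary/secondary split maps directly onto zone stanzas (covered fully in Chapter 03). A minimal sketch, using documentation addresses (203.0.113.x) rather than Sprint's real IPs:

```
// On the primary (ns1) — the editable copy lives here
zone "example.com" {
    type master;
    file "/etc/bind/zones/db.example.com";
    allow-transfer { 203.0.113.2; };        // only ns2 may pull the zone
};

// On the secondary (ns2) — replica pulled via zone transfer
zone "example.com" {
    type slave;
    masters { 203.0.113.1; };               // pull from ns1
    file "/var/cache/bind/db.example.com";  // auto-written, never hand-edited
};
```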

Role 2 — Recursive Resolver
The investigator. It knows nothing by default — but it knows how to find everything.

The recursive resolver is the engine your subscribers actually talk to. When a subscriber's phone looks up youtube.com, it asks your resolver. Your resolver does not know the answer — but it knows where to start looking. It walks the DNS tree from the top:

  1. Ask a root server: "Who handles .com?" → gets pointed to Verisign's TLD servers
  2. Ask the .com TLD server: "Who handles youtube.com?" → gets pointed to Google's authoritative servers
  3. Ask Google's authoritative server: "What is the A record for youtube.com?" → gets the IP
  4. Return the answer to the subscriber and cache it

That last step — caching — is where the performance lives. The next time any subscriber asks for youtube.com, the resolver answers from cache in under a millisecond without touching the internet. A well-sized resolver cache with a good hit rate means the vast majority of your subscribers' queries never leave your network at all.
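The arithmetic behind that claim is worth making explicit. A sketch with illustrative hit rates (measure your own cache via rndc stats before trusting any figure):

```python
def upstream_qps(client_qps: float, hit_rate: float) -> float:
    """Queries per second that miss the cache and must leave your network."""
    return client_qps * (1.0 - hit_rate)

# Illustrative numbers only — real hit rates depend on TTLs and traffic mix.
print(upstream_qps(100, 0.85))        # ≈ 15 qps leave the network
print(upstream_qps(5_000_000, 0.85))  # ≈ 750,000 qps — cache absorbs the rest
```

The same subscriber load with a 95% hit rate cuts upstream traffic by another two thirds, which is why cache sizing is a first-order capacity decision.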

The recursive resolver has no zone files of its own (except for the RFC plumbing zones). It is pure computation and cache. Its memory requirement is large. Its CPU requirement is moderate. Its network requirement is: it must be able to reach the root servers and all authoritative nameservers on the public internet.

In the Sprint Group context: Every Point of Presence runs a resolver. Subscribers' CPE devices point to these IPs. The resolver at Raxio serves Kampala North. The resolver at Wingu Mbezi serves Dar es Salaam. They are your front line — the servers that take the daily subscriber load.

Role 3 — Forwarder
The relay. Passes queries upstream rather than resolving them itself.

A forwarder is a resolver that has opted out of doing the full investigative work itself. Instead of walking the DNS tree from the roots, it passes every query to another resolver (usually your core resolver) and returns whatever that resolver says.

This sounds like a lazy resolver — and in some sense it is. The tradeoff is deliberate. A forwarder requires far less memory (no large cache needed), far less network access (it only needs to reach its upstream, not the whole internet), and far less CPU. It is the right choice for a branch office, a small PoP with limited transit, or a microsite where you want DNS without the overhead.

The risk is dependency. If the upstream resolver goes down, the forwarder has nothing to fall back on. You configure this with forward only; (strict dependency) or forward first; (try upstream, fall back to full recursion if it fails). In a managed ISP environment where your upstreams are your own servers, forward only; from branch offices to your core resolvers is the clean design.

In the Sprint Group context: A small tower site with a CPE aggregation router and one Linux box could run a BIND forwarder pointing to the Raxio or Wingu core resolvers. It caches locally for the site, offloads the upstream, and degrades gracefully — it stops caching but subscribers still get resolution via the core. Simple, lean, correct.
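A complete branch-site forwarder needs only a handful of options. A sketch with placeholder core-resolver IPs (10.10.10.1 and 10.10.10.2 are illustrative, not Sprint's real addresses):

```
options {
    directory "/var/cache/bind";
    recursion yes;                  // still answers clients and caches locally
    forwarders {
        10.10.10.1;                 // core resolver A (placeholder IP)
        10.10.10.2;                 // core resolver B (placeholder IP)
    };
    forward only;                   // strict dependency — never recurse from roots
    allow-query { 10.0.0.0/8; };    // site clients only
};
```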

How a Real Query Flows Through All Three Roles

This is the full journey of a single DNS lookup from a SprintUG subscriber to an answer. Understanding this chain completely is what it means to own DNS as a discipline.

query lifecycle — subscriber looks up youtube.com · conceptual
  SUBSCRIBER DEVICE                  YOUR INFRASTRUCTURE              THE INTERNET
  ──────────────────                 ──────────────────────────       ─────────────────────

  Phone looks up                     [RESOLVER — Raxio DC]
  youtube.com                  ───►  Cache miss. Not seen yet.
                                     Start recursive walk.
                                                                  ───►  Root server (.)
                                                                         "Who handles .com?"
                                                                  ◄───  "Ask Verisign at 192.5.6.30"

                                                                  ───►  Verisign .com TLD
                                                                         "Who handles youtube.com?"
                                                                  ◄───  "Ask ns1.google.com"

                                     [RESOLVER]                   ───►  [AUTHORITATIVE — Google]
                                     Receives answer                     ns1.google.com
                                     Caches: youtube.com                 "A record = 142.250.185.78"
                                     A = 142.250.185.78           ◄───   aa flag set
                                     TTL: 300 seconds
  ◄────────────────────────────────  Returns answer to phone
  Phone connects to YouTube
  ──────────────────

  [30 seconds later — different subscriber]

  Another phone looks up      ─────►  [RESOLVER]
  youtube.com                          Cache HIT.
                                       Answer returned in <1ms.
                              ◄─────   No internet query needed.
  ──────────────────

  [Now — subscriber looks up sprintug.com]

  Phone looks up              ─────►  [RESOLVER — Raxio DC]
  sprintug.com                         Zone forward: "sprintug.com
                                       → ask our own auth server"
                                                                  ───►  [AUTHORITATIVE — ns1.sprintug.com]
                                                                         Your server. Your zone file.
                                                                         aa flag set.
                              ◄─────   Answer returned               ◄─── A record = 196.43.10.100
  ──────────────────

Why You Must Separate These Roles in Production

Running all three roles on a single server is how you start. It is not how you scale, and it creates four serious problems that every serious ISP eventually runs into.

Problem | What happens when mixed | What separation gives you
Cache poisoning risk | An attacker who tricks your resolver's cache into serving a bad record for sprintug.com now corrupts your authoritative answers too — because they are on the same process with a shared cache. | The authoritative server has no cache. There is nothing to poison. It only knows what is in the zone file on disk.
Performance profile mismatch | Authoritative servers need fast storage I/O and low latency. Resolvers need massive RAM for cache. One box cannot be optimised for both without compromise. | Auth servers: small, fast, low memory. Resolvers: large RAM, many CPU threads, high concurrency. Each tuned for its job.
Traffic exposure | Your authoritative server — which the whole internet can query — and your subscriber resolver — which holds your full cache — are the same machine. An attack on one is an attack on both. | Auth servers face the internet but hold no sensitive cache. Resolvers face only your subscribers. The attack surface is divided.
Scaling path | To add capacity you must scale one monolith that does everything. | Add resolver nodes independently without touching auth. Scale auth independently if your zone count grows.
The Separation as Discipline

In 2006 at MTN Uganda you almost certainly ran a combined server — most ISPs at that scale did. The field has moved. The cost of separation is one extra server or one extra container. The cost of not separating is an attack surface and a scaling ceiling you will hit the moment Sprint Group signs that next enterprise contract. Separate the roles early. It is the correct shape of the architecture and everything else — monitoring, failover, security policy — becomes cleaner once you do.

The Configuration Stack

BIND does not use one monolithic file. It uses an include chain. The startup sequence is:

startup chainconceptual
# named reads ONE entry point:
named
  └── /etc/bind/named.conf               # dispatcher — includes the rest
        ├── named.conf.options            # global tuning, forwarders, ACLs
        ├── named.conf.local              # YOUR zones (authoritative)
        └── named.conf.default-zones      # localhost + RFC zones
              └── zones.rfc1918           # included from default-zones

# Crypto & control:
bind.keys                                 # DNSSEC root KSK anchors (auto-managed)
rndc.key                                  # HMAC key for rndc control channel

# Zone data files (the actual DNS records):
db.local    db.127    db.0    db.255    db.empty
# Plus your own: db.sprintug.com, db.10.10.10.rev, etc.
Pilot-in-Command Mindset

Windows DNS makes it hard to shoot yourself in the foot by hiding complexity. BIND gives you every control but expects you to know what each switch does. The risk is misconfiguration. The reward is a resolver capable of handling the DNS load of a national telco on commodity hardware.


Chapter 01 · The Entry Point

named.conf

The dispatcher. It does almost nothing itself — its entire purpose is to pull in the other files and optionally declare global ACLs and logging.

/etc/bind/named.conf
Master dispatcher — includes all other config files

This file is the single path the named binary loads at startup. In a clean Ubuntu install it contains nothing but three include directives. Think of it as your main() function.

Default Ubuntu Content

/etc/bind/named.conf · BIND config
// This is the primary configuration file for the BIND DNS server named.
// Please read /usr/share/doc/bind9/README.Debian for information on the
// structure of BIND configuration files in Debian/Ubuntu.

include "/etc/bind/named.conf.options";
include "/etc/bind/named.conf.local";
include "/etc/bind/named.conf.default-zones";

Production Version — With ACLs and Logging

In production you add global acl blocks here so they are available to all included files, and a top-level logging block.

/etc/bind/named.conf (production) · BIND config
// ═══════════════════════════════════════════════════════════
// SprintUG Internet Limited — Authoritative + Resolver DNS
// Authored: David Emiru Egwell  |  AS328939
// ═══════════════════════════════════════════════════════════

// ── ACL Definitions (available to all included files) ──────
acl "trusted_resolvers" {
    10.0.0.0/8;        // All SprintUG private space
    196.43.0.0/16;     // SprintUG allocated prefixes
    41.220.0.0/16;     // Sprint Tanzania
    127.0.0.1;         // Loopback
    ::1;               // IPv6 loopback
};

acl "noc_mgmt" {
    10.255.0.0/24;     // NOC management VLAN
};

// ── Logging ─────────────────────────────────────────────────
logging {
    channel default_log {
        file "/var/log/named/named.log" versions 5 size 50m;
        severity warning;
        print-time yes;
        print-severity yes;
    };
    channel query_log {
        file "/var/log/named/query.log" versions 3 size 100m;
        severity info;
        print-time yes;
    };
    category default   { default_log; };
    category queries   { query_log;   };   // Enable for debugging only — expensive at scale
    category security  { default_log; };
};

// ── Include Chain ───────────────────────────────────────────
include "/etc/bind/named.conf.options";
include "/etc/bind/named.conf.local";
include "/etc/bind/named.conf.default-zones";
Query logging at scale

Enabling the queries category at 5M QPS generates over 400 billion log lines per day — tens of terabytes of data — and will measurably degrade resolver performance. Only enable it on a dedicated debug instance or with aggressive rate limiting. At scale, use dnstap instead: it exports to a binary log collector asynchronously.

What Can Live in named.conf

Stanza | Purpose | Recommended Location
acl { } | Named IP lists reusable throughout config | named.conf top-level
logging { } | Log channels, categories, destinations | named.conf top-level
options { } | Global daemon behaviour | named.conf.options (included)
zone { } | Zone declarations | named.conf.local (included)
key { } | TSIG/RNDC key definitions | Separate .key file, included
controls { } | rndc control socket binding | named.conf.options or named.conf

Chapter 02 · The Engine Room

named.conf.options

This is where you tune the engine. Memory limits, forwarder policy, DNSSEC validation, rate limiting, query source, recursion controls — everything that governs how named behaves globally lives here.

/etc/bind/named.conf.options
Global options block — performance, security, recursion, forwarding

Contains exactly one options { } block. Every global BIND tuning parameter goes here. It can also contain the controls { } block for rndc.

Default Ubuntu (barebones)

/etc/bind/named.conf.options (default) · BIND config
options {
    directory "/var/cache/bind";

    // If there is a firewall between you and nameservers you want
    // to talk to, you may need to fix the firewall to allow multiple
    // ports to talk.  See http://www.kb.cert.org/vuls/id/800113

    // If your ISP provided one or more IP addresses for stable
    // nameservers, you probably want to use them as forwarders.
    // Uncomment the following block, and insert the addresses replacing
    // the all-0's placeholder.

    // forwarders {
    //      0.0.0.0;
    // };

    dnssec-validation auto;
    listen-on-v6 { any; };
};

That default gets you running. For an ISP — even a small one — it is dangerously bare. Here is what each key directive does and what you actually need:

Options Reference — Key Directives

Core Paths

Directive | What it controls | Recommended Value
directory | Working directory for relative paths in zone files | /var/cache/bind
dump-file | Output of rndc dumpdb | /var/cache/bind/named_dump.db
statistics-file | Output of rndc stats | /var/log/named/named.stats
memstatistics-file | Memory usage report on shutdown | /var/log/named/named_mem.stats

Listening and Interface Binding

Directive | What it controls
listen-on { } | IPv4 interfaces/ports to listen on. Default: all interfaces port 53. Lock this down on resolvers.
listen-on-v6 { } | IPv6 equivalent. any = all IPv6 interfaces.
query-source address port | Source IP/port for outbound recursive queries. Randomise ports for security.

Recursion and Access Control

Directive | What it controls | ISP Setting
recursion yes/no | Whether named resolves on behalf of clients. Off on authoritative-only servers. | yes (resolver), no (auth)
allow-query { } | Who may send queries. Lock to your subscriber ranges. | trusted_resolvers ACL
allow-recursion { } | Who may trigger recursive resolution. More specific than allow-query. | trusted_resolvers ACL
allow-query-cache { } | Who may read cached answers. | trusted_resolvers ACL
allow-transfer { } | Who may do AXFR zone transfers. Lock tight. | Secondary NS IPs only

Forwarding

Directive | Behaviour
forwarders { ip; ip; } | Upstream resolvers to forward to. Empty = full recursion from root hints.
forward only; | ONLY use forwarders. If they fail, the query fails. Good for branch offices.
forward first; | Try forwarders, fall back to full recursion. Good for core resolvers.
Forward first vs full recursion

On your core ISP resolvers, do not forward to Google 8.8.8.8 or Cloudflare 1.1.1.1. Run full recursion from root hints. Forwarding upstream means your query privacy is surrendered to a third party and your resolver depends on their availability. Your transit capacity easily handles full recursion — you are already paying for that bandwidth.

Cache and Memory Tuning

Directive | What it controls | Scale Guidance
max-cache-size | Max RAM for DNS cache. Default: 90% of system RAM — dangerously high on a shared box. | Small: 512m · Large: 32g per instance
max-cache-ttl | Override record TTLs downward. Lower = fresher cache, more upstream queries. | 3600 (1 hour)
max-ncache-ttl | Max TTL for NXDOMAIN caching (negative cache). | 300 (5 min)
cleaning-interval | How often to sweep expired cache entries (deprecated in 9.18 — automatic). | N/A

DNSSEC

Setting | Behaviour
dnssec-validation auto; | Validate using built-in root anchors from bind.keys. Recommended.
dnssec-validation yes; | Validate but require manually managed trust anchors.
dnssec-validation no; | Disable validation. Only acceptable on an isolated internal resolver.

Rate Limiting (DNS Amplification Defence)

BIND's built-in Response Rate Limiting (RRL) prevents your server being used as a DDoS amplifier:

rate-limit stanza in options { } · BIND config
rate-limit {
    responses-per-second 10;        // Per-client response rate
    referrals-per-second 5;         // Referral responses
    nodata-per-second   5;         // NODATA (empty answers)
    nxdomains-per-second 5;        // NXDOMAIN rate
    errors-per-second   5;         // Error responses
    window              15;        // Sliding window in seconds
    slip                2;         // 1 in N slip through as TC (encourages TCP)
    min-table-size      500;
    max-table-size      20000;
    exempt-clients     { 127.0.0.1; 10.0.0.0/8; };
};
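Conceptually, RRL bounds how many responses one client (or one spoofed victim address) can elicit inside the sliding window. The toy limiter below illustrates only the responses-per-second × window idea — it is not BIND's actual algorithm, which additionally buckets by response type and applies slip:

```python
from collections import deque

class ToyRateLimiter:
    """Per-client sliding-window limiter. Conceptual sketch only —
    NOT BIND's RRL implementation."""
    def __init__(self, responses_per_second=10, window=15):
        self.budget = responses_per_second * window  # max responses per window
        self.window = window
        self.sent = {}                               # client -> deque of send times

    def allow(self, client: str, now: float) -> bool:
        q = self.sent.setdefault(client, deque())
        while q and now - q[0] >= self.window:       # expire entries outside window
            q.popleft()
        if len(q) < self.budget:
            q.append(now)
            return True
        return False                                 # BIND would drop or slip here

rrl = ToyRateLimiter()
burst = [rrl.allow("198.51.100.7", 0.0) for _ in range(151)]
print(burst.count(True))  # 150 — the 151st response inside the window is refused
```

Once the window slides past the burst, the same client is served again — the goal is capping amplification, not blacklisting.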

Full Production Options File — Small ISP

/etc/bind/named.conf.options (small ISP — ~100 QPS) · BIND config
options {
    directory             "/var/cache/bind";
    dump-file             "/var/cache/bind/named_dump.db";
    statistics-file       "/var/log/named/named.stats";
    memstatistics-file    "/var/log/named/named_mem.stats";

    // ── Listening ──────────────────────────────────────────────
    listen-on             { 127.0.0.1; 10.10.10.1; };
    listen-on-v6          { none; };

    // ── Recursion — open only to your subscribers ─────────────
    recursion             yes;
    allow-query           { trusted_resolvers; };
    allow-recursion       { trusted_resolvers; };
    allow-query-cache     { trusted_resolvers; };
    allow-transfer        { none; };  // No zone transfers from resolver

    // ── Forwarding — NONE: full recursion from roots ──────────
    // forwarders { }; // intentionally empty

    // ── DNSSEC ─────────────────────────────────────────────────
    dnssec-validation     auto;

    // ── Cache ──────────────────────────────────────────────────
    max-cache-size        512m;
    max-cache-ttl         3600;
    max-ncache-ttl        300;

    // ── Rate Limiting (amplification defence) ─────────────────
    rate-limit {
        responses-per-second 10;
        slip                 2;
        exempt-clients      { 127.0.0.1; 10.0.0.0/8; };
    };

    // ── Source port randomisation ──────────────────────────────
    query-source          address * port *;

    // ── Misc security ──────────────────────────────────────────
    version               "not disclosed";  // Don't reveal BIND version (version.bind CH TXT)
    hostname              "not disclosed";  // Masks the hostname.bind CH TXT query
    server-id             none;             // Suppress NSID / ID.SERVER responses
};

controls {
    inet 127.0.0.1 port 953
        allow { 127.0.0.1; }
        keys { "rndc-key"; };
};

Chapter 03 · Your Zones

named.conf.local

This is your zone registry. Every domain you are authoritative for — forward and reverse — is declared here. The zone declarations are the map; the db.* files are the territory.

/etc/bind/named.conf.local
Zone declarations — forward zones, reverse zones, secondary zones

On a fresh Ubuntu install this file is empty (or nearly so). Every zone you declare here corresponds to a zone data file. The zone declaration tells BIND: "I am authoritative for this name, and the data lives in this file."

Zone Declaration Types

Type | Role | Has db file?
master / primary | Primary authoritative source. You edit the zone file here. | Yes — you write it
slave / secondary | Secondary. Pulls zone data via AXFR from primary. Read-only. | Auto-generated in /var/cache/bind/
stub | Caches only NS records for a zone. Lightweight delegation helper. | Auto-generated
forward | Overrides global forwarding for a specific zone only. | No
hint | Root hints zone — bootstrap for full recursion. | /usr/share/dns/root.hints (declared in default-zones)

Full Production named.conf.local

/etc/bind/named.conf.local · BIND config
// ═══════════════════════════════════════════════════════════
// SprintUG — Zone Declarations
// Primary authoritative for SprintUG + SprintTZ domains
// ═══════════════════════════════════════════════════════════

// ── Forward Zones ───────────────────────────────────────────
zone "sprintug.com" {
    type            master;
    file            "/etc/bind/zones/db.sprintug.com";
    allow-transfer { 196.43.10.2; };  // ns2 secondary
    notify          yes;
    also-notify    { 196.43.10.2; };
};

zone "sprinttz.co.tz" {
    type            master;
    file            "/etc/bind/zones/db.sprinttz.co.tz";
    allow-transfer { 41.220.10.2; };   // TZ secondary
    notify          yes;
};

// ── Reverse Zones (PTR records) ─────────────────────────────
// 196.43.X.X — Sprint public block
zone "43.196.in-addr.arpa" {
    type            master;
    file            "/etc/bind/zones/db.196.43.rev";
    allow-transfer { 196.43.10.2; };
};

// Internal management range 10.255.0.0/24 (NOC VLAN)
zone "0.255.10.in-addr.arpa" {
    type            master;
    file            "/etc/bind/zones/db.10.255.0.rev";
    allow-transfer { none; };
};

// ── Secondary Zones (pulled from external primaries) ────────
zone "savannah.co.tz" {
    type            slave;
    masters        { 197.250.1.10; };   // Savannah Comms primary NS
    file            "/var/cache/bind/db.savannah.co.tz";
};

// ── Zone-specific forwarding override ───────────────────────
// Route queries for internal.sprintug.local to the AD DNS server
zone "internal.sprintug.local" {
    type            forward;
    forwarders     { 10.10.10.100; };   // Windows AD DC
    forward         only;
};
Split-Brain DNS with Windows AD

This is the exact pattern that lets you run BIND as your internet-facing authoritative while keeping Windows Server DNS for your internal Active Directory. The zone "internal.sprintug.local" { type forward; } stanza passes all internal queries to the AD DC. Your ISP clients see BIND; your AD clients see Windows. Both are happy.

Anatomy of a zone stanza

zone stanza explained · BIND config
zone "example.com" {

    type            master;
    // master = I own this data (primary)
    // slave  = I replicate from another NS
    // forward = pass-through to another resolver

    file            "/etc/bind/zones/db.example.com";
    // Path to the zone data file.
    // Relative to 'directory' in options if not absolute.

    allow-transfer { 203.0.113.2; };
    // Only ns2 may AXFR this zone.
    // 'none' = no transfers allowed.
    // Use TSIG keys in production for authenticated transfers.

    notify          yes;
    // When this zone changes, send NOTIFY to all NS records
    // AND anything in also-notify.

    also-notify    { 203.0.113.2; };
    // Secondary servers that should be notified on zone change.
    // Must match allow-transfer on the secondary.

    allow-update   { none; };
    // Dynamic DNS updates (DDNS). 'none' unless you use DHCP+DDNS.
    // If you enable this, restrict to a TSIG key, not an IP.
};

Chapter 04 · RFC Plumbing

named.conf.default-zones

The RFC-mandated housekeeping zones. You almost never touch this file, but you absolutely must know what it does and why.

/etc/bind/named.conf.default-zones
Root hints, localhost zones, and RFC1918 reverse boilerplate

This file declares the zones that every DNS server must have: the root hints zone (bootstrap for full recursion), localhost forward/reverse, and stubs for the private address space.

/etc/bind/named.conf.default-zones · BIND config
// prime the cache with the root zone hint
zone "." {
    type    hint;
    file    "/usr/share/dns/root.hints";
    // This file contains the 13 root server IP addresses.
    // Updated rarely. Ubuntu ships a current copy.
    // type hint = "use these to bootstrap, then cache what you learn"
};

// localhost forward lookup
zone "localhost" {
    type    master;
    file    "/etc/bind/db.local";
    // Authoritative for 'localhost' — serves A 127.0.0.1
};

// localhost reverse lookup (127.0.0.1)
zone "127.in-addr.arpa" {
    type    master;
    file    "/etc/bind/db.127";
    // PTR 1.0.0.127.in-addr.arpa → localhost
};

// 0.0.0.0 (this network — RFC 1122)
zone "0.in-addr.arpa" {
    type    master;
    file    "/etc/bind/db.0";
};

// 255.255.255.255 (broadcast — RFC 1122)
zone "255.in-addr.arpa" {
    type    master;
    file    "/etc/bind/db.255";
};

// RFC 1918 private address reverse zones
// (stops BIND from forwarding PTR queries for 10.x.x.x
//  and 192.168.x.x to the public internet)
include "/etc/bind/zones.rfc1918";
Why the root hints zone matters for you as an ISP

The zone "." hint zone is what makes full recursion possible. BIND uses these 13 root server addresses to start walking the DNS tree for any query. If your box cannot reach the root servers (strict firewall), you must forward instead. Your transit routers at Raxio DC should have no such restrictions — full recursion from roots is the correct operating mode for your core resolvers.


Chapter 05 · The Data

Zone Files — db.*

Zone files are the actual DNS records. Every zone declaration in named.conf.local points to one of these. Understanding the format completely is what separates an ISP engineer from a DNS operator.

db.local — Localhost Forward Zone

/etc/bind/db.local · Zone file
; BIND data file for local loopback interface
;
$TTL    604800   ; Default TTL: 1 week (in seconds)
; SOA record — Start of Authority. MANDATORY first record.
@       IN      SOA     localhost. root.localhost. (
                            2         ; Serial — must increment on every change
                        604800    ; Refresh — how often secondary checks for updates
                         86400    ; Retry   — how often secondary retries if refresh fails
                       2419200   ; Expire  — secondary stops answering if no contact for this long
                        604800 ) ; Negative Cache TTL — how long NXDOMAIN is cached
@       IN      NS      localhost.   ; Name server for this zone
@       IN      A       127.0.0.1    ; localhost → 127.0.0.1
@       IN      AAAA    ::1          ; localhost → ::1 (IPv6)

db.127 — Localhost Reverse Zone

/etc/bind/db.127 · Zone file
; BIND reverse data file for local loopback interface
$TTL    604800
@       IN      SOA     localhost. root.localhost. (
                            1         ; Serial
                        604800
                         86400
                       2419200
                        604800 )
@       IN      NS      localhost.
1.0.0   IN      PTR     localhost.   ; 127.0.0.1 → "localhost."
; Note: only the host octets are written here, reversed (1.0.0), because
; the zone name "127.in-addr.arpa" already supplies the final label.
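Python's standard library can derive these reverse owner names for you, which is handy when writing reverse zones by hand. A sketch:

```python
import ipaddress

# reverse_pointer gives the full PTR owner name for an address.
# Strip the zone suffix to get the relative name written in the zone file.
print(ipaddress.ip_address("127.0.0.1").reverse_pointer)
# 1.0.0.127.in-addr.arpa   (relative to zone 127.in-addr.arpa: "1.0.0")

print(ipaddress.ip_address("196.43.10.1").reverse_pointer)
# 1.10.43.196.in-addr.arpa (relative to zone 43.196.in-addr.arpa: "1.10")
```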

db.0, db.255, db.empty — Stub/Empty Zones

db.0 covers 0.in-addr.arpa, db.255 covers 255.in-addr.arpa, and db.empty is a template for RFC1918 zones that have no actual PTR records to serve. They are identical in structure — SOA + NS with no resource records below. Their purpose is to absorb queries that would otherwise leak to the public internet.

/etc/bind/db.empty (also used as template for db.0, db.255) · Zone file
$TTL    86400
@       IN      SOA     localhost. root.localhost. (
                            1
                        604800
                         86400
                       2419200
                         86400 )
@       IN      NS      localhost.
; No PTR records — this zone intentionally empty.
; It exists to answer queries authoritatively with NXDOMAIN
; rather than forwarding to the public internet.

Building Your Own Zone File

A full production zone file for your ISP domain:

/etc/bind/zones/db.sprintug.com · Zone file
$TTL    3600           ; Default TTL: 1 hour
$ORIGIN sprintug.com.  ; All unqualified names are relative to this

; ── SOA ──────────────────────────────────────────────────────
@   IN  SOA     ns1.sprintug.com.  hostmaster.sprintug.com. (
                    2024031501    ; Serial: YYYYMMDDNN format
                        3600    ; Refresh: 1 hour
                         900    ; Retry:   15 min
                      604800    ; Expire:  1 week
                        300 )  ; Negative TTL: 5 min

; ── Name Servers ─────────────────────────────────────────────
@           IN  NS      ns1.sprintug.com.
@           IN  NS      ns2.sprintug.com.

; ── Glue Records (A records for the NS themselves) ───────────
ns1         IN  A       196.43.10.1     ; Primary NS
ns2         IN  A       196.43.10.2     ; Secondary NS

; ── Mail ─────────────────────────────────────────────────────
@           IN  MX  10  mail.sprintug.com.
@           IN  MX  20  mail2.sprintug.com.
mail        IN  A       196.43.10.10
mail2       IN  A       196.43.10.11

; ── SPF / DMARC ──────────────────────────────────────────────
@           IN  TXT     "v=spf1 mx ip4:196.43.10.0/24 -all"
_dmarc      IN  TXT     "v=DMARC1; p=quarantine; rua=mailto:[email protected]"

; ── Web ──────────────────────────────────────────────────────
@           IN  A       196.43.10.100   ; Root domain → web server
www         IN  CNAME   sprintug.com.   ; www → root
portal      IN  A       196.43.10.101   ; Customer portal
noc         IN  A       10.255.0.10     ; Internal only — NOC dashboard

; ── Infrastructure ───────────────────────────────────────────
raxio-mx1   IN  A       196.43.1.1      ; MX80 — Raxio DC
raxio-mx2   IN  A       196.43.1.2      ; MX204 — Raxio DC
airtel-bng  IN  A       196.43.2.1      ; BNG — Airtel House

; ── Radius / AAA ─────────────────────────────────────────────
radius1     IN  A       10.255.0.50
radius2     IN  A       10.255.0.51

; ── Zabbix ───────────────────────────────────────────────────
zabbix      IN  A       10.255.0.60

Reverse Zone — PTR Records

/etc/bind/zones/db.196.43.rev (reverse zone for 196.43.0.0/16) · Zone file
$TTL    3600
; Zone: 43.196.in-addr.arpa
@   IN  SOA     ns1.sprintug.com.  hostmaster.sprintug.com. (
                    2024031501
                        3600
                         900
                      604800
                        300 )

@           IN  NS  ns1.sprintug.com.
@           IN  NS  ns2.sprintug.com.

; PTR records — only last two octets in 43.196.in-addr.arpa context
1.10        IN  PTR raxio-mx1.sprintug.com.    ; 196.43.10.1
2.10        IN  PTR raxio-mx2.sprintug.com.    ; 196.43.10.2
1.2         IN  PTR airtel-bng.sprintug.com.   ; 196.43.2.1
10.10       IN  PTR mail.sprintug.com.         ; 196.43.10.10
100.10      IN  PTR portal.sprintug.com.       ; 196.43.10.100
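Hand-reversing octets invites typos. The PTR stanzas above can also be generated straight from the forward zone's A records. A throwaway sketch (the inline sample records and the four-field "name IN A address" layout are assumptions; point it at your real db.sprintug.com):

```shell
#!/bin/sh
# Generate 43.196.in-addr.arpa PTR stanzas from forward A records.
# Assumes 4-field lines of the form: name  IN  A  address
awk '$3 == "A" && $4 ~ /^196\.43\./ {
    split($4, o, ".")
    printf "%s.%s  IN  PTR %s.sprintug.com.\n", o[4], o[3], $1
}' <<'EOF'
ns1         IN  A       196.43.10.1
mail        IN  A       196.43.10.10
portal      IN  A       196.43.10.100
EOF
# → 1.10  IN  PTR ns1.sprintug.com.
# → 10.10  IN  PTR mail.sprintug.com.
# → 100.10  IN  PTR portal.sprintug.com.
```

The `^196\.43\.` filter deliberately skips CNAMEs and the internal 10.x hosts, which do not belong in this public reverse zone.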

The SOA Serial — The Most Common Mistake

Always increment the serial

The SOA serial is how secondary nameservers know the zone has changed. Use the format YYYYMMDDNN (e.g. 2024031501 = 15 March 2024, change #1). If you forget to increment it, your secondaries will not pull the new zone data and you will chase a ghost for an hour. After editing a zone file, run named-checkzone sprintug.com /etc/bind/zones/db.sprintug.com before reloading.
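The convention can be scripted so the increment never gets forgotten. A minimal sketch (`new_serial` is our helper name, not a BIND tool) that emits the next YYYYMMDDNN serial given the current one:

```shell
#!/bin/sh
# new_serial: print the next YYYYMMDDNN serial.
# Same day → bump the NN counter; new day → today's date, revision 01.
new_serial() {
    old="$1"
    today="$(date +%Y%m%d)"
    if [ "${old%??}" = "$today" ]; then
        nn="${old#????????}"      # last two digits
        nn="${nn#0}"              # strip leading zero (avoid octal parsing)
        printf '%s%02d\n' "$today" $(( nn + 1 ))
    else
        printf '%s01\n' "$today"
    fi
}

new_serial 2024031501    # older date → today's date followed by 01
```

Wire it into your edit workflow however you like; the point is that the machine, not your memory, owns the increment.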


Chapter 06 · The Control Channel

rndc.key

rndc (Remote Name Daemon Control) is the management plane for named. This key authenticates your rndc commands to the running daemon.

/etc/bind/rndc.key
HMAC-SHA256 shared secret for rndc ↔ named authentication

Contains a single HMAC-SHA256 (or SHA512) secret key. The controls { } stanza in named.conf.options references this key by name. The rndc client also loads this file to authenticate its commands.

/etc/bind/rndc.key · BIND config
key "rndc-key" {
    algorithm hmac-sha256;
    secret    "base64-encoded-secret-here==";
    // Generated by: rndc-confgen -a
    // File permissions must be: -rw-r----- root:bind (640)
};

Generate a fresh key

bash · shell
# Generate rndc.key automatically (writes to /etc/bind/rndc.key)
rndc-confgen -a -b 512

# Verify permissions
ls -la /etc/bind/rndc.key
# Should be: -rw-r----- root bind
chown root:bind /etc/bind/rndc.key
chmod 640 /etc/bind/rndc.key
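rndc-confgen requires BIND to be installed. Any 256-bit random value, base64-encoded, is an equally valid hmac-sha256 secret, so on a box without BIND you can mint one with openssl. A sketch; BIND's own tsig-keygen (`tsig-keygen -a hmac-sha256 rndc-key`) produces the whole stanza in one step:

```shell
#!/bin/sh
# Mint a 32-byte (256-bit) base64 secret for an hmac-sha256 key.
secret="$(openssl rand -base64 32)"

# Emit the key stanza (redirect to /etc/bind/rndc.key in production)
cat <<EOF
key "rndc-key" {
    algorithm hmac-sha256;
    secret    "$secret";
};
EOF
```

Remember to set the same ownership and 640 mode shown above; the secret is useless the moment anyone else can read it.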

Essential rndc Commands

CommandWhat it doesWhen you need it
rndc reloadReload all changed zone files without restartAfter editing any zone file
rndc reload sprintug.comReload one specific zone onlyFaster for single zone changes
rndc reconfigReload named.conf without restartAfter changing config files
rndc flushFlush the entire resolver cacheAfter DNS propagation testing
rndc flushname google.comFlush one name from cacheTesting specific domain changes
rndc statsWrite stats to statistics-filePerformance analysis
rndc statusShow daemon status, version, uptimeHealth checks
rndc dumpdb -cacheDump the resolver cache to dump-fileDebugging poisoning or stale data
rndc zonestatus sprintug.com · Zone SOA serial, file, loaded time · Verify zone loaded correctly
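The reload commands pair naturally with the validators. A small wrapper of our own making, not a BIND tool, that refuses to reload a zone whose file fails named-checkzone:

```shell
#!/bin/sh
# safe_reload ZONE FILE — validate first, reload only if clean.
safe_reload() {
    zone="$1"
    file="$2"
    if named-checkzone -q "$zone" "$file"; then
        rndc reload "$zone"
    else
        echo "refusing to reload $zone: $file failed validation" >&2
        return 1
    fi
}

# Usage:
# safe_reload sprintug.com /etc/bind/zones/db.sprintug.com
```

With this in your shell profile, a fat-fingered zone file never reaches the running daemon.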

Chapter 07 · DNSSEC Root Anchor

bind.keys

The root of all DNSSEC trust. This file contains the Key Signing Keys (KSKs) for the DNS root zone, managed automatically by BIND's built-in RFC 5011 rolling anchor update mechanism.

/etc/bind/bind.keys
DNSSEC root zone trust anchors — the cryptographic start of trust

When you set dnssec-validation auto;, BIND reads this file to establish the root of the DNSSEC chain of trust. It contains the public KSKs for the "." (root) zone, published by IANA. BIND manages these automatically — never edit this file by hand.

/etc/bind/bind.keys (abbreviated) · BIND config
// This file is managed by BIND itself.
// Manual edits will be overwritten.
// Trust anchor for the DNS root zone — IANA-published KSKs.

trust-anchors {
    // KSK-2010 (retired) and KSK-2017 (current)
    "."    initial-key 257 3 8
        "AwEAAaz/tAm8yTn4Mfeh5eyI96WSVexTBAvkMgJzkKTO
         iW1vkIbzxeF3+/4RgWOq7HrxRixHlFlExOLAJr5emLvN
         7SWXgnLh4+B5xQlNVz8Og8kvArMtNROxVQuCaSnIDdD5L
         KyWbRd2n9WGe2R8PzgCmr3EgVLrjyBxWezF0jLHwVN8efS3tCt
         ...base64-continues...";
};
RFC 5011 Automatic Key Rollover

BIND watches for KSK rollovers from IANA and updates its trust anchors automatically. The last root KSK rollover was in 2018. With dnssec-validation auto; you are fully covered. The managed-keys database is stored in /var/cache/bind/managed-keys.bind — do not delete this file as it tracks the rollover state.


Chapter 08 · Private Space

zones.rfc1918

Stops your resolver leaking PTR queries for private address space to the public internet. Simple but critical for operational hygiene.

/etc/bind/zones.rfc1918
Authoritative empty zones for all RFC 1918, 6598, APIPA, and special-use ranges

Contains zone declarations for all private address ranges. Each zone uses db.empty as its data file, meaning it answers authoritatively with NXDOMAIN for any PTR query in these ranges rather than forwarding the query externally.

/etc/bind/zones.rfc1918 (excerpt) · BIND config
// RFC 1918 private address space
// These empty zones prevent PTR queries for private IPs
// from leaking to the public DNS infrastructure.

zone "10.in-addr.arpa"           { type master; file "/etc/bind/db.empty"; };
zone "16.172.in-addr.arpa"       { type master; file "/etc/bind/db.empty"; };
zone "17.172.in-addr.arpa"       { type master; file "/etc/bind/db.empty"; };
// ... 172.16.0.0/12 covered across 172.16–172.31
zone "168.192.in-addr.arpa"      { type master; file "/etc/bind/db.empty"; };

// RFC 6598 — Shared Address Space (CGNAT — 100.64.0.0/10)
zone "64.100.in-addr.arpa"       { type master; file "/etc/bind/db.empty"; };
// ... continues through 100.127.x.x

// APIPA — 169.254.0.0/16 (link-local)
zone "254.169.in-addr.arpa"      { type master; file "/etc/bind/db.empty"; };
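The "..." elisions above hide sixteen stanzas for 172.16.0.0/12 and sixty-four for 100.64.0.0/10. Rather than typing them, generate them. A throwaway loop whose output matches the stanza format shown (db.empty path assumed as above):

```shell
#!/bin/sh
# Emit empty-zone stanzas for 172.16.0.0/12 (RFC 1918) and
# 100.64.0.0/10 (RFC 6598 CGNAT); append the output to zones.rfc1918.
for octet in $(seq 16 31); do
    printf 'zone "%s.172.in-addr.arpa" { type master; file "/etc/bind/db.empty"; };\n' "$octet"
done
for octet in $(seq 64 127); do
    printf 'zone "%s.100.in-addr.arpa" { type master; file "/etc/bind/db.empty"; };\n' "$octet"
done
```

Eighty stanzas, zero typos, and the output is trivially diffable against the file Debian ships.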
Override rfc1918 zones for your own private ranges

If you declare your own PTR zone — like zone "0.255.10.in-addr.arpa" in named.conf.local — BIND will use your specific zone and ignore the catch-all zone "10.in-addr.arpa" from zones.rfc1918. More-specific zones always win. This is correct: you get real PTR records for your management range and empty-zone protection for everything else.


Chapter 09 · Small Scale

Small ISP Setup — ~100 QPS

A single Ubuntu server serving a small branch, corporate network, or startup ISP. One box, authoritative + recursive, solid fundamentals.

Small ISP Profile
~100 QPS · 500–2,000 subscribers

Hardware: 4 vCPU, 8GB RAM, 100GB SSD
Architecture: Single server, authoritative + recursive
Topology: One datacenter, one resolver IP
Cache size: 512MB
Threads: 4 (match CPUs)

Large ISP Profile
~5M QPS · 500,000+ subscribers

Hardware: 32+ physical cores, 128–512GB RAM
Architecture: Split auth/recursive, anycast cluster
Topology: Multi-PoP, load balanced
Cache size: 64GB+ per instance
Processes: Multiple named instances via views

Complete Small ISP — named.conf.options

/etc/bind/named.conf.options (small ISP — complete) · BIND config
options {
    directory               "/var/cache/bind";
    dump-file               "/var/cache/bind/named_dump.db";
    statistics-file         "/var/log/named/named.stats";

    listen-on               { 127.0.0.1; 10.10.10.1; };
    listen-on-v6            { none; };

    // Performance — small server
    recursive-clients       1000;      // Max simultaneous recursive queries
    tcp-clients             150;       // Max simultaneous TCP connections

    // Cache sizing
    max-cache-size          512m;
    max-cache-ttl           3600;
    max-ncache-ttl          300;

    // Recursion: subscribers only
    recursion               yes;
    allow-query             { trusted_resolvers; };
    allow-recursion         { trusted_resolvers; };
    allow-query-cache       { trusted_resolvers; };
    allow-transfer          { none; };

    // Full recursion — no external forwarders
    dnssec-validation       auto;
    query-source            address * port *;

    // Security hardening
    version                 "not disclosed";  // Answer to version.bind queries
    hostname                none;             // Refuse hostname.bind queries
    server-id               none;             // Refuse ID.SERVER queries

    rate-limit {
        responses-per-second   10;
        slip                   2;
        window                 15;
        exempt-clients        { 127.0.0.1; 10.0.0.0/8; };
    };
};

controls {
    inet 127.0.0.1 port 953
        allow { 127.0.0.1; }
        keys { "rndc-key"; };
};

Operational workflow after any zone edit

bash — zone edit workflow · shell
# 1. Edit the zone file, increment the serial
nano /etc/bind/zones/db.sprintug.com

# 2. Validate the zone syntax
named-checkzone sprintug.com /etc/bind/zones/db.sprintug.com

# 3. Validate all config files
named-checkconf

# 4. Reload just that zone (no service disruption)
rndc reload sprintug.com

# 5. Verify it loaded
rndc zonestatus sprintug.com

# 6. Test from a client
dig @10.10.10.1 sprintug.com A
dig @10.10.10.1 -x 196.43.10.1

Chapter 10 · National Scale

National ISP Setup — 5M QPS

This is not theory — operators at this scale exist in East Africa. The architecture changes fundamentally: you separate concerns, distribute load, and tune every knob in the engine room.

Architecture at Scale

high-availability DNS topology · conceptual
┌─────────────────────────────────────────────────────────┐
│                    ANYCAST RESOLVER CLUSTER              │
│  BGP-anycast 196.43.53.1/32 and 196.43.53.2/32          │
│                                                          │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐     │
│  │  resolver-1 │  │  resolver-2 │  │  resolver-3 │     │
│  │  Raxio DC   │  │  Airtel     │  │  Wingu TZ   │     │
│  │  named inst │  │  named inst │  │  named inst │     │
│  │  32GB cache │  │  32GB cache │  │  32GB cache │     │
│  └─────────────┘  └─────────────┘  └─────────────┘     │
└─────────────────────────────────────────────────────────┘
                         │ forwards queries for
                         │ your own zones only
                         ▼
┌─────────────────────────────────────────────────────────┐
│               AUTHORITATIVE NS CLUSTER                   │
│  ns1.sprintug.com  (primary)   196.43.10.1               │
│  ns2.sprintug.com  (secondary) 196.43.10.2               │
│  No recursion. No cache. Pure auth only.                 │
└─────────────────────────────────────────────────────────┘

named.conf.options — High-throughput Recursive Resolver

/etc/bind/named.conf.options (5M QPS resolver node) · BIND config
options {
    directory                 "/var/cache/bind";
    dump-file                 "/var/cache/bind/named_dump.db";
    statistics-file           "/var/log/named/named.stats";

    listen-on                 { 196.43.53.1; 127.0.0.1; };
    listen-on-v6              { any; };

    // ── Threading ────────────────────────────────────────────
    // BIND 9.18 auto-detects the CPU count and starts one worker
    // thread per core; there is no named.conf statement for it.
    // Override only on the command line: named -n 32
    tcp-listen-queue          1024;
    clients-per-query         100;
    max-clients-per-query     1000;
    recursive-clients         500000;       // 500K simultaneous recursive lookups
    tcp-clients               10000;

    // ── Cache — sized for the working set ────────────────────
    // At national scale the working set is ~2M unique names.
    // Each cache entry is ~200 bytes → 400MB minimum useful cache.
    // Larger cache = better hit rate = fewer upstream queries.
    max-cache-size            32g;          // 32GB on a 128GB node
    max-cache-ttl             3600;
    max-ncache-ttl            300;
    min-cache-ttl             30;           // Floor on low-TTL records

    // ── Recursion ─────────────────────────────────────────────
    recursion                 yes;
    allow-query               { trusted_resolvers; };
    allow-recursion           { trusted_resolvers; };
    allow-query-cache         { trusted_resolvers; };
    allow-transfer            { none; };

    // ── FULL RECURSION — no external forwarders ──────────────
    dnssec-validation         auto;

    // ── Source port randomisation ─────────────────────────────
    // "port *" = a random ephemeral port per query, which is the
    // default and the correct anti-spoofing behaviour. The old
    // queryport-pool-* options were removed from BIND long ago;
    // do not carry them forward from legacy configs.
    query-source              address * port *;

    // ── TCP performance (all values in units of 100 ms) ──────
    tcp-initial-timeout       300;          // 30 s to receive the first query
    tcp-idle-timeout          1200;         // 120 s idle (the allowed maximum)
    tcp-keepalive-timeout     3000;         // 300 s when client signals keepalive
    tcp-advertised-timeout    3000;         // 300 s advertised via EDNS

    // ── Stale cache (serve-stale) ─────────────────────────────
    // If upstream unreachable, serve expired cache up to 1 day.
    // Massive resilience win for subscriber experience.
    stale-answer-enable       yes;
    stale-answer-ttl          30;
    max-stale-ttl             86400;        // Stale for up to 24 hours

    // ── Rate Limiting ─────────────────────────────────────────
    rate-limit {
        responses-per-second   50;
        referrals-per-second   20;
        nodata-per-second      20;
        nxdomains-per-second   20;
        errors-per-second      20;
        slip                   2;
        window                 15;
        min-table-size         10000;
        max-table-size         200000;
        exempt-clients        { 127.0.0.1; };
    };

    // ── Security ──────────────────────────────────────────────
    version                   "not disclosed";
    hostname                  none;
    server-id                 none;
};

controls {
    // rndc listens on all interfaces so the NOC management range
    // can reach it remotely; the allow list and the key still
    // gate every command.
    inet * port 953
        allow { 127.0.0.1; 10.255.0.0/24; }
        keys { "rndc-key"; };
};
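The sizing comment in the cache section (~2M names at ~200 bytes each) is a planning estimate, not a BIND constant. Re-run the arithmetic whenever you re-measure your working set:

```shell
#!/bin/sh
# Back-of-envelope resolver cache sizing.
# 200 bytes/entry is an assumed planning figure; verify against
# your own cache with "rndc stats" before buying RAM.
entries=2000000          # ~2M unique names in the working set
bytes_per_entry=200
min_mb=$(( entries * bytes_per_entry / 1000000 ))
echo "Minimum useful cache: ${min_mb} MB"    # prints: Minimum useful cache: 400 MB
```

The 32g max-cache-size above is deliberate headroom far beyond this floor: records with long TTLs and DNSSEC material stay resident, which is where the extra hit-rate comes from.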

Authoritative Server — No Recursion

/etc/bind/named.conf.options (pure authoritative — no cache) · BIND config
options {
    directory             "/var/cache/bind";

    listen-on             { 196.43.10.1; 127.0.0.1; };
    listen-on-v6          { any; };

    // ── CRITICAL: No recursion on authoritative servers ───────
    recursion             no;
    allow-query           { any; };     // Anyone can query our zones
    allow-recursion       { none; };    // Nobody gets recursive service
    allow-transfer        { 196.43.10.2; }; // ns2 only

    // ── DNSSEC signing (if signing your zones) ────────────────
    dnssec-validation     auto;

    // ── Cache — unused when recursion is off; keep a small bound anyway
    max-cache-size        32m;

    version               "not disclosed";
    hostname              none;
    server-id             none;

    rate-limit {
        responses-per-second 100;
        slip                 2;
    };
};

OS-Level Tuning for High QPS

/etc/sysctl.d/99-dns-resolver.conf · sysctl
# UDP receive/send buffer — critical for high-QPS DNS
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.core.rmem_default = 33554432
net.core.wmem_default = 33554432

# Increase max open files (each query = 1 socket fd)
fs.file-max = 2097152

# Ephemeral port range — more ports = better randomisation
net.ipv4.ip_local_port_range = 1024 65535

# Time-wait socket reuse
net.ipv4.tcp_tw_reuse = 1

# Increase backlog
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535
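After `sysctl --system` (or a reboot), confirm the kernel actually took the values. A quick check that reads /proc/sys directly, assuming standard Linux paths:

```shell
#!/bin/sh
# Read tunables straight from /proc/sys: a sysctl key maps to a
# path by replacing every '.' with '/'.
for key in net.core.rmem_max net.core.wmem_max fs.file-max net.core.somaxconn; do
    path="/proc/sys/$(echo "$key" | tr . /)"
    printf '%-25s %s\n' "$key" "$(cat "$path")"
done
```

If a value does not match what you set, another file in /etc/sysctl.d/ sorting later alphabetically is overriding yours.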
/etc/systemd/system/named.service.d/limits.conf · systemd
[Service]
# systemd does not allow trailing comments on a directive line,
# so each setting is annotated on its own line.
# Open files — 1 million (each in-flight query holds a socket fd)
LimitNOFILE=1048576
# Max threads
LimitNPROC=65536
# Allow named to lock its cache in RAM
LimitMEMLOCK=infinity

Monitoring at Scale

Zabbix / monitoring queries · shell
# Query rate from named statistics (after rndc stats)
grep "queries resulted in" /var/log/named/named.stats

# Cache hit ratio — the key performance indicator
# (cache hits / total queries) × 100 → target >85% at scale

# Real-time QPS monitoring
watch -n1 'rndc stats && grep "query (cache)" /var/log/named/named.stats | tail -5'

# Named statistics over HTTP (enable in options)
# Add to named.conf.options:
# statistics-channels { inet 127.0.0.1 port 8080 allow { 127.0.0.1; }; };
curl http://127.0.0.1:8080/json/v1/server
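The hit-ratio KPI in the comment above reduces to one awk pass. Counter labels differ across BIND versions, so the "cache hits"/"cache misses" lines below are sample data standing in for your real named.stats (the JSON channel exposes the equivalent as CacheHits/CacheMisses per view):

```shell
#!/bin/sh
# Cache hit ratio = hits / (hits + misses), as a percentage.
awk '
    /cache hits/   { hits   = $1 }
    /cache misses/ { misses = $1 }
    END {
        total = hits + misses
        if (total > 0)
            printf "hit ratio: %.1f%%\n", 100 * hits / total
    }
' <<'EOF'
 8741023 cache hits
 1203991 cache misses
EOF
# prints: hit ratio: 87.9%
```

Swap the heredoc for your stats file once you have confirmed the exact counter strings your BIND version emits.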

Chapter 11 · Modern Infrastructure

Docker — BIND9 in Containers

Docker did not exist in 2010. Since its public release in 2013 it has changed infrastructure permanently. Here is everything you need to understand it and run BIND9 inside it — from the mental model to a production-ready Compose stack.

The Docker Mental Model

In 2010 you ran services directly on a server. The problem: two services that need different library versions, or different OS configurations, cannot peacefully share one machine. You ended up with snowflake servers — each one configured slightly differently, impossible to reproduce reliably.

Docker solves this with containers. A container is an isolated process that carries its own filesystem, its own dependencies, and its own runtime environment. It shares the host's Linux kernel but sees nothing outside its own walls unless you explicitly allow it.

ConceptPhysical/VM world (2010)Docker world (now)
ServerPhysical box or VMware VMThe Docker host — any Ubuntu server
OS installFull Ubuntu/CentOS install per VMContainer image — a few MB of layered filesystem
Service configFiles directly on host at /etc/bind/Files on host, mounted into the container at /etc/bind/
Starting a servicesystemctl start nameddocker compose up -d
Upgrading BINDapt upgrade, hope nothing breaksChange image tag, docker compose pull, redeploy in seconds
Running two BIND versionsImpossible on one OSTrivial — each container is isolated
Reproducibility"Works on my server" — configuration driftdocker compose up and it is identical everywhere

Four Key Docker Concepts

Image
A read-only template — like a frozen OS snapshot with BIND9 pre-installed

An image is built from a Dockerfile. Someone has already built a BIND9 image and published it to Docker Hub. You pull it, you run it. You never install BIND9 on the host OS at all.

Container
A running instance of an image — named is running inside here

The container is the live process. You can start it, stop it, restart it, destroy it and recreate it from the same image in seconds. Destroying a container does not delete your config — that lives in a volume on the host.

Volume / Bind Mount
Your config files on the host, visible inside the container at a path you choose

This is the critical piece for BIND9. Your named.conf, zone files, and keys live on the host at a path like /opt/dns/config/. The container mounts this directory at /etc/bind/ — named sees it as if the files are native, but you edit them from the host normally.

Docker Compose
A YAML file that declares your entire stack — the Infrastructure-as-Code replacement for running commands by hand

Instead of typing docker run -v ... -p ... --name ... every time, you write one docker-compose.yml file. docker compose up -d starts everything. docker compose down stops it. The file is your source of truth — commit it to Git.

Project Directory Layout

Before writing a single Docker file, establish the layout on the host. Your config lives here permanently — containers come and go, this stays.

host filesystem layout · shell
/opt/dns/
├── docker-compose.yml          # The orchestration file — your source of truth
├── config/                     # Mounted to /etc/bind/ inside the container
│   ├── named.conf
│   ├── named.conf.options
│   ├── named.conf.local
│   ├── named.conf.default-zones
│   ├── bind.keys
│   ├── rndc.key
│   ├── db.local
│   ├── db.127
│   ├── db.0
│   ├── db.255
│   ├── db.empty
│   ├── zones.rfc1918
│   └── zones/
│       ├── db.sprintug.com
│       └── db.196.43.rev
└── logs/                       # Mounted to /var/log/named/ inside the container
    ├── named.log
    └── query.log

# Create it:
mkdir -p /opt/dns/config/zones /opt/dns/logs
chown -R 101:101 /opt/dns/logs   # UID 101 = bind user inside the container

Install Docker on Ubuntu 24.04

bash — Docker installation · shell
# Remove any old Docker packages
apt remove -y docker.io docker-doc docker-compose docker-compose-v2 \
              podman-docker containerd runc 2>/dev/null

# Add Docker's official repository
apt install -y ca-certificates curl
install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg \
     -o /etc/apt/keyrings/docker.asc
chmod a+r /etc/apt/keyrings/docker.asc

echo \
  "deb [arch=$(dpkg --print-architecture) \
  signed-by=/etc/apt/keyrings/docker.asc] \
  https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  tee /etc/apt/sources.list.d/docker.list > /dev/null

apt update
apt install -y docker-ce docker-ce-cli containerd.io \
               docker-buildx-plugin docker-compose-plugin

# Verify
docker version
docker compose version

Single Container — Quick Start

The fastest way to run BIND9 in Docker: one command, your config mounted from the host. This is the conceptual foundation before we move to Compose.

bash — single docker run · shell
# Pull the official internetsystemsconsortium/bind9 image
docker pull internetsystemsconsortium/bind9:9.18

# Run it — mount your config directory and expose port 53
docker run -d \
    --name bind9 \
    --restart unless-stopped \
    -p 53:53/udp \
    -p 53:53/tcp \
    -p 127.0.0.1:953:953/tcp \
    -v /opt/dns/config:/etc/bind:ro \
    -v /opt/dns/logs:/var/log/named:rw \
    internetsystemsconsortium/bind9:9.18

# Check it started cleanly
docker logs bind9

# Use rndc from inside the container
docker exec bind9 rndc status
What -v /opt/dns/config:/etc/bind:ro means

-v HOST_PATH:CONTAINER_PATH:OPTIONS — the colon separates three fields. /opt/dns/config is the directory on your server. /etc/bind is where the container will see it. ro means read-only — named can read the config but cannot write to it, which is correct. The logs volume is rw (read-write) because named needs to write log files there.

Production Docker Compose Stack

The docker run command above is fine for learning. In production you use Docker Compose — everything is declared in one YAML file, version controlled, and reproducible.

/opt/dns/docker-compose.yml — single resolver + auth · yaml
version: "3.9"

services:

  # ── Primary nameserver: authoritative + recursive ────────────
  bind9:
    image: internetsystemsconsortium/bind9:9.18
    container_name: sprintug-dns
    restart: unless-stopped

    # Port mapping: host:container
    # UDP and TCP both required for DNS
    ports:
      - "53:53/udp"
      - "53:53/tcp"
      - "127.0.0.1:953:953/tcp"    # rndc — localhost only on the host

    volumes:
      # Config files: read-only into /etc/bind
      - ./config:/etc/bind:ro
      # Log files: read-write
      - ./logs:/var/log/named:rw

    # Capabilities BIND9 needs
    # CAP_NET_BIND_SERVICE: bind to port 53 (below 1024)
    cap_add:
      - NET_BIND_SERVICE

    # Resource limits — prevent this container consuming all RAM
    # on a shared host
    deploy:
      resources:
        limits:
          memory: 1g       # Hard ceiling: 1GB RAM
          cpus: "2.0"      # Max 2 CPU cores
        reservations:
          memory: 256m     # Guarantee 256MB

    # Health check: ask named if it is alive every 30 seconds
    # dig @127.0.0.1 . SOA is a lightweight always-valid query
    healthcheck:
      test: ["CMD", "dig", "@127.0.0.1", ".", "SOA", "+time=3"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 15s

    # Log driver: send named stdout/stderr to journald on the host
    # so you can read them with: journalctl -u docker -f
    logging:
      driver: journald
      options:
        tag: "sprintug-dns"

    # Timezone — important for log timestamps
    environment:
      - TZ=Africa/Kampala

Split Architecture Compose Stack — Auth + Recursive

This is the production-grade split: one container is authoritative for your own zones (no recursion, no cache, reachable only on host loopback port 5353 for debugging), while a second container does recursive resolution for subscribers and forwards queries for your own zones to the auth container.

/opt/dns/docker-compose.yml — split auth + resolver · yaml
version: "3.9"

# Internal network — containers talk to each other on this
# without being exposed to the host network
networks:
  dns-internal:
    driver: bridge
    ipam:
      config:
        - subnet: 172.20.53.0/24

services:

  # ── 1. Authoritative nameserver ──────────────────────────────
  # Serves your zones only. No recursion. No cache.
  # NOT exposed directly on port 53 — only the resolver talks to it.
  bind9-auth:
    image: internetsystemsconsortium/bind9:9.18
    container_name: sprintug-auth
    restart: unless-stopped
    networks:
      dns-internal:
        ipv4_address: 172.20.53.10    # Fixed IP on internal network
    ports:
      - "127.0.0.1:5353:53/udp"      # Only accessible from host loopback
      - "127.0.0.1:5353:53/tcp"      # for rndc and debug — not public
    volumes:
      - ./config/auth:/etc/bind:ro   # Separate config dir for auth
      - ./logs/auth:/var/log/named:rw
    cap_add:
      - NET_BIND_SERVICE
    deploy:
      resources:
        limits:
          memory: 512m
          cpus: "2.0"
    healthcheck:
      test: ["CMD", "dig", "@127.0.0.1", "sprintug.com", "SOA", "+time=3"]
      interval: 30s
      timeout: 10s
      retries: 3
    environment:
      - TZ=Africa/Kampala

  # ── 2. Recursive resolver ────────────────────────────────────
  # Serves subscribers. Full recursion from root hints.
  # Forwards queries for internal zones to bind9-auth above.
  # This is the container whose port 53 is exposed publicly.
  bind9-resolver:
    image: internetsystemsconsortium/bind9:9.18
    container_name: sprintug-resolver
    restart: unless-stopped
    depends_on:
      bind9-auth:
        condition: service_healthy   # Wait for auth to be healthy first
    networks:
      dns-internal:
        ipv4_address: 172.20.53.20
    ports:
      - "53:53/udp"                  # PUBLIC — subscriber-facing
      - "53:53/tcp"
      - "127.0.0.1:953:953/tcp"      # rndc control — host only
    volumes:
      - ./config/resolver:/etc/bind:ro
      - ./logs/resolver:/var/log/named:rw
    cap_add:
      - NET_BIND_SERVICE
    deploy:
      resources:
        limits:
          memory: 2g
          cpus: "4.0"
        reservations:
          memory: 512m
    healthcheck:
      test: ["CMD", "dig", "@127.0.0.1", "google.com", "A", "+time=3"]
      interval: 30s
      timeout: 10s
      retries: 3
    environment:
      - TZ=Africa/Kampala

Resolver config that forwards your zones to the auth container

/opt/dns/config/resolver/named.conf.local · BIND config
// In the resolver container, forward queries for your own domains
// to the auth container at its fixed IP on the internal network.
// The resolver handles everything else via full recursion.

zone "sprintug.com" {
    type    forward;
    forward only;
    forwarders { 172.20.53.10; };  // bind9-auth container IP
};

zone "sprinttz.co.tz" {
    type    forward;
    forward only;
    forwarders { 172.20.53.10; };
};

zone "43.196.in-addr.arpa" {
    type    forward;
    forward only;
    forwarders { 172.20.53.10; };
};

Day-to-Day Operations with Docker

docker operations cheat sheet · shell
# ── Start / Stop ────────────────────────────────────────────
cd /opt/dns
docker compose up -d                  # Start all containers in background
docker compose down                   # Stop and remove containers (config safe)
docker compose restart bind9-resolver # Restart one container

# ── After editing a zone file ───────────────────────────────
# 1. Validate on the host (named-checkzone must be installed)
named-checkzone sprintug.com /opt/dns/config/zones/db.sprintug.com

# 2. Reload named inside the container without restart
docker exec sprintug-auth rndc reload sprintug.com

# 3. Verify it loaded
docker exec sprintug-auth rndc zonestatus sprintug.com

# ── After editing named.conf.options ───────────────────────
docker exec sprintug-auth rndc reconfig

# ── View live logs ──────────────────────────────────────────
docker compose logs -f bind9-auth
docker compose logs -f bind9-resolver
# Or from journald if you used that log driver:
journalctl -t sprintug-dns -f

# ── Get a shell inside the container (for debugging) ───────
docker exec -it sprintug-auth /bin/bash
# From inside you can run: dig, rndc, cat /etc/bind/named.conf

# ── Check resource usage ────────────────────────────────────
docker stats sprintug-auth sprintug-resolver

# ── Upgrade BIND9 ───────────────────────────────────────────
# Change image tag in docker-compose.yml, then:
docker compose pull
docker compose up -d
# Containers are recreated with new image. Config untouched.

# ── Test from outside the container ────────────────────────
dig @127.0.0.1 sprintug.com SOA
dig @YOUR-HOST-IP google.com A

Dockerfile — Build Your Own Image

The official internetsystemsconsortium/bind9 image is clean and sufficient. But if you need to pre-bake your config into an image (for deployment pipelines, or to ship a ready-to-run resolver to a remote PoP), you write your own Dockerfile:

/opt/dns/Dockerfile · dockerfile
# Start from the official BIND9 image
FROM internetsystemsconsortium/bind9:9.18

# Add dnsutils so rndc and dig are available inside the container
RUN apt-get update && apt-get install -y --no-install-recommends \
    dnsutils \
    && rm -rf /var/lib/apt/lists/*

# Copy your config into the image
# Use this pattern when deploying to remote PoPs where
# you cannot guarantee the config directory is present.
COPY config/ /etc/bind/

# Create log directory with correct ownership
RUN mkdir -p /var/log/named && chown bind:bind /var/log/named

# Do NOT add "USER bind" here: named must start as root so it can
# bind port 53 inside the container, then -u drops privileges itself.

# named is the entrypoint — this is what runs when the container starts
ENTRYPOINT ["/usr/sbin/named"]
CMD ["-g", "-u", "bind"]
# -g = run in foreground (required for Docker)
# -u bind = drop to the bind user (UID 101) after binding port 53
bash — build and run your custom image · shell
cd /opt/dns
docker build -t sprintug/bind9:1.0 .
docker push sprintug/bind9:1.0      # Push to your private registry

# Deploy on any remote PoP:
docker run -d --name dns \
    -p 53:53/udp -p 53:53/tcp \
    sprintug/bind9:1.0
What this unlocks for Sprint Group

You have five Points of Presence: Raxio DC, Airtel House, Wingu Mbezi, Derm Complex, and Redstone HQ. With a Docker-based BIND9 setup, deploying a resolver to a new PoP is docker compose up -d. Upgrading BIND9 across all five PoPs is changing one line in a YAML file and pushing. No more hand-configuring each server differently. The config lives in Git; the server is disposable.

Checking That Port 53 is Not Already Occupied

Ubuntu 24.04 ships with systemd-resolved listening on port 53 by default. This will conflict with your BIND9 container. Fix it before you start:

bash — disable systemd-resolved on port 53 (shell)
# Check what is using port 53
ss -tulnp | grep :53

# Disable systemd-resolved's stub listener
mkdir -p /etc/systemd/resolved.conf.d/
cat > /etc/systemd/resolved.conf.d/nostub.conf << 'EOF'
[Resolve]
DNSStubListener=no
EOF
systemctl restart systemd-resolved

# Point the host's own resolver to your BIND container
# (so the host itself can still resolve DNS)
ln -sf /run/systemd/resolve/resolv.conf /etc/resolv.conf

# Now port 53 is free for Docker to bind to
ss -tulnp | grep :53    # Should be empty

# Start your stack
docker compose up -d

Chapter 12 · Defence

Defending Your Turf

DNS-specific attack vectors, what each one costs you operationally, and the exact BIND9 configuration that closes each one. Nothing outside DNS. The attack surface is narrower than the certification courses imply — and completely addressable.

What the Expensive Courses Get Wrong

CEH — Certified Ethical Hacker — covers 20 domains of security in one qualification. Social engineering, physical access, web applications, wireless networks, mobile devices. Broad by design, because it is sold to a general audience. For an ISP operator, most of it is noise.

DNS has its own contained attack surface. The threats against your nameservers and resolvers are well-documented, finite in number, and each has a known, implementable countermeasure inside BIND9 itself. You do not need a 40-hour survey course. You need to understand six DNS-specific attack classes and the config that neutralises each one. That is this chapter.

The defender's position on DNS

Every attack in this chapter exploits either a default BIND9 behaviour that should have been changed, or a zone management habit that should have been followed. None require exotic countermeasures. You are not racing against a sophisticated zero-day. You are implementing a known checklist against a known catalogue. That is a winnable position.

What Your DNS Exposes Before You Harden Anything

Run these against your own nameserver before applying any defences. This is what the internet sees today.

reconnaissance — three queries, three exposure classes (shell)
# Ask BIND what version it is running
dig version.bind chaos txt @ns1.sprintug.com
# Without hardening returns: "9.18.12-1ubuntu1"
# Attacker now knows exactly which CVEs apply to your build.

# Request a full zone transfer
dig axfr sprintug.com @ns1.sprintug.com
# Without hardening: every A, MX, CNAME, TXT record in your zone.
# noc.sprintug.com, radius1.sprintug.com, zabbix.sprintug.com —
# a complete labelled map of your infrastructure in one command.

# Test whether your resolver answers the entire internet
dig @ns1.sprintug.com google.com A
# If you get an answer: your resolver is open.
# You are available as a DDoS amplification platform.

Three commands. Three classes of exposure. All three are closed by configuration changes covered in this chapter.

The Six DNS Attack Vectors

1 — Open Resolver Abuse (DNS Amplification)
Your resolver becomes a weapon aimed at a third party — your transit bill is the first sign

What it is. A DNS response is larger than the query that triggers it. A 40-byte query for a DNSKEY or TXT record can return a 3,000-byte response — a 75x amplification ratio. An attacker spoofs the source address in their query to be their victim's IP. Your resolver sends 3,000 bytes to the victim for every 40-byte packet the attacker sends you. At volume, your infrastructure floods someone else's network using your transit capacity and your IP reputation. You are the weapon and you did not know it.
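The leverage is plain shell arithmetic. The 40-byte and 3,000-byte figures below are the ones quoted above; the attacker bandwidth is a made-up example:

```shell
# Amplification arithmetic (illustrative figures from the text above)
query_bytes=40
response_bytes=3000
factor=$(( response_bytes / query_bytes ))
echo "amplification factor: ${factor}x"

# Every Mbit/s of spoofed queries becomes that many Mbit/s at the
# victim, paid for out of your transit capacity.
attacker_mbps=10
echo "victim receives: $(( attacker_mbps * factor )) Mbit/s"
```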

What it costs you. Transit bill spikes with no corresponding subscriber activity. Your upstream provider raises an abuse complaint or null-routes your resolver prefix. In a worst case, Savannah or Liquid temporarily filters your prefix while investigating.

named.conf.options — close the open resolver (BIND config)
// If allow-recursion is absent or set to { any; } you are an open resolver.
allow-recursion   { trusted_resolvers; };
allow-query       { trusted_resolvers; };
allow-query-cache { trusted_resolvers; };

// Response Rate Limiting: the slip value sends every 2nd over-rate reply
// as a truncated TC response. TC forces a TCP retry — amplification attacks
// cannot use TCP because a real handshake cannot be faked with a spoofed IP.
rate-limit {
    responses-per-second  10;
    slip                  2;
    exempt-clients       { 127.0.0.1; 10.0.0.0/8; };
};
verify — must return REFUSED from outside your ACL (shell)
# Run from an IP outside your subscriber ranges
dig @YOUR-RESOLVER-IP google.com A +time=3
# status: REFUSED = correctly closed.
# Any actual answer = still open. Fix it before anything else.
2 — Cache Poisoning
Attacker plants a false record in your resolver cache — subscribers get the wrong IP for a real domain

What it is. Your resolver sends an outbound query and waits for a response. An attacker races to send a forged reply first, guessing the 16-bit transaction ID. If they win, their fake answer is cached. Every subscriber asking for that name now gets directed to an IP the attacker controls. Demonstrated as practically exploitable in 2008 by Dan Kaminsky — every resolver of that era was vulnerable within minutes.
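The counting argument behind the defence can be sanity-checked in the shell:

```shell
# Search space an off-path forger must cover (illustrative arithmetic)
txid_space=$(( 1 << 16 ))                 # 16-bit transaction ID: 65536
port_space=$(( 1 << 16 ))                 # roughly 16 bits of random source port
combined=$(( txid_space * port_space ))

echo "TXID only:          $txid_space possible forgeries"
echo "TXID + source port: $combined possible forgeries"
echo "hardening factor:   $(( combined / txid_space ))x"
```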

What it costs you. Subscribers querying a banking domain reach a phishing page. Traffic for a business service goes to a dead end or a malware server. Your resolver logs show a normal-looking response — just from the wrong source. Silent, and the damage extends to every subscriber until the poisoned TTL expires.

named.conf.options — three layers of poisoning defence (BIND config)
// Layer 1: Source port randomisation.
// Forces attacker to guess transaction ID (16 bits) AND source port (16 bits).
// 2^32 combinations instead of 2^16 — years instead of minutes.
// BIND 9.18 randomises source ports by default; the old use-queryport-pool
// and queryport-pool-ports options no longer exist. Just ensure nothing
// pins the source port to a fixed value:
query-source address *;    // no "port" clause — leave randomisation on

// Layer 2: DNSSEC validation.
// Signed responses carry a cryptographic signature that a forged
// reply cannot reproduce. Poisoning is mathematically impossible
// against any zone whose owner has signed it.
dnssec-validation     auto;

// Layer 3: Separate your authoritative server from your resolver.
// An authoritative-only server has no cache. Nothing to poison.

The limit of DNSSEC. Validation only protects against poisoning for zones that have been signed by their owners. Layers 1 and 3 remain relevant regardless.

3 — Unauthorised Zone Transfer (AXFR)
One query hands your entire zone — every hostname, every IP, every internal service — to anyone who asks

What it is. Zone transfers are how a primary replicates data to its secondaries — a legitimate, necessary mechanism. Without access restrictions, any host on the internet can request one. A single AXFR query returns the complete contents of your zone file. This is not an attack by itself. It is reconnaissance that makes every subsequent attack more precise. Hostnames like radius1, zabbix, noc tell an attacker exactly which services you run on which IPs — without them having to scan a single port.

named.conf — lock transfers, then authenticate them (BIND config)
// Step 1: Global default — deny all transfers
allow-transfer { none; };    // in options { }

// Step 2: Permit your secondary by IP in each zone
zone "sprintug.com" {
    type           master;
    file           "/etc/bind/zones/db.sprintug.com";
    allow-transfer { 196.43.10.2; };
};

// Step 3 (stronger): TSIG-authenticated transfers.
// Even if an attacker spoofs your secondary's IP, they cannot
// forge the HMAC-SHA256 signature. Generate the key:
# tsig-keygen transfer-key-sprintug

key "transfer-key-sprintug" {
    algorithm hmac-sha256;
    secret    "paste-generated-secret-here==";
};

zone "sprintug.com" {
    type           master;
    file           "/etc/bind/zones/db.sprintug.com";
    allow-transfer { key "transfer-key-sprintug"; };
};
// Copy the same key block to your secondary's named.conf
4 — Resource Exhaustion (NXDOMAIN Flood / Phantom Domains)
Resolver is flooded with queries it cannot answer from cache until real subscriber queries time out

NXDOMAIN flood. Every query for a non-existent name is a guaranteed cache miss — the random string a1b2c3x.google.com will never be in cache. Each miss forces your resolver to walk the DNS tree outbound. A stream of randomised queries saturates your outbound query pipeline. Legitimate subscriber queries start timing out.

Phantom domain attack. The attacker sets up authoritative servers that never respond. Your resolver sends a query, waits for the timeout, holds that slot open the entire time. Fill enough slots and the resolver stalls for legitimate traffic while waiting on servers that will never answer.

named.conf.options — exhaustion defences (BIND config)
// Cap in-flight recursive queries. When the ceiling is reached,
// new queries get SERVFAIL rather than queuing indefinitely.
recursive-clients      10000;

// Limit queries per source IP — legitimate resolvers send tens,
// flood sources send thousands.
clients-per-query      10;
max-clients-per-query  100;

// Phantom domain defence: stop waiting on unresponsive upstreams.
resolver-query-timeout 10000;    // 10 seconds maximum, in milliseconds

// Serve stale cache during a flood — subscribers querying
// already-cached names still get answers even if the outbound
// pipeline is saturated.
stale-answer-enable    yes;
stale-answer-ttl       30;
max-stale-ttl          86400;

When BIND-level config is not enough. A flood large enough to saturate your transit link before it reaches the server cannot be stopped by BIND alone. That conversation happens at the BGP layer with Savannah, MTN wholesale, or Liquid — Remote Triggered Black Hole filtering drops attack traffic at the transit edge. That is a BGP topic for a later debone, but it is the correct escalation path when rate limiting alone is insufficient.

5 — Subdomain Takeover (Dangling DNS)
A CNAME points to a cloud resource you no longer control — attacker claims it, serves content under your domain name

What it is. You set app.sprintug.com CNAME sprintug.azurewebsites.net for a cloud trial. You decommission the Azure resource. The CNAME stays in your zone. That Azure hostname is now unclaimed — an attacker registers it. They now control what app.sprintug.com resolves to. They serve content under your domain to your subscribers without touching your nameserver at all. The countermeasure is not a BIND config setting — it is zone hygiene. Every CNAME pointing to an external provider must be reviewed when that resource is decommissioned.

audit for dangling CNAMEs — run monthly (shell)
# A CNAME target that returns NXDOMAIN is a live takeover risk
while IFS= read -r target; do
    result=$(dig "$target" A +short +time=3)
    [ -z "$result" ] && echo "DANGLING CNAME TARGET: $target"
done < <(grep -i CNAME /etc/bind/zones/db.sprintug.com | awk '{print $NF}')
# Any output from this script: delete that CNAME and reload immediately
6 — Version and Identity Disclosure
BIND answers questions about itself by default — the answers narrow an attacker's targeting

What it is. BIND responds to a special CHAOS class DNS query with its exact version string and hostname. Neither is useful to a legitimate DNS client. Both are useful to an attacker: knowing you run 9.18.12-1ubuntu1 takes them directly to the CVE database for that build. Two lines of config eliminate this signal entirely.

named.conf.options — suppress all identity information (BIND config)
version  "not disclosed";
hostname "not disclosed";
# (hide-version / hide-identity are Unbound directives, not BIND —
#  the two statements above are all BIND needs. To also silence
#  NSID queries, add: server-id none;)

# Verify:
# dig version.bind chaos txt @YOUR-SERVER
# Should return: "not disclosed"

Complete Hardening Block and Verification

named.conf.options — full security hardening, consolidated (BIND config)
// Identity
version               "not disclosed";
hostname              "not disclosed";
server-id             none;

// Access control — closes open resolver
recursion             yes;
allow-recursion       { trusted_resolvers; };
allow-query           { trusted_resolvers; };
allow-query-cache     { trusted_resolvers; };
allow-transfer        { none; };

// Cache poisoning
dnssec-validation     auto;
query-source          address *;   // no fixed port; source ports stay randomised

// Amplification
rate-limit {
    responses-per-second  10;
    referrals-per-second  5;
    nodata-per-second     5;
    nxdomains-per-second  5;
    slip                  2;
    exempt-clients       { 127.0.0.1; 10.0.0.0/8; };
};

// Resource exhaustion
recursive-clients      10000;
clients-per-query      10;
max-clients-per-query  100;
resolver-query-timeout 10000;

// Resilience during attacks
stale-answer-enable    yes;
stale-answer-ttl       30;
max-stale-ttl          86400;
post-hardening verification — all six checks (shell)
# 1. Version hidden
dig version.bind chaos txt @YOUR-SERVER
# Expect: "not disclosed"

# 2. Resolver closed
dig @YOUR-SERVER google.com A       # from outside your ACL
# Expect: status: REFUSED

# 3. Zone transfer blocked
dig axfr sprintug.com @YOUR-SERVER
# Expect: Transfer failed or connection refused

# 4. DNSSEC validation active
dig @YOUR-SERVER google.com A +dnssec
# Expect: flags include "ad" (authenticated data)

# 5. Source port randomisation active
tcpdump -i eth0 -nn 'udp and dst port 53' -c 20
# Expect: source ports vary — not fixed to one value

# 6. Dangling CNAMEs
while IFS= read -r t; do
    [ -z "$(dig "$t" A +short +time=3)" ] && echo "DANGLING: $t"
done < <(grep -i CNAME /etc/bind/zones/db.sprintug.com | awk '{print $NF}')
# Expect: no output
Defence as discipline, not paranoia

Every countermeasure in this chapter is a configuration line, not a vendor product or a certification. The DNS attack surface is finite and well-catalogued. You close the open resolver, restrict transfers, randomise ports, validate DNSSEC, cap rates, audit your CNAMEs. After that, your DNS infrastructure is harder to exploit than most enterprise networks running commercial appliances. The expensive courses teach you to fear a vast threat landscape. For DNS, the landscape is narrow and the defences are already in the tool you are running.


Chapter 13 · Cryptographic Trust

DNSSEC — Signing the Internet

DNS was designed in 1983 with no concept of authentication. Any server could lie about any record and the protocol had no way to detect it. DNSSEC — DNS Security Extensions — adds a cryptographic layer of proof. This chapter builds that understanding from the mathematical foundation up to the operational BIND9 commands.

The Problem: DNS Has No Memory of Who Told It What

When your resolver receives an answer to a query, the DNS protocol as originally designed gives it no way to verify whether that answer came from the legitimate authoritative server or from an attacker who intercepted the response. The answer arrives. The resolver trusts it. It has no choice — there is no signature to check, no certificate to validate, no chain of custody.

This is not a theoretical concern. The Kaminsky attack demonstrated in 2008 that cache poisoning — planting false records in a resolver — was practical and fast against every resolver running at the time. A resolver that had cached a poisoned record for yourbank.com would direct every subscriber to the attacker's server until the TTL expired. Millions of users. Invisible to everyone except the attacker.

The underlying problem is not a BIND9 bug. It is structural. DNS was built for a cooperative internet of trusted parties. It has no mechanism to prove that an answer is genuine. DNSSEC adds that mechanism.

Two distinct DNSSEC roles for an ISP

As an ISP you wear two hats. As a resolver operator, you validate other people's signed zones on behalf of your subscribers — this is already done with dnssec-validation auto; in named.conf.options and requires no additional work. As a zone owner, you sign your own zones so the rest of the internet can verify your records. Both are covered in this chapter. They are independent activities that happen to share the same cryptographic infrastructure.
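Expressed as config, the resolver hat is a single statement, the same one Chapter 12's hardening block already contains:

```
// named.conf.options — the resolver hat
// Validate other operators' signed zones on behalf of your subscribers.
// "auto" uses the root trust anchor shipped in bind.keys.
dnssec-validation auto;
```

The zone-owner hat is the rest of this chapter.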

First Principles: What a Digital Signature Is

Before DNSSEC makes sense, public key cryptography must make sense. The concept is straightforward and worth spending five minutes on before moving to DNS-specific detail.

A public key pair consists of two mathematically linked keys. What one key encrypts, only the other can decrypt. You keep one key private — it never leaves your control. You publish the other key openly — anyone can have it.

A digital signature works like this:

digital signature — conceptual
SIGNING (done by the zone owner, on the authoritative server):

  1. Take the DNS record you want to sign
     e.g.  sprintug.com.  A  196.43.10.100

  2. Run it through a hash function (SHA-256)
     Produces a fixed-length fingerprint of the data:
     e.g.  8f14e45fceea167a5a36dedd4bea2543...

  3. Encrypt that fingerprint with your PRIVATE key
     The result is the signature.

  4. Publish the signature alongside the record in your zone.
     Anyone who has your PUBLIC key can verify it.

VERIFICATION (done by the resolver, on behalf of your subscriber):

  1. Resolver receives the DNS record AND its signature.

  2. Decrypts the signature using the PUBLIC key (published in DNS).
     Recovers the original fingerprint.

  3. Independently hashes the received record.
     Gets its own fingerprint.

  4. Compares the two fingerprints.
     If they match: the record is genuine. Nobody tampered with it.
     If they differ: the record was altered. Resolver discards it → SERVFAIL.

What an attacker cannot do:
  They cannot produce a valid signature for a forged record
  without possessing the private key.
  The private key never leaves your server.
  The forgery is mathematically detectable.

That is the entire cryptographic concept underlying DNSSEC. Everything else is the operational machinery for managing keys, distributing public keys through the DNS hierarchy, and handling key rotation over time.
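You can run those exact steps by hand with openssl. This is a sketch of the mechanics, not DNSSEC itself (real RRSIG generation canonicalises the whole record set first); the file names and the forged IP are illustrative:

```shell
record="sprintug.com.  A  196.43.10.100"

# Zone owner: generate a P-256 key pair (the curve ECDSAP256SHA256 uses)
openssl ecparam -name prime256v1 -genkey -noout -out demo-private.pem
openssl ec -in demo-private.pem -pubout -out demo-public.pem 2>/dev/null

# Sign: hash the record, sign the hash with the PRIVATE key
printf '%s' "$record" | openssl dgst -sha256 -sign demo-private.pem -out record.sig

# Verify: re-hash the record, check the signature with the PUBLIC key
printf '%s' "$record" | openssl dgst -sha256 -verify demo-public.pem -signature record.sig

# A tampered record fails against the same signature
printf '%s' "sprintug.com.  A  203.0.113.66" |
    openssl dgst -sha256 -verify demo-public.pem -signature record.sig ||
    echo "forgery detected"
```

The successful check prints Verified OK; the tampered record does not, which is the whole point.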

The Four New Record Types DNSSEC Adds

DNSSEC does not replace any existing DNS records. It adds four new record types that carry the cryptographic machinery alongside your existing data.

Record Type  | Full Name                    | What It Contains                                        | Purpose
RRSIG        | Resource Record Signature    | The cryptographic signature over a set of DNS records   | Proves a set of records is genuine and unmodified
DNSKEY       | DNS Public Key               | The public half of your signing key pair                | Published in your zone so anyone can verify your signatures
DS           | Delegation Signer            | A hash of your child zone's KSK public key              | Published in the parent zone — creates the chain of trust
NSEC / NSEC3 | Next Secure / Next Secure v3 | An authenticated pointer to the next record in the zone | Proves a name does not exist without exposing the full zone

The Chain of Trust — How It Connects Root to Your Zone

DNSSEC's power comes from a continuous chain of cryptographic trust that runs from the DNS root zone all the way down to every individual record in a signed zone. Break the chain at any point and validation fails. Maintain it and every record in a signed zone is verifiable by anyone on the internet.

the chain of trust — sprintug.com (conceptual)
┌──────────────────────────────────────────────────────────────┐
│ ROOT ZONE "."                                                │
│                                                              │
│ Root KSK (Key Signing Key) — the trust anchor                │
│ This key is published in /etc/bind/bind.keys on your server  │
│ and hardcoded into every DNSSEC-aware resolver on earth.     │
│ It signs the Root ZSK.                                       │
│                                                              │
│ Root ZSK (Zone Signing Key)                                  │
│ Signs all records IN the root zone, including the DS         │
│ records that delegate trust to TLD zones.                    │
└────────────────────┬─────────────────────────────────────────┘
                     │ DS record for .com
                     │ "I vouch for this .com key"
                     ▼
┌──────────────────────────────────────────────────────────────┐
│ .COM TLD ZONE                                                │
│                                                              │
│ .com KSK — signed by root ZSK, verified via root DS          │
│ .com ZSK — signs all .com zone records                       │
│           Including the DS record for sprintug.com           │
└────────────────────┬─────────────────────────────────────────┘
                     │ DS record for sprintug.com
                     │ "I vouch for this sprintug.com key"
                     ▼
┌──────────────────────────────────────────────────────────────┐
│ sprintug.com (YOUR ZONE)                                     │
│                                                              │
│ Your KSK — its hash is published as a DS record at .com      │
│ Your ZSK — signs all records in sprintug.com                 │
│ RRSIG records — one signature per record set                 │
└────────────────────┬─────────────────────────────────────────┘
                     │
                     ▼
sprintug.com.  A  196.43.10.100
  + RRSIG proving this record was signed by your ZSK
  + DNSKEY publishing your ZSK public key
  = verifiable by anyone from root down

The chain is only as strong as its weakest link. If your DS record is not published at the parent — the .com TLD in this case — the chain is broken and resolvers will either treat your zone as unsigned or return SERVFAIL depending on their configuration. Submitting the DS record to your registrar is the step that connects your zone to the global chain.

The Two Keys: ZSK and KSK — Why Two?

Every DNSSEC-signed zone uses two key pairs, not one. This is a deliberate design decision with operational consequences you need to understand.

ZSK — Zone Signing Key
rotated frequently · smaller key · high volume

Signs every individual resource record set in your zone. Because it touches every record, it is used constantly and generates significant cryptographic work. Uses a smaller key size (1024–2048 bit RSA or 256-bit ECDSA P-256) for performance. Rotated every 30–90 days. When you rotate it, only your zone is affected — no coordination with the parent registrar needed.

KSK — Key Signing Key
rotated rarely · larger key · low volume

Signs only the DNSKEY record set — specifically, it vouches for the ZSK. Its hash is published as a DS record in the parent zone. Uses a larger key size (2048–4096 bit RSA or 384-bit ECDSA P-384). Rotated annually or less. When you rotate it, you must update the DS record at your registrar — a coordination step that takes time and must be done carefully to avoid breaking the chain.

The split exists because it separates two concerns: the high-frequency, high-volume signing work (ZSK, rotated often, small and fast), from the trust anchor that the parent zone vouches for (KSK, rarely rotated, large and expensive but used infrequently). If there were only one key, every rotation would require a registrar update. With the split, routine ZSK rotations are entirely self-contained.

NSEC and NSEC3 — Authenticated Denial of Existence

Signing records that exist is straightforward. But what do you return when someone queries a name that does not exist? Without DNSSEC you return NXDOMAIN and the resolver trusts it. With DNSSEC, that NXDOMAIN must also be provable — an attacker should not be able to forge a denial for a name that does exist.

NSEC solves this by creating a signed linked list of every name in your zone, in alphabetical order. Each NSEC record points to the next name and lists which record types exist at the current name. If a query falls between two NSEC records alphabetically, the resolver can prove — cryptographically — that nothing exists in that gap.

NSEC records — what they look like (zone file)
; Zone contains: sprintug.com, mail.sprintug.com, www.sprintug.com
; NSEC chain (alphabetical order, loops at end):

sprintug.com.      NSEC  mail.sprintug.com.  A NS MX SOA RRSIG NSEC DNSKEY
; "After sprintug.com the next name is mail.sprintug.com"
; "sprintug.com has these record types: A NS MX SOA RRSIG NSEC DNSKEY"

mail.sprintug.com. NSEC  www.sprintug.com.   A RRSIG NSEC
; "After mail the next name is www"
; If someone queries noc.sprintug.com, it falls between mail and www.
; The resolver receives this NSEC record, verifies its signature,
; and proves noc does not exist — without you ever having to say so directly.

www.sprintug.com.  NSEC  sprintug.com.       A RRSIG NSEC
; Chain loops back to the start (last name → first name)

The NSEC problem: zone walking. NSEC's linked list structure means an attacker can enumerate your entire zone by following the chain from name to name. This is called zone walking — essentially a free AXFR for zones that blocked zone transfers but forgot about NSEC.

NSEC3 solves zone walking by hashing the names before putting them in the chain. Instead of mail.sprintug.com in the chain, NSEC3 stores 2T7B4G4... — the salted SHA-1 hash of mail. The chain is still traversable but the names are not recoverable from the hashes. NSEC3 is the correct choice for any zone where the hostnames themselves carry operational information you want to keep private.
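Concretely, RFC 5155 defines the NSEC3 hash as SHA-1 over the name in DNS wire format with the salt appended, then iterated; with iterations 0 that is a single round. A sketch with openssl, using a made-up 8-byte salt — in practice the nsec3hash utility from bind9utils does this for you, including the base32 encoding NSEC3 publishes:

```shell
# One round of the NSEC3 hash for mail.sprintug.com (illustrative)
# Wire format: each label prefixed by its length byte, terminated by a zero byte
printf '\004mail\010sprintug\003com\000' > wirename.bin

# Append the raw salt bytes (aa bb cc dd 11 22 33 44 hex, written in octal)
printf '\252\273\314\335\021\042\063\104' >> wirename.bin

# SHA-1 over (wire-format name || salt) is the NSEC3 hash, pre-base32
openssl dgst -sha1 wirename.bin
```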

NSEC vs NSEC3 — the tradeoff (conceptual)
NSEC:
  + Simpler to implement and debug
  + Slightly lower CPU overhead
  − Zone walking possible: attacker can enumerate all hostnames

NSEC3:
  + Hostnames are hashed — zone walking reveals hashes, not names
  + Correct choice for production zones
  − Slightly higher CPU overhead on the authoritative server
  − Hash parameters (iterations, salt) must be chosen carefully:
      too many hash iterations turn validation into a CPU DoS vector
      BIND default: iterations=0, random salt → correct and safe

Algorithm Choice

DNSSEC supports multiple cryptographic algorithms. The choice matters for security, performance, and compatibility.

Algorithm                | Type  | Key Size      | Verdict
RSASHA1 (alg 5)          | RSA   | 1024–4096 bit | Deprecated. SHA-1 is broken. Do not use.
RSASHA256 (alg 8)        | RSA   | 2048–4096 bit | Widely supported. Correct choice if you need maximum compatibility with old resolvers.
RSASHA512 (alg 10)       | RSA   | 2048–4096 bit | Stronger hash than alg 8. Larger signatures — not worth it over alg 8.
ECDSAP256SHA256 (alg 13) | ECDSA | 256 bit       | Recommended. Shorter keys, shorter signatures, faster validation, equivalent security to RSA-3072. BIND default since 9.16.
ECDSAP384SHA384 (alg 14) | ECDSA | 384 bit       | Higher security margin. Reasonable choice for KSKs, where key size matters less than for ZSKs.
ED25519 (alg 15)         | EdDSA | 256 bit       | Fastest, smallest, most modern. Not yet universally supported by all resolvers. Suitable once support is confirmed in your ecosystem.

For new zones in 2024 and beyond: ECDSAP256SHA256 (algorithm 13) is the correct default. It is what BIND's built-in dnssec-policy uses. Small keys, fast verification, broadly supported.

Signing Your Zone — The Modern Way: dnssec-policy

Before BIND 9.16, signing a zone required manual key generation with dnssec-keygen, manually including key files, configuring automated signing, and writing your own key rollover procedures. It was correct but operationally heavy.

From BIND 9.16 onward, dnssec-policy automates all of it. Key generation, signing, NSEC3 configuration, and key rollovers are all handled by BIND itself. You declare a policy, assign it to a zone, and BIND does the rest. This is the correct operational approach for any zone you manage today.

Step 1 — Define a policy in named.conf

named.conf — define your DNSSEC policy (BIND config)
// A dnssec-policy block defines the key parameters and rotation schedule.
// Define it once, apply it to any number of zones.

dnssec-policy "sprintug-policy" {

    // How long signatures (RRSIG records) are valid.
    // Resolver caches will serve signed responses for this long.
    // 14 days is standard — long enough to survive a weekend outage.
    signatures-validity          14d;

    // How far in advance to re-sign records before signatures expire.
    // 5 days gives you a comfortable window to detect and fix problems
    // before any signature actually expires.
    signatures-refresh           5d;

    // Key parameters
    keys {
        // KSK: larger key, rotated annually (365 days)
        ksk lifetime 365d algorithm ecdsap256sha256;

        // ZSK: smaller key (same algorithm), rotated every 90 days
        zsk lifetime 90d  algorithm ecdsap256sha256;
    };

    // Use NSEC3 instead of NSEC — prevents zone walking
    nsec3param iterations 0 optout no salt-length 8;
    // iterations 0 = one hash round — NIST-recommended, avoids CPU DoS
    // optout no  = sign every record including empty non-terminals
    // salt-length 8 = 8 bytes of random salt per zone

    // DNSKEY TTL — how long resolvers cache your public keys
    dnskey-ttl                   3600;

    // DS TTL — this should match what your registrar publishes
    ds-ttl                       3600;

    // Zone propagation delay — time for changes to reach all secondaries.
    // BIND waits this long during key rollovers to ensure new keys
    // are propagated before retiring old ones.
    zone-propagation-delay       300;

    // Maximum zone TTL — the longest TTL in your zone.
    // BIND needs this to know how long old signatures must remain valid
    // during rollovers (cached answers may live this long).
    max-zone-ttl                 3600;
};

// BIND also ships a built-in "default" policy as a sensible baseline:
//   dnssec-policy "default";
// It uses ECDSAP256SHA256 with a single combined signing key (CSK)
// that has an unlimited lifetime — no automatic rollovers — and NSEC.
// A named policy like the one above gives you an explicit KSK/ZSK
// split, your own rotation timing, and NSEC3.

Step 2 — Apply the policy to your zone in named.conf.local

named.conf.local — enable signing on your zone (BIND config)
zone "sprintug.com" {
    type           master;
    file           "/etc/bind/zones/db.sprintug.com";

    // Attach the DNSSEC policy — this is the key change.
    // BIND will generate keys, sign the zone, and manage rollovers
    // automatically from this point forward.
    dnssec-policy  "sprintug-policy";

    // Inline signing: BIND maintains a signed copy of the zone
    // internally without modifying your zone file — you keep editing
    // the unsigned source file normally. Recent 9.18 releases imply
    // this when dnssec-policy is set; stating it explicitly is safe.
    inline-signing yes;

    allow-transfer { key "transfer-key-sprintug"; };
    notify         yes;
};

Step 3 — Reload and watch BIND sign the zone

bash — activate signing and observe (shell)
# Reload configuration
rndc reconfig

# BIND will immediately:
# 1. Generate a KSK and ZSK for the zone
# 2. Sign every record set in the zone
# 3. Add DNSKEY, RRSIG, and NSEC3 records
# Watch the logs to confirm:
journalctl -u named -f | grep -i "dnssec\|signing\|keygen"

# Verify the zone is signed — look for RRSIG records
dig @127.0.0.1 sprintug.com DNSKEY +short
# Should return two DNSKEY records: one with flag 257 (KSK), one with 256 (ZSK)

dig @127.0.0.1 sprintug.com A +dnssec
# Should return the A record AND an RRSIG record beneath it

# Check where BIND stored the keys it generated
ls -la /var/cache/bind/K*.key /var/cache/bind/K*.private
# You will see files like:
# Ksprintug.com.+013+12345.key     (public key — safe to share)
# Ksprintug.com.+013+12345.private (PRIVATE KEY — never share, back this up)
Back up the private key files immediately

The .private files in /var/cache/bind/ are your zone's signing keys. If you lose them in a server failure without a backup, you cannot generate matching signatures — and because your DS record is still published at the registrar, every validating resolver will SERVFAIL your zone until you re-sign with new keys and replace that DS. Back these files up to offline storage the moment they are generated. Do not store the backup on the same server.

Step 4 — Extract Your DS Record and Submit It to the Registrar

Signing the zone makes it internally consistent — your records are signed, your DNSKEY is published. But the global chain of trust is not yet connected. The .com TLD (or .ug, or .co.tz depending on your domain) does not yet know to vouch for your KSK. That link is the DS record, and you must publish it at your registrar.

bash — extract the DS record for your registrar (shell)
# Method 1: Ask BIND directly (most reliable)
rndc dnssec -status sprintug.com
# Shows active keys and their DS records

# Method 2: Generate DS from the KSK file
# The KSK has flag 257 in the DNSKEY record
dig @127.0.0.1 sprintug.com DNSKEY | grep " 257 "
# Output example:
# sprintug.com. 3600 IN DNSKEY 257 3 13 mdsswUyr3DPW132mOi8V9xESWE8jTo0d...

# Generate DS records in both supported digest formats
dnssec-dsfromkey -a SHA-256 Ksprintug.com.+013+12345.key
# Output:
# sprintug.com. IN DS 12345 13 2 8abc1234...hex...digest

dnssec-dsfromkey -a SHA-1 Ksprintug.com.+013+12345.key
# Some registrars still require SHA-1 DS as well

# The DS record contains four fields:
# sprintug.com. IN DS [KeyTag] [Algorithm] [DigestType] [Digest]
#                     12345    13          2            8abc...
#
# KeyTag:     identifies which DNSKEY this DS corresponds to
# Algorithm:  13 = ECDSAP256SHA256
# DigestType: 2 = SHA-256 (preferred), 1 = SHA-1
# Digest:     hash of the KSK public key
What to tell your registrar

Most registrars have a DNSSEC or DS record section in their domain management portal. You will be asked for: the Key Tag, the Algorithm number, the Digest Type, and the Digest value. These come directly from the dnssec-dsfromkey output above. Some registrars accept the full DNSKEY record and compute the DS themselves. After submission, allow up to 48 hours for the DS record to propagate through the TLD zone. Until the DS is live at the parent, your zone is signed but not yet connected to the global chain — resolvers will treat it as unsigned.
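Those four values can be pulled out of a DS record mechanically rather than by eye. A minimal sketch in pure awk; parse_ds is a hypothetical helper, and the sample record is illustrative, not a real key:

```shell
# parse_ds: split a DS record in presentation format into the four
# values a registrar portal asks for. Assumes the digest is a single
# token, as dnssec-dsfromkey emits it (dig may insert spaces).
parse_ds() {
  awk '{
    for (i = 1; i <= NF; i++) if ($i == "DS") break
    if (i < NF) printf "KeyTag=%s Algorithm=%s DigestType=%s Digest=%s\n",
                       $(i+1), $(i+2), $(i+3), $(i+4)
  }'
}

# Example with a sample (not real) record:
echo 'sprintug.com. IN DS 12345 13 2 8abc1234deadbeef' | parse_ds
```

In practice you would pipe the dnssec-dsfromkey output shown above straight into it.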

The Manual Way — Understanding What dnssec-policy Does Under the Hood

Knowing the automated path is sufficient for operating production zones. Knowing the manual path is necessary for understanding what is actually happening, for debugging, and for the moment when the automation produces unexpected output. Here is what dnssec-policy does for you, expressed as the commands you would run manually.

bash — manual DNSSEC (for understanding, not for production)

# Generate the KSK (flag 257; -f KSK marks it as a key-signing key.
# For ECDSAP256SHA256 the key size is fixed by the algorithm)
dnssec-keygen -a ECDSAP256SHA256 -f KSK -n ZONE sprintug.com
# Creates: Ksprintug.com.+013+NNNNN.key and .private

# Generate the ZSK (flag 256, standard zone-signing key)
dnssec-keygen -a ECDSAP256SHA256 -n ZONE sprintug.com
# Creates another pair of .key and .private files

# Sign the zone file manually (produces a signed zone file):
#   -3 <salt>  use NSEC3 with a random salt
#   -H 0       NSEC3 iterations: 0
#   -o         zone origin
#   -t         print stats when done
dnssec-signzone -3 $(openssl rand -hex 8) -H 0 -o sprintug.com -t \
  /etc/bind/zones/db.sprintug.com \
  Ksprintug.com.+013+KSK.key \
  Ksprintug.com.+013+ZSK.key

# Produces: db.sprintug.com.signed
# This is the file you point named.conf.local to instead of the unsigned file

# The signed zone must be re-signed before signatures expire.
# Without dnssec-policy, you write a cron job to do this.
# This is exactly what dnssec-policy automates — and why you use it.

Key Rollover — What Happens and Why It Matters

Keys have limited lifetimes. They expire because cryptographic best practice requires it — long-lived keys give attackers more time to accumulate data for attacks, and a key that is never rotated is a key that may have been compromised without your knowledge. BIND's dnssec-policy handles rollovers automatically, but you must understand the process to recognise a rollover in progress and to intervene correctly if something goes wrong.

ZSK Rollover (self-contained — no registrar involvement)

ZSK rollover timeline (conceptual)

DAY 1  — Old ZSK active, new ZSK not yet created.

DAY 80 (10 days before the 90-day lifetime expires):
  BIND generates the new ZSK and publishes it in the DNSKEY records.
  Both old and new ZSK appear in the zone simultaneously.
  Why: resolvers that cached the old DNSKEY need time to pick up the
  new one. The overlap window covers this.

DAY 85 — BIND starts signing new records with the new ZSK.
  Old signatures (signed with the old ZSK) are still valid and cached.
  New signatures (signed with the new ZSK) begin appearing.

DAY 90 — Old ZSK retired. All signatures in the zone are now from the
  new ZSK. The old DNSKEY is removed.

Throughout: zero subscriber impact. No registrar update required.
The KSK, which is published as a DS at the registrar, did not change.
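Those milestones follow directly from the key's activation date. A minimal sketch of the arithmetic, assuming a 90-day lifetime and a 10-day prepublish window (both illustrative; the values in your dnssec-policy govern in practice), using GNU date:

```shell
# zsk_milestones: print the prepublish, re-sign, and retire dates for
# a ZSK activated at the given epoch. Lifetimes are illustrative.
zsk_milestones() {  # $1 = activation time (unix epoch)
  local act=$1 day=86400
  local lifetime=90 prepublish=10 propagate=5
  date -u -d "@$(( act + (lifetime - prepublish) * day ))" \
       "+publish new ZSK:  %Y-%m-%d"
  date -u -d "@$(( act + (lifetime - propagate) * day ))" \
       "+sign with new:    %Y-%m-%d"
  date -u -d "@$(( act + lifetime * day ))" \
       "+retire old ZSK:   %Y-%m-%d"
}

# Example: key activated 2024-01-01 00:00 UTC (epoch 1704067200)
zsk_milestones 1704067200
```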

KSK Rollover (requires registrar coordination)

KSK rollover — the careful procedure (conceptual)

1. BIND generates a new KSK. Both old and new KSK appear in the
   DNSKEY records.
2. You submit the NEW KSK's DS record to your registrar.
3. The registrar publishes the new DS record. Both old DS and new DS
   coexist at the parent for a period.
4. Once the new DS has propagated (TTL expiry + buffer), BIND retires
   the old KSK.
5. You remove the old DS from the registrar.

CRITICAL: if you remove the old DS BEFORE the new DS has propagated,
or retire the old KSK BEFORE the new DS is live at the parent, you
break the chain of trust. Every DNSSEC-validating resolver on earth
will return SERVFAIL for your zone until you fix it.
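The "is the new DS live at the parent yet" decision in step 4 can be scripted as a hard gate. A minimal sketch; ds_live is a hypothetical helper, the key filename is a placeholder, and the dig/dnssec-dsfromkey lines (commented, since they need live DNS) show how you would feed it in practice:

```shell
# ds_live: succeed only if the new key's DS digest appears in the
# DS RRset published by the parent (read from stdin).
ds_live() {  # $1 = digest of the NEW KSK's DS record
  [ -n "$1" ] && grep -qi -- "$1"
}

# In practice (placeholder key filename):
# parent_ds=$(dig sprintug.com DS +short)
# new_digest=$(dnssec-dsfromkey -a SHA-256 Knew-ksk.key | awk '{print $NF}')
# if echo "$parent_ds" | ds_live "$new_digest"; then
#   echo "new DS is live at the parent: safe to retire the old KSK"
# else
#   echo "DO NOT retire the old KSK: new DS not visible at the parent"
# fi
```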
A broken chain of trust is not subtle

If you break the DNSSEC chain — by incorrectly removing a DS record, by losing the private key, or by misconfiguring inline signing — every resolver with dnssec-validation auto; or dnssec-validation yes; will return SERVFAIL for your entire zone. Your domain becomes unreachable for a large fraction of internet users. This is not a degraded experience. It is a full outage. BIND's automated rollover via dnssec-policy is designed to prevent this. Do not perform manual key operations on a zone managed by dnssec-policy without thoroughly understanding the current rollover state.

Monitoring Key and Signature Status

bash — DNSSEC operational monitoring commands
# Show the full DNSSEC status of a zone: active keys,
# scheduled rollovers, signature expiry times
rndc dnssec -status sprintug.com

# Show DNSKEY records published in the zone (flag 257=KSK, 256=ZSK)
dig @127.0.0.1 sprintug.com DNSKEY

# Check signature expiry on a specific record
dig @127.0.0.1 sprintug.com A +dnssec | grep RRSIG
# The RRSIG record contains: algorithm, key tag, inception date, expiry date
# Expiry must be in the future — if it is in the past, signatures have expired

# Check whether the DS record at the parent matches your KSK
dig sprintug.com DS +short            # what the parent publishes
dnssec-dsfromkey Ksprintug.com.+013+NNNNN.key   # what your KSK produces
# These must match. If they do not: chain is broken.

# Inspect the full chain data even when validation fails
dig @127.0.0.1 sprintug.com A +dnssec +cd
# +cd = checking disabled — returns the raw records and signatures
# for debugging. Without +cd, a broken chain returns SERVFAIL with
# no useful error.

# Check NSEC3 is in place (should see NSEC3PARAM record)
dig @127.0.0.1 sprintug.com NSEC3PARAM
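The signature-expiry check above is worth automating for your monitoring host. A minimal sketch, assuming GNU date and dig's default RRSIG presentation format (field 9 is the expiration timestamp); rrsig_check is a hypothetical helper you would feed with `dig ... +dnssec | grep RRSIG`:

```shell
# rrsig_check: read RRSIG records on stdin (dig presentation format,
# expiration in field 9 as YYYYMMDDHHMMSS, UTC) and flag expired or
# soon-to-expire signatures. Fixed-width timestamps compare correctly
# as plain numbers, so no date arithmetic is needed in awk.
rrsig_check() {  # $1 = warning horizon in days (default 7)
  local horizon
  horizon=$(date -u -d "+${1:-7} days" +%Y%m%d%H%M%S)
  awk -v now="$(date -u +%Y%m%d%H%M%S)" -v horizon="$horizon" '
    $4 == "RRSIG" {
      if ($9 < now)          printf "%s %s: EXPIRED (%s)\n", $1, $5, $9
      else if ($9 < horizon) printf "%s %s: expires soon (%s)\n", $1, $5, $9
      else                   printf "%s %s: OK (expires %s)\n", $1, $5, $9
    }'
}

# Example with a long-expired sample signature:
echo 'sprintug.com. 3600 IN RRSIG A 13 2 3600 20200101000000 20191201000000 12345 sprintug.com. AbCd==' | rrsig_check
```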

External Validation Tools

After signing your zone and publishing the DS record at the registrar, validate the complete chain from an independent vantage point before declaring it done.

Tool | What It Checks | How to Use
dnsviz.net | Full graphical chain of trust from root to your zone. Shows every link, every key, every DS match. The most complete visual validator available. | Enter your domain name. Read the graph — every node should be green.
dnssec-analyzer.verisignlabs.com | Verisign's DNSSEC analyser. Structured report format with specific pass/fail checks. | Enter domain. Review each check.
dig +dnssec +cd | Raw chain data without validation — useful when you need to see what is in the chain even when it is broken. | From your own server or any external resolver.
delv (BIND tool) | Like dig but performs full DNSSEC validation locally. Shows exactly what a validating resolver sees. | delv @127.0.0.1 sprintug.com A +vtrace
bash — delv full validation trace

# delv is the DNSSEC-aware replacement for dig
# +vtrace shows the full validation process step by step
delv @127.0.0.1 sprintug.com A +vtrace +multiline

# Good output includes lines like:
# ; fully validated
# sprintug.com. 3600 IN A 196.43.10.100
# sprintug.com. 3600 IN RRSIG A 13 2 3600 (...)

# Bad output:
# resolution failed: RRSIG missing   → zone not signed
# resolution failed: verify failed   → signature invalid
# resolution failed: no valid DS     → DS not at parent yet
# resolution failed: DNSKEY missing  → DNSKEY not in zone

Common Mistakes and How to Diagnose Them

Symptom | Cause | Fix
SERVFAIL from external resolvers, but zone resolves locally | DNSSEC chain is broken — DS at parent does not match your KSK, or signatures have expired | Run delv +vtrace — it tells you exactly which step failed. Check DS match with dig sprintug.com DS vs dnssec-dsfromkey.
Zone resolves for some resolvers but not others | Some resolvers validate DNSSEC, some do not. The non-validating ones still work because they ignore the broken chain. Validating resolvers return SERVFAIL. | Same fix as above — the chain is broken. Fix the chain.
Signatures present but validation fails | Signatures were generated by a key whose DNSKEY record is no longer in the zone, or the DS at the parent points to a retired key | rndc dnssec -status sprintug.com — check which keys are active vs retired
Zone works but dnsviz shows red DS | DS record submitted to registrar but not yet propagated (still within TTL window) | Wait for the TTL to expire (typically 24–48 hours). Recheck.
named log shows "zone signing failed: no private key" | Private key file missing from /var/cache/bind/ | Restore from backup. If no backup exists, generate new keys, update the DS at the registrar, and re-sign fully.
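The first two rows of this table can be folded into a quick triage helper. A minimal sketch; diagnose is a hypothetical helper, and the match strings are the delv failure messages shown earlier in the chapter:

```shell
# diagnose: map a delv failure line to the likely cause, following
# the symptom table above. Feed it the final line of
# `delv ... +vtrace 2>&1`.
diagnose() {
  case "$1" in
    *"fully validated"*) echo "chain OK" ;;
    *"no valid DS"*)     echo "DS not live at the parent yet; check the registrar" ;;
    *"verify failed"*)   echo "signature invalid; check active keys with rndc dnssec -status" ;;
    *"RRSIG missing"*)   echo "zone not signed; check dnssec-policy is attached" ;;
    *"DNSKEY missing"*)  echo "DNSKEY absent from zone; signing is broken" ;;
    *)                   echo "unrecognised; run delv +vtrace manually" ;;
  esac
}

# Example:
diagnose "resolution failed: no valid DS"
```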

DNSSEC for the ISP: Validator vs Signer

To close the chapter, the practical summary of where DNSSEC sits in your operational life as Sprint Group CTO:

DNSSEC operational map — Sprint Group (conceptual)

WHAT YOU DO AS A RESOLVER OPERATOR:
────────────────────────────────────
dnssec-validation auto;   ← already in your named.conf.options
bind.keys                 ← already on disk, auto-managed

That is it. You are already validating DNSSEC for all subscribers.
Every signed zone on the internet is verified before answers reach
them. No additional configuration required.

WHAT YOU DO AS A ZONE OWNER:
──────────────────────────────
1. Add dnssec-policy "sprintug-policy"; to each zone in named.conf.local
2. Define the policy block in named.conf (keys, lifetimes, NSEC3)
3. rndc reconfig
4. Extract the DS record: rndc dnssec -status sprintug.com
5. Submit the DS record to your registrar (one-time per zone, per KSK)
6. Validate with dnsviz.net — confirm a full green chain
7. After that: BIND manages everything — signing, rotation, re-signing

Zones to sign:
  sprintug.com       → submit DS to .com registrar
  sprinttz.co.tz     → submit DS to .tz NIC (Tanzania)
  Your reverse zones → sign them too — PTR records are forgeable
                       without DNSSEC

What you do not need to do:
  Manually generate keys         ← dnssec-policy handles this
  Schedule re-signing cron jobs  ← dnssec-policy handles this
  Write rollover procedures      ← dnssec-policy handles this
DNSSEC as infrastructure maturity

Signing your zones is not just a security measure. It is a signal of operational maturity. Enterprise customers evaluating Sprint Group as their ISP will run a DNSSEC check on your nameservers. A fully signed zone with a clean chain tells them that whoever runs the DNS infrastructure knows what they are doing. An unsigned zone in 2024 is a yellow flag. It is also one config block and one registrar submission to fix — and then BIND maintains it indefinitely.


Chapter 14 · Final Project

Production Build

Implement this. One server, two roles (auth + recursive), production hardened, ready to serve SprintUG subscribers and host your zones authoritatively.

Project: SprintUG Primary DNS Server

Ubuntu 24.04 · BIND 9.18 · Single server · Auth + Recursive · ~1,000 subscribers

01
Install and verify
apt install bind9 bind9utils bind9-doc dnsutils
Verify: named -v → should show 9.18.x
02
Create log directory
mkdir -p /var/log/named && chown bind:bind /var/log/named
Create zones directory: mkdir -p /etc/bind/zones
03
Configure named.conf
Replace default with the production version from Chapter 01. Define your trusted_resolvers ACL with your actual subscriber IP ranges.
04
Configure named.conf.options
Use the small ISP template from Chapter 09. Set listen-on to your actual server IPs. Confirm no forwarders block — full recursion from roots.
05
Create your forward zone files
Create /etc/bind/zones/db.sprintug.com from the Chapter 05 template. Set serial to today: 2024031501. Add your NS glue, MX, A, and TXT records.
06
Create your reverse zone file
Create /etc/bind/zones/db.YOUR-BLOCK.rev. Add PTR records for your routers, servers, and BNG interfaces. These are what show up in traceroutes.
07
Declare zones in named.conf.local
Add zone stanzas for your forward and reverse zones using the Chapter 03 template. Restrict allow-transfer to your secondary NS IP only.
08
Generate rndc key
rndc-confgen -a -b 512
chown root:bind /etc/bind/rndc.key && chmod 640 /etc/bind/rndc.key
09
Validate everything before starting
named-checkconf → zero errors.
named-checkzone sprintug.com /etc/bind/zones/db.sprintug.com → OK
10
Start and enable
systemctl enable --now named
systemctl status named → active (running)
11
Test authority
dig @127.0.0.1 sprintug.com SOA → should show your SOA with aa flag
dig @127.0.0.1 sprintug.com NS → ns1, ns2
dig @127.0.0.1 -x YOUR-IP → PTR record
12
Test recursion from a subscriber IP
dig @YOUR-SERVER-IP google.com A → answer with ra flag (recursion available)
dig @YOUR-SERVER-IP cloudflare.com AAAA → AAAA answer
From outside your trusted ACL: should get REFUSED
13
Configure firewall
ufw allow from YOUR-SUBSCRIBER-RANGE to any port 53 proto udp
ufw allow from YOUR-SUBSCRIBER-RANGE to any port 53 proto tcp
Block 53 from everywhere else. rndc port 953 → localhost only.
14
Add to Zabbix monitoring
Monitor: named process, port 53 UDP/TCP, query response time, cache size, zone serial numbers. Alert on SOA serial mismatch between primary and secondary.
15
Register glue records with your registrar
Submit ns1.sprintug.com → YOUR-IP and ns2.sprintug.com → SECONDARY-IP as glue at your domain registrar. Without glue, your NS records are unresolvable.
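Step 14's serial-mismatch alert is only a few lines of script. A minimal sketch; get_serial assumes dig on the monitoring host (so the live call is commented out), serials_match is pure shell, and the IPs are placeholders:

```shell
# get_serial: fetch the SOA serial for a zone from a given server.
# (Requires live DNS, so the example call below is commented out.)
get_serial() {  # $1 = server, $2 = zone
  dig +short @"$1" "$2" SOA | awk '{print $3}'
}

# serials_match: succeed only when both serials are non-empty and equal.
serials_match() {  # $1 = primary serial, $2 = secondary serial
  [ -n "$1" ] && [ "$1" = "$2" ]
}

# Example wiring (placeholder IPs):
# p=$(get_serial 196.43.10.53 sprintug.com)
# s=$(get_serial 196.43.11.53 sprintug.com)
# serials_match "$p" "$s" || echo "ALERT: serial mismatch ($p vs $s)"
```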

Testing Checklist

full test suite

# ── Authority tests ────────────────────────────────────────
dig @ns1.sprintug.com sprintug.com SOA       # aa flag present?
dig @ns1.sprintug.com sprintug.com NS        # Both NS records?
dig @ns1.sprintug.com www.sprintug.com A     # A record resolves?
dig @ns1.sprintug.com sprintug.com MX        # Mail records?
dig @ns1.sprintug.com sprintug.com TXT       # SPF record?
dig @ns1.sprintug.com -x 196.43.10.1         # PTR resolves?

# ── Recursion tests (from inside trusted ACL) ──────────────
dig @ns1.sprintug.com google.com A           # ra flag? Got answer?
dig @ns1.sprintug.com twitter.com AAAA       # IPv6 works?
dig @ns1.sprintug.com -t DNSKEY . +dnssec    # DNSSEC validation?

# ── Security tests ─────────────────────────────────────────
dig @ns1.sprintug.com google.com A \
    +subnet=0.0.0.0/0                        # Should work
dig version.bind chaos txt @ns1.sprintug.com # Should get "not disclosed"

# Test from outside your ACL (should get REFUSED)
dig @ns1.sprintug.com google.com A \
    +time=3 +tries=1                         # Should get REFUSED

# ── Performance test ───────────────────────────────────────
# -l limits the run length in seconds (-t is the per-query timeout)
dnsperf -s 127.0.0.1 -d /usr/share/dnsperf/queryfile-example-current \
    -l 10 -Q 5000                            # 5000 QPS for 10 seconds
Pilot in Command

When you can pass every test in that checklist, explain to a junior engineer what each file does and why, sign off a zone edit under pressure, and read a dig +trace output like a route map — you are PIC on BIND. The files never change. The understanding compounds.

Next debone: FreeRADIUS — you already run it at AS328939, now you take command of it.