Deploy on Kubernetes

The emit/poll split maps cleanly onto Kubernetes. The expensive analysis becomes scheduled batch work, the database is a stateful service, and the agents are stateless readers that scale on demand.

flowchart TB
    subgraph Jobs["Emit · CronJobs (one per project/language)"]
        EJ["codeanalyzer-java"]
        EP["canpy"]
        ET["cants"]
    end
    subgraph Core["Neo4j (StatefulSet or managed Aura)"]
        NS[("graph<br/>J* · Py* · TS*")]
    end
    subgraph Read["Poll · agent Deployment (stateless, scalable)"]
        D1["agent replica"]
        D2["agent replica"]
        D3["agent replica"]
    end
    EJ -->|Bolt write| NS
    EP -->|Bolt write| NS
    ET -->|Bolt write| NS
    NS -->|Bolt read| D1
    NS -->|Bolt read| D2
    NS -->|Bolt read| D3

Three roles:

Neo4j — one database for the whole fleet. Run it as a StatefulSet with a PersistentVolumeClaim, or point at a managed instance (Neo4j Aura). The emit jobs need write credentials; the agents need only read credentials.
Emit CronJobs — one per project (and per language within a project). They run on a schedule, push incrementally over Bolt, and exit. Because writes are idempotent, a missed or repeated run is harmless.
Agent Deployment — long-lived pods that construct analysis objects against the graph with Neo4jConnectionConfig. They hold no analysis state, so you scale them horizontally like any other web service.

The emit CronJob

Each backend ships as a self-contained binary (codeanalyzer for Java, canpy for Python, cants for TypeScript), so an emit job is a container that has the binary, the source checkout, and the Neo4j connection in its environment. Connection settings are read from NEO4J_URI, NEO4J_USERNAME, and NEO4J_PASSWORD, so the command line carries no secrets.

apiVersion: batch/v1
kind: CronJob
metadata:
  name: emit-billing-core
spec:
  schedule: "0 * * * *"            # hourly; incremental pushes are cheap
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: canpy
              image: ghcr.io/your-org/cldk-emit-python:latest
              args:
                - canpy
                - -i
                - /src/billing-core
                - --emit
                - neo4j
                - --app-name
                - billing-core
              env:
                - name: NEO4J_URI
                  value: bolt://neo4j.cldk.svc.cluster.local:7687
                - name: NEO4J_USERNAME
                  value: writer
                - name: NEO4J_PASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: neo4j-credentials
                      key: writer-password
              volumeMounts:
                - name: src
                  mountPath: /src
          volumes:
            # emptyDir starts empty on every run: populate it with an initContainer
            # git clone or a checkout sidecar, or replace it with a PVC.
            - name: src
              emptyDir: {}

Swap the image and args per language — codeanalyzer -i /src/payments-service -a 2 --emit neo4j --app-name payments-service for Java, cants -i /src/web-frontend -a 2 --emit neo4j --app-name web-frontend for TypeScript. Point every job at the same NEO4J_URI; the J* / Py* / TS* namespacing keeps them from colliding.
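
Since the jobs differ only in image, binary, and arguments, the manifests can be generated instead of hand-copied. What follows is a minimal sketch, not part of CLDK: the `emit_cronjob` helper, the `EMITTERS` table, and the Java and TypeScript image names are hypothetical, and it assumes each source tree is mounted at /src/<app-name>. The flags are the ones shown above. It prints JSON, which `kubectl apply -f -` accepts in the same way as YAML.

```python
import json

# Binary and extra flags per language, copied from the commands above.
# Only the Python image name appears in the manifest; the other two are placeholders.
EMITTERS = {
    "python": {"image": "ghcr.io/your-org/cldk-emit-python:latest",
               "binary": "canpy", "extra": []},
    "java": {"image": "ghcr.io/your-org/cldk-emit-java:latest",
             "binary": "codeanalyzer", "extra": ["-a", "2"]},
    "typescript": {"image": "ghcr.io/your-org/cldk-emit-typescript:latest",
                   "binary": "cants", "extra": ["-a", "2"]},
}

NEO4J_URI = "bolt://neo4j.cldk.svc.cluster.local:7687"


def emit_cronjob(app_name: str, language: str, schedule: str = "0 * * * *") -> dict:
    """Build a CronJob manifest for one project/language pair."""
    emitter = EMITTERS[language]
    args = [emitter["binary"], "-i", f"/src/{app_name}", *emitter["extra"],
            "--emit", "neo4j", "--app-name", app_name]
    container = {
        "name": emitter["binary"],
        "image": emitter["image"],
        "args": args,
        "env": [
            {"name": "NEO4J_URI", "value": NEO4J_URI},
            {"name": "NEO4J_USERNAME", "value": "writer"},
            # The password is referenced from the Secret, never inlined.
            {"name": "NEO4J_PASSWORD", "valueFrom": {"secretKeyRef": {
                "name": "neo4j-credentials", "key": "writer-password"}}},
        ],
        "volumeMounts": [{"name": "src", "mountPath": "/src"}],
    }
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": f"emit-{app_name}"},
        "spec": {
            "schedule": schedule,
            "concurrencyPolicy": "Forbid",
            "jobTemplate": {"spec": {"template": {"spec": {
                "restartPolicy": "Never",
                "containers": [container],
                # As in the manifest above, this volume still needs a checkout.
                "volumes": [{"name": "src", "emptyDir": {}}],
            }}}},
        },
    }


if __name__ == "__main__":
    fleet = [("billing-core", "python"),
             ("payments-service", "java"),
             ("web-frontend", "typescript")]
    bundle = {"apiVersion": "v1", "kind": "List",
              "items": [emit_cronjob(app, lang) for app, lang in fleet]}
    print(json.dumps(bundle, indent=2))
```

Keeping the per-language differences in one table makes adding a project a one-line change to `fleet`, while the credentials and the shared URI stay identical across jobs.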

The agent Deployment

The agents read with a credential scoped to read-only. Construction is identical to the poll examples — the only Kubernetes-specific detail is that the URI is the in-cluster Neo4j Service and the password comes from a Secret.

import os
from cldk import CLDK
from cldk.analysis.commons.backend_config import Neo4jConnectionConfig

def analysis_for(app_name: str):
    return CLDK.python(
        backend=Neo4jConnectionConfig(
            uri=os.environ["NEO4J_URI"],
            username=os.environ["NEO4J_USERNAME"],   # reader
            password=os.environ["NEO4J_PASSWORD"],
            application_name=app_name,
        ),
    )

Because no source is parsed at query time, an agent pod needs no analysis warm-up: it can answer from the graph as soon as it connects. Scale the Deployment’s replicas to match query load; Neo4j’s connection pool and your read-replica topology absorb the fan-out.
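
For completeness, here is a rough sketch of what the matching Deployment could look like, built as a Python dict and printed as JSON for `kubectl apply -f -`. Everything specific in it is an assumption: the `agent_deployment` helper, the `cldk-agent` name and image, and the `reader-password` key in the `neo4j-credentials` Secret are hypothetical and should be replaced with your own.

```python
import json


def agent_deployment(replicas: int = 3) -> dict:
    """Build a Deployment manifest for stateless, read-only agent pods."""
    labels = {"app": "cldk-agent"}  # hypothetical label
    container = {
        "name": "agent",
        "image": "ghcr.io/your-org/cldk-agent:latest",  # hypothetical image
        "env": [
            {"name": "NEO4J_URI",
             "value": "bolt://neo4j.cldk.svc.cluster.local:7687"},
            {"name": "NEO4J_USERNAME", "value": "reader"},
            # Assumed Secret key; the emit job uses "writer-password".
            {"name": "NEO4J_PASSWORD", "valueFrom": {"secretKeyRef": {
                "name": "neo4j-credentials", "key": "reader-password"}}},
        ],
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "cldk-agent"},
        "spec": {
            "replicas": replicas,
            # The selector must match the pod template's labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                # No volumes: agents read the graph and never mount source.
                "spec": {"containers": [container]},
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(agent_deployment(replicas=3), indent=2))
```

Unlike the emit CronJob, the pod spec has no source volume at all, which is what lets replicas be added or removed freely.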

Walkthrough: Odoo, end to end

Odoo is a good stress test for multi-project analysis at scale: it is large and split into hundreds of addon modules, all Python. A real fleet rarely stops at one language, though, so we pair Odoo’s Python addons with a TypeScript service in the same graph and query both through one API.

1. Emit both languages into one graph

Run one Python job over Odoo’s addons and one TypeScript job over the storefront service. They use distinct --app-name values, so both land in the same database as separate application anchors. The namespaced labels keep the Python and TypeScript graphs from colliding, and an agent can query either.

Python (Odoo addons)

canpy -i ./odoo/addons --emit neo4j \
  --app-name odoo \
  --neo4j-uri bolt://neo4j:7687 \
  --neo4j-user writer --neo4j-password "$NEO4J_PASSWORD"
# -> writes :PyApplication {name: "odoo"} and its :PyModule / :PySymbol graph
# -> ~hundreds of modules; subsequent runs only rewrite changed files

TypeScript (storefront service)

cants -i ./storefront/src -a 2 --emit neo4j \
  --app-name storefront \
  --neo4j-uri bolt://neo4j:7687 \
  --neo4j-user writer --neo4j-password "$NEO4J_PASSWORD"
# -> writes :TSApplication {name: "storefront"} and its :TSModule / :TSSymbol graph

In a cluster these are two CronJobs on the same schedule. After the first full run, each subsequent run is an incremental Bolt push: only modules whose content hash changed are rewritten. Because the two apps are in different languages, their full-run prunes never touch each other. Two apps in the same language do not get that isolation, so keep them in separate databases (see the prune note).
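
The incremental push rests on comparing content hashes between runs. The sketch below illustrates that idea only; it is not how canpy or cants are known to implement it. It assumes SHA-256 over raw file bytes and a hash map kept from the previous run, and both `content_hashes` and `plan_incremental_push` are hypothetical helpers.

```python
import hashlib
from pathlib import Path


def content_hashes(root: Path, suffix: str = ".py") -> dict[str, str]:
    """Map each source file (path relative to root) to the SHA-256 of its bytes."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob(f"*{suffix}"))
    }


def plan_incremental_push(
    previous: dict[str, str], current: dict[str, str]
) -> tuple[set[str], set[str]]:
    """Compare two runs and return (changed_or_new, removed) module paths.

    Only the first set would need rewriting in the graph. How removed files
    are handled is left open here; it is an assumption of this sketch, not
    documented behavior of the emitters.
    """
    changed = {name for name, digest in current.items()
               if previous.get(name) != digest}
    removed = set(previous) - set(current)
    return changed, removed


if __name__ == "__main__":
    # Hash the current directory twice; with no edits in between, nothing changes.
    before = content_hashes(Path("."))
    after = content_hashes(Path("."))
    print(plan_incremental_push(before, after))
```

Because unchanged files produce identical digests, an hourly schedule over a tree with hundreds of modules only pays for the files that were actually edited since the last run.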

2. Query across modules and languages

An agent now answers structural questions without touching either source tree. Construct one analysis per app, each anchored to its own application_name:

from cldk import CLDK
from cldk.analysis.commons.backend_config import Neo4jConnectionConfig

def connect(factory, app):
    return factory(
        backend=Neo4jConnectionConfig(
            uri="bolt://neo4j:7687",
            username="reader",
            password="…",
            application_name=app,
        ),
    )

py = connect(CLDK.python, "odoo")            # the Odoo addons graph
ts = connect(CLDK.typescript, "storefront")  # the TypeScript service graph

# "Which callables across all addons reach SaleOrder.action_confirm?"
# Signatures are rooted at the emit -i directory (./odoo/addons), so no "odoo.addons." prefix.
callers = py.get_callers("sale.models.sale_order.SaleOrder", "action_confirm")
# -> callers spread across the sale, stock, and account addons — one query, no per-addon parsing

# The same agent inspects the TypeScript service's call graph through the same API.
storefront_cg = ts.get_call_graph()   # -> networkx.DiGraph

The first query touched modules from several addons in a single call, and the agent reached a second language through the same analysis vocabulary: this is the multi-lingual, multi-project payoff. Re-analyzing these projects from scratch on every such question would be untenable; with a precomputed graph, each question costs a database query rather than a fresh analysis.