Deploy on Kubernetes
The emit/poll split maps cleanly onto Kubernetes. The expensive analysis becomes scheduled batch work, the database is a stateful service, and the agents are stateless readers that scale on demand.
```mermaid
flowchart TB
  subgraph Jobs["Emit · CronJobs (one per project/language)"]
    EJ["codeanalyzer-java"]
    EP["canpy"]
    ET["cants"]
  end
  subgraph Core["Neo4j (StatefulSet or managed Aura)"]
    NS[("graph<br/>J* · Py* · TS*")]
  end
  subgraph Read["Poll · agent Deployment (stateless, scalable)"]
    D1["agent replica"]
    D2["agent replica"]
    D3["agent replica"]
  end
  EJ -->|Bolt write| NS
  EP -->|Bolt write| NS
  ET -->|Bolt write| NS
  NS -->|Bolt read| D1
  NS -->|Bolt read| D2
  NS -->|Bolt read| D3
```
Three roles:
- Neo4j: one database for the whole fleet. Run it as a StatefulSet with a PersistentVolumeClaim, or point at a managed instance (Neo4j Aura). The emit jobs need write credentials; the agents need only read credentials.
- Emit CronJobs: one per project (and per language within a project). They run on a schedule, push incrementally over Bolt, and exit. Because writes are idempotent, a missed or repeated run is harmless.
- Agent Deployment: long-lived pods that construct analysis objects against the graph with Neo4jConnectionConfig. They hold no analysis state, so you scale them horizontally like any other web service.
The emit CronJob
Each backend ships as a self-contained binary (codeanalyzer for Java, canpy for Python, cants for TypeScript), so an emit job is a container that has the binary, the source checkout, and the Neo4j connection in its environment. Connection settings are read from NEO4J_URI, NEO4J_USERNAME, and NEO4J_PASSWORD, so the command line carries no secrets.
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: emit-billing-core
spec:
  schedule: "0 * * * *"   # hourly; incremental pushes are cheap
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: canpy
              image: ghcr.io/your-org/cldk-emit-python:latest
              args:
                - canpy
                - -i
                - /src/billing-core
                - --emit
                - neo4j
                - --app-name
                - billing-core
              env:
                - name: NEO4J_URI
                  value: bolt://neo4j.cldk.svc.cluster.local:7687
                - name: NEO4J_USERNAME
                  value: writer
                - name: NEO4J_PASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: neo4j-credentials
                      key: writer-password
              volumeMounts:
                - name: src
                  mountPath: /src
          volumes:
            - name: src   # a checkout sidecar, PVC, or initContainer git clone
              emptyDir: {}
```

Swap the image and args per language: codeanalyzer -i /src/payments-service -a 2 --emit neo4j --app-name payments-service for Java, cants -i /src/web-frontend -a 2 --emit neo4j --app-name web-frontend for TypeScript. Point every job at the same NEO4J_URI; the J* / Py* / TS* namespacing keeps them from colliding.
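If you generate the CronJob manifests rather than writing each by hand, the per-language swap described above fits in a small lookup. The following is a minimal sketch, not part of CLDK: `EMIT_BINARIES` and `emit_args` are hypothetical names, and the flags are limited to the ones shown in this section (`-i`, `-a 2`, `--emit neo4j`, `--app-name`).

```python
from typing import Dict, List, Tuple

# Hypothetical lookup: binary name and extra flags per language,
# copied from the example commands in this section.
EMIT_BINARIES: Dict[str, Tuple[str, List[str]]] = {
    "java": ("codeanalyzer", ["-a", "2"]),
    "python": ("canpy", []),
    "typescript": ("cants", ["-a", "2"]),
}


def emit_args(language: str, source_dir: str, app_name: str) -> List[str]:
    """Build the container args for one emit job (illustrative helper)."""
    try:
        binary, extra = EMIT_BINARIES[language]
    except KeyError:
        raise ValueError(f"no emit backend for language: {language!r}") from None
    return [binary, "-i", source_dir, *extra, "--emit", "neo4j", "--app-name", app_name]
```

The returned list maps one-to-one onto the `args:` entries of the container spec, so connection settings still arrive only through the environment.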
The agent Deployment
The agents read with a credential scoped to read-only. Construction is identical to the poll examples; the only Kubernetes-specific detail is that the URI is the in-cluster Neo4j Service and the password comes from a Secret.
```python
import os

from cldk import CLDK
from cldk.analysis.commons.backend_config import Neo4jConnectionConfig


def analysis_for(app_name: str):
    """Build a Python analysis for one application, backed by the shared graph."""
    return CLDK.python(
        backend=Neo4jConnectionConfig(
            uri=os.environ["NEO4J_URI"],
            username=os.environ["NEO4J_USERNAME"],  # reader
            password=os.environ["NEO4J_PASSWORD"],
            application_name=app_name,
        ),
    )
```

Because no source is parsed at query time, an agent pod starts without any analysis warm-up and answers from the graph. Scale the Deployment’s replicas to match query load; Neo4j’s connection pool and your read-replica topology absorb the fan-out.
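Since an agent's connection comes entirely from its environment, a miswired Secret otherwise surfaces only as a bare `KeyError` on the first query. The following is a minimal sketch of a fail-fast check, assuming nothing beyond the three variable names used above; `neo4j_settings` and `REQUIRED_VARS` are hypothetical names, not part of CLDK.

```python
import os
from typing import Dict, Mapping

REQUIRED_VARS = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")


def neo4j_settings(environ: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the connection settings, or raise an error naming every missing one."""
    # Treat unset and empty values alike: both mean the pod is misconfigured.
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("missing Neo4j settings: " + ", ".join(missing))
    return {
        "uri": environ["NEO4J_URI"],
        "username": environ["NEO4J_USERNAME"],
        "password": environ["NEO4J_PASSWORD"],
    }
```

The keys mirror the keyword arguments passed to Neo4jConnectionConfig in the example above, so the result could be unpacked into that call alongside `application_name`. Calling the check once at pod startup makes a bad rollout fail immediately.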
Walkthrough: Odoo, end to end
Odoo is a good stress test for multi-project analysis at scale: it is large and split into hundreds of addon modules, all Python. A real fleet rarely stops at one language, though, so we pair Odoo’s Python addons with a TypeScript service in the same graph and query both through one API.
1. Emit both languages into one graph
Run one Python job over Odoo’s addons and one TypeScript job over the storefront service. They use distinct --app-name values, so both land in the same database as separate application anchors. The namespaced labels keep the Python and TypeScript graphs from colliding, and an agent can query either.
```sh
canpy -i ./odoo/addons --emit neo4j \
  --app-name odoo \
  --neo4j-uri bolt://neo4j:7687 \
  --neo4j-user writer --neo4j-password "$NEO4J_PASSWORD"
# -> writes :PyApplication {name: "odoo"} and its :PyModule / :PySymbol graph
# -> ~hundreds of modules; subsequent runs only rewrite changed files

cants -i ./storefront/src -a 2 --emit neo4j \
  --app-name storefront \
  --neo4j-uri bolt://neo4j:7687 \
  --neo4j-user writer --neo4j-password "$NEO4J_PASSWORD"
# -> writes :TSApplication {name: "storefront"} and its :TSModule / :TSSymbol graph
```

In a cluster these are two CronJobs on the same schedule. After the first full run, each subsequent run is an incremental Bolt push: only modules whose content hash changed are rewritten. The two apps are different languages, so their full-run prunes never touch each other; keep two same-language apps in separate databases (see the prune note).
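The incremental push rests on comparing content hashes between runs. The following is a minimal sketch of that idea only, not the emitters' actual implementation: the choice of SHA-256, the path-to-hash dictionary, and the names `content_hash` and `changed_modules` are all assumptions made for illustration.

```python
import hashlib
from typing import Dict, List


def content_hash(data: bytes) -> str:
    """Hex SHA-256 digest of a module's bytes."""
    return hashlib.sha256(data).hexdigest()


def changed_modules(previous: Dict[str, str], current: Dict[str, bytes]) -> List[str]:
    """Return the module paths whose content hash differs from the stored one.

    `previous` maps path -> hash recorded on the last run;
    `current` maps path -> file contents on this run.
    New modules count as changed; deleted modules are not reported here.
    """
    return sorted(
        path
        for path, data in current.items()
        if previous.get(path) != content_hash(data)
    )
```

Under this scheme an hourly run over an unchanged tree returns an empty list and writes nothing, which is what makes a frequent schedule cheap.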
2. Query across modules and languages
An agent now answers structural questions without touching either source tree. Construct one analysis per app, each anchored to its own application_name:
```python
from cldk import CLDK
from cldk.analysis.commons.backend_config import Neo4jConnectionConfig


def connect(factory, app):
    return factory(
        backend=Neo4jConnectionConfig(
            uri="bolt://neo4j:7687",
            username="reader",
            password="…",
            application_name=app,
        ),
    )


py = connect(CLDK.python, "odoo")            # the Odoo addons graph
ts = connect(CLDK.typescript, "storefront")  # the TypeScript service graph

# "Which callables across all addons reach SaleOrder.action_confirm?"
# Signatures are rooted at the emit -i directory (./odoo/addons), so no "odoo.addons." prefix.
callers = py.get_callers("sale.models.sale_order.SaleOrder", "action_confirm")
# -> callers spread across the sale, stock, and account addons; one query, no per-addon parsing

# The same agent inspects the TypeScript service's call graph through the same API.
storefront_cg = ts.get_call_graph()  # -> networkx.DiGraph
```

The first query touched modules from several addons in a single call, and the agent reached a second language through the same analysis vocabulary. That is the multi-lingual, multi-project payoff. Re-analyzing these projects from scratch on every such question would be untenable; reading a precomputed graph reduces each question to a graph query.
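Once a call graph is in hand, "who can reach this function?" becomes a plain graph traversal. The following is a minimal, dependency-free sketch that walks (caller, callee) pairs backwards from a target. `transitive_callers` is a hypothetical helper, and it assumes edges run from caller to callee, which this page does not confirm for the graph returned by get_call_graph().

```python
from collections import deque
from typing import Dict, Hashable, Iterable, Set, Tuple


def transitive_callers(
    edges: Iterable[Tuple[Hashable, Hashable]], target: Hashable
) -> Set[Hashable]:
    """Every node with a call path to `target`, given (caller, callee) edges."""
    # Index the edges by callee so each step looks up direct callers.
    callers_of: Dict[Hashable, Set[Hashable]] = {}
    for caller, callee in edges:
        callers_of.setdefault(callee, set()).add(caller)

    # Breadth-first walk against the edge direction, starting at the target.
    seen: Set[Hashable] = set()
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for caller in callers_of.get(node, ()):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen
```

A networkx.DiGraph yields its edges as (u, v) pairs from `edges()`, so under the direction assumption above the helper could be fed `storefront_cg.edges()` and a node identifier taken from the graph.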