What is CLDK?
CLDK (CodeLLM-DevKit) is a Python library that loads a codebase and hands you back a typed object model of it (classes, methods, fields, and call graphs) through one consistent analysis object. The interface is the same whether the code is Java or Python: the same methods, the same typed results. You stop parsing files by hand and start asking questions like “what calls this method?” or “is this sink reachable from that entry point?” and getting real answers.
The mental model
Every CLDK program follows the same three-step shape: pick a language, build an analysis facade, then query typed models.
- Construct a CLDK object for your language: CLDK(language="java"). This selects the analysis backend but does no work yet.
- Build the analysis facade by pointing it at a project: .analysis(project_path="commons-cli"). This is where the backend engine runs and produces the program model.
- Query typed models off that facade: get_classes(), get_method(...), get_call_graph(). Everything you get back is a typed object you can read, walk, and feed to an LLM.
Java:

    from cldk import CLDK
    from cldk.analysis import AnalysisLevel

    # 1. pick the language  2. build the facade  3. query
    analysis = CLDK(language="java").analysis(
        project_path="commons-cli",
        analysis_level=AnalysisLevel.call_graph,
    )

    print(len(analysis.get_classes()), "classes")
    print(analysis.get_call_graph())  # -> networkx.DiGraph
    # 23 classes
    # DiGraph with caller -> callee edges

Python:

    from cldk import CLDK
    from cldk.analysis import AnalysisLevel

    # Python requires a project_path (no single-file mode)
    analysis = CLDK(language="python").analysis(
        project_path="my_pkg",
        analysis_level=AnalysisLevel.call_graph,
    )

    print(len(analysis.get_classes()), "classes")
    print(analysis.get_call_graph())  # -> networkx.DiGraph

The flow is the same regardless of language; only the engine behind the facade changes:
    flowchart LR
      A["CLDK(language)"] --> B["analysis(project_path)"]
      B --> C["Typed models + schema"]
      C --> D["Symbol table"]
      C --> E["Call graph"]
      C --> F["Class structure / CRUD"]
      B -.-> J["Java: CodeAnalyzer / WALA"]
      B -.-> P["Python: Jedi (+ optional CodeQL)"]
      B -.-> TS["TypeScript (coming soon)"]
      B -.-> GO["Go (coming soon)"]
      B -.-> RS["Rust (coming soon)"]
      classDef planned stroke-dasharray:5 4,opacity:0.55;
      class TS,GO,RS planned;
Each language plugs in its own analysis engine (Java uses CodeAnalyzer over WALA, Python uses Jedi with optional CodeQL), but you query them through the same facade, so your code barely changes when you switch languages.
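To make the facade idea concrete, here is a minimal sketch of the pattern, not CLDK's actual implementation. The Backend protocol, the two toy backends, and the Facade class are all hypothetical names invented for illustration, and the backends return canned data instead of analyzing code. Only the shape matters: the language picks the engine once, and every later query goes through the same method.

```python
from typing import Protocol


class Backend(Protocol):
    """What each language engine must provide (hypothetical interface)."""

    def class_names(self, project_path: str) -> list[str]: ...


class ToyJavaBackend:
    # Stand-in for a Java engine; returns canned names, analyzes nothing.
    def class_names(self, project_path: str) -> list[str]:
        return [f"{project_path}.Option", f"{project_path}.Parser"]


class ToyPythonBackend:
    # Stand-in for a Python engine; returns canned names, analyzes nothing.
    def class_names(self, project_path: str) -> list[str]:
        return [f"{project_path}.cli.Command"]


class Facade:
    """Single query surface; the backend is chosen once, by language."""

    _BACKENDS = {"java": ToyJavaBackend, "python": ToyPythonBackend}

    def __init__(self, language: str, project_path: str) -> None:
        if language not in self._BACKENDS:
            raise ValueError(f"unsupported language: {language}")
        self._backend: Backend = self._BACKENDS[language]()
        self._project_path = project_path

    def get_classes(self) -> list[str]:
        # Callers never see which engine answered.
        return self._backend.class_names(self._project_path)


for language in ("java", "python"):
    print(language, Facade(language, "demo").get_classes())
```

The calling loop is identical for both languages; only the string passed to the constructor differs.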
Why it matters for agents
An agent asked “what calls this method?” without CLDK has to crawl: it reads and greps file after file, spending tokens to approximate an answer it still can’t guarantee. CLDK turns that into a single deterministic lookup against the real program. Give the agent a way to run CLDK (like the cocoa plugin) and it stops crawling and starts querying: reachability is a networkx graph query, callers come from get_callers, and every claim is grounded in the analyzed program rather than in guesswork. That difference (one precise call versus a token-heavy crawl) is why agents reach for CLDK.
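The following sketch illustrates the two lookups mentioned above, callers and reachability, on a toy call graph. It uses a plain dictionary as a stand-in for the networkx.DiGraph that get_call_graph() returns, so it runs without CLDK installed. The method names and edges are invented for illustration. On a real DiGraph, networkx.has_path and DiGraph.predecessors would do the same jobs.

```python
from collections import deque

# Hypothetical caller -> callee edges; a real project would take these
# from analysis.get_call_graph() instead of a hand-written dict.
CALL_GRAPH = {
    "Main.main": ["Parser.parse", "Report.print"],
    "Parser.parse": ["Parser.readToken", "Db.query"],
    "Parser.readToken": [],
    "Report.print": [],
    "Db.query": [],
    "Legacy.unused": ["Db.query"],
}


def callers_of(graph, method):
    """Return, sorted, the methods with a direct edge into `method`."""
    return sorted(caller for caller, callees in graph.items() if method in callees)


def is_reachable(graph, entry, sink):
    """Breadth-first search: can a call chain from `entry` reach `sink`?"""
    seen = {entry}
    queue = deque([entry])
    while queue:
        current = queue.popleft()
        if current == sink:
            return True
        for callee in graph.get(current, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return False


print(callers_of(CALL_GRAPH, "Db.query"))                   # ['Legacy.unused', 'Parser.parse']
print(is_reachable(CALL_GRAPH, "Main.main", "Db.query"))    # True
print(is_reachable(CALL_GRAPH, "Report.print", "Db.query")) # False
```

Each answer comes from one pass over the graph, with no file reads or text search involved.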
Language coverage
You query every language through the same analysis facade; only the engine behind it changes. Java is the most complete backend today, with Python close behind. More languages (Go, TypeScript, Rust, and C) are on the way.
| Language | Status | Symbol table | Call graph / callers / callees |
|---|---|---|---|
| Java | Full | Yes | Yes |
| Python | Strong | Yes | Yes |
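Since both rows support the same queries, a helper can be written once and handed either analysis object. The sketch below assumes only what the earlier examples show: get_classes() returns something with a length, and get_call_graph() returns a graph with number_of_edges(), as networkx.DiGraph has. StubGraph and StubAnalysis are hypothetical stand-ins so the example runs without CLDK. In practice you would pass the object returned by .analysis(...).

```python
def summarize(analysis):
    """Count classes and call-graph edges for any facade-like object.

    Relies only on get_classes() (sized result) and get_call_graph()
    (an object with number_of_edges(), as networkx.DiGraph has).
    """
    graph = analysis.get_call_graph()
    return {
        "classes": len(analysis.get_classes()),
        "call_edges": graph.number_of_edges(),
    }


class StubGraph:
    """Hypothetical stand-in for networkx.DiGraph (one method only)."""

    def __init__(self, edges):
        self._edges = list(edges)

    def number_of_edges(self):
        return len(self._edges)


class StubAnalysis:
    """Hypothetical stand-in for the object returned by .analysis(...)."""

    def __init__(self, classes, edges):
        self._classes = classes
        self._graph = StubGraph(edges)

    def get_classes(self):
        return self._classes

    def get_call_graph(self):
        return self._graph


java_like = StubAnalysis(["Option", "Parser"], [("Parser.parse", "Option.get")])
python_like = StubAnalysis(["Command"], [])

print(summarize(java_like))    # {'classes': 2, 'call_edges': 1}
print(summarize(python_like))  # {'classes': 1, 'call_edges': 0}
```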