What is CLDK?

CLDK (CodeLLM-DevKit) is a Python library that loads a codebase and hands you back a typed object model of it (classes, methods, fields, and call graphs) through one consistent analysis object. The interface is the same whether the code is Java or Python: the same methods, the same typed results. You stop parsing files by hand and start asking questions like “what calls this method?” or “is this sink reachable from that entry point?” and getting real answers.

Every CLDK program follows the same three-step shape: pick a language, build an analysis facade, then query typed models.

  1. Construct a CLDK object for your language: CLDK(language="java"). This selects the analysis backend but does no work yet.

  2. Build the analysis facade by pointing it at a project: .analysis(project_path="commons-cli"). This is where the backend engine runs and produces the program model.

  3. Query typed models off that facade: get_classes(), get_method(...), get_call_graph(). Everything you get back is a typed object you can read, walk, and feed to an LLM.

from cldk import CLDK
from cldk.analysis import AnalysisLevel
# 1. pick the language 2. build the facade 3. query
analysis = CLDK(language="java").analysis(
    project_path="commons-cli",
    analysis_level=AnalysisLevel.call_graph,
)
print(len(analysis.get_classes()), "classes")
print(analysis.get_call_graph()) # -> networkx.DiGraph
# 23 classes
# DiGraph with caller -> callee edges

The flow is the same regardless of language; only the engine behind the facade changes:

flowchart LR
    A["CLDK(language)"] --> B["analysis(project_path)"]
    B --> C["Typed models + schema"]
    C --> D["Symbol table"]
    C --> E["Call graph"]
    C --> F["Class structure / CRUD"]
    B -.-> J["Java: CodeAnalyzer / WALA"]
    B -.-> P["Python: Jedi (+ optional CodeQL)"]
    B -.-> TS["TypeScript (coming soon)"]
    B -.-> GO["Go (coming soon)"]
    B -.-> RS["Rust (coming soon)"]
    classDef planned stroke-dasharray:5 4,opacity:0.55;
    class TS,GO,RS planned;

Each language plugs in its own analysis engine (Java uses CodeAnalyzer over WALA, Python uses Jedi with optional CodeQL), but you query them through the same facade, so your code barely changes when you switch languages.
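The "same facade, different engine" idea can be sketched in a few lines of plain Python. This is only an illustration of the pattern, not CLDK's actual internals: every name below (`ClassModel`, `Engine`, `FakeJavaEngine`, `FakePythonEngine`, `Facade`, `count_methods`) is hypothetical, and the engines return hard-coded data instead of analyzing a project.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class ClassModel:
    """Hypothetical typed result that is shared by every engine."""
    name: str
    methods: tuple[str, ...]


class Engine(Protocol):
    """Hypothetical contract each language backend satisfies."""
    def run(self, project_path: str) -> list[ClassModel]: ...


class FakeJavaEngine:
    # Stand-in for a Java backend; returns canned data for illustration.
    def run(self, project_path: str) -> list[ClassModel]:
        return [ClassModel("Options", ("getOption", "addOption"))]


class FakePythonEngine:
    # Stand-in for a Python backend; returns canned data for illustration.
    def run(self, project_path: str) -> list[ClassModel]:
        return [ClassModel("ArgumentParser", ("parse_args",))]


ENGINES = {"java": FakeJavaEngine, "python": FakePythonEngine}


class Facade:
    """Hypothetical facade: picks an engine, exposes one query surface."""

    def __init__(self, language: str, project_path: str) -> None:
        if language not in ENGINES:
            raise ValueError(f"unsupported language: {language}")
        self._classes = ENGINES[language]().run(project_path)

    def get_classes(self) -> list[ClassModel]:
        return list(self._classes)


def count_methods(language: str, project_path: str) -> int:
    # Caller code is identical for every language; only the argument changes.
    facade = Facade(language, project_path)
    return sum(len(cls.methods) for cls in facade.get_classes())


print(count_methods("java", "some-java-project"))
print(count_methods("python", "some-python-project"))
```

Because `count_methods` only touches the facade and the shared `ClassModel` type, switching languages changes one string argument and nothing else, which is the property the paragraph above describes.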

An agent asked “what calls this method?” without CLDK has to crawl: it reads and greps file after file, spending tokens to approximate an answer it still can’t guarantee. CLDK turns that into a single deterministic lookup against the analyzed program. Give the agent a way to run CLDK (like the cocoa plugin) and it stops crawling and starts querying: reachability is a networkx graph query, callers come from get_callers, and every claim is backed by the program model rather than by a guess. That difference (one precise call versus a token-heavy crawl) is why agents reach for CLDK.
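To make "callers" and "reachability" concrete, here is a minimal, dependency-free sketch of both queries over caller -> callee edges. It deliberately uses a plain dictionary rather than CLDK or networkx so it runs anywhere; the graph contents and the helper names `callers_of` and `is_reachable` are invented for illustration. On a real `networkx.DiGraph`, the rough equivalents would be `G.predecessors(node)` and `nx.has_path(G, source, target)`.

```python
from collections import deque

# Hypothetical caller -> callee edges, standing in for a real call graph.
CALL_GRAPH = {
    "Main.main": ["Parser.parse", "Cli.printHelp"],
    "Parser.parse": ["Parser.handleToken"],
    "Parser.handleToken": ["Options.getOption"],
    "Cli.printHelp": ["Options.getOption"],
    "Options.getOption": [],
    "Legacy.unused": ["Options.getOption"],
}


def callers_of(graph, target):
    """Return the direct callers of `target`, sorted by name."""
    return sorted(caller for caller, callees in graph.items() if target in callees)


def is_reachable(graph, source, sink):
    """Breadth-first search along caller -> callee edges from `source`."""
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == sink:
            return True
        for callee in graph.get(node, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return False


print(callers_of(CALL_GRAPH, "Options.getOption"))
print(is_reachable(CALL_GRAPH, "Main.main", "Options.getOption"))
print(is_reachable(CALL_GRAPH, "Main.main", "Legacy.unused"))
```

The point of the sketch is the cost model: once the edges exist, each question is a bounded graph traversal with a definite yes/no answer, rather than an open-ended text search over source files.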

Java is the most complete backend today, with Python close behind. More languages (Go, TypeScript, Rust, and C) are on the way.

| Language | Status | Symbol table | Call graph / callers / callees |
| --- | --- | --- | --- |
| Java | Full | Yes | Yes |
| Python | Strong | Yes | Yes |