Core

The CLDK class is the top-level entry point. Construct it with a language, then ask it for an analysis object over your project. You never instantiate JavaAnalysis or PythonAnalysis directly; CLDK hands you the correct one.

Overview

Two steps, always the same shape:

from cldk import CLDK

analysis = CLDK(language="java").analysis(project_path="commons-cli")
# -> JavaAnalysis, ready to query

CLDK(language=...) accepts "java" and "python" today, plus "c" for basic libclang-based analysis (Go, TypeScript, and Rust are on the way). The object it returns exposes the primary method used in most workflows:

.analysis(...): returns the language-specific analysis object (JavaAnalysis or PythonAnalysis) backed by the appropriate static analysis engine. This is where the symbol table and call graph are produced.

flowchart LR
    C["CLDK(language)"] --> A[".analysis(project_path)"]
    A --> J[JavaAnalysis]
    A --> P[PythonAnalysis]
    J --> M[Typed models]
    P --> M

Analysis levels. The depth of .analysis() is governed by analysis_level. The default, AnalysisLevel.symbol_table, populates classes, methods, and fields. Call-graph computation incurs additional cost: get_call_graph, get_callers, and get_callees require AnalysisLevel.call_graph. Set it up front when call relationships are needed.
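One way to pick the level up front is to derive it from the queries you plan to run. The sketch below is not part of CLDK: `Level` is a local stand-in for `AnalysisLevel`, and `required_level` is a hypothetical helper. Only the three call-graph method names come from the paragraph above.

```python
from enum import Enum


class Level(Enum):
    """Local stand-in for cldk's AnalysisLevel (assumption: just the two levels named above)."""
    SYMBOL_TABLE = 1
    CALL_GRAPH = 2


# Methods that, per the text above, require call-graph analysis.
CALL_GRAPH_METHODS = frozenset({"get_call_graph", "get_callers", "get_callees"})


def required_level(planned_methods):
    """Return the shallowest level that covers every planned method call."""
    if CALL_GRAPH_METHODS & set(planned_methods):
        return Level.CALL_GRAPH
    return Level.SYMBOL_TABLE


print(required_level(["get_classes"]).name)                  # SYMBOL_TABLE
print(required_level(["get_classes", "get_callers"]).name)   # CALL_GRAPH
```

In real code you would map the result onto `AnalysisLevel.symbol_table` or `AnalysisLevel.call_graph` before calling `.analysis()`.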

Key `.analysis()` arguments

Argument	Applies to	What it does
`project_path`	all	Path to the project directory to analyze.
`analysis_level`	all	`AnalysisLevel.symbol_table` (default) or `AnalysisLevel.call_graph`. The latter is required for call graphs, callers, and callees.
`analysis_backend_path`	Java only	Path to the directory containing a `codeanalyzer-*.jar`. Omit to auto-download.
`cache_dir`	Python only	Directory for the codeanalyzer-python cache (virtualenv, CodeQL DB, analysis cache). Defaults to `<project_path>/.codeanalyzer`.
`use_codeql`	Python only	When `True` (default), augments Jedi call-graph resolution with CodeQL for more complete edges. Set `False` for faster, Jedi-only analysis.

See the generated reference below for the full signature, including eager, target_files, analysis_json_path, and use_ray.
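When the same script drives both Java and Python projects, it can help to catch a misplaced language-specific argument before calling `.analysis()`. The following is a minimal sketch under stated assumptions: `analysis_kwargs` and `LANGUAGE_ONLY` are hypothetical names, not CLDK APIs, and the mapping simply restates the "Applies to" column of the table above.

```python
# Which optional arguments belong to which language, per the table above.
LANGUAGE_ONLY = {
    "analysis_backend_path": "java",
    "cache_dir": "python",
    "use_codeql": "python",
}


def analysis_kwargs(language, project_path, **options):
    """Build keyword arguments for .analysis(), rejecting options meant for another language."""
    kwargs = {"project_path": project_path}
    for name, value in options.items():
        owner = LANGUAGE_ONLY.get(name)
        if owner is not None and owner != language:
            raise ValueError(f"{name!r} applies to {owner} only, not {language}")
        kwargs[name] = value
    return kwargs


print(analysis_kwargs("python", "my_pkg", use_codeql=False))
# {'project_path': 'my_pkg', 'use_codeql': False}
```

The result can be splatted into the real call, for example `CLDK(language="python").analysis(**analysis_kwargs("python", "my_pkg", use_codeql=False))`. Raising on every mismatch is a deliberate choice for this sketch and is stricter than the reference describes, which says some Python-only arguments are ignored for other languages.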

Worked example

The recurring sample project is Apache Commons CLI, unpacked at commons-cli.

Construct a Java analysis

from cldk import CLDK
from cldk.analysis import AnalysisLevel

analysis = CLDK(language="java").analysis(
    project_path="commons-cli",
    analysis_level=AnalysisLevel.call_graph,   # needed for call-graph methods
)

print(type(analysis).__name__)        # JavaAnalysis
print(len(analysis.get_classes()))    # 23

The first run may download the CodeAnalyzer backend JAR; later runs reuse the cache. From here, every method lives on analysis; see the Java API reference for the full surface.

Construct a Python analysis

from cldk import CLDK

analysis = CLDK(language="python").analysis(project_path="my_pkg")

print(type(analysis).__name__)        # PythonAnalysis
classes = analysis.get_classes()      # Dict[str, PyClass]

Same two-step shape, same method names; only the language changed. Methods are documented in the Python API reference.

For an introduction, see What is CLDK? and the Quickstart. For task-oriented snippets, see Common tasks and the cookbook; the concepts page explains analysis levels and call graphs in detail, and the cheat sheet provides a one-page summary.

API reference

The full generated reference follows.

Core CLDK module.

This module provides the top-level entry point for the Code Language Development Kit (CLDK), a unified framework for performing static analysis across multiple programming languages. The primary interface is the CLDK class, which serves as a factory for creating language-specific analysis objects, tree-sitter parsers, and sanitization utilities.

CLDK supports the following languages:

Java: Full static analysis via CodeAnalyzer backend, including symbol tables, call graphs, and code metrics.

Python: Static analysis via codeanalyzer-python backend with optional CodeQL-augmented call graph resolution.

C: Basic analysis via libclang for parsing and extracting code structure.

Typical usage involves instantiating CLDK with a target language, then calling analysis to obtain a language-specific analysis facade.

Note: This module requires language-specific backends to be available:

Java: codeanalyzer-*.jar (auto-downloaded or specified via path)

Python: codeanalyzer-python (auto-installed in virtualenv)

C: libclang (must be installed on the system)

`CLDK`

class CLDK

Core class for the Code Language Development Kit (CLDK).

The CLDK class serves as the primary entry point and factory for all code analysis operations. It provides a unified interface for initializing language-specific analysis facades, tree-sitter parsers, and code sanitization utilities.

This class follows the factory pattern, where the language parameter determines which concrete analysis implementation is returned by the analysis, treesitter_parser, and tree_sitter_utils methods.

Parameters:

Name	Type	Description
`language`	`str`	The target programming language for analysis. Supported values are `"java"`, `"python"`, and `"c"` (case-sensitive).

Raises:

NotImplementedError: Raised by factory methods when the specified language is not yet supported.
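The factory behaviour described above can be illustrated with a toy class. This is a sketch of the pattern only, not CLDK's source: `Toolkit`, `JavaFacade`, and `PythonFacade` are hypothetical stand-ins. Note that an unsupported language fails when a factory method is called, not at construction, matching the Raises entry above.

```python
class JavaFacade:
    """Stand-in for a Java analysis facade (not cldk's JavaAnalysis)."""

    def __init__(self, project_path):
        self.project_path = project_path


class PythonFacade:
    """Stand-in for a Python analysis facade (not cldk's PythonAnalysis)."""

    def __init__(self, project_path):
        self.project_path = project_path


class Toolkit:
    """Toy factory: the language chosen at construction selects the facade."""

    _FACADES = {"java": JavaFacade, "python": PythonFacade}

    def __init__(self, language):
        # Construction only records the language; nothing is validated yet.
        self.language = language

    def analysis(self, project_path):
        try:
            facade_cls = self._FACADES[self.language]
        except KeyError:
            raise NotImplementedError(
                f"Analysis for {self.language!r} is not supported"
            ) from None
        return facade_cls(project_path)


print(type(Toolkit("java").analysis("commons-cli")).__name__)  # JavaFacade
```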

See Also

JavaAnalysis: Java-specific analysis facade.

PythonAnalysis: Python-specific analysis facade.

CAnalysis: C-specific analysis facade.

Attributes

Name	Type	Description
`language`	`str`	The target programming language passed to the constructor.

Methods

`CLDK.analysis`

analysis(project_path: str | Path | None = None, source_code: str | None = None, eager: bool = False, analysis_level: str = AnalysisLevel.symbol_table, target_files: List[str] | None = None, analysis_backend_path: str | None = None, analysis_json_path: str | Path = None, cache_dir: str | Path | None = None, use_codeql: bool = True, use_ray: bool = False) -> JavaAnalysis | PythonAnalysis | CAnalysis

Initialize and return a language-specific analysis facade.

This factory method creates an appropriate analysis object based on the language specified during CLDK initialization. The analysis facade provides methods for extracting code structure, call graphs, symbol tables, and other static analysis artifacts.

The method supports two modes of operation:

Project mode: Analyze an entire project directory by providing project_path. This is the recommended mode for comprehensive analysis.
Source code mode (Java only): Analyze a single source code string by providing source_code. Useful for quick analysis of code snippets.

Parameters:

Name	Type	Description
`project_path`	`str \| Path \| None`	Absolute or relative path to the project directory to analyze. The directory should contain source files in the target language. Mutually exclusive with `source_code`.
`source_code`	`str \| None`	Raw source code string to analyze (Java only). Useful for analyzing code snippets without a project structure. Mutually exclusive with `project_path`. Not supported for Python or C languages.
`eager`	`bool`	If `True`, forces regeneration of all analysis caches and databases, ignoring any previously cached results. Defaults to `False` for incremental analysis performance.
`analysis_level`	`str`	The depth of analysis to perform. Controls which analysis artifacts are generated. See `AnalysisLevel` for available options. Defaults to `AnalysisLevel.symbol_table`.
`target_files`	`List[str] \| None`	Optional list of specific file paths (relative to `project_path`) to analyze. When provided, only these files are included in the analysis, improving performance for large projects. Defaults to `None` (analyze all files).
`analysis_backend_path`	`str \| None`	Java only. Path to the directory containing the `codeanalyzer-*.jar` backend executable. If not provided, the JAR is automatically downloaded. Not valid for Python analysis; use `cache_dir` instead.
`analysis_json_path`	`str \| Path`	Path where the analysis database (typically `analysis.json`) should be persisted. Useful for caching analysis results between sessions. If not provided, a default location within the project is used.
`cache_dir`	`str \| Path \| None`	Python only. Directory path for the codeanalyzer-python backend’s cache, including its virtualenv, CodeQL database, and `analysis_cache.json`. When omitted, defaults to `<project_path>/.codeanalyzer`. Ignored for Java and C.
`use_codeql`	`bool`	Python only. If `True` (default), augments Jedi-based call graph resolution with CodeQL analysis for more complete call edges. Set to `False` for faster analysis using only Jedi. Ignored for Java and C.
`use_ray`	`bool`	Python only. If `True`, enables Ray-based parallel processing for analysis. Recommended for very large projects where sequential Jedi/CodeQL analysis would be slow. Requires Ray to be installed. Defaults to `False`. Ignored for Java and C.
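Because `target_files` expects paths relative to `project_path`, a small normalisation step can be useful when file names arrive from elsewhere (a diff, an editor, a CI job). A minimal sketch, assuming forward-slash relative strings are acceptable to the backend; `to_target_files` is a hypothetical helper, and the file names are illustrative.

```python
from pathlib import Path


def to_target_files(project_path, files):
    """Return paths relative to project_path, skipping files that live outside it."""
    root = Path(project_path).resolve()
    targets = []
    for f in files:
        path = Path(f)
        if not path.is_absolute():
            # Treat relative inputs as relative to the project root.
            path = root / path
        try:
            targets.append(path.resolve().relative_to(root).as_posix())
        except ValueError:
            # The file is not under the project root; leave it out.
            continue
    return targets


print(to_target_files(
    "commons-cli",
    ["src/main/java/org/apache/commons/cli/Option.java", "../other/Foo.java"],
))
# ['src/main/java/org/apache/commons/cli/Option.java']
```

The returned list could then be passed as `target_files=...` alongside the same `project_path`.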

Returns:

JavaAnalysis | PythonAnalysis | CAnalysis: A language-specific analysis facade instance:
- JavaAnalysis for Java projects
- PythonAnalysis for Python projects
- CAnalysis for C projects

Raises:

CldkInitializationException: Raised in the following cases:
- Neither project_path nor source_code is provided.
- Both project_path and source_code are provided.
- source_code is provided for Python analysis (not supported).
- analysis_backend_path is provided for Python analysis (use cache_dir instead).
NotImplementedError: If the language specified during CLDK initialization is not supported.
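The input rules listed under Raises can be written out as a small pre-flight check. This is a sketch that mirrors the documented cases, not CLDK's actual validation code: `InitError` is a local stand-in for `CldkInitializationException` so the snippet runs without cldk installed, and `check_analysis_inputs` is a hypothetical name.

```python
class InitError(Exception):
    """Local stand-in for CldkInitializationException."""


def check_analysis_inputs(language, project_path=None, source_code=None,
                          analysis_backend_path=None):
    """Validate the documented input rules and return the mode: 'project' or 'source'."""
    if project_path is None and source_code is None:
        raise InitError("Provide either project_path or source_code.")
    if project_path is not None and source_code is not None:
        raise InitError("project_path and source_code are mutually exclusive.")
    if language == "python" and source_code is not None:
        raise InitError("source_code is not supported for Python analysis.")
    if language == "python" and analysis_backend_path is not None:
        raise InitError("analysis_backend_path is Java only; use cache_dir for Python.")
    return "project" if project_path is not None else "source"


print(check_analysis_inputs("java", project_path="commons-cli"))  # project
print(check_analysis_inputs("java", source_code="class A {}"))    # source
```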

Note: The analysis process may download or build backend tools on first run, which can take additional time. Subsequent runs use cached backends for faster startup.

See Also

AnalysisLevel: Available analysis depth options.

JavaAnalysis: Java analysis methods.

PythonAnalysis: Python analysis methods.

`CLDK.treesitter_parser`

treesitter_parser() -> TreesitterJava

Return a Tree-sitter parser for the selected language.

Creates and returns a language-specific Tree-sitter parser instance that can be used for syntactic analysis, AST traversal, and code querying operations. Tree-sitter provides incremental parsing with excellent performance characteristics for real-time code analysis.

The returned parser provides methods for:

Parsing source code into an AST

Running Tree-sitter queries to extract code patterns

Extracting syntactic elements (methods, classes, imports, etc.)

Performing lexical analysis

Returns:

TreesitterJava: A Tree-sitter parser wrapper for Java source code. The parser provides methods such as is_parsable, get_raw_ast, get_all_imports, and various code extraction utilities.

Raises:

NotImplementedError: If the language specified during CLDK initialization does not have a Tree-sitter parser implementation. Currently, only Java is supported.

Note: The Tree-sitter parser operates at the syntactic level only and does not perform semantic analysis. For semantic information like resolved types or call graphs, use analysis instead.

See Also

TreesitterJava: Java Tree-sitter parser implementation.

`CLDK.tree_sitter_utils`

tree_sitter_utils(source_code: str) -> TreesitterSanitizer

Return Tree-sitter-based code sanitization utilities for the selected language.

Creates and returns a utility class that provides code transformation and sanitization operations using Tree-sitter for parsing. These utilities are particularly useful for preparing code for LLM consumption, test generation, and code analysis tasks.

The sanitization utilities provide operations such as:

Removing unused imports from source code

Keeping only focal methods and their callees for context reduction

Extracting and manipulating test assertions

Identifying and removing dead code

Parameters:

Name	Type	Description
`source_code`	`str`	The source code string to initialize the utilities with. This code will be parsed and made available for transformation operations. Must be valid syntax for the target language.

Returns:

TreesitterSanitizer: A utility wrapper that provides sanitization and transformation methods for Java source code, including:
- keep_only_focal_method_and_its_callees
- remove_unused_imports

Raises:

NotImplementedError: If the language specified during CLDK initialization does not have sanitization utilities implemented. Currently, only Java is supported.

Note: The sanitization utilities modify code at the syntactic level using Tree-sitter patterns. For complex refactoring that requires semantic understanding, consider using the full analysis capabilities via analysis.

See Also

TreesitterSanitizer: Java sanitization utility implementation.

treesitter_parser: For raw Tree-sitter parsing without sanitization utilities.
