Core
The CLDK class is the top-level entry point. Construct it with a language, then
ask it for an analysis object over your project. You never instantiate
JavaAnalysis or PythonAnalysis directly; CLDK hands you the correct one.
Overview
Two steps, always the same shape:

    from cldk import CLDK
    from cldk.analysis import AnalysisLevel

    analysis = CLDK(language="java").analysis(project_path="commons-cli")
    # -> JavaAnalysis, ready to query

CLDK(language=...) accepts "java" and "python" today (Go, TypeScript, Rust,
and C are on the way). The object it returns exposes the primary method used in
most workflows:

- .analysis(...): returns the language-specific analysis object (JavaAnalysis or PythonAnalysis) backed by the appropriate static analysis engine. This is where the symbol table and call graph are produced.
flowchart LR
C["CLDK(language)"] --> A[".analysis(project_path)"]
A --> J[JavaAnalysis]
A --> P[PythonAnalysis]
J --> M[Typed models]
P --> M
Analysis levels. The depth of .analysis() is governed by analysis_level.
The default, AnalysisLevel.symbol_table, populates classes, methods, and
fields. Call-graph computation incurs additional cost: get_call_graph,
get_callers, and get_callees require AnalysisLevel.call_graph. Set it up
front when call relationships are needed.
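One way to avoid paying for a call graph you do not need is to derive the level from the methods you plan to call. The sketch below is illustrative only: `Level` is a local stand-in for `AnalysisLevel` (assumed here to have just the two levels named above), and `required_level` is a hypothetical helper, not part of CLDK. The method names are the ones listed in the paragraph above.

```python
from enum import Enum
from typing import Iterable


class Level(Enum):
    """Local stand-in for cldk's AnalysisLevel (assumption: two levels only)."""
    symbol_table = 1
    call_graph = 2


# Methods this page says require AnalysisLevel.call_graph.
CALL_GRAPH_METHODS = frozenset({"get_call_graph", "get_callers", "get_callees"})


def required_level(planned_methods: Iterable[str]) -> Level:
    """Return the cheapest level that supports every planned method (hypothetical helper)."""
    if CALL_GRAPH_METHODS.intersection(planned_methods):
        return Level.call_graph
    return Level.symbol_table


print(required_level(["get_classes"]))                 # Level.symbol_table
print(required_level(["get_classes", "get_callers"]))  # Level.call_graph
```

In real code the result would map onto `AnalysisLevel.symbol_table` or `AnalysisLevel.call_graph` before being passed as `analysis_level`.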
Key .analysis() arguments
| Argument | Applies to | What it does |
|---|---|---|
| project_path | all | Path to the project directory to analyze. |
| analysis_level | all | AnalysisLevel.symbol_table (default) or AnalysisLevel.call_graph. The latter is required for call graphs, callers, and callees. |
| analysis_backend_path | Java only | Path to a codeanalyzer-*.jar. Omit to auto-download. |
| cache_dir | Python only | Directory for the codeanalyzer-python cache (virtualenv, CodeQL DB, analysis cache). Defaults to <project_path>/.codeanalyzer. |
| use_codeql | Python only | When True (default), augments Jedi call-graph resolution with CodeQL for more complete edges. Set False for faster, Jedi-only analysis. |
See the generated reference below for the full signature,
including eager, target_files, analysis_json_path, and use_ray.
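When one script drives both languages, it can help to filter out arguments that do not apply before calling `.analysis()`. The following is a minimal sketch under that assumption: `analysis_kwargs` is a hypothetical helper (not a CLDK API), and the Java-only and Python-only sets are copied from the "Applies to" column of the table above.

```python
from typing import Any, Dict

# Which arguments apply to which language, taken from the table above.
JAVA_ONLY = {"analysis_backend_path"}
PYTHON_ONLY = {"cache_dir", "use_codeql"}


def analysis_kwargs(language: str, **options: Any) -> Dict[str, Any]:
    """Drop options that do not apply to `language` (hypothetical helper)."""
    if language == "java":
        skip = PYTHON_ONLY
    elif language == "python":
        skip = JAVA_ONLY
    else:
        skip = JAVA_ONLY | PYTHON_ONLY
    return {name: value for name, value in options.items() if name not in skip}


shared = dict(project_path="commons-cli", cache_dir=".cache", use_codeql=False)
print(analysis_kwargs("java", **shared))
# {'project_path': 'commons-cli'}
print(analysis_kwargs("python", **shared))
# {'project_path': 'commons-cli', 'cache_dir': '.cache', 'use_codeql': False}
```

The filtered dictionary would then be passed along as `CLDK(language=lang).analysis(**kwargs)`.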
Worked example
The recurring sample project is Apache Commons CLI, unpacked at commons-cli.
Construct a Java analysis

    from cldk import CLDK
    from cldk.analysis import AnalysisLevel

    analysis = CLDK(language="java").analysis(
        project_path="commons-cli",
        analysis_level=AnalysisLevel.call_graph,  # needed for call-graph methods
    )

    print(type(analysis).__name__)      # JavaAnalysis
    print(len(analysis.get_classes()))  # 23

The first run may download the CodeAnalyzer backend JAR; later runs reuse the
cache. From here, every method lives on analysis; see the
Java API reference for the full surface.
Construct a Python analysis

    from cldk import CLDK

    analysis = CLDK(language="python").analysis(project_path="my_pkg")

    print(type(analysis).__name__)    # PythonAnalysis
    classes = analysis.get_classes()  # Dict[str, PyClass]

Same two-step shape, same method names; only the language argument changes. Methods
are documented on the Python API reference.
For an introduction, see What is CLDK? and the Quickstart. For task-oriented snippets, see Common tasks and the cocoa; the concepts page explains analysis levels and call graphs in detail, and the cheat sheet provides a one-page summary.
API reference
The full generated reference follows.
Core CLDK module.
This module provides the top-level entry point for the Code Language Development
Kit (CLDK), a unified framework for performing static analysis across multiple
programming languages. The primary interface is the CLDK class, which
serves as a factory for creating language-specific analysis objects, tree-sitter
parsers, and sanitization utilities.
CLDK supports the following languages:
- Java: Full static analysis via CodeAnalyzer backend, including symbol tables, call graphs, and code metrics.
- Python: Static analysis via codeanalyzer-python backend with optional CodeQL-augmented call graph resolution.
- C: Basic analysis via libclang for parsing and extracting code structure.
Typical usage involves instantiating CLDK with a target language, then
calling analysis to obtain a language-specific analysis facade.
Note: This module requires language-specific backends to be available:
- Java: codeanalyzer-*.jar (auto-downloaded or specified via path)
- Python: codeanalyzer-python (auto-installed in virtualenv)
- C: libclang (must be installed on the system)
class CLDK
Core class for the Code Language Development Kit (CLDK).
The CLDK class serves as the primary entry point and factory for all code analysis operations. It provides a unified interface for initializing language-specific analysis facades, tree-sitter parsers, and code sanitization utilities.
This class follows the factory pattern, where the language parameter
determines which concrete analysis implementation is returned by the
analysis, treesitter_parser, and tree_sitter_utils
methods.
Parameters:
| Name | Type | Description |
|---|---|---|
| language | str | The target programming language for analysis. Supported values are "java", "python", and "c" (case-sensitive). |
Raises:
NotImplementedError: Raised by factory methods when the specified language is not yet supported.
See Also
- JavaAnalysis: Java-specific analysis facade.
- PythonAnalysis: Python-specific analysis facade.
- CAnalysis: C-specific analysis facade.
Attributes
| Name | Type | Description |
|---|---|---|
| language | str | |
Methods
CLDK.analysis

    analysis(
        project_path: str | Path | None = None,
        source_code: str | None = None,
        eager: bool = False,
        analysis_level: str = AnalysisLevel.symbol_table,
        target_files: List[str] | None = None,
        analysis_backend_path: str | None = None,
        analysis_json_path: str | Path = None,
        cache_dir: str | Path | None = None,
        use_codeql: bool = True,
        use_ray: bool = False,
    ) -> JavaAnalysis | PythonAnalysis | CAnalysis

Initialize and return a language-specific analysis facade.
This factory method creates an appropriate analysis object based on the language specified during CLDK initialization. The analysis facade provides methods for extracting code structure, call graphs, symbol tables, and other static analysis artifacts.
The method supports two modes of operation:
- Project mode: Analyze an entire project directory by providing project_path. This is the recommended mode for comprehensive analysis.
- Source code mode (Java only): Analyze a single source code string by providing source_code. Useful for quick analysis of code snippets.
Parameters:
| Name | Type | Description |
|---|---|---|
| project_path | str \| Path \| None | Absolute or relative path to the project directory to analyze. The directory should contain source files in the target language. Mutually exclusive with source_code. |
| source_code | str \| None | Raw source code string to analyze (Java only). Useful for analyzing code snippets without a project structure. Mutually exclusive with project_path. Not supported for Python or C languages. |
| eager | bool | If True, forces regeneration of all analysis caches and databases, ignoring any previously cached results. Defaults to False for incremental analysis performance. |
| analysis_level | str | The depth of analysis to perform. Controls which analysis artifacts are generated. See AnalysisLevel for available options. Defaults to AnalysisLevel.symbol_table. |
| target_files | List[str] \| None | Optional list of specific file paths (relative to project_path) to analyze. When provided, only these files are included in the analysis, improving performance for large projects. Defaults to None (analyze all files). |
| analysis_backend_path | str \| None | Java only. Path to the directory containing the codeanalyzer-*.jar backend executable. If not provided, the JAR is automatically downloaded. Not valid for Python analysis; use cache_dir instead. |
| analysis_json_path | str \| Path | Path where the analysis database (typically analysis.json) should be persisted. Useful for caching analysis results between sessions. If not provided, a default location within the project is used. |
| cache_dir | str \| Path \| None | Python only. Directory path for the codeanalyzer-python backend’s cache, including its virtualenv, CodeQL database, and analysis_cache.json. When omitted, defaults to <project_path>/.codeanalyzer. Ignored for Java and C. |
| use_codeql | bool | Python only. If True (default), augments Jedi-based call graph resolution with CodeQL analysis for more complete call edges. Set to False for faster analysis using only Jedi. Ignored for Java and C. |
| use_ray | bool | Python only. If True, enables Ray-based parallel processing for analysis. Recommended for very large projects where sequential Jedi/CodeQL analysis would be slow. Requires Ray to be installed. Defaults to False. Ignored for Java and C. |
Returns:
JavaAnalysis | PythonAnalysis | CAnalysis: A language-specific analysis facade instance:
- JavaAnalysis for Java projects
- PythonAnalysis for Python projects
- CAnalysis for C projects

Raises:
CldkInitializationException: Raised in the following cases:
- Neither project_path nor source_code is provided.
- Both project_path and source_code are provided.
- source_code is provided for Python analysis (not supported).
- analysis_backend_path is provided for Python analysis (use cache_dir instead).

NotImplementedError: If the language specified during CLDK initialization is not supported.

Note: The analysis process may download or build backend tools on first run, which can take additional time. Subsequent runs use cached backends for faster startup.

See Also
- AnalysisLevel: Available analysis depth options.
- JavaAnalysis: Java analysis methods.
- PythonAnalysis: Python analysis methods.
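The argument rules for `CLDK.analysis` can be restated as a small pre-flight check. This is a sketch of the documented conditions only, not CLDK's actual implementation: `InitError` is a local stand-in for `CldkInitializationException`, and `check_analysis_args` is a hypothetical helper.

```python
from typing import Optional


class InitError(Exception):
    """Local stand-in for CldkInitializationException (illustration only)."""


def check_analysis_args(
    language: str,
    project_path: Optional[str] = None,
    source_code: Optional[str] = None,
    analysis_backend_path: Optional[str] = None,
) -> None:
    """Raise InitError for the argument combinations the reference lists as invalid."""
    if project_path is None and source_code is None:
        raise InitError("provide project_path or source_code")
    if project_path is not None and source_code is not None:
        raise InitError("project_path and source_code are mutually exclusive")
    if language == "python" and source_code is not None:
        raise InitError("source_code is not supported for Python")
    if language == "python" and analysis_backend_path is not None:
        raise InitError("analysis_backend_path is Java only; use cache_dir")


check_analysis_args("java", source_code="class A {}")  # accepted: Java source code mode
try:
    check_analysis_args("python", source_code="x = 1")
except InitError as err:
    print(err)  # source_code is not supported for Python
```

Running such a check before constructing the analysis lets a caller report a bad configuration without waiting for a backend to start.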
CLDK.treesitter_parser

    treesitter_parser() -> TreesitterJava

Return a Tree-sitter parser for the selected language.
Creates and returns a language-specific Tree-sitter parser instance that can be used for syntactic analysis, AST traversal, and code querying operations. Tree-sitter provides incremental parsing with excellent performance characteristics for real-time code analysis.
The returned parser provides methods for:
- Parsing source code into an AST
- Running Tree-sitter queries to extract code patterns
- Extracting syntactic elements (methods, classes, imports, etc.)
- Performing lexical analysis
Returns:
TreesitterJava: A Tree-sitter parser wrapper for Java source code. The parser provides methods such as is_parsable, get_raw_ast, get_all_imports, and various code extraction utilities.
Raises:
NotImplementedError: If the language specified during CLDK initialization does not have a Tree-sitter parser implementation. Currently, only Java is supported.
Note: The Tree-sitter parser operates at the syntactic level only and does not perform semantic analysis. For semantic information like resolved types or call graphs, use analysis instead.
See Also
TreesitterJava: Java Tree-sitter parser implementation.
CLDK.tree_sitter_utils

    tree_sitter_utils(source_code: str) -> TreesitterSanitizer

Return Tree-sitter-based code sanitization utilities for the selected language.
Creates and returns a utility class that provides code transformation and sanitization operations using Tree-sitter for parsing. These utilities are particularly useful for preparing code for LLM consumption, test generation, and code analysis tasks.
The sanitization utilities provide operations such as:
- Removing unused imports from source code
- Keeping only focal methods and their callees for context reduction
- Extracting and manipulating test assertions
- Identifying and removing dead code
Parameters:
| Name | Type | Description |
|---|---|---|
source_code | str | The source code string to initialize the utilities with. This code will be parsed and made available for transformation operations. Must be valid syntax for the target language. |
Returns:
TreesitterSanitizer: A utility wrapper that provides sanitization and transformation methods for Java source code, including:
- keep_only_focal_method_and_its_callees
- remove_unused_imports
Raises:
NotImplementedError: If the language specified during CLDK initialization does not have sanitization utilities implemented. Currently, only Java is supported.
Note: The sanitization utilities modify code at the syntactic level using Tree-sitter patterns. For complex refactoring that requires semantic understanding, consider using the full analysis capabilities via analysis.
See Also
- TreesitterSanitizer: Java sanitization utility implementation.
- treesitter_parser: For raw Tree-sitter parsing without sanitization utilities.