codeanalyzer-ts

The codeanalyzer-ts backend analyzes TypeScript and TSX codebases and emits the canonical CLDK analysis.json: the same symbol-table and call-graph schema that the Java and Python analyses produce. It is a standalone binary designed to plug into the CLDK Python SDK.

codeanalyzer-ts uses ts-morph (a wrapper around the TypeScript compiler API) to parse and resolve a TypeScript project in a single pass. The same TypeChecker instance that builds the symbol table also resolves call targets, which keeps call-graph derivation cheap and precise.

flowchart LR
    A["TypeScript project<br/>+ tsconfig.json<br/>+ node_modules"] -->|materialize deps| B["npm install"]
    B -->|ts-morph Project| C["Parse &amp; resolve<br/>TypeChecker"]
    C -->|syntactic pass| D["Symbol Table<br/>Module/Class/Callable"]
    C -->|semantic pass| E["Call Graph<br/>tsc resolver + RTA"]
    D --> F["TSApplication"]
    E --> F
    F -->|analysis.json| G["Python SDK<br/>TypeScriptAnalysis"]

Before parsing, the analyzer ensures node_modules is present (by running npm install) so that the TypeChecker can resolve types. Pass --no-build to skip this step if node_modules is already prepared.

The syntactic pass walks the ts-morph AST and indexes all declarations (classes, methods, interfaces, enums, type aliases, namespaces, functions) into a flat symbol_table keyed by project-relative file path.

  • Level 1 (default): ts-morph’s TypeChecker resolves each call site to its declared-type target (exact for static dispatch). RTA-style subtype expansion is applied on top: polymorphic calls on interface or abstract receivers also emit edges to every instantiated concrete override. Provenance: tsc.
  • Level 2: CodeQL enrichment adds dynamic-dispatch and dataflow edges the checker cannot reach (currently stubbed; wired in src/semantic_analysis/codeql/).
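
The level 1 expansion can be pictured with a small Python sketch. This is not the analyzer's code: the function name and the overrides and instantiated structures are hypothetical, chosen only to illustrate the idea of keeping the declared-type edge and adding one edge per instantiated concrete override.

```python
from typing import Dict, List, Set


def rta_targets(
    declared_target: str,
    overrides: Dict[str, Dict[str, str]],
    instantiated: Set[str],
) -> List[str]:
    """Expand one resolved call site into RTA-style targets.

    declared_target: signature the type checker resolved for the call.
    overrides: hypothetical index, declared method -> {concrete class: override}.
    instantiated: class signatures that are constructed somewhere in the project.
    """
    targets = [declared_target]  # the declared-type edge is always kept
    for cls, method in sorted(overrides.get(declared_target, {}).items()):
        # Only classes that are actually instantiated contribute an edge.
        if cls in instantiated and method not in targets:
            targets.append(method)
    return targets


overrides = {
    "src/repo.Repo.find": {
        "src/sql.SqlRepo": "src/sql.SqlRepo.find",
        "src/mem.MemRepo": "src/mem.MemRepo.find",
    }
}
print(rta_targets("src/repo.Repo.find", overrides, {"src/sql.SqlRepo"}))
# ['src/repo.Repo.find', 'src/sql.SqlRepo.find']
```

In this example MemRepo is never instantiated, so its override receives no edge, which is what keeps RTA tighter than expanding to every subtype.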

Both maintain the no-dangling-edges invariant: every call-graph edge endpoint is a real Callable.signature.
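
A consumer can verify this invariant on an emitted analysis.json. The sketch below is an assumption-laden check, not part of the analyzer: it treats every string stored under a signature key anywhere in symbol_table as a known endpoint (which also admits class signatures, so it is slightly looser than the real rule) and treats the keys of external_symbols as valid phantom endpoints.

```python
from typing import Any, Dict, List, Set


def collect_signatures(node: Any, found: Set[str]) -> None:
    """Recursively gather every string stored under a "signature" key."""
    if isinstance(node, dict):
        sig = node.get("signature")
        if isinstance(sig, str):
            found.add(sig)
        for value in node.values():
            collect_signatures(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_signatures(item, found)


def dangling_edges(analysis: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return call-graph edges whose source or target is not a known signature."""
    known: Set[str] = set()
    collect_signatures(analysis.get("symbol_table", {}), known)
    # Assumption: phantom nodes are keyed by their signature.
    known.update(analysis.get("external_symbols", {}).keys())
    return [
        edge
        for edge in analysis.get("call_graph", [])
        if edge["source"] not in known or edge["target"] not in known
    ]
```

An empty result means every edge endpoint was found; any returned edge points at a signature the symbol table does not define.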

  • bun ≥ 1.3, or Node ≥ 20
  • npm on PATH (to materialize dependencies)

To build the standalone binary:

cd codeanalyzer-ts
bun install
bun run build # → dist/codeanalyzer-typescript (standalone binary)

Or run from source without compilation:

bun run src/index.ts -i <project> -a 2

The analyzer accepts these command-line options:

codeanalyzer-typescript -i <path> [options]

| Option | Description |
| --- | --- |
| -i, --input <path> | Project root to analyze (required) |
| -o, --output <dir> | Write analysis.json to this directory; omit to emit compact JSON to stdout (used by the SDK) |
| -f, --format <fmt> | Output format: json or msgpack (default: json) |
| -a, --analysis-level <n> | 1 = tsc resolver call graph + RTA (default); 2 = + CodeQL enrichment |
| -t, --target-files <paths...> | Restrict analysis to specific files (for incremental builds) |
| --skip-tests / --include-tests | Skip or include test files (default: skip) |
| --eager / --lazy | Force clean rebuild vs. reuse cache (default: reuse) |
| --no-build | Skip npm install; assume node_modules is prepared |
| --no-phantoms | Disable phantom (external) nodes for library imports |
| -c, --cache-dir <dir> | Cache directory (default: <input>/.codeanalyzer) |
| -v | Increase verbosity; repeatable (e.g. -vv) |

All diagnostics are written to stderr; stdout is reserved for JSON (unless -o is specified).
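
Because JSON goes to stdout and diagnostics to stderr, the binary is straightforward to drive from a script. The following Python sketch assumes the built binary is on PATH under the name codeanalyzer-typescript; the helper names build_command and run_analyzer are hypothetical, and only flags listed in the table above are used.

```python
import json
import subprocess
from typing import Any, Dict, List


def build_command(
    project: str,
    level: int = 1,
    no_build: bool = False,
    binary: str = "codeanalyzer-typescript",  # assumed to be on PATH
) -> List[str]:
    """Assemble the argument vector from options documented above."""
    cmd = [binary, "-i", project, "-a", str(level)]
    if no_build:
        cmd.append("--no-build")
    return cmd


def run_analyzer(project: str, **options: Any) -> Dict[str, Any]:
    """Run without -o so the compact JSON arrives on stdout."""
    result = subprocess.run(
        build_command(project, **options),
        capture_output=True,  # stderr carries diagnostics only
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit
    )
    return json.loads(result.stdout)
```

Calling run_analyzer("/path/to/project", no_build=True) would return the parsed analysis as a dictionary, provided the binary exists and exits successfully.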

The backend emits analysis.json containing a TSApplication with four top-level keys:

{
  "symbol_table": {
    "src/index.ts": { ... },
    "src/services/user.ts": { ... }
  },
  "call_graph": [
    { "source": "src/index.main", "target": "src/services/user.UserService.getUser", ... }
  ],
  "external_symbols": {
    "express.Router.get": { "name": "get", "module": "express", ... }
  },
  "entrypoints": {}
}

The symbol_table is a dictionary mapping project-relative file paths to TSModule objects. Each module contains:

  • imports and exports: statement-level detail (module specifier, binding name, type-only markers)
  • classes, interfaces, enums, type_aliases: top-level named types, each with its own structure
  • functions: module-level functions
  • namespaces: recursive containers (same shape as modules)
  • variables: module-scope declarations
  • comments: all JSDoc and inline comments
  • is_tsx, is_declaration_file: file metadata flags
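
As a quick orientation, the per-module collections can be tallied from a loaded analysis.json. This Python sketch assumes only the key names listed above; it does not assume whether each collection is serialized as a list or a dictionary, since len() works for both. The function name is hypothetical.

```python
from typing import Any, Dict

# Collection names taken from the module description above.
KINDS = ("classes", "interfaces", "enums", "type_aliases", "functions")


def summarize_modules(analysis: Dict[str, Any]) -> Dict[str, Dict[str, int]]:
    """Count declarations of each kind for every module in the symbol table."""
    summary: Dict[str, Dict[str, int]] = {}
    for path, module in analysis.get("symbol_table", {}).items():
        # A missing or null collection counts as empty.
        summary[path] = {kind: len(module.get(kind) or []) for kind in KINDS}
    return summary
```

The result maps each project-relative file path to its counts, which is a convenient sanity check before walking the full structure.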

TSClass represents a single class declaration:

{
  name: "UserService",
  signature: "src/services/user.UserService", // stable ID
  comments: [ ... ],
  decorators: [ ... ], // @Injectable, @Entity, etc.
  base_classes: [ ... ], // extends + implements (mixed)
  implements_types: [ ... ], // just interfaces
  type_parameters: [ ... ], // <T, U extends Base>
  methods: {
    "getUser": { ... }, // TSCallable
    "saveUser": { ... }
  },
  attributes: {
    "logger": { type: "Logger", is_readonly: true }
  },
  is_abstract: false,
  is_exported: true,
  is_ambient: false,
  start_line: 10,
  end_line: 45
}
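
One use of the structured class records is finding classes that carry a given decorator. The Python sketch below is illustrative only: it assumes each decorator entry is an object with a name field, that the name is stored without the leading @, and it tolerates classes being serialized either as a list or as a name-keyed dictionary. The helper names are hypothetical.

```python
from typing import Any, Dict, List


def _items(container: Any) -> List[Any]:
    """Return the entries of a list, or the values of a dict, or nothing."""
    if isinstance(container, dict):
        return list(container.values())
    return list(container or [])


def classes_with_decorator(analysis: Dict[str, Any], decorator_name: str) -> List[str]:
    """Return sorted signatures of classes decorated with decorator_name."""
    hits: List[str] = []
    for module in analysis.get("symbol_table", {}).values():
        for cls in _items(module.get("classes")):
            names = {
                dec.get("name")
                for dec in _items(cls.get("decorators"))
                if isinstance(dec, dict)
            }
            if decorator_name in names:
                hits.append(cls["signature"])
    return sorted(hits)
```

For example, classes_with_decorator(analysis, "Injectable") would list the signatures of all injectable services under these assumptions.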

The call_graph is an array of edges (TSCallEdge) in identity-only form. Each edge records the exact signatures of its caller and callee, so no edge is ever dangling:

{
  source: "src/services/user.UserService.getUser",
  target: "src/db.Database.fetchById",
  type: "CALL_DEP",
  weight: 1,
  provenance: ["tsc"], // "tsc" | "codeql" | other
  tags: { "ts.dispatch": "rta" } // RTA subtype expansion tag
}
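
Since edges carry only identities, a consumer can index them with plain dictionaries. The Python sketch below uses just the fields shown in the edge above; the function names are hypothetical, and it assumes the ts.dispatch tag equals "rta" exactly on edges produced by subtype expansion.

```python
from collections import defaultdict
from typing import Any, Dict, List, Set


def build_adjacency(edges: List[Dict[str, Any]]) -> Dict[str, Set[str]]:
    """Map each caller signature to the set of callee signatures."""
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for edge in edges:
        adjacency[edge["source"]].add(edge["target"])
    return adjacency


def rta_edges(edges: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only edges tagged as coming from RTA subtype expansion."""
    return [e for e in edges if (e.get("tags") or {}).get("ts.dispatch") == "rta"]


edges = [
    {
        "source": "src/services/user.UserService.getUser",
        "target": "src/db.Database.fetchById",
        "type": "CALL_DEP",
        "weight": 1,
        "provenance": ["tsc"],
        "tags": {"ts.dispatch": "rta"},
    }
]
print(sorted(build_adjacency(edges)["src/services/user.UserService.getUser"]))
# ['src/db.Database.fetchById']
```

Filtering with rta_edges is one way to separate edges the checker resolved exactly from those added by expansion.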

When the analyzer encounters a call to an imported library function, it creates a phantom node (a synthetic TSExternalSymbol) so the call-graph edge points somewhere real:

{
  signature: "express.Router.get",
  name: "get",
  module: "express",
  kind: "function",
  is_external: true
}

Disable phantoms with --no-phantoms if you want the graph to be internal-only.
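
When phantoms are enabled, they make it easy to see which libraries a project calls into. The Python sketch below is an illustration under two assumptions: external_symbols is keyed by signature (as in the output example), and each phantom carries a module field. The function name is hypothetical, and it counts edges rather than interpreting weight.

```python
from collections import Counter
from typing import Any, Dict


def library_call_counts(analysis: Dict[str, Any]) -> Counter:
    """Count call-graph edges per external module, using phantom nodes."""
    external = analysis.get("external_symbols", {})
    counts: Counter = Counter()
    for edge in analysis.get("call_graph", []):
        symbol = external.get(edge["target"])
        if symbol is not None:  # target is a phantom, i.e. a library call
            counts[symbol.get("module", "<unknown>")] += 1
    return counts
```

Edges whose target is not in external_symbols are internal calls and are ignored by this tally.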

Compared with Java or Python, TypeScript has several language-specific constructs, which codeanalyzer-ts models as first-class citizens:

| Feature | Support | Details |
| --- | --- | --- |
| Interfaces | Full | Separate interfaces{} collection; queryable separately from classes |
| Type aliases | Full | TSTypeAlias with aliased type text and type parameters |
| Enums | Full | Discriminated collection; members with computed/literal values |
| Namespaces | Full | Recursive containers; same structure as modules |
| Type parameters | Full | <T, U extends Base = Default> structured on classes, callables, aliases |
| Decorators | Structured | name, qualified_name, positional_arguments[], keyword_arguments{} (for framework entrypoint detection) |
| Modifiers | Typed fields | accessibility, is_static, is_async, is_readonly, is_abstract, is_optional, is_ambient |
| Overload signatures | Full | overload_signatures[] on the implementation callable |
| JSX | Tracked | is_tsx flag on modules |
| Declaration files | Tracked | is_declaration_file flag; useful for filtering |

TypeScript support in the Python SDK is coming soon. When available, you will be able to analyze TypeScript projects with the same CLDK facade as Java and Python:

from cldk import CLDK
from cldk.analysis import AnalysisLevel

analysis = CLDK(language="typescript").analysis(
    project_path="/path/to/ts/project",
    analysis_level=AnalysisLevel.call_graph,
)

# All familiar methods:
print(analysis.get_classes())      # Dict[str, TSClass]
graph = analysis.get_call_graph()  # networkx.DiGraph

All signatures follow one canonical signatureOf() rule: project-relative file path (without extension) + dot-separated members. For example:

  • Module-level function: src/index.getConfig
  • Method: src/services/user.UserService.getUser
  • Constructor: src/models/User.User.constructor
  • Namespace member: src/api.v1.ApiRouter.get

This ensures every call-graph edge resolves to a real symbol-table entry (or, for library calls, to a phantom entry in external_symbols).
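
The rule is simple enough to mirror in client code, for example to look up a known method in the call graph. The Python function below is an illustrative re-implementation, not the analyzer's signatureOf(): it strips only the final extension, so how declaration files such as types.d.ts are named is not covered and would need checking against real output.

```python
import posixpath


def signature_of(file_path: str, *members: str) -> str:
    """Join an extension-less project-relative path with dot-separated members."""
    # Normalize Windows separators so signatures always use forward slashes.
    stem, _ext = posixpath.splitext(file_path.replace("\\", "/"))
    return ".".join([stem, *members])


print(signature_of("src/index.ts", "getConfig"))
# src/index.getConfig
print(signature_of("src/services/user.ts", "UserService", "getUser"))
# src/services/user.UserService.getUser
```

The two printed values match the module-level function and method examples listed above.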

The analyzer maintains a cache under <project>/.codeanalyzer/ (or -c <dir>), stamped with the analyzer version. The cache is invalidated on:

  • Analyzer version change
  • Any source file modification (content hash mismatch)
  • --eager flag (forces clean rebuild)

Use --lazy (the default) to reuse the cache, or --target-files <paths...> to re-analyze only specific files.
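
The invalidation rules can be summarized as a single validity check. The Python sketch below is a model of the rules as stated, not the analyzer's cache code: the manifest layout (analyzer_version, file_hashes) is hypothetical, and SHA-256 is an assumed choice of content hash.

```python
import hashlib
from pathlib import Path
from typing import Any, Dict, List


def content_hash(path: Path) -> str:
    """Hash file contents; SHA-256 is an assumption, not the documented algorithm."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def is_cache_valid(
    manifest: Dict[str, Any],
    analyzer_version: str,
    files: List[Path],
    eager: bool = False,
) -> bool:
    """Apply the three invalidation rules to a hypothetical cache manifest."""
    if eager:  # --eager forces a clean rebuild
        return False
    if manifest.get("analyzer_version") != analyzer_version:
        return False  # cache was written by a different analyzer version
    hashes = manifest.get("file_hashes", {})
    # Any new or modified file produces a content-hash mismatch.
    return all(hashes.get(str(f)) == content_hash(f) for f in files)
```

A per-file variant of the same comparison is one plausible way to decide which files need re-analysis in an incremental run.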

  • Symbol table build: O(n) in source lines; ts-morph’s parse is linear
  • Call graph build (level 1): O(c) in call sites; the checker resolution is constant-time per site
  • CodeQL enrichment (level 2): Stubbed; will depend on CodeQL database construction time

For large projects (>100k LOC), expect analysis to take seconds to tens of seconds.

  • Symbol table: Stable; all TypeScript node kinds supported
  • Call graph (level 1): Stable; tsc resolution + RTA expansion proven correct
  • Call graph (level 2): Stubbed; infrastructure in place for CodeQL enrichment
  • Python SDK integration: In progress