Quickstart#
Build code analysis pipelines with LLMs in minutes.
In this quickstart guide, we will use the Apache Commons CLI codebase to build a code analysis pipeline with CLDK, combining local LLM inference with automated code processing.
Installing CLDK and Ollama
This quickstart guide requires CLDK and Ollama. Follow these instructions to set up your environment:
First, install CLDK and the Ollama Python SDK, both published on PyPI (for example, pip install cldk ollama).
Then, install Ollama itself: either run the install command for your platform, or download the installer from the Ollama website.
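Before moving on, it can help to confirm that the prerequisites are visible from the Python environment you plan to use. The following is a minimal sketch that only uses the standard library; the helper name check_environment is made up for this example, and it assumes the packages import as cldk and ollama and that the Ollama binary is called ollama.

```python
import importlib.util
import shutil


def check_environment() -> dict[str, bool]:
    """Report which quickstart prerequisites are visible from this interpreter."""
    return {
        # Python packages: importable only if installed in the active environment.
        "cldk": importlib.util.find_spec("cldk") is not None,
        "ollama_sdk": importlib.util.find_spec("ollama") is not None,
        # Ollama itself ships a command-line binary that should be on PATH.
        "ollama_cli": shutil.which("ollama") is not None,
    }


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If any entry is reported as missing, revisit the corresponding install step before continuing.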
Step 1: Set Up Ollama Server#
Model inference with CLDK starts with a local LLM server. We'll use Ollama to host and run the models; start the server with ollama serve if it is not already running.
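A quick way to confirm that the server is reachable is to send it a plain HTTP request. This sketch assumes Ollama's default address, http://localhost:11434; the function name ollama_is_up is hypothetical, and you would need to pass a different base_url if your server listens elsewhere.

```python
import urllib.request


def ollama_is_up(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers with status 200 at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers refused connections, timeouts, and HTTP errors
        # (urllib's URLError and HTTPError are OSError subclasses).
        return False


if __name__ == "__main__":
    print("Ollama server reachable:", ollama_is_up())
```

A result of False usually means the server has not been started yet or is bound to a different host or port.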
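Step 2: Pull the Code LLM#
Let's use the Granite 8b-instruct model (granite-code:8b-instruct) for this tutorial. Pull it with ollama pull granite-code:8b-instruct.
Verify the installation by running the model with a short prompt (ollama run granite-code:8b-instruct). You should see the model answer with generated text.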
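The same check can be done programmatically by asking the local server which models it has. The sketch below assumes the default server address and relies on Ollama's /api/tags endpoint, which returns a JSON object with a "models" list; the helper names model_names and is_model_pulled are made up for this example.

```python
import json
import urllib.request


def model_names(tags_payload: dict) -> list[str]:
    """Extract model names from the JSON body of an /api/tags response."""
    return [entry["name"] for entry in tags_payload.get("models", [])]


def is_model_pulled(model: str, base_url: str = "http://localhost:11434") -> bool:
    """Ask the local Ollama server for its models and look for `model`."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as response:
        payload = json.load(response)
    return model in model_names(payload)
```

With the server running, is_model_pulled("granite-code:8b-instruct") should return True once the pull has finished.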
Step 3: Download Sample Codebase#
We'll use Apache Commons CLI as our example Java project:
wget https://github.com/apache/commons-cli/archive/refs/tags/rel/commons-cli-1.7.0.zip -O commons-cli-1.7.0.zip && unzip commons-cli-1.7.0.zip
Let's set the project path for future reference:
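One way to keep the project path in a single place is an environment variable that scripts read at startup. The sketch below is an illustration only: the variable name JAVA_APP_PATH and the helper resolve_project_path are assumptions made here, and the fallback should be whatever directory unzip actually created on your machine.

```python
import os
from pathlib import Path


def resolve_project_path(fallback: str, env_var: str = "JAVA_APP_PATH") -> Path:
    """Return the absolute project path from env_var, or from fallback if unset."""
    raw = os.environ.get(env_var, fallback)
    return Path(raw).expanduser().resolve()
```

Calling resolve_project_path("path/to/unzipped/project") returns the environment value when it is set and the resolved fallback otherwise, so later scripts do not need a hard-coded path.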
About the Sample Project
Apache Commons CLI provides an API for parsing command line options. It's a well-structured Java project that demonstrates various object-oriented patterns, making it ideal for code analysis experiments.
Step 4: Create Analysis Pipeline#
What should I expect?
In about 40 lines of code, we will use CLDK to build a code summarization pipeline that leverages LLMs to generate summaries for a real-world Java project! Without CLDK, this would require multiple tools and a much more complex setup.
Let's build a pipeline that analyzes Java methods using LLMs. Create a new file, code_summarization.py. The script works as follows:
- Create a new instance of the CLDK class.
- Create an analysis instance for the Java project. This object will be used to obtain all the analysis artifacts from the Java project.
- In a nested loop, we can quickly iterate over the methods in the project and extract the code body.
- CLDK comes with a number of tree-sitter based utilities that can be used to extract and manipulate code snippets.
- We use the sanitize_focal_class() method to extract the focal class for the method and sanitize any unwanted code in just one line of code.
- We use the granite-code:8b-instruct model in this example. Try a different model from the Ollama model library.
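The steps above can be sketched roughly as follows. This is a hedged illustration, not the official script: the CLDK calls (CLDK(language=...), analysis(project_path=...), get_symbol_table(), tree_sitter_utils(source_code=...), sanitize_focal_class(...)) follow CLDK's published examples as best recalled and should be checked against the version you installed. The prompt wording, the format_inst and summarize_project helpers, and the JAVA_APP_PATH variable name are assumptions made for this example.

```python
import os
from pathlib import Path


def format_inst(code: str, focal_method: str, focal_class: str, language: str = "Java") -> str:
    """Build a summarization prompt for one method (the wording is illustrative)."""
    return (
        f"Write a brief summary of the {language} method '{focal_method}' "
        f"in the class '{focal_class}'.\n\n"
        f"Class source:\n{code}\n"
    )


def summarize_project(project_path: str, model_id: str = "granite-code:8b-instruct") -> None:
    """Print an LLM-generated summary for every method in the project."""
    # Imported lazily so format_inst stays usable without these packages.
    import ollama
    from cldk import CLDK

    # 1. Create a CLDK instance for Java (assumed constructor signature).
    cldk = CLDK(language="java")
    # 2. Create the analysis object for the project (assumed signature).
    analysis = cldk.analysis(project_path=project_path)

    # 3. Nested loop: files -> types -> methods (assumed symbol-table layout).
    for file_path, class_file in analysis.get_symbol_table().items():
        source = Path(file_path).read_text()
        # 4. Tree-sitter based utilities for this source file.
        ts_utils = cldk.tree_sitter_utils(source_code=source)
        for type_name, type_decl in class_file.type_declarations.items():
            for method in type_decl.callable_declarations.values():
                # 5. Keep only the parts of the class relevant to this method.
                sanitized = ts_utils.sanitize_focal_class(method.declaration)
                prompt = format_inst(sanitized, method.declaration, type_name)
                # 6. Ask the local model for a summary.
                reply = ollama.generate(model=model_id, prompt=prompt)
                print(f"Method: {method.declaration}")
                print(f"Summary: {reply['response']}")


if __name__ == "__main__":
    project = os.getenv("JAVA_APP_PATH")  # assumed variable name
    if project:
        summarize_project(project)
    else:
        print("Set JAVA_APP_PATH to the project directory first.")
```

Keeping the prompt construction in its own function makes it easy to try richer prompts later without touching the analysis loop.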
Running code_summarization.py#
Save the file as code_summarization.py and run it (for example, python code_summarization.py).
You'll see output like:
Method: parse
Summary: This method parses command line arguments using the specified Options object...
Method: validateOption
Summary: Validates if an option meets the required criteria including checking...
...
Step 5: Customize Analysis#
The pipeline can be customized in several ways:
Try different Granite model sizes:
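Switching model sizes usually comes down to changing the model tag passed to Ollama. As a small sketch, the helper below builds tags in the same pattern as granite-code:8b-instruct. The list of sizes is an assumption based on the Ollama model library at the time of writing, so verify it there, and note that each model has to be pulled before use. The names GRANITE_CODE_SIZES and granite_model_tag are made up for this example.

```python
# Sizes assumed to be available for granite-code; verify in the Ollama model library.
GRANITE_CODE_SIZES = ("3b", "8b", "20b", "34b")


def granite_model_tag(size: str) -> str:
    """Return the Ollama tag for a Granite code instruct model of the given size."""
    if size not in GRANITE_CODE_SIZES:
        raise ValueError(f"Unknown size {size!r}; expected one of {GRANITE_CODE_SIZES}")
    return f"granite-code:{size}-instruct"
```

Smaller models generally respond faster and need less memory, while larger ones may produce more detailed summaries, so it is worth comparing a few on the same methods.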
Next Steps#
- Explore different analysis tasks like code repair, translation, test generation, and more...
- Create richer prompts with more analysis artifacts that CLDK provides.
- Implement batch processing for larger projects, or use the CLDK SDK to build custom analysis pipelines.