Why and How to use Spring AI: Interacting with llama3 and Ollama
Overview
Introduction
Spring AI is another capability built into the Spring ecosystem. In short, it’s a new module we can add to our Spring apps.
You can find all the components of Spring AI on the official site.
Now, why use Spring AI? As per the Spring AI developers, one possible reason is:
At its core, Spring AI addresses the fundamental challenge of AI integration: Connecting your enterprise Data and APIs with the AI Models.
Spring AI is possibly the first capability written in Java to work with our data and AI capabilities. Before that, we’d have to build things from scratch, for instance, connecting to various AI models and configuring the communication with them. Now, you can simply inject a Spring bean and use its methods.
Thus, for all the non-AI engineers out there who maintain existing Spring apps, you can now simply add one dependency to begin taking advantage of AI stuff in your apps (or, at the very least, you can also be part of the AI hype). Coming from an engineer with a history with Java, I find that reason itself is enough to try Spring AI.
In this series of tutorials, I’ll show you some of its capabilities by building a movie recommendation system from scratch with tools like AI chat prompting, vector databases, and AI functions. Along the way, you’ll see how to connect the Spring AI module to an existing enterprise Java application. By the end of the series, I hope you’ll be able to use AI capabilities to solve different business problems.
A brief illustration of Spring AI usage
At this point, I’ve already convinced you to try out Spring AI, right? Now, it’s time to build something with it and consider its usage in practice.
For this tutorial, I’ll pick llama3 and Ollama as our AI tooling. Hence, the first step is to download and install Ollama locally. You can find the available installers on their website. Then, after installing, you can run it with:
ollama run llama3
The first run downloads the model data. Once it finishes, Ollama serves llama3 locally (by default at http://localhost:11434).
Add the Spring AI Ollama dependency
Now, you can add the _spring-ai-ollama-spring-boot-starter_ to your existing Spring project or initialize a new one with it:
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
    <version>1.1.0</version>
</dependency>
Note that from Spring AI 1.0 onward the starter is published as spring-ai-starter-model-ollama, so match the artifactId to the version you actually use.
Configure the local AI in the application.yml file
Then, add the following content to your application.yml file:
spring:
  ai:
    ollama:
      base-url: http://localhost:11434
      chat:
        options:
          model: llama3
The configuration above tells Spring AI where the local Ollama server is listening (base-url) and which model to use for chat requests (llama3).
Call the model with the user message
After configuring the prerequisites and dependencies, we can code something. Let’s start with a service class that injects the OllamaChatModel to call llama3:
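import org.springframework.ai.chat.messages.UserMessage;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.ollama.OllamaChatModel;
import org.springframework.stereotype.Service;

@Service
public class MovieRecommendationService {

    private final OllamaChatModel ollamaChatClient;

    // Spring injects the OllamaChatModel bean configured by the starter.
    public MovieRecommendationService(OllamaChatModel ollamaChatClient) {
        this.ollamaChatClient = ollamaChatClient;
    }

    public String recommend(String genre) {
        var generalInstructions = String.format("Give me 5 movie recommendations on the genre %s", genre);

        // Wrap the text as a message with the user role.
        var currentPromptMessage = new UserMessage(generalInstructions);

        var prompt = new Prompt(currentPromptMessage);

        // Newer Spring AI releases expose this accessor as getText() instead of getContent().
        return ollamaChatClient.call(prompt).getResult().getOutput().getContent();
    }
}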
The recommend() method calls the LLM through OllamaChatModel, extracts the text generated by the model, and returns it:
- The UserMessage class wraps the user message, so Spring handles it as a message with the user role in the prompt sent.
- Then, we build a Prompt object to send it to the chat model. The call() method returns a ChatResponse holding a list of Generation objects; getResult() returns the first Generation, which contains the text generated by the LLM and its metadata.
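For comparison, here is a minimal sketch of roughly what such a call looks like without Spring AI, using only the JDK's HTTP client. It assumes Ollama's REST endpoint POST /api/chat on the default port; the class name RawOllamaCall and its helper methods are hypothetical and exist only for this illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch only: a hand-written chat call, assuming Ollama's POST /api/chat endpoint.
class RawOllamaCall {

    // Minimal JSON string escaping for the characters a prompt is likely to contain.
    static String escapeJson(String text) {
        return text.replace("\\", "\\\\")
                   .replace("\"", "\\\"")
                   .replace("\n", "\\n");
    }

    // Builds the request body for a single, non-streaming user message.
    static String buildBody(String model, String userMessage) {
        return "{\"model\":\"" + escapeJson(model) + "\","
             + "\"messages\":[{\"role\":\"user\",\"content\":\"" + escapeJson(userMessage) + "\"}],"
             + "\"stream\":false}";
    }

    public static void main(String[] args) throws InterruptedException {
        var body = buildBody("llama3", "Give me 5 movie recommendations on the genre thriller");
        var request = HttpRequest.newBuilder(URI.create("http://localhost:11434/api/chat"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        try {
            // Requires a running Ollama instance; prints the raw JSON response.
            var response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.body());
        } catch (IOException e) {
            System.out.println("Ollama is not reachable: " + e.getMessage());
        }
    }
}
```

Even in this small form you have to build the JSON, pick the endpoint, and parse the raw response yourself, which is the plumbing the injected OllamaChatModel bean hides.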
Adding custom messages to the prompt
To improve the model’s output, you can use internal system prompt messages to provide extra instructions.
Firstly, create the prompt constant in the service class:
private static final String INSTRUCTIONS_PROMPT_MESSAGE = """

You're a movie recommendation system. Recommend exactly 5 movies on `movie_genre`=%s.

Write the final recommendation using the following template:
    Movie Name:
    Synopsis:
    Cast:
""";
The prompt above injects the movie genre into a template that asks the LLM for exactly 5 recommendations in the given format. That helps make the responses more consistent and predictable.
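To see what the model actually receives, the template can be rendered with plain Java, without Spring. This is a small sketch; the class name PromptTemplateDemo and the render() helper are hypothetical.

```java
// Sketch only: renders the instructions template for a given genre.
class PromptTemplateDemo {

    static final String INSTRUCTIONS_PROMPT_MESSAGE = """

            You're a movie recommendation system. Recommend exactly 5 movies on `movie_genre`=%s.

            Write the final recommendation using the following template:
                Movie Name:
                Synopsis:
                Cast:
            """;

    // Replaces the %s placeholder with the requested genre.
    static String render(String genre) {
        return String.format(INSTRUCTIONS_PROMPT_MESSAGE, genre);
    }

    public static void main(String[] args) {
        System.out.println(render("thriller"));
    }
}
```

Printing the rendered prompt like this is a cheap way to check placeholders and whitespace before sending anything to the LLM.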
Thus, change the recommend() method to build the UserMessage from the new template:
var generalInstructions = new UserMessage(String.format(INSTRUCTIONS_PROMPT_MESSAGE, genre));
To customize the prompt even further, you can add a message telling the model to suggest movies similar to ones the user has already watched and liked:
private static final String EXAMPLES_PROMPT_MESSAGE = """
    Use the `movies_list`
    below to read each `movie_name`.
    Recommend similar movies to the ones presented in `movies_list`
    that fall exactly or close to the `movie_genre` provided.
    `movies_list`:
    %s
""";
Then, in the service class, create a new method that accepts a list of movies:
// Requires: import static java.util.stream.Collectors.joining;
public String recommend(String genre, List<String> movies) {

    // Label every title, including the first one, with `movie_name`=
    // (the prefix argument covers the first element, the delimiter the rest).
    var moviesCollected = movies.stream()
        .collect(joining("\n`movie_name`=", "\n`movie_name`=", ""));

    var generalInstructions = new UserMessage(String.format(INSTRUCTIONS_PROMPT_MESSAGE, genre));

    var examplesSystemMessage = new SystemMessage(String.format(EXAMPLES_PROMPT_MESSAGE, moviesCollected));

    var prompt = new Prompt(List.of(generalInstructions, examplesSystemMessage));

    return ollamaChatClient.call(prompt)
        .getResult()
        .getOutput()
        .getContent();
}
The new variable examplesSystemMessage stores a SystemMessage object with instructions to recommend movies based on the user's movie history.
Thus, the prompt is now composed of two instructions: one tells what the movie genre and output format are, and the other indicates the user history. Additionally, since the Prompt constructor accepts a list, you can give any number of instructions to the final prompt, which gives you a lot of flexibility.
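The following sketch shows, in plain Java, how a movie history could be rendered into the text of the second message. The class name MovieListDemo, its helpers, and the shortened prompt wording are hypothetical and only illustrate the string handling.

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch only: renders the movie history into an examples prompt using plain Java.
class MovieListDemo {

    // Shortened, illustrative version of the examples prompt.
    static final String EXAMPLES_PROMPT_MESSAGE = """
            Recommend similar movies to the ones presented in `movies_list`.
            `movies_list`:%s
            """;

    // Puts every title on its own line, each prefixed with the `movie_name` label.
    static String labelMovies(List<String> movies) {
        return movies.stream()
                .collect(Collectors.joining("\n`movie_name`=", "\n`movie_name`=", ""));
    }

    // Fills the %s placeholder with the labeled titles.
    static String render(List<String> movies) {
        return String.format(EXAMPLES_PROMPT_MESSAGE, labelMovies(movies));
    }

    public static void main(String[] args) {
        System.out.print(render(List.of("Heat", "Training Day", "Eyes Wide Shut")));
    }
}
```

Keeping one title per line with the same label gives the model a regular structure to read, which matches the backtick-quoted variable names used in the prompt.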
Wrapping the model in a REST API
Now, it’s the fun part: We’ll play with the LLM locally. We'll use a REST API to interact with the service you’ve created in previous sections.
Thus, firstly, create the API request body class:
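import com.fasterxml.jackson.annotation.JsonProperty;
import java.util.List;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

@Data
@AllArgsConstructor
@NoArgsConstructor
public class MovieRecommendationRequest {
    @JsonProperty("genre")
    private String genre;

    @JsonProperty("movies")
    private List<String> movies;
}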
Secondly, create the response body:
@Data
@AllArgsConstructor
public class MovieRecommendationResponse {
    private String message;
}
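Finally, create the Spring _RestController_ class:
import lombok.RequiredArgsConstructor;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

// Assumed source of the isEmpty(...) helper used below.
import static org.springframework.util.CollectionUtils.isEmpty;

@RestController
@RequiredArgsConstructor
@RequestMapping("/movies")
public class MovieRecommendationController {

    private final MovieRecommendationService movieRecommendationService;

    @PostMapping("/recommend")
    public MovieRecommendationResponse recommend(@RequestBody MovieRecommendationRequest request) {
        // Fail fast before calling the LLM with bad input.
        if (request.getGenre() == null || request.getGenre().isEmpty()) {
            throw new IllegalArgumentException("Parameter genre is mandatory to recommend movies");
        }

        // Use the history-aware overload only when the caller sent a movie list.
        if (!isEmpty(request.getMovies())) {
            return new MovieRecommendationResponse(movieRecommendationService.recommend(request.getGenre(), request.getMovies()));
        }

        return new MovieRecommendationResponse(movieRecommendationService.recommend(request.getGenre()));
    }
}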
The class first injects the MovieRecommendationService bean and uses it in the POST /movies/recommend resource.
Then, the recommend() method receives a MovieRecommendationRequest and checks if a genre was provided. If not, it fails fast with an error so that it doesn't call the LLM (which is a complex and expensive operation) with bad input data.
Next, the method checks if the caller provided a movies list containing the user's movie history. If so, we use the recommend(String, List<String>) version of the method. Otherwise, we use the simpler recommend(String) method.
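The same validate-then-dispatch logic can be exercised without Spring or a running model. The sketch below assumes a hypothetical Recommender interface standing in for MovieRecommendationService and a fake implementation that only echoes its input.

```java
import java.util.List;

// Sketch only: the controller's branching logic, isolated from Spring and the LLM.
class RecommendDispatchDemo {

    // Hypothetical stand-in for MovieRecommendationService.
    interface Recommender {
        String recommend(String genre);
        String recommend(String genre, List<String> movies);
    }

    // Fake implementation that reports which overload was chosen.
    static final Recommender FAKE = new Recommender() {
        public String recommend(String genre) {
            return "genre only: " + genre;
        }
        public String recommend(String genre, List<String> movies) {
            return "genre " + genre + " with " + movies.size() + " watched movies";
        }
    };

    static String handle(Recommender service, String genre, List<String> movies) {
        // Reject bad input before doing any expensive work.
        if (genre == null || genre.isEmpty()) {
            throw new IllegalArgumentException("Parameter genre is mandatory to recommend movies");
        }
        // Prefer the history-aware overload when a non-empty list is present.
        if (movies != null && !movies.isEmpty()) {
            return service.recommend(genre, movies);
        }
        return service.recommend(genre);
    }

    public static void main(String[] args) {
        System.out.println(handle(FAKE, "thriller", List.of("Heat", "Training Day")));
        System.out.println(handle(FAKE, "thriller", null));
    }
}
```

Isolating the branching this way makes it easy to unit test the input checks without paying for a model call.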
With the REST API created and the application running, you can play with the model a bit by running the following cURL command:
curl --location 'http://localhost:8080/movies/recommend' \
--header 'Content-Type: application/json' \
--data '{
    "genre": "thriller",
    "movies": [
        "Heat",
        "Training Day",
        "Eyes Wide Shut"
    ]
}'
When I called /recommend, I got the following response:
{
"message": "\n\n \
**Recommendation 1:\nMovie Name: Se7en**\n \
Synopsis: Two detectives, one a veteran and the other a rookie, are tasked with solving a series of gruesome murders that all involve a different type of body part.\n \
Cast: Brad Pitt, Morgan Freeman\n\n \
**Recommendation 2:\nMovie Name: Memento**\n \
Synopsis: A former insurance investigator suffering from short-term memory loss tries to avenge his wife's murder.\n \
Cast: Guy Pearce, Carrie-Anne Moss\n\n \
**Recommendation 3:\nMovie Name: The Silence of the Lambs**\n \
Synopsis: An FBI trainee is assigned to investigate a series of gruesome murders that all involve a particular type of victim. She seeks the help of imprisoned psychiatrist Hannibal Lecter.\n \
Cast: Jodie Foster, Anthony Hopkins\n\n \
**Recommendation 4:\nMovie Name: Psycho**\n \
Synopsis: A young woman takes out a mortgage on her aunt's motel, only to find herself being stalked by a disturbed owner who has a penchant for taxidermy and a violent temper.\n \
Cast: Janet Leigh, Anthony Perkins\n\n \
**Recommendation 5:\nMovie Name: The Prestige**\n \
Synopsis: Two magicians engage in competitive one-upmanship with tragic consequences. The film explores the art of magic, the lengths to which people will go to achieve their goals, and the ultimate cost of obsession.\n \
Cast: Hugh Jackman, Christian Bale"
}
Since I’ve watched all the recommended movies, I can tell they’re indeed good recommendations! They are all thriller movies similar to the ones on my movie list, so the model was accurate.
It’s important to mention that llama3’s knowledge has a cutoff date (March 2023 for the 8B model that ollama run llama3 pulls by default, December 2023 for the 70B model), so it won’t recommend newer movies. You can work around this with techniques like Retrieval Augmented Generation (RAG), which supplies information about newer movies to the prompt as context. Additionally, function calling could fetch a movie list from an external API to get more up-to-date answers.
Conclusion
In this tutorial, I’ve shown you the simplest capability of Spring AI: talking to a model using prompts embedded with user and system messages.
The next step is to improve the model recall by creating context-aware conversations about movie recommendations.