Create a movie recommendation system using the RAG technique, Spring AI, and neo4j vector store

Introduction

In general, LLMs struggle to answer questions that depend on contextual information they were never trained on.

This happens because of the knowledge limitations of LLMs: they only know about events up to their training cutoff, and the public ones don’t have access to private information. This significantly limits how accurate and relevant their responses can be.

In this article, I’ll show you how to use the Retrieval Augmented Generation (RAG) technique to improve an LLM’s ability to use contextual information, with examples using Spring AI and the neo4j vector store.

What is RAG and its advantages?

The RAG technique is simple: we augment the prompt with contextual information that the LLM doesn’t have in its built-in knowledge. Internal company documents and policies, newly published reports, personal information, and protected datasets are examples of contextual information; no public LLM knows that data. The main advantage of RAG is that we can get accurate, context-specific answers from an LLM without that information ever being part of its training data, and without retraining the model.

The RAG technique is divided into a few steps. At a high level, it uses a pipeline that:

  1. Extracts text from documents
  2. Transforms the documents’ text into vectors
  3. Loads the vectors into a vector store

After the information passes through the pipeline, we can query it using a similarity search, which I’ll show in the following sections.
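To make the steps concrete, here is a rough sketch of how the pipeline and the query side map onto Spring AI abstractions. It only assumes the DocumentReader, TokenTextSplitter, and VectorStore beans that the rest of the article wires up; the real versions of these calls appear in the following sections.

// rough sketch: the concrete beans are configured later in the article
List<Document> loadAndQuery(DocumentReader reader, TokenTextSplitter splitter,
                            VectorStore vectorStore, String userQuestion) {
    var documents = reader.read();           // 1. extract text from the source documents
    var chunks = splitter.apply(documents);  // 2. split the text into embeddable chunks
    vectorStore.accept(chunks);              // 3. embed the chunks and load the vectors into the store

    // query time: retrieve the stored chunks most similar to the user's question
    return vectorStore.similaritySearch(userQuestion);
}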

Leveraging the RAG technique for movie recommendation with Spring AI

In this section, you’ll learn how to use Spring AI to extract, transform, and load part of the information available in GroupLens’s movie dataset into the Neo4j vector database. Then, I’ll show you how to use similarity search to stuff the prompt.

Import the required dependencies

The first step is to import the necessary dependencies to work with Ollama:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
</dependency>

Then, also add the latest neo4j starter dependency:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-neo4j-store-spring-boot-starter</artifactId>
</dependency>

With only these two libraries, you can rely on Spring Boot auto-configuration for Ollama’s models and the neo4j vector store.

Configure the neo4j vector store

With the dependencies set, we can create a simple docker-compose file to initialize neo4j-db locally:

version: '3.7'
services:
  neo4j-db:
    image: neo4j:5.19.0
    environment:
      - NEO4J_AUTH=neo4j/admin
    ports:
      - "7474:7474"
      - "7687:7687"

The file is pretty simple: we pull the neo4j:5.19.0 Docker image, expose the database on port 7687 and the web browser admin on port 7474, and set the neo4j credentials to username=neo4j and password=admin.

One reason for choosing neo4j over other vector databases is that its vector indexes support up to 4096 dimensions, which matches the 4096-dimensional embeddings produced by llama3.

Configure the application.yml file

Now, it’s time to configure the ollama and neo4j integrations in the application.yml file:

spring:
  neo4j:
    uri: neo4j://127.0.0.1:7687
    authentication:
      username: neo4j
      password: admin
  ai:
    vectorstore:
      neo4j:
        embedding-dimension: 4096
        initialize-schema: true
    ollama:
      base-url: http://localhost:11434
      chat:
        options:
          model: llama3

Let’s break the file above into parts:

  1. The uri and authentication properties set the database server and the authentication credentials, respectively.
  2. Then we say that we want neo4j as our vector store. Additionally, we set the embedding-dimension to 4096 (which neo4j 5.19 supports and which matches llama3) and initialize-schema=true to create the schema automatically on application startup.
  3. Finally, we point Spring AI to the Ollama server at localhost:11434 (since we’ll run Ollama locally) and set the chat model to llama3, as sketched below.
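With this configuration in place, Spring Boot auto-configures the Ollama chat model and the Neo4j-backed VectorStore beans. As a quick sanity check, you can simply inject both of them; the class below is a hypothetical illustration, not part of the article’s code:

@Service
public class RagBeansSmokeTest { // hypothetical class, for illustration only

    // backed by the Ollama server at http://localhost:11434 using the llama3 model
    final OllamaChatModel chatModel;

    // Neo4jVectorStore configured with 4096-dimension embeddings
    final VectorStore vectorStore;

    public RagBeansSmokeTest(OllamaChatModel chatModel, VectorStore vectorStore) {
        this.chatModel = chatModel;
        this.vectorStore = vectorStore;
    }
}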

Extract and load the dataset into the neo4j vector store

After configuring lots of stuff, let’s finally start coding something. Our movie recommender works similarly to a chatbot.

In this section, you’ll learn how to load a movie dataset into the vector database.

Firstly, download the reduced dataset file I’ve generated from the MovieLens dataset. You can also get the full dataset and transform it into JSON, but remember that it will take longer to process.

Then, copy the dataset JSON file to the classpath inside resources/datasets.

After all that, create the service that loads the data as tokens into the vector database:

@Service
public class DataLoaderService {
    //all-arg constructor

    @Value("classpath:/datasets/test.json")
    Resource movieDataset;

    final VectorStore vectorStore;

    public void load() {
        // read the JSON dataset into Spring AI Document objects
        DocumentReader reader = new JsonReader(movieDataset);
        var movies = reader.read();

        // split the documents into token-sized chunks
        TokenTextSplitter tokenTextSplitter = new TokenTextSplitter();
        var tokenizedDocs = tokenTextSplitter.apply(movies);

        // embed the chunks and write them to the neo4j vector store
        vectorStore.accept(tokenizedDocs);
    }
}

The service injects the JSON file as a Spring Resource and the VectorStore bean configured with the properties defined in application.yml. Spring AI also accepts other file formats, such as plain text, PDF, DOCX, and HTML, as the sketch below shows for plain text.
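For example, if the dataset were plain text instead of JSON, only the reader would change and the rest of the pipeline would stay the same. A minimal sketch, assuming a hypothetical movies.txt file (TextReader and JsonReader both ship with the core Spring AI library):

// hypothetical plain-text dataset on the classpath
@Value("classpath:/datasets/movies.txt")
Resource plainTextDataset;

public List<Document> readPlainText() {
    // swap JsonReader for TextReader; the splitter and vector store usage stay the same
    DocumentReader reader = new TextReader(plainTextDataset);
    return reader.read();
}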

Back in DataLoaderService, load() instantiates a JsonReader, reads the content of the movieDataset file, and splits it into token-sized chunks using TokenTextSplitter’s apply(). Finally, it uses VectorStore’s accept() to embed the chunks and save them in neo4j.
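The article triggers load() through an HTTP endpoint later on. If you prefer to populate the vector store once at application startup instead, a small runner like the following would also work; this is a hedged alternative, not part of the original code:

@Configuration
public class DataLoaderRunner { // hypothetical configuration class

    @Bean
    ApplicationRunner loadMovieDataset(DataLoaderService dataLoaderService) {
        // runs once after the application context starts and fills the vector store
        return args -> dataLoaderService.load();
    }
}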

Use similarity search to improve movie recommendation

With the database populated with vectors, it’s possible to execute similarity search queries against them. Let’s see how that works:

@Service
public class MovieRecommendationService {
    //all-arg constructor

    final OllamaChatModel ollamaChatClient;

    final VectorStore vectorStore;

    static final String INSTRUCTIONS_PROMPT = """
        You're a movie recommendation system. Recommend 5 movies on `movie_genre`=%s.
        Write the final recommendation using the following template:
        Movie Name:
        Synopsis:
        Cast:
        """;

    static final String EXAMPLES_PROMPT = """
        Use movie tags and ratings and `movie_genre` to recommend the best movies based on `movies_list`.
        `movies_list`:

        %s
        """;

    static final String SIMILARITY_PROMPT = """
        Use the `documents` below to provide accurate answers, but act as if you know this information.
        `documents`

        {documents}
        """;

    public String recommend(String genre, List<String> movies) {
        // join the user's movie history into a single `movies_list` block
        var moviesCollected = movies.stream()
            .collect(joining("\n`movie_name`=", "\n", ""));

        var generalInstructions = new UserMessage(String.format(INSTRUCTIONS_PROMPT, genre));
        var examplesSystemMessage = new SystemMessage(String.format(EXAMPLES_PROMPT, moviesCollected));

        // retrieve the documents most similar to the user message from the vector store
        var similarDocuments = vectorStore.similaritySearch(generalInstructions.getContent());
        var documentsMessage = similarDocuments.stream().map(Document::getContent).collect(joining(","));

        // stuff the retrieved documents into a system message and build the final prompt
        var similaritySystemMessage = new SystemPromptTemplate(SIMILARITY_PROMPT).createMessage(Map.of("documents", documentsMessage));
        var prompt = new Prompt(List.of(generalInstructions, examplesSystemMessage, similaritySystemMessage));

        return ollamaChatClient.call(prompt)
            .getResult()
            .getOutput()
            .getContent();
    }
}

The service above starts by injecting the beans it needs: the Ollama chat model and the neo4j vector store.

Then, we create 3 prompts:

  1. The INSTRUCTIONS_PROMPT gives basic instructions about the output format and the LLM’s mission.
  2. The EXAMPLES_PROMPT uses the user's movie history to tell the LLM to suggest similar movies.
  3. The SIMILARITY_PROMPT is used to append the contextual information obtained from the vector store.

The recommend() method first creates a user message with the general instructions prompt and a system message with the user’s movie history. Then, it queries the vector store using similaritySearch() to obtain information similar to the prompt content. Finally, it appends the retrieved documents to the Prompt object and calls the ollamaChatClient.

The similarity search works like the vector distance you’ve learned in linear algebra classes. Given the vectors stored in neo4j, we calculate how far away each of them is from the vectorized user input; if the distance is small, the texts behind the vectors should be semantically similar. Embedding projectors are an excellent way to visualize word distance in the vector space, and the sketch below shows the idea behind one common measure, cosine similarity.
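A tiny, self-contained illustration (not from the article’s code) of the cosine similarity between two embedding vectors; values close to 1 mean the texts behind the vectors are closely related:

// plain Java illustration of cosine similarity between two embedding vectors
static double cosineSimilarity(float[] a, float[] b) {
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.length; i++) {
        dot += a[i] * b[i];     // dot product
        normA += a[i] * a[i];   // squared length of a
        normB += b[i] * b[i];   // squared length of b
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}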

The distance function used by the search depends on how you configure the vector store. In Neo4j, this is the spring.ai.vectorstore.neo4j.distance-type property; one example value is cosine, which uses the cosine distance.
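You can also tune how much context the search returns. As a hedged sketch (the exact fluent method names vary across Spring AI versions), recommend() could pass a SearchRequest instead of a raw string to limit the results and filter out weak matches:

// hedged sketch: keep only the 5 most similar documents scoring at least 0.7
var similarDocuments = vectorStore.similaritySearch(
        SearchRequest.query(generalInstructions.getContent())
                .withTopK(5)
                .withSimilarityThreshold(0.7));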

Also, the results from the similarity search are placed in a system message because the system, not the end-user, owns that information. As developers of the system, we write the prompt, query the vector database using similarity search, and stuff the results into the prompt to get better answers with the RAG technique.

Expose the movie recommendation API

After creating the service that generates recommendations based on the user input, we can create the API to return movie recommendations to the end-user.

Firstly, create the request object of the API:

public class MovieRecommendationRequest {
    //getters, setters, constructors
    @JsonProperty("genre")
    String genre;

    @JsonProperty("movies")
    List<String> movies;
}

Also, create the response:

public class MovieRecommendationResponse {
    //getters, setters, constructors
    String message;
}

Then, create the resource /movies:

@RestController
@RequiredArgsConstructor
@RequestMapping("/movies")
public class MovieRecommendationController {

    final MovieRecommendationService movieRecommendationService;

    final DataLoaderService dataLoaderService;

    // endpoint used later in the article to (re)load the dataset into the vector store
    @PostMapping("/reload_datasets")
    public void reloadDatasets() {
        dataLoaderService.load();
    }

    @PostMapping("/recommend")
    public MovieRecommendationResponse recommend(@RequestBody MovieRecommendationRequest request) {
        if (request.getGenre() == null || request.getGenre().isEmpty() || request.getMovies() == null || request.getMovies().isEmpty()) {
            throw new IllegalArgumentException("Parameters genre and movies are mandatory to recommend movies");
        }

        var message = movieRecommendationService.recommend(request.getGenre(), request.getMovies());

        return new MovieRecommendationResponse(message);
    }
}

The /recommend resource validates that the genre and movies parameters are present. Then, it calls the previously created MovieRecommendationService.recommend() to build the response. The /reload_datasets resource simply delegates to DataLoaderService.load().
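By default, the IllegalArgumentException thrown above surfaces as an HTTP 500. If you want a cleaner 400 response, a small controller advice like the hypothetical one below (not part of the original article) can translate it:

@RestControllerAdvice
class RecommendationExceptionHandler { // hypothetical class name

    // map the validation failure thrown by the controller to an HTTP 400 with the message as body
    @ExceptionHandler(IllegalArgumentException.class)
    ResponseEntity<String> handleBadRequest(IllegalArgumentException ex) {
        return ResponseEntity.badRequest().body(ex.getMessage());
    }
}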

Test movie recommendation interactions with the LLM

Now, let’s test some interactions with the service we created!

It’s important to mention that the application consumes a lot of hardware resources. Your computer might face out-of-memory errors or freezes if you don’t have a dedicated GPU. Additionally, the response times of the APIs, especially the data loader, can be high because tokenizing and embedding the whole dataset is expensive. Yeah, sometimes AI is not that exciting.

Anyway, the first thing to do is call the /reload_datasets endpoint to tokenize the dataset and save the embeddings in the vector store:

curl --location --request POST 'http://localhost:8080/movies/reload_datasets' \
--data ' '

Then, you can call the /recommend route with a set of movies that you’ve watched and a genre to start getting recommendations:

curl --location 'http://localhost:8080/movies/recommend' \
--header 'Authorization: Bearer null' \
--header 'Content-Type: application/json' \
--data '{
    "genre": "thriller",
    "movies": [
        "Heat",
        "Training Day",
        "Eyes Wide Shut"
    ]
}'

The response will also consider the dataset you loaded into the vector store, not only the LLM’s built-in knowledge. This is a nice feature because you can customize the responses to your needs. For instance, our movie recommender prioritizes the movies that our dataset marks as good recommendations based on their ratings and tags.

Conclusion

In this article, you’ve learned at a high level what the Retrieval Augmented Generation technique is and how we can use it to improve LLM responses with contextual information.

You also learned how to implement a movie recommendation system that leverages the RAG technique using Spring AI, the neo4j vector store, and the MovieLens dataset.
