Building a Neo4j Recommendation Engine

Recommendation engines are one of the most practical real-world applications of graph databases. Whether it is Netflix suggesting movies, Amazon recommending products or LinkedIn proposing new connections, these systems rely heavily on traversing relationships between entities.

This is exactly where relational databases often begin to struggle. While SQL databases are exceptional for transactional systems and structured business data, relationship-heavy queries quickly become difficult to maintain and expensive to execute at scale.

In this article, we will build a simple Neo4j recommendation engine using Spring Boot, Docker and Cypher while also exploring practical engineering challenges encountered during implementation.

Why Graph Databases Matter for Recommendation Engines

For decades, relational databases have been the default choice for backend systems. However, recommendation engines expose one of their biggest weaknesses: recursive relationship traversal.

Imagine the following query:

“Find users who bought the same products as Alice, then recommend products those users purchased that Alice has not seen yet.”

In SQL, this often requires:

  • multiple JOINs
  • self-referencing many-to-many tables
  • deeply nested queries
  • expensive index scans

As the depth of traversal increases, the queries become increasingly difficult to read and optimize.

Graph databases approach this problem differently.

Neo4j stores relationships as first-class citizens. Instead of computing relationships dynamically through JOIN operations, nodes directly reference their neighbors through a mechanism called index-free adjacency.

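Because each hop follows a stored reference instead of performing a lookup against a global index, the cost of a traversal depends mainly on how much of the graph the query touches rather than on the total size of the dataset.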
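To make the idea of index-free adjacency concrete, here is a toy Java model contrasting the two lookup styles. It is not Neo4j's actual storage engine, and all class and method names are invented for illustration. Real relational databases use indexes rather than the full scan shown here, so the first method overstates their cost; the point is only that a join-style lookup depends on the table, while following stored references depends on the node's own neighbors.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (hypothetical names): join-table lookup versus direct neighbor references. */
class AdjacencySketch {

    /** Relational style: one row per purchase. */
    static final class PurchaseRow {
        final String user;
        final String product;

        PurchaseRow(String user, String product) {
            this.user = user;
            this.product = product;
        }
    }

    /** Finds a user's products by searching the purchase rows (unindexed scan for simplicity). */
    static List<String> productsViaJoinTable(List<PurchaseRow> rows, String user) {
        List<String> result = new ArrayList<>();
        for (PurchaseRow row : rows) {            // work grows with the number of rows
            if (row.user.equals(user)) {
                result.add(row.product);
            }
        }
        return result;
    }

    /** Graph style: each node keeps direct references to its neighbors. */
    static final class Node {
        final String name;
        final List<Node> purchased = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }
    }

    /** Finds a user's products by following the references stored on the node. */
    static List<String> productsViaAdjacency(Node user) {
        List<String> result = new ArrayList<>();
        for (Node product : user.purchased) {     // work grows only with this node's degree
            result.add(product.name);
        }
        return result;
    }
}
```

Both methods return the same products; what differs is what the amount of work scales with.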

In practical terms:

  • relational databases optimize rows
  • graph databases optimize relationships

For recommendation engines, fraud detection, knowledge graphs and social networks, this architectural difference is extremely powerful.

Why We Chose Neo4j

Neo4j is one of the most widely used graph databases and comes with several advantages for backend engineers:

  • intuitive Cypher query language
  • strong visualization tooling
  • mature Java ecosystem
  • official Spring Boot integrations
  • excellent Docker support

Most importantly, Cypher queries resemble natural graph thinking.

Instead of describing complex JOIN logic, you simply describe relationships.

For example:

MATCH (u:User {name: $name})-[:PURCHASED]->(p:Product)<-[:PURCHASED]-(other:User)-[:PURCHASED]->(rec:Product)
WHERE NOT (u)-[:PURCHASED]->(rec)
RETURN DISTINCT rec.name AS recommendation

Even without deep Neo4j knowledge, the traversal logic is relatively readable.

This is one of the biggest strengths of graph databases.

Designing the Neo4j Recommendation Engine

The recommendation engine itself is intentionally simple.

We model:

  • users
  • products
  • purchase relationships

The graph structure looks like this:

(User)-[:PURCHASED]->(Product)

The recommendation algorithm works as follows:

  1. Find products purchased by the target user
  2. Find other users who purchased the same products
  3. Find additional products purchased by those users
  4. Exclude products already purchased by the original user
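To make the four steps concrete, here is a minimal in-memory sketch in plain Java. It is an illustration only: the project runs this logic as a single Cypher query inside Neo4j, and the class name, method name and the map-based data layout below are assumptions made for the example.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** In-memory illustration of the four steps (hypothetical names, not the project's code). */
class CollaborativeFilterSketch {

    /**
     * @param purchases maps each user name to the set of product names that user bought
     * @param target    the user to compute recommendations for
     */
    static Set<String> recommend(Map<String, Set<String>> purchases, String target) {
        // Step 1: products purchased by the target user
        Set<String> own = purchases.getOrDefault(target, Collections.<String>emptySet());

        Set<String> recommendations = new LinkedHashSet<>();
        for (Map.Entry<String, Set<String>> entry : purchases.entrySet()) {
            if (entry.getKey().equals(target)) {
                continue;
            }
            Set<String> theirs = entry.getValue();

            // Step 2: keep only users who share at least one product with the target
            if (Collections.disjoint(own, theirs)) {
                continue;
            }

            // Steps 3 and 4: collect their products, skipping those the target already has
            for (String product : theirs) {
                if (!own.contains(product)) {
                    recommendations.add(product);
                }
            }
        }
        return recommendations;
    }
}
```

With Alice owning a laptop and a mouse, Bob owning a laptop and a keyboard, and Carol owning unrelated items, calling recommend(purchases, "Alice") yields only the keyboard: Bob overlaps with Alice, Carol does not.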

This collaborative filtering pattern is one of the most common recommendation techniques.

The entire traversal is handled directly inside Neo4j using Cypher instead of implementing nested loops in Java.

This is an important engineering decision.

Many developers accidentally treat graph databases like relational databases by moving traversal logic into application code. This defeats much of the benefit of using a graph database in the first place.

The backend service should remain thin while the graph engine performs the heavy relationship traversal internally.

Docker-First Development Strategy

One practical challenge during development was the lack of a local Java environment.

Instead of installing and configuring:

  • JDK
  • Maven
  • Gradle
  • environment variables

we decided to fully containerize the application.

This turned out to be a much better long-term engineering decision.

The host machine only required Docker. Everything else was isolated inside containers.

We used a multi-stage Docker build:

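# Stage 1: build the jar inside a Gradle + JDK 17 image (tests are skipped here)
FROM gradle:8-jdk17 AS build
WORKDIR /home/gradle/src
COPY --chown=gradle:gradle . .
RUN gradle build --no-daemon -x test

# Stage 2: copy only the built jar into a slimmer JRE image
FROM eclipse-temurin:17-jre-focal
WORKDIR /app
# Note: the wildcard must match exactly one jar. Recent Spring Boot Gradle plugin
# versions also emit a *-plain.jar by default, so that task may need to be disabled.
COPY --from=build /home/gradle/src/build/libs/*.jar app.jar
ENTRYPOINT ["java", "-jar", "app.jar"]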

This provides several major advantages:

  • identical development and production environments
  • simplified onboarding
  • reproducible builds
  • cleaner CI/CD pipelines
  • explicit dependency management

A new developer can simply run:

docker-compose up

and immediately start working.

This approach dramatically reduces the classic “works on my machine” problem.

Container Networking Challenges

One of the most common Docker mistakes occurred during setup. Initially, the Spring Boot application attempted to connect to Neo4j using:

localhost:7687

Inside Docker containers, localhost refers to the current container itself – not other containers.

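Since no Neo4j instance was listening inside the application container, the connection failed at startup. (An UnknownHostException, by contrast, is what typically appears when a service name such as neo4j-db is used but the two containers do not share a network.)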

The solution was to place both services inside the same Docker network and communicate using service names.

Example:

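services:
  neo4j-db:
    container_name: neo4j-db
    networks:
      - graph-net

  app:
    environment:
      - SPRING_NEO4J_URI=bolt://neo4j-db:7687
    networks:
      - graph-net

# Custom networks referenced by services must also be declared at the top level.
networks:
  graph-net: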

Docker Compose automatically provides internal DNS resolution between services. This is one of the most important concepts in containerized backend systems.
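The pattern behind that environment variable can be sketched in a few lines of plain Java. Spring Boot already maps SPRING_NEO4J_URI onto the spring.neo4j.uri property on its own, so the project needs no such code; the class below is a hypothetical illustration of the mechanism, with the localhost fallback chosen as an assumption for running outside Docker.

```java
import java.net.URI;
import java.util.Map;

/** Hypothetical helper: pick the Neo4j address from the environment, else fall back to localhost. */
class Neo4jUriSketch {

    /** Assumed default for running the application directly on the host machine. */
    static final String DEFAULT_URI = "bolt://localhost:7687";

    static URI resolve(Map<String, String> env) {
        String configured = env.get("SPRING_NEO4J_URI");
        if (configured == null || configured.trim().isEmpty()) {
            return URI.create(DEFAULT_URI);
        }
        // Inside Compose this is e.g. bolt://neo4j-db:7687, resolved by Docker's internal DNS
        return URI.create(configured.trim());
    }
}
```

Calling Neo4jUriSketch.resolve(System.getenv()) returns the Compose service address when the variable is set and the localhost address otherwise, which is why the same jar can run in both places without code changes.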

Spring Boot Transaction Pitfalls

Another interesting issue appeared during integration with Spring Data Neo4j.

Spring Boot auto-configured:

  • TransactionManager
  • ReactiveTransactionManager

When using @Transactional, Spring could not determine which transaction manager to use and threw:

NoUniqueBeanDefinitionException

The fix was straightforward:

// "transactionManager" is the bean name of the imperative (non-reactive) manager
@Transactional("transactionManager")
public void seedData() {
    ...
}

The broader lesson: explicit configuration is often safer than relying entirely on framework magic.

As systems become more complex and involve multiple databases or paradigms, explicit transaction boundaries become critical for correctness and maintainability.

Why We Used Neo4jClient Instead of Repositories

Spring Data repositories are excellent for standard CRUD operations. However, recommendation engines often require highly customized graph traversals.

For this reason, we used:

  • Neo4jClient
  • raw Cypher queries

instead of relying entirely on Object Graph Mapping abstractions.

This provided:

  • more control
  • better query visibility
  • easier optimization
  • simpler debugging

For graph-heavy applications, this hybrid approach (repositories for simple CRUD, Neo4jClient with raw Cypher for traversals) is often preferable.

Integration Testing with Real Graph Traversals

One major engineering decision was avoiding mocks for graph testing. Mocking a graph database removes most of the value of testing relationship traversal behavior.

Instead, we introduced a dedicated Dockerized test runner:

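services:
  test-runner:
    image: gradle:8-jdk17
    working_dir: /home/gradle/src    # the gradle image defaults to /home/gradle
    volumes:
      - .:/home/gradle/src
    environment:
      - SPRING_NEO4J_URI=bolt://neo4j-db:7687
    networks:
      - graph-net                    # must share a network with neo4j-db
    command: gradle test --no-daemon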

The tests performed:

  • graph seeding
  • real Cypher execution
  • API validation
  • traversal verification

This allowed us to detect:

  • invalid Cypher syntax
  • transaction issues
  • networking problems
  • incorrect traversal logic

Testing against a real graph database provides significantly higher confidence than testing against mocked repositories.
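One practical detail with this kind of setup is that the tests can start before the database accepts connections. The article does not say how the project handles this, so the following is only a sketch under that assumption: a small stdlib-only probe (hypothetical class and method names) that retries a TCP connection to the Bolt port before the tests begin.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical readiness probe: waits until a TCP port accepts connections. */
class PortProbeSketch {

    /**
     * Tries to connect up to {@code attempts} times, pausing between failed attempts.
     * Returns true as soon as one connection succeeds, false otherwise.
     */
    static boolean waitForPort(String host, int port, int attempts, long pauseMillis) {
        for (int i = 0; i < attempts; i++) {
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(host, port), 1000); // 1 s connect timeout
                return true;
            } catch (IOException notReadyYet) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException interrupted) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and give up
                    return false;
                }
            }
        }
        return false;
    }
}
```

A test setup could call waitForPort("neo4j-db", 7687, 30, 1000) before seeding the graph. An open port only shows that the server is listening, not that it is fully ready, so a database-level health check remains the more reliable option where one is available.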

When Graph Databases Make Sense

Graph databases are not replacements for relational databases. They are specialized tools optimized for highly connected data.

Neo4j makes the most sense when:

  • relationships are central to the domain
  • traversals become complex
  • JOIN-heavy queries dominate
  • recommendation systems are required
  • knowledge graphs are involved

Typical use cases include:

  • recommendation engines
  • fraud detection
  • social networks
  • AI knowledge graphs
  • dependency mapping
  • supply chain analysis

For standard CRUD business systems, PostgreSQL or MySQL are often still the better choice.

Final Thoughts

Building a Neo4j recommendation engine demonstrates one of the clearest strengths of graph databases: expressing complex relationship traversal in a natural and efficient way.

The project also highlighted several practical backend engineering lessons:

  • containerized development environments
  • Docker networking
  • Spring transaction management
  • graph query optimization
  • integration testing strategies

Most importantly, it showed how moving traversal complexity into the database engine itself can dramatically simplify backend application logic.

As AI systems, recommendation engines and knowledge graphs continue growing in importance, graph databases are becoming increasingly valuable tools for backend engineers to understand.

You can check the project in this GitHub repository.
