Sanjay Ghosh
Build a RAG Pipeline in Java (Text → Vector → LLM, No Paid APIs)

Ever asked an LLM a question about your own data and received an incorrect or generic answer?

That’s because Large Language Models (LLMs) don’t know your private data.

In this article, we’ll build a complete Retrieval-Augmented Generation (RAG) pipeline using:

  • Java
  • PostgreSQL (with vector support)
  • Ollama (local LLM + embeddings)

πŸ‘‰ No OpenAI / No paid APIs
πŸ‘‰ Fully local
πŸ‘‰ Practical and production-relevant

🧠 What is RAG?

Retrieval-Augmented Generation (RAG) is an architecture that improves LLM responses by:

  1. Retrieving relevant data from a knowledge source
  2. Passing that data to the LLM
  3. Generating an answer grounded in that context

In simple terms:

Instead of guessing, the model first looks up relevant information and then answers.

πŸ” Why Do We Need RAG?

LLMs are powerful, but they have limitations:

  • ❌ They don’t know your private/company data
  • ❌ Their knowledge is static
  • ❌ They can hallucinate

RAG solves this by combining:

  • Your data (database)
  • Smart retrieval (vector search)
  • LLM reasoning (generation)

πŸ“Š RAG Flow (This Project)

We will implement this pipeline:
Text β†’ Embedding β†’ Store in DB

Query β†’ Embedding
↓
Vector Search (Top K)
↓
Pass to LLM
↓
Final Answer

βš™οΈ Prerequisites

1. Install PostgreSQL

Make sure PostgreSQL is installed and running, and that the pgvector extension is available; it provides the VECTOR column type and the distance operators we use below.

2. Install Ollama (Local LLM)

sudo apt-get install zstd
curl -fsSL https://ollama.com/install.sh | sh

3. Pull Required Models

# LLM (for answer generation)
ollama pull llama3

# Embedding model
ollama pull nomic-embed-text

4. Verify Installation

ollama run llama3

If it responds, you’re ready.

🟒 Phase 1: Indexing (Store Data)

In this phase, we:

  1. Convert text β†’ vector (embedding)
  2. Store it in PostgreSQL

Why Embeddings?

Embeddings convert text into numbers so we can measure similarity.
Example:
"OAuth authentication"
β†’ [0.12, -0.98, 0.45, ...]

Database Table

-- Enable pgvector (provides the VECTOR type)
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE text_embeddings (
    id SERIAL PRIMARY KEY,
    content TEXT,
    embedding VECTOR(768) -- nomic-embed-text produces 768-dimensional vectors
);
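For a small demo a sequential scan is fine, but on larger tables an approximate-nearest-neighbour index keeps the Top-K search fast. A hedged example, assuming pgvector 0.5+ (which adds HNSW support):

```sql
-- Optional: speeds up nearest-neighbour search on large tables
CREATE INDEX ON text_embeddings USING hnsw (embedding vector_l2_ops);
```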

Key Class: EmbeddingService.java

  • Calls Ollama’s embeddings endpoint
  • Converts text → vector

Snippet

// Request body, e.g. using org.json: {"model": "nomic-embed-text", "prompt": text}
JSONObject body = new JSONObject()
        .put("model", "nomic-embed-text")
        .put("prompt", text);

ClassicHttpResponse response = (ClassicHttpResponse) Request.post("http://localhost:11434/api/embeddings")
        .bodyString(body.toString(), ContentType.APPLICATION_JSON)
        .execute()
        .returnResponse();

This returns a numerical vector representation of the input text, which we store in the database.

Key Class: StorageService.java

Stores text + embedding into PostgreSQL

PreparedStatement ps = conn.prepareStatement(
    "INSERT INTO text_embeddings (content, embedding) VALUES (?, ?::vector)"
);

ps.setString(1, text);   // the original text chunk
ps.setString(2, vector); // its embedding as a pgvector literal, e.g. "[0.12,-0.98,...]"

ps.executeUpdate();

Each piece of text is stored along with its vector representation.
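One detail the snippet glosses over: the ?::vector cast expects the embedding as a pgvector text literal such as [0.12,-0.98,0.45]. A minimal sketch of that conversion (the class and method names here are my own, not from the repo):

```java
import java.util.StringJoiner;

public class PgVectorFormat {
    // Formats a double[] as a pgvector literal: "[0.1,0.2,0.3]"
    public static String toPgVector(double[] embedding) {
        StringJoiner joiner = new StringJoiner(",", "[", "]");
        for (double v : embedding) {
            joiner.add(Double.toString(v));
        }
        return joiner.toString();
    }
}
```

The resulting string is what gets bound with ps.setString(2, vector) above.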

πŸ”΅ Phase 2: Query (RAG Flow)

Step 1: User Query

"What is OAuth?"

Step 2: Convert Query β†’ Embedding

Same process as storing text.

Step 3: Retrieve Relevant Data

SELECT content
FROM text_embeddings
ORDER BY embedding <-> ?::vector
LIMIT 3;
πŸ‘‰ This finds the most similar text chunks

Key Class: Retriever.java

This is the R (Retrieval) in RAG.

PreparedStatement ps = conn.prepareStatement(
    """
    SELECT content
    FROM text_embeddings
    ORDER BY embedding <-> ?::vector
    LIMIT ?
    """
);

ps.setString(1, vector); // the query embedding as a pgvector literal
ps.setInt(2, topK);      // how many chunks to retrieve

ResultSet rs = ps.executeQuery();
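From here, the rows are read out of the ResultSet and joined into a single context block for the prompt. A sketch under the same assumptions (the class and method names are my own, not from the repo):

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class ContextBuilder {
    // Collects the "content" column of each Top-K row into a list of chunks
    public static List<String> collectChunks(ResultSet rs) throws SQLException {
        List<String> chunks = new ArrayList<>();
        while (rs.next()) {
            chunks.add(rs.getString("content"));
        }
        return chunks;
    }

    // Joins the chunks into one context block for the prompt
    public static String buildContext(List<String> chunks) {
        return String.join("\n", chunks);
    }
}
```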

Step 4: Generate Answer Using LLM

We pass retrieved data to the LLM:

Context:
OAuth 2.0 is an authorization framework...

Question:
What is OAuth?

πŸ‘‰ The LLM generates a clean answer.

Key Class: LLMService.java

This is the G (Generation) in RAG.
Passing Context to the LLM

String prompt = """
        Answer briefly in 2-3 sentences.

        Context:
        %s

        Question:
        %s
        """.formatted(context, query);

We inject retrieved data into the prompt so the LLM generates grounded answers.
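The prompt is then POSTed to Ollama’s /api/generate endpoint (same HTTP-client pattern as the embedding call); the reply carries the answer in its response field. A minimal sketch of building the request body by hand, without a JSON library:

```java
public class GeneratePrompt {
    // Minimal JSON string escaping (backslashes, quotes, newlines)
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    // Body for POST http://localhost:11434/api/generate
    // "stream": false asks Ollama for a single JSON reply instead of a token stream
    public static String generateBody(String model, String prompt) {
        return "{\"model\":\"" + escape(model)
                + "\",\"prompt\":\"" + escape(prompt)
                + "\",\"stream\":false}";
    }
}
```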

πŸ§ͺ Sample Output
--- Retrieved Context ---
OAuth 2.0 is an authorization framework.
JWT is used for secure authentication.

--- Final Answer ---
OAuth is an authorization framework used to grant access to resources...

🧠 What’s Really Happening?

This is the most important part to understand:

| Component | Role                       |
| --------- | -------------------------- |
| Database  | Stores knowledge           |
| Retriever | Finds relevant information |
| LLM       | Generates the answer       |

πŸ‘‰ The LLM does NOT retrieve data
πŸ‘‰ The database does NOT generate answers

πŸ’» Full Code

The project includes:

  • EmbeddingService.java
  • StorageService.java
  • Retriever.java
  • LLMService.java
  • RAGApp.java
  • pom.xml

πŸ‘‰ GitHub Repository

https://github.com/knowledgebase21st/Software-Engineering/tree/dev/AI/RAG

πŸš€ Why This Approach is Powerful

  • Works with your own data
  • Reduces hallucination
  • Fully offline (with Ollama)
  • Production-ready pattern

βœ… Conclusion

We built a complete RAG pipeline using Java, PostgreSQL, and Ollama.

This approach combines:

  • Your data
  • Smart retrieval
  • LLM reasoning

Result:
Accurate, context-aware answers using your own knowledge base.
