Ever asked an LLM a question about your own data and received an incorrect or generic answer?
That's because Large Language Models (LLMs) don't know your private data.
In this article, we'll build a complete Retrieval-Augmented Generation (RAG) pipeline using:
- Java
- PostgreSQL (with vector support)
- Ollama (local LLM + embeddings)
- No OpenAI / no paid APIs
- Fully local
- Practical and production-relevant
What is RAG?
Retrieval-Augmented Generation (RAG) is an architecture that improves LLM responses by:
- Retrieving relevant data from a knowledge source
- Passing that data to the LLM
- Generating an answer grounded in that context
In simple terms:
Instead of guessing, the model first looks up relevant information and then answers.
Why Do We Need RAG?
LLMs are powerful, but they have limitations:
- They don't know your private/company data
- Their knowledge is static
- They can hallucinate
RAG solves this by combining:
- Your data (database)
- Smart retrieval (vector search)
- LLM reasoning (generation)
RAG Flow (This Project)
We will implement this pipeline:
Text → Embedding → Store in DB
Query → Embedding → Vector Search (Top K) → Pass to LLM → Final Answer
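The query-time flow can be sketched end to end in Java. This is a minimal, hypothetical orchestration: the embed, retrieve, and generate stages are passed in as plain functions standing in for the services built later in the article, so none of the names here come from the actual project.

```java
import java.util.List;
import java.util.function.Function;

public class RagPipeline {
    // End-to-end flow: embed the query, retrieve top-K chunks,
    // build a grounded prompt, and generate the final answer.
    static String answer(String query,
                         Function<String, float[]> embed,
                         Function<float[], List<String>> retrieve,
                         Function<String, String> generate) {
        float[] queryVector = embed.apply(query);             // Query -> Embedding
        List<String> chunks = retrieve.apply(queryVector);    // Vector Search (Top K)
        String prompt = "Context:\n" + String.join("\n", chunks)
                + "\n\nQuestion:\n" + query;                  // Pass to LLM
        return generate.apply(prompt);                        // Final Answer
    }
}
```

Keeping each stage behind a function makes the services swappable: you can change the embedding model or the vector store without touching the pipeline itself.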
Prerequisites
1. Install PostgreSQL
Make sure PostgreSQL is installed and running.
2. Install Ollama (Local LLM)
sudo apt-get install zstd
curl -fsSL https://ollama.com/install.sh | sh
3. Pull Required Models
# LLM (for answer generation)
ollama pull llama3
# Embedding model
ollama pull nomic-embed-text
4. Verify Installation
ollama run llama3
If it responds, you're ready.
Phase 1: Indexing (Store Data)
In this phase, we:
- Convert text → vector (embedding)
- Store it in PostgreSQL
Why Embeddings?
Embeddings convert text into numbers so we can measure similarity.
Example:
"OAuth authentication"
→ [0.12, -0.98, 0.45, ...]
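Similarity between two such vectors is typically measured with cosine similarity (pgvector's <-> operator uses Euclidean distance by default, but the idea is the same: closer vectors mean more similar text). A minimal sketch in plain Java:

```java
public class CosineSimilarity {
    // Cosine similarity: dot(a, b) / (|a| * |b|); 1.0 means identical direction.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] a = {0.12, -0.98, 0.45};
        double[] b = {0.10, -0.95, 0.50};
        System.out.println(cosine(a, b)); // close to 1.0 for similar vectors
    }
}
```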
Database Table
The VECTOR type comes from the pgvector extension, and the dimension (768) matches the output size of nomic-embed-text. Enable the extension before creating the table:
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE text_embeddings (
    id SERIAL PRIMARY KEY,
    content TEXT,
    embedding VECTOR(768)
);
Key Class: EmbeddingService.java
- Calls Ollama's embeddings endpoint
- Converts text → vector
Snippet
ClassicHttpResponse response = (ClassicHttpResponse) Request.post("http://localhost:11434/api/embeddings")
.bodyString(body.toString(), ContentType.APPLICATION_JSON)
.execute()
.returnResponse();
This returns a numerical vector representation of the input text, which we store in the database.
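The /api/embeddings endpoint returns a JSON object whose "embedding" field holds the vector. Here is a minimal, dependency-free sketch of pulling that array out of the response body (the project itself may use a JSON library instead; the class name is illustrative):

```java
import java.util.Arrays;

public class EmbeddingParser {
    // Extracts the "embedding" array from an Ollama /api/embeddings
    // response such as {"embedding":[0.12,-0.98,0.45]}.
    static double[] parseEmbedding(String json) {
        int start = json.indexOf('[', json.indexOf("\"embedding\""));
        int end = json.indexOf(']', start);
        String[] parts = json.substring(start + 1, end).split(",");
        return Arrays.stream(parts)
                .mapToDouble(s -> Double.parseDouble(s.trim()))
                .toArray();
    }

    public static void main(String[] args) {
        double[] v = parseEmbedding("{\"embedding\":[0.12,-0.98,0.45]}");
        System.out.println(v.length); // 3
    }
}
```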
Key Class: StorageService.java
Stores text + embedding into PostgreSQL
PreparedStatement ps = conn.prepareStatement(
"INSERT INTO text_embeddings (content, embedding) VALUES (?, ?::vector)"
);
ps.setString(1, text);
ps.setString(2, vector);
ps.executeUpdate();
Each piece of text is stored along with its vector representation.
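JDBC has no native pgvector type, which is why the INSERT casts a string with ?::vector. The bound string must be a pgvector literal like [0.12,-0.98,0.45]; a small hypothetical helper to build one:

```java
import java.util.StringJoiner;

public class VectorLiteral {
    // Formats a float array as a pgvector literal, e.g. [0.12,-0.98,0.45],
    // suitable for binding to a ?::vector parameter via setString.
    static String toVectorLiteral(float[] v) {
        StringJoiner sj = new StringJoiner(",", "[", "]");
        for (float f : v) sj.add(Float.toString(f));
        return sj.toString();
    }

    public static void main(String[] args) {
        System.out.println(toVectorLiteral(new float[]{0.12f, -0.98f, 0.45f}));
    }
}
```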
Phase 2: Query (RAG Flow)
Step 1: User Query
"What is OAuth?"
Step 2: Convert Query → Embedding
Same process as storing text.
Step 3: Retrieve Relevant Data
SELECT content
FROM text_embeddings
ORDER BY embedding <-> ?::vector
LIMIT 3;
This finds the most similar text chunks: <-> is pgvector's distance operator, so ordering by it returns the nearest vectors first.
Key Class: Retriever.java
This is the R (Retrieval) in RAG.
PreparedStatement ps = conn.prepareStatement(
"""
SELECT content
FROM text_embeddings
ORDER BY embedding <-> ?::vector
LIMIT ?
"""
);
ps.setString(1, vector);
ps.setInt(2, topK);
ResultSet rs = ps.executeQuery();
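Once the ResultSet comes back, the retrieved rows need to be merged into a single context block for the prompt. A small sketch of that step (the numbering is an illustrative choice, not from the project):

```java
import java.util.List;

public class ContextAssembler {
    // Joins retrieved chunks into one numbered context block,
    // ready to be injected into the LLM prompt.
    static String assemble(List<String> chunks) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < chunks.size(); i++) {
            sb.append(i + 1).append(". ").append(chunks.get(i)).append("\n");
        }
        return sb.toString().strip();
    }

    public static void main(String[] args) {
        System.out.println(assemble(List.of(
                "OAuth 2.0 is an authorization framework.",
                "JWT is used for secure authentication.")));
    }
}
```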
Step 4: Generate Answer Using LLM
We pass retrieved data to the LLM:
Context:
OAuth 2.0 is an authorization framework...
Question:
What is OAuth?
The LLM generates a clean answer.
Key Class: LLMService.java
This is the G (Generation) in RAG.
Passing Context to the LLM
String prompt = """
Answer briefly in 2-3 sentences.
Context:
%s
Question:
%s
""".formatted(context, query);
We inject retrieved data into the prompt so the LLM generates grounded answers.
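The completed prompt is then posted to Ollama's /api/generate endpoint. Below is a minimal sketch of building that request body by hand, with basic JSON escaping for the prompt; the project may use a JSON library instead, and "stream": false asks Ollama for a single response rather than a token stream.

```java
public class GenerateRequest {
    // Builds the JSON body for Ollama's /api/generate endpoint.
    static String buildBody(String model, String prompt) {
        return "{\"model\":\"" + model + "\",\"prompt\":\"" + escape(prompt)
                + "\",\"stream\":false}";
    }

    // Minimal JSON string escaping for characters a prompt is likely to contain.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        System.out.println(buildBody("llama3", "Question:\nWhat is OAuth?"));
    }
}
```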
Sample Output
--- Retrieved Context ---
OAuth 2.0 is an authorization framework.
JWT is used for secure authentication.
--- Final Answer ---
OAuth is an authorization framework used to grant access to resources...
What's Really Happening?
This is the most important part to understand:
Component | Role
Database | Stores knowledge
Retriever | Finds relevant information
LLM | Generates answer
The LLM does NOT retrieve data.
The database does NOT generate answers.
Full Code
The project includes:
- EmbeddingService.java
- StorageService.java
- Retriever.java
- LLMService.java
- RAGApp.java
- pom.xml
GitHub Repository
https://github.com/knowledgebase21st/Software-Engineering/tree/dev/AI/RAG
Why This Approach is Powerful
- Works with your own data
- Reduces hallucination
- Fully offline (with Ollama)
- Production-ready pattern
Conclusion
We built a complete RAG pipeline using Java, PostgreSQL, and Ollama.
This approach combines:
- Your data
- Smart retrieval
- LLM reasoning
Result:
Accurate, context-aware answers using your own knowledge base.