Late Afternoon, Real System, Real Confusion
The office was quieter than usual. Most people had already left, but one corner still had life in it.
A whiteboard covered in boxes, arrows, and half-erased notes.
And in front of it, Arjun.
A few weeks into his internship, he had reached that stage where things no longer looked simple… but also didn’t fully make sense yet.
Behind him, leaning on the desk with a coffee mug that had clearly been refilled too many times, stood Maya, the senior systems engineer.
She watched him stare at the diagram for a while before speaking.
“Alright,” she said, calm and direct. “You’ve been staring at that same box for five minutes. What’s bothering you?”
Arjun didn’t turn immediately. He pointed at the whiteboard.
A box labeled:
Redis
“I get that this is for speed,” he said slowly. “Like… we put it in front of the database so things don’t get slow.”
Maya nodded. “Good. That’s the surface-level answer. Keep going.”
Arjun hesitated, then turned.
“But it feels… fake.”
Maya raised an eyebrow. “Fake?”
“Yeah,” he said. “Like it’s not real storage. It’s just… temporary. So why are we trusting it at all?”
Maya smiled.
“Good,” she said. “Now we can actually start.”
What Redis Actually Is (Explained Like You Mean It)
Maya walked to the whiteboard and drew two boxes.
[ Redis ] -------- [ Database ]
(RAM) (Disk)
She tapped the first box.
“Redis lives in memory. RAM. That’s why it’s fast.”
Then she tapped the second.
“The database lives on disk. That’s why it’s slower, but reliable.”
Arjun nodded. “Yeah, I get that part.”
“No,” Maya said, shaking her head slightly. “You understand the words. Not the implication.”
She turned back and wrote:
Redis = Speed, Not Permanence
“Everything about Redis is optimized for one thing: responding fast,” she continued. “Not guaranteeing your data will exist forever.”
Arjun frowned slightly. “So… it can lose data?”
“It will lose data,” Maya corrected.
She started listing on the board:
- Memory limit → eviction
- Crash → possible data loss
- Restart → partial recovery
Then she turned.
“So if you think of Redis as your database, you’ve already made a mistake.”
The First Mental Breakthrough
Arjun crossed his arms, thinking.
“So then what is it actually?”
Maya didn’t answer immediately. Instead, she erased a small section and drew this:
User → App → Redis → Database
“Most requests stop here,” she said, pointing to Redis.
Then she added:
User → App → Redis ❌ (miss) → Database → Redis → User
“This is what happens when Redis doesn’t have the data.”
She stepped back.
“Redis is not the source of truth,” she said.
“It’s a temporary, fast copy of reality.”
Arjun repeated it quietly.
“…temporary copy.”
“Exactly.”
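The two flows on the board can be sketched in a few lines. This is only an illustration: plain dictionaries stand in for Redis and the database, and `get_article` and `db_reads` are made-up names, not part of any real client library.

```python
# Stand-ins for illustration only: dicts instead of real Redis/database clients.
database = {"article:123": "Hello, world"}  # source of truth
cache = {}                                   # temporary, fast copy
db_reads = 0                                 # counts how often we hit the database


def get_article(key):
    """Cache-aside read: try the cache first, fall back to the database on a miss."""
    global db_reads
    if key in cache:
        return cache[key]        # hit: the request stops at the cache
    db_reads += 1
    value = database.get(key)    # miss: go to the source of truth
    if value is not None:
        cache[key] = value       # repopulate the cache for the next reader
    return value


first = get_article("article:123")   # miss, reads the database
second = get_article("article:123")  # hit, served from the cache
```

The second call never touches the database, which is the whole point of the pattern.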
Why We Even Need Redis
Arjun turned back to the board.
“Okay, but why not just make the database faster?”
Maya laughed softly.
“Everyone asks that at some point.”
She drew two timelines.
RAM access → nanoseconds
Disk access → milliseconds
Then she circled them.
“This difference is massive,” she said. “You don’t ‘optimize’ your way out of physics.”
She drew another diagram:
Without Redis:
User → App → Database (every request)
Then:
With Redis:
User → App → Redis (most requests)
↓
Database (rare)
“Redis exists so your database doesn’t collapse under load.”
Where Things Start Getting Dangerous
Arjun nodded, following along.
“Okay… so we cache stuff. Makes sense.”
Maya tilted her head slightly.
“That works when data doesn’t change often,” she said. “But what about things that change constantly?”
She wrote:
- page views
- likes
- active users
Then added:
“Where do those updates happen?”
Arjun answered quickly.
“In Redis… because it’s fast.”
Maya nodded.
“And now you’ve just moved the problem.”
The Fear Kicks In
She wrote on the board:
views:article:123 → 10,482
“This number keeps increasing,” she said.
Arjun nodded.
Then she asked:
“What happens if Redis runs out of memory?”
Arjun paused.
“…it deletes something?”
“Yes.”
“And if that key gets deleted?”
“…we lose the count.”
Maya crossed her arms.
“Now you see the problem.”
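A toy model makes the loss concrete. The sketch below assumes a least-recently-used eviction policy and a cache that holds only two keys; what a real Redis instance evicts depends on its configured `maxmemory-policy`, and `TinyLRUCache` is a hypothetical class written just for this example.

```python
from collections import OrderedDict


class TinyLRUCache:
    """Toy cache with a fixed capacity that evicts the least recently used key."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def incr(self, key):
        self.data[key] = self.data.get(key, 0) + 1
        self.data.move_to_end(key)            # mark as most recently used
        while len(self.data) > self.capacity:
            self.data.popitem(last=False)     # evict the least recently used key
        return self.data[key]


cache = TinyLRUCache(capacity=2)
for _ in range(3):
    cache.incr("views:article:123")           # counter reaches 3
cache.incr("views:article:456")
cache.incr("views:article:789")               # over capacity: article:123 is evicted

# The counter silently restarts because the old value is gone.
count_after_eviction = cache.incr("views:article:123")
```

Nothing raised an error, and the three earlier views are simply missing.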
The Second Mental Shift: Not All Data Is Equal
Arjun leaned back slightly.
“So we shouldn’t store important stuff there.”
“Exactly,” Maya said. “But let’s define ‘important’ properly.”
She split the board into two sections.
Left Side
Critical Data
- payments
- orders
- balances
She underlined it.
“Lose this, and your system is broken.”
Right Side
High-Speed Data
- analytics
- counters
- sessions
“Lose a bit of this?” she shrugged. “System still works.”
“Now the rules change,” she said.
Two Ways to Write Data
Maya drew two flows.
1. Write-Through
App → Database → Redis
“Safe,” she said. “Slower, but correct.”
2. Write-Behind
App → Redis → (later) Database
“Fast,” she said. “But risky.”
Arjun looked at both.
“So we just pick one?”
Maya shook her head.
“No. We use both. Based on the data.”
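Here is a minimal sketch of the two flows side by side, following the order Maya drew them. Dictionaries stand in for Redis and the database, and `pending` is a hypothetical in-memory buffer for writes that have not reached the database yet.

```python
# Stand-ins for illustration only.
database = {}
cache = {}
pending = []  # writes accepted by the cache but not yet flushed to the database


def write_through(key, value):
    """Critical data: database first, then cache. Slower, but never ahead of the truth."""
    database[key] = value
    cache[key] = value


def write_behind(key, value):
    """High-speed data: cache now, database later. Fast, but at risk until flushed."""
    cache[key] = value
    pending.append((key, value))


def flush():
    """Apply the buffered writes to the database in arrival order."""
    while pending:
        key, value = pending.pop(0)
        database[key] = value


write_through("order:1", "paid")
write_behind("views:article:123", 10482)

in_db_before_flush = "views:article:123" in database  # not yet durable
flush()
in_db_after_flush = "views:article:123" in database   # now durable
```

The order is durable the moment `write_through` returns, while the view count only exists in memory until `flush` runs.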
The Trade-Off Nobody Escapes
She turned to him.
“You can’t have perfect speed and perfect safety at the same time.”
Arjun nodded slowly.
“Yeah… that makes sense.”
The Data Loss Gap (This One Matters)
Maya drew a timeline.
Time →
[ Redis updated ] ---- (delay) ---- [ DB updated ]
“This gap,” she said, tapping the space in between, “is where things can go wrong.”
Arjun leaned forward.
“If the system crashes there…”
“…you lose data,” Maya finished.
So We Add a Queue
She erased part of the board and drew a new flow:
App:
→ Update Redis
→ Push event → Queue
Then:
Worker:
→ Read Queue
→ Update Database
Arjun looked at it.
“So now even if Redis crashes…”
“The queue still has the data,” Maya said.
“And if the worker crashes?”
“It resumes from the queue.”
Arjun nodded.
“…okay, that’s solid.”
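The app and worker flows might look roughly like this. It is a sketch under generous assumptions: a `deque` stands in for a durable queue (a real one would persist messages), and `record_view` and `run_worker` are illustrative names.

```python
from collections import deque

cache = {}
queue = deque()   # stand-in for a durable queue
database = {}


def record_view(article_id):
    """App side: update the fast copy, then push an event describing the change."""
    key = f"views:article:{article_id}"
    cache[key] = cache.get(key, 0) + 1
    queue.append({"key": key, "delta": 1})


def run_worker():
    """Worker side: apply each event to the database, then acknowledge it."""
    while queue:
        event = queue[0]  # peek without removing
        database[event["key"]] = database.get(event["key"], 0) + event["delta"]
        # Remove the event only after the write succeeded. A crash between the
        # write and this line would replay the event, so duplicates are possible.
        queue.popleft()


for _ in range(3):
    record_view(123)

cache.clear()   # simulate losing everything held in the cache
run_worker()    # the queue still carries all three updates
```

Even with the cache wiped, the database ends up with the correct count because the queue kept the events.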
When Things Go Wrong (And They Will)
Maya didn’t respond immediately.
Instead, she tilted her head slightly.
“Solid… but not perfect,” she said.
Arjun frowned. “What do you mean?”
“Walk it through again,” Maya said.
Arjun looked back at the board.
“Okay… we write to Redis, push to the queue…”
He paused.
“What if Redis evicts the key before the worker runs?”
Maya nodded. “Keep going.”
“And then… what if the worker fails before writing to the database?”
Now he stopped completely.
“…then the data never reaches the database.”
Maya crossed her arms.
“And Redis already lost it.”
They both looked at the board.
“So we still lose data,” Arjun said quietly.
“Exactly.”
Maya uncapped the marker again.
She drew it out slowly.
1. Write → Redis
2. Push → Queue
3. Redis evicts key ❌
4. Worker fails ❌
5. Data never reaches DB
She stepped back.
“This,” she said, “is the kind of failure that doesn’t crash your system.”
Arjun frowned.
“…it just loses data.”
“Exactly.”
The Subtle Problem With “Safe Enough”
Maya crossed her arms.
“The queue reduces risk,” she said. “But it doesn’t eliminate it.”
Arjun nodded slowly.
“So what do we do?”
Three Things Real Systems Add
Maya held up three fingers.
“Once you reach this level, you start thinking about three things.”
1. Idempotent Writes
“If the worker retries the same update twice,” she said, “nothing should break.”
She wrote:
Bad:
INCR views → duplicates possible
Better:
SET views = value with version
“You design your writes so repeating them is safe.”
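One way to read "SET with version" is sketched below. The version number, the `(version, value)` tuple layout, and the `apply_versioned_set` helper are all assumptions made for illustration; the idea is only that an update which has already been applied becomes a no-op.

```python
database = {}  # key -> (version, value)


def apply_versioned_set(key, version, value):
    """Apply the write only if it is newer than what is already stored."""
    current = database.get(key)
    if current is not None and current[0] >= version:
        return False               # same or older version: safe to ignore
    database[key] = (version, value)
    return True


event = {"key": "views:article:123", "version": 7, "value": 10482}

first = apply_versioned_set(**event)   # applied
second = apply_versioned_set(**event)  # retry of the same event: ignored
```

Replaying the event leaves the stored value untouched, whereas replaying an increment would have counted the same views twice.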
2. Assume the Queue Lies
Arjun blinked. “The queue… lies?”
Maya smiled slightly.
“Not maliciously. But it might deliver the same message twice. Or later than expected.”
She underlined it.
Always assume at-least-once delivery
“Your system must handle duplicates.”
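A common way to tolerate duplicates is to give every event an ID and remember which IDs have been applied. The sketch below assumes events carry an `id` field and keeps the seen IDs in an in-memory set; in a real system that record would need to live somewhere durable, ideally updated together with the data.

```python
database = {"views:article:123": 0}
processed_ids = set()  # assumption: in practice this is stored durably


def handle(event):
    """Apply an event at most once, even if the queue delivers it repeatedly."""
    if event["id"] in processed_ids:
        return False                       # duplicate delivery: skip it
    database[event["key"]] += event["delta"]
    processed_ids.add(event["id"])
    return True


deliveries = [
    {"id": "evt-1", "key": "views:article:123", "delta": 1},
    {"id": "evt-2", "key": "views:article:123", "delta": 1},
    {"id": "evt-1", "key": "views:article:123", "delta": 1},  # redelivered
]

results = [handle(event) for event in deliveries]
```

Three deliveries arrive, but only two distinct events are counted.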
3. Retry Properly
“What if the worker fails?” she continued.
“Retry?” Arjun said.
“Yes. But not blindly.”
She wrote:
- Retry with delay
- Exponential backoff
- Dead-letter queue (failed forever)
“If you don’t handle failures properly,” she said, “they disappear quietly.”
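The three bullets can be combined into one small helper. This is a sketch, not a production retry library: `process_with_retry`, its default limits, and the list used as a dead-letter queue are all illustrative choices, and the `sleep` parameter exists only so the delays can be observed without actually waiting.

```python
import time

dead_letter_queue = []  # events that failed every attempt, kept for inspection


def process_with_retry(event, write, max_attempts=4, base_delay=0.01, sleep=time.sleep):
    """Try a write several times, doubling the delay after each failure."""
    for attempt in range(max_attempts):
        try:
            write(event)
            return True
        except Exception:
            if attempt == max_attempts - 1:
                break                              # out of attempts
            sleep(base_delay * (2 ** attempt))     # 0.01, 0.02, 0.04, ...
    dead_letter_queue.append(event)                # never drop a failure silently
    return False


# A write that fails twice and then succeeds.
attempts = {"n": 0}
delays = []


def flaky_write(event):
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("database unavailable")


def broken_write(event):
    raise ConnectionError("database unavailable")


ok = process_with_retry({"id": "evt-1"}, flaky_write, sleep=delays.append)
failed = process_with_retry({"id": "evt-2"}, broken_write, sleep=lambda seconds: None)
```

The flaky write recovers after two growing delays, while the permanently broken one lands in the dead-letter queue where someone can see it.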
The Final Realization
Arjun leaned back.
“So even the ‘safe’ design isn’t actually safe.”
Maya nodded.
“Nothing is perfectly safe,” she said. “You just keep reducing risk.”
Updating the Mental Model
Maya walked back to the board one last time and added a small note next to the system:
[ Redis ] → fast but fragile
[ Queue ] → safer, but needs retries
[ Database ] → final truth
Then underneath it, she wrote:
Design for failure, not success
Arjun stared at the board.
“…this got complicated fast.”
“Yeah,” she said. “Welcome to distributed systems.”
But There’s Still a Subtle Problem
Maya smiled slightly.
“But there’s always one more problem.”
She drew:
1. Redis loses key
2. User requests data
3. App checks DB
4. DB is slightly outdated
Arjun’s eyes narrowed.
“So the user sees old data.”
“Exactly.”
Read Repair (The Quiet Fix)
Maya added one more layer:
Cache miss →
Fetch DB →
Check queue →
Merge updates →
Return →
Update Redis
Arjun blinked.
“So reading actually fixes the data?”
“Yep.”
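The read path Maya drew could be sketched like this. It leans on a big assumption: that the app can inspect updates still waiting in the queue, which many real message queues do not allow. Here a plain list called `pending_events` plays that role, and `read_views` is an illustrative name.

```python
cache = {}                                   # the key was evicted, so the cache is empty
database = {"views:article:123": 10480}      # slightly behind
pending_events = [                           # updates not yet applied to the database
    {"key": "views:article:123", "delta": 1},
    {"key": "views:article:123", "delta": 1},
    {"key": "views:article:456", "delta": 1},
]


def read_views(key):
    """On a cache miss, merge the database value with pending updates and re-cache it."""
    if key in cache:
        return cache[key]
    value = database.get(key, 0)                                              # fetch DB
    value += sum(e["delta"] for e in pending_events if e["key"] == key)       # merge updates
    cache[key] = value                                                        # update the cache
    return value


views = read_views("views:article:123")
```

The reader sees the merged count even though the database has not caught up yet. The database itself is left alone here; bringing it up to date is still the worker's job.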
Everything Comes Together
Maya stepped back and rewrote the system cleanly:
[ Redis ]
↓
User → App → Cache Layer
↓
[ Queue ]
↓
[ Database ]
She pointed at each part.
“Redis gives you speed.”
“Queue gives you safety.”
“Database gives you truth.”
The Final Understanding
Arjun looked at the board for a long moment.
Then he said quietly:
“So Redis isn’t the system.”
Maya smiled.
“It’s part of the system.”
He nodded.
“And the database…”
“…is reality,” she finished.
The One Sentence That Matters
Maya picked up the marker one last time and wrote:
Move fast in Redis, commit safely through queues, trust the database.
Arjun read it once.
Then again.
This time, the diagram didn’t feel confusing.
It felt… inevitable.
If You Walk Away Remembering This
Maya capped the marker and turned.
“Before you leave,” she said, “tell me what you learned.”
Arjun didn’t hesitate this time.
- Redis is fast but temporary
- Data there can disappear
- Important data goes to the database first, then to Redis
- Fast-changing data can go to Redis first, backed by a queue
- Queues make things safer, but not perfect
- Everything is a trade-off
Maya nodded.
“Good. Now you’re not just using Redis anymore.”
Arjun looked back at the board one last time.
“It’s not just about making it fast,” he said.
Maya shook her head slightly.
“It’s about knowing where it can break.”
She picked up her coffee.
“That’s when you start thinking like a systems engineer.”
Top comments (2)
the write-behind + queue pattern is the part most tutorials skip. one gotcha worth calling out: when the queue worker fails to write to the database and the key has already been evicted from redis, you need idempotent writes and at-least-once delivery semantics, otherwise a crash at exactly the wrong moment still loses data silently. the read repair step you mentioned at the end helps catch stale reads, but it doesn't help with data that was never flushed at all. have you run into queue retry strategies in practice for this kind of setup?
Yeah that’s a really good callout.
I actually went back and added a section to the article after reading your comment, covering exactly that failure window where Redis evicts the key and the worker fails before the DB write.
And you’re right, in that case the data never makes it anywhere durable, so read repair can’t help at all.
From what I understand so far, that’s where things like idempotent writes, assuming at-least-once delivery, and proper retry strategies (backoff, DLQ, etc.) start becoming necessary. I haven’t implemented a full retry system in practice yet, but that’s exactly what I’m planning to experiment with next.
Appreciate you pointing it out. It definitely pushed both the article and my understanding a level deeper.