This is a submission for the Notion MCP Challenge
I'm 24. I dropped out. I'm building an AI startup from Addis Ababa, Ethiopia.
I built Arlo in 9 days because I kept thinking about a specific number: 253 million people with vision loss navigate the web the same way every single time - from zero, with no memory of what helped them before. Every visit. Every site. From scratch.
Notion MCP is what finally made a real solution possible.
## The Problem Nobody Talks About
A sighted person lands on a flight booking page and within 3 seconds they know: there's a search bar at the top, filters on the left, results in the middle. Three seconds.
A blind user with a screen reader starts from the top and listens. Every navigation link. Every cookie banner. Every decorative image. Every sponsored result. On a site like Kayak, that's often 200+ elements before a single fare. And every visit starts from zero - the screen reader has no memory of what helped last time.
I built Arlo because that's not good enough.
## What I Built
Arlo is an AI companion that gives visually impaired users the same 3-second superpower sighted people have.
You tell Arlo what you want to do. Arlo reads the entire page and tells you exactly what matters - in natural spoken language. Like a trusted friend who can see the screen.
But here's what makes Arlo different from every other accessibility tool:
Arlo remembers you. And that memory lives in Notion.
Every visit, Arlo learns. It learns that you always pick the cheapest option. It learns that on Amazon you skip sponsored results. It learns that the SSA website has a confusing dropdown on step 3 that catches people off guard. All of that gets saved to your personal Notion database - structured, readable, yours to own and edit.
The next visit, Arlo opens with: "I remember you've been here before. Last time you were looking for Delta flights and picked the 7am option - want me to head straight there?"
That's not a screen reader. That's a companion.
## Video Demo
Live: https://arlo.arcumet.com
Try it yourself: paste any URL, speak or type your goal, and Arlo guides you.
## The Flow
### 1. You say what you want
Type it or speak it. Arlo uses GLM-ASR for voice - accurate across accents.
### 2. Arlo reads the entire page
Not static HTML parsing - GLM Web Reader fully renders the page, JavaScript included. React apps, SPAs, Google Flights, Twitter - all work.
### 3. Notion memory is checked
Before analyzing, Arlo queries your Notion database: "What do I know about this domain? What has this user done here before?" That context shapes everything.
### 4. Arlo speaks
Not a list of elements. Arlo says: "You're on Amazon search results. Based on what I remember, you prefer under $100 and skip sponsored results. The first non-sponsored option is the Soundcore Q20i at $59.99."
After the visit, new learnings are written back to Notion via MCP. The loop closes.
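For a concrete sense of the memory-read step, here is a minimal sketch of the low-latency REST query, assuming a memory database with a `Domain` rich-text property and recency-sorted entries. The property name, page size, and `readMemory` helper are illustrative, not Arlo's actual schema:

```javascript
// Build the body for POST /v1/databases/{id}/query.
// "Domain" is a hypothetical rich_text property on the memory database.
function buildMemoryQuery(domain) {
  return {
    filter: {
      property: "Domain",
      rich_text: { equals: domain },
    },
    sorts: [{ timestamp: "last_edited_time", direction: "descending" }],
    page_size: 5, // only the most recent learnings shape the prompt
  };
}

// Reads go straight over REST (not MCP) to keep latency low.
async function readMemory(databaseId, domain, token) {
  const res = await fetch(
    `https://api.notion.com/v1/databases/${databaseId}/query`,
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${token}`,
        "Notion-Version": "2022-06-28",
        "Content-Type": "application/json",
      },
      body: JSON.stringify(buildMemoryQuery(domain)),
    }
  );
  return res.json();
}
```

Filtering by domain keeps the context window small: Arlo only injects the handful of entries relevant to the site the user is actually on.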
## How I Used Notion MCP
Notion isn't a feature in Arlo. Notion is Arlo's brain.
Without Notion, Arlo is just another AI tool that forgets you the moment you close the tab. With Notion MCP, Arlo becomes something that grows with you - a companion that gets better every single time you use it.
### The MCP integration loop
```
User visits page
   ↓
Arlo queries Notion MCP: "What do I know about this domain?"
   ↓
GLM-4.6 analyzes page + goal + memory context
   ↓
Arlo speaks guidance (Hume Octave ultra-realistic TTS)
   ↓
New insights written back to Notion via MCP
   ↓
Next visit: Arlo already knows you
```
Every memory entry is a full rich Notion page - not just a database row. Heading blocks, bullet context, a callout explaining what was learned, linked back to the source page. The user can open Notion and read exactly what Arlo knows about them, edit it, or delete it. Transparent, human-readable memory they own.
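As a sketch, composing one of those rich entries amounts to building Notion block objects - a heading, a context bullet, and a callout linking back to the source. The heading text, emoji, and `buildMemoryBlocks` helper are illustrative, not Arlo's actual code:

```javascript
// Compose the children blocks for a memory-entry page.
// Block shapes follow the public Notion block object schema.
function buildMemoryBlocks({ domain, learning, sourceUrl }) {
  // Small helper: a rich_text array with optional link.
  const text = (content, link) => [
    { type: "text", text: { content, link: link ? { url: link } : null } },
  ];
  return [
    {
      object: "block",
      type: "heading_2",
      heading_2: { rich_text: text(`What Arlo learned on ${domain}`) },
    },
    {
      object: "block",
      type: "bulleted_list_item",
      bulleted_list_item: { rich_text: text(learning) },
    },
    {
      object: "block",
      type: "callout",
      callout: {
        icon: { type: "emoji", emoji: "🧠" },
        rich_text: text("Source page", sourceUrl),
      },
    },
  ];
}
```

Because these are ordinary Notion blocks, the resulting page is fully editable by the user - deleting a bullet literally deletes that piece of Arlo's memory.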
### The Notion MCP server integration
Arlo uses `@notionhq/notion-mcp-server` with stdio transport for all writes - the same MCP protocol that Claude Desktop, Cursor, and other AI tools use:
```javascript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Spawn the Notion MCP server as a subprocess
const transport = new StdioClientTransport({
  command: "node",
  args: [MCP_SERVER_BIN, "--transport", "stdio"],
  env: { NOTION_TOKEN: process.env.NOTION_API_KEY },
});

const client = new Client({ name: "arlo", version: "1.0.0" }, { capabilities: {} });
await client.connect(transport);

// Write memory via an MCP tool call - not the REST API
await client.callTool({ name: "API-post-page", arguments: { ... } });
```
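The elided `arguments` follow the shape of Notion's create-page endpoint, which the `API-post-page` tool wraps. A hedged sketch of what such a payload might look like - the env var, property names, and content here are placeholders, not Arlo's real schema:

```javascript
// Hypothetical payload for the API-post-page MCP tool, which wraps
// Notion's POST /v1/pages endpoint. Property names are illustrative.
const memoryPage = {
  parent: { database_id: process.env.ARLO_MEMORY_DB }, // placeholder env var
  properties: {
    Name: { title: [{ text: { content: "kayak.com - flight search" } }] },
    Domain: { rich_text: [{ text: { content: "kayak.com" } }] },
  },
  children: [
    {
      object: "block",
      type: "paragraph",
      paragraph: {
        rich_text: [{ text: { content: "User prefers the cheapest fare." } }],
      },
    },
  ],
};
// await client.callTool({ name: "API-post-page", arguments: memoryPage });
```

Going through the MCP tool rather than the REST API for writes means any MCP-speaking agent could produce the same memory entries against the same database.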
## Show me the code
GitHub: https://github.com/Garinmckayl/arlo
## Technical Stack
| Layer | What |
|---|---|
| Page reading | GLM Web Reader API - full JS rendering |
| Intelligence | GLM-4.6 with thinking mode |
| Vision | GLM-4.6V for screenshot analysis |
| Voice input | GLM-ASR-2512 |
| Voice output | Hume Octave TTS - ultra-realistic |
| Memory writes | Notion MCP (@notionhq/notion-mcp-server stdio) |
| Memory reads | Notion REST API (low-latency reads for live context) |
| Framework | Next.js 16, deployed on Vercel |
## Why This Matters
Most AI accessibility tools are built by people who don't need them, for a problem they've read about rather than felt. They work on clean demo sites and fall apart on the chaotic, JS-heavy, dark-pattern-filled reality of the actual web.
Arlo is built around the real failure mode: the web doesn't remember you, and that costs blind users enormous time and cognitive load on every single visit.
The Notion memory layer isn't a clever integration for the sake of a hackathon. It's the answer to a real question: if this tool is going to be useful long-term, it needs to get better with use, and the user needs to be able to trust and control what it knows about them.
Notion is the right answer. It's human-readable. It's editable. It's already where people organize their lives. And with MCP, it becomes a living brain that any AI tool can read from and write to.
Built in 9 days · Live at https://arlo.arcumet.com · GitHub
Top comments (4)
I think you have a broken github repo link.
The project is noble, I root for you to win this challenge :)
Appreciate it! Here is the GitHub: github.com/Garinmckayl/arlo
This is exactly the kind of project that makes me optimistic about AI accessibility. The "3-second superpower" framing is powerful โ it captures the speed gap between sighted and blind users navigating the visual world.
We're tackling a related problem at AnveVoice (anvevoice.app) โ making websites themselves accessible through voice interaction. Instead of relying on screen readers to interpret (often broken) HTML, our voice agent takes real DOM actions: clicks buttons, fills forms, navigates pages, all through natural speech. Sub-700ms latency in 50+ languages including 22 Indian languages.
The stat that keeps driving us: 96.3% of websites fail WCAG 2.1 AA standards. Rather than waiting for every website to fix their accessibility, we give users a voice layer that works on top of any site.
What's your approach to handling edge cases where visual context is ambiguous? That's where we've found the hardest problems โ when even sighted users aren't sure what a UI element does.