My $0/Month AI Stack for Building Developer Tools — Gemini, Ollama, and Apple Vision
I've shipped 7 Mac apps in the past year. Every app has AI features. My monthly AI infrastructure cost: $0.
Not "free trial with a credit card on file." Not "free tier that runs out." Actually zero, with no payment method attached to any of it.
Here's the exact stack, how each piece fits together, and the decision logic for choosing between them.
The Stack
1. Gemini API — Google AI Studio
What it is: REST API access to Google's Gemini models
Cost: Free tier (500 req/day on Gemini 2.5 Flash, no credit card)
Setup time: 2 minutes (sign in, click "Get API Key")
Best for tasks that need strong reasoning: debugging complex errors, analyzing documents, explaining code. Gemini 2.5 Flash is a thinking model, and it traces causal chains that smaller local models miss.
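For reference, here is a minimal sketch of what a request to the API looks like, using only the Rust standard library. It builds the URL and JSON body for a `generateContent` call but does not send anything; the function names `gemini_url`, `gemini_body`, and `json_escape` are made up for this example, and the HTTP client is left to you.

```rust
/// Escape a string so it can sit inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// URL of the `generateContent` method for the given model.
fn gemini_url(model: &str) -> String {
    format!(
        "https://generativelanguage.googleapis.com/v1beta/models/{}:generateContent",
        model
    )
}

/// JSON body carrying a single text prompt.
fn gemini_body(prompt: &str) -> String {
    format!(
        r#"{{"contents":[{{"parts":[{{"text":"{}"}}]}}]}}"#,
        json_escape(prompt)
    )
}

fn main() {
    // Send this as a POST with the headers
    // `Content-Type: application/json` and `x-goog-api-key: <your key>`.
    println!("POST {}", gemini_url("gemini-2.5-flash"));
    println!("{}", gemini_body("Explain this crash:\njava.lang.NullPointerException"));
}
```

The key travels in a header rather than in the URL, which keeps it out of most request logs.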
2. Ollama — Local LLMs
What it is: Run open-source language models locally on your machine
Cost: Free, open source, unlimited
Setup time: 5 minutes (brew install, pull a model)
```bash
brew install ollama
brew services start ollama       # start the local server before pulling
ollama pull gemma2               # 9B general model
ollama pull qwen2.5-coder:1.5b   # tiny, fast code model
```
Best for privacy-sensitive processing and code autocomplete. Nothing leaves your machine.
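Once the server is running, apps talk to it over plain HTTP on localhost. Below is a rough sketch of a call to Ollama's `/api/generate` endpoint using only the standard library. It assumes the default port 11434 and returns the raw HTTP response, headers included; `ollama_request` and `ask_ollama` are illustrative names, and a real app would more likely use an HTTP client crate.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Escape a string so it can sit inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::new();
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build a raw HTTP/1.1 request for a non-streaming completion.
fn ollama_request(model: &str, prompt: &str) -> String {
    let body = format!(
        r#"{{"model":"{}","prompt":"{}","stream":false}}"#,
        json_escape(model),
        json_escape(prompt)
    );
    format!(
        "POST /api/generate HTTP/1.1\r\nHost: localhost:11434\r\nContent-Type: application/json\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{}",
        body.len(),
        body
    )
}

/// Send the request to the local server and read the whole response.
fn ask_ollama(model: &str, prompt: &str) -> std::io::Result<String> {
    let mut stream = TcpStream::connect("127.0.0.1:11434")?;
    stream.write_all(ollama_request(model, prompt).as_bytes())?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(response)
}

fn main() {
    match ask_ollama("gemma2", "Why does `cargo build` say 'linker cc not found'?") {
        Ok(raw) => println!("{}", raw),
        Err(e) => eprintln!("Is the Ollama server running? {}", e),
    }
}
```

The connection never leaves 127.0.0.1, which is the whole point of this tier of the stack.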
3. Apple Vision Framework — On-Device OCR
What it is: macOS built-in text recognition engine
Cost: Free, ships with every Mac
Setup time: Zero (it's already there)
Best for extracting text from scanned PDFs and images. No API key, no network call, no cost. Callable from a Tauri app via a Swift sidecar binary.
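The sidecar pattern is simple from the Rust side: spawn the binary, pass it a file path, read stdout. The sketch below uses `std::process::Command` as a stand-in for Tauri's own sidecar API, and assumes a hypothetical Swift executable (called `./vision-ocr` here, not a real tool) that wraps the Vision framework and prints the recognized text.

```rust
use std::io;
use std::process::Command;

/// Run an OCR sidecar on one file and return what it printed to stdout.
/// `sidecar` is the path to the (hypothetical) Swift executable.
fn ocr_via_sidecar(sidecar: &str, input_path: &str) -> io::Result<String> {
    let output = Command::new(sidecar).arg(input_path).output()?;
    if !output.status.success() {
        // Surface the sidecar's stderr as the error message.
        let stderr = String::from_utf8_lossy(&output.stderr).into_owned();
        return Err(io::Error::new(io::ErrorKind::Other, stderr));
    }
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}

fn main() {
    // "./vision-ocr" is a placeholder name for illustration only.
    match ocr_via_sidecar("./vision-ocr", "scan.pdf") {
        Ok(text) => println!("{}", text),
        Err(e) => eprintln!("OCR failed: {}", e),
    }
}
```

Because the contract is just "path in, text out", the Swift side can change without touching the Rust code.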
The Decision Logic
```text
Is the data sensitive? (medical, legal, financial, corporate secrets)
  YES → Ollama or Apple Vision (local only, nothing leaves the machine)
  NO  → continue

Do you need strong reasoning? (debug complex errors, trace causality, analyze documents)
  YES → Gemini API
  NO  → Ollama (saves your daily quota)

Is it code autocomplete?
  YES → Ollama + qwen2.5-coder:1.5b (fast, private, free)

Is it OCR on Mac?
  YES → Apple Vision Framework (zero cost, zero setup)
```
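The rules above can be sketched as a single function. This is one possible encoding, not code from the apps: the `Task`, `Kind`, and `Backend` types are invented for illustration, and it assumes the task-specific checks (OCR, autocomplete) run before the sensitivity and reasoning checks.

```rust
#[derive(Debug, PartialEq)]
enum Backend {
    Gemini,
    Ollama,
    OllamaCoder, // Ollama with qwen2.5-coder:1.5b
    AppleVision,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Ocr,
    Autocomplete,
    Other,
}

struct Task {
    kind: Kind,
    sensitive: bool,
    needs_reasoning: bool,
}

/// Pick a backend for a task, following the decision list.
fn choose_backend(task: &Task) -> Backend {
    match task.kind {
        Kind::Ocr => Backend::AppleVision,
        Kind::Autocomplete => Backend::OllamaCoder,
        // Sensitive data never leaves the machine, even if reasoning would help.
        Kind::Other if task.sensitive => Backend::Ollama,
        Kind::Other if task.needs_reasoning => Backend::Gemini,
        // Everything else stays local to save the daily quota.
        Kind::Other => Backend::Ollama,
    }
}

fn main() {
    let crash_log = Task { kind: Kind::Other, sensitive: false, needs_reasoning: true };
    let contract = Task { kind: Kind::Other, sensitive: true, needs_reasoning: true };
    println!("{:?}", choose_backend(&crash_log));
    println!("{:?}", choose_backend(&contract));
}
```

Note that every local branch is safe for sensitive data, so only the Gemini branch needs the sensitivity check in front of it.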
How I Use Each One
Gemini API: HiyokoLogcat's one-click log diagnosis. Android crash traces benefit from Gemini's knowledge of Android internals. Logs go through a PII filter before sending.
Ollama (qwen2.5-coder:1.5b): Code autocomplete in VS Code via Continue.dev. Runs locally, fast, never sends code to any cloud.
Ollama (gemma2): HiyokoHelper's terminal error analysis. Terminal errors don't usually contain sensitive data, but running locally avoids any concern.
Apple Vision: HiyokoLogcat and HiyokoHelper's PDF OCR. Scanned documents stay on the machine.
The PII Problem
Free tier APIs may use submitted data for training. Before sending anything to Gemini, mask the sensitive parts:
```rust
use once_cell::sync::Lazy;
use regex::Regex;

// IPv4-shaped strings (does not check that each octet is <= 255).
static IP_RE: Lazy<Regex> = Lazy::new(|| {
    Regex::new(r"\b(?:\d{1,3}\.){3}\d{1,3}\b").unwrap()
});

// Email addresses.
static EMAIL_RE: Lazy<Regex> = Lazy::new(|| {
    Regex::new(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b").unwrap()
});

// Long base64-like runs, including any trailing `=` padding.
static TOKEN_RE: Lazy<Regex> = Lazy::new(|| {
    Regex::new(r"\b[A-Za-z0-9+/]{20,}={0,2}").unwrap()
});

/// Replace IPs, emails, and token-like strings with placeholders.
pub fn mask_pii(text: &str) -> String {
    let text = IP_RE.replace_all(text, "[IP]");
    let text = EMAIL_RE.replace_all(&text, "[EMAIL]");
    let text = TOKEN_RE.replace_all(&text, "[TOKEN]");
    text.into_owned()
}
```
Mask before sending. Always. Even on data that looks clean.
Hardware Reality
On an 8-year-old MacBook Air (Intel, 8GB RAM):
| Model | First token | Quality |
|---|---|---|
| qwen2.5-coder:1.5b | ~1s | Great for autocomplete |
| gemma2 (9B) | ~8s | Good for general tasks |
| llama3 (8B) | ~8s | Similar to gemma2 |
| 70B models | Not viable | Not enough RAM |
On Apple Silicon (M1/M2/M3), response times are significantly better. Unified memory is shared between the CPU and GPU, so larger models fit and run well, and larger models mean better local output quality.
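The "not enough RAM" verdict on 70B models comes down to rough arithmetic: weight size is about parameter count times bits per weight, divided by 8. The sketch below assumes 4-bit quantization and counts weights only, so real memory use (context cache, runtime overhead, the OS itself) is higher.

```rust
/// Rough size of a model's weights in gigabytes (10^9 bytes).
/// Ignores the context cache and runtime overhead.
fn weights_gb(params_billions: f64, bits_per_weight: f64) -> f64 {
    params_billions * bits_per_weight / 8.0
}

fn main() {
    // 4 bits per weight is an assumed quantization level.
    let models = [("qwen2.5-coder:1.5b", 1.5), ("gemma2 (9B)", 9.0), ("70B model", 70.0)];
    for (name, params) in models {
        println!("{}: ~{:.1} GB of weights at 4-bit", name, weights_gb(params, 4.0));
    }
}
```

A 9B model at about 4.5 GB squeezes into 8GB of RAM, slowly. A 70B model at about 35 GB does not fit at all.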
The "Bring Your Own Key" Model
For apps I distribute, I don't embed an API key. Users get their own free key from Google AI Studio and paste it into settings.
This means:
I pay nothing
Users pay nothing
Each user has their own quota (no shared limits)
Users feel ownership over the AI feature
The setup friction (2 minutes) filters out users who aren't genuinely interested. The users who set up their own key are more engaged with the feature.
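Pasted keys tend to arrive with stray whitespace or line breaks, so it helps to normalize them before saving. Here is a minimal sketch of that step; `normalize_key`, `redact`, and `KeyError` are hypothetical names, and the checks are deliberately loose because the exact key format is not something to rely on.

```rust
#[derive(Debug, PartialEq)]
enum KeyError {
    Missing,
    ContainsWhitespace,
}

/// Clean up a key the user pasted into settings.
fn normalize_key(pasted: &str) -> Result<String, KeyError> {
    let key = pasted.trim();
    if key.is_empty() {
        return Err(KeyError::Missing);
    }
    if key.chars().any(char::is_whitespace) {
        return Err(KeyError::ContainsWhitespace);
    }
    Ok(key.to_string())
}

/// Show at most the last four characters, for a settings screen or a log line.
fn redact(key: &str) -> String {
    let n = key.chars().count();
    let tail: String = key.chars().skip(n.saturating_sub(4)).collect();
    format!("****{}", tail)
}

fn main() {
    match normalize_key("  example-key-1234\n") {
        Ok(key) => println!("Saved key {}", redact(&key)),
        Err(e) => eprintln!("Invalid key: {:?}", e),
    }
}
```

Since the key belongs to the user, it should only ever be stored on their machine and never sent anywhere except the API itself.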
Scaling Up
This stack handles individual users comfortably. When does it break?
Gemini free tier: 500 req/day per API key. At scale, users with heavy usage will hit this. Solution: move to Vertex AI (paid), or keep the BYOK model so each user has their own quota.
Local models: No scaling issue — each user runs their own instance.
Apple Vision: No scaling issue — runs on the user's machine.
For a solo developer with hundreds of users: $0/month is sustainable for a long time.
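One way to soften the daily limit is to count requests on the client and fall back to a local model once the count is used up. The sketch below is an assumption-heavy illustration: `DailyQuota` is a made-up type, the counter lives in memory only, and the day boundary is UTC midnight, which may not match where the real quota resets.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Counts requests against a daily limit and resets when the day changes.
struct DailyQuota {
    limit: u32,
    used: u32,
    day: u64,
}

impl DailyQuota {
    fn new(limit: u32) -> Self {
        DailyQuota { limit, used: 0, day: 0 }
    }

    /// Record one request if there is quota left for `today`.
    /// `today` is a day number, e.g. days since the Unix epoch.
    fn try_consume(&mut self, today: u64) -> bool {
        if today != self.day {
            self.day = today;
            self.used = 0;
        }
        if self.used < self.limit {
            self.used += 1;
            true
        } else {
            false
        }
    }
}

fn main() {
    let today = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is before 1970")
        .as_secs()
        / 86_400;
    let mut quota = DailyQuota::new(500);
    if quota.try_consume(today) {
        println!("send to Gemini");
    } else {
        println!("quota used up, fall back to Ollama");
    }
}
```

Passing the day in as a parameter keeps the counter easy to test without touching the system clock.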
Hiyoko PDF Vault → https://hiyokoko.gumroad.com/l/HiyokoPDFVault
HiyokoLogcat (OSS) → github.com/hiyoyok/HiyokoLogcat
X → @hiyoyok

