My $0/Month AI Stack — Gemini, Ollama & Apple Vision

I've shipped 7 Mac apps in the past year. Every app has AI features. My monthly AI infrastructure cost: $0.

Not "free trial with a credit card on file." Not "free tier that runs out." Actually zero, with no payment method attached to any of it.

Here's the exact stack, how each piece fits together, and the decision logic for choosing between them.

The Stack

1. Gemini API — Google AI Studio

What it is: REST API access to Google's Gemini models
Cost: Free tier (500 req/day on Gemini 2.5 Flash, no credit card)
Setup time: 2 minutes (sign in, click "Get API Key")

Best for tasks that need strong reasoning: debugging complex errors, analyzing documents, explaining code. The thinking model traces causality chains that smaller local models miss.
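For a sense of how small the integration is, here is a minimal sketch of building a request using only the standard library. It assumes the public `generateContent` REST endpoint and the `gemini-2.5-flash` model id. The helper names (`json_escape`, `gemini_url`, `gemini_body`) are hypothetical and not taken from any of the apps above. Sending the request is left to whichever HTTP client you already use, with the key passed in the `x-goog-api-key` header.

```rust
/// Escape a string so it can sit inside a JSON string literal.
fn json_escape(input: &str) -> String {
    let mut out = String::with_capacity(input.len() + 2);
    for c in input.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters must be \u-escaped in JSON.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// URL for a single-shot text generation call (model id is an assumption).
fn gemini_url(model: &str) -> String {
    format!(
        "https://generativelanguage.googleapis.com/v1beta/models/{}:generateContent",
        model
    )
}

/// Smallest request body: one user turn with one text part.
fn gemini_body(prompt: &str) -> String {
    format!(
        r#"{{"contents":[{{"parts":[{{"text":"{}"}}]}}]}}"#,
        json_escape(prompt)
    )
}
```

POST `gemini_body(...)` to `gemini_url("gemini-2.5-flash")` with `Content-Type: application/json`. In a real app a JSON library would replace the hand-rolled escaping.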

2. Ollama — Local LLMs

What it is: Run open-source language models locally on your machine
Cost: Free, open source, unlimited
Setup time: 5 minutes (brew install, pull a model)

brew install ollama
ollama pull gemma2        # 9B general model
ollama pull qwen2.5-coder:1.5b  # tiny, fast code model

Best for privacy-sensitive processing and code autocomplete. Nothing leaves your machine.

3. Apple Vision Framework — On-Device OCR

What it is: macOS built-in text recognition engine
Cost: Free, ships with every Mac
Setup time: Zero (it's already there)

Best for extracting text from scanned PDFs and images. No API key, no network call, no cost. Callable from a Tauri app via a Swift sidecar binary.
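The sidecar pattern can be sketched from the Rust side as follows. This is an illustration under assumptions, not the code from the apps: it uses plain `std::process::Command` rather than Tauri's own sidecar API, and it assumes a hypothetical Swift helper binary that takes an image path as its only argument and prints the recognized text to stdout.

```rust
use std::io;
use std::path::Path;
use std::process::Command;

/// Run a helper binary on one image and return whatever text it prints.
/// `binary` stands in for a hypothetical Swift executable wrapping Apple Vision.
fn run_ocr_sidecar(binary: &Path, image: &Path) -> io::Result<String> {
    // Blocks until the helper exits; stdout and stderr are captured.
    let output = Command::new(binary).arg(image).output()?;

    if !output.status.success() {
        let stderr = String::from_utf8_lossy(&output.stderr);
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!("OCR helper failed: {}", stderr.trim()),
        ));
    }

    // Lossy conversion keeps going even if the helper emits invalid UTF-8.
    Ok(String::from_utf8_lossy(&output.stdout).trim().to_string())
}
```

Because the helper is a separate process, the image never passes through a network call, which is the property the decision logic relies on.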

The Decision Logic

Is the data sensitive?
(medical, legal, financial, corporate secrets)
  YES → Ollama or Apple Vision (local only, nothing leaves the machine)
  NO  → continue

Do you need strong reasoning?
(debug complex errors, trace causality, analyze documents)
  YES → Gemini API
  NO  → Ollama (saves your daily quota)

Is it code autocomplete?
  YES → Ollama + qwen2.5-coder:1.5b (fast, private, free)

Is it OCR on Mac?
  YES → Apple Vision Framework (zero cost, zero setup)
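The same routing can be written as one function. This is a sketch of the rules above, with hypothetical type names (`Task`, `Backend`). OCR and autocomplete are checked first only because they always stay local, so the order does not change any outcome.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Task {
    Ocr,
    Autocomplete,
    Reasoning,
    General,
}

#[derive(Debug, PartialEq, Clone, Copy)]
enum Backend {
    AppleVision,
    OllamaCoder,   // qwen2.5-coder:1.5b
    OllamaGeneral, // gemma2
    Gemini,
}

/// Pick a backend from the task type and whether the data is sensitive.
fn choose_backend(task: Task, sensitive: bool) -> Backend {
    match (task, sensitive) {
        // OCR and autocomplete are always local, sensitive or not.
        (Task::Ocr, _) => Backend::AppleVision,
        (Task::Autocomplete, _) => Backend::OllamaCoder,
        // Sensitive data never leaves the machine.
        (_, true) => Backend::OllamaGeneral,
        // Only non-sensitive reasoning tasks spend the Gemini quota.
        (Task::Reasoning, false) => Backend::Gemini,
        (Task::General, false) => Backend::OllamaGeneral,
    }
}
```

For example, `choose_backend(Task::Reasoning, true)` returns `Backend::OllamaGeneral`: sensitivity wins over reasoning strength.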

How I Use Each One

Gemini API: HiyokoLogcat's one-click log diagnosis. Android crash traces benefit from Gemini's knowledge of Android internals. Logs go through a PII filter before sending.

Ollama (qwen2.5-coder:1.5b): Code autocomplete in VS Code via Continue.dev. Runs locally, fast, never sends code to any cloud.

Ollama (gemma2): HiyokoHelper's terminal error analysis. Terminal errors don't usually contain sensitive data, but running locally avoids any concern.

Apple Vision: HiyokoLogcat and HiyokoHelper's PDF OCR. Scanned documents stay on the machine.

The PII Problem

Free tier APIs may use submitted data for training. Before sending anything to Gemini, mask the sensitive parts:

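use once_cell::sync::Lazy;
use regex::Regex;

// IPv4-shaped strings (octet ranges are not validated).
static IP_RE: Lazy<Regex> = Lazy::new(|| {
    Regex::new(r"\b(?:\d{1,3}\.){3}\d{1,3}\b").unwrap()
});
// Email addresses.
static EMAIL_RE: Lazy<Regex> = Lazy::new(|| {
    Regex::new(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b").unwrap()
});
// Long base64-looking runs (API keys, tokens), including any `=` padding.
// Deliberately broad: any other 20+ character alphanumeric run is masked too.
static TOKEN_RE: Lazy<Regex> = Lazy::new(|| {
    Regex::new(r"\b[A-Za-z0-9+/]{20,}={0,2}").unwrap()
});

/// Replace IPs, emails, and token-like strings with placeholders.
pub fn mask_pii(text: &str) -> String {
    let text = IP_RE.replace_all(text, "[IP]");
    let text = EMAIL_RE.replace_all(&text, "[EMAIL]");
    let text = TOKEN_RE.replace_all(&text, "[TOKEN]");
    text.into_owned()
}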

Mask before sending. Always. Even on data that looks clean.

Hardware Reality

On an 8-year-old MacBook Air (Intel, 8GB RAM):

Model	First token	Quality
qwen2.5-coder:1.5b	~1s	Great for autocomplete
gemma2 (9B)	~8s	Good for general tasks
llama3 (8B)	~8s	Similar to gemma2
70B models	Not viable	Not enough RAM

On Apple Silicon (M1/M2/M3) the picture is significantly better: unified memory lets larger models run well, so the quality you can get locally improves substantially.

The "Bring Your Own Key" Model

For apps I distribute, I don't embed an API key. Users get their own free key from Google AI Studio and paste it into settings.

This means:

I pay nothing
Users pay nothing
Each user has their own quota (no shared limits)
Users feel ownership over the AI feature

The setup friction (2 minutes) filters out users who aren't genuinely interested. The users who set up their own key are more engaged with the feature.

Scaling Up

This stack handles individual users comfortably. When does it break?

Gemini free tier: 500 req/day per API key. At scale, users with heavy usage will hit this. Solution: move to Vertex AI (paid), or keep the BYOK model so each user has their own quota.
Local models: No scaling issue — each user runs their own instance.
Apple Vision: No scaling issue — runs on the user's machine.
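One way to soften the daily limit is to count requests on the client and switch to a local model once the count is used up. The following is a minimal sketch of that idea, not something the apps above are confirmed to do. `DailyQuota` and `current_day` are hypothetical names, and resetting at the UTC day boundary is an assumption, since the provider's actual reset time may differ.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Counts requests against a per-day cap, resetting when the day changes.
struct DailyQuota {
    limit: u32,
    used: u32,
    day: u64,
}

impl DailyQuota {
    fn new(limit: u32) -> Self {
        Self { limit, used: 0, day: 0 }
    }

    /// Returns true and records the request if quota remains for `today`.
    fn try_consume(&mut self, today: u64) -> bool {
        if today != self.day {
            // New day: start counting from zero again.
            self.day = today;
            self.used = 0;
        }
        if self.used < self.limit {
            self.used += 1;
            true
        } else {
            false
        }
    }
}

/// Days since the Unix epoch in UTC (assumed reset boundary).
fn current_day() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs() / 86_400)
        .unwrap_or(0)
}
```

A caller would create `DailyQuota::new(500)`, call `try_consume(current_day())` before each Gemini request, and route to Ollama when it returns `false`.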

For a solo developer with hundreds of users: $0/month is sustainable for a long time.

Hiyoko PDF Vault → https://hiyokoko.gumroad.com/l/HiyokoPDFVault
HiyokoLogcat (OSS) → github.com/hiyoyok/HiyokoLogcat
X → @hiyoyok
