BOVO Digital
Tutorials · 10 min read

Tutorial: Gemma 4 Locally with Ollama + n8n — Your First 100% Free and Private AI Agent

You're paying OpenAI API fees for your n8n automations. Every workflow costs money. And your data goes to external servers. With Google Gemma 4 (Apache 2.0) + Ollama, run a frontier-level LLM for free locally and connect it to n8n in 20 minutes.

William Aklamavo

April 6, 2026


Before we start: why this setup changes the rules

On April 2, 2026, Google launched Gemma 4 under the Apache 2.0 license. This isn't a demo or a deliberately limited model released to lure developers. It's a frontier-level model — comparable to Claude Haiku and GPT-4o mini — available in four sizes (2B, 8B, 16B, 31B) and usable for free, locally, with no data sent externally.

Combined with Ollama (the local model runtime that exploded in popularity in 2025) and connected to n8n, this setup gives you:

  • Zero inference cost — no paid API
  • Absolute privacy — your data never leaves your machine
  • Unlimited throughput — no rate limits, no quotas
  • Native tool use — Gemma 4 natively supports tool calls (function calling) for your n8n agents

Here's the step-by-step tutorial. Duration: 20 minutes if you've never installed Ollama.


Prerequisites

Minimum hardware:

  • For Gemma 4 2B: 8 GB RAM (runs even on a 2022 laptop without GPU)
  • For Gemma 4 8B: 16 GB RAM or a dedicated GPU (NVIDIA 8 GB VRAM)
  • For Gemma 4 16B and 31B: dedicated GPU recommended (16-24 GB VRAM)

Software:

  • macOS, Linux or Windows 10/11
  • n8n installed locally or in the cloud (n8n.cloud, VPS with Docker)
  • 5 GB free disk space for the 2B model (15 GB for 8B)

If you don't have n8n yet, start with our tutorial to create your first AI agent with n8n — it walks you from installation to your first workflow.


Step 1: Install Ollama

Ollama is a runtime that simplifies running LLMs locally. It handles model downloads and quantization automatically and serves a local HTTP API.

On macOS / Linux:

curl -fsSL https://ollama.com/install.sh | sh

On Windows: Download the installer from ollama.com and run it. Ollama installs as a Windows service and starts automatically.

Verify installation:

ollama --version
# → ollama version 0.3.x or higher

Step 2: Download Gemma 4

# 2B version — recommended for machines without GPU
ollama pull gemma4:2b

# 8B version — better quality, requires 16 GB RAM or GPU
ollama pull gemma4:8b

Download takes 5 to 15 minutes depending on your connection (2-5 GB depending on version).


Step 3: Test Gemma 4 locally

Before integrating it into n8n, verify the model works correctly:

ollama run gemma4:2b

You'll enter an interactive chat. Type a few questions to test response quality. To exit: /bye

Tool use test (function calling):

curl http://localhost:11434/api/chat -d '{
  "model": "gemma4:2b",
  "messages": [{ "role": "user", "content": "What is the weather in Paris?" }],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Gets weather for a city",
      "parameters": {
        "type": "object",
        "properties": { "city": { "type": "string" } },
        "required": ["city"]
      }
    }
  }]
}'

Gemma 4 should return a structured tool call — proof that function calling works.
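If you later consume this response outside n8n, the tool call arrives in the message's tool_calls array. Here is a minimal Python sketch of extracting it, using a hardcoded sample response in the shape Ollama's /api/chat returns for tool calls (the weather example content is hypothetical):

```python
import json

# Sample /api/chat response containing a tool call. The field layout follows
# Ollama's chat API; the actual values here are made up for illustration.
raw = '''
{
  "model": "gemma4:2b",
  "message": {
    "role": "assistant",
    "content": "",
    "tool_calls": [
      {"function": {"name": "get_weather", "arguments": {"city": "Paris"}}}
    ]
  },
  "done": true
}
'''

def extract_tool_calls(response_json: str):
    """Return a list of (function_name, arguments) tuples from a chat response."""
    data = json.loads(response_json)
    calls = data.get("message", {}).get("tool_calls", [])
    return [(c["function"]["name"], c["function"]["arguments"]) for c in calls]

print(extract_tool_calls(raw))
# → [('get_weather', {'city': 'Paris'})]
```

Your agent code would then dispatch on the function name, run the real tool, and feed the result back to the model in a follow-up message.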


Step 4: Connect Gemma 4 to n8n

Ollama exposes a local REST API at http://localhost:11434 — its native endpoints under /api, plus an OpenAI-compatible layer under /v1. We'll use the native /api/chat endpoint in n8n.

If n8n runs locally (same machine as Ollama):

In your n8n workflow, add an HTTP Request node with this configuration:

  • Method: POST
  • URL: http://localhost:11434/api/chat
  • Body (JSON):
{
  "model": "gemma4:2b",
  "messages": [
    { "role": "system", "content": "You are a helpful and precise assistant." },
    { "role": "user", "content": "{{ $json.message }}" }
  ],
  "stream": false
}

If n8n runs in the cloud or on a VPS:

You need to expose Ollama on the network. On the server hosting Ollama:

# Launch Ollama exposing on all interfaces
OLLAMA_HOST=0.0.0.0:11434 ollama serve

Then in n8n, replace localhost with your Ollama server's IP. Note that Ollama has no built-in authentication, so restrict access to port 11434 with a firewall or a reverse proxy.

Alternative — use n8n's OpenAI node with the Ollama API:

n8n includes a "Chat Model (OpenAI)" node that can point to any OpenAI-compatible API. Configure an "OpenAI API" credential with:

  • Base URL: http://localhost:11434/v1
  • API Key: ollama (any value, Ollama doesn't require one)
  • Model: gemma4:2b
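With this credential, n8n talks to Ollama exactly as it would to OpenAI. If you want to exercise the same /v1 endpoint outside n8n, here's a minimal Python sketch using only the standard library — the request is built but not sent, so nothing below requires the server to be running:

```python
import json
import urllib.request

OLLAMA_BASE = "http://localhost:11434/v1"  # OpenAI-compatible endpoint

def build_chat_request(prompt: str, model: str = "gemma4:2b") -> urllib.request.Request:
    """Build an OpenAI-style chat completion request against the local Ollama server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{OLLAMA_BASE}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer ollama",  # any value; Ollama ignores it
        },
    )

req = build_chat_request("Say hello in one word.")
print(req.full_url)  # → http://localhost:11434/v1/chat/completions

# To actually send it (requires Ollama running):
#   with urllib.request.urlopen(req) as r:
#       reply = json.load(r)["choices"][0]["message"]["content"]
```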

Step 5: First agent workflow with Gemma 4

Here's a concrete example: an agent that summarizes incoming emails and classifies them by priority.

Workflow structure:

  1. Trigger: Gmail / IMAP — triggered on each new email
  2. HTTP Request → Ollama/Gemma 4 with the prompt: "Analyze this email and return a JSON with: {subject: string, priority: 'high'|'medium'|'low', summary: string (max 2 sentences), action_required: boolean}. Email: {{ $json.body }}"
  3. Code node → Parses the returned JSON and extracts the fields
  4. Switch → Branch on priority
  5. Slack / Email → Notification for high-priority emails only
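Steps 2 and 3 are worth validating outside n8n before wiring the workflow, because local models often wrap their JSON in markdown fences. A minimal Python sketch with a simulated reply (in n8n itself you'd implement the same parsing in a Code node):

```python
import json
import re

# Prompt used in step 2 (double braces escape the JSON schema for str.format)
PROMPT_TEMPLATE = (
    "Analyze this email and return a JSON with: "
    "{{subject: string, priority: 'high'|'medium'|'low', "
    "summary: string (max 2 sentences), action_required: boolean}}. "
    "Email: {body}"
)

def parse_model_json(text: str) -> dict:
    """Extract the first JSON object from a model reply, tolerating ```json fences."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if not match:
        raise ValueError("no JSON object found in model output")
    return json.loads(match.group(0))

prompt = PROMPT_TEMPLATE.format(body="Hi, your invoice is 30 days overdue.")

# Simulated Gemma 4 reply, wrapped in a markdown fence as models often do
reply = ('```json\n{"subject": "Invoice overdue", "priority": "high", '
         '"summary": "Payment is 30 days late.", "action_required": true}\n```')
parsed = parse_model_json(reply)
print(parsed["priority"])  # → high
```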

This workflow runs locally, sorts your emails without any data going to OpenAI, and costs €0/month.

To go further with n8n agents, check our guide on deploying AI MCP agents in 20 minutes.


Performance and limitations to know

What Gemma 4 2B does well:

  • Classification, summarization, structured information extraction
  • Responses in 20+ languages (supports 140 languages)
  • Reasoning on contexts up to 250,000 tokens

What Gemma 4 2B does less well:

  • Complex mathematical reasoning (prefer 8B or 16B)
  • Complex multi-file code (8B is significantly better)
  • Speed: ~15-30 tokens/second on CPU, ~80-150 tokens/second on GPU
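Those throughput figures translate directly into per-request latency. A quick back-of-the-envelope calculation, taking mid-range values from the numbers above (the reply length is an assumed typical value for the email-summary workflow):

```python
# Rough latency estimate per email summary
tokens_out = 120            # assumed typical JSON-summary reply length
tps_cpu, tps_gpu = 20, 100  # mid-range of the tokens/second figures above

print(f"CPU: ~{tokens_out / tps_cpu:.0f} s per email")  # → CPU: ~6 s per email
print(f"GPU: ~{tokens_out / tps_gpu:.1f} s per email")  # → GPU: ~1.2 s per email
```

A few seconds per email is fine for a background inbox triage; for anything interactive, the GPU figures are the ones to plan around.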

For production: Gemma 4 2B is perfect for prototyping and simple-to-medium use cases. For production agents with high volumes or complex cases, we recommend either the 8B model on GPU, or a hybrid local + cloud architecture that we regularly design at BOVO Digital.


From demo to production

This setup is ideal for rapid prototyping without budget. When you've validated your use case and want to scale — with high availability, persistent memory, RAG on your documents, and monitoring — that's where a production architecture comes in.

Read our article on n8n vs Make to understand how to choose your automation stack based on volume and context.

You've validated your use case locally and want to move to production?

Let's build the robust version together →

Discover our AI automation and intelligent agent services — and the profile of William Aklamavo, who delivers these production architectures.

Tags

#Gemma 4 · #Ollama · #n8n · #Local LLM · #AI Agent · #Open Source · #Tutorial · #Free
William Aklamavo

Web development and automation expert, passionate about technological innovation and digital entrepreneurship.