2026-05-05
n8n Ollama Local Integration: Complete Setup Guide
Learn how to connect n8n to a local Ollama instance with this step-by-step guide. Automate AI workflows securely and privately without cloud API costs.
Editor summary
N8n Local Ollama Instance integration eliminates cloud API costs by routing workflow automation through locally-hosted language models. I found the HTTP Request node method particularly valuable for building private AI pipelines without third-party data exposure. The guide covers critical setup steps—configuring the OLLAMA_HOST environment variable, Docker internal routing, and managing node timeouts—that directly impact whether your automation runs smoothly or fails silently. One key trade-off emerves: local inference speed depends entirely on your hardware's VRAM, and concurrent requests will crash your system unless explicitly configured for multi-GPU setups. This matters because many teams underestimate the bottleneck between n8n orchestration and Ollama processing.
How to Connect n8n to Local Ollama Instance: Complete Setup Guide
Quick Answer: To connect n8n to a local Ollama instance, ensure both applications are running on the same network. Expose your Ollama host by setting the environment variable
OLLAMA_HOST=0.0.0.0before launching it. Then, in n8n, use the HTTP Request node or the community Ollama node, setting the base URL tohttp://<your-machine-ip>:11434to start sending prompts and receiving AI-generated responses directly in your automated workflows.
Integrating powerful Large Language Models (LLMs) into your daily workflows used to mean paying recurring subscription fees and sending sensitive data to third-party cloud providers. However, the rise of open-weight models and user-friendly execution environments has completely shifted this paradigm.
By combining n8n—a flexible, self-hosted workflow automation tool—with Ollama, an application that lets you run LLMs locally on your own hardware, you can build sophisticated AI pipelines that are entirely private, free to operate, and incredibly responsive.
Whether you are automating customer support ticket categorization, generating content drafts from RSS feeds, or summarizing internal documentation, connecting these two tools creates a robust engine for local AI automation. This guide will walk you through the precise steps to get n8n communicating seamlessly with your local Ollama instance.
Understanding the Architecture
Before diving into configuration, it is helpful to understand how n8n and Ollama interact. n8n operates as an orchestrator, executing steps in a defined sequence based on triggers. Ollama runs as a background service (a REST API server) that loads models into your computer’s memory (RAM or VRAM) and processes text generation requests.
When you connect the two, n8n acts as the client. It sends an HTTP POST request containing a prompt to Ollama’s API endpoint. Ollama processes the prompt through the selected model (like Llama 3, Mistral, or Phi-3) and returns the generated text back to n8n, which then passes that data to the next node in your workflow.
Why Run Models Locally?
The primary advantages of this setup are data privacy and cost control. Because the model runs on your own hardware, no information ever leaves your network. This is critical for businesses handling sensitive customer data, proprietary code, or confidential internal communications. Furthermore, beyond the initial hardware investment, local inference incurs zero per-token API costs, allowing you to scale your automated operations indefinitely without budget anxiety.
Step 1: Configuring Ollama for Network Access
By default, Ollama binds its API server strictly to localhost (127.0.0.1). This is a security measure to prevent unauthorized access. If you are running n8n via Docker or on a separate machine on your local network, n8n will not be able to reach Ollama using localhost because localhost inside a Docker container refers to the container itself, not your host machine.
You must configure Ollama to accept connections from other IP addresses.
Setting the Host Environment Variable
To expose Ollama to your local network, you need to set the OLLAMA_HOST environment variable before starting the Ollama service.
On Linux:
Edit your systemd service file (typically located at /etc/systemd/system/ollama.service). Add the following line under the [Service] section:
Environment="OLLAMA_HOST=0.0.0.0"
Save the file, reload the daemon (sudo systemctl daemon-reload), and restart the service (sudo systemctl restart ollama).
On macOS:
If you run Ollama via the terminal, you can start it with:
OLLAMA_HOST=0.0.0.0 ollama serve
If you use the macOS app, you can set the variable globally using launchctl:
launchctl setenv OLLAMA_HOST "0.0.0.0"
Then restart the Ollama application.
On Windows:
Open the Start menu, search for “Environment Variables”, and click “Edit the system environment variables”. Click the “Environment Variables” button. Under “System variables”, click “New”. Set the Variable name to OLLAMA_HOST and the Variable value to 0.0.0.0. Click OK, and restart the Ollama app.
Verifying the Connection
Before moving to n8n, verify that Ollama is accessible over your local network. Find your host machine’s local IP address (e.g., 192.168.1.50). Open a browser on another device on the same network and navigate to http://192.168.1.50:11434. You should see the message: “Ollama is running”.
Step 2: Preparing Your n8n Environment
If you are running n8n via Docker, you need to ensure it can route traffic to your host machine.
Using Docker Internal Routing
If n8n is running in a Docker container on the same machine as Ollama, you do not need to use your specific local IP address (which might change). Instead, you can use Docker’s built-in host resolution.
When configuring your URL in n8n, use:
http://host.docker.internal:11434
If you are using Docker Compose on Linux, you may need to explicitly define this mapping in your docker-compose.yml file under the n8n service:
extra_hosts:
- "host.docker.internal:host-gateway"
Step 3: Connecting n8n to Ollama
There are two primary methods to connect n8n to Ollama: using the native HTTP Request node, or installing a community-built node specifically designed for Ollama.
Method 1: Using the HTTP Request Node
This method requires no additional installations and offers maximum flexibility over the API payload.
- Open your n8n canvas and add an HTTP Request node.
- Set the Method to
POST. - Set the URL to
http://<your-machine-ip>:11434/api/generate(orhttp://host.docker.internal:11434/api/generate). - Set the Authentication to
None(Ollama does not require authentication by default). - In the Body Parameters, choose
JSON. - Construct your JSON payload. A basic generation request requires the model name, the prompt, and a flag to disable streaming (as n8n handles discrete blocks of data better than continuous streams).
Example JSON:
{
"model": "llama3",
"prompt": "Summarize this text: {{ $json.textToSummarize }}",
"stream": false
}
Execute the node. You should receive a JSON response containing a response field with the generated text.
Method 2: Using the LangChain Integration
Modern versions of n8n include powerful Advanced AI (LangChain) nodes. This is the recommended approach if you want to build multi-step conversational agents with memory, or if you want to use the model to call specific tools.
- Add a Basic LLM Chain or an Agent node to your canvas.
- In the ‘Model’ input connection of that node, add an Ollama Chat Model node.
- Configure the Ollama Chat Model node. Set the Base URL to your Ollama endpoint (e.g.,
http://192.168.1.50:11434). - Type in the exact name of the Model you have pulled via Ollama (e.g.,
mistral,llama3:8b). - Adjust the Temperature setting. Lower values (0.1 - 0.3) make the output more deterministic and factual, while higher values (0.7 - 0.9) encourage creativity.
Practical Setup Advice and Troubleshooting
Getting local AI pipelines to run smoothly requires managing hardware resources and handling timeouts effectively.
Managing Node Timeouts
Local LLM inference can take time, especially on consumer hardware or when processing large prompts. The default n8n HTTP Request node timeout might trigger before Ollama finishes generating the response.
If you encounter “Timeout” errors in n8n, open the settings of your HTTP Request node (or the Advanced AI node). Look for the “Timeout” setting and increase it significantly—often from the default 10,000ms (10 seconds) to 60,000ms (1 minute) or even 300,000ms (5 minutes) depending on your hardware’s capabilities and the complexity of the task.
Hardware Considerations
The speed of your n8n workflow will be entirely bottlenecked by the speed of your Ollama inference.
- VRAM is King: For optimal performance, ensure your model fits entirely within your GPU’s VRAM. Offloading layers to system RAM drastically reduces generation speed.
- Model Sizing: For a standard 8GB VRAM GPU, stick to 7B or 8B parameter models quantized to 4-bit (which Ollama does by default).
- Concurrency: Ollama queues requests by default. If your n8n workflow triggers multiple instances simultaneously (e.g., processing a batch of 50 emails), Ollama will handle them one by one. Do not parallelize HTTP requests to Ollama unless you have multi-GPU setups explicitly configured for concurrent execution, as this will lead to memory exhaustion and crashes.
Conclusion
Connecting n8n to a local Ollama instance unlocks a massive capability for any automation enthusiast or privacy-conscious business. By exposing the Ollama host, correctly routing the network traffic via Docker or local IPs, and utilizing n8n’s HTTP or Advanced AI nodes, you create a private, cost-free intelligence layer. Start with simple tasks like text summarization or data extraction, and scale up to complex, multi-agent workflows as you become comfortable with the hardware limitations and prompt engineering required for local models.
Frequently Asked Questions
Why does n8n say connection refused when trying to reach Ollama?
This usually means Ollama is still binding only to localhost, or your firewall is blocking port 11434. Ensure you have set the OLLAMA_HOST=0.0.0.0 environment variable on the machine running Ollama and restarted the service.
Can I use n8n Advanced AI nodes with Ollama?
Yes. n8n includes native Ollama Chat Model and Ollama Embeddings nodes under the Advanced AI category. These allow you to build complex LangChain-style agents, provide the model with memory, and execute tool calling directly within the n8n interface.
How do I handle large documents that exceed the model’s context window?
Local models typically have context limits of 4k to 8k tokens. In n8n, you should use the Advanced AI text splitter nodes to chunk large documents, generate embeddings via Ollama, store them in a local vector store (like Qdrant or Postgres), and query the vector store to retrieve only relevant context before sending the final prompt to the model.
Does n8n need to be on the same machine as Ollama?
No, they can be on separate machines. As long as both devices are on the same local network (or connected via a secure VPN/Tailscale) and the machine running Ollama allows incoming connections on port 11434, n8n can communicate with it using the host machine’s local IP address.
Related Reading
-
How to Build an n8n Workflow for Automated Invoice Data Extraction
-
n8n vs Zapier for Advanced Workflow Automation: Which Is Better in 2026?
-
How to Build an n8n Integration for Automated CRM Data Entry: 5-Step Guide
-
n8n vs Zapier for Advanced Workflow Automation: Which Is Better in 2026?
-
n8n vs Zapier for Advanced Workflow Automation: Which Is Better in 2026?
-
n8n Automation for Automated Invoice Processing 2026: Setup Guide
-
How to Build an n8n Integration for Automated CRM Data Entry: 5-Step Guide