Michael A. Kuykendall edited this page Dec 16, 2025 · 4 revisions

Shimmy Wiki

Welcome to the Shimmy documentation wiki - your comprehensive guide to the 5MB Ollama alternative.

🚀 Quick Navigation

Getting Started

Features

  • GPU Support - CUDA, Metal, and MOE hybrid acceleration
  • MOE Support - Mixture of Experts CPU/GPU offloading (documentation coming soon)
  • Model Filtering - Smart LLM-only model discovery (documentation coming soon)
  • OpenAI API - Full compatibility with existing tools (documentation coming soon)

🔬 Shimmy Vision (NEW)

Development

Troubleshooting

🎯 What is Shimmy?

Shimmy is a lightweight Ollama alternative that provides:

  • OpenAI API compatibility - drop-in replacement for existing tools
  • MOE CPU offloading - run 70B+ models on consumer hardware
  • Smart model filtering - automatically excludes non-LLM models
  • Multi-GPU support - CUDA, Metal, Vulkan, OpenCL acceleration
  • Release-gate quality - every release passes a 6-gate validation suite
  • Cross-platform - Windows, macOS, Linux binaries
  • Lightweight - sub-10MB binary vs 500MB+ alternatives
  • Vision API - AI-powered OCR, layout analysis, web scraping
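
Because Shimmy exposes an OpenAI-compatible endpoint, existing clients can talk to it with a standard chat-completions request. As a hedged sketch (the port `11435`, the `/v1/chat/completions` path, and the model name `my-model` are illustrative placeholders, not taken from this page; check your own `shimmy` output for the real values), the request body looks like this:

```python
import json

# Hypothetical local endpoint; substitute whatever host/port your
# Shimmy server actually listens on.
BASE_URL = "http://localhost:11435/v1/chat/completions"

# Standard OpenAI-style chat-completions payload. "my-model" is a
# placeholder for a model name Shimmy has discovered on your machine.
payload = {
    "model": "my-model",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Say hello in one word."},
    ],
    "max_tokens": 16,
}

# Serialize exactly as an OpenAI client would before POSTing.
body = json.dumps(payload)
print(body)
```

With the server running, the same payload can be sent with `curl -d` or via the official `openai` client by pointing its `base_url` at your local Shimmy instance.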

📦 Quick Install

```bash
# Linux/macOS
curl -L https://github.com/Michael-A-Kuykendall/shimmy/releases/latest/download/shimmy -o shimmy
chmod +x shimmy

# Windows
curl -L https://github.com/Michael-A-Kuykendall/shimmy/releases/latest/download/shimmy.exe -o shimmy.exe

# With Rust/Cargo (recommended - includes MOE)
cargo install shimmy --features moe

# CUDA + MOE for NVIDIA GPUs
cargo install shimmy --features llama-cuda,moe
```

🔗 External Links


📝 This wiki is automatically maintained and updated with each release.
