High-performance message queue for Node.js with first-class AI orchestration. Built on Valkey/Redis Streams with a Rust NAPI core.
Completes and fetches the next job in a single server-side function call (1 RTT per job), hash-tags every key for zero-config clustering, and ships AI-native primitives for cost tracking, token streaming, human-in-the-loop suspend/resume, model failover, TPM rate limiting, budget caps, rolling usage summaries, and vector search.
```bash
npm install glide-mq
```

Basic usage:

```ts
import { Queue, Worker } from 'glide-mq';

const connection = { addresses: [{ host: 'localhost', port: 6379 }] };
const queue = new Queue('tasks', { connection });

await queue.add('send-email', { to: 'user@example.com', subject: 'Welcome' });

const worker = new Worker(
  'tasks',
  async (job) => {
    await sendEmail(job.data.to, job.data.subject);
    return { sent: true };
  },
  { connection, concurrency: 10 },
);
```

AI orchestration:

```ts
import { Queue, Worker } from 'glide-mq';

const queue = new Queue('ai', { connection });

await queue.add(
  'inference',
  { prompt: 'Explain message queues' },
  {
    fallbacks: [{ model: 'gpt-5.4-nano', provider: 'openai' }],
    lockDuration: 120000,
  },
);

const worker = new Worker(
  'ai',
  async (job) => {
    const result = await callLLM(job.data.prompt);
    await job.reportUsage({
      model: 'gpt-5.4',
      tokens: { input: 50, output: 200 },
      costs: { total: 0.003 },
    });
    await job.stream({ type: 'token', content: result });
    return result;
  },
  { connection, tokenLimiter: { maxTokens: 100000, duration: 60000 } },
);
```

- Background jobs and task processing - email, image processing, data pipelines, webhooks, any async work.
- Scheduled and recurring work - cron jobs, interval tasks, bounded schedulers.
- Distributed workflows - parent-child trees, DAGs, fan-in/fan-out, step jobs, dynamic children.
- High-throughput queues over real networks - 1 RTT per job via Valkey Server Functions, up to 38% faster than alternatives.
- LLM pipelines and model orchestration - cost tracking, token streaming, model failover, budget caps without external middleware.
- Valkey/Redis clusters - hash-tagged keys out of the box with zero configuration.
| Aspect | glide-mq |
|---|---|
| Network per job | 1 RTT - complete + fetch next in a single `FCALL` |
| Client | Rust NAPI bindings via valkey-glide - no JS protocol parsing |
| Server logic | Persistent Valkey Function library (`FUNCTION LOAD` + `FCALL`) - no per-call `EVAL` |
| Cluster | Hash-tagged keys (`glide:{queueName}:*`) route to the same slot automatically |
| AI-native | Cost tracking, token streaming, suspend/resume, fallback chains, TPM limits, budget caps, usage summaries |
| Vector search | KNN similarity queries over job data via Valkey Search |
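The hash-tag routing in the table above follows the standard Valkey/Redis cluster slot rule: only the substring between the first `{` and the next `}` is hashed (CRC16-XMODEM mod 16384), so every `glide:{queueName}:*` key lands in the same slot. A minimal sketch of that calculation, written here from the cluster specification rather than taken from glide-mq's source:

```typescript
// CRC16-XMODEM (polynomial 0x1021, init 0), the checksum Valkey/Redis
// clusters use for key-slot assignment.
function crc16(s: string): number {
  let crc = 0;
  for (let i = 0; i < s.length; i++) {
    crc ^= s.charCodeAt(i) << 8;
    for (let j = 0; j < 8; j++) {
      crc = crc & 0x8000 ? ((crc << 1) ^ 0x1021) & 0xffff : (crc << 1) & 0xffff;
    }
  }
  return crc;
}

function clusterSlot(key: string): number {
  // Hash tags: if the key contains a non-empty {...} section, only that
  // substring is hashed, forcing related keys into the same slot.
  const open = key.indexOf('{');
  if (open !== -1) {
    const close = key.indexOf('}', open + 1);
    if (close !== -1 && close > open + 1) {
      key = key.slice(open + 1, close);
    }
  }
  return crc16(key) % 16384;
}
```

Because both keys hash only `tasks`, `clusterSlot('glide:{tasks}:wait')` equals `clusterSlot('glide:{tasks}:events')`, which is why multi-key server functions work without any cluster configuration.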
Seven primitives for LLM and agent workflows, built into the core API.
- Cost tracking - `job.reportUsage()` records model, tokens, cost, latency per job. `queue.getFlowUsage()` aggregates across flows and `queue.getUsageSummary()` rolls usage up across queues.
- Token streaming - `job.stream(chunk)` pushes LLM output tokens in real time. `queue.readStream(jobId)` consumes them with optional long-polling.
- Suspend/resume - `job.suspend()` pauses mid-processor for human approval or webhook callback. `queue.signal(jobId, name, data)` resumes with external input.
- Fallback chains - ordered `fallbacks` array on job options. On failure, the next retry reads `job.currentFallback` for the alternate model/provider.
- TPM rate limiting - `tokenLimiter` on worker options enforces tokens-per-minute caps. Combine with RPM `limiter` for dual-axis rate control.
- Budget caps - `FlowProducer.add(flow, { budget })` sets `maxTotalTokens` and `maxTotalCost` across all jobs in a flow. Jobs fail or pause when exceeded.
- Per-job lock duration - override `lockDuration` per job for adaptive stall detection. Short for classifiers, long for multi-minute LLM calls.
See Usage - AI-native primitives for full examples.
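Conceptually, a TPM limiter is a token bucket measured in LLM tokens rather than requests. A minimal in-memory sketch of that idea, assuming a continuous-refill model (glide-mq's actual enforcement runs server-side and may differ):

```typescript
// Token bucket in units of LLM tokens: refills at maxTokens per `duration`
// milliseconds. Illustrates the concept behind the `tokenLimiter` worker
// option; not glide-mq's implementation.
class TokenBucketTpm {
  private available: number;
  private last: number;

  constructor(
    private maxTokens: number,
    private duration: number, // refill window in ms
    now = Date.now(),
  ) {
    this.available = maxTokens;
    this.last = now;
  }

  // Try to reserve `tokens` for a job; returns false when over budget.
  tryConsume(tokens: number, now = Date.now()): boolean {
    const refill = ((now - this.last) / this.duration) * this.maxTokens;
    this.available = Math.min(this.maxTokens, this.available + refill);
    this.last = now;
    if (tokens > this.available) return false;
    this.available -= tokens;
    return true;
  }
}
```

With `maxTokens: 100000, duration: 60000` (as in the worker example above), a job needing 60k tokens succeeds immediately, a second 60k job is rejected, and half a window later enough budget has refilled to admit it.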
- 1 RTT per job - complete current + fetch next in a single server-side function call
- Cluster-native - hash-tagged keys, zero cluster configuration
- Workflows - FlowProducer trees, DAGs with fan-in, chain/group/chord, step jobs, dynamic children
- Scheduling - 5-field cron with timezone, fixed intervals, bounded schedulers
- Retries - exponential, fixed, or custom backoff with dead-letter queues
- Rate limiting - per-group sliding window, token bucket, global queue-wide limits
- Broadcast - fan-out pub/sub with NATS-style subject filtering and independent subscriber retries
- Batch processing - process multiple jobs at once for bulk I/O
- Request-reply - `queue.addAndWait()` for synchronous RPC patterns
- Deduplication - simple, throttle, and debounce modes
- Compression - transparent gzip at the queue level
- Serverless - lightweight `Producer` and `ServerlessPool` for Lambda/Edge
- OpenTelemetry - automatic span emission with bring-your-own tracer
- In-memory testing - `TestQueue` and `TestWorker` with zero Valkey dependency
- Cross-language - HTTP proxy with request-reply, SSE streams, schedulers, flow create/read/tree/delete, usage summaries, and broadcast fan-out for non-Node.js services
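As one example from the list above, exponential backoff derives each retry delay from the attempt count. A sketch of the standard formula (option names and capping behavior here are generic, not glide-mq's exact API):

```typescript
// Standard exponential backoff: delay doubles from a base on each attempt,
// optionally capped at maxMs. attemptsMade is 1 for the first retry.
function backoffDelay(
  attemptsMade: number,
  baseMs: number,
  maxMs = Infinity,
): number {
  return Math.min(baseMs * 2 ** (attemptsMade - 1), maxMs);
}
```

With a 1-second base, retries wait 1s, 2s, 4s, 8s, ...; a cap keeps long retry chains from stretching into hours before a job is routed to the dead-letter queue.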
Benchmarked on AWS ElastiCache Valkey 8.2 (r7g.large) with TLS, EC2 client in the same region.
| Concurrency | glide-mq | Leading Alternative | Delta |
|---|---|---|---|
| c=5 | 10,754 j/s | 9,866 j/s | +9% |
| c=10 | 18,218 j/s | 13,541 j/s | +35% |
| c=15 | 19,583 j/s | 14,162 j/s | +38% |
| c=20 | 19,408 j/s | 16,085 j/s | +21% |
The advantage comes from completing and fetching the next job in a single FCALL. The savings compound over real network latency - exactly the conditions in every production deployment. At high concurrency both libraries converge toward the Valkey single-thread ceiling.
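The compounding effect is easy to see with back-of-envelope numbers. These figures are illustrative only, not measured:

```typescript
// Per-connection throughput is bounded by round trips per job. At a typical
// same-AZ RTT of 0.5 ms, needing 2 RTTs per job (complete, then fetch) caps
// one connection at 1000 jobs/s; a single combined FCALL doubles that ceiling.
const rttMs = 0.5;
const jobsPerSecondPerConnection = (rttsPerJob: number) =>
  1000 / (rttsPerJob * rttMs);
```

Concurrency multiplies both ceilings, which is why the gap narrows as both libraries approach the server's single-thread limit.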
Reproduce with `npm run bench` or `npx tsx benchmarks/elasticache-head-to-head.ts` against your own infrastructure.
All examples live in glidemq-examples. Pick a directory, run `npm install`, then `npx tsx <file>.ts`.
| AI Orchestration | Core Patterns | Framework Plugins |
|---|---|---|
| Usage & Costs | Basics | Hono |
| Budget & TPM | Workflows | Fastify |
| Streaming | Advanced | Hapi |
| Agent Loops | DAG Flows | NestJS |
| Search & Vectors | Scheduling | Express |
| SDK Integrations | Batch Processing | Next.js |
- You need a log-based event streaming platform. glide-mq is a job/task queue, not a partitioned event log. It does not provide Kafka-style topic partitions, consumer offset management, or event replay.
- You need browser support. The Rust NAPI client requires a server-side runtime (Node.js 20+, Bun, or Deno with NAPI support).
- You need exactly-once semantics. glide-mq provides at-least-once delivery. Duplicate processing is rare but possible - design processors to be idempotent.
- You need to run without Valkey or Redis. Production use requires Valkey 7.0+ or Redis 7.0+. For dev/testing, `TestQueue`/`TestWorker` run fully in-memory.
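Since delivery is at-least-once, processors should tolerate redelivery. One common pattern is to record completed job IDs and skip duplicates; the sketch below keeps the record in memory for illustration (in production you would typically use an atomic store such as a Valkey key with a TTL):

```typescript
// Idempotent processing sketch: remember which job IDs have already been
// handled so a redelivered job becomes a no-op. In-memory Set for
// illustration only; not glide-mq API.
const processed = new Set<string>();

function handleOnce(jobId: string, work: () => void): boolean {
  if (processed.has(jobId)) return false; // duplicate delivery, skip
  work();
  processed.add(jobId);
  return true;
}
```

The same principle applies to side effects themselves: an email send keyed by job ID, or an upsert instead of an insert, makes duplicate processing harmless.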
| Guide | Topics |
|---|---|
| Usage | Queue, Worker, Producer, request-reply, SSE, proxy, flow HTTP API, usage summaries |
| Workflows | FlowProducer, DAG, chain/group/chord, dynamic children |
| Advanced | Schedulers, rate limiting, dedup, compression, retries, DLQ |
| Broadcast | Pub/sub fan-out, subject filtering |
| Observability | OpenTelemetry, metrics, job logs, dashboard |
| Serverless | Producer, ServerlessPool, Lambda/Edge, HTTP proxy |
| Testing | In-memory TestQueue and TestWorker |
| Wire Protocol | Cross-language FCALL specs, Python/Go examples |
| Step Jobs | Step-job workflows with moveToDelayed |
| Durability | Durability guarantees, persistence, delivery semantics |
| Architecture | Internal architecture and design reference |
| Migration | API mapping guide for migrating from other queues |
| Package | Description |
|---|---|
| @glidemq/speedkey | Valkey GLIDE client with native NAPI bindings |
| @glidemq/dashboard | Web UI for metrics, schedulers, job mutations |
| @glidemq/hono | Hono middleware |
| @glidemq/fastify | Fastify plugin |
| @glidemq/nestjs | NestJS module |
| @glidemq/hapi | Hapi plugin |
| glidemq.dev | Full documentation site |
Bug reports, feature requests, and pull requests are welcome.
Apache-2.0