Pull an Image
Images are hosted on Docker Hub under the bytez namespace. The image name matches the model ID with / replaced by _.
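As a sketch (the model ID openai-community/gpt2 below is only an illustration, not necessarily a hosted model):

```shell
MODEL_ID="openai-community/gpt2"                  # hypothetical model ID
IMAGE="bytez/$(echo "$MODEL_ID" | tr '/' '_')"    # "/" becomes "_"
echo "docker pull $IMAGE"
```

Running the printed command pulls `bytez/openai-community_gpt2`.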
Start a Container
| Variable | Required | Default | Description |
|---|---|---|---|
| `KEY` | Yes | - | Your Bytez API key (for analytics and update notifications) |
| `PORT` | No | `80` | Port the server listens on inside the container |
| `DEVICE` | No | `auto` | Where to load weights: `auto`, `cuda`, or `cpu` |
| Option | Description |
|---|---|
| `--gpus all` | Enable GPU acceleration (requires NVIDIA drivers + CUDA) |
| `-v /local/path:/server/model` | Mount a local directory for weight caching |
| `-p HOST:CONTAINER` | Map container port to host port |
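Putting the variables and options together, a typical launch might look like this (image name, key, and container name are placeholders):

```shell
# GPU acceleration requires NVIDIA drivers + the NVIDIA Container Toolkit.
docker run -d --name my-model \
  --gpus all \
  -e KEY="your-bytez-api-key" \
  -e DEVICE=auto \
  -p 8080:80 \
  bytez/openai-community_gpt2
```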
Common Configurations
Run on GPU
Run on CPU
Cache Weights Locally
Avoid re-downloading weights on every run by mounting a local directory:
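A minimal caching sketch, with the host path, key, and names as placeholders:

```shell
# First run downloads weights into ~/bytez-weights; later runs reuse them.
mkdir -p ~/bytez-weights
docker run -d --name my-model \
  -e KEY="your-bytez-api-key" \
  -v ~/bytez-weights:/server/model \
  -p 8080:80 \
  bytez/openai-community_gpt2
```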
Run Inference
Once the container is running, send POST requests to /run.
Chat Models
Streaming
Set "stream": true to receive tokens as they’re generated:
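For example, against a container whose port is mapped to host port 8080 (the port and payload here are illustrative):

```shell
# Basic chat request
curl -X POST http://localhost:8080/run \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hi"}]}'

# Streaming: add "stream": true to receive tokens as they are generated
curl -N -X POST http://localhost:8080/run \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hi"}], "stream": true}'
```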
Request Body by Task
Different model tasks require different inputs. Here’s a quick reference:
| Task | Required Fields | Example |
|---|---|---|
| `chat` | `messages` | `{"messages": [{"role": "user", "content": "Hi"}]}` |
| `text-generation` | `text` | `{"text": "Once upon a time"}` |
| `image-text-to-text` | `messages` with image | `{"messages": [{"role": "user", "content": [{"type": "text", "text": "Describe this"}, {"type": "image", "url": "..."}]}]}` |
| `text-to-image` | `text` | `{"text": "A cat in space"}` |
| `automatic-speech-recognition` | `url` or `base64` | `{"url": "https://example.com/audio.wav"}` |
| `feature-extraction` | `text` | `{"text": "Embed this sentence"}` |
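The same endpoint serves every task; only the request body changes. For example (host port is illustrative):

```shell
# text-generation
curl -X POST http://localhost:8080/run \
  -H "Content-Type: application/json" \
  -d '{"text": "Once upon a time"}'

# automatic-speech-recognition (audio by URL)
curl -X POST http://localhost:8080/run \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/audio.wav"}'
```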
Full HTTP Reference
See complete request/response params and examples for all 30+ task types.
Run Offline (Air-Gapped)
Create a self-contained image with weights baked in - no internet required at runtime.
Step 1: Run the container once to download weights
Wait for the model to fully load (check with docker logs -f my-model). Once ready, stop it.
Step 2: Save as a new image
Step 3: Run offline (no internet needed)
Optional: Export for another machine
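The steps above can be sketched as follows (container and image names are placeholders):

```shell
# Step 1: run once to download weights, then stop
docker run -d --name my-model -e KEY="your-bytez-api-key" -p 8080:80 \
  bytez/openai-community_gpt2
docker logs -f my-model        # wait until the model reports it is loaded
docker stop my-model

# Step 2: save the stopped container as a new image with weights baked in
docker commit my-model my-model-offline

# Step 3: run offline - no internet needed
docker run -d --name my-model-airgapped -p 8080:80 my-model-offline

# Optional: export for another machine...
docker save my-model-offline | gzip > my-model-offline.tar.gz
# ...and load it there with: docker load < my-model-offline.tar.gz
```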
Troubleshooting
Container won’t start
Check that Docker is installed and running. For GPU support, ensure you have NVIDIA drivers and the NVIDIA Container Toolkit installed.
Out of memory
Try DEVICE=auto to split the model across GPU and CPU memory. For large models, you may need more VRAM or system RAM.
Slow first request
The first request loads model weights into memory; subsequent requests are fast. Use weight caching (the -v mount) to speed up container restarts.
Model only works with a specific DEVICE setting
Some models only support one of auto, cuda, or cpu. If one doesn’t work, try another.