Every open-source model on Bytez is available as a Docker image. Pull it, run it, and make requests to localhost.
Images are hosted on Docker Hub under the bytez namespace. The image name is the model ID, lowercased (Docker image names must be lowercase), with / replaced by _.
# Pattern: bytez/{org}_{model-name}
docker pull bytez/qwen_qwen3-4b
Find model IDs at bytez.com/models or via the List Models API.
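For example, you can browse model IDs from the command line. The endpoint path and auth header below are assumptions; verify them against the List Models API reference:
# List available models (endpoint and header are assumptions - check the API docs)
curl -H "Authorization: Key YOUR_BYTEZ_KEY" \
  https://api.bytez.com/models/v2/list/models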
docker run -d \
  -e KEY=YOUR_BYTEZ_KEY \
  -e PORT=8000 \
  -p 8000:8000 \
  bytez/qwen_qwen3-4b
This runs the container in the background. Get your API key at bytez.com/api/key.
View Logs
# Follow logs (live)
docker logs -f <container_id>

# View recent logs
docker logs <container_id>
To run attached and watch logs directly, replace -d with -it. Press Ctrl+C to stop.
Environment Variables
| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| KEY | Yes | - | Your Bytez API key (for analytics and update notifications) |
| PORT | No | 80 | Port the server listens on inside the container |
| DEVICE | No | auto | Where to load weights: auto, cuda, or cpu |
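Since PORT only controls the listener inside the container, changing it means adjusting the -p mapping to match. A minimal sketch using the same image as above:
# Listen on 9000 inside the container and map it to host port 9000
docker run -d \
  -e KEY=YOUR_BYTEZ_KEY \
  -e PORT=9000 \
  -p 9000:9000 \
  bytez/qwen_qwen3-4b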
Docker Options
| Option | Description |
|--------|-------------|
| --gpus all | Enable GPU acceleration (requires NVIDIA drivers + CUDA) |
| -v /local/path:/server/model | Mount a local directory for weight caching |
| -p HOST:CONTAINER | Map a container port to a host port |
Run on GPU
docker run -d \
  --gpus all \
  -e KEY=YOUR_BYTEZ_KEY \
  -e PORT=8000 \
  -p 8000:8000 \
  bytez/qwen_qwen3-4b
Run on CPU
docker run -d \
  -e DEVICE=cpu \
  -e KEY=YOUR_BYTEZ_KEY \
  -e PORT=8000 \
  -p 8000:8000 \
  bytez/qwen_qwen3-4b
Cache Weights Locally
Avoid re-downloading weights on every run by mounting a local directory:
docker run -d \
  --gpus all \
  -v /path/to/cache:/server/model \
  -e HF_HOME=/server/model \
  -e KEY=YOUR_BYTEZ_KEY \
  -e PORT=8000 \
  -p 8000:8000 \
  bytez/qwen_qwen3-4b
If you’re going to create the same model container multiple times, caching is highly recommended for large models (70B+); otherwise the weights can take hours to download on every run.
Once the container is running, send POST requests to /run.
Chat Models
curl -X POST http://localhost:8000/run \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "system", "content": "You are a helpful assistant" },
      { "role": "user", "content": "What is the capital of France?" }
    ],
    "stream": false,
    "params": {
      "max_new_tokens": 100,
      "temperature": 0.7
    }
  }'
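If you have jq installed, piping the response through it makes the returned JSON easier to read:
# Pretty-print the response (-s silences curl's progress output)
curl -s -X POST http://localhost:8000/run \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hi"}]}' | jq .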
Streaming
Set "stream": true to receive tokens as they’re generated:
curl -X POST http://localhost:8000/run \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "user", "content": "Write a haiku about coding" }
    ],
    "stream": true
  }'
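curl buffers its output stream by default, which can make streamed tokens appear in bursts; the -N (--no-buffer) flag prints them as they arrive:
# -N disables curl's output buffering so tokens print as soon as they are received
curl -N -X POST http://localhost:8000/run \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "user", "content": "Write a haiku about coding" }
    ],
    "stream": true
  }'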
Different model tasks require different inputs. Here’s a quick reference:
| Task | Required Fields | Example |
|------|-----------------|---------|
| chat | messages | {"messages": [{"role": "user", "content": "Hi"}]} |
| text-generation | text | {"text": "Once upon a time"} |
| image-text-to-text | messages with image | {"messages": [{"role": "user", "content": [{"type": "text", "text": "Describe this"}, {"type": "image", "url": "..."}]}]} |
| text-to-image | text | {"text": "A cat in space"} |
| automatic-speech-recognition | url or base64 | {"url": "https://example.com/audio.wav"} |
| feature-extraction | text | {"text": "Embed this sentence"} |
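For example, a text-generation model takes a bare text field instead of messages. This sketch assumes the same /run endpoint serves every task and that params works as in the chat example; support for individual params may vary by task:
# Text-generation: send a prompt in "text" rather than "messages"
curl -X POST http://localhost:8000/run \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Once upon a time",
    "params": { "max_new_tokens": 50 }
  }'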

Full HTTP Reference

See complete request/response params and examples for all 30+ task types.
Create a self-contained image with weights baked in - no internet required at runtime.
Step 1: Run the container once to download weights
docker run -d \
  -e KEY=YOUR_BYTEZ_KEY \
  -e PORT=8000 \
  -p 8000:8000 \
  --name my-model \
  bytez/qwen_qwen3-4b
Wait for the model to fully load (check with docker logs -f my-model). Once ready, stop it:
docker stop my-model
Step 2: Save as a new image
docker commit my-model my-model-offline
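You can confirm the committed image exists before relying on it; since step 1 downloaded the weights into the container’s filesystem, it should be much larger than the base image:
# List the new image and its size
docker image ls my-model-offline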
Step 3: Run offline (no internet needed)
docker run -d \
  -e KEY=YOUR_BYTEZ_KEY \
  -e PORT=8000 \
  -p 8000:8000 \
  my-model-offline
To verify it’s truly offline, add --network none to the run command.
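Note that --network none removes all network interfaces, so the -p mapping has no effect; confirm readiness through the logs instead. A minimal sketch:
# Run with networking disabled; if the model loads, the image is truly self-contained
docker run -d \
  --network none \
  -e KEY=YOUR_BYTEZ_KEY \
  -e PORT=8000 \
  --name offline-test \
  my-model-offline

# Watch the logs to confirm nothing tries to download
docker logs -f offline-test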
Optional: Export for another machine
# Save to a file
docker save my-model-offline -o my-model-offline.tar

# Load on another machine
docker load -i my-model-offline.tar
Troubleshooting

Container won’t start
Check that Docker is installed and running. For GPU support, ensure you have NVIDIA drivers and the NVIDIA Container Toolkit installed (a quick check appears after this list).

Out of memory
Try DEVICE=auto to split the model across GPU and CPU memory. For large models, you may need more VRAM or system RAM.

Slow first request
The first request loads model weights into memory. Subsequent requests are fast. Use weight caching (the -v mount) to speed up container restarts.

Model only works with a specific DEVICE setting
Some models only support auto, cuda, or cpu. If one doesn’t work, try another.
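To confirm the NVIDIA Container Toolkit is wired up, run nvidia-smi inside a throwaway CUDA container (the image tag here is only an example; any CUDA base image works):
# Should print your GPU table; a failure here points to a driver/toolkit problem
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi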
