Model endpoints

AI Inference on Dedicated GPUs

Denvr Model Endpoints provide containers to self-host GPU-accelerated inference services for open source and customized AI models. Model endpoints expose OpenAI-compatible APIs with API Key authentication for fast integration with agentic, chat, and LLM-based systems.

Model Catalog

The model catalog contains the most commonly used open source GenAI models for chat, reasoning, tool calling, and code generation. Each model offers different precision and quantization options to match budget and SLA requirements. Catalog models are pre-downloaded to Denvr AI Cloud to reduce startup time.

Note: Denvr can enable tenant-specific models or alternative model parameterizations on request.

Creating an Endpoint

Configure your model with the following options:

Name

Unique identifier or label for the model instance. Allows users to manage and track different models within their infrastructure.

Resource Pool

Defines how compute resources are allocated, either on-demand for dynamic allocation or reserved for dedicated single-tenant resources.

Model

Selection of model variants by parameter count, active parameters, and precision.

API Keys

Generate or provide secret keys used as Bearer authorization tokens.

Choose the GPU instance to run the model on, and Launch!
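If you provide your own secret key rather than generating one, any high-entropy random string works. A minimal sketch using Python's standard library (the 48-byte length and `sk-` prefix are arbitrary conventions, not Denvr requirements):

```python
import secrets

# Generate a URL-safe random API key.
# The "sk-" prefix mirrors common API-key conventions and is optional.
api_key = "sk-" + secrets.token_urlsafe(48)
print(api_key)
```

Store the key securely; it is required in the `Authorization` header of every request to the endpoint.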

Note: Models are validated to work on the listed instance types. The optimization engine uses model parallelism (MP) or data parallelism (DP) to make use of all available GPUs.

Managing Model Endpoints

The application overview displays the model overview, online status, connection info, and access to runtime logs for troubleshooting.

Endpoints can be stopped, restarted, or deleted entirely.

Accessing the Endpoint

The model overview provides the private and public IPs as well as the public DNS name. The connection info shows the full HTTPS endpoint along with an example curl command.

The same URL, model, and API key can be used in any OpenAI-compatible application or code, including OpenWebUI, n8n, and OpenCode.
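The same call can be made from code. A minimal sketch using only Python's standard library, with placeholder values (`BASE_URL`, `API_KEY`, and `MODEL` are stand-ins for the values shown on your endpoint's connection-info screen):

```python
import json
import urllib.request

# Placeholders: substitute the endpoint URL, model name, and API key
# from your endpoint's connection info.
BASE_URL = "https://your-endpoint.example.com"
API_KEY = "your-api-key"
MODEL = "your-model-name"

# Standard OpenAI-style chat completion payload.
payload = {
    "model": MODEL,
    "messages": [{"role": "user", "content": "Hello!"}],
}

req = urllib.request.Request(
    f"{BASE_URL}/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    },
)

# Sending the request requires a running endpoint:
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)
#     print(reply["choices"][0]["message"]["content"])
```

Any OpenAI client library can be pointed at the same base URL by overriding its default endpoint and supplying the API key.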

Runtime Logs and Metrics

The application details screen displays the model engine's log file for troubleshooting.

Note: Model runtime logs are currently not available via the Denvr API.

Custom Models

Models not listed in our Catalog, including private models, can be run using our vLLM Server or Ollama Server applications.

The configuration for vLLM Server allows you to specify:

Launch command

Override the container entry command to include vLLM parameters.

Environment variables

View or edit environment variables passed into the vLLM engine.
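As an illustration of the kind of launch command you might supply, the sketch below uses standard `vllm serve` options; the model name is a hypothetical Hugging Face ID, and `VLLM_API_KEY` is an assumed environment variable you would set yourself:

```sh
vllm serve mistralai/Mistral-7B-Instruct-v0.3 \
  --port 8000 \
  --tensor-parallel-size 2 \
  --max-model-len 8192 \
  --api-key "$VLLM_API_KEY"
```

Set `--tensor-parallel-size` to match the number of GPUs on the chosen instance type.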