Model endpoints
AI Inference on Dedicated GPUs
Denvr Model Endpoints provide containerized, GPU-accelerated inference services for self-hosting open source and customized AI models. Model endpoints expose OpenAI-compatible APIs with API key authentication for fast integration with agentic, chat, and LLM-based systems.
Model Catalog
The model catalog contains the most commonly used open source GenAI models for chat, reasoning, tool calling, and code generation. Each model offers different precision and quantization options to match budget and SLA requirements. Catalog models are pre-downloaded to Denvr AI Cloud to reduce startup time.
Denvr can enable tenant-specific models or different model parameterization as requested.

Creating an Endpoint
Configure your model with the following options:

Name
Unique identifier or label for the model instance, used to manage and track different models within your infrastructure.
Resource Pool
Defines how compute resources are allocated, either on-demand for dynamic allocation or reserved for dedicated single-tenant resources.
Model
Selection of a model variant by parameter count, active parameters, and model precision.
API Keys
Generate or provide secret keys used as Bearer tokens for API authorization.
Choose the GPU instance to run the model on, then launch. Models are validated to work on the listed instance types. The optimization engine uses model parallelism (MP) or data parallelism (DP) to utilize all available GPUs.
Managing Model Endpoints
The application overview displays the model overview, online status, connection info, and access to runtime logs for troubleshooting.

Endpoints can be stopped, restarted, or deleted completely.
Accessing the Endpoint
The model overview provides the private and public IPs as well as the public DNS name. The connection info shows the full HTTPS endpoint along with an example curl request.

The same URL, model name, and API key can be used in any OpenAI-compatible application or code. This includes OpenWebUI, n8n, and OpenCode.
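As a minimal sketch, a chat completion request can be built with only the Python standard library; the base URL, model name, and API key below are placeholders for the values shown in your endpoint's connection info:

```python
import json
import urllib.request

# Placeholder values -- substitute the HTTPS endpoint, model name, and
# API key from your endpoint's connection info.
BASE_URL = "https://your-endpoint.example.com/v1"
API_KEY = "your-api-key"
MODEL = "your-model-name"

def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible /chat/completions request."""
    body = json.dumps({
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            # The API key is sent as a Bearer token, as with OpenAI's API.
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )

req = build_chat_request("Hello!")
# To send the request against a live endpoint:
# response = urllib.request.urlopen(req)
```

The same request shape works with the official OpenAI client libraries by pointing their `base_url` at the endpoint.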
Runtime Logs and Metrics
The application details screen displays the model engine's log file for troubleshooting.
Model runtime logs are currently not available via Denvr API.
Custom Models
Models not listed in our Catalog, including private models, can be run using our vLLM Server or Ollama Server applications.

The configuration for vLLM Server allows you to specify:
Launch command
Override the container entry command to include vLLM parameters.
Environment variables
View or edit parameters passed to the vLLM engine.
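For example, a launch command for a custom model might look like the following sketch; the model name, parallelism degree, and key are placeholders, and the flags shown are standard vLLM server options:

```shell
# Hypothetical launch command -- the model and values are placeholders.
# --tensor-parallel-size shards the model across GPUs,
# --dtype auto selects precision from the model config, and
# --api-key requires Bearer authentication on the endpoint.
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --tensor-parallel-size 2 \
  --dtype auto \
  --api-key "$VLLM_API_KEY"
```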