# Model endpoints

Denvr Model Endpoints provide containerized, GPU-accelerated inference services for self-hosting open-source and customized AI models.  Model endpoints expose OpenAI-compatible APIs with API-key authentication for fast integration with agentic, chat, and LLM-based systems.

* [Model Catalog](#model-catalog)
* [Creating an Endpoint](#creating-an-endpoint)
* [Managing Endpoints](#managing-model-endpoints)
* [Accessing the Endpoint](#accessing-the-endpoint)
* [Runtime Logs and Metrics](#runtime-logs-and-metrics)
* [Custom Models](#custom-models)

## Model Catalog

The model catalog contains the most commonly used open-source GenAI models for chat, reasoning, tool calling, and code generation.  Each model offers different precision and quantization options to match budget and SLA requirements.

Catalog models are pre-downloaded to Denvr AI Cloud to reduce startup time.

{% hint style="info" %}
Denvr can enable tenant-specific models or different model parameterization as requested.
{% endhint %}

<figure><img src="https://1008771031-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fo84Tuz16JeqaVoFstgKj%2Fuploads%2F7m2dqJ0U5j7E21t5ha4L%2Fimage.png?alt=media&#x26;token=dc59acce-1dda-4bce-a740-176055d38a2c" alt=""><figcaption></figcaption></figure>

## Creating an Endpoint

Configure your model with the following options:

<figure><img src="https://1008771031-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fo84Tuz16JeqaVoFstgKj%2Fuploads%2FnsSw7YJGhUJTPFdSeDqF%2Fimage.png?alt=media&#x26;token=220cc74f-36ea-4078-a3b7-3fe1dbd320a0" alt=""><figcaption></figcaption></figure>

<table data-header-hidden data-full-width="false"><thead><tr><th width="188.91015625"></th><th></th></tr></thead><tbody><tr><td>Name</td><td>Unique identifier or label for the model instance. Allows users to manage and track different models within their infrastructure.</td></tr><tr><td>Resource Pool</td><td>Defines how compute resources are allocated, either <strong>on-demand</strong> for dynamic allocation or <strong>reserved</strong> for dedicated single-tenant resources.</td></tr><tr><td>Model</td><td>Select a model variant by parameter count, active parameters, and precision.</td></tr><tr><td>API Keys</td><td>Generate or provide secret keys used as Bearer authorization tokens.</td></tr></tbody></table>

Choose the GPU instance to run the model on, then click Launch.

{% hint style="info" %}
Models are validated to work on the listed instance types.  The optimization engine uses model parallelism (MP) or data parallelism (DP) to make use of all available GPUs.
{% endhint %}

## Managing Model Endpoints

The application overview displays the model overview, online status, connection info, and access to runtime logs for troubleshooting.

<figure><img src="https://1008771031-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fo84Tuz16JeqaVoFstgKj%2Fuploads%2F0S5jewP0ebeBbvwVTxaW%2Fimage.png?alt=media&#x26;token=39cda747-3adf-428b-866c-3d3be190403f" alt=""><figcaption></figcaption></figure>

Endpoints can be stopped, restarted, or deleted entirely.

## Accessing the Endpoint

The model overview provides the private and public IPs as well as the public DNS name.  The connection info shows the full HTTPS endpoint along with an example curl command.

<figure><img src="https://1008771031-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fo84Tuz16JeqaVoFstgKj%2Fuploads%2FeCuU60CWJScfH2WEbMkr%2Fimage.png?alt=media&#x26;token=8c306a5b-5c0c-4b57-ae9c-721aae08ed7c" alt="" width="563"><figcaption></figcaption></figure>

The same URL, model, and API key can be used in any OpenAI-compatible application or code.  This includes OpenWebUI, n8n, and OpenCode.
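As a minimal sketch of that integration, a chat completion request can be built with only the Python standard library.  The endpoint URL, model name, and API key below are placeholders; substitute the values shown on your endpoint's connection-info panel.

```python
import json
import urllib.request

# Placeholder values -- replace with the endpoint URL, model name, and
# API key from your endpoint's connection-info panel.
ENDPOINT = "https://your-endpoint.example.com/v1/chat/completions"
MODEL = "your-model-name"
API_KEY = "your-api-key"

def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completion request with Bearer auth."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",  # API-key authentication
        },
        method="POST",
    )

req = build_chat_request("Hello!")
# With real values in place, send the request and print the reply:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

The same request body and `Authorization` header work with the official OpenAI client libraries by pointing their base URL at your endpoint.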

## Runtime Logs and Metrics

The application details screen displays the model engine's log file for troubleshooting.

{% hint style="info" %}
Model runtime logs are currently not available via Denvr API.
{% endhint %}

## Custom Models

Models not listed in our Catalog, including private models, can be run using our vLLM Server or Ollama Server applications.

<figure><img src="https://1008771031-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fo84Tuz16JeqaVoFstgKj%2Fuploads%2FYNE592h70SHMaIq3Rxgg%2Fimage.png?alt=media&#x26;token=0dee494e-9eea-4d8e-8bef-fb2e7c27527c" alt=""><figcaption></figcaption></figure>

The configuration for vLLM Server allows you to specify:

<table data-header-hidden><thead><tr><th width="207.0859375"></th><th></th></tr></thead><tbody><tr><td>Launch command</td><td>Override the container entry command to include vLLM parameters.</td></tr><tr><td>Environment variables</td><td>View or edit parameters passed into the vLLM engine.</td></tr></tbody></table>
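As an illustration, a launch command for a hypothetical Hugging Face model might look like the following.  The model name and flag values are placeholders, and the exact flags available depend on the vLLM version in the container:

```shell
# Serve a model with vLLM's OpenAI-compatible server (placeholder values):
# --tensor-parallel-size splits the model across GPUs,
# --max-model-len caps the context length to fit GPU memory,
# --api-key requires Bearer authentication on the endpoint.
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --tensor-parallel-size 2 \
  --max-model-len 8192 \
  --api-key your-api-key
```

Gated or private Hugging Face models typically also need an access token supplied through an environment variable such as `HF_TOKEN`.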
