Mistral Chat Completions API - vLLM engine
Use vLLM’s OpenAI-compatible Chat Completions endpoint to call HexGrid-hosted Mistral models with a messages-based interface.
This page provides copy-pasteable cURL-only examples for standard chat, reasoning, streaming, tool calling, tool-result continuation, JSON output, and common generation parameters.
Endpoint
POST http://<server-ip>:<port>/v1/chat/completions
Set these environment variables before running the examples:
export HEXGRID_API_KEY="your-hexgrid-api-key"
export MISTRAL_BASE_URL="http://<server-ip>:<port>/v1"
export MISTRAL_MODEL="mistralai/Mistral-Small-3.2-24B-Instruct-2506"
You can replace mistralai/Mistral-Small-3.2-24B-Instruct-2506 with another HexGrid-hosted Mistral model, such as:
- mistralai/Mistral-7B-Instruct-v0.3
- mistralai/Mistral-Small-Instruct-2409
- mistralai/Mistral-Small-3.1-24B-Instruct-2503
- mistralai/Mistral-Small-3.2-24B-Instruct-2506
- mistralai/Mistral-Large-Instruct-2411
- mistralai/Magistral-Small-2506
- mistralai/Ministral-3-14B-Reasoning-2512
Use the exact model ID configured in your HexGrid deployment.
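Before running the examples, it can help to confirm that the model ID you exported is actually served. vLLM's OpenAI-compatible server exposes GET /v1/models. The sketch below assumes that your HexGrid deployment forwards that route and that jq is installed. The function name model_is_served is hypothetical.

```bash
# Hypothetical helper: succeeds (exit 0) if the model ID in $1 appears in a
# /v1/models listing read from stdin. Assumes jq is installed.
model_is_served() {
  jq -e --arg id "$1" 'any(.data[]?; .id == $id)' > /dev/null
}

# Usage (needs a reachable server and assumes the /models route is exposed):
# curl -s "$MISTRAL_BASE_URL/models" \
#   -H "Authorization: Bearer $HEXGRID_API_KEY" \
#   | model_is_served "$MISTRAL_MODEL" && echo "served" || echo "not found"
```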
Create a chat completion
Generate a normal non-streaming response from a Mistral model served by vLLM.
Required attributes
- model (string): The served Mistral model name, for example mistralai/Mistral-Small-3.2-24B-Instruct-2506.
- messages (array): The conversation so far. Each message has a role and content. Common roles are system, user, assistant, and tool.
Optional attributes used here
- temperature (number): Sampling temperature. Higher values produce more varied output; lower values produce more deterministic output.
- top_p (number): Nucleus sampling probability threshold.
- max_tokens (integer): Maximum number of tokens to generate.
Request
curl -X POST "$MISTRAL_BASE_URL/chat/completions" \
-H "Authorization: Bearer $HEXGRID_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "'"$MISTRAL_MODEL"'",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "Give me a one-sentence explanation of what Mistral models are."
}
],
"temperature": 0.7,
"top_p": 0.9,
"max_tokens": 256
}'
Response shape
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1760000000,
"model": "mistralai/Mistral-Small-3.2-24B-Instruct-2506",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Mistral models are open and commercial large language models from Mistral AI designed for chat, coding, reasoning, multilingual tasks, and tool-using applications."
},
"finish_reason": "stop",
"logprobs": null
}
],
"usage": {
"prompt_tokens": 33,
"completion_tokens": 30,
"total_tokens": 63
}
}
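To read only the answer instead of the full JSON, you can pipe the response through jq. This is a minimal sketch that assumes jq is installed. chat_text and chat_finish_reason are hypothetical helper names. If finish_reason is "length" instead of "stop", generation stopped because it reached max_tokens.

```bash
# Hypothetical helpers for a non-streaming chat.completion read on stdin.
# Assumes jq is installed.

# Print the first choice's assistant text (jq prints "null" if content is null).
chat_text() {
  jq -r '.choices[0].message.content'
}

# Print why generation ended: "stop" for a natural end, "length" when
# max_tokens was reached, "tool_calls" when the model requested tools.
chat_finish_reason() {
  jq -r '.choices[0].finish_reason'
}

# Usage:
# resp=$(curl -s -X POST "$MISTRAL_BASE_URL/chat/completions" ...)
# printf '%s' "$resp" | chat_text
# printf '%s' "$resp" | chat_finish_reason
```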
Reasoning
Use this section for HexGrid-hosted Mistral reasoning-family models such as Magistral or Ministral Reasoning.
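For Mistral reasoning models, vLLM can return the reasoning text in a separate reasoning field of the assistant message when the deployment uses vLLM's Mistral reasoning parser. Internally, Mistral reasoning traces use [THINK]...[/THINK], and the parser extracts that text into the reasoning field, leaving the final answer in content.
Standard Mistral Instruct models do not return a separate parsed reasoning field. For these models, use a reasoning-style system prompt and read the whole answer from content. Set MISTRAL_MODEL to the model you want to test before running the request below.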
Required attributes
- model (string): A Mistral reasoning-family model configured in HexGrid, for example mistralai/Magistral-Small-2506 or mistralai/Ministral-3-14B-Reasoning-2512.
- messages (array): The user conversation.
Optional attributes used here
- temperature (number): Use a lower value for more deterministic reasoning.
- max_tokens (integer): Increase this value for reasoning-heavy tasks.
Request
curl -X POST "$MISTRAL_BASE_URL/chat/completions" \
-H "Authorization: Bearer $HEXGRID_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "'"$MISTRAL_MODEL"'",
"messages": [
{
"role": "system",
"content": "You are a careful reasoning assistant. Provide a concise final answer."
},
{
"role": "user",
"content": "Which is greater, 9.11 or 9.8? Explain briefly."
}
],
"temperature": 0.3,
"top_p": 0.9,
"max_tokens": 1024
}'
Response shape for reasoning-enabled deployments
{
"id": "chatcmpl-reasoning123",
"object": "chat.completion",
"created": 1760000001,
"model": "mistralai/Magistral-Small-2506",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"reasoning": "Compare the numbers digit by digit. Both have integer part 9. The tenths digit of 9.8 is 8, while the tenths digit of 9.11 is 1, so 9.8 is larger.",
"content": "9.8 is greater than 9.11."
},
"finish_reason": "stop",
"logprobs": null
}
],
"usage": {
"prompt_tokens": 43,
"completion_tokens": 78,
"total_tokens": 121
}
}
Response shape for standard instruct deployments
{
"id": "chatcmpl-reasoning-style123",
"object": "chat.completion",
"created": 1760000001,
"model": "mistralai/Mistral-Small-3.2-24B-Instruct-2506",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "9.8 is greater than 9.11. Both numbers start with 9, but 9.8 has 8 in the tenths place while 9.11 has 1 in the tenths place."
},
"finish_reason": "stop",
"logprobs": null
}
],
"usage": {
"prompt_tokens": 43,
"completion_tokens": 42,
"total_tokens": 85
}
}
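The two response shapes above differ only in whether message.reasoning is present, so one jq filter can handle both. This sketch assumes jq. It also falls back to reasoning_content, on the assumption that older vLLM releases used that field name. chat_reasoning is a hypothetical helper.

```bash
# Hypothetical helper: print the parsed reasoning of a non-streaming response
# read on stdin, or nothing if the deployment returned no reasoning field.
# Assumes jq. The reasoning_content fallback is an assumption for older vLLM
# releases that used that field name.
chat_reasoning() {
  jq -r '.choices[0].message | (.reasoning // .reasoning_content // empty)'
}

# Usage:
# printf '%s' "$resp" | chat_reasoning                        # reasoning, if any
# printf '%s' "$resp" | jq -r '.choices[0].message.content'  # final answer
```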
Streaming
Stream tokens as Server-Sent Events instead of waiting for the complete response.
Streaming attributes
- stream (boolean): Set to true to return incremental chunks.
- stream_options (object): Optional streaming configuration. { "include_usage": true } requests token usage in the final stream chunk.
- max_tokens (integer): Maximum number of tokens to generate.
Request
curl -N -X POST "$MISTRAL_BASE_URL/chat/completions" \
-H "Authorization: Bearer $HEXGRID_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "'"$MISTRAL_MODEL"'",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "Explain Mistral models in three short bullet points."
}
],
"stream": true,
"stream_options": {
"include_usage": true
},
"temperature": 0.7,
"top_p": 0.9,
"max_tokens": 256
}'
Sample streamed response shape
data: {"id":"chatcmpl-stream123","object":"chat.completion.chunk","created":1760000002,"model":"mistralai/Mistral-Small-3.2-24B-Instruct-2506","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}],"usage":null}
data: {"id":"chatcmpl-stream123","object":"chat.completion.chunk","created":1760000002,"model":"mistralai/Mistral-Small-3.2-24B-Instruct-2506","choices":[{"index":0,"delta":{"content":"- Mistral models are language models from Mistral AI for chat, coding, and agent workflows."},"finish_reason":null}],"usage":null}
data: {"id":"chatcmpl-stream123","object":"chat.completion.chunk","created":1760000002,"model":"mistralai/Mistral-Small-3.2-24B-Instruct-2506","choices":[{"index":0,"delta":{"content":"\n- Many Mistral models support function calling and structured tool-use patterns."},"finish_reason":null}],"usage":null}
data: {"id":"chatcmpl-stream123","object":"chat.completion.chunk","created":1760000002,"model":"mistralai/Mistral-Small-3.2-24B-Instruct-2506","choices":[{"index":0,"delta":{"content":"\n- They can be served through vLLM using an OpenAI-compatible API."},"finish_reason":null}],"usage":null}
data: {"id":"chatcmpl-stream123","object":"chat.completion.chunk","created":1760000002,"model":"mistralai/Mistral-Small-3.2-24B-Instruct-2506","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":null}
data: {"id":"chatcmpl-stream123","object":"chat.completion.chunk","created":1760000002,"model":"mistralai/Mistral-Small-3.2-24B-Instruct-2506","choices":[],"usage":{"prompt_tokens":29,"completion_tokens":45,"total_tokens":74}}
data: [DONE]
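Each data: line in the stream carries one complete JSON chunk, and the generated text is in choices[].delta.content. This sketch assumes jq. It prints only that text as it arrives and stops at data: [DONE]. The role-only first chunk and the usage-only last chunk produce no output. stream_text is a hypothetical helper name.

```bash
# Hypothetical helper: read raw SSE lines (as printed by curl -N) on stdin and
# print only the streamed answer text. Assumes jq is installed.
stream_text() {
  local line payload
  while IFS= read -r line; do
    case "$line" in
      "data: [DONE]") break ;;
      "data: "*)
        payload=${line#data: }
        # Chunks with no delta.content (role-only or usage-only) print nothing
        printf '%s' "$payload" | jq -j '.choices[]?.delta.content // empty'
        ;;
    esac
  done
  echo
}

# Usage:
# curl -sN -X POST "$MISTRAL_BASE_URL/chat/completions" ... | stream_text
```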
Streaming reasoning
Stream both reasoning and final answer chunks for a Mistral reasoning-family model.
This response shape applies when the HexGrid deployment uses vLLM’s Mistral reasoning parser. Reasoning chunks appear in delta.reasoning, while final answer chunks appear in delta.content.
Streaming reasoning attributes
- stream (boolean): Set to true.
- stream_options (object): Use { "include_usage": true } to request token usage in the final stream chunk.
- max_tokens (integer): Increase this for reasoning-heavy prompts.
Request
curl -N -X POST "$MISTRAL_BASE_URL/chat/completions" \
-H "Authorization: Bearer $HEXGRID_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "'"$MISTRAL_MODEL"'",
"messages": [
{
"role": "user",
"content": "Which is greater, 9.11 or 9.8?"
}
],
"stream": true,
"stream_options": {
"include_usage": true
},
"temperature": 0.3,
"top_p": 0.9,
"max_tokens": 1024
}'
Sample streamed response shape
data: {"id":"chatcmpl-reason-stream123","object":"chat.completion.chunk","created":1760000003,"model":"mistralai/Magistral-Small-2506","choices":[{"index":0,"delta":{"role":"assistant","reasoning":"Compare the decimal values digit by digit."},"finish_reason":null}],"usage":null}
data: {"id":"chatcmpl-reason-stream123","object":"chat.completion.chunk","created":1760000003,"model":"mistralai/Magistral-Small-2506","choices":[{"index":0,"delta":{"reasoning":" Both have integer part 9. The tenths digit of 9.8 is 8, while the tenths digit of 9.11 is 1."},"finish_reason":null}],"usage":null}
data: {"id":"chatcmpl-reason-stream123","object":"chat.completion.chunk","created":1760000003,"model":"mistralai/Magistral-Small-2506","choices":[{"index":0,"delta":{"content":"9.8 is greater than 9.11."},"finish_reason":null}],"usage":null}
data: {"id":"chatcmpl-reason-stream123","object":"chat.completion.chunk","created":1760000003,"model":"mistralai/Magistral-Small-2506","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":null}
data: {"id":"chatcmpl-reason-stream123","object":"chat.completion.chunk","created":1760000003,"model":"mistralai/Magistral-Small-2506","choices":[],"usage":{"prompt_tokens":16,"completion_tokens":64,"total_tokens":80}}
data: [DONE]
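If you save a reasoning stream to a file, you can later extract the reasoning and the answer separately. This sketch assumes jq, sed, and grep. stream_field is a hypothetical helper. Passing reasoning_content instead of reasoning is an assumption for deployments on older vLLM releases.

```bash
# Hypothetical helper: collect one delta field across a saved SSE stream.
#   $1  delta field to collect, for example "reasoning" or "content"
# Reads SSE text on stdin. Assumes jq, sed, and grep are installed.
stream_field() {
  # Keep only data: payloads, drop the [DONE] sentinel, then let jq read the
  # remaining JSON chunks in sequence and join the requested field.
  sed -n 's/^data: //p' \
    | grep -v -F -x '[DONE]' \
    | jq -j --arg f "$1" '.choices[]?.delta[$f] // empty'
}

# Usage:
# curl -sN -X POST "$MISTRAL_BASE_URL/chat/completions" ... > stream.txt
# stream_field reasoning < stream.txt
# stream_field content < stream.txt
```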
Tool calling
Provide function tools that the model may call. The model can return tool_calls instead of a final text answer.
HexGrid-hosted Mistral models have tool calling enabled by default.
Tool attributes
- tools (array): Tool definitions. Use type: "function" with a JSON schema for parameters.
- tool_choice (string | object): Controls tool use. Use "auto" to let the model decide, "none" to disable tool calls, "required" to force at least one tool call, or an object to force a named function.
- parallel_tool_calls (boolean): Set to false if you want at most one tool call in a single response.
Request
curl -X POST "$MISTRAL_BASE_URL/chat/completions" \
-H "Authorization: Bearer $HEXGRID_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "'"$MISTRAL_MODEL"'",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant. Use tools when needed."
},
{
"role": "user",
"content": "What is the weather in Paris today?"
}
],
"tools": [
{
"type": "function",
"function": {
"name": "get_current_weather",
"description": "Get the current weather for a city or district.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "City or district, such as Paris, San Francisco, or Bengaluru."
},
"unit": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "Temperature unit."
}
},
"required": ["location"]
}
}
}
],
"tool_choice": "auto",
"parallel_tool_calls": false,
"temperature": 0.2,
"max_tokens": 512
}'
Response shape
{
"id": "chatcmpl-tool123",
"object": "chat.completion",
"created": 1760000004,
"model": "mistralai/Mistral-Small-3.2-24B-Instruct-2506",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "",
"tool_calls": [
{
"id": "call_abc123",
"type": "function",
"function": {
"name": "get_current_weather",
"arguments": "{\"location\":\"Paris\",\"unit\":\"celsius\"}"
}
}
]
},
"finish_reason": "tool_calls",
"logprobs": null
}
],
"usage": {
"prompt_tokens": 118,
"completion_tokens": 24,
"total_tokens": 142
}
}
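In each tool call, arguments is a JSON-encoded string rather than an object, so your application must decode it before use. This sketch assumes jq and prints one tab-separated line per tool call. tool_calls_tsv is a hypothetical helper. If the model produces malformed arguments, fromjson fails, and your application should treat that as an error.

```bash
# Hypothetical helper: print id, function name, and decoded arguments for each
# tool call in the response read on stdin. Assumes jq is installed.
tool_calls_tsv() {
  # fromjson decodes the arguments string; tojson re-prints it compactly
  jq -r '.choices[0].message.tool_calls[]?
         | [.id, .function.name, (.function.arguments | fromjson | tojson)]
         | @tsv'
}

# Usage:
# printf '%s' "$resp" | tool_calls_tsv
```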
Required tool calling
Force the model to call at least one tool by setting tool_choice to "required".
Tool attributes
- tool_choice (string): Set to "required" to force at least one tool call.
- tools (array): Tool definitions available to the model.
Request
curl -X POST "$MISTRAL_BASE_URL/chat/completions" \
-H "Authorization: Bearer $HEXGRID_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "'"$MISTRAL_MODEL"'",
"messages": [
{
"role": "user",
"content": "Find the current weather for Paris."
}
],
"tools": [
{
"type": "function",
"function": {
"name": "get_current_weather",
"description": "Get the current weather for a city or district.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "City or district."
}
},
"required": ["location"]
}
}
}
],
"tool_choice": "required",
"parallel_tool_calls": false,
"temperature": 0.2,
"max_tokens": 512
}'
Response shape
{
"id": "chatcmpl-required-tool123",
"object": "chat.completion",
"created": 1760000005,
"model": "mistralai/Mistral-Small-3.2-24B-Instruct-2506",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "",
"tool_calls": [
{
"id": "call_weather_001",
"type": "function",
"function": {
"name": "get_current_weather",
"arguments": "{\"location\":\"Paris\"}"
}
}
]
},
"finish_reason": "tool_calls"
}
],
"usage": {
"prompt_tokens": 92,
"completion_tokens": 20,
"total_tokens": 112
}
}
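With tool_choice set to "required", your application usually expects a tool call in the response. This sketch assumes jq and makes that expectation explicit: it exits non-zero when message.tool_calls is missing or empty. It checks the tool_calls array rather than finish_reason. has_tool_call is a hypothetical helper.

```bash
# Hypothetical helper: succeed only if the response on stdin contains at least
# one tool call. Assumes jq is installed.
has_tool_call() {
  jq -e '(.choices[0].message.tool_calls // []) | length > 0' > /dev/null
}

# Usage:
# printf '%s' "$resp" | has_tool_call || { echo "no tool call returned" >&2; exit 1; }
```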
Named tool calling
Force a specific tool by passing an object to tool_choice.
Tool choice object
- tool_choice (object): Use { "type": "function", "function": { "name": "..." } } to force a specific tool.
Request
curl -X POST "$MISTRAL_BASE_URL/chat/completions" \
-H "Authorization: Bearer $HEXGRID_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "'"$MISTRAL_MODEL"'",
"messages": [
{
"role": "user",
"content": "Use the weather tool to check Paris."
}
],
"tools": [
{
"type": "function",
"function": {
"name": "get_current_weather",
"description": "Get the current weather for a city or district.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string"
},
"unit": {
"type": "string",
"enum": ["celsius", "fahrenheit"]
}
},
"required": ["location"]
}
}
}
],
"tool_choice": {
"type": "function",
"function": {
"name": "get_current_weather"
}
},
"parallel_tool_calls": false,
"temperature": 0.2,
"max_tokens": 512
}'
Response shape
{
"id": "chatcmpl-named-tool123",
"object": "chat.completion",
"created": 1760000006,
"model": "mistralai/Mistral-Small-3.2-24B-Instruct-2506",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "",
"tool_calls": [
{
"id": "call_weather_002",
"type": "function",
"function": {
"name": "get_current_weather",
"arguments": "{\"location\":\"Paris\",\"unit\":\"celsius\"}"
}
}
]
},
"finish_reason": "tool_calls"
}
],
"usage": {
"prompt_tokens": 105,
"completion_tokens": 22,
"total_tokens": 127
}
}
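Editing the tool_choice object by hand inside a single-quoted shell string is error-prone. This sketch assumes jq. It builds the object from a function name and checks that the response called that same function. named_tool_choice and called_tool_is are hypothetical helpers.

```bash
# Hypothetical helpers for named tool calling. Assumes jq is installed.

# Print a compact tool_choice object that forces the function named in $1.
named_tool_choice() {
  jq -cn --arg name "$1" '{"type": "function", "function": {"name": $name}}'
}

# Succeed if the first tool call in the response on stdin used function $1.
called_tool_is() {
  jq -e --arg name "$1" '.choices[0].message.tool_calls[0].function.name == $name' > /dev/null
}

# Usage: splice the object into a request body with --argjson, for example
# jq -n --arg model "$MISTRAL_MODEL" \
#   --argjson tc "$(named_tool_choice get_current_weather)" \
#   '{"model": $model, "tool_choice": $tc}'
```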
Tool result
After your application executes the selected tool, send the tool result back so Mistral can produce a final answer.
Tool result message
- role (string): Use "tool" for OpenAI-compatible tool result messages.
- tool_call_id (string): The id of the matching item in the assistant message's tool_calls array.
- content (string): The tool result. If the result is structured data, serialize it as a JSON string.
Request
curl -X POST "$MISTRAL_BASE_URL/chat/completions" \
-H "Authorization: Bearer $HEXGRID_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "'"$MISTRAL_MODEL"'",
"messages": [
{
"role": "user",
"content": "What is the weather in Paris today?"
},
{
"role": "assistant",
"content": "",
"tool_calls": [
{
"id": "call_abc123",
"type": "function",
"function": {
"name": "get_current_weather",
"arguments": "{\"location\":\"Paris\",\"unit\":\"celsius\"}"
}
}
]
},
{
"role": "tool",
"tool_call_id": "call_abc123",
"content": "{\"location\":\"Paris\",\"temperature\":18,\"condition\":\"Partly cloudy\",\"unit\":\"celsius\"}"
}
],
"max_tokens": 1024
}'
Response shape
{
"id": "chatcmpl-tool-result123",
"object": "chat.completion",
"created": 1760000007,
"model": "mistralai/Mistral-Small-3.2-24B-Instruct-2506",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The weather in Paris today is partly cloudy, with a temperature of 18°C."
},
"finish_reason": "stop",
"logprobs": null
}
],
"usage": {
"prompt_tokens": 156,
"completion_tokens": 20,
"total_tokens": 176
}
}
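You can build the follow-up body from the first response instead of copying the assistant message and call id by hand. This sketch assumes jq and a single tool call, which matches parallel_tool_calls: false in the earlier examples. build_tool_followup is a hypothetical helper. It keeps only role, content, and tool_calls from the assistant message, matching the request shape above.

```bash
# Hypothetical helper: print a follow-up request body. Assumes jq and
# MISTRAL_MODEL set as above; handles only the first tool call.
#   $1  JSON array of the original messages
#   $2  full chat.completion JSON from the first call
#   $3  tool result text (a JSON string such as '{"temperature":18}')
build_tool_followup() {
  jq -n \
    --arg model "$MISTRAL_MODEL" \
    --argjson messages "$1" \
    --argjson first "$2" \
    --arg result "$3" \
    '($first.choices[0].message | {role, content, tool_calls}) as $assistant
     | {
         "model": $model,
         "messages": ($messages + [
           $assistant,
           {"role": "tool", "tool_call_id": $assistant.tool_calls[0].id, "content": $result}
         ])
       }'
}

# Usage:
# curl -s -X POST "$MISTRAL_BASE_URL/chat/completions" \
#   -H "Authorization: Bearer $HEXGRID_API_KEY" -H "Content-Type: application/json" \
#   -d "$(build_tool_followup "$msgs" "$first" "$result")"
```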
JSON mode
Request valid JSON output using vLLM’s OpenAI-compatible response_format.
JSON attributes
- response_format (object): Set to { "type": "json_object" } to request JSON object output.
- messages (array): Include an explicit instruction to return JSON in the system or user message.
Request
curl -X POST "$MISTRAL_BASE_URL/chat/completions" \
-H "Authorization: Bearer $HEXGRID_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "'"$MISTRAL_MODEL"'",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant. Return only valid JSON."
},
{
"role": "user",
"content": "Create a JSON object with three short product tagline ideas for a note-taking app."
}
],
"response_format": {
"type": "json_object"
},
"temperature": 0.7,
"top_p": 0.9,
"max_tokens": 512
}'
Response shape
{
"id": "chatcmpl-json123",
"object": "chat.completion",
"created": 1760000008,
"model": "mistralai/Mistral-Small-3.2-24B-Instruct-2506",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "{\"taglines\":[\"Capture ideas before they disappear.\",\"Your thoughts, organized instantly.\",\"Notes that keep up with you.\"]}"
},
"finish_reason": "stop",
"logprobs": null
}
],
"usage": {
"prompt_tokens": 48,
"completion_tokens": 36,
"total_tokens": 84
}
}
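In JSON mode, the JSON arrives as a string inside message.content, so you need to decode it a second time. This sketch assumes jq. json_content is a hypothetical helper. JSON mode constrains the output to a JSON object but does not fix its keys, so the taglines key in the sample is the model's own choice.

```bash
# Hypothetical helper: decode the JSON text in message.content and print it.
# Assumes jq; exits non-zero if the content is not valid JSON.
json_content() {
  jq '.choices[0].message.content | fromjson'
}

# Usage:
# printf '%s' "$resp" | json_content
# printf '%s' "$resp" | json_content | jq -r '.taglines[]?'
```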
JSON schema
Request output that follows a JSON schema using vLLM’s response_format with type: "json_schema".
JSON schema attributes
- response_format (object): Output-format constraint. Set type to "json_schema" to request schema-conforming output.
- response_format.json_schema (object): Wraps the schema. name labels the schema, and schema is the JSON schema that the response should follow.
Request
curl -X POST "$MISTRAL_BASE_URL/chat/completions" \
-H "Authorization: Bearer $HEXGRID_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "'"$MISTRAL_MODEL"'",
"messages": [
{
"role": "system",
"content": "Return only JSON that matches the provided schema."
},
{
"role": "user",
"content": "Create three short product tagline ideas for a note-taking app."
}
],
"response_format": {
"type": "json_schema",
"json_schema": {
"name": "tagline_response",
"schema": {
"type": "object",
"properties": {
"taglines": {
"type": "array",
"items": {
"type": "string"
},
"minItems": 3,
"maxItems": 3
}
},
"required": ["taglines"],
"additionalProperties": false
}
}
},
"temperature": 0.7,
"top_p": 0.9,
"max_tokens": 512
}'
Response shape
{
"id": "chatcmpl-schema123",
"object": "chat.completion",
"created": 1760000009,
"model": "mistralai/Mistral-Small-3.2-24B-Instruct-2506",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "{\"taglines\":[\"Capture ideas before they disappear.\",\"Your thoughts, organized instantly.\",\"Notes that keep up with you.\"]}"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 86,
"completion_tokens": 36,
"total_tokens": 122
}
}
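Even with a schema constraint, a response cut off by max_tokens can be incomplete, so a cheap client-side check is still useful. This sketch assumes jq. It mirrors the schema above (exactly three string taglines and no other keys) and requires finish_reason to be "stop". It is not a general JSON Schema validator. check_taglines is a hypothetical helper.

```bash
# Hypothetical helper: succeed only if the response on stdin finished normally
# and its content matches the tagline schema used above. Assumes jq.
check_taglines() {
  jq -e '
    (.choices[0].finish_reason == "stop")
    and (.choices[0].message.content
         | fromjson
         | (keys == ["taglines"])
           and (.taglines | (length == 3) and all(.[]; type == "string")))
  ' > /dev/null
}

# Usage:
# printf '%s' "$resp" | check_taglines || echo "unexpected output" >&2
```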
Structured outputs with vLLM parameters
Use vLLM’s structured_outputs request field when you want backend-guided output constraints.
vLLM supports structured outputs in the OpenAI-compatible server and accepts constraints such as JSON schema, regex, choice, grammar, and structural tags.
Structured output attributes
- structured_outputs (object): vLLM-specific structured output constraints.
- structured_outputs.json (object): JSON schema to constrain the model output.
Request
curl -X POST "$MISTRAL_BASE_URL/chat/completions" \
-H "Authorization: Bearer $HEXGRID_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "'"$MISTRAL_MODEL"'",
"messages": [
{
"role": "system",
"content": "Return only JSON."
},
{
"role": "user",
"content": "Classify the sentiment of this text: vLLM makes Mistral serving fast and easy."
}
],
"structured_outputs": {
"json": {
"type": "object",
"properties": {
"sentiment": {
"type": "string",
"enum": ["positive", "neutral", "negative"]
},
"confidence": {
"type": "number"
}
},
"required": ["sentiment", "confidence"],
"additionalProperties": false
}
},
"temperature": 0,
"max_tokens": 128
}'
Response shape
{
"id": "chatcmpl-structured123",
"object": "chat.completion",
"created": 1760000010,
"model": "mistralai/Mistral-Small-3.2-24B-Instruct-2506",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "{\"sentiment\":\"positive\",\"confidence\":0.94}"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 45,
"completion_tokens": 12,
"total_tokens": 57
}
}
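Besides json, the structured_outputs field accepts other constraint types, including choice. This sketch assumes jq and builds a request body with the choice constraint, so the reply is exactly one of the listed labels. Building the body with jq --arg prevents quotes in the input text from breaking the JSON. build_choice_request is a hypothetical helper.

```bash
# Hypothetical helper: print a request body that limits the reply to one of
# three sentiment labels via vLLM's "choice" constraint. Assumes jq and
# MISTRAL_MODEL set as above.
#   $1  text to classify
build_choice_request() {
  jq -n \
    --arg model "$MISTRAL_MODEL" \
    --arg text "$1" \
    '{
       "model": $model,
       "messages": [{"role": "user", "content": ("Classify the sentiment of this text: " + $text)}],
       "structured_outputs": {"choice": ["positive", "neutral", "negative"]},
       "temperature": 0,
       "max_tokens": 16
     }'
}

# Usage:
# curl -s -X POST "$MISTRAL_BASE_URL/chat/completions" \
#   -H "Authorization: Bearer $HEXGRID_API_KEY" -H "Content-Type: application/json" \
#   -d "$(build_choice_request 'vLLM makes Mistral serving fast and easy.')"
```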
Multiple vLLM parameters
Use this example when you need a broader set of vLLM generation controls.
For raw HTTP/cURL, vLLM-specific parameters can be merged directly into the JSON request body.
Parameters shown here
- temperature (number): Sampling temperature.
- top_p (number): Nucleus sampling probability threshold.
- top_k (integer): vLLM-specific top-k sampling parameter.
- min_p (number): vLLM-specific minimum probability sampling parameter.
- repetition_penalty (number): vLLM-specific repetition penalty.
- presence_penalty (number): Penalizes tokens based on whether they have already appeared.
- frequency_penalty (number): Penalizes tokens based on how frequently they have appeared.
- seed (integer): Seed for best-effort deterministic sampling.
- n (integer): Number of candidate responses to generate.
- logprobs (boolean): Whether to return token log probabilities, if supported by the serving configuration.
- top_logprobs (integer): Number of top candidate tokens to return at each position when logprobs is enabled.
Request
curl -X POST "$MISTRAL_BASE_URL/chat/completions" \
-H "Authorization: Bearer $HEXGRID_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "'"$MISTRAL_MODEL"'",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant. Keep the answer concise."
},
{
"role": "user",
"content": "Give me five naming ideas for an AI inference platform."
}
],
"temperature": 0.7,
"top_p": 0.9,
"top_k": 40,
"min_p": 0.0,
"repetition_penalty": 1.05,
"presence_penalty": 0.2,
"frequency_penalty": 0.2,
"max_tokens": 512,
"seed": 1234,
"n": 1,
"logprobs": true,
"top_logprobs": 2,
"stream": false
}'
Response shape
{
"id": "chatcmpl-params123",
"object": "chat.completion",
"created": 1760000011,
"model": "mistralai/Mistral-Small-3.2-24B-Instruct-2506",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "1. HexGrid Inference\n2. ModelForge\n3. TensorRoute\n4. InferaCloud\n5. LatticeAI"
},
"finish_reason": "stop",
"logprobs": {
"content": [
{
"token": "1",
"logprob": -0.02,
"bytes": [49],
"top_logprobs": [
{
"token": "1",
"logprob": -0.02,
"bytes": [49]
},
{
"token": "-",
"logprob": -4.1,
"bytes": [45]
}
]
}
]
}
}
],
"usage": {
"prompt_tokens": 38,
"completion_tokens": 34,
"total_tokens": 72
}
}
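Each logprob is a natural-log probability, so exp(logprob) gives the probability of that token. In the sample, exp(-0.02) is about 0.98. This sketch assumes a jq build that includes the exp math function. token_probs is a hypothetical helper.

```bash
# Hypothetical helper: print each generated token and its probability,
# tab-separated, from a response read on stdin. Assumes jq with math builtins.
token_probs() {
  jq -r '.choices[0].logprobs.content[]? | "\(.token)\t\(.logprob | exp)"'
}

# Usage:
# printf '%s' "$resp" | token_probs
```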
Official sources
- Mistral vLLM deployment docs: https://docs.mistral.ai/models/deployment/local-deployment/vllm
- Mistral function calling docs: https://docs.mistral.ai/studio-api/conversations/function-calling
- Mistral models overview: https://docs.mistral.ai/models/overview
- vLLM OpenAI-compatible server docs: https://docs.vllm.ai/en/latest/serving/openai_compatible_server/
- vLLM tool calling docs: https://docs.vllm.ai/en/latest/features/tool_calling/
- vLLM Mistral tool parser docs: https://docs.vllm.ai/en/latest/api/vllm/tool_parsers/mistral_tool_parser/
- vLLM Mistral reasoning parser docs: https://docs.vllm.ai/en/stable/api/vllm/reasoning/mistral_reasoning_parser/
- vLLM structured outputs docs: https://docs.vllm.ai/en/latest/features/structured_outputs/
- vLLM Ministral-3 Reasoning usage guide: https://docs.vllm.ai/projects/recipes/en/latest/Mistral/Ministral-3-Reasoning.html
- Mistral-7B-Instruct-v0.3 model card: https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3
- Mistral-Large-Instruct-2411 model card: https://huggingface.co/mistralai/Mistral-Large-Instruct-2411