How do I include both reasoning_content and content in the conversation history sent back to the model?
Note on thinking-in-context with vLLM: When building multi-turn agentic loops, include both reasoning_content and content in the conversation history you send back to the model. The reasoning content should be re-wrapped in `<think>...</think>` tags within the assistant message.
If I am deploying that model on vLLM for public OpenClaw users, how can I ensure that?
We've updated the model card and docs to cover this exact scenario in detail.
When building multi-turn loops on vLLM, pass the reasoning back as the reasoning field (not reasoning_content) on assistant messages in subsequent requests. The chat template automatically re-wraps it in <think>...</think> tags during tokenization.
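A minimal sketch of what that next-turn payload might look like, assuming an OpenAI-compatible `/v1/chat/completions` message list where vLLM's chat template consumes a `reasoning` field on assistant messages (the helper name `build_next_turn` and the dict shapes are illustrative, not an official API):

```python
def build_next_turn(history, assistant_reply, user_followup):
    """Append the assistant turn (with its reasoning) plus the new user turn.

    assistant_reply is assumed to be a dict with "content" and "reasoning"
    keys already extracted from the previous response.
    """
    history = list(history)  # avoid mutating the caller's list
    history.append({
        "role": "assistant",
        "content": assistant_reply["content"],
        # Pass the trace back as "reasoning"; the chat template re-wraps
        # it in <think>...</think> during tokenization.
        "reasoning": assistant_reply["reasoning"],
    })
    history.append({"role": "user", "content": user_followup})
    return history
```

The returned list is what you would send as `messages` in the next request.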
For OpenClaw deployments specifically: OpenClaw preserves full assistant turns across steps, so the main thing to watch for is the field name. If your SDK exposes reasoning_content on the response object, map it to reasoning before sending the next request to vLLM. Also keep assistant content as an empty string "" rather than null on tool-call turns.
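The field-name mapping and the empty-string rule can be handled in one normalization step. A sketch, assuming the SDK surfaces the trace as `reasoning_content` on the response message (the helper name and dict shapes are hypothetical):

```python
def to_vllm_assistant_message(resp_msg: dict) -> dict:
    """Normalize an SDK response message into the assistant dict vLLM expects."""
    msg = {
        "role": "assistant",
        # Keep content as "" (not None/null) on tool-call turns.
        "content": resp_msg.get("content") or "",
        # Map reasoning_content -> reasoning; fall back if the SDK
        # already uses "reasoning".
        "reasoning": resp_msg.get("reasoning_content")
                     or resp_msg.get("reasoning", ""),
    }
    if resp_msg.get("tool_calls"):
        msg["tool_calls"] = resp_msg["tool_calls"]
    return msg
```

Run every assistant turn through this before appending it to the history for the next request.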
Full details with code examples: https://docs.arcee.ai/capabilities/reasoning-traces