How do I include both reasoning_content and content in the conversation history sent back to the model?
Note on thinking-in-context with vLLM: When building multi-turn agentic loops, include both reasoning_content and content in the conversation history you send back to the model. The reasoning content should be re-wrapped in `<think>...</think>` tags within the assistant message.
If I am deploying that model on vLLM for public OpenClaw users, how can I ensure that?
We've updated the model card and docs to cover this exact scenario in detail.
When building multi-turn loops on vLLM, pass the reasoning back as the reasoning field (not reasoning_content) on assistant messages in subsequent requests. The chat template automatically re-wraps it in <think>...</think> tags during tokenization.
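A minimal sketch of what that next-turn payload might look like, assuming an OpenAI-compatible `/v1/chat/completions` message list where vLLM's chat template consumes a `reasoning` field on assistant messages (the helper name `build_next_turn` and the dict shapes are illustrative, not an official API):

```python
def build_next_turn(history, assistant_reply, user_followup):
    """Append the assistant turn (with its reasoning) plus the new user turn.

    assistant_reply is assumed to be a dict with "content" and "reasoning"
    keys already extracted from the previous response.
    """
    history = list(history)  # avoid mutating the caller's list
    history.append({
        "role": "assistant",
        "content": assistant_reply["content"],
        # Pass the trace back as "reasoning"; the chat template re-wraps
        # it in <think>...</think> during tokenization.
        "reasoning": assistant_reply["reasoning"],
    })
    history.append({"role": "user", "content": user_followup})
    return history
```

The returned list is what you would send as `messages` in the next request.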
For OpenClaw deployments specifically: OpenClaw preserves full assistant turns across steps, so the main thing to watch for is the field name. If your SDK exposes reasoning_content on the response object, map it to reasoning before sending the next request to vLLM. Also keep assistant content as an empty string "" rather than null on tool-call turns.
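The field-name mapping and the empty-string rule can be handled in one normalization step. A sketch, assuming the SDK surfaces the trace as `reasoning_content` on the response message (the helper name and dict shapes are hypothetical):

```python
def to_vllm_assistant_message(resp_msg: dict) -> dict:
    """Normalize an SDK response message into the assistant dict vLLM expects."""
    msg = {
        "role": "assistant",
        # Keep content as "" (not None/null) on tool-call turns.
        "content": resp_msg.get("content") or "",
        # Map reasoning_content -> reasoning; fall back if the SDK
        # already uses "reasoning".
        "reasoning": resp_msg.get("reasoning_content")
                     or resp_msg.get("reasoning", ""),
    }
    if resp_msg.get("tool_calls"):
        msg["tool_calls"] = resp_msg["tool_calls"]
    return msg
```

Run every assistant turn through this before appending it to the history for the next request.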
Full details with code examples: https://docs.arcee.ai/capabilities/reasoning-traces