Qwen is a series of LLMs released and maintained by Alibaba Cloud. QwQ is the reasoning model in the Qwen series. A while ago, the team released a preview version of this model (QwQ-32B-Preview), and now they have fully released the QwQ-32B model. It is available on Hugging Face and in the Ollama model library.
Links
https://huggingface.co/Qwen/QwQ-32B
https://ollama.com/library/qwq
They used a reinforcement learning (RL) scaling approach driven by outcome-based rewards. As mentioned in their blog post, instead of a traditional reward model, an accuracy verifier was used to train this model; it was trained with rewards from a general reward model along with some rule-based verifiers. You can use QwQ-32B via Hugging Face Transformers and the Alibaba Cloud DashScope API.
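The blog post does not show the training code, so here is a toy sketch of what an outcome-based, rule-based reward could look like. All names (`extract_final_answer`, `accuracy_reward`, the "Answer:" convention) are hypothetical illustrations, not Qwen's actual implementation.

```python
# Hypothetical sketch of an outcome-based reward for RL training (NOT Qwen's
# actual code): a rule-based verifier scores a completion 1.0 only if its
# final answer matches the known-correct result, regardless of the reasoning.
def extract_final_answer(completion: str) -> str:
    # Assumes the model is prompted to end with a line like "Answer: <value>"
    for line in reversed(completion.strip().splitlines()):
        if line.lower().startswith("answer:"):
            return line.split(":", 1)[1].strip()
    return ""

def accuracy_reward(completion: str, ground_truth: str) -> float:
    # Outcome-based: only the final answer is rewarded, not the steps
    return 1.0 if extract_final_answer(completion) == ground_truth else 0.0
```

The key property of this kind of reward is that it is sparse and binary: a long, elaborate chain of thought earns nothing unless the final answer checks out.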
Example Code with Transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/QwQ-32B"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "How many r's are in the word \"strawberry\""
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
# Keep only the newly generated tokens, dropping the prompt tokens
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
Example Code with DashScope API
from openai import OpenAI
import os

# Initialize OpenAI client
client = OpenAI(
    # If the environment variable is not configured, replace with your API Key: api_key="sk-xxx"
    # How to get an API Key: https://help.aliyun.com/zh/model-studio/developer-reference/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1"
)

reasoning_content = ""
content = ""
is_answering = False

completion = client.chat.completions.create(
    model="qwq-32b",
    messages=[
        {"role": "user", "content": "Which is larger, 9.9 or 9.11?"}
    ],
    stream=True,
    # Uncomment the following lines to return token usage in the last chunk
    # stream_options={
    #     "include_usage": True
    # }
)

print("\n" + "=" * 20 + "reasoning content" + "=" * 20 + "\n")

for chunk in completion:
    # If chunk.choices is empty, print usage
    if not chunk.choices:
        print("\nUsage:")
        print(chunk.usage)
    else:
        delta = chunk.choices[0].delta
        # Print reasoning content
        if hasattr(delta, 'reasoning_content') and delta.reasoning_content is not None:
            print(delta.reasoning_content, end='', flush=True)
            reasoning_content += delta.reasoning_content
        else:
            if delta.content != "" and is_answering is False:
                print("\n" + "=" * 20 + "content" + "=" * 20 + "\n")
                is_answering = True
            # Print content
            print(delta.content, end='', flush=True)
            content += delta.content
Performance Evaluation
Below is the evaluation chart showing how this 32B model competes against other reasoning models, especially DeepSeek-R1-671B.
It is highly competitive with the DeepSeek-R1-671B model across all five benchmarks and outperforms OpenAI-o1-mini on all of them except IFEval.
I was wondering what OpenAI’s ChatGPT would think about it 🤣
Here's a small insight from the comparison between QwQ-32B and DeepSeek-R1-671B, generated by ChatGPT from the provided chart.
(NOTE: For some reason, ChatGPT renders 671B as 67.1B; please ignore that.)
It is clear that the storage and hardware requirements are much higher for the DeepSeek-R1-671B model compared to QwQ-32B.
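A rough back-of-envelope calculation (my own estimate, not from the post) shows why the gap is so large: in 16-bit precision, each parameter takes about 2 bytes, so just storing the weights scales linearly with parameter count.

```python
# Back-of-envelope estimate of memory needed just to hold model weights
# (ignores KV cache, activations, and runtime overhead).
def weight_memory_gb(num_params: float, bytes_per_param: float = 2.0) -> float:
    return num_params * bytes_per_param / 1e9

qwq_gb = weight_memory_gb(32e9)    # QwQ-32B in bf16: ~64 GB
r1_gb = weight_memory_gb(671e9)    # DeepSeek-R1 (671B) in bf16: ~1342 GB
```

By this estimate, QwQ-32B fits on a small multi-GPU node (or a single GPU with quantization), while the full DeepSeek-R1 needs well over a terabyte of memory for weights alone.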
Based on the evaluation, the QwQ model performs very well on most real-world tasks.
You can also try using the QwQ reasoning model in Qwen Chat at https://chat.qwen.ai/
NOTE: Some information in this post is referenced from this video: https://youtu.be/W85kbOduL8c?si=058s4_cmslrhRAxk
Happy Learning!