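DeepSeek’s DeepSeek-R1 is notable for its heavy reliance on reinforcement learning (RL). Its precursor, DeepSeek-R1-Zero, was trained through RL alone, without any supervised fine-tuning (SFT), which sets it apart from other models; R1 itself adds a small cold-start SFT stage before RL.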
R1 reached competitive scores on several benchmarks that OpenAI's models had previously dominated. Its distinguishing features include long chain-of-thought (CoT) reasoning and self-verification, in which the model checks its own answers (seangoedecke).