Implement test-time compute scaling for math problems
Generate high-quality predictions from images
Powerful foundation model for zero-shot object tracking
Transcribe audio or YouTube videos into text