Sandbox

community

AI & ML interests

None defined yet.

Kseniase 
posted an update about 9 hours ago
15 Outstanding Research Papers from NeurIPS 2025

NeurIPS 2025, one of the premier annual events in machine learning and computational neuroscience, covers everything from the future of AI to the field’s hardest open challenges. We’re not attending this year, but we’re closely following the updates, so today we’ve pulled together a quick, easy-to-digest roundup of a few standout papers so you can jump in without getting overwhelmed.

Here is a list of 15 papers from NeurIPS 2025, including 8 top research papers that received awards, along with 7 others that caught our attention:

1. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks → https://neurips.cc/virtual/2025/loc/san-diego/test-of-time/128328
Test of Time Award winner. Introduces the RPN, a small convnet that predicts objectness and boxes on shared features, enabling Faster R-CNN to share computation and run around 5 fps on a GPU

2. Artificial Hivemind: The Open-Ended Homogeneity of LMs (and Beyond) → https://neurips.cc/virtual/2025/loc/san-diego/poster/121421
Releases a huge open-ended prompt dataset, showing that LLMs often fall into an “artificial hivemind” – generating surprisingly similar answers – and measuring this diversity collapse

3. Optimal Mistake Bounds for Transductive Online Learning → https://neurips.cc/virtual/2025/loc/san-diego/poster/119098
Settles a 30-year-old question about how much unlabeled data helps in online learning, proving a precise quadratic advantage with tight matching bounds

4. Gated Attention for LLMs: Non-linearity, Sparsity, and Attention-Sink-Free → https://neurips.cc/virtual/2025/loc/san-diego/poster/120216
Demonstrates how gating actually affects attention: a simple sigmoid gate after Scaled Dot-Product Attention (SDPA) boosts performance, stability, and long-context behavior by adding useful nonlinearity and sparse modulation
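To make the gating idea concrete, here is a minimal PyTorch sketch (not the paper’s actual code) of single-head attention with a sigmoid gate applied after SDPA; the layer sizes and initialization are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedAttention(nn.Module):
    """Minimal sketch: single-head self-attention with a sigmoid gate
    applied to the SDPA output (illustrative, not the paper's code)."""

    def __init__(self, d_model: int):
        super().__init__()
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.gate = nn.Linear(d_model, d_model)  # produces per-dimension gate values
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):                        # x: (batch, seq, d_model)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = F.scaled_dot_product_attention(q, k, v)   # standard SDPA
        attn = attn * torch.sigmoid(self.gate(x))        # nonlinearity + sparse modulation
        return self.out(attn)

x = torch.randn(2, 16, 64)
print(GatedAttention(64)(x).shape)  # torch.Size([2, 16, 64])
```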

Read further below ⬇️
Also, subscribe to the Turing Post: https://www.turingpost.com/subscribe
Kseniase 
posted an update 7 days ago
9 Recent advances in Multi-Agent Systems (all open-source)

The idea of splitting tasks across multiple agents instead of relying on one universal agent is now seen as one of the most effective ways to build an AI stack. Concepts like “agent swarms” were highlighted at the AI Engineer Code Summit in NYC (Nov 20–21) as the winning architecture. And this trend is not only about coding and software – it applies across all AI domains.

So here is some recent research that helps make multi-agent systems (MAS) better and keeps your knowledge up to date:

1. LatentMAS → Latent Collaboration in Multi-Agent Systems (2511.20639)
AI agents share their hidden "thoughts" directly in latent space instead of talking through text. This makes collaboration and reasoning much faster and more accurate (no extra training needed)

2. Puppeteer → Multi-Agent Collaboration via Evolving Orchestration (2505.19591)
Uses a “puppeteer” LLM that dynamically decides which agents (“puppets”) to call and in what order. By learning this orchestration with reinforcement learning (RL), the system solves complex tasks more efficiently and at lower compute cost (a toy orchestration loop is sketched after this list)

3. MADD → MADD: Multi-Agent Drug Discovery Orchestra (2511.08217)
A MAS with 4 agents for drug discovery. It lets researchers describe a drug discovery task in plain language. Then MADD automatically builds and runs the full hit-identification pipeline, making AI-driven drug design a simple end-to-end workflow

4. Multi-Agent Tool-Integrated Policy Optimization (MATPO) → Multi-Agent Tool-Integrated Policy Optimization (2510.04678)
Lets one LLM act as multiple agents (like a planner and a worker) by using different prompts and training them together with RL. So you get the benefits of a multi-agent system without needing multiple models
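For intuition, here is a toy Python sketch of the orchestration pattern behind approaches like Puppeteer: a controller repeatedly picks which specialist agent to call until it decides to stop. The `orchestrator_llm` policy and the agents are hypothetical stand-ins, not code from any of these papers:

```python
# Toy illustration of dynamic orchestration: a controller routes work to
# specialist agents and updates a shared state until it decides to stop.

def orchestrator_llm(state: str) -> str:
    """Stand-in for a learned policy that returns the next agent's name or 'done'."""
    if "plan:" not in state:
        return "planner"
    if "result:" not in state:
        return "worker"
    return "done"

AGENTS = {
    "planner": lambda s: s + " plan:[break the task into steps]",
    "worker":  lambda s: s + " result:[execute the steps]",
}

def run(task: str) -> str:
    state = task
    while (choice := orchestrator_llm(state)) != "done":
        state = AGENTS[choice](state)   # call the selected agent, update shared state
    return state

print(run("Summarize three papers."))
```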

If you're interested in where multi-agent systems are taking software development, explore my article on the emerging playbook. It's super interesting → https://www.turingpost.com/p/aisoftwarestack
Also, subscribe to the Turing Post: https://www.turingpost.com/subscribe

Read further below ⬇️
Kseniase 
posted an update 14 days ago
6 Essential Reads on Spatial Intelligence

In AI, spatial intelligence is basically the model’s “sense of space” – its ability to understand where things are, how they relate, and how they move. It lets an AI model navigate a room, interpret a scene, or figure out how objects fit together, like giving it a built-in mental map. World models, for example, can't exist without spatial intelligence.

Here are 6 good reads to explore what spatial intelligence is and how it's evolving:

1. From Words to Worlds: Spatial Intelligence is AI’s Next Frontier by Fei-Fei Li → https://drfeifei.substack.com/p/from-words-to-worlds-spatial-intelligence
Fei-Fei Li, the godmother of AI, is a key figure in spatial intelligence, since her work in computer vision, especially ImageNet, helped AI learn to recognize and understand objects in space. She's recently started a blog, and this post, in particular, argues that true intelligence requires grounding in space, understanding geometry, motion and consequences in the real world

2. Spatial Reasoning in Multimodal LLMs: A Survey of Tasks, Benchmarks and Methods → https://arxiv.org/abs/2511.15722
Breaks down how AI models handle spatial reasoning from a cognitive angle and maps all the existing tasks and benchmarks to that framework

3. What is Spatial Intelligence? → https://www.turingpost.com/p/cvhistory5
Our special article clearly explains what spatial intelligence actually is, why it matters, and how researchers are trying to boost it so machines can better understand and navigate the physical world

4. From 2D to 3D Cognition: A Brief Survey of General World Models → https://arxiv.org/pdf/2506.20134
Shows how AI world models are evolving from simple 2D perception to full-on 3D understanding, explaining the tech behind it, what new 3D abilities these models gain, and where they’re used in the real world

Read further below ⬇️
If you like it, also subscribe to the Turing Post: https://www.turingpost.com/subscribe
Kseniase 
posted an update 21 days ago
12 Types of JEPA

Since Yann LeCun together with Randall Balestriero released a new paper on JEPA (Joint-Embedding Predictive Architecture), laying out its theory and introducing an efficient practical version called LeJEPA, we figured you might need even more JEPA. Here are 7 recent JEPA variants plus 5 iconic ones:

1. LeJEPA → LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics (2511.08544)
Explains a full theory for JEPAs, defining the “ideal” JEPA embedding as an isotropic Gaussian, and proposes the SIGReg objective to push JEPA toward this ideal, resulting in practical LeJEPA

2. JEPA-T → JEPA-T: Joint-Embedding Predictive Architecture with Text Fusion for Image Generation (2510.00974)
A text-to-image model that tokenizes images and captions with a joint predictive Transformer, enhances fusion with cross-attention and text embeddings before training loss, and generates images by iteratively denoising visual tokens conditioned on text

3. Text-JEPA → Speaking in Words, Thinking in Logic: A Dual-Process Framework in QA Systems (2507.20491)
Converts natural language into first-order logic, with a Z3 solver handling reasoning, enabling efficient, explainable QA with far lower compute than large LLMs

4. N-JEPA (Noise-based JEPA) → Improving Joint Embedding Predictive Architecture with Diffusion Noise (2507.15216)
Connects self-supervised learning with diffusion-style noise by using noise-based masking and multi-level schedules, especially improving visual classification

5. SparseJEPA → SparseJEPA: Sparse Representation Learning of Joint Embedding Predictive Architectures (2504.16140)
Adds sparse representation learning to make embeddings more interpretable and efficient. It groups latent variables by shared semantic structure using a sparsity penalty while preserving accuracy

6. TS-JEPA (Time Series JEPA) → Joint Embeddings Go Temporal (2509.25449)
Adapts JEPA to time-series by learning latent self-supervised representations and predicting future latents for robustness to noise and confounders
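All of these variants build on the same recipe: encode two views, predict one embedding from the other, and regress in latent space. A minimal, conceptual sketch with illustrative MLP encoders and random data (not LeJEPA’s or any paper’s code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal JEPA-style step: encode a context view and a target view, predict
# the target embedding from the context embedding, and regress in latent
# space with no gradient through the target branch.
enc = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 64))
target_enc = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 64))
target_enc.load_state_dict(enc.state_dict())   # in practice usually an EMA copy of enc
predictor = nn.Linear(64, 64)

context_view = torch.randn(32, 128)            # e.g. visible patches
target_view = torch.randn(32, 128)             # e.g. masked patches

z_ctx = enc(context_view)
with torch.no_grad():
    z_tgt = target_enc(target_view)            # stop-gradient on the target branch
loss = F.mse_loss(predictor(z_ctx), z_tgt)     # predict in embedding space
loss.backward()
print(float(loss))
```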

Read further below ↓
If you like it, also subscribe to the Turing Post: https://www.turingpost.com/subscribe
Kseniase 
posted an update 28 days ago
7+ Main precision formats used in AI:

Precision is very important in AI because it shapes how accurate and efficient models are. It controls how finely numbers are represented, approximating real-world values with formats like fixed-point and floating-point. A recent BF16 → FP16 study renewed attention to the impact of precision.
Here are the main precision types used in AI, from full precision for training to ultra-low precision for inference:

1. FP32 (Float32):
Standard full-precision float used in most training: 1 sign bit, 8 exponent bits, 23 mantissa bits. Default for backward-compatible training and baseline numerical stability

2. FP16 (Float16) → https://arxiv.org/abs/2305.10947v6
Half-precision float. It balances accuracy and efficiency. 1 sign bit, 5 exponent bits, 10 mantissa bits. Common on NVIDIA Tensor Cores and mixed-precision setups. There’s now a new wave of using it in reinforcement learning: https://www.turingpost.com/p/fp16

3. BF16 (BFloat16) → https://cloud.google.com/blog/products/ai-machine-learning/bfloat16-the-secret-to-high-performance-on-cloud-tpus
Same dynamic range as FP32 but fewer mantissa bits: 1 sign bit, 8 exponent bits (same as FP32), 7 mantissa bits. It was developed by Google Brain as part of Google's AI/ML infrastructure work. Preferred on TPUs and modern GPUs

4. FP8 (E4M3 / E5M2) → https://proceedings.neurips.cc/paper_files/paper/2018/file/335d3d1cd7ef05ec77714a215134914c-Paper.pdf
Emerging standard for training and inference on NVIDIA Hopper (H100) and Blackwell (B200) tensor cores and AMD MI300. Also supported in NVIDIA’s Transformer Engine: https://developer.nvidia.com/blog/floating-point-8-an-introduction-to-efficient-lower-precision-ai-training/
E4M3 = 4 exponent, 3 mantissa bits
E5M2 = 5 exponent, 2 mantissa bits
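To see what these bit layouts mean in practice, you can query each dtype’s range and precision; a minimal PyTorch sketch (FP8 dtypes exist only in recent builds, so it sticks to the three classic formats):

```python
import torch

# Quick sanity check of the layouts above: finfo reports the range and
# precision each dtype gives you.
for name, dtype in [("FP32", torch.float32),
                    ("FP16", torch.float16),
                    ("BF16", torch.bfloat16)]:
    fi = torch.finfo(dtype)
    print(f"{name}: total bits={fi.bits}, max={fi.max:.3e}, eps={fi.eps:.3e}")

# FP16's small 5-bit exponent caps its range near 6.5e4, while BF16 keeps
# FP32's ~3.4e38 range at the cost of coarser precision (eps 2**-7 vs 2**-10).
```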

Read further below ⬇️
If you like this, also subscribe to the Turing Post: https://www.turingpost.com/subscribe
Kseniase 
posted an update about 1 month ago
11 Fascinating new Policy Optimization techniques

Policy optimization (PO) algorithms are central to training AI models with preference-based feedback. In recent weeks, numerous new PO methods have emerged that build on or replace the popular PPO and GRPO, addressing their shortcomings. Here are 11 of them:

1. BAlanced Policy Optimization (BAPO) → BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping (2510.18927)
Dynamically adjusts the clipping bounds in PPO-style updates to balance positive and negative gradients and prevent entropy collapse (the baseline clipped objective these methods modify is sketched after this list)

2. Training-Free GRPO → Training-Free Group Relative Policy Optimization (2510.08191)
Instead of using numeric rewards, it compares rollouts semantically to distill useful knowledge as a token prior, which is then applied during inference to guide the model’s behavior

3. Asymmetric Importance Sampling Policy Optimization (ASPO) → ASPO: Asymmetric Importance Sampling Policy Optimization (2510.06062)
Fixes imbalanced token weighting in LLM training. It flips the importance sampling ratios for positive tokens to correct over- and under-updates, and adds a soft dual-clipping step to keep gradients stable

4. In-Context Steered Policy Optimization (ICPO) → https://arxiv.org/abs/2510.26519
Uses a model’s own in-context learning ability to guide training with existing data. It combines Mixed-Policy GRPO with Implicit Expert Forcing to expand exploration and adds Expert Region Reject Sampling and Annealed Expert-Bonus Reward Shaping to ensure stability and balanced expert influence

5. Graph-Enhanced Policy Optimization (GEPO) → https://arxiv.org/abs/2510.26270
Builds a graph of an agent’s experiences to understand how different states connect, guide exploration and assign rewards more effectively

6. Information Gain-based Policy Optimization (IGPO) → Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn LLM Agents (2510.14967)
Uses the model’s own belief updates to create dense, informative feedback for smoother multi-turn learning
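For context, here is the clipped-surrogate objective with group-relative advantages that most of these methods start from (a generic GRPO/PPO-style sketch, not any single paper’s implementation):

```python
import torch

def grpo_style_loss(logp_new, logp_old, rewards, clip_lo=0.2, clip_hi=0.2):
    """Sketch of a clipped surrogate with group-relative advantages.

    logp_new / logp_old: log-probs of sampled completions under the current
    and behavior policies; rewards: one scalar reward per completion.
    """
    # Group-relative advantage: standardize rewards within the sampled group.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    ratio = torch.exp(logp_new - logp_old)               # importance ratio
    clipped = torch.clamp(ratio, 1 - clip_lo, 1 + clip_hi)
    # Methods like BAPO adapt clip_lo / clip_hi during training; fixed here.
    return -torch.min(ratio * adv, clipped * adv).mean()

rewards = torch.tensor([1.0, 0.0, 0.5, 0.0])
logp_old = torch.tensor([-2.0, -1.5, -2.2, -1.8])
logp_new = logp_old + 0.1 * torch.randn(4)
print(grpo_style_loss(logp_new, logp_old, rewards))
```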

Read further below ⬇️
If you like this, also subscribe to the Turing Post: https://www.turingpost.com/subscribe
Kseniase 
posted an update about 1 month ago
12 Awesome GitHub repos to upgrade your AI coding

Coding is the field where AI is welcomed with open arms. Here’s a collection to help you take your AI-assisted coding workflows to the next level of convenience and efficiency:

1. Smol Developer → https://github.com/smol-ai/developer
A lightweight AI “junior dev” that takes your product spec and automatically scaffolds or helps you build full codebases

2. Tabby → https://github.com/TabbyML/tabby
A self-hosted AI coding assistant that runs locally as an alternative to GitHub Copilot. Easy to integrate, GPU-friendly, and doesn’t rely on the cloud

3. Beads (bd) Issue Tracker → https://github.com/steveyegge/beads
Gives coding agents long-term memory, letting them organize, plan, and execute complex tasks reliably across sessions

4. MetaGPT → https://github.com/FoundationAgents/MetaGPT
A multi-agent framework that imitates a software company team using LLMs. It assigns AI agents roles like PM, Architect, and Developer to produce user stories, designs, specs, and final code

5. Open Interpreter → https://github.com/openinterpreter/open-interpreter
Gives you ChatGPT’s coding power with full local control – no limits, no sandbox – so you can automate, analyze, and create anything right from your desktop through a chat interface

6. OpenSpec → https://github.com/Fission-AI/OpenSpec
A lightweight, spec-driven development tool that helps humans and AI agree on what to build before any code is written

7. PR-Agent → https://github.com/qodo-ai/pr-agent
An AI code reviewer that automatically reviews, describes, and improves pull requests across GitHub, GitLab, and other platforms

8. BabyAGI → https://github.com/yoheinakajima/babyagi
A self-building AI framework that gives agents the ability to write, manage, and refine their own functions, turning them from passive tools into active, self-building systems

9 ...⬇️

Subscribe to the Turing Post: https://www.turingpost.com/subscribe – your shortcut to deep, clear AI analysis
Kseniase 
posted an update about 2 months ago
5 Lectures and keynotes defining AI right now

If you want to understand the multifaceted AI landscape in 2025 and see where the field is heading – start with (or revisit) these legendary talks. They can help you capture what’s happening in AI from multiple angles:

1. Andrej Karpathy: Software Is Changing (Again) → https://www.youtube.com/watch?v=LCEmiRjPEtQ
Unveils Software 3.0 – a paradigm where LLMs are the new computers, programmed with prompts instead of code. The key: developers must now master coding, training, and prompting as AI becomes the heart of software building

2. Richard Sutton, The OaK Architecture: A Vision of SuperIntelligence from Experience → https://www.youtube.com/watch?v=gEbbGyNkR2U
Unveils the OaK (Options and Knowledge) architecture – a model-based RL framework for continual intelligence, where every component learns, meta-learns & builds hierarchical abstractions

3. GTC March 2025 Keynote with NVIDIA CEO Jensen Huang → https://www.youtube.com/watch?v=_waPvOwL9Z8
Dives into accelerated computing and the importance of Physical AI. From the Blackwell GPU architecture & AI factories to breakthroughs in agentic AI & robotics, Jensen Huang explains how NVIDIA aims to power every layer of the AI ecosystem

4. Yann LeCun "Mathematical Obstacles on the Way to Human-Level AI" → https://www.youtube.com/watch?v=ETZfkkv6V7
Yann LeCun always argues we need a new path to machines that reason about the world – not LLMs or RL. So this lecture is about self-supervised systems with world models, planning, memory and energy-based learning

5. Andrew Ng: State of AI Agents → https://www.youtube.com/watch?v=4pYzYmSdSH4
Highlights one of the most pressing topics of 2025 – agents, explaining why most effective AI agents rely on simple, linear workflows built from modular “Lego-brick” tasks + what predicts AI startup success in the new agent era

Subscribe to the Turing Post: https://www.turingpost.com/subscribe – your shortcut to deep, clear AI analysis
Kseniase 
posted an update about 2 months ago
9 Powerful AI Video Generation Tools

Since Sora 2 has been on fire these past weeks, reminding us what high-quality video generation should look like, we decided you really need this list of video generation tools – great alternatives or complements to it.

1. Sora 2 → https://openai.com/sora/
It needs no introduction: this OpenAI text-to-video model produces short, ultra-realistic clips across styles (cinematic, photorealistic, animated, etc.) with synced audio

2. Google Veo 3 (Gemini Video Generation) → https://aistudio.google.com/models/veo-3
Part of Gemini AI. Generates 8-second high-fidelity videos from text or images with native sound: background soundtracks and realistic voices with near-perfect lip sync

3. Runway (Gen-4 by Runway ML) → https://runwayml.com/
Text, image, or video-to-video generation with advanced editing like changing lighting, weather, camera angles or replacing objects. Popular in AI filmmaking

4. Pika Labs → https://pollo.ai/m/pika-ai
Provides creative, often stylized short videos – from cinematic mini-scenes to cartoon-like animations. Ideal for social media clips and visual storytelling. Plus, you can add playful effects to manipulate objects in the generated videos

5. Luma’s Dream Machine → https://lumalabs.ai/dream-machine
Powered by Luma AI’s latest Ray 3 model, it quickly visualizes story ideas, animated concept art, or abstract motion videos. It supports consistent custom characters and seamless looping

Read further below ⬇️
If you like it, also subscribe to the Turing Post https://www.turingpost.com/subscribe
Kseniase 
posted an update 2 months ago
8 Emerging trends in Reinforcement Learning

Reinforcement learning is having a moment - and not just this week. Some of its directions are already showing huge promise, while others are still early but exciting. Here’s a look at what’s happening right now in RL:

1. Reinforcement Pre-Training (RPT) → Reinforcement Pre-Training (2506.08007)
Reframes next-token pretraining as RL with verifiable rewards, yielding scalable reasoning gains

2. Reinforcement Learning from Human Feedback (RLHF) → Deep reinforcement learning from human preferences (1706.03741)
The top approach. It trains a model using human preference feedback, building a reward model and then optimizing the policy to generate outputs people prefer (a minimal sketch of the preference loss appears after this list)

3. Reinforcement Learning with Verifiable Rewards (RLVR) → Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs (2506.14245)
Moves from subjective (human-labeled) rewards to objective ones that can be automatically verified, as in math and code, or uses rubrics as rewards, for example → Reinforcement Learning with Rubric Anchors (2508.12790), Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains (2507.17746)

4. Multi-objective RL → Pareto Multi-Objective Alignment for Language Models (2508.07768)
Trains LMs to balance multiple goals at once, like being helpful but also concise or creative, ensuring that improving one goal doesn’t ruin another

5. Parallel thinking RL → Parallel-R1: Towards Parallel Thinking via Reinforcement Learning (2509.07980)
Trains parallel chains of thought, boosting math accuracy and raising the final performance ceiling. It first teaches the model the “parallel thinking” skill on easier problems, then uses RL to refine it on harder ones
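As a reference point for item 2, the reward model in RLHF is typically fit with a Bradley-Terry-style pairwise loss; a minimal sketch with made-up scores (a generic illustration of the recipe, not a specific codebase):

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_chosen, reward_rejected):
    """Bradley-Terry-style loss for fitting a reward model from human
    preference pairs: push the preferred response's score above the other's."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

chosen = torch.tensor([1.2, 0.3, 0.8])     # scores for preferred responses
rejected = torch.tensor([0.1, 0.4, -0.2])  # scores for dispreferred responses
print(preference_loss(chosen, rejected))   # lower when chosen scores exceed rejected
```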

Read further below ⬇️
And if you like this, subscribe to the Turing Post: https://www.turingpost.com/subscribe

Also, check out our recent guide about the past, present and future of RL: https://www.turingpost.com/p/rlguide
Kseniase 
posted an update 2 months ago
12 Excellent MCP Servers

The family of MCP (Model Context Protocol) servers keeps expanding to bridge agents, models, tools, web, data and apps. Here are 12 useful MCP servers that will help you create convenient agentic ecosystems:

1. Chrome DevTools MCP → https://github.com/ChromeDevTools/chrome-devtools-mcp
Lets your coding agent (Gemini, Claude, Cursor, Copilot) control a live Chrome browser with full DevTools access for automation, debugging, and performance analysis

2. Windows-MCP → https://github.com/CursorTouch/Windows-MCP
Provides interaction between agents and Windows, handling file navigation, app control, UI actions, QA testing

3. MCPControl → https://github.com/claude-did-this/MCPControl
Windows control server for programmatic control of mouse, keyboard, window management, and screen capture

4. MetaMCP → https://github.com/metatool-ai/metamcp
A proxy that aggregates multiple MCP servers into one, with middleware support. Works as a standard MCP server for any client

5. MindsDB → https://github.com/mindsdb/mindsdb
A data layer that lets humans, models, agents, and apps get accurate answers from large-scale data sources

6. Playwright MCP → https://github.com/microsoft/playwright-mcp
Lets LLMs interact with web pages via structured accessibility snapshots, no need for screenshots or visually-tuned models

7. MCP Access Point → https://github.com/sxhxliang/mcp-access-point
Bridges MCP clients with HTTP services, no server-side changes needed

8. Browserbase MCP Server → https://github.com/browserbase/mcp-server-browserbase
Connects LLMs to external data and tools, adding cloud browser automation via Browserbase and Stagehand. It enables LLMs to browse, capture, extract, and act on web pages with precision

9. Yutu → https://github.com/eat-pray-ai/yutu
Automates YouTube workflows, managing videos, playlists, channels, comments, captions, etc.

3 more below ↓
Also, subscribe to the Turing Post: https://www.turingpost.com/subscribe
Kseniase 
posted an update 3 months ago
10 awesome advanced LoRA approaches

Low-Rank Adaptation (LoRA) is the go-to method for efficient model fine-tuning: it adds small low-rank matrices instead of retraining the full model. The field isn’t standing still – new LoRA variants push the limits of efficiency, generalization, and personalization. So we’re sharing 10 of the latest LoRA approaches you should know about (a minimal sketch of the core LoRA update follows the list):

1. Mixture-of-LoRA-experts → Mixture of Low-Rank Adapter Experts in Generalizable Audio Deepfake Detection (2509.13878)
Adds multiple low-rank adapters (LoRA) into a model’s layers, and a routing mechanism activates the most suitable ones for each input. This lets the model adapt better to new unseen conditions

2. Amortized Bayesian Meta-Learning for LoRA (ABMLL) → Amortized Bayesian Meta-Learning for Low-Rank Adaptation of Large Language Models (2508.14285)
Balances global and task-specific parameters within a Bayesian framework to improve uncertainty calibration and generalization to new tasks without high memory or compute costs

3. AutoLoRA → AutoLoRA: Automatic LoRA Retrieval and Fine-Grained Gated Fusion for Text-to-Image Generation (2508.02107)
Automatically retrieves and dynamically aggregates public LoRAs for stronger T2I generation

4. aLoRA (Activated LoRA) → Activated LoRA: Fine-tuned LLMs for Intrinsics (2504.12397)
Only applies LoRA after invocation, letting the model reuse the base model’s KV cache instead of recomputing the full turn’s KV cache. Efficient in multi-turn conversations

5. LiLoRA (LoRA in LoRA) → LoRA in LoRA: Towards Parameter-Efficient Architecture Expansion for Continual Visual Instruction Tuning (2508.06202)
Shares the LoRA matrix A across tasks and additionally low-rank-decomposes matrix B to cut parameters in continual vision-text MLLMs

6. Sensitivity-LoRA → Sensitivity-LoRA: Low-Load Sensitivity-Based Fine-Tuning for Large Language Models (2509.09119)
Dynamically assigns ranks to weight matrices based on their sensitivity, measured using second-order derivatives
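The core update all of these variants build on is the same: freeze the pretrained weight and learn a low-rank correction. A minimal PyTorch sketch (illustrative rank and scaling, not any specific library’s implementation):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA sketch: frozen base weight W plus a trainable
    low-rank update B @ A scaled by alpha / r."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():   # freeze the pretrained weights
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T

layer = LoRALinear(nn.Linear(512, 512), r=8)
print(layer(torch.randn(4, 512)).shape)  # torch.Size([4, 512])
```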

Read further below ↓
Also, subscribe to the Turing Post: https://www.turingpost.com/subscribe
Kseniase 
posted an update 3 months ago
6 Recent & free sources to master Reinforcement Learning

Almost every week new research and resources on RL come out. Knowledge needs to be constantly refreshed and updated with the latest trends. So today, we’re sharing 6 free sources to help you stay on track with RL:

1. A Survey of Continual Reinforcement Learning → https://arxiv.org/abs/2506.21872
Covers continual RL (CRL): how agents can keep learning and adapt to new tasks without forgetting past ones. It analyses methods, benchmarks, evaluation metrics & challenges

2. The Deep Reinforcement Learning course by Hugging Face → https://huggingface.co/learn/deep-rl-course/unit0/introduction
This is a popular free course, regularly updated. Includes community interaction, exercises, leaderboards, etc.

3. Reinforcement Learning Specialization (Coursera, University of Alberta) → https://www.coursera.org/specializations/reinforcement-learning
A 4-course series introducing foundational RL, implementing different algorithms, culminating in a capstone. It's a great structured path

4. A Technical Survey of Reinforcement Learning Techniques for LLMs → A Technical Survey of Reinforcement Learning Techniques for Large Language Models (2507.04136)
Looks at how RL is being used for/with LLMs for alignment, reasoning, preference signals, etc. Covers methods like RLHF, RLAIF, DPO, PPO, GRPO & applications from code gen to tool use

5. A Survey of Reinforcement Learning for Software Engineering → https://arxiv.org/abs/2507.12483
Good if you're interested in applied RL domains. Examines how RL is used in software engineering tasks: maintenance, development, evaluation. Covering 115 papers since the introduction of deep RL, it summarizes trends, gaps & challenges

6. A Survey of Reinforcement Learning for LRMs → https://arxiv.org/abs/2509.08827
Traces the path from LLMs to LRMs (large reasoning models) via RL. Covers reward design, policy optimization, use cases, and future directions like continual, memory-based, and model-based RL, and more

If you liked this, subscribe to The Turing Post https://www.turingpost.com/subscribe
Kseniase 
posted an update 3 months ago
10 Latest Preference Optimization Techniques

Models need feedback on what makes outputs “good” or “bad.” Policy optimization (PO) turns preferences and rewards into actual training signals. This field is evolving quickly, moving far beyond classics like PPO and GRPO. So here is our overview of 10 of the newest PO methods:

1. Pref-GRPO → Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning (2508.20751)
Stabilizes text-to-image reinforcement learning (RL) with pairwise preference rewards and a unified UNIGENBENCH benchmark

2. PVPO (Policy with Value Preference Optimization) → PVPO: Pre-Estimated Value-Based Policy Optimization for Agentic Reasoning (2508.21104)
This critic-free RL method uses a pre-trained model as a reference anchor to reduce bias and guide learning, selecting high-value examples through data pre-sampling

3. DCPO (Dynamic Clipping Policy Optimization) → DCPO: Dynamic Clipping Policy Optimization (2509.02333)
Uses dynamic clipping, which adjusts probability limits per token for better token exploration, and smooth reward standardization to balance rewards over training steps and prevent wasted updates

4. ARPO (Agentic Reinforced Policy Optimization) → Agentic Reinforced Policy Optimization (2507.19849)
Optimizes multi-turn LLM agents that use external tools. It uses an entropy-based adaptive rollout to explore post-tool use and an advantage attribution method to better assign credit across steps, leading to more efficient tool use with fewer resources

5. GRPO-RoC (Group Relative Policy Optimization with Resampling-on-Correct) → rStar2-Agent: Agentic Reasoning Technical Report (2508.20722)
Oversamples rollouts, then resamples them to keep diverse mistakes and only the highest-quality correct answers. It reduces noise and ends up with stronger reasoning in a code environment

Read further below ⬇️
If you like this, also subscribe to the Turing Post: https://www.turingpost.com/subscribe
Kseniase 
posted an update 3 months ago
11 Powerful Image Models

Everyone is buzzing about image generation this week – or, more specifically, about Google's Nano-Banana. So today we want to share a list of models that can be a great toolkit for image generation + editing + multi-turn refinement.

1. Gemini 2.5 Flash Image, or Nano-Banana → https://deepmind.google/models/gemini/image/
Google’s newest image model with conversational editing, character consistency, and multi-image fusion. Available in AI Studio and the Gemini API. Price: $2.50 per 1M tokens

2. FLUX (Black Forest Labs) → https://bfl.ai/
A family of models known for rich detail, excellent prompt adherence, and fast iterative generation. Offered in several variants, from Pro to open-source, it's accessible via Hugging Face, Replicate, Azure AI Foundry, etc., and used as a base in many pipelines. Price: $0.025-0.08 per image

3. Midjourney v7 → https://www.midjourney.com/
Enhanced image fidelity, prompt comprehension, and anatomical coherence (hands, bodies, objects) + provides a smart lightbox editor. The Omni-reference tool improves character and object consistency in your images. It remains accessible via Discord with a supporting web interface. Price: $10-60/month

4. Stable Diffusion 3.5 (Stability AI) → https://stability.ai/stable-image
Open-weights line with improved text rendering, photorealism, and prompt adherence compared to earlier versions. It introduces technical innovations through its MMDiT architecture. Price: $0.025-0.065 per image

5. OpenAI GPT-Image-1 → https://platform.openai.com/docs/guides/image-generation?image-generation-model=gpt-image-1
It's the same multimodal model that powers ChatGPT's image capabilities, offering high-fidelity image generation, precise edits, including inpainting, and accurate text rendering. Available via the Images API. Price: $40 per 1M tokens

Read further below ⬇️
If you like this, also subscribe to the Turing Post: https://www.turingpost.com/subscribe
fdaudens 
posted an update 4 months ago
Want to learn to build an AI Agent? I put together a cookbook for creating your own news research agent with OpenAI GPT-OSS:

- Searches headlines & specific sites
- Pulls full articles when you need depth
- Summarizes with clickable sources
- Runs in a simple Gradio chat UI
- No GPU, no local setup — just open-weight GPT-OSS models via Hugging Face

If you’ve been wanting to try agents but weren’t sure where to start, this is an end-to-end example you can fork, run, and adapt.

Full guide + code https://huggingface.co/blog/fdaudens/openai-gpt-oss-agent-inference-providers
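For a sense of the core loop, here is a minimal sketch (not the cookbook’s actual code) that wires a GPT-OSS model served by Hugging Face inference providers into a Gradio chat UI; it omits the search and article-fetching tools:

```python
import gradio as gr
from huggingface_hub import InferenceClient

# Open-weight GPT-OSS model served via Hugging Face inference providers
client = InferenceClient(model="openai/gpt-oss-20b")

def respond(message, history):
    # With type="messages", Gradio passes history as role/content dicts
    messages = [{"role": "system", "content": "You are a concise news research assistant."}]
    messages += [{"role": m["role"], "content": m["content"]} for m in history]
    messages.append({"role": "user", "content": message})
    out = client.chat_completion(messages=messages, max_tokens=512)
    return out.choices[0].message.content

gr.ChatInterface(respond, type="messages").launch()
```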
fdaudens 
posted an update 4 months ago
What can OpenAI’s new open models do with the news? I built a News Agent to find out.

It can answer questions about the news in real time, and every answer comes with original source links so you can dive deeper.

Ask it things like:
- "What are the top news stories today?"
- "What's the latest on artificial intelligence?"
- Follow-up questions on specific stories

Runs with Hugging Face inference providers, letting you compare results from the OpenAI 20B and 120B models

So far, I’m quite impressed by the capabilities of even the smaller 20B model. Definitely not a perfect project, but curious to hear your thoughts!

https://huggingface.co/spaces/fdaudens/gpt-oss-news-agent
fdaudens 
posted an update 4 months ago
OpenAI’s GPT-OSS has sparked ~400 new models on Hugging Face and racked up 5M downloads in less than a week, already outpacing DeepSeek R1’s first-week numbers.

For comparison: when R1 launched, I tracked 550 derivatives (across 8 base models) in a week, with ~3M downloads. GPT-OSS is ahead on adoption and engagement.

It’s also the most-liked release of any major LLM this summer. The 20B and 120B versions quickly shot past Kimi K2, GLM 4.5, and others in likes.

Most-downloaded GPT-OSS models include LM Studio and Unsloth AI versions:
1️⃣ openai/gpt-oss-20b - 2.0M
2️⃣ lmstudio-community/gpt-oss-20b-MLX-8bit - 750K
3️⃣ openai/gpt-oss-120b - 430K
4️⃣ unsloth/gpt-oss-20b-GGUF - 380K
5️⃣ lmstudio-community/gpt-oss-20b-GGUF - 330K

The 20B version is clearly finding its audience, showing the power of smaller, faster, more memory- and energy-efficient models. (These numbers don’t include calls to the models via inference providers, so the real usage is likely even bigger, especially for the 120B version)

Open-weight models let anyone build on top. Empower the builders, and innovation takes off. 🚀
Kseniase 
posted an update 4 months ago
6 Must-read books about AI and Machine Learning:

Sharing some free, useful resources for you. In this collection, we’ve gathered the most recent books to give you up-to-date information on key fundamental topics. Hope this helps you master AI and machine learning:

1. Machine Learning Systems by Vijay Janapa Reddi → https://www.mlsysbook.ai/
Provides a framework for building effective ML solutions, covering data engineering, optimization, hardware-aware training, inference acceleration, architecture choice, and other key principles

2. Generative Diffusion Modeling: A Practical Handbook by Zihan Ding, Chi Jin → https://arxiv.org/abs/2412.17162
Offers a unified view of diffusion models: probabilistic, score-based, consistency, rectified flow, pre/post-training. It aligns notations with code to close the “paper-to-code” gap.

3. Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges → https://arxiv.org/abs/2104.13478
Explores unified geometric principles for analyzing neural network architectures (CNNs, RNNs, GNNs, Transformers) and for guiding the design of future ones

4. Mathematical Foundations of Geometric Deep Learning by Haitz Saez de Ocariz Borde and Michael Bronstein → https://arxiv.org/abs/2508.02723
Dives into the key math concepts behind geometric deep learning: geometric and analytical structures, vector calculus, differential geometry, etc.

5. Interpretable Machine Learning by Christoph Molnar → https://github.com/christophM/interpretable-ml-book
Practical guide to simple, transparent models (e.g., decision trees) and model-agnostic methods like LIME, Shapley values, permutation importance, and accumulated local effects.

6. Understanding Deep Learning by Simon J.D. Prince → https://udlbook.github.io/udlbook/
Explores core deep learning concepts: models, training, evaluation, RL, architectures for images, text, and graphs, addressing open theoretical questions

Also, subscribe to the Turing Post: https://www.turingpost.com/subscribe