Get trending papers in your email inbox once a day!
Get trending papers in your email inbox!
SubscribeUnraveling the Gradient Descent Dynamics of Transformers
While the Transformer architecture has achieved remarkable success across various domains, a thorough theoretical foundation explaining its optimization dynamics is yet to be fully developed. In this study, we aim to bridge this understanding gap by answering the following two core questions: (1) Which types of Transformer architectures allow Gradient Descent (GD) to achieve guaranteed convergence? and (2) Under what initial conditions and architectural specifics does the Transformer achieve rapid convergence during training? By analyzing the loss landscape of a single Transformer layer using Softmax and Gaussian attention kernels, our work provides concrete answers to these questions. Our findings demonstrate that, with appropriate weight initialization, GD can train a Transformer model (with either kernel type) to achieve a global optimal solution, especially when the input embedding dimension is large. Nonetheless, certain scenarios highlight potential pitfalls: training a Transformer using the Softmax attention kernel may sometimes lead to suboptimal local solutions. In contrast, the Gaussian attention kernel exhibits a much favorable behavior. Our empirical study further validate the theoretical findings.
Finding extremal periodic orbits with polynomial optimisation, with application to a nine-mode model of shear flow
Tobasco et al. [Physics Letters A, 382:382-386, 2018; see https://doi.org/10.1016/j.physleta.2017.12.023] recently suggested that trajectories of ODE systems that optimize the infinite-time average of a certain observable can be localized using sublevel sets of a function that arise when bounding such averages using so-called auxiliary functions. In this paper we demonstrate that this idea is viable and allows for the computation of extremal unstable periodic orbits (UPOs) for polynomial ODE systems. First, we prove that polynomial optimization is guaranteed to produce auxiliary functions that yield near-sharp bounds on time averages, which is required in order to localize the extremal orbit accurately. Second, we show that points inside the relevant sublevel sets can be computed efficiently through direct nonlinear optimization. Such points provide good initial conditions for UPO computations. As a proof of concept, we then combine these methods with a single-shooting Newton-Raphson algorithm to study extremal UPOs for a nine-dimensional model of sinusoidally forced shear flow. We discover three previously unknown families of UPOs, one of which simultaneously minimizes the mean energy dissipation rate and maximizes the mean perturbation energy relative to the laminar state for Reynolds numbers approximately between 81.24 and 125.
Suppressing the sample variance of DESI-like galaxy clustering with fast simulations
Ongoing and upcoming galaxy redshift surveys, such as the Dark Energy Spectroscopic Instrument (DESI) survey, will observe vast regions of sky and a wide range of redshifts. In order to model the observations and address various systematic uncertainties, N-body simulations are routinely adopted, however, the number of large simulations with sufficiently high mass resolution is usually limited by available computing time. Therefore, achieving a simulation volume with the effective statistical errors significantly smaller than those of the observations becomes prohibitively expensive. In this study, we apply the Convergence Acceleration by Regression and Pooling (CARPool) method to mitigate the sample variance of the DESI-like galaxy clustering in the AbacusSummit simulations, with the assistance of the quasi-N-body simulations FastPM. Based on the halo occupation distribution (HOD) models, we construct different FastPM galaxy catalogs, including the luminous red galaxies (LRGs), emission line galaxies (ELGs), and quasars, with their number densities and two-point clustering statistics well matched to those of AbacusSummit. We also employ the same initial conditions between AbacusSummit and FastPM to achieve high cross-correlation, as it is useful in effectively suppressing the variance. Our method of reducing noise in clustering is equivalent to performing a simulation with volume larger by a factor of 5 and 4 for LRGs and ELGs, respectively. We also mitigate the standard deviation of the LRG bispectrum with the triangular configurations k_2=2k_1=0.2 h/Mpc by a factor of 1.6. With smaller sample variance on galaxy clustering, we are able to constrain the baryon acoustic oscillations (BAO) scale parameters to higher precision. The CARPool method will be beneficial to better constrain the theoretical systematics of BAO, redshift space distortions (RSD) and primordial non-Gaussianity (NG).
Space and Time Continuous Physics Simulation From Partial Observations
Modern techniques for physical simulations rely on numerical schemes and mesh-refinement methods to address trade-offs between precision and complexity, but these handcrafted solutions are tedious and require high computational power. Data-driven methods based on large-scale machine learning promise high adaptivity by integrating long-range dependencies more directly and efficiently. In this work, we focus on fluid dynamics and address the shortcomings of a large part of the literature, which are based on fixed support for computations and predictions in the form of regular or irregular grids. We propose a novel setup to perform predictions in a continuous spatial and temporal domain while being trained on sparse observations. We formulate the task as a double observation problem and propose a solution with two interlinked dynamical systems defined on, respectively, the sparse positions and the continuous domain, which allows to forecast and interpolate a solution from the initial condition. Our practical implementation involves recurrent GNNs and a spatio-temporal attention observer capable of interpolating the solution at arbitrary locations. Our model not only generalizes to new initial conditions (as standard auto-regressive models do) but also performs evaluation at arbitrary space and time locations. We evaluate on three standard datasets in fluid dynamics and compare to strong baselines, which are outperformed both in classical settings and in the extended new task requiring continuous predictions.
Signal Temporal Logic Neural Predictive Control
Ensuring safety and meeting temporal specifications are critical challenges for long-term robotic tasks. Signal temporal logic (STL) has been widely used to systematically and rigorously specify these requirements. However, traditional methods of finding the control policy under those STL requirements are computationally complex and not scalable to high-dimensional or systems with complex nonlinear dynamics. Reinforcement learning (RL) methods can learn the policy to satisfy the STL specifications via hand-crafted or STL-inspired rewards, but might encounter unexpected behaviors due to ambiguity and sparsity in the reward. In this paper, we propose a method to directly learn a neural network controller to satisfy the requirements specified in STL. Our controller learns to roll out trajectories to maximize the STL robustness score in training. In testing, similar to Model Predictive Control (MPC), the learned controller predicts a trajectory within a planning horizon to ensure the satisfaction of the STL requirement in deployment. A backup policy is designed to ensure safety when our controller fails. Our approach can adapt to various initial conditions and environmental parameters. We conduct experiments on six tasks, where our method with the backup policy outperforms the classical methods (MPC, STL-solver), model-free and model-based RL methods in STL satisfaction rate, especially on tasks with complex STL specifications while being 10X-100X faster than the classical methods.
Exact solutions to the nonlinear dynamics of learning in deep linear neural networks
Despite the widespread practical success of deep learning methods, our theoretical understanding of the dynamics of learning in deep neural networks remains quite sparse. We attempt to bridge the gap between the theory and practice of deep learning by systematically analyzing learning dynamics for the restricted case of deep linear neural networks. Despite the linearity of their input-output map, such networks have nonlinear gradient descent dynamics on weights that change with the addition of each new hidden layer. We show that deep linear networks exhibit nonlinear learning phenomena similar to those seen in simulations of nonlinear networks, including long plateaus followed by rapid transitions to lower error solutions, and faster convergence from greedy unsupervised pretraining initial conditions than from random initial conditions. We provide an analytical description of these phenomena by finding new exact solutions to the nonlinear dynamics of deep learning. Our theoretical analysis also reveals the surprising finding that as the depth of a network approaches infinity, learning speed can nevertheless remain finite: for a special class of initial conditions on the weights, very deep networks incur only a finite, depth independent, delay in learning speed relative to shallow networks. We show that, under certain conditions on the training data, unsupervised pretraining can find this special class of initial conditions, while scaled random Gaussian initializations cannot. We further exhibit a new class of random orthogonal initial conditions on weights that, like unsupervised pre-training, enjoys depth independent learning times. We further show that these initial conditions also lead to faithful propagation of gradients even in deep nonlinear networks, as long as they operate in a special regime known as the edge of chaos.
Neural Implicit Surface Evolution
This work investigates the use of smooth neural networks for modeling dynamic variations of implicit surfaces under the level set equation (LSE). For this, it extends the representation of neural implicit surfaces to the space-time R^3times R, which opens up mechanisms for continuous geometric transformations. Examples include evolving an initial surface towards general vector fields, smoothing and sharpening using the mean curvature equation, and interpolations of initial conditions. The network training considers two constraints. A data term is responsible for fitting the initial condition to the corresponding time instant, usually R^3 times {0}. Then, a LSE term forces the network to approximate the underlying geometric evolution given by the LSE, without any supervision. The network can also be initialized based on previously trained initial conditions, resulting in faster convergence compared to the standard approach.
Beyond Memorization: Extending Reasoning Depth with Recurrence, Memory and Test-Time Compute Scaling
Reasoning is a core capability of large language models, yet understanding how they learn and perform multi-step reasoning remains an open problem. In this study, we explore how different architectures and training methods affect model multi-step reasoning capabilities within a cellular automata framework. By training on state sequences generated with random Boolean functions for random initial conditions to exclude memorization, we demonstrate that most neural architectures learn to abstract the underlying rules. While models achieve high accuracy in next-state prediction, their performance declines sharply if multi-step reasoning is required. We confirm that increasing model depth plays a crucial role for sequential computations. We demonstrate that an extension of the effective model depth with recurrence, memory, and test-time compute scaling substantially enhances reasoning capabilities.
Reduced-Order Neural Operators: Learning Lagrangian Dynamics on Highly Sparse Graphs
We present a neural operator architecture to simulate Lagrangian dynamics, such as fluid flow, granular flows, and elastoplasticity. Traditional numerical methods, such as the finite element method (FEM), suffer from long run times and large memory consumption. On the other hand, approaches based on graph neural networks are faster but still suffer from long computation times on dense graphs, which are often required for high-fidelity simulations. Our model, GIOROM or Graph Interaction Operator for Reduced-Order Modeling, learns temporal dynamics within a reduced-order setting, capturing spatial features from a highly sparse graph representation of the input and generalizing to arbitrary spatial locations during inference. The model is geometry-aware and discretization-agnostic and can generalize to different initial conditions, velocities, and geometries after training. We show that point clouds of the order of 100,000 points can be inferred from sparse graphs with sim1000 points, with negligible change in computation time. We empirically evaluate our model on elastic solids, Newtonian fluids, Non-Newtonian fluids, Drucker-Prager granular flows, and von Mises elastoplasticity. On these benchmarks, our approach results in a 25times speedup compared to other neural network-based physics simulators while delivering high-fidelity predictions of complex physical systems and showing better performance on most benchmarks. The code and the demos are provided at https://github.com/HrishikeshVish/GIOROM.
From Zero to Turbulence: Generative Modeling for 3D Flow Simulation
Simulations of turbulent flows in 3D are one of the most expensive simulations in computational fluid dynamics (CFD). Many works have been written on surrogate models to replace numerical solvers for fluid flows with faster, learned, autoregressive models. However, the intricacies of turbulence in three dimensions necessitate training these models with very small time steps, while generating realistic flow states requires either long roll-outs with many steps and significant error accumulation or starting from a known, realistic flow state - something we aimed to avoid in the first place. Instead, we propose to approach turbulent flow simulation as a generative task directly learning the manifold of all possible turbulent flow states without relying on any initial flow state. For our experiments, we introduce a challenging 3D turbulence dataset of high-resolution flows and detailed vortex structures caused by various objects and derive two novel sample evaluation metrics for turbulent flows. On this dataset, we show that our generative model captures the distribution of turbulent flows caused by unseen objects and generates high-quality, realistic samples amenable for downstream applications without access to any initial state.
Hierarchical cycle-tree packing model for $K$-core attack problem
The K-core of a graph is the unique maximum subgraph within which each vertex connects to K or more other vertices. The optimal K-core attack problem asks to delete the minimum number of vertices from the K-core to induce its complete collapse. A hierarchical cycle-tree packing model is introduced here for this challenging combinatorial optimization problem. We convert the temporally long-range correlated K-core pruning dynamics into locally tree-like static patterns and analyze this model through the replica-symmetric cavity method of statistical physics. A set of coarse-grained belief propagation equations are derived to predict single vertex marginal probabilities efficiently. The associated hierarchical cycle-tree guided attack ({\tt hCTGA}) algorithm is able to construct nearly optimal attack solutions for regular random graphs and Erd\"os-R\'enyi random graphs. Our cycle-tree packing model may also be helpful for constructing optimal initial conditions for other irreversible dynamical processes on sparse random graphs.
Discrete Galerkin Method for Fractional Integro-Differential Equations
In this paper, we develop a fully discrete Galerkin method for solving initial value fractional integro-differential equations(FIDEs). We consider Generalized Jacobi polynomials(GJPs) with indexes corresponding to the number of homogeneous initial conditions as natural basis functions for the approximate solution. The fractional derivatives are used in the Caputo sense. The numerical solvability of algebraic system obtained from implementation of proposed method for a special case of FIDEs is investigated. We also provide a suitable convergence analysis to approximate solutions under a more general regularity assumption on the exact solution.
SgrA* spin and mass estimates through the detection of multiple extremely large mass-ratio inspirals
We analyze the parameter estimation accuracy that can be achieved for the mass and spin of SgrA*, the SMBH in our Galactic Center, by detecting multiple extremely large mass-ratio inspirals (XMRIs). XMRIs are formed by brown dwarfs (BD) inspiraling into a supermassive black hole (SMBH), thus emitting gravitational waves (GWs) inside the detection band of future space-based detectors such as LISA and TianQin. Theoretical estimates suggest the presence of approximately 10 XMRIs emitting detectable GWs, making them some of the most promising candidates for space-based GW detectors. Our analysis indicates that even if individual sources have low SNRs (approx10), high-precision parameter estimates can still be achieved by detecting multiple sources. In this case, the accuracy of the parameter estimates increases by approximately one to two orders of magnitude, at least. Moreover, by analyzing a small sample of 400 initial conditions for XMRIs formed in the Galactic Center, we estimate that almost 80 % of the detectable XMRIs orbiting SgrA* will have eccentricities between 0.43 to 0.95 and an SNRin [10,100]. The remaining sim20 % of the sources have an SNRin [100,1000] and eccentricities ranging from 0.25 to 0.92. Additionally, some XMRIs with high SNR are far from being circular. These loud sources with SNRapprox 1000 can have eccentricities as high as eapprox0.7; although their detection chances are low, representing lesssim2 % of the detectable sources, their presence is not ruled out.
Learning Two-agent Motion Planning Strategies from Generalized Nash Equilibrium for Model Predictive Control
We introduce an Implicit Game-Theoretic MPC (IGT-MPC), a decentralized algorithm for two-agent motion planning that uses a learned value function that predicts the game-theoretic interaction outcomes as the terminal cost-to-go function in a model predictive control (MPC) framework, guiding agents to implicitly account for interactions with other agents and maximize their reward. This approach applies to competitive and cooperative multi-agent motion planning problems which we formulate as constrained dynamic games. Given a constrained dynamic game, we randomly sample initial conditions and solve for the generalized Nash equilibrium (GNE) to generate a dataset of GNE solutions, computing the reward outcome of each game-theoretic interaction from the GNE. The data is used to train a simple neural network to predict the reward outcome, which we use as the terminal cost-to-go function in an MPC scheme. We showcase emerging competitive and coordinated behaviors using IGT-MPC in scenarios such as two-vehicle head-to-head racing and un-signalized intersection navigation. IGT-MPC offers a novel method integrating machine learning and game-theoretic reasoning into model-based decentralized multi-agent motion planning.
Calibrated Chaos: Variance Between Runs of Neural Network Training is Harmless and Inevitable
Typical neural network trainings have substantial variance in test-set performance between repeated runs, impeding hyperparameter comparison and training reproducibility. We present the following results towards understanding this variation. (1) Despite having significant variance on their test-sets, we demonstrate that standard CIFAR-10 and ImageNet trainings have very little variance in their performance on the test-distributions from which those test-sets are sampled, suggesting that variance is less of a practical issue than previously thought. (2) We present a simplifying statistical assumption which closely approximates the structure of the test-set accuracy distribution. (3) We argue that test-set variance is inevitable in the following two senses. First, we show that variance is largely caused by high sensitivity of the training process to initial conditions, rather than by specific sources of randomness like the data order and augmentations. Second, we prove that variance is unavoidable given the observation that ensembles of trained networks are well-calibrated. (4) We conduct preliminary studies of distribution-shift, fine-tuning, data augmentation and learning rate through the lens of variance between runs.
Nonlinear Deterministic Filter for Inertial Navigation and Bias Estimation with Guaranteed Performance
Unmanned vehicle navigation concerns estimating attitude, position, and linear velocity of the vehicle the six degrees of freedom (6 DoF). It has been known that the true navigation dynamics are highly nonlinear modeled on the Lie Group of SE_{2}(3). In this paper, a nonlinear filter for inertial navigation is proposed. The filter ensures systematic convergence of the error components starting from almost any initial condition. Also, the errors converge asymptotically to the origin. Experimental results validates the robustness of the proposed filter.
Probabilistic Partitive Partitioning (PPP)
Clustering is a NP-hard problem. Thus, no optimal algorithm exists, heuristics are applied to cluster the data. Heuristics can be very resource-intensive, if not applied properly. For substantially large data sets computational efficiencies can be achieved by reducing the input space if a minimal loss of information can be achieved. Clustering algorithms, in general, face two common problems: 1) these converge to different settings with different initial conditions and; 2) the number of clusters has to be arbitrarily decided beforehand. This problem has become critical in the realm of big data. Recently, clustering algorithms have emerged which can speedup computations using parallel processing over the grid but face the aforementioned problems. Goals: Our goals are to find methods to cluster data which: 1) guarantee convergence to the same settings irrespective of the initial conditions; 2) eliminate the need to establish the number of clusters beforehand, and 3) can be applied to cluster large datasets. Methods: We introduce a method that combines probabilistic and combinatorial clustering methods to produce repeatable and compact clusters that are not sensitive to initial conditions. This method harnesses the power of k-means (a combinatorial clustering method) to cluster/partition very large dimensional datasets and uses the Gaussian Mixture Model (a probabilistic clustering method) to validate the k-means partitions. Results: We show that this method produces very compact clusters that are not sensitive to initial conditions. This method can be used to identify the most 'separable' set in a dataset which increases the 'clusterability' of a dataset. This method also eliminates the need to specify the number of clusters in advance.
Mamba Integrated with Physics Principles Masters Long-term Chaotic System Forecasting
Long-term forecasting of chaotic systems from short-term observations remains a fundamental and underexplored challenge due to the intrinsic sensitivity to initial conditions and the complex geometry of strange attractors. Existing approaches often rely on long-term training data or focus on short-term sequence correlations, struggling to maintain predictive stability and dynamical coherence over extended horizons. We propose PhyxMamba, a novel framework that integrates a Mamba-based state-space model with physics-informed principles to capture the underlying dynamics of chaotic systems. By reconstructing the attractor manifold from brief observations using time-delay embeddings, PhyxMamba extracts global dynamical features essential for accurate forecasting. Our generative training scheme enables Mamba to replicate the physical process, augmented by multi-token prediction and attractor geometry regularization for physical constraints, enhancing prediction accuracy and preserving key statistical invariants. Extensive evaluations on diverse simulated and real-world chaotic systems demonstrate that PhyxMamba delivers superior long-term forecasting and faithfully captures essential dynamical invariants from short-term data. This framework opens new avenues for reliably predicting chaotic systems under observation-scarce conditions, with broad implications across climate science, neuroscience, epidemiology, and beyond. Our code is open-source at https://github.com/tsinghua-fib-lab/PhyxMamba.
PolyPythias: Stability and Outliers across Fifty Language Model Pre-Training Runs
The stability of language model pre-training and its effects on downstream performance are still understudied. Prior work shows that the training process can yield significantly different results in response to slight variations in initial conditions, e.g., the random seed. Crucially, the research community still lacks sufficient resources and tools to systematically investigate pre-training stability, particularly for decoder-only language models. We introduce the PolyPythias, a set of 45 new training runs for the Pythia model suite: 9 new seeds across 5 model sizes, from 14M to 410M parameters, resulting in about 7k new checkpoints that we release. Using these new 45 training runs, in addition to the 5 already available, we study the effects of different initial conditions determined by the seed -- i.e., parameters' initialisation and data order -- on (i) downstream performance, (ii) learned linguistic representations, and (iii) emergence of training phases. In addition to common scaling behaviours, our analyses generally reveal highly consistent training dynamics across both model sizes and initial conditions. Further, the new seeds for each model allow us to identify outlier training runs and delineate their characteristics. Our findings show the potential of using these methods to predict training stability.
FuXi Weather: A data-to-forecast machine learning system for global weather
Weather forecasting traditionally relies on numerical weather prediction (NWP) systems that integrates global observational systems, data assimilation (DA), and forecasting models. Despite steady improvements in forecast accuracy over recent decades, further advances are increasingly constrained by high computational costs, the underutilization of vast observational datasets, and the challenges of obtaining finer resolution. These limitations, alongside the uneven distribution of observational networks, result in global disparities in forecast accuracy, leaving some regions vulnerable to extreme weather. Recent advances in machine learning present a promising alternative, providing more efficient and accurate forecasts using the same initial conditions as NWP. However, current machine learning models still depend on the initial conditions generated by NWP systems, which require extensive computational resources and expertise. Here we introduce FuXi Weather, a machine learning weather forecasting system that assimilates data from multiple satellites. Operating on a 6-hourly DA and forecast cycle, FuXi Weather generates reliable and accurate 10-day global weather forecasts at a spatial resolution of 0.25^circ. FuXi Weather is the first system to achieve all-grid, all-surface, all-channel, and all-sky DA and forecasting, extending skillful forecast lead times beyond those of the European Centre for Medium-range Weather Forecasts (ECMWF) high-resolution forecasts (HRES) while using significantly fewer observations. FuXi Weather consistently outperforms ECMWF HRES in observation-sparse regions, such as central Africa, demonstrating its potential to improve forecasts where observational infrastructure is limited.
Lagrangian PINNs: A causality-conforming solution to failure modes of physics-informed neural networks
Physics-informed neural networks (PINNs) leverage neural-networks to find the solutions of partial differential equation (PDE)-constrained optimization problems with initial conditions and boundary conditions as soft constraints. These soft constraints are often considered to be the sources of the complexity in the training phase of PINNs. Here, we demonstrate that the challenge of training (i) persists even when the boundary conditions are strictly enforced, and (ii) is closely related to the Kolmogorov n-width associated with problems demonstrating transport, convection, traveling waves, or moving fronts. Given this realization, we describe the mechanism underlying the training schemes such as those used in eXtended PINNs (XPINN), curriculum regularization, and sequence-to-sequence learning. For an important category of PDEs, i.e., governed by non-linear convection-diffusion equation, we propose reformulating PINNs on a Lagrangian frame of reference, i.e., LPINNs, as a PDE-informed solution. A parallel architecture with two branches is proposed. One branch solves for the state variables on the characteristics, and the second branch solves for the low-dimensional characteristics curves. The proposed architecture conforms to the causality innate to the convection, and leverages the direction of travel of the information in the domain. Finally, we demonstrate that the loss landscapes of LPINNs are less sensitive to the so-called "complexity" of the problems, compared to those in the traditional PINNs in the Eulerian framework.
On the local analyticity for the Euler equations
In this paper, we study the existence and uniqueness of solutions to the Euler equations with initial conditions that exhibit analytic regularity near the boundary and Sobolev regularity away from it. A key contribution of this work is the introduction of the diamond-analyticity framework, which captures the spatial decay of the analyticity radius in a structured manner, improving upon uniform analyticity approaches. We employ the Leray projection and a nonstandard mollification technique to demonstrate that the quotient between the imaginary and real parts of the analyticity radius remains unrestricted, thus extending the analyticity persistence results beyond traditional constraints. Our methodology combines analytic-Sobolev estimates with an iterative scheme which is nonstandard in the Cauchy-Kowalevskaya framework, ensuring rigorous control over the evolution of the solution. These results contribute to a deeper understanding of the interplay between analyticity and boundary effects in fluid equations. They might have implications for the study of the inviscid limit of the Navier-Stokes equations and the role of complex singularities in fluid dynamics.
Diffusion Generative Inverse Design
Inverse design refers to the problem of optimizing the input of an objective function in order to enact a target outcome. For many real-world engineering problems, the objective function takes the form of a simulator that predicts how the system state will evolve over time, and the design challenge is to optimize the initial conditions that lead to a target outcome. Recent developments in learned simulation have shown that graph neural networks (GNNs) can be used for accurate, efficient, differentiable estimation of simulator dynamics, and support high-quality design optimization with gradient- or sampling-based optimization procedures. However, optimizing designs from scratch requires many expensive model queries, and these procedures exhibit basic failures on either non-convex or high-dimensional problems.In this work, we show how denoising diffusion models (DDMs) can be used to solve inverse design problems efficiently and propose a particle sampling algorithm for further improving their efficiency. We perform experiments on a number of fluid dynamics design challenges, and find that our approach substantially reduces the number of calls to the simulator compared to standard techniques.
FRESA:Feedforward Reconstruction of Personalized Skinned Avatars from Few Images
We present a novel method for reconstructing personalized 3D human avatars with realistic animation from only a few images. Due to the large variations in body shapes, poses, and cloth types, existing methods mostly require hours of per-subject optimization during inference, which limits their practical applications. In contrast, we learn a universal prior from over a thousand clothed humans to achieve instant feedforward generation and zero-shot generalization. Specifically, instead of rigging the avatar with shared skinning weights, we jointly infer personalized avatar shape, skinning weights, and pose-dependent deformations, which effectively improves overall geometric fidelity and reduces deformation artifacts. Moreover, to normalize pose variations and resolve coupled ambiguity between canonical shapes and skinning weights, we design a 3D canonicalization process to produce pixel-aligned initial conditions, which helps to reconstruct fine-grained geometric details. We then propose a multi-frame feature aggregation to robustly reduce artifacts introduced in canonicalization and fuse a plausible avatar preserving person-specific identities. Finally, we train the model in an end-to-end framework on a large-scale capture dataset, which contains diverse human subjects paired with high-quality 3D scans. Extensive experiments show that our method generates more authentic reconstruction and animation than state-of-the-arts, and can be directly generalized to inputs from casually taken phone photos. Project page and code is available at https://github.com/rongakowang/FRESA.
FuXi-ENS: A machine learning model for medium-range ensemble weather forecasting
Ensemble forecasting is crucial for improving weather predictions, especially for forecasts of extreme events. Constructing an ensemble prediction system (EPS) based on conventional NWP models is highly computationally expensive. ML models have emerged as valuable tools for deterministic weather forecasts, providing forecasts with significantly reduced computational requirements and even surpassing the forecast performance of traditional NWP models. However, challenges arise when applying ML models to ensemble forecasting. Recent ML models, such as GenCast and SEEDS model, rely on the ERA5 EDA or operational NWP ensemble members for forecast generation. Their spatial resolution is also considered too coarse for many applications. To overcome these limitations, we introduce FuXi-ENS, an advanced ML model designed to deliver 6-hourly global ensemble weather forecasts up to 15 days. This model runs at a significantly increased spatial resolution of 0.25\textdegree, incorporating 5 atmospheric variables at 13 pressure levels, along with 13 surface variables. By leveraging the inherent probabilistic nature of Variational AutoEncoder (VAE), FuXi-ENS optimizes a loss function that combines the CRPS and the KL divergence between the predicted and target distribution, facilitating the incorporation of flow-dependent perturbations in both initial conditions and forecast. This innovative approach makes FuXi-ENS an advancement over the traditional ones that use L1 loss combined with the KL loss in standard VAE models for ensemble weather forecasting. Results demonstrate that FuXi-ENS outperforms ensemble forecasts from the ECMWF, a world leading NWP model, in the CRPS of 98.1% of 360 variable and forecast lead time combinations. This achievement underscores the potential of the FuXi-ENS model to enhance ensemble weather forecasts, offering a promising direction for further development in this field.
Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs
LLMs are commonly trained with a learning rate (LR) warmup, followed by cosine decay to 10% of the maximum (10x decay). In a large-scale empirical study, we show that under an optimal peak LR, a simple linear decay-to-zero (D2Z) schedule consistently outperforms other schedules when training at compute-optimal dataset sizes. D2Z is superior across a range of model sizes, batch sizes, datasets, and vocabularies. Benefits increase as dataset size increases. Leveraging a novel interpretation of AdamW as an exponential moving average of weight updates, we show how linear D2Z optimally balances the demands of early training (moving away from initial conditions) and late training (averaging over more updates in order to mitigate gradient noise). In experiments, a 610M-parameter model trained for 80 tokens-per-parameter (TPP) using D2Z achieves lower loss than when trained for 200 TPP using 10x decay, corresponding to an astonishing 60% compute savings. Models such as Llama2-7B, trained for 286 TPP with 10x decay, could likely have saved a majority of compute by training with D2Z.
Towards an end-to-end artificial intelligence driven global weather forecasting system
The weather forecasting system is important for science and society, and significant achievements have been made in applying artificial intelligence (AI) to medium-range weather forecasting. However, existing AI-based weather forecasting models rely on analysis or reanalysis products from traditional numerical weather prediction (NWP) systems as initial conditions for making predictions. Initial states are typically generated by traditional data assimilation components, which are computational expensive and time-consuming. Here we present an AI-based data assimilation model, i.e., Adas, for global weather variables. By introducing the confidence matrix, Adas employs gated convolution to handle sparse observations and gated cross-attention for capturing the interactions between the background and observations. Further, we combine Adas with the advanced AI-based forecasting model (i.e., FengWu) to construct the first end-to-end AI-based global weather forecasting system: FengWu-Adas. We demonstrate that Adas can assimilate global observations to produce high-quality analysis, enabling the system operate stably for long term. Moreover, we are the first to apply the methods to real-world scenarios, which is more challenging and has considerable practical application potential. We have also achieved the forecasts based on the analyses generated by AI with a skillful forecast lead time exceeding that of the IFS for the first time.
General Covariance Data Augmentation for Neural PDE Solvers
The growing body of research shows how to replace classical partial differential equation (PDE) integrators with neural networks. The popular strategy is to generate the input-output pairs with a PDE solver, train the neural network in the regression setting, and use the trained model as a cheap surrogate for the solver. The bottleneck in this scheme is the number of expensive queries of a PDE solver needed to generate the dataset. To alleviate the problem, we propose a computationally cheap augmentation strategy based on general covariance and simple random coordinate transformations. Our approach relies on the fact that physical laws are independent of the coordinate choice, so the change in the coordinate system preserves the type of a parametric PDE and only changes PDE's data (e.g., initial conditions, diffusion coefficient). For tried neural networks and partial differential equations, proposed augmentation improves test error by 23% on average. The worst observed result is a 17% increase in test error for multilayer perceptron, and the best case is a 80% decrease for dilated residual network.
Constructor Theory of Thermodynamics
All current formulations of thermodynamics invoke some form of coarse-graining or ensembles as the supposed link between their own laws and the microscopic laws of motion. They deal only with ensemble-averages, expectation values, macroscopic limits, infinite heat baths, etc., not with the details of physical variables of individual microscopic systems. They are consistent with the laws of motion for finite systems only in certain approximations, which improve with increasing scale, given various assumptions about initial conditions which are neither specified precisely nor even thought to hold exactly in nature. Here I propose a new formulation of the zeroth, first and second laws, improving upon the axiomatic approach to thermodynamics (Carath\'eodory, 1909; Lieb & Yngvason, 1999), via the principles of the recently proposed constructor theory. Specifically, I provide a non-approximative, scale-independent formulation of 'adiabatic accessibility'; this in turn provides a non-approximative, scale-independent distinction between work and heat and reveals an unexpected connection between information theory and the first law of thermodynamics (not just the second). It also achieves the long-sought unification of the axiomatic approach with Kelvin's.
Cultural Evolution of Cooperation among LLM Agents
Large language models (LLMs) provide a compelling foundation for building generally-capable AI agents. These agents may soon be deployed at scale in the real world, representing the interests of individual humans (e.g., AI assistants) or groups of humans (e.g., AI-accelerated corporations). At present, relatively little is known about the dynamics of multiple LLM agents interacting over many generations of iterative deployment. In this paper, we examine whether a "society" of LLM agents can learn mutually beneficial social norms in the face of incentives to defect, a distinctive feature of human sociality that is arguably crucial to the success of civilization. In particular, we study the evolution of indirect reciprocity across generations of LLM agents playing a classic iterated Donor Game in which agents can observe the recent behavior of their peers. We find that the evolution of cooperation differs markedly across base models, with societies of Claude 3.5 Sonnet agents achieving significantly higher average scores than Gemini 1.5 Flash, which, in turn, outperforms GPT-4o. Further, Claude 3.5 Sonnet can make use of an additional mechanism for costly punishment to achieve yet higher scores, while Gemini 1.5 Flash and GPT-4o fail to do so. For each model class, we also observe variation in emergent behavior across random seeds, suggesting an understudied sensitive dependence on initial conditions. We suggest that our evaluation regime could inspire an inexpensive and informative new class of LLM benchmarks, focussed on the implications of LLM agent deployment for the cooperative infrastructure of society.
LifeGPT: Topology-Agnostic Generative Pretrained Transformer Model for Cellular Automata
The Game of Life (Life), a well known algorithm within the broader class of cellular automata (CA), exhibits complex emergent dynamics, with extreme sensitivity to initial conditions. Modeling and predicting such intricate behavior without explicit knowledge of the system's underlying topology presents a significant challenge, motivating the development of algorithms that can generalize across various grid configurations and boundary conditions. We develop a decoder-only generative pretrained transformer model to solve this problem, showing that our model can simulate Life on a toroidal grid with no prior knowledge on the size of the grid, or its periodic boundary conditions (LifeGPT). LifeGPT is topology-agnostic with respect to its training data and our results show that a GPT model is capable of capturing the deterministic rules of a Turing-complete system with near-perfect accuracy, given sufficiently diverse training data. We also introduce the idea of an `autoregressive autoregressor' to recursively implement Life using LifeGPT. Our results pave the path towards true universal computation within a large language model (LLM) framework, synthesizing of mathematical analysis with natural language processing, and probing AI systems for situational awareness about the evolution of such algorithms without ever having to compute them. Similar GPTs could potentially solve inverse problems in multicellular self-assembly by extracting CA-compatible rulesets from real-world biological systems to create new predictive models, which would have significant consequences for the fields of bioinspired materials, tissue engineering, and architected materials design.
Comprehensive study of magnetic field evolution in relativistic jets based on 2D simulations
We use two-dimensional particle-in-cell simulations to investigate the generation and evolution of the magnetic field associated with the propagation of a jet for various initial conditions. We demonstrate that, in general, the magnetic field is initially grown by the Weibel and Mushroom instabilities. However, the field is saturated by the Alfv'en current limit. For initially non-magnetized plasma, we show that the growth of the magnetic field is delayed when the matter density of the jet environment is lower, which are in agreement with simple analytical predictions. We show that the higher Lorentz factor (gtrsim 2) prevents rapid growth of the magnetic fields. When the initial field is troidal, the position of the magnetic filaments moves away from the jet as the field strength increases. The axial initial field helps the jet maintain its shape more effectively than the troidal initial field.
The rise of data-driven weather forecasting
Data-driven modeling based on machine learning (ML) is showing enormous potential for weather forecasting. Rapid progress has been made with impressive results for some applications. The uptake of ML methods could be a game-changer for the incremental progress in traditional numerical weather prediction (NWP) known as the 'quiet revolution' of weather forecasting. The computational cost of running a forecast with standard NWP systems greatly hinders the improvements that can be made from increasing model resolution and ensemble sizes. An emerging new generation of ML models, developed using high-quality reanalysis datasets like ERA5 for training, allow forecasts that require much lower computational costs and that are highly-competitive in terms of accuracy. Here, we compare for the first time ML-generated forecasts with standard NWP-based forecasts in an operational-like context, initialized from the same initial conditions. Focusing on deterministic forecasts, we apply common forecast verification tools to assess to what extent a data-driven forecast produced with one of the recently developed ML models (PanguWeather) matches the quality and attributes of a forecast from one of the leading global NWP systems (the ECMWF IFS). The results are very promising, with comparable skill for both global metrics and extreme events, when verified against both the operational analysis and synoptic observations. Increasing forecast smoothness and bias drift with forecast lead time are identified as current drawbacks of ML-based forecasts. A new NWP paradigm is emerging relying on inference from ML models and state-of-the-art analysis and reanalysis datasets for forecast initialization and model training.
Forecasting Global Weather with Graph Neural Networks
We present a data-driven approach for forecasting global weather using graph neural networks. The system learns to step forward the current 3D atmospheric state by six hours, and multiple steps are chained together to produce skillful forecasts going out several days into the future. The underlying model is trained on reanalysis data from ERA5 or forecast data from GFS. Test performance on metrics such as Z500 (geopotential height) and T850 (temperature) improves upon previous data-driven approaches and is comparable to operational, full-resolution, physical models from GFS and ECMWF, at least when evaluated on 1-degree scales and when using reanalysis initial conditions. We also show results from connecting this data-driven model to live, operational forecasts from GFS.
ConFIG: Towards Conflict-free Training of Physics Informed Neural Networks
The loss functions of many learning problems contain multiple additive terms that can disagree and yield conflicting update directions. For Physics-Informed Neural Networks (PINNs), loss terms on initial/boundary conditions and physics equations are particularly interesting as they are well-established as highly difficult tasks. To improve learning the challenging multi-objective task posed by PINNs, we propose the ConFIG method, which provides conflict-free updates by ensuring a positive dot product between the final update and each loss-specific gradient. It also maintains consistent optimization rates for all loss terms and dynamically adjusts gradient magnitudes based on conflict levels. We additionally leverage momentum to accelerate optimizations by alternating the back-propagation of different loss terms. We provide a mathematical proof showing the convergence of the ConFIG method, and it is evaluated across a range of challenging PINN scenarios. ConFIG consistently shows superior performance and runtime compared to baseline methods. We also test the proposed method in a classic multi-task benchmark, where the ConFIG method likewise exhibits a highly promising performance. Source code is available at https://tum-pbs.github.io/ConFIG
Executable Functional Abstractions: Inferring Generative Programs for Advanced Math Problems
Scientists often infer abstract procedures from specific instances of problems and use the abstractions to generate new, related instances. For example, programs encoding the formal rules and properties of a system have been useful in fields ranging from RL (procedural environments) to physics (simulation engines). These programs can be seen as functions which execute to different outputs based on their parameterizations (e.g., gridworld configuration or initial physical conditions). We introduce the term EFA (Executable Functional Abstraction) to denote such programs for math problems. EFA-like constructs have been shown to be useful for math reasoning as problem generators for stress-testing models. However, prior work has been limited to abstractions for grade-school math (whose simple rules are easy to encode in programs), while generating EFAs for advanced math has thus far required human engineering. We explore the automatic construction of EFAs for advanced math problems. We operationalize the task of automatically constructing EFAs as a program synthesis task, and develop EFAGen, which conditions an LLM on a seed math problem and its step-by-step solution to generate candidate EFA programs that are faithful to the generalized problem and solution class underlying the seed problem. Furthermore, we formalize properties any valid EFA must possess in terms of executable unit tests, and show how the tests can be used as verifiable rewards to train LLMs to become better writers of EFAs. We demonstrate that EFAs constructed by EFAGen behave rationally by remaining faithful to seed problems, produce learnable problem variations, and that EFAGen can infer EFAs across multiple diverse sources of competition-level math problems. Finally, we show downstream uses of model-written EFAs e.g. finding problem variations that are harder or easier for a learner to solve, as well as data generation.
PINNACLE: PINN Adaptive ColLocation and Experimental points selection
Physics-Informed Neural Networks (PINNs), which incorporate PDEs as soft constraints, train with a composite loss function that contains multiple training point types: different types of collocation points chosen during training to enforce each PDE and initial/boundary conditions, and experimental points which are usually costly to obtain via experiments or simulations. Training PINNs using this loss function is challenging as it typically requires selecting large numbers of points of different types, each with different training dynamics. Unlike past works that focused on the selection of either collocation or experimental points, this work introduces PINN Adaptive ColLocation and Experimental points selection (PINNACLE), the first algorithm that jointly optimizes the selection of all training point types, while automatically adjusting the proportion of collocation point types as training progresses. PINNACLE uses information on the interaction among training point types, which had not been considered before, based on an analysis of PINN training dynamics via the Neural Tangent Kernel (NTK). We theoretically show that the criterion used by PINNACLE is related to the PINN generalization error, and empirically demonstrate that PINNACLE is able to outperform existing point selection methods for forward, inverse, and transfer learning problems.
Learning Neural Constitutive Laws From Motion Observations for Generalizable PDE Dynamics
We propose a hybrid neural network (NN) and PDE approach for learning generalizable PDE dynamics from motion observations. Many NN approaches learn an end-to-end model that implicitly models both the governing PDE and constitutive models (or material models). Without explicit PDE knowledge, these approaches cannot guarantee physical correctness and have limited generalizability. We argue that the governing PDEs are often well-known and should be explicitly enforced rather than learned. Instead, constitutive models are particularly suitable for learning due to their data-fitting nature. To this end, we introduce a new framework termed "Neural Constitutive Laws" (NCLaw), which utilizes a network architecture that strictly guarantees standard constitutive priors, including rotation equivariance and undeformed state equilibrium. We embed this network inside a differentiable simulation and train the model by minimizing a loss function based on the difference between the simulation and the motion observation. We validate NCLaw on various large-deformation dynamical systems, ranging from solids to fluids. After training on a single motion trajectory, our method generalizes to new geometries, initial/boundary conditions, temporal ranges, and even multi-physics systems. On these extremely out-of-distribution generalization tasks, NCLaw is orders-of-magnitude more accurate than previous NN approaches. Real-world experiments demonstrate our method's ability to learn constitutive laws from videos.
Full Transport General Relativistic Radiation Magnetohydrodynamics for Nucleosynthesis in Collapsars
We model a compact black hole-accretion disk system in the collapsar scenario with full transport, frequency dependent, general relativistic radiation magnetohydrodynamics. We examine whether or not winds from a collapsar disk can undergo rapid neutron capture (r-process) nucleosynthesis and significantly contribute to solar r-process abundances. We find the inclusion of accurate transport has significant effects on outflows, raising the electron fraction above Y_{rm e} sim 0.3 and preventing third peak r-process material from being synthesized. We analyze the time-evolution of neutrino processes and electron fraction in the disk and present a simple one-dimensional model for the vertical structure that emerges. We compare our simulation to semi-analytic expectations and argue that accurate neutrino transport and realistic initial and boundary conditions are required to capture the dynamics and nucleosynthetic outcome of a collapsar.
Gaussian Process Priors for Systems of Linear Partial Differential Equations with Constant Coefficients
Partial differential equations (PDEs) are important tools to model physical systems, and including them into machine learning models is an important way of incorporating physical knowledge. Given any system of linear PDEs with constant coefficients, we propose a family of Gaussian process (GP) priors, which we call EPGP, such that all realizations are exact solutions of this system. We apply the Ehrenpreis-Palamodov fundamental principle, which works like a non-linear Fourier transform, to construct GP kernels mirroring standard spectral methods for GPs. Our approach can infer probable solutions of linear PDE systems from any data such as noisy measurements, or pointwise defined initial and boundary conditions. Constructing EPGP-priors is algorithmic, generally applicable, and comes with a sparse version (S-EPGP) that learns the relevant spectral frequencies and works better for big data sets. We demonstrate our approach on three families of systems of PDE, the heat equation, wave equation, and Maxwell's equations, where we improve upon the state of the art in computation time and precision, in some experiments by several orders of magnitude.
BuzzSet v1.0: A Dataset for Pollinator Detection in Field Conditions
Pollinator insects such as honeybees and bumblebees are vital to global food production and ecosystem stability, yet their populations are declining due to increasing anthropogenic and environmental stressors. To support scalable, automated pollinator monitoring, we introduce BuzzSet, a new large-scale dataset of high-resolution pollinator images collected in real agricultural field conditions. BuzzSet contains 7856 manually verified and labeled images, with over 8000 annotated instances across three classes: honeybees, bumblebees, and unidentified insects. Initial annotations were generated using a YOLOv12 model trained on external data and refined via human verification using open-source labeling tools. All images were preprocessed into 256~times~256 tiles to improve the detection of small insects. We provide strong baselines using the RF-DETR transformer-based object detector. The model achieves high F1-scores of 0.94 and 0.92 for honeybee and bumblebee classes, respectively, with confusion matrix results showing minimal misclassification between these categories. The unidentified class remains more challenging due to label ambiguity and lower sample frequency, yet still contributes useful insights for robustness evaluation. Overall detection quality is strong, with a best mAP@0.50 of 0.559. BuzzSet offers a valuable benchmark for small object detection, class separation under label noise, and ecological computer vision.
Synthetic Shifts to Initial Seed Vector Exposes the Brittle Nature of Latent-Based Diffusion Models
Recent advances in Conditional Diffusion Models have led to substantial capabilities in various domains. However, understanding the impact of variations in the initial seed vector remains an underexplored area of concern. Particularly, latent-based diffusion models display inconsistencies in image generation under standard conditions when initialized with suboptimal initial seed vectors. To understand the impact of the initial seed vector on generated samples, we propose a reliability evaluation framework that evaluates the generated samples of a diffusion model when the initial seed vector is subjected to various synthetic shifts. Our results indicate that slight manipulations to the initial seed vector of the state-of-the-art Stable Diffusion (Rombach et al., 2022) can lead to significant disturbances in the generated samples, consequently creating images without the effect of conditioning variables. In contrast, GLIDE (Nichol et al., 2022) stands out in generating reliable samples even when the initial seed vector is transformed. Thus, our study sheds light on the importance of the selection and the impact of the initial seed vector in the latent-based diffusion model.
Adversarial Classification: Necessary conditions and geometric flows
We study a version of adversarial classification where an adversary is empowered to corrupt data inputs up to some distance varepsilon, using tools from variational analysis. In particular, we describe necessary conditions associated with the optimal classifier subject to such an adversary. Using the necessary conditions, we derive a geometric evolution equation which can be used to track the change in classification boundaries as varepsilon varies. This evolution equation may be described as an uncoupled system of differential equations in one dimension, or as a mean curvature type equation in higher dimension. In one dimension, and under mild assumptions on the data distribution, we rigorously prove that one can use the initial value problem starting from varepsilon=0, which is simply the Bayes classifier, in order to solve for the global minimizer of the adversarial problem for small values of varepsilon. In higher dimensions we provide a similar result, albeit conditional to the existence of regular solutions of the initial value problem. In the process of proving our main results we obtain a result of independent interest connecting the original adversarial problem with an optimal transport problem under no assumptions on whether classes are balanced or not. Numerical examples illustrating these ideas are also presented.
Existence-Uniqueness Theory and Small-Data Decay for a Reaction-Diffusion Model of Wildfire Spread
I examine some analytical properties of a nonlinear reaction-diffusion system that has been used to model the propagation of a wildfire. I establish global-in-time existence and uniqueness of bounded mild solutions to the Cauchy problem for this system given bounded initial data. In particular, this shows that the model does not allow for thermal blow-up. If the initial temperature and fuel density also satisfy certain integrability conditions, the L^2-norms of these global solutions are uniformly bounded in time. Additionally, I use a bootstrap argument to show that small initial temperatures give rise to solutions that decay to zero as time goes to infinity, proving the existence of initial states that do not develop into travelling combustion waves.
Lighting and Rotation Invariant Real-time Vehicle Wheel Detector based on YOLOv5
Creating an object detector, in computer vision, has some common challenges when initially developed based on Convolutional Neural Network (CNN) architecture. These challenges are more apparent when creating model that needs to adapt to images captured by various camera orientations, lighting conditions, and environmental changes. The availability of the initial training samples to cover all these conditions can be an enormous challenge with a time and cost burden. While the problem can exist when creating any type of object detection, some types are less common and have no pre-labeled image datasets that exists publicly. Sometime public datasets are not reliable nor comprehensive for a rare object type. Vehicle wheel is one of those example that been chosen to demonstrate the approach of creating a lighting and rotation invariant real-time detector based on YOLOv5 architecture. The objective is to provide a simple approach that could be used as a reference for developing other types of real-time object detectors.
Expanding continual few-shot learning benchmarks to include recognition of specific instances
Continual learning and few-shot learning are important frontiers in progress towards broader Machine Learning (ML) capabilities. There is a growing body of work in both, but few works combining the two. One exception is the Continual few-shot Learning (CFSL) framework of Antoniou et al. arXiv:2004.11967. In this study, we extend CFSL in two ways that capture a broader range of challenges, important for intelligent agent behaviour in real-world conditions. First, we modify CFSL to make it more comparable to standard continual learning experiments, where usually a much larger number of classes are presented. Second, we introduce an 'instance test' which requires recognition of specific instances of classes -- a capability of animal cognition that is usually neglected in ML. For an initial exploration of ML model performance under these conditions, we selected representative baseline models from the original CFSL work and added a model variant with replay. As expected, learning more classes is more difficult than the original CFSL experiments, and interestingly, the way in which image instances and classes are presented affects classification performance. Surprisingly, accuracy in the baseline instance test is comparable to other classification tasks, but poor given significant occlusion and noise. The use of replay for consolidation improves performance substantially for both types of tasks, but particularly the instance test.
Real-Time Iteration Scheme for Diffusion Policy
Diffusion Policies have demonstrated impressive performance in robotic manipulation tasks. However, their long inference time, resulting from an extensive iterative denoising process, and the need to execute an action chunk before the next prediction to maintain consistent actions limit their applicability to latency-critical tasks or simple tasks with a short cycle time. While recent methods explored distillation or alternative policy structures to accelerate inference, these often demand additional training, which can be resource-intensive for large robotic models. In this paper, we introduce a novel approach inspired by the Real-Time Iteration (RTI) Scheme, a method from optimal control that accelerates optimization by leveraging solutions from previous time steps as initial guesses for subsequent iterations. We explore the application of this scheme in diffusion inference and propose a scaling-based method to effectively handle discrete actions, such as grasping, in robotic manipulation. The proposed scheme significantly reduces runtime computational costs without the need for distillation or policy redesign. This enables a seamless integration into many pre-trained diffusion-based models, in particular, to resource-demanding large models. We also provide theoretical conditions for the contractivity which could be useful for estimating the initial denoising step. Quantitative results from extensive simulation experiments show a substantial reduction in inference time, with comparable overall performance compared with Diffusion Policy using full-step denoising. Our project page with additional resources is available at: https://rti-dp.github.io/.
FastEdit: Fast Text-Guided Single-Image Editing via Semantic-Aware Diffusion Fine-Tuning
Conventional Text-guided single-image editing approaches require a two-step process, including fine-tuning the target text embedding for over 1K iterations and the generative model for another 1.5K iterations. Although it ensures that the resulting image closely aligns with both the input image and the target text, this process often requires 7 minutes per image, posing a challenge for practical application due to its time-intensive nature. To address this bottleneck, we introduce FastEdit, a fast text-guided single-image editing method with semantic-aware diffusion fine-tuning, dramatically accelerating the editing process to only 17 seconds. FastEdit streamlines the generative model's fine-tuning phase, reducing it from 1.5K to a mere 50 iterations. For diffusion fine-tuning, we adopt certain time step values based on the semantic discrepancy between the input image and target text. Furthermore, FastEdit circumvents the initial fine-tuning step by utilizing an image-to-image model that conditions on the feature space, rather than the text embedding space. It can effectively align the target text prompt and input image within the same feature space and save substantial processing time. Additionally, we apply the parameter-efficient fine-tuning technique LoRA to U-net. With LoRA, FastEdit minimizes the model's trainable parameters to only 0.37\% of the original size. At the same time, we can achieve comparable editing outcomes with significantly reduced computational overhead. We conduct extensive experiments to validate the editing performance of our approach and show promising editing capabilities, including content addition, style transfer, background replacement, and posture manipulation, etc.
The discrete generalized exchange-driven system
We study a discrete model for generalized exchange-driven growth in which the particle exchanged between two clusters is not limited to be of size one. This set of models include as special cases the usual exchange-driven growth system and the coagulation-fragmentation system with binary fragmentation. Under reasonable general condition on the rate coefficients we establish the existence of admissible solutions, meaning solutions that are obtained as appropriate limit of solutions to a finite-dimensional truncation of the infinite-dimensional ODE. For these solutions we prove that, in the class of models we call isolated both the total number of particles and the total mass are conserved, whereas in those models we can non-isolated only the mass is conserved. Additionally, under more restrictive growth conditions for the rate equations we obtain uniqueness of solutions to the initial value problems.
Object-Centric Domain Randomization for 3D Shape Reconstruction in the Wild
One of the biggest challenges in single-view 3D shape reconstruction in the wild is the scarcity of <3D shape, 2D image>-paired data from real-world environments. Inspired by remarkable achievements via domain randomization, we propose ObjectDR which synthesizes such paired data via a random simulation of visual variations in object appearances and backgrounds. Our data synthesis framework exploits a conditional generative model (e.g., ControlNet) to generate images conforming to spatial conditions such as 2.5D sketches, which are obtainable through a rendering process of 3D shapes from object collections (e.g., Objaverse-XL). To simulate diverse variations while preserving object silhouettes embedded in spatial conditions, we also introduce a disentangled framework which leverages an initial object guidance. After synthesizing a wide range of data, we pre-train a model on them so that it learns to capture a domain-invariant geometry prior which is consistent across various domains. We validate its effectiveness by substantially improving 3D shape reconstruction models on a real-world benchmark. In a scale-up evaluation, our pre-training achieves 23.6% superior results compared with the pre-training on high-quality computer graphics renderings.
Meta Flow Matching: Integrating Vector Fields on the Wasserstein Manifold
Numerous biological and physical processes can be modeled as systems of interacting entities evolving continuously over time, e.g. the dynamics of communicating cells or physical particles. Learning the dynamics of such systems is essential for predicting the temporal evolution of populations across novel samples and unseen environments. Flow-based models allow for learning these dynamics at the population level - they model the evolution of the entire distribution of samples. However, current flow-based models are limited to a single initial population and a set of predefined conditions which describe different dynamics. We argue that multiple processes in natural sciences have to be represented as vector fields on the Wasserstein manifold of probability densities. That is, the change of the population at any moment in time depends on the population itself due to the interactions between samples. In particular, this is crucial for personalized medicine where the development of diseases and their respective treatment response depends on the microenvironment of cells specific to each patient. We propose Meta Flow Matching (MFM), a practical approach to integrating along these vector fields on the Wasserstein manifold by amortizing the flow model over the initial populations. Namely, we embed the population of samples using a Graph Neural Network (GNN) and use these embeddings to train a Flow Matching model. This gives MFM the ability to generalize over the initial distributions unlike previously proposed methods. We demonstrate the ability of MFM to improve prediction of individual treatment responses on a large scale multi-patient single-cell drug screen dataset.
Learning Control by Iterative Inversion
We propose iterative inversion -- an algorithm for learning an inverse function without input-output pairs, but only with samples from the desired output distribution and access to the forward function. The key challenge is a distribution shift between the desired outputs and the outputs of an initial random guess, and we prove that iterative inversion can steer the learning correctly, under rather strict conditions on the function. We apply iterative inversion to learn control. Our input is a set of demonstrations of desired behavior, given as video embeddings of trajectories (without actions), and our method iteratively learns to imitate trajectories generated by the current policy, perturbed by random exploration noise. Our approach does not require rewards, and only employs supervised learning, which can be easily scaled to use state-of-the-art trajectory embedding techniques and policy representations. Indeed, with a VQ-VAE embedding, and a transformer-based policy, we demonstrate non-trivial continuous control on several tasks. Further, we report an improved performance on imitating diverse behaviors compared to reward based methods.
Ctrl-Crash: Controllable Diffusion for Realistic Car Crashes
Video diffusion techniques have advanced significantly in recent years; however, they struggle to generate realistic imagery of car crashes due to the scarcity of accident events in most driving datasets. Improving traffic safety requires realistic and controllable accident simulations. To tackle the problem, we propose Ctrl-Crash, a controllable car crash video generation model that conditions on signals such as bounding boxes, crash types, and an initial image frame. Our approach enables counterfactual scenario generation where minor variations in input can lead to dramatically different crash outcomes. To support fine-grained control at inference time, we leverage classifier-free guidance with independently tunable scales for each conditioning signal. Ctrl-Crash achieves state-of-the-art performance across quantitative video quality metrics (e.g., FVD and JEDi) and qualitative measurements based on a human-evaluation of physical realism and video quality compared to prior diffusion-based methods.
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
Text-to-video diffusion models enable the generation of high-quality videos that follow text instructions, making it easy to create diverse and individual content. However, existing approaches mostly focus on high-quality short video generation (typically 16 or 24 frames), ending up with hard-cuts when naively extended to the case of long video synthesis. To overcome these limitations, we introduce StreamingT2V, an autoregressive approach for long video generation of 80, 240, 600, 1200 or more frames with smooth transitions. The key components are:(i) a short-term memory block called conditional attention module (CAM), which conditions the current generation on the features extracted from the previous chunk via an attentional mechanism, leading to consistent chunk transitions, (ii) a long-term memory block called appearance preservation module, which extracts high-level scene and object features from the first video chunk to prevent the model from forgetting the initial scene, and (iii) a randomized blending approach that enables to apply a video enhancer autoregressively for infinitely long videos without inconsistencies between chunks. Experiments show that StreamingT2V generates high motion amount. In contrast, all competing image-to-video methods are prone to video stagnation when applied naively in an autoregressive manner. Thus, we propose with StreamingT2V a high-quality seamless text-to-long video generator that outperforms competitors with consistency and motion. Our code will be available at: https://github.com/Picsart-AI-Research/StreamingT2V
Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF
Reinforcement learning from human feedback (RLHF) has emerged as a central tool for language model alignment. We consider online exploration in RLHF, which exploits interactive access to human or AI feedback by deliberately encouraging the model to produce diverse, maximally informative responses. By allowing RLHF to confidently stray from the pre-trained model, online exploration offers the possibility of novel, potentially super-human capabilities, but its full potential as a paradigm for language model training has yet to be realized, owing to computational and statistical bottlenecks in directly adapting existing reinforcement learning techniques. We propose a new algorithm for online exploration in RLHF, Exploratory Preference Optimization (XPO), which is simple and practical -- a one-line change to (online) Direct Preference Optimization (DPO; Rafailov et al., 2023) -- yet enjoys the strongest known provable guarantees and promising empirical performance. XPO augments the DPO objective with a novel and principled exploration bonus, empowering the algorithm to explore outside the support of the initial model and human feedback data. In theory, we show that XPO is provably sample-efficient and converges to a near-optimal language model policy under natural exploration conditions, irrespective of whether the initial model has good coverage. Our analysis, which builds on the observation that DPO implicitly performs a form of Q^{star}-approximation (or, Bellman error minimization), combines previously disparate techniques from language modeling and theoretical reinforcement learning in a serendipitous fashion through the perspective of KL-regularized Markov decision processes. Empirically, we find that XPO is more sample-efficient than non-exploratory DPO variants in a preliminary evaluation.
Flow: A Modular Approach to Automated Agentic Workflow Generation
Multi-agent frameworks powered by large language models (LLMs) have demonstrated great success in automated planning and task execution. However, the effective adjustment of Agentic workflows during execution has not been well-studied. A effective workflow adjustment is crucial, as in many real-world scenarios, the initial plan must adjust to unforeseen challenges and changing conditions in real-time to ensure the efficient execution of complex tasks. In this paper, we define workflows as an activity-on-vertex (AOV) graphs. We continuously refine the workflow by dynamically adjusting task allocations based on historical performance and previous AOV with LLM agents. To further enhance system performance, we emphasize modularity in workflow design based on measuring parallelism and dependence complexity. Our proposed multi-agent framework achieved efficient sub-task concurrent execution, goal achievement, and error tolerance. Empirical results across different practical tasks demonstrate dramatic improvements in the efficiency of multi-agent frameworks through dynamic workflow updating and modularization.
A Model Generalization Study in Localizing Indoor Cows with COw LOcalization (COLO) dataset
Precision livestock farming (PLF) increasingly relies on advanced object localization techniques to monitor livestock health and optimize resource management. This study investigates the generalization capabilities of YOLOv8 and YOLOv9 models for cow detection in indoor free-stall barn settings, focusing on varying training data characteristics such as view angles and lighting, and model complexities. Leveraging the newly released public dataset, COws LOcalization (COLO) dataset, we explore three key hypotheses: (1) Model generalization is equally influenced by changes in lighting conditions and camera angles; (2) Higher model complexity guarantees better generalization performance; (3) Fine-tuning with custom initial weights trained on relevant tasks always brings advantages to detection tasks. Our findings reveal considerable challenges in detecting cows in images taken from side views and underscore the importance of including diverse camera angles in building a detection model. Furthermore, our results emphasize that higher model complexity does not necessarily lead to better performance. The optimal model configuration heavily depends on the specific task and dataset. Lastly, while fine-tuning with custom initial weights trained on relevant tasks offers advantages to detection tasks, simpler models do not benefit similarly from this approach. It is more efficient to train a simple model with pre-trained weights without relying on prior relevant information, which can require intensive labor efforts. Future work should focus on adaptive methods and advanced data augmentation to improve generalization and robustness. This study provides practical guidelines for PLF researchers on deploying computer vision models from existing studies, highlights generalization issues, and contributes the COLO dataset containing 1254 images and 11818 cow instances for further research.
FoundLoc: Vision-based Onboard Aerial Localization in the Wild
Robust and accurate localization for Unmanned Aerial Vehicles (UAVs) is an essential capability to achieve autonomous, long-range flights. Current methods either rely heavily on GNSS, face limitations in visual-based localization due to appearance variances and stylistic dissimilarities between camera and reference imagery, or operate under the assumption of a known initial pose. In this paper, we developed a GNSS-denied localization approach for UAVs that harnesses both Visual-Inertial Odometry (VIO) and Visual Place Recognition (VPR) using a foundation model. This paper presents a novel vision-based pipeline that works exclusively with a nadir-facing camera, an Inertial Measurement Unit (IMU), and pre-existing satellite imagery for robust, accurate localization in varied environments and conditions. Our system demonstrated average localization accuracy within a 20-meter range, with a minimum error below 1 meter, under real-world conditions marked by drastic changes in environmental appearance and with no assumption of the vehicle's initial pose. The method is proven to be effective and robust, addressing the crucial need for reliable UAV localization in GNSS-denied environments, while also being computationally efficient enough to be deployed on resource-constrained platforms.
Efficient Video Prediction via Sparsely Conditioned Flow Matching
We introduce a novel generative model for video prediction based on latent flow matching, an efficient alternative to diffusion-based models. In contrast to prior work, we keep the high costs of modeling the past during training and inference at bay by conditioning only on a small random set of past frames at each integration step of the image generation process. Moreover, to enable the generation of high-resolution videos and to speed up the training, we work in the latent space of a pretrained VQGAN. Finally, we propose to approximate the initial condition of the flow ODE with the previous noisy frame. This allows to reduce the number of integration steps and hence, speed up the sampling at inference time. We call our model Random frame conditioned flow Integration for VidEo pRediction, or, in short, RIVER. We show that RIVER achieves superior or on par performance compared to prior work on common video prediction benchmarks, while requiring an order of magnitude fewer computational resources.
Fast Sampling of Diffusion Models via Operator Learning
Diffusion models have found widespread adoption in various areas. However, their sampling process is slow because it requires hundreds to thousands of network evaluations to emulate a continuous process defined by differential equations. In this work, we use neural operators, an efficient method to solve the probability flow differential equations, to accelerate the sampling process of diffusion models. Compared to other fast sampling methods that have a sequential nature, we are the first to propose parallel decoding method that generates images with only one model forward pass. We propose diffusion model sampling with neural operator (DSNO) that maps the initial condition, i.e., Gaussian distribution, to the continuous-time solution trajectory of the reverse diffusion process. To model the temporal correlations along the trajectory, we introduce temporal convolution layers that are parameterized in the Fourier space into the given diffusion model backbone. We show our method achieves state-of-the-art FID of 4.12 for CIFAR-10 and 8.35 for ImageNet-64 in the one-model-evaluation setting.
PINNsFormer: A Transformer-Based Framework For Physics-Informed Neural Networks
Physics-Informed Neural Networks (PINNs) have emerged as a promising deep learning framework for approximating numerical solutions to partial differential equations (PDEs). However, conventional PINNs, relying on multilayer perceptrons (MLP), neglect the crucial temporal dependencies inherent in practical physics systems and thus fail to propagate the initial condition constraints globally and accurately capture the true solutions under various scenarios. In this paper, we introduce a novel Transformer-based framework, termed PINNsFormer, designed to address this limitation. PINNsFormer can accurately approximate PDE solutions by utilizing multi-head attention mechanisms to capture temporal dependencies. PINNsFormer transforms point-wise inputs into pseudo sequences and replaces point-wise PINNs loss with a sequential loss. Additionally, it incorporates a novel activation function, Wavelet, which anticipates Fourier decomposition through deep neural networks. Empirical results demonstrate that PINNsFormer achieves superior generalization ability and accuracy across various scenarios, including PINNs failure modes and high-dimensional PDEs. Moreover, PINNsFormer offers flexibility in integrating existing learning schemes for PINNs, further enhancing its performance.
Foundation Inference Models for Markov Jump Processes
Markov jump processes are continuous-time stochastic processes which describe dynamical systems evolving in discrete state spaces. These processes find wide application in the natural sciences and machine learning, but their inference is known to be far from trivial. In this work we introduce a methodology for zero-shot inference of Markov jump processes (MJPs), on bounded state spaces, from noisy and sparse observations, which consists of two components. First, a broad probability distribution over families of MJPs, as well as over possible observation times and noise mechanisms, with which we simulate a synthetic dataset of hidden MJPs and their noisy observation process. Second, a neural network model that processes subsets of the simulated observations, and that is trained to output the initial condition and rate matrix of the target MJP in a supervised way. We empirically demonstrate that one and the same (pretrained) model can infer, in a zero-shot fashion, hidden MJPs evolving in state spaces of different dimensionalities. Specifically, we infer MJPs which describe (i) discrete flashing ratchet systems, which are a type of Brownian motors, and the conformational dynamics in (ii) molecular simulations, (iii) experimental ion channel data and (iv) simple protein folding models. What is more, we show that our model performs on par with state-of-the-art models which are finetuned to the target datasets.
Quantum Thermalization via Travelling Waves
Isolated quantum many-body systems which thermalize under their own dynamics are expected to act as their own thermal baths, thereby bringing their local subsystems to thermal equilibrium. Here we show that the infinite-dimensional limit of a quantum lattice model, as described by Dynamical Mean-Field theory (DMFT), provides a natural framework to understand this self-consistent thermalization process. Using the Fermi-Hubbard model as working example, we demonstrate that the emergence of a self-consistent bath thermalising the system is characterized by a sharp thermalization front, moving balistically and separating the initial condition from the long-time thermal fixed point. We characterize the full DMFT dynamics through an effective temperature for which we derive a travelling-wave equation of the Fisher-Kolmogorov-Petrovsky-Piskunov (FKPP) type. This equation allows to predict the asymptotic shape of the front and its velocity, which match perfectly the full DMFT numerics. Our results provide a new angle to understand the onset of quantum thermalisation in closed isolated systems.
A PINN Approach to Symbolic Differential Operator Discovery with Sparse Data
Given ample experimental data from a system governed by differential equations, it is possible to use deep learning techniques to construct the underlying differential operators. In this work we perform symbolic discovery of differential operators in a situation where there is sparse experimental data. This small data regime in machine learning can be made tractable by providing our algorithms with prior information about the underlying dynamics. Physics Informed Neural Networks (PINNs) have been very successful in this regime (reconstructing entire ODE solutions using only a single point or entire PDE solutions with very few measurements of the initial condition). We modify the PINN approach by adding a neural network that learns a representation of unknown hidden terms in the differential equation. The algorithm yields both a surrogate solution to the differential equation and a black-box representation of the hidden terms. These hidden term neural networks can then be converted into symbolic equations using symbolic regression techniques like AI Feynman. In order to achieve convergence of these neural networks, we provide our algorithms with (noisy) measurements of both the initial condition as well as (synthetic) experimental data obtained at later times. We demonstrate strong performance of this approach even when provided with very few measurements of noisy data in both the ODE and PDE regime.
Optimally truncated WKB approximation for the highly oscillatory stationary 1D Schrödinger equation
We discuss the numerical solution of initial value problems for varepsilon^2,varphi''+a(x),varphi=0 in the highly oscillatory regime, i.e., with a(x)>0 and 0<varepsilonll 1. We analyze and implement an approximate solution based on the well-known WKB-ansatz. The resulting approximation error is of magnitude O(varepsilon^{N}) where N refers to the truncation order of the underlying asymptotic series. When the optimal truncation order N_{opt} is chosen, the error behaves like O(varepsilon^{-2}exp(-cvarepsilon^{-1})) with some c>0.
Global Well-posedness for 2D non-resistive MHD equations in half-space
This paper focuses on the initial boundary value problem of two-dimensional non-resistive MHD equations in a half space. We prove that the MHD equations have a unique global strong solution around the equilibrium state (0,e_1) for Dirichlet boundary condition of velocity and modified Neumann boundary condition of magnetic.
Sharp seasonal threshold property for cooperative population dynamics with concave nonlinearities
We consider a biological population whose environment varies periodically in time, exhibiting two very different "seasons" : one is favorable and the other one is unfavorable. For monotone differential models with concave nonlinearities, we address the following question: the system's period being fixed, under what conditions does there exist a critical duration for the unfavorable season? By "critical duration" we mean that above some threshold, the population cannot sustain and extincts, while below this threshold, the system converges to a unique periodic and positive solution. We term this a "sharp seasonal threshold property" (SSTP, for short). Building upon a previous result, we obtain sufficient conditions for SSTP in any dimension and apply our criterion to a two-dimensional model featuring juvenile and adult populations of insects.
How connectivity structure shapes rich and lazy learning in neural circuits
In theoretical neuroscience, recent work leverages deep learning tools to explore how some network attributes critically influence its learning dynamics. Notably, initial weight distributions with small (resp. large) variance may yield a rich (resp. lazy) regime, where significant (resp. minor) changes to network states and representation are observed over the course of learning. However, in biology, neural circuit connectivity could exhibit a low-rank structure and therefore differs markedly from the random initializations generally used for these studies. As such, here we investigate how the structure of the initial weights -- in particular their effective rank -- influences the network learning regime. Through both empirical and theoretical analyses, we discover that high-rank initializations typically yield smaller network changes indicative of lazier learning, a finding we also confirm with experimentally-driven initial connectivity in recurrent neural networks. Conversely, low-rank initialization biases learning towards richer learning. Importantly, however, as an exception to this rule, we find lazier learning can still occur with a low-rank initialization that aligns with task and data statistics. Our research highlights the pivotal role of initial weight structures in shaping learning regimes, with implications for metabolic costs of plasticity and risks of catastrophic forgetting.
Anisotropic Compact Star Model Satisfying Karmarkar Conditions
A new class of solutions describing the composition of compact stars has been proposed, assuming that the fluid distribution inside the star is anisotropic. This is achieved by assuming the appropriate metric potential and then solving Einstein's field equations using Karmarkar conditions [Karmarkar K. R., Proc. Indian Acad. Sci. 27 (1948) 56] to derive the expressions for star density, the radial and tangential pressures in terms of the constants A, B, a paramter `a' and the curvature parameter R. The equations thus obtained have been passed through rigorous conditional analysis. It is further shown that the model is physically viable and mathematically well-behaved, fulfilling the requisite conditions viz., regularity condition, strong energy condition, causality condition, etc. Observed star candidates including EXO 1785-248, SMC X-1, SAXJ1808.43658(SS2), HER X-1, 4U 1538-52, Cen X-3 and LMC X-4 were found to conform to a good approximation through the outcome of this model for a=0.5.
Constant Acceleration Flow
Rectified flow and reflow procedures have significantly advanced fast generation by progressively straightening ordinary differential equation (ODE) flows. They operate under the assumption that image and noise pairs, known as couplings, can be approximated by straight trajectories with constant velocity. However, we observe that modeling with constant velocity and using reflow procedures have limitations in accurately learning straight trajectories between pairs, resulting in suboptimal performance in few-step generation. To address these limitations, we introduce Constant Acceleration Flow (CAF), a novel framework based on a simple constant acceleration equation. CAF introduces acceleration as an additional learnable variable, allowing for more expressive and accurate estimation of the ODE flow. Moreover, we propose two techniques to further improve estimation accuracy: initial velocity conditioning for the acceleration model and a reflow process for the initial velocity. Our comprehensive studies on toy datasets, CIFAR-10, and ImageNet 64x64 demonstrate that CAF outperforms state-of-the-art baselines for one-step generation. We also show that CAF dramatically improves few-step coupling preservation and inversion over Rectified flow. Code is available at https://github.com/mlvlab/CAF{https://github.com/mlvlab/CAF}.
Schrödinger-Poisson systems with a general critical nonlinearity
We consider a Schr\"odinger-Poisson system involving a general nonlinearity at critical growth and we prove the existence of positive solutions. The Ambrosetti-Rabinowitz condition is not required. We also study the asymptotics of solutions with respect to a parameter.
Learning large scale industrial physics simulations
In an industrial group like Safran, numerical simulations of physical phenomena are integral to most design processes. At Safran's corporate research center, we enhance these processes by developing fast and reliable surrogate models for various physics. We focus here on two technologies developed in recent years. The first is a physical reduced-order modeling method for non-linear structural mechanics and thermal analysis, used for calculating the lifespan of high-pressure turbine blades and performing heat analysis of high-pressure compressors. The second technology involves learning physics simulations with non-parameterized geometrical variability using classical machine learning tools, such as Gaussian process regression. Finally, we present our contributions to the open-source and open-data community.
Isoperimetry and the properness of weak inverse mean curvature flow
We prove a new existence theorem for proper solutions of Huisken and Ilmanen's weak inverse mean curvature flow, assuming certain non-degeneracy conditions on the isoperimetric profile. In particular, no curvature assumption is imposed in our existence theorem.
All you need is a good init
Layer-sequential unit-variance (LSUV) initialization - a simple method for weight initialization for deep net learning - is proposed. The method consists of the two steps. First, pre-initialize weights of each convolution or inner-product layer with orthonormal matrices. Second, proceed from the first to the final layer, normalizing the variance of the output of each layer to be equal to one. Experiment with different activation functions (maxout, ReLU-family, tanh) show that the proposed initialization leads to learning of very deep nets that (i) produces networks with test accuracy better or equal to standard methods and (ii) is at least as fast as the complex schemes proposed specifically for very deep nets such as FitNets (Romero et al. (2015)) and Highway (Srivastava et al. (2015)). Performance is evaluated on GoogLeNet, CaffeNet, FitNets and Residual nets and the state-of-the-art, or very close to it, is achieved on the MNIST, CIFAR-10/100 and ImageNet datasets.
Early warning signals: The charted and uncharted territories
The realization that complex systems such as ecological communities can collapse or shift regimes suddenly and without rapid external forcing poses a serious challenge to our understanding and management of the natural world. The potential to identify early warning signals that would allow researchers and managers to predict such events before they happen has therefore been an invaluable discovery that offers a way forward in spite of such seemingly unpredictable behavior. Research into early warning signals has demonstrated that it is possible to define and detect such early warning signals in advance of a transition in certain contexts. Here we describe the pattern emerging as research continues to explore just how far we can generalize these results. A core of examples emerges that shares three properties: the phenomenon of rapid regime shifts, a pattern of 'critical slowing down' that can be used to detect the approaching shift, and a mechanism of bifurcation driving the sudden change. As research has expanded beyond these core examples, it is becoming clear that not all systems that show regime shifts exhibit critical slowing down, or vice versa. Even when systems exhibit critical slowing down, statistical detection is a challenge. We review the literature that explores these edge cases and highlight the need for (a) new early warning behaviors that can be used in cases where rapid shifts do not exhibit critical slowing down, (b) the development of methods to identify which behavior might be an appropriate signal when encountering a novel system; bearing in mind that a positive indication for some systems is a negative indication in others, and (c) statistical methods that can distinguish between signatures of early warning behaviors and noise.
IDInit: A Universal and Stable Initialization Method for Neural Network Training
Deep neural networks have achieved remarkable accomplishments in practice. The success of these networks hinges on effective initialization methods, which are vital for ensuring stable and rapid convergence during training. Recently, initialization methods that maintain identity transition within layers have shown good efficiency in network training. These techniques (e.g., Fixup) set specific weights to zero to achieve identity control. However, settings of remaining weight (e.g., Fixup uses random values to initialize non-zero weights) will affect the inductive bias that is achieved only by a zero weight, which may be harmful to training. Addressing this concern, we introduce fully identical initialization (IDInit), a novel method that preserves identity in both the main and sub-stem layers of residual networks. IDInit employs a padded identity-like matrix to overcome rank constraints in non-square weight matrices. Furthermore, we show the convergence problem of an identity matrix can be solved by stochastic gradient descent. Additionally, we enhance the universality of IDInit by processing higher-order weights and addressing dead neuron problems. IDInit is a straightforward yet effective initialization method, with improved convergence, stability, and performance across various settings, including large-scale datasets and deep models.
Pair State Transfer
Let L denote the Laplacian matrix of a graph G. We study continuous quantum walks on G defined by the transition matrix U(t)=expleft(itLright). The initial state is of the pair state form, e_a-e_b with a,b being any two vertices of G. We provide two ways to construct infinite families of graphs that have perfect pair transfer. We study a "transitivity" phenomenon which cannot occur in vertex state transfer. We characterize perfect pair state transfer on paths and cycles. We also study the case when quantum walks are generated by the unsigned Laplacians of underlying graphs and the initial states are of the plus state form, e_a+e_b. When the underlying graphs are bipartite, plus state transfer is equivalent to pair state transfer.
Surface Patches with Rounded Corners
We analyze surface patches with a corner that is rounded in the sense that the partial derivatives at that point are antiparallel. Sufficient conditions for G^1 smoothness are given, which, up to a certain degenerate case, are also necessary. Further, we investigate curvature integrability and present examples
Early Warning Signals and the Prosecutor's Fallacy
Early warning signals have been proposed to forecast the possibility of a critical transition, such as the eutrophication of a lake, the collapse of a coral reef, or the end of a glacial period. Because such transitions often unfold on temporal and spatial scales that can be difficult to approach by experimental manipulation, research has often relied on historical observations as a source of natural experiments. Here we examine a critical difference between selecting systems for study based on the fact that we have observed a critical transition and those systems for which we wish to forecast the approach of a transition. This difference arises by conditionally selecting systems known to experience a transition of some sort and failing to account for the bias this introduces -- a statistical error often known as the Prosecutor's Fallacy. By analysing simulated systems that have experienced transitions purely by chance, we reveal an elevated rate of false positives in common warning signal statistics. We further demonstrate a model-based approach that is less subject to this bias than these more commonly used summary statistics. We note that experimental studies with replicates avoid this pitfall entirely.
IMF slope derived from a pure probabilistic model
The stellar initial mass function is of great significance for the study of star formation and galactic structure. Observations indicate that the IMF follows a power-law form. This work derived that when the expected number of stars formed from a spherical molecular cloud is much greater than 1, there is a relationship between the slope alpha of the IMF and r^n in the radius-density relation of spherically symmetric gas clouds, given by alpha = 3/(n+3) (Gamma_{IMF} = n/(n+3)). This conclusion is close to the results of numerical simulations and observations, but it is derived from a pure probabilistic model, which may have underlying reasons worth pondering.
Phase-space analysis of the viscous fluid cosmological models in the coincident f(Q) gravity
In this article, we consider a newly proposed parameterization of the viscosity coefficient zeta, specifically zeta=zeta_0 {Omega^s_m} H , where zeta_0 = zeta_0{{Omega^s_{m_0}}} within the coincident f(Q) gravity formalism. We consider a non-linear function f(Q)= -Q +alpha Q^n, where alpha and n are arbitrary model parameters, which is a power-law correction to the STEGR scenario. We find an autonomous system by invoking the dimensionless density parameters as the governing phase-space variables. We discuss the physical significance of the model corresponding to the parameter choices n=-1 and n=2 along with the exponent choices s=0, 0.5, and 1.05. We find that model I shows the stable de-Sitter type or stable phantom type (depending on the choice of exponent s) behavior with no transition epoch, whereas model II shows the evolutionary phase from the radiation epoch to the accelerated de-Sitter epoch via passing through the matter-dominated epoch. Hence, we conclude that model I provides a good description of the late-time cosmology but fails to describe the transition epoch, whereas model II modifies the description in the context of the early universe and provides a good description of the matter and radiation era along with the transition phase.
Dynamic processes in superconductors and the laws of thermodynamics
The transition from the superconducting to the normal state in a magnetic field was considered as a irreversible thermodynamic process before 1933 because of Joule heating. But all physicists became to consider this transition as reversible after 1933 because of the obvious contradiction of the Meissner effect with the second law of thermodynamics if this transition is considered as a irreversible process. This radical change of the opinion contradicted logic since the dissipation of the kinetic energy of the surface screening current into Joule heat in the normal state cannot depend on how this current appeared in the superconducting state. The inconsistency of the conventional theory of superconductivity, created in the framework of the equilibrium thermodynamics, with Joule heating, on which Jorge Hirsch draws reader's attention, is a consequence of this history. In order to avoid contradiction with the second law of thermodynamics, physicists postulated in the thirties of the last century that the surface screening current is damped without the generation of Joule heat. This postulate contradicts not only logic and the conventional theory of superconductivity but also experimental results.
Introduction to Holographic Superconductors
These lectures give an introduction to the theory of holographic superconductors. These are superconductors that have a dual gravitational description using gauge/gravity duality. After introducing a suitable gravitational theory, we discuss its properties in various regimes: the probe limit, the effects of backreaction, the zero temperature limit, and the addition of magnetic fields. Using the gauge/gravity dictionary, these properties reproduce many of the standard features of superconductors. Some familiarity with gauge/gravity duality is assumed. A list of open problems is included at the end.
CEERS Epoch 1 NIRCam Imaging: Reduction Methods and Simulations Enabling Early JWST Science Results
We present the data release and data reduction process for the Epoch 1 NIRCam observations for the Cosmic Evolution Early Release Science Survey (CEERS). These data consist of NIRCam imaging in six broadband filters (F115W, F150W, F200W, F277W, F356W and F444W) and one medium band filter (F410M) over four pointings, obtained in parallel with primary CEERS MIRI observations (Yang et al. in prep). We reduced the NIRCam imaging with the JWST Calibration Pipeline, with custom modifications and reduction steps designed to address additional features and challenges with the data. Here we provide a detailed description of each step in our reduction and a discussion of future expected improvements. Our reduction process includes corrections for known pre-launch issues such as 1/f noise, as well as in-flight issues including snowballs, wisps, and astrometric alignment. Many of our custom reduction processes were first developed with pre-launch simulated NIRCam imaging over the full 10 CEERS NIRCam pointings. We present a description of the creation and reduction of this simulated dataset in the Appendix. We provide mosaics of the real images in a public release, as well as our reduction scripts with detailed explanations to allow users to reproduce our final data products. These represent one of the first official public datasets released from the Directors Discretionary Early Release Science (DD-ERS) program.
The continuous extension of the logarithmic double layer potential to the Ahlfors-regular boundary
For the real part of the Cauchy-type integral that is known to be the logarithmic potential of the double layer, a necessary and sufficient condition for the continuous extension to the Ahlfors-regular boundary is established.
Ill-posedness of the Kelvin-Helmholtz problem for compressible Euler fluids
In this paper, when the magnitude of the Mach number is strictly between some fixed small enough constant and 2, we can prove the linear and nonlinear ill-posedness of the Kelvin-Helmholtz problem for compressible ideal fluids. To our best knowledge, this is the first reslult that proves the nonlinear ill-posedness to the Kelvin-Helmholtz problem for the compressible Euler fluids.
Concentrating solutions of the fractional (p,q)-Choquard equation with exponential growth
This article deals with the following fractional (p,q)-Choquard equation with exponential growth of the form: $varepsilon^{ps}(-Delta)_{p}^{s}u+varepsilon^{qs}(-Delta)_q^su+ Z(x)(|u|^{p-2}u+|u|^{q-2}u)=varepsilon^{mu-N}[|x|^{-mu}*F(u)]f(u) in R^N, where s\in (0,1), \varepsilon>0 is a parameter, 2\leq p=N{s}<q, and 0<\mu<N. The nonlinear function f has an exponential growth at infinity and the continuous potential function Z satisfies suitable natural conditions. With the help of the Ljusternik-Schnirelmann category theory and variational methods, the multiplicity and concentration of positive solutions are obtained for \varepsilon>0$ small enough. In a certain sense, we generalize some previously known results.
Quantifying Limits to Detection of Early Warning for Critical Transitions
Catastrophic regime shifts in complex natural systems may be averted through advanced detection. Recent work has provided a proof-of-principle that many systems approaching a catastrophic transition may be identified through the lens of early warning indicators such as rising variance or increased return times. Despite widespread appreciation of the difficulties and uncertainty involved in such forecasts, proposed methods hardly ever characterize their expected error rates. Without the benefits of replicates, controls, or hindsight, applications of these approaches must quantify how reliable different indicators are in avoiding false alarms, and how sensitive they are to missing subtle warning signs. We propose a model based approach in order to quantify this trade-off between reliability and sensitivity and allow comparisons between different indicators. We show these error rates can be quite severe for common indicators even under favorable assumptions, and also illustrate how a model-based indicator can improve this performance. We demonstrate how the performance of an early warning indicator varies in different data sets, and suggest that uncertainty quantification become a more central part of early warning predictions.
Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models?
Pretraining a neural network on a large dataset is becoming a cornerstone in machine learning that is within the reach of only a few communities with large-resources. We aim at an ambitious goal of democratizing pretraining. Towards that goal, we train and release a single neural network that can predict high quality ImageNet parameters of other neural networks. By using predicted parameters for initialization we are able to boost training of diverse ImageNet models available in PyTorch. When transferred to other datasets, models initialized with predicted parameters also converge faster and reach competitive final performance.
An efficient Asymptotic-Preserving scheme for the Boltzmann mixture with disparate mass
In this paper, we develop and implement an efficient asymptotic-preserving (AP) scheme to solve the gas mixture of Boltzmann equations under the disparate mass scaling relevant to the so-called "epochal relaxation" phenomenon. The disparity in molecular masses, ranging across several orders of magnitude, leads to significant challenges in both the evaluation of collision operators and the designing of time-stepping schemes to capture the multi-scale nature of the dynamics. A direct implementation of the spectral method faces prohibitive computational costs as the mass ratio increases due to the need to resolve vastly different thermal velocities. Unlike [I. M. Gamba, S. Jin, and L. Liu, Commun. Math. Sci., 17 (2019), pp. 1257-1289], we propose an alternative approach based on proper truncation of asymptotic expansions of the collision operators, which significantly reduces the computational complexity and works well for small varepsilon. By incorporating the separation of three time scales in the model's relaxation process [P. Degond and B. Lucquin-Desreux, Math. Models Methods Appl. Sci., 6 (1996), pp. 405-436], we design an AP scheme that captures the specific dynamics of the disparate mass model while maintaining computational efficiency. Numerical experiments demonstrate the effectiveness of the proposed scheme in handling large mass ratios of heavy and light species, as well as capturing the epochal relaxation phenomenon.
Stability Analysis for a Class of Heterogeneous Catalysis Models
We prove stability for a class of heterogeneous catalysis models in the L_p-setting. We consider a setting in a finite three-dimensional pore of cylinder-like geometry, with the lateral walls acting as a catalytic surface. Under a reasonable condition on the involved parameters, we show that given equilibria are normally stable, i.e. solutions are attracted at an exponential rate. The potential incidence of instability is discussed as well.
An operator preconditioning perspective on training in physics-informed machine learning
In this paper, we investigate the behavior of gradient descent algorithms in physics-informed machine learning methods like PINNs, which minimize residuals connected to partial differential equations (PDEs). Our key result is that the difficulty in training these models is closely related to the conditioning of a specific differential operator. This operator, in turn, is associated to the Hermitian square of the differential operator of the underlying PDE. If this operator is ill-conditioned, it results in slow or infeasible training. Therefore, preconditioning this operator is crucial. We employ both rigorous mathematical analysis and empirical evaluations to investigate various strategies, explaining how they better condition this critical operator, and consequently improve training.
Exploiting Pretrained Biochemical Language Models for Targeted Drug Design
Motivation: The development of novel compounds targeting proteins of interest is one of the most important tasks in the pharmaceutical industry. Deep generative models have been applied to targeted molecular design and have shown promising results. Recently, target-specific molecule generation has been viewed as a translation between the protein language and the chemical language. However, such a model is limited by the availability of interacting protein-ligand pairs. On the other hand, large amounts of unlabeled protein sequences and chemical compounds are available and have been used to train language models that learn useful representations. In this study, we propose exploiting pretrained biochemical language models to initialize (i.e. warm start) targeted molecule generation models. We investigate two warm start strategies: (i) a one-stage strategy where the initialized model is trained on targeted molecule generation (ii) a two-stage strategy containing a pre-finetuning on molecular generation followed by target specific training. We also compare two decoding strategies to generate compounds: beam search and sampling. Results: The results show that the warm-started models perform better than a baseline model trained from scratch. The two proposed warm-start strategies achieve similar results to each other with respect to widely used metrics from benchmarks. However, docking evaluation of the generated compounds for a number of novel proteins suggests that the one-stage strategy generalizes better than the two-stage strategy. Additionally, we observe that beam search outperforms sampling in both docking evaluation and benchmark metrics for assessing compound quality. Availability and implementation: The source code is available at https://github.com/boun-tabi/biochemical-lms-for-drug-design and the materials are archived in Zenodo at https://doi.org/10.5281/zenodo.6832145
A study of a deterministic model for meningitis epidemic
A compartmental deterministic model that allows (1) immunity from two stages of infection and carriage, and (2) disease induced death, is used in studying the dynamics of meningitis epidemic process in a closed population. It allows for difference in the transmission rate of infection to a susceptible by a carrier and an infective. It is generalized to allow a proportion ({\phi}) of those susceptibles infected to progress directly to infectives in stage I. Both models are used in this study. The threshold conditions for the spread of carrier and infectives in stage I are derived for the two models. Sensitivity analysis is performed on the reproductive number derived from the next generation matrix. The case-carrier ratio profile for various parameters and threshold values are shown. So also are the graphs of the total number ever infected as influenced by {\epsilon} and {\phi}. The infection transmission rate (eta), the odds in favor of a carrier, over an infective, in transmitting an infection to a susceptible ({\epsilon}) and the carrier conversion rate ({\phi}) to an infective in stage I, are identified as key parameters that should be subject of attention for any control intervention strategy. The case-carrier ratio profiles provide evidence of a critical case-carrier ratio attained before the number of reported cases grows to an epidemic level. They also provide visual evidence of epidemiological context, in this case, epidemic incidence (in later part of dry season) and endemic incidence (during rainy season). Results from total proportion ever infected suggest that the model, in which {\phi}=0 obtained, can adequately represent, in essence, the generalized model for this study.
Equilibrium of Charges and Differential Equations Solved by Polynomials II
We continue study of equilibrium of two species of 2d coulomb charges (or point vortices in 2d ideal fluid) started in Lv. Although for two species of vortices with circulation ratio -1 the relationship between the equilibria and the factorization/Darboux transformation of the Schrodinger operator was established a long ago, the question about similar relationship for the ratio -2 remained unanswered. Here we present the answer.
The History of Primordial Black Holes
We overview the history of primordial black hole (PBH) research from the first papers around 50 years ago to the present epoch. The history may be divided into four periods, the dividing lines being marked by three key developments: inflation on the theoretical front and the detection of microlensing events by the MACHO project and gravitational waves by the LIGO/Virgo/KAGRA project on the observation front. However, they are also characterised by somewhat different focuses of research. The period 1967-1980 covered the groundbreaking work on PBH formation and evaporation. The period 1980-1996 mainly focussed on their formation, while the period 1996-2016 consolidated the work on formation but also collated the constraints on the PBH abundance. In the period 2016-2024 there was a shift of emphasis to the search for evidence for PBHs and - while opinions about the strength of the purported evidence vary - this has motivated more careful studies of some aspects of the subject. Certainly the soaring number of papers on PBHs in this last period indicates a growing interest in the topic.
High-dimensional dynamics of generalization error in neural networks
We perform an average case analysis of the generalization dynamics of large neural networks trained using gradient descent. We study the practically-relevant "high-dimensional" regime where the number of free parameters in the network is on the order of or even larger than the number of examples in the dataset. Using random matrix theory and exact solutions in linear models, we derive the generalization error and training error dynamics of learning and analyze how they depend on the dimensionality of data and signal to noise ratio of the learning problem. We find that the dynamics of gradient descent learning naturally protect against overtraining and overfitting in large networks. Overtraining is worst at intermediate network sizes, when the effective number of free parameters equals the number of samples, and thus can be reduced by making a network smaller or larger. Additionally, in the high-dimensional regime, low generalization error requires starting with small initial weights. We then turn to non-linear neural networks, and show that making networks very large does not harm their generalization performance. On the contrary, it can in fact reduce overtraining, even without early stopping or regularization of any sort. We identify two novel phenomena underlying this behavior in overcomplete models: first, there is a frozen subspace of the weights in which no learning occurs under gradient descent; and second, the statistical properties of the high-dimensional regime yield better-conditioned input correlations which protect against overtraining. We demonstrate that naive application of worst-case theories such as Rademacher complexity are inaccurate in predicting the generalization performance of deep neural networks, and derive an alternative bound which incorporates the frozen subspace and conditioning effects and qualitatively matches the behavior observed in simulation.
Morphological Regimes of Rotating Moist Convection
Moist convection is a physical process where the latent heat released by condensation acts as a buoyancy source that can enhance or even trigger an overturning convective instability. Since the saturation temperature often decreases with height, condensation releases latent heat preferentially in regions of upflow. Due to this inhomogeneous heat source, moist convection may be more sensitive to changes in flow morphology, such as those induced by rotation, than dry Rayleigh-B\'enard convection. In order to study the effects of rotation on flows driven by latent heat release, we present a suite of numerical simulations that solve the Rainy-B\'enard equations (Vallis et al. 2019). We identify three morphological regimes: a cellular regime and a plume regime broadly analogous to those found in rotating Rayleigh B\'enard convection, and a novel funnel regime that lacks a clear analog within the regimes exhibited by dry convection. We measure energy fluxes through the system and report rotational scalings of the Reynolds and moist Nusselt numbers. We find that moist static energy transport, as measured by a moist Nusselt number, is significantly enhanced in the funnel regime without a corresponding enhancement in Reynolds number, indicating that this funnel regime produces structures with more favorable correlations between the temperature and vertical velocity.
Symmetries and Asymptotically Flat Space
The construction of a theory of quantum gravity is an outstanding problem that can benefit from better understanding the laws of nature that are expected to hold in regimes currently inaccessible to experiment. Such fundamental laws can be found by considering the classical counterparts of a quantum theory. For example, conservation laws in a quantum theory often stem from conservation laws of the corresponding classical theory. In order to construct such laws, this thesis is concerned with the interplay between symmetries and conservation laws of classical field theories and their application to asymptotically flat spacetimes. This work begins with an explanation of symmetries in field theories with a focus on variational symmetries and their associated conservation laws. Boundary conditions for general relativity are then formulated on three-dimensional asymptotically flat spacetimes at null infinity using the method of conformal completion. Conserved quantities related to asymptotic symmetry transformations are derived and their properties are studied. This is done in a manifestly coordinate independent manner. In a separate step a coordinate system is introduced, such that the results can be compared to existing literature. Next, asymptotically flat spacetimes which contain both future as well as past null infinity are considered. Asymptotic symmetries occurring at these disjoint regions of three-dimensional asymptotically flat spacetimes are linked and the corresponding conserved quantities are matched. Finally, it is shown how asymptotic symmetries lead to the notion of distinct Minkowski spaces that can be differentiated by conserved quantities.
Critical scaling law for the deposition efficiency of inertia-driven particle collisions with a cylinder in high Reynolds number air flow
The Earth's atmosphere is an aerosol, it contains suspended particles. When air flows over an obstacle such as an aircraft wing or tree branch, these particles may not follow the same paths as the air flowing around the obstacle. Instead the particles in the air may deviate from the path of the air and so collide with the surface of the obstacle. It is known that particle inertia can drive this deposition, and that there is a critical value of this inertia, below which no point particles deposit. Particle inertia is measured by the Stokes number, St. We show that near the critical value of the Stokes number, St_c, the amount of deposition has the unusual scaling law of exp(-1/(St-St_c)^{1/2}). The scaling is controlled by the stagnation point of the flow. This scaling is determined by the time for the particle to reach the surface of the cylinder varying as 1/(St-St_c)^{1/2}, together with the distance away from the stagnation point (perpendicular to the flow direction) increasing exponentially with time. The scaling law applies to inviscid flow, a model for flow at high Reynolds numbers. The unusual scaling means that the amount of particles deposited increases only very slowly above the critical Stokes number. This has consequences for applications ranging from rime formation and fog harvesting to pollination.
Advantages and Bottlenecks of Quantum Machine Learning for Remote Sensing
This concept paper aims to provide a brief outline of quantum computers, explore existing methods of quantum image classification techniques, so focusing on remote sensing applications, and discuss the bottlenecks of performing these algorithms on currently available open source platforms. Initial results demonstrate feasibility. Next steps include expanding the size of the quantum hidden layer and increasing the variety of output image options.
Faster logconcave sampling from a cold start in high dimension
We present a faster algorithm to generate a warm start for sampling an arbitrary logconcave density specified by an evaluation oracle, leading to the first sub-cubic sampling algorithms for inputs in (near-)isotropic position. A long line of prior work incurred a warm-start penalty of at least linear in the dimension, hitting a cubic barrier, even for the special case of uniform sampling from convex bodies. Our improvement relies on two key ingredients of independent interest. (1) We show how to sample given a warm start in weaker notions of distance, in particular q-R\'enyi divergence for q=mathcal{O}(1), whereas previous analyses required stringent infty-R\'enyi divergence (with the exception of Hit-and-Run, whose known mixing time is higher). This marks the first improvement in the required warmness since Lov\'asz and Simonovits (1991). (2) We refine and generalize the log-Sobolev inequality of Lee and Vempala (2018), originally established for isotropic logconcave distributions in terms of the diameter of the support, to logconcave distributions in terms of a geometric average of the support diameter and the largest eigenvalue of the covariance matrix.
Transformers learn through gradual rank increase
We identify incremental learning dynamics in transformers, where the difference between trained and initial weights progressively increases in rank. We rigorously prove this occurs under the simplifying assumptions of diagonal weight matrices and small initialization. Our experiments support the theory and also show that phenomenon can occur in practice without the simplifying assumptions.
The information-theoretic foundation of thermodynamic work extraction
In this paper I apply newly-proposed information-theoretic principles to thermodynamic work extraction. I show that if it is possible to extract work deterministically from a physical system prepared in any one of a set of states, then those states must be distinguishable from one another. This result is formulated independently of scale and of particular dynamical laws; it also provides a novel connection between thermodynamics and information theory, established via the law of conservation of energy (rather than the second law of thermodynamics). Albeit compatible with these conclusions, existing thermodynamics approaches cannot provide a result of such generality, because they are scale-dependent (relying on ensembles or coarse-graining) or tied to particular dynamical laws. This paper thus provides a broader foundation for thermodynamics, with implications for the theory of von Neumann's universal constructor
ZerO Initialization: Initializing Neural Networks with only Zeros and Ones
Deep neural networks are usually initialized with random weights, with adequately selected initial variance to ensure stable signal propagation during training. However, selecting the appropriate variance becomes challenging especially as the number of layers grows. In this work, we replace random weight initialization with a fully deterministic initialization scheme, viz., ZerO, which initializes the weights of networks with only zeros and ones (up to a normalization factor), based on identity and Hadamard transforms. Through both theoretical and empirical studies, we demonstrate that ZerO is able to train networks without damaging their expressivity. Applying ZerO on ResNet achieves state-of-the-art performance on various datasets, including ImageNet, which suggests random weights may be unnecessary for network initialization. In addition, ZerO has many benefits, such as training ultra deep networks (without batch-normalization), exhibiting low-rank learning trajectories that result in low-rank and sparse solutions, and improving training reproducibility.
Computational Life: How Well-formed, Self-replicating Programs Emerge from Simple Interaction
The fields of Origin of Life and Artificial Life both question what life is and how it emerges from a distinct set of "pre-life" dynamics. One common feature of most substrates where life emerges is a marked shift in dynamics when self-replication appears. While there are some hypotheses regarding how self-replicators arose in nature, we know very little about the general dynamics, computational principles, and necessary conditions for self-replicators to emerge. This is especially true on "computational substrates" where interactions involve logical, mathematical, or programming rules. In this paper we take a step towards understanding how self-replicators arise by studying several computational substrates based on various simple programming languages and machine instruction sets. We show that when random, non self-replicating programs are placed in an environment lacking any explicit fitness landscape, self-replicators tend to arise. We demonstrate how this occurs due to random interactions and self-modification, and can happen with and without background random mutations. We also show how increasingly complex dynamics continue to emerge following the rise of self-replicators. Finally, we show a counterexample of a minimalistic programming language where self-replicators are possible, but so far have not been observed to arise.
Exploring the limits of nucleonic metamodelling using different relativistic density functionals
In this work, we explore two classes of density dependent relativistic mean-field models, their predictions of proton fractions at high densities and neutron star structure. We have used a metamodelling approach to these relativistic density functionals. We have generated a large ensemble of models with these classes and then applied constraints from theoretical and experimental nuclear physics and astrophysical observations. We find that both models produce similar equations of state and neutron star mass-radius sequences. But, their underlying compositions, denoted by the proton fraction in this case, are vastly different. This reinstates previous findings that information on composition gets masqueraded in beta-equilibrium. Additional observations of non-equilibrium phenomena are necessary to pin it down.
Reinforcement Learning for Adaptive Time-Stepping in the Chaotic Gravitational Three-Body Problem
Many problems in astrophysics cover multiple orders of magnitude in spatial and temporal scales. While simulating systems that experience rapid changes in these conditions, it is essential to adapt the (time-) step size to capture the behavior of the system during those rapid changes and use a less accurate time step at other, less demanding, moments. We encounter three problems with traditional methods. Firstly, making such changes requires expert knowledge of the astrophysics as well as of the details of the numerical implementation. Secondly, some parameters that determine the time-step size are fixed throughout the simulation, which means that they do not adapt to the rapidly changing conditions of the problem. Lastly, we would like the choice of time-step size to balance accuracy and computation effort. We address these challenges with Reinforcement Learning by training it to select the time-step size dynamically. We use the integration of a system of three equal-mass bodies that move due to their mutual gravity as an example of its application. With our method, the selected integration parameter adapts to the specific requirements of the problem, both in terms of computation time and accuracy while eliminating the expert knowledge needed to set up these simulations. Our method produces results competitive to existing methods and improve the results found with the most commonly-used values of time-step parameter. This method can be applied to other integrators without further retraining. We show that this extrapolation works for variable time-step integrators but does not perform to the desired accuracy for fixed time-step integrators.
Decomposed Prompt Tuning via Low-Rank Reparameterization
While prompt tuning approaches have achieved competitive performance with high efficiency, we observe that they invariably employ the same initialization process, wherein the soft prompt is either randomly initialized or derived from an existing embedding vocabulary. In contrast to these conventional methods, this study aims to investigate an alternative way to derive soft prompt. Our empirical studies show that the soft prompt typically exhibits a low intrinsic rank characteristic. With such observations, we propose decomposed prompt tuning, a novel approach that utilizes low-rank matrices to initialize the soft prompt. Through the low-rank reparameterization, our method significantly reduces the number of trainable parameters while maintaining effectiveness. Experimental results on the SuperGLUE benchmark in both high-resource and low-resource scenarios demonstrate the effectiveness of the proposed method.
Building an AdS/CFT superconductor
We show that a simple gravitational theory can provide a holographically dual description of a superconductor. There is a critical temperature, below which a charged condensate forms via a second order phase transition and the (DC) conductivity becomes infinite. The frequency dependent conductivity develops a gap determined by the condensate. We find evidence that the condensate consists of pairs of quasiparticles.
A Learnable Prior Improves Inverse Tumor Growth Modeling
Biophysical modeling, particularly involving partial differential equations (PDEs), offers significant potential for tailoring disease treatment protocols to individual patients. However, the inverse problem-solving aspect of these models presents a substantial challenge, either due to the high computational requirements of model-based approaches or the limited robustness of deep learning (DL) methods. We propose a novel framework that leverages the unique strengths of both approaches in a synergistic manner. Our method incorporates a DL ensemble for initial parameter estimation, facilitating efficient downstream evolutionary sampling initialized with this DL-based prior. We showcase the effectiveness of integrating a rapid deep-learning algorithm with a high-precision evolution strategy in estimating brain tumor cell concentrations from magnetic resonance images. The DL-Prior plays a pivotal role, significantly constraining the effective sampling-parameter space. This reduction results in a fivefold convergence acceleration and a Dice-score of 95%
Physics in Next-token Prediction
We discovered the underlying physics in Next-token Prediction (NTP). We identified the law of information conservation within NTP and proposed the First Law of Information Capacity (IC-1), demonstrating that the essence of intelligence emergence in auto-regressive models is fundamentally a process of information transfer. We also introduced Landauer's Principle into NTP, formulating the Second Law of Information Capacity (IC-2), which establishes the relationship between auto-regressive model training and energy consumption. Additionally, we presented several corollaries, which hold practical significance for production practices. Finally, we validated the compatibility and complementarity of our findings with existing theories.
Datasets of Fire and Crime Incidents in Pampanga, Philippines
The fire and crime incident datasets were requested and collected from two Philippine regional agencies (i.e., the Bureau of Fire Protection and the Philippine National Police). The datasets were used to initially analyze and map both fire and crime incidents within the province of Pampanga for a specific time frame. Several data preparation, normalization, and data cleaning steps were implemented to properly map and identify patterns within the datasets. The initial results also indicate the leading causes of fire and crimes are rubbish and acts against property. Fires mostly occur during the dry season in the province. Crime is particularly high during December, and most of the fire and crime incidents occur during the time when people are most active. The dataset was able to present the temporal characteristics of the fire and crime incidents that occurred in the province of Pampanga. Merge the existing dataset with the other datasets from other related agencies to get a bigger picture and produce more objective results that could be used for decision-making.
Quantifying the Rise and Fall of Complexity in Closed Systems: The Coffee Automaton
In contrast to entropy, which increases monotonically, the "complexity" or "interestingness" of closed systems seems intuitively to increase at first and then decrease as equilibrium is approached. For example, our universe lacked complex structures at the Big Bang and will also lack them after black holes evaporate and particles are dispersed. This paper makes an initial attempt to quantify this pattern. As a model system, we use a simple, two-dimensional cellular automaton that simulates the mixing of two liquids ("coffee" and "cream"). A plausible complexity measure is then the Kolmogorov complexity of a coarse-grained approximation of the automaton's state, which we dub the "apparent complexity." We study this complexity measure, and show analytically that it never becomes large when the liquid particles are non-interacting. By contrast, when the particles do interact, we give numerical evidence that the complexity reaches a maximum comparable to the "coffee cup's" horizontal dimension. We raise the problem of proving this behavior analytically.
Fluctuation Domains in Adaptive Evolution
We derive an expression for the variation between parallel trajectories in phenotypic evolution, extending the well known result that predicts the mean evolutionary path in adaptive dynamics or quantitative genetics. We show how this expression gives rise to the notion of fluctuation domains - parts of the fitness landscape where the rate of evolution is very predictable (due to fluctuation dissipation) and parts where it is highly variable (due to fluctuation enhancement). These fluctuation domains are determined by the curvature of the fitness landscape. Regions of the fitness landscape with positive curvature, such as adaptive valleys or branching points, experience enhancement. Regions with negative curvature, such as adaptive peaks, experience dissipation. We explore these dynamics in the ecological scenarios of implicit and explicit competition for a limiting resource.
Explainable artificial intelligence model to predict acute critical illness from electronic health records
We developed an explainable artificial intelligence (AI) early warning score (xAI-EWS) system for early detection of acute critical illness. While maintaining a high predictive performance, our system explains to the clinician on which relevant electronic health records (EHRs) data the prediction is grounded. Acute critical illness is often preceded by deterioration of routinely measured clinical parameters, e.g., blood pressure and heart rate. Early clinical prediction is typically based on manually calculated screening metrics that simply weigh these parameters, such as Early Warning Scores (EWS). The predictive performance of EWSs yields a tradeoff between sensitivity and specificity that can lead to negative outcomes for the patient. Previous work on EHR-trained AI systems offers promising results with high levels of predictive performance in relation to the early, real-time prediction of acute critical illness. However, without insight into the complex decisions by such system, clinical translation is hindered. In this letter, we present our xAI-EWS system, which potentiates clinical translation by accompanying a prediction with information on the EHR data explaining it.
Using Deep Learning to Design High Aspect Ratio Fusion Devices
The design of fusion devices is typically based on computationally expensive simulations. This can be alleviated using high aspect ratio models that employ a reduced number of free parameters, especially in the case of stellarator optimization where non-axisymmetric magnetic fields with a large parameter space are optimized to satisfy certain performance criteria. However, optimization is still required to find configurations with properties such as low elongation, high rotational transform, finite plasma beta, and good fast particle confinement. In this work, we train a machine learning model to construct configurations with favorable confinement properties by finding a solution to the inverse design problem, that is, obtaining a set of model input parameters for given desired properties. Since the solution of the inverse problem is non-unique, a probabilistic approach, based on mixture density networks, is used. It is shown that optimized configurations can be generated reliably using this method.
Unveiling two deeply embedded young protostars in the S68N Class 0 protostellar core with JWST/NIRSpec
The near-infrared (NIR) emission of the youngest protostars still needs to be characterized to better understand the evolution of their accretion and ejection activity. We analyze James Webb Space Telescope NIRSpec 1.7 -- 5.3 mum observations of two deeply embedded sources in the S68N protostellar core in Serpens. The North Central (NC) source exhibits a highly obscured spectrum (A_K ~ 4.8 mag) that is modeled with a pre-main-sequence photosphere and a hot disk component. The photospheric parameters are consistent with a young, low-mass photosphere, as suggested by the low surface gravity, log g of 1.95 pm 0.15 cm s^{-2}. The hot disk suggests that accretion onto the central protostellar embryo is ongoing, although prototypical accretion-tracing emission lines HI are not detected. The South Central (SC) source, which is even more embedded (A_K ~ 8 mag; no continuum is detected shortward of 3.6 mum) appears to be driving the large-scale S68N protostellar outflow, and launches a collimated hot molecular jet detected in \Ht and CO ro-vibrational lines. Shock modeling of the \Ht (ro)vibrational lines establishes that fast C-type shocks (geq 30 km s^{-1}), with high pre-shock density (geq 10^7 cm^{-3}), and strong magnetic field (b ~ 3--10, where B = b,times,textrm{n_{H} (cm^{-3})},muG) best match the data. The bright CO fundamental line forest suggests energetic excitation, with the contribution of non-LTE effects, ie irradiation pumping. Detected OH and CH^{+} ro-vibrational lines support this hypothesis. These two Class 0 protostars seem to be in very young evolutionary stages and still have to acquire the bulk of their final stellar masses. These results demonstrate that JWST enables unprecedented diagnostics of these first stages of the protostellar evolutionary phase.
Linking Past and Future Null Infinity in Three Dimensions
We provide a mapping between past null and future null infinity in three-dimensional flat space, using symmetry considerations. From this we derive a mapping between the corresponding asymptotic symmetry groups. By studying the metric at asymptotic regions, we find that the mapping is energy preserving and yields an infinite number of conservation laws.
Rich Feature Construction for the Optimization-Generalization Dilemma
There often is a dilemma between ease of optimization and robust out-of-distribution (OoD) generalization. For instance, many OoD methods rely on penalty terms whose optimization is challenging. They are either too strong to optimize reliably or too weak to achieve their goals. We propose to initialize the networks with a rich representation containing a palette of potentially useful features, ready to be used by even simple models. On the one hand, a rich representation provides a good initialization for the optimizer. On the other hand, it also provides an inductive bias that helps OoD generalization. Such a representation is constructed with the Rich Feature Construction (RFC) algorithm, also called the Bonsai algorithm, which consists of a succession of training episodes. During discovery episodes, we craft a multi-objective optimization criterion and its associated datasets in a manner that prevents the network from using the features constructed in the previous iterations. During synthesis episodes, we use knowledge distillation to force the network to simultaneously represent all the previously discovered features. Initializing the networks with Bonsai representations consistently helps six OoD methods achieve top performance on ColoredMNIST benchmark. The same technique substantially outperforms comparable results on the Wilds Camelyon17 task, eliminates the high result variance that plagues other methods, and makes hyperparameter tuning and model selection more reliable.
Noisy dynamical systems evolve error correcting codes and modularity
Noise is a ubiquitous feature of the physical world. As a result, the first prerequisite of life is fault tolerance: maintaining integrity of state despite external bombardment. Recent experimental advances have revealed that biological systems achieve fault tolerance by implementing mathematically intricate error-correcting codes and by organizing in a modular fashion that physically separates functionally distinct subsystems. These elaborate structures represent a vanishing volume in the massive genetic configuration space. How is it possible that the primitive process of evolution, by which all biological systems evolved, achieved such unusual results? In this work, through experiments in Boolean networks, we show that the simultaneous presence of error correction and modularity in biological systems is no coincidence. Rather, it is a typical co-occurrence in noisy dynamic systems undergoing evolution. From this, we deduce the principle of error correction enhanced evolvability: systems possessing error-correcting codes are more effectively improved by evolution than those without.
Jovian Vortex Hunter: a citizen science project to study Jupiter's vortices
The Jovian atmosphere contains a wide diversity of vortices, which have a large range of sizes, colors and forms in different dynamical regimes. The formation processes for these vortices is poorly understood, and aside from a few known, long-lived ovals, such as the Great Red Spot, and Oval BA, vortex stability and their temporal evolution are currently largely unknown. In this study, we use JunoCam data and a citizen-science project on Zooniverse to derive a catalog of vortices, some with repeated observations, through May 2018 to Sep 2021, and analyze their associated properties, such as size, location and color. We find that different colored vortices (binned as white, red, brown and dark), follow vastly different distributions in terms of their sizes and where they are found on the planet. We employ a simplified stability criterion using these vortices as a proxy, to derive a minimum Rossby deformation length for the planet of sim1800 km. We find that this value of L_d is largely constant throughout the atmosphere, and does not have an appreciable meridional gradient.
Dynamics of the Beta Pictoris planetary system and possibility of an additional planet
The Beta Pictoris system is characterized by a dusty debris disk, in addition to the presence of two already known planets. This makes it a particularly interesting case for studying the formation and evolution of planetary systems at a stage where giant planets have already formed, most of the protoplanetary gas has dissipated, and terrestrial planets could emerge. Our goal here is to explore the possibility of additional planets orbiting beyond the outermost known one, beta Pic b. More specifically, we aim to assess whether additional planets in the system could explain the discrepancy between the predicted cutoff of the disk inner cavity at sim28 au with only two planets, and the observed one at sim50 au. We perform an exhaustive dynamical modeling of the debris disk and the carving of its inner edge, by introducing one or two additional planets beyond beta Pic b, coplanar with the disk. Guided by theoretical predictions for the parameter space - mass, semi-major axis, eccentricity - allowed for additional planets, we further carry out a set of N-body simulations, using the symplectic integrator RMVS3. Our simulations indicate that an additional planet with a low eccentricity of 0.05, a mass between 0.15 and 1 M_{Jup}, and a semi-major axis between 30 and 36 au, would be consistent with the observations of an inner debris disk edge at 50 au. We have also explored the hypotheses of a higher eccentricity and the presence of two additional lower mass planets instead of one, which could also account for these observations. While we have found that one or even two additional planets could explain the observed location of the disk inner edge, these hypothetical planets remain in most cases below the current observational limits of high contrast imaging. Future observational campaigns with improved sensitivity will help lowering these limits and perhaps detect that planet.
Collective Dynamics from Stochastic Thermodynamics
From a viewpoint of stochastic thermodynamics, we derive equations that describe the collective dynamics near the order-disorder transition in the globally coupled XY model and near the synchronization-desynchronization transition in the Kuramoto model. A new way of thinking is to interpret the deterministic time evolution of a macroscopic variable as an external operation to a thermodynamic system. We then find that the irreversible work determines the equation for the collective dynamics. When analyzing the Kuramoto model, we employ a generalized concept of irreversible work which originates from a non-equilibrium identity associated with steady state thermodynamics.
Train longer, generalize better: closing the generalization gap in large batch training of neural networks
Background: Deep learning models are typically trained using stochastic gradient descent or one of its variants. These methods update the weights using their gradient, estimated from a small fraction of the training data. It has been observed that when using large batch sizes there is a persistent degradation in generalization performance - known as the "generalization gap" phenomena. Identifying the origin of this gap and closing it had remained an open problem. Contributions: We examine the initial high learning rate training phase. We find that the weight distance from its initialization grows logarithmically with the number of weight updates. We therefore propose a "random walk on random landscape" statistical model which is known to exhibit similar "ultra-slow" diffusion behavior. Following this hypothesis we conducted experiments to show empirically that the "generalization gap" stems from the relatively small number of updates rather than the batch size, and can be completely eliminated by adapting the training regime used. We further investigate different techniques to train models in the large-batch regime and present a novel algorithm named "Ghost Batch Normalization" which enables significant decrease in the generalization gap without increasing the number of updates. To validate our findings we conduct several additional experiments on MNIST, CIFAR-10, CIFAR-100 and ImageNet. Finally, we reassess common practices and beliefs concerning training of deep models and suggest they may not be optimal to achieve good generalization.
Initializing Models with Larger Ones
Weight initialization plays an important role in neural network training. Widely used initialization methods are proposed and evaluated for networks that are trained from scratch. However, the growing number of pretrained models now offers new opportunities for tackling this classical problem of weight initialization. In this work, we introduce weight selection, a method for initializing smaller models by selecting a subset of weights from a pretrained larger model. This enables the transfer of knowledge from pretrained weights to smaller models. Our experiments demonstrate that weight selection can significantly enhance the performance of small models and reduce their training time. Notably, it can also be used together with knowledge distillation. Weight selection offers a new approach to leverage the power of pretrained models in resource-constrained settings, and we hope it can be a useful tool for training small models in the large-model era. Code is available at https://github.com/OscarXZQ/weight-selection.
Frechet Differentiability in Besov Spaces in the Optimal Control of Parabolic Free Boundary Problems
We consider the inverse Stefan type free boundary problem, where information on the boundary heat flux and density of the sources are missing and must be found along with the temperature and the free boundary. We pursue optimal control framework where boundary heat flux, density of sources, and free boundary are components of the control vector. The optimality criteria consists of the minimization of the L_2-norm declinations of the temperature measurements at the final moment, phase transition temperature, and final position of the free boundary. We prove the Frechet differentiability in Besov spaces, and derive the formula for the Frechet differential under minimal regularity assumptions on the data. The result implies a necessary condition for optimal control and opens the way to the application of projective gradient methods in Besov spaces for the numerical solution of the inverse Stefan problem.
Completely Discretized, Finite Quantum Mechanics
I propose a version of quantum mechanics featuring a discrete and finite number of states that is plausibly a model of the real world. The model is based on standard unitary quantum theory of a closed system with a finite-dimensional Hilbert space. Given certain simple conditions on the spectrum of the Hamiltonian, Schr\"odinger evolution is periodic, and it is straightforward to replace continuous time with a discrete version, with the result that the system only visits a discrete and finite set of state vectors. The biggest challenges to the viability of such a model come from cosmological considerations. The theory may have implications for questions of mathematical realism and finitism.
Black holes and the loss landscape in machine learning
Understanding the loss landscape is an important problem in machine learning. One key feature of the loss function, common to many neural network architectures, is the presence of exponentially many low lying local minima. Physical systems with similar energy landscapes may provide useful insights. In this work, we point out that black holes naturally give rise to such landscapes, owing to the existence of black hole entropy. For definiteness, we consider 1/8 BPS black holes in N = 8 string theory. These provide an infinite family of potential landscapes arising in the microscopic descriptions of corresponding black holes. The counting of minima amounts to black hole microstate counting. Moreover, the exact numbers of the minima for these landscapes are a priori known from dualities in string theory. Some of the minima are connected by paths of low loss values, resembling mode connectivity. We estimate the number of runs needed to find all the solutions. Initial explorations suggest that Stochastic Gradient Descent can find a significant fraction of the minima.
Toward Formal Data Set Verification for Building Effective Machine Learning Models
In order to properly train a machine learning model, data must be properly collected. To guarantee a proper data collection, verifying that the collected data set holds certain properties is a possible solution. For example, guaranteeing that the data set contains samples across the whole input space, or that the data set is balanced w.r.t. different classes. We present a formal approach for verifying a set of arbitrarily stated properties over a data set. The proposed approach relies on the transformation of the data set into a first order logic formula, which can be later verified w.r.t. the different properties also stated in the same logic. A prototype tool, which uses the z3 solver, has been developed; the prototype can take as an input a set of properties stated in a formal language and formally verify a given data set w.r.t. to the given set of properties. Preliminary experimental results show the feasibility and performance of the proposed approach, and furthermore the flexibility for expressing properties of interest.
Idempotent Generative Network
We propose a new approach for generative modeling based on training a neural network to be idempotent. An idempotent operator is one that can be applied sequentially without changing the result beyond the initial application, namely f(f(z))=f(z). The proposed model f is trained to map a source distribution (e.g, Gaussian noise) to a target distribution (e.g. realistic images) using the following objectives: (1) Instances from the target distribution should map to themselves, namely f(x)=x. We define the target manifold as the set of all instances that f maps to themselves. (2) Instances that form the source distribution should map onto the defined target manifold. This is achieved by optimizing the idempotence term, f(f(z))=f(z) which encourages the range of f(z) to be on the target manifold. Under ideal assumptions such a process provably converges to the target distribution. This strategy results in a model capable of generating an output in one step, maintaining a consistent latent space, while also allowing sequential applications for refinement. Additionally, we find that by processing inputs from both target and source distributions, the model adeptly projects corrupted or modified data back to the target manifold. This work is a first step towards a ``global projector'' that enables projecting any input into a target data distribution.
Understanding Emergent Abilities of Language Models from the Loss Perspective
Recent studies have put into question the belief that emergent abilities in language models are exclusive to large models. This skepticism arises from two observations: 1) smaller models can also exhibit high performance on emergent abilities and 2) there is doubt on the discontinuous metrics used to measure these abilities. In this paper, we propose to study emergent abilities in the lens of pre-training loss, instead of model size or training compute. We demonstrate that the models with the same pre-training loss, but different model and data sizes, generate the same performance on various downstream tasks. We also discover that a model exhibits emergent abilities on certain tasks -- regardless of the continuity of metrics -- when its pre-training loss falls below a specific threshold. Before reaching this threshold, its performance remains at the level of random guessing. This inspires us to redefine emergent abilities as those that manifest in models with lower pre-training losses, highlighting that these abilities cannot be predicted by merely extrapolating the performance trends of models with higher pre-training losses.
Optimal Control of Coefficients in Parabolic Free Boundary Problems Modeling Laser Ablation
Inverse Stefan problem arising in modeling of laser ablation of biomedical tissues is analyzed, where information on the coefficients, heat flux on the fixed boundary, and density of heat sources are missing and must be found along with the temperature and free boundary. Optimal control framework is employed, where the missing data and the free boundary are components of the control vector, and optimality criteria are based on the final moment measurement of the temperature and position of the free boundary. Discretization by finite differences is pursued, and convergence of the discrete optimal control problems to the original problem is proven.
Generalized chiral instabilities, linking numbers, and non-invertible symmetries
We demonstrate a universal mechanism of a class of instabilities in infrared regions for massless Abelian p-form gauge theories with topological interactions, which we call generalized chiral instabilities. Such instabilities occur in the presence of initial electric fields for the p-form gauge fields. We show that the dynamically generated magnetic fields tend to decrease the initial electric fields and result in configurations with linking numbers, which can be characterized by non-invertible global symmetries. The so-called chiral plasma instability and instabilities of the axion electrodynamics and (4+1)-dimensional Maxwell-Chern-Simons theory in electric fields can be described by the generalized chiral instabilities in a unified manner. We also illustrate this mechanism in the (2+1)-dimensional Goldstone-Maxwell model in electric field.
SQuADDS: A validated design database and simulation workflow for superconducting qubit design
We present an open-source database of superconducting quantum device designs that may be used as the starting point for customized devices. Each design can be generated programmatically using the open-source Qiskit Metal package, and simulated using finite-element electromagnetic solvers. We present a robust workflow for achieving high accuracy on design simulations. Many designs in the database are experimentally validated, showing excellent agreement between simulated and measured parameters. Our database includes a front-end interface that allows users to generate ``best-guess'' designs based on desired circuit parameters. This project lowers the barrier to entry for research groups seeking to make a new class of devices by providing them a well-characterized starting point from which to refine their designs.
Exact Solution of the Frustrated Potts Model with Next-Nearest-Neighbor Interactions in One Dimension: An AI-Aided Discovery
The one-dimensional J_1-J_2 q-state Potts model is solved exactly for arbitrary q, based on using OpenAI's latest reasoning model o3-mini-high to exactly solve the q=3 case. The exact results provide insights to outstanding physical problems such as the stacking of atomic or electronic orders in layered materials and the formation of a T_c-dome-shaped phase often seen in unconventional superconductors. The work is anticipated to fuel both the research in one-dimensional frustrated magnets for recently discovered finite-temperature application potentials and the fast moving topic area of AI for sciences.
Emergent Abilities of Large Language Models
Scaling up language models has been shown to predictably improve performance and sample efficiency on a wide range of downstream tasks. This paper instead discusses an unpredictable phenomenon that we refer to as emergent abilities of large language models. We consider an ability to be emergent if it is not present in smaller models but is present in larger models. Thus, emergent abilities cannot be predicted simply by extrapolating the performance of smaller models. The existence of such emergence implies that additional scaling could further expand the range of capabilities of language models.
Interpretable structural model error discovery from sparse assimilation increments using spectral bias-reduced neural networks: A quasi-geostrophic turbulence test case
Earth system models suffer from various structural and parametric errors in their representation of nonlinear, multi-scale processes, leading to uncertainties in their long-term projections. The effects of many of these errors (particularly those due to fast physics) can be quantified in short-term simulations, e.g., as differences between the predicted and observed states (analysis increments). With the increase in the availability of high-quality observations and simulations, learning nudging from these increments to correct model errors has become an active research area. However, most studies focus on using neural networks, which while powerful, are hard to interpret, are data-hungry, and poorly generalize out-of-distribution. Here, we show the capabilities of Model Error Discovery with Interpretability and Data Assimilation (MEDIDA), a general, data-efficient framework that uses sparsity-promoting equation-discovery techniques to learn model errors from analysis increments. Using two-layer quasi-geostrophic turbulence as the test case, MEDIDA is shown to successfully discover various linear and nonlinear structural/parametric errors when full observations are available. Discovery from spatially sparse observations is found to require highly accurate interpolation schemes. While NNs have shown success as interpolators in recent studies, here, they are found inadequate due to their inability to accurately represent small scales, a phenomenon known as spectral bias. We show that a general remedy, adding a random Fourier feature layer to the NN, resolves this issue enabling MEDIDA to successfully discover model errors from sparse observations. These promising results suggest that with further development, MEDIDA could be scaled up to models of the Earth system and real observations.
Parabolic-elliptic and indirect-direct simplifications in chemotaxis systems driven by indirect signalling
Under relevant biological situations of the signalling process on a much faster time scale compared to the species diffusion and all interactions, we study two singular limits corresponding to varepsilonto 0^+ with a fixed tau>0, and (varepsilon,tau)to (0^+,0^+) arising in the following indirect signalling chemotaxis system with no-flux accross the boundary align* \left\{array{lllllll} \partial_t n=\Delta n-\nabla\cdot(n\nabla c)&in \Omega\times(0,\infty),\\ \varepsilon\partial_t c=\Delta c-c+w&in \Omega\times(0,\infty),\\ \varepsilon\partial_t w=\tau\Delta w-w+n&in \Omega\times(0,\infty),\\ (n,c,w)_{t=0}=(n_0,c_0,w_0)&on \Omega, array\right. align* up to the critical dimension N=4, called parabolic-elliptic and indirect-direct simplifications, respectively. We provide rigorous analysis for these simplifications, including passage to the limits, convergence rate estimates with the initial layer effect, and the convergence to critical manifolds.
A New Approach for Constraining Large-Scale Temperature Fluctuations in the Intergalactic Medium
The reionization of helium is thought to occur at 2.5lesssim zlesssim4, marking the last phase transition and final global heating event of the intergalactic medium (IGM). Since it is driven by rare quasars, helium reionization should give rise to strong temperature fluctuations in the IGM between neutral and recently-ionized regions of order sigma (ln T) sim Delta T/T = 20-50%. We introduce a novel method to search for reionization-induced temperature fluctuations in the IGM by using the effective optical depths of the Lyman-alpha forest towards a large number of background quasars. Higher IGM temperatures give rise to lower effective optical depths in the Lyman-alpha forest, implying that temperature fluctuations will broaden the observed optical depth distribution. We measured the distributions of effective Lyman-alpha forest optical depths across 71 X-Shooter spectra from the XQ-100 survey in four redshift bins from z=3.76 to z=4.19 and compared them to a large-volume cosmological hydrodynamical simulation. A good agreement is found between the observations and the simulation, which does not include temperature fluctuations; therefore, we do not detect a signature of helium reionization. We then post-process the simulations to include an increasing amount of temperature fluctuations until the model becomes inconsistent with the observations. We obtain tight constraints on sigma (ln T) < 0.29 (<0.40) at 2 sigma (3 sigma) at z=3.76 when averaging over scales of 100 comoving Mpc, and weaker constraints for higher redshifts and smaller scales. Our constraints are the tightest to date, and imply that either the IGM temperature contrast caused by helium reionization is less than sim30%, or that the process has not yet significantly started at z=3.76.
Dynamical evolution of massless particles in star clusters with NBODY6++GPU-MASSLESS: I. Free-floating MLPs
Context. Low-mass bodies, such as comets, asteroids, planetesimals, and free-floating planets, are continuously injected into the intra-cluster environment after expulsion from their host planetary systems. These can be modeled as massless particles (MLPs, hereafter). The dynamics of large populations of MLPs, however, has yet received little attention in literature. Aims. We investigate the dynamical evolution of MLP populations in star clusters, and characterize their kinematics and ejection rates. Methods. We present NBODY6++GPU-MASSLESS, a modified version of the N-body simulation code NBODY6++GPU, that allows fast integration of star clusters that contain large numbers of massless particles (MLPs). NBODY6++GPU-MASSLESS contains routines specifically directed at the dynamical evolution of low-mass bodies, such as planets. Results. Unlike stars, MLPs do not participate in the mass segregation process. Instead, MLPs mostly follow the gravitational potential of the star cluster, which gradually decreases over time due to stellar ejections and stellar evolution. The dynamical evolution of MLPs is primarily affected by the evolution of the core of the star cluster. This is most apparent in the outer regions for clusters with higher initial densities. High escape rates of MLPs are observed before the core-collapse, after which escape rates remain stable. Denser star clusters undergo a more intense core collapse, but this does not impact the dynamical evolution of MLPs. The speeds of escaping stars are similar to those of escaping MLPs, when disregarding the high-velocity ejections of neutron stars during the first 50 Myr.
An Old-Fashioned Framework for Machine Learning in Turbulence Modeling
The objective is to provide clear and well-motivated guidance to Machine Learning (ML) teams, founded on our experience in empirical turbulence modeling. Guidance is also needed for modeling outside ML. ML is not yet successful in turbulence modeling, and many papers have produced unusable proposals either due to errors in math or physics, or to severe overfitting. We believe that "Turbulence Culture" (TC) takes years to learn and is difficult to convey especially considering the modern lack of time for careful study; important facts which are self-evident after a career in turbulence research and modeling and extensive reading are easy to miss. In addition, many of them are not absolute facts, a consequence of the gaps in our understanding of turbulence and the weak connection of models to first principles. Some of the mathematical facts are rigorous, but the physical aspects often are not. Turbulence models are surprisingly arbitrary. Disagreement between experts confuses the new entrants. In addition, several key properties of the models are ascertained through non-trivial analytical properties of the differential equations, which puts them out of reach of purely data-driven ML-type approaches. The best example is the crucial behavior of the model at the edge of the turbulent region (ETR). The knowledge we wish to put out here may be divided into "Mission" and "Requirements," each combining physics and mathematics. Clear lists of "Hard" and "Soft" constraints are presented. A concrete example of how DNS data could be used, possibly allied with ML, is first carried through and illustrates the large number of decisions needed. Our focus is on creating effective products which will empower CFD, rather than on publications.
Learning to Relax: Setting Solver Parameters Across a Sequence of Linear System Instances
Solving a linear system Ax=b is a fundamental scientific computing primitive for which numerous solvers and preconditioners have been developed. These come with parameters whose optimal values depend on the system being solved and are often impossible or too expensive to identify; thus in practice sub-optimal heuristics are used. We consider the common setting in which many related linear systems need to be solved, e.g. during a single numerical simulation. In this scenario, can we sequentially choose parameters that attain a near-optimal overall number of iterations, without extra matrix computations? We answer in the affirmative for Successive Over-Relaxation (SOR), a standard solver whose parameter omega has a strong impact on its runtime. For this method, we prove that a bandit online learning algorithm--using only the number of iterations as feedback--can select parameters for a sequence of instances such that the overall cost approaches that of the best fixed omega as the sequence length increases. Furthermore, when given additional structural information, we show that a contextual bandit method asymptotically achieves the performance of the instance-optimal policy, which selects the best omega for each instance. Our work provides the first learning-theoretic treatment of high-precision linear system solvers and the first end-to-end guarantees for data-driven scientific computing, demonstrating theoretically the potential to speed up numerical methods using well-understood learning algorithms.
Gas dynamics around a Jupiter mass planet: II. Chemical evolution of circumplanetary material
In an ongoing effort to understand planet formation the link between the chemistry of the protoplanetary disk and the properties of resulting planets have long been a subject of interest. These connections have generally been made between mature planets and young protoplanetary disks through the carbon-to-oxygen (C/O) ratio. In a rare number of systems, young protoplanets have been found within their natal protoplanetary disks. These systems offer a unique opportunity to directly study the delivery of gas from the protoplanetary disk to the planet. In this work we post-process 3D numerical simulations of an embedded Jupiter-massed planet in its protoplanetary disk to explore the chemical evolution of gas as it flows from the disk to the planet. The relevant dust to this chemical evolution is assumed to be small, co-moving grains with a reduced dust-to-gas ratio indicative of the upper atmosphere of a protoplanetary disk. We find that as the gas enters deep into the planet's gravitational well, it warms significantly (up to sim 800 K), releasing all of the volatile content from the ice phase. This change in phase can influence our understanding of the delivery of volatile species to the atmospheres of giant planets. The primary carbon, oxygen, and sulfur carrying ices: CO_2, H_2O, and H_2S are released into the gas phase and along with the warm gas temperatures near the embedded planets lead to the production of unique species like CS, SO, and SO_2 compared to the protoplanetary disk. We compute the column densities of SO, SO_2, CS, and H_2CS in our model and find that their values are consistent with previous observational studies.
Constructor Theory of Information
We present a theory of information expressed solely in terms of which transformations of physical systems are possible and which are impossible - i.e. in constructor-theoretic terms. Although it includes conjectured laws of physics that are directly about information, independently of the details of particular physical instantiations, it does not regard information as an a priori mathematical or logical concept, but as something whose nature and properties are determined by the laws of physics alone. It does not suffer from the circularity at the foundations of existing information theory (namely that information and distinguishability are each defined in terms of the other). It explains the relationship between classical and quantum information, and reveals the single, constructor-theoretic property underlying the most distinctive phenomena associated with the latter, including the lack of in-principle distinguishability of some states, the impossibility of cloning, the existence of pairs of variables that cannot simultaneously have sharp values, the fact that measurement processes can be both deterministic and unpredictable, the irreducible perturbation caused by measurement, and entanglement (locally inaccessible information).
Meta Learning of Interface Conditions for Multi-Domain Physics-Informed Neural Networks
Physics-informed neural networks (PINNs) are emerging as popular mesh-free solvers for partial differential equations (PDEs). Recent extensions decompose the domain, applying different PINNs to solve the equation in each subdomain and aligning the solution at the interface of the subdomains. Hence, they can further alleviate the problem complexity, reduce the computational cost, and allow parallelization. However, the performance of the multi-domain PINNs is sensitive to the choice of the interface conditions for solution alignment. While quite a few conditions have been proposed, there is no suggestion about how to select the conditions according to specific problems. To address this gap, we propose META Learning of Interface Conditions (METALIC), a simple, efficient yet powerful approach to dynamically determine the optimal interface conditions for solving a family of parametric PDEs. Specifically, we develop two contextual multi-arm bandit models. The first one applies to the entire training procedure, and online updates a Gaussian process (GP) reward surrogate that given the PDE parameters and interface conditions predicts the solution error. The second one partitions the training into two stages, one is the stochastic phase and the other deterministic phase; we update a GP surrogate for each phase to enable different condition selections at the two stages so as to further bolster the flexibility and performance. We have shown the advantage of METALIC on four bench-mark PDE families.
Download by Parachute: Retrieval of Assets from High Altitude Balloons
We present a publicly-available toolkit of flight-proven hardware and software to retrieve 5 TB of data or small physical samples from a stratospheric balloon platform. Before launch, a capsule is attached to the balloon, and rises with it. Upon remote command, the capsule is released and descends via parachute, continuously transmitting its location. Software to predict the trajectory can be used to select a safe but accessible landing site. We dropped two such capsules from the SuperBIT telescope, in September 2019. The capsules took ~37 minutes to descend from ~30 km altitude. They drifted 32 km and 19 km horizontally, but landed within 300 m and 600 m of their predicted landing sites. We found them easily, and successfully recovered the data. We welcome interest from other balloon teams for whom the technology would be useful.
PROSE-FD: A Multimodal PDE Foundation Model for Learning Multiple Operators for Forecasting Fluid Dynamics
We propose PROSE-FD, a zero-shot multimodal PDE foundational model for simultaneous prediction of heterogeneous two-dimensional physical systems related to distinct fluid dynamics settings. These systems include shallow water equations and the Navier-Stokes equations with incompressible and compressible flow, regular and complex geometries, and different buoyancy settings. This work presents a new transformer-based multi-operator learning approach that fuses symbolic information to perform operator-based data prediction, i.e. non-autoregressive. By incorporating multiple modalities in the inputs, the PDE foundation model builds in a pathway for including mathematical descriptions of the physical behavior. We pre-train our foundation model on 6 parametric families of equations collected from 13 datasets, including over 60K trajectories. Our model outperforms popular operator learning, computer vision, and multi-physics models, in benchmark forward prediction tasks. We test our architecture choices with ablation studies.
Response: Emergent analogical reasoning in large language models
In their recent Nature Human Behaviour paper, "Emergent analogical reasoning in large language models," (Webb, Holyoak, and Lu, 2023) the authors argue that "large language models such as GPT-3 have acquired an emergent ability to find zero-shot solutions to a broad range of analogy problems." In this response, we provide counterexamples of the letter string analogies. In our tests, GPT-3 fails to solve even the easiest variants of the problems presented in the original paper. Zero-shot reasoning is an extraordinary claim that requires extraordinary evidence. We do not see that evidence in our experiments. To strengthen claims of humanlike reasoning such as zero-shot reasoning, it is important that the field develop approaches that rule out data memorization.
Why is AI hard and Physics simple?
We discuss why AI is hard and why physics is simple. We discuss how physical intuition and the approach of theoretical physics can be brought to bear on the field of artificial intelligence and specifically machine learning. We suggest that the underlying project of machine learning and the underlying project of physics are strongly coupled through the principle of sparsity, and we call upon theoretical physicists to work on AI as physicists. As a first step in that direction, we discuss an upcoming book on the principles of deep learning theory that attempts to realize this approach.
Zero Temperature Limit of Holographic Superconductors
We consider holographic superconductors whose bulk description consists of gravity minimally coupled to a Maxwell field and charged scalar field with general potential. We give an analytic argument that there is no "hard gap": the real part of the conductivity at low frequency remains nonzero (although typically exponentially small) even at zero temperature. We also numerically construct the gravitational dual of the ground state of some holographic superconductors. Depending on the charge and dimension of the condensate, the infrared theory can have emergent conformal or just Poincare symmetry. In all cases studied, the area of the horizon of the dual black hole goes to zero in the extremal limit, consistent with a nondegenerate ground state.
Ensemble Kalman Diffusion Guidance: A Derivative-free Method for Inverse Problems
When solving inverse problems, it is increasingly popular to use pre-trained diffusion models as plug-and-play priors. This framework can accommodate different forward models without re-training while preserving the generative capability of diffusion models. Despite their success in many imaging inverse problems, most existing methods rely on privileged information such as derivative, pseudo-inverse, or full knowledge about the forward model. This reliance poses a substantial limitation that restricts their use in a wide range of problems where such information is unavailable, such as in many scientific applications. To address this issue, we propose Ensemble Kalman Diffusion Guidance (EnKG) for diffusion models, a derivative-free approach that can solve inverse problems by only accessing forward model evaluations and a pre-trained diffusion model prior. We study the empirical effectiveness of our method across various inverse problems, including scientific settings such as inferring fluid flows and astronomical objects, which are highly non-linear inverse problems that often only permit black-box access to the forward model.
Prompt Design and Engineering: Introduction and Advanced Methods
Prompt design and engineering has become an important discipline in just the past few months. In this paper, we provide an introduction to the main concepts and design approaches. We also provide more advanced techniques all the way to those needed to design LLM-based agents. We finish by providing a list of existing tools for prompt engineering.
Small-scale proxies for large-scale Transformer training instabilities
Teams that have trained large Transformer-based models have reported training instabilities at large scale that did not appear when training with the same hyperparameters at smaller scales. Although the causes of such instabilities are of scientific interest, the amount of resources required to reproduce them has made investigation difficult. In this work, we seek ways to reproduce and study training stability and instability at smaller scales. First, we focus on two sources of training instability described in previous work: the growth of logits in attention layers (Dehghani et al., 2023) and divergence of the output logits from the log probabilities (Chowdhery et al., 2022). By measuring the relationship between learning rate and loss across scales, we show that these instabilities also appear in small models when training at high learning rates, and that mitigations previously employed at large scales are equally effective in this regime. This prompts us to investigate the extent to which other known optimizer and model interventions influence the sensitivity of the final loss to changes in the learning rate. To this end, we study methods such as warm-up, weight decay, and the muParam (Yang et al., 2022), and combine techniques to train small models that achieve similar losses across orders of magnitude of learning rate variation. Finally, to conclude our exploration we study two cases where instabilities can be predicted before they emerge by examining the scaling behavior of model activation and gradient norms.
Strategy Proof Mechanisms for Facility Location in Euclidean and Manhattan Space
We study the impact on mechanisms for facility location of moving from one dimension to two (or more) dimensions and Euclidean or Manhattan distances. We consider three fundamental axiomatic properties: anonymity which is a basic fairness property, Pareto optimality which is one of the most important efficiency properties, and strategy proofness which ensures agents do not have an incentive to mis-report. We also consider how well such mechanisms can approximate the optimal welfare. Our results are somewhat negative. Moving from one dimension to two (or more) dimensions often makes these axiomatic properties more difficult to achieve. For example, with two facilities in Euclidean space or with just a single facility in Manhattan space, no mechanism is anonymous, Pareto optimal and strategy proof. By contrast, mechanisms on the line exist with all three properties.We also show that approximation ratios may increase when moving to two (or more) dimensions. All our impossibility results are minimal. If we drop one of the three axioms (anonymity, Pareto optimality or strategy proofness) multiple mechanisms satisfy the other two axioms.
Stochastic acceleration in arbitrary astrophysical environments
Turbulent magnetic fields are to some extent a universal feature in astrophysical phenomena. Charged particles that encounter these turbulence get on average accelerated according to the so-called second-order Fermi process. However, in most astrophysical environments there are additional competing processes, such as different kinds of first-order energy changes and particle escape, that effect the resulting momentum distribution of the particles. In this work we provide to our knowledge the first semi-analytical solution of the isotropic steady-state momentum diffusion equation including continuous and catastrophic momentum changes that can be applied to any arbitrary astrophysical system of interest. Here, we adopt that the assigned magnetic turbulence is constrained on a finite range and the particle flux vanishes beyond these boundaries. Consequently, we show that the so-called pile-up bump -- that has for some special cases long been established -- is a universal feature of stochastic acceleration that emerges around the momentum chi_{rm eq} where acceleration and continuous loss are in equilibrium if the particle's residence time in the system is sufficient at chi_{rm eq}. In general, the impact of continuous and catastrophic momentum changes plays a crucial role in the shape of the steady-state momentum distribution of the accelerated particles, where simplified unbroken power-law approximations are often not adequate.
Thermodynamics and bulk viscosity of approximate black hole duals to finite temperature quantum chromodynamics
We consider classes of translationally invariant black hole solutions whose equations of state closely resemble that of QCD at zero chemical potential. We use these backgrounds to compute the ratio zeta/s of bulk viscosity to entropy density. For a class of black holes that exhibits a first order transition, we observe a sharp rise in zeta/s near T_c. For constructions that exhibit a smooth cross-over, like QCD does, the rise in zeta/s is more modest. We conjecture that divergences in zeta/s for black hole horizons are related to extrema of the entropy density as a function of temperature.
Learning How to Ask: Querying LMs with Mixtures of Soft Prompts
Natural-language prompts have recently been used to coax pretrained language models into performing other AI tasks, using a fill-in-the-blank paradigm (Petroni et al., 2019) or a few-shot extrapolation paradigm (Brown et al., 2020). For example, language models retain factual knowledge from their training corpora that can be extracted by asking them to "fill in the blank" in a sentential prompt. However, where does this prompt come from? We explore the idea of learning prompts by gradient descent -- either fine-tuning prompts taken from previous work, or starting from random initialization. Our prompts consist of "soft words," i.e., continuous vectors that are not necessarily word type embeddings from the language model. Furthermore, for each task, we optimize a mixture of prompts, learning which prompts are most effective and how to ensemble them. Across multiple English LMs and tasks, our approach hugely outperforms previous methods, showing that the implicit factual knowledge in language models was previously underestimated. Moreover, this knowledge is cheap to elicit: random initialization is nearly as good as informed initialization.
Online Estimation of SAT Solving Runtime
We present an online method for estimating the cost of solving SAT problems. Modern SAT solvers present several challenges to estimate search cost including non-chronological backtracking, learning and restarts. Our method uses a linear model trained on data gathered at the start of search. We show the effectiveness of this method using random and structured problems. We demonstrate that predictions made in early restarts can be used to improve later predictions. We also show that we can use such cost estimations to select a solver from a portfolio.
Coherent Structures Governing Transport at Turbulent Interfaces
In an experiment on a turbulent jet, we detect interfacial turbulent layers in a frame that moves, on average, along with the \tnti. This significantly prolongs the observation time of scalar and velocity structures and enables the measurement of two types of Lagrangian coherent structures. One structure, the finite-time Lyapunov field (FTLE), quantifies advective transport barriers of fluid parcels while the other structure highlights barriers of diffusive momentum transport. These two complementary structures depend on large-scale and small-scale motion and are therefore associated with the growth of the turbulent region through engulfment or nibbling, respectively. We detect the \tnti\ from cluster analysis, where we divide the measured scalar field into four clusters. Not only the \tnti\ can be found this way, but also the next, internal, turbulent-turbulent interface. Conditional averages show that these interfaces are correlated with barriers of advective and diffusive transport when the Lagrangian integration time is smaller than the integral time scale. Diffusive structures decorrelate faster since they have a smaller timescale. Conditional averages of these structures at internal turbulent-turbulent interfaces show the same pattern with a more pronounced jump at the interface indicative of a shear layer. This is quite an unexpected outcome, as the internal interface is now defined not by the presence or absence of vorticity, but by conditional vorticity corresponding to two uniform concentration zones. The long-time diffusive momentum flux along Lagrangian paths represents the growth of the turbulent flow into the irrotational domain, a direct demonstration of nibbling. The diffusive flux parallel to the \tnti\ appears to be concentrated in a diffusive superlayer whose width is comparable with the Taylor microscale, which is relatively invariant in time.
On Preemption and Learning in Stochastic Scheduling
We study single-machine scheduling of jobs, each belonging to a job type that determines its duration distribution. We start by analyzing the scenario where the type characteristics are known and then move to two learning scenarios where the types are unknown: non-preemptive problems, where each started job must be completed before moving to another job; and preemptive problems, where job execution can be paused in the favor of moving to a different job. In both cases, we design algorithms that achieve sublinear excess cost, compared to the performance with known types, and prove lower bounds for the non-preemptive case. Notably, we demonstrate, both theoretically and through simulations, how preemptive algorithms can greatly outperform non-preemptive ones when the durations of different job types are far from one another, a phenomenon that does not occur when the type durations are known.
Chemical Heredity as Group Selection at the Molecular Level
Many examples of cooperation exist in biology. In chemical systems however, which can sometimes be quite complex, we do not appear to observe intricate cooperative interactions. A key question for the origin of life, is then how can molecular cooperation first arise in an abiotic system prior to the emergence of biological replication. We postulate that selection at the molecular level is a driving force behind the complexification of chemical systems, particularly during the origins of life. In the theory of multilevel selection the two selective forces are: within-group and between-group, where the former tends to favor "selfish" replication of individuals and the latter favor cooperation between individuals enhancing the replication of the group as a whole. These forces can be quantified using the Price equation, which is a standard tool used in evolutionary biology to quantify evolutionary change. Our central claim is that replication and heredity in chemical systems are subject to selection, and quantifiable using the multilevel Price equation. We demonstrate this using the Graded Autocatalysis Replication Domain computer model, describing simple protocell composed out of molecules and its replication, which respectively analogue to the group and the individuals. In contrast to previous treatments of this model, we treat the lipid molecules themselves as replicating individuals and the protocells they form as groups of individuals. Our goal is to demonstrate how evolutionary biology tools and concepts can be applied in chemistry and we suggest that molecular cooperation may arise as a result of group selection. Further, the biological relation of parent-progeny is proposed to be analogue to the reactant-product relation in chemistry, thus allowing for tools from evolutionary biology to be applied to chemistry and would deepen the connection between chemistry and biology.
Dissecting the Effects of SGD Noise in Distinct Regimes of Deep Learning
Understanding when the noise in stochastic gradient descent (SGD) affects generalization of deep neural networks remains a challenge, complicated by the fact that networks can operate in distinct training regimes. Here we study how the magnitude of this noise T affects performance as the size of the training set P and the scale of initialization alpha are varied. For gradient descent, alpha is a key parameter that controls if the network is `lazy'(alphagg1) or instead learns features (alphall1). For classification of MNIST and CIFAR10 images, our central results are: (i) obtaining phase diagrams for performance in the (alpha,T) plane. They show that SGD noise can be detrimental or instead useful depending on the training regime. Moreover, although increasing T or decreasing alpha both allow the net to escape the lazy regime, these changes can have opposite effects on performance. (ii) Most importantly, we find that the characteristic temperature T_c where the noise of SGD starts affecting the trained model (and eventually performance) is a power law of P. We relate this finding with the observation that key dynamical quantities, such as the total variation of weights during training, depend on both T and P as power laws. These results indicate that a key effect of SGD noise occurs late in training by affecting the stopping process whereby all data are fitted. Indeed, we argue that due to SGD noise, nets must develop a stronger `signal', i.e. larger informative weights, to fit the data, leading to a longer training time. A stronger signal and a longer training time are also required when the size of the training set P increases. We confirm these views in the perceptron model, where signal and noise can be precisely measured. Interestingly, exponents characterizing the effect of SGD depend on the density of data near the decision boundary, as we explain.
Thermodynamics of black holes featuring primary scalar hair
In this work, we embark on the thermodynamic investigation concerning a family of primary charged black holes within the context of shift and parity symmetric Beyond Horndeski gravity. Employing the Euclidean approach, we derive the functional expression for the free energy and derive the first thermodynamic law, offering a methodology to address the challenge of extracting the thermal quantities in shift-symmetric scalar tensor theories characterized by linear time dependence in the scalar field. Following the formal analysis, we provide some illustrative examples focusing on the thermal evaporation of these fascinating objects.
Tides on Lava Worlds: Application to Close-in Exoplanets and the Early Earth-Moon System
Understanding the physics of planetary magma oceans has been the subject of growing efforts, in light of the increasing abundance of Solar system samples and extrasolar surveys. A rocky planet harboring such an ocean is likely to interact tidally with its host star, planetary companions, or satellites. To date, however, models of the tidal response and heat generation of magma oceans have been restricted to the framework of weakly viscous solids, ignoring the dynamical fluid behavior of the ocean beyond a critical melt fraction. Here we provide a handy analytical model that accommodates this phase transition, allowing for a physical estimation of the tidal response of lava worlds. We apply the model in two settings: The tidal history of the early Earth-Moon system in the aftermath of the giant impact; and the tidal interplay between short-period exoplanets and their host stars. For the former, we show that the fluid behavior of the Earth's molten surface drives efficient early Lunar recession to {sim} 25 Earth radii within 10^4{-} 10^5 years, in contrast with earlier predictions. For close-in exoplanets, we report on how their molten surfaces significantly change their spin-orbit dynamics, allowing them to evade spin-orbit resonances and accelerating their track towards tidal synchronization from a Gyr to Myr timescale. Moreover, we re-evaluate the energy budgets of detected close-in exoplanets, highlighting how the surface thermodynamics of these planets are likely controlled by enhanced, fluid-driven tidal heating, rather than vigorous insolation, and how this regime change substantially alters predictions for their surface temperatures.
Faster Convergence of Stochastic Accelerated Gradient Descent under Interpolation
We prove new convergence rates for a generalized version of stochastic Nesterov acceleration under interpolation conditions. Unlike previous analyses, our approach accelerates any stochastic gradient method which makes sufficient progress in expectation. The proof, which proceeds using the estimating sequences framework, applies to both convex and strongly convex functions and is easily specialized to accelerated SGD under the strong growth condition. In this special case, our analysis reduces the dependence on the strong growth constant from rho to rho as compared to prior work. This improvement is comparable to a square-root of the condition number in the worst case and address criticism that guarantees for stochastic acceleration could be worse than those for SGD.
Template estimation in computational anatomy: Fréchet means in top and quotient spaces are not consistent
In this article, we study the consistency of the template estimation with the Fr\'echet mean in quotient spaces. The Fr\'echet mean in quotient spaces is often used when the observations are deformed or transformed by a group action. We show that in most cases this estimator is actually inconsistent. We exhibit a sufficient condition for this inconsistency, which amounts to the folding of the distribution of the noisy template when it is projected to the quotient space. This condition appears to be fulfilled as soon as the support of the noise is large enough. To quantify this inconsistency we provide lower and upper bounds of the bias as a function of the variability (the noise level). This shows that the consistency bias cannot be neglected when the variability increases.
Serverless Cold Starts and Where to Find Them
This paper releases and analyzes a month-long trace of 85 billion user requests and 11.9 million cold starts from Huawei's serverless cloud platform. Our analysis spans workloads from five data centers. We focus on cold starts and provide a comprehensive examination of the underlying factors influencing the number and duration of cold starts. These factors include trigger types, request synchronicity, runtime languages, and function resource allocations. We investigate components of cold starts, including pod allocation time, code and dependency deployment time, and scheduling delays, and examine their relationships with runtime languages, trigger types, and resource allocation. We introduce pod utility ratio to measure the pod's useful lifetime relative to its cold start time, giving a more complete picture of cold starts, and see that some pods with long cold start times have longer useful lifetimes. Our findings reveal the complexity and multifaceted origins of the number, duration, and characteristics of cold starts, driven by differences in trigger types, runtime languages, and function resource allocations. For example, cold starts in Region 1 take up to 7 seconds, dominated by dependency deployment time and scheduling. In Region 2, cold starts take up to 3 seconds and are dominated by pod allocation time. Based on this, we identify opportunities to reduce the number and duration of cold starts using strategies for multi-region scheduling. Finally, we suggest directions for future research to address these challenges and enhance the performance of serverless cloud platforms. Our datasets and code are available here https://github.com/sir-lab/data-release
A Single Goal is All You Need: Skills and Exploration Emerge from Contrastive RL without Rewards, Demonstrations, or Subgoals
In this paper, we present empirical evidence of skills and directed exploration emerging from a simple RL algorithm long before any successful trials are observed. For example, in a manipulation task, the agent is given a single observation of the goal state and learns skills, first for moving its end-effector, then for pushing the block, and finally for picking up and placing the block. These skills emerge before the agent has ever successfully placed the block at the goal location and without the aid of any reward functions, demonstrations, or manually-specified distance metrics. Once the agent has learned to reach the goal state reliably, exploration is reduced. Implementing our method involves a simple modification of prior work and does not require density estimates, ensembles, or any additional hyperparameters. Intuitively, the proposed method seems like it should be terrible at exploration, and we lack a clear theoretical understanding of why it works so effectively, though our experiments provide some hints.
Optimal design of plane elastic membranes using the convexified Föppl's model
This work puts forth a new optimal design formulation for planar elastic membranes. The goal is to minimize the membrane's compliance through choosing the material distribution described by a positive Radon measure. The deformation of the membrane itself is governed by the convexified F\"{o}ppl's model. The uniqueness of this model lies in the convexity of its variational formulation despite the inherent nonlinearity of the strain-displacement relation. It makes it possible to rewrite the optimization problem as a pair of mutually dual convex variational problems. In the primal problem a linear functional is maximized with respect to displacement functions while enforcing that point-wisely the strain lies in an unbounded closed convex set. The dual problem consists in finding equilibrated stresses that are to minimize a convex integral functional of linear growth defined on the space of Radon measures. The pair of problems is analysed: existence and regularity results are provided, together with the system of optimality criteria. To demonstrate the computational potential of the pair, a finite element scheme is developed around it. Upon reformulation to a conic-quadratic & semi-definite programming problem, the method is employed to produce numerical simulations for several load case scenarios.