Recent posts:
-
Can LLMs Learn to Reason End-to-End?
I was recently thinking about reasoning models and why RL methods like GRPO have become so prominent in that context. Previously, people were finetuning LLMs on step-by-step reasoning traces, yet this can hardly be called "learning to reason", as it's rather about learning to reproduce the reasoning patterns of some other expert. With RL for LLMs the paradigm has shifted and now the model can, in principle, discover the right reasoning patterns for the task. Yet, there are still lots of nuances. Here we consider a simple question: can LLMs learn the best reasoning patterns in an end-to-end manner? By understanding this we'll see precisely how different model designs and training paradigms fit together, like pieces of a puzzle.
-
Differentiable Simulation for Search
Differentiable simulation is the concept of modeling an environment's dynamics as a differentiable function. This lets us embed the environment's rollout into the computation graph of another module and train the whole system end-to-end with gradients. This property has found applications in many domains. In physics, robotics, and autonomous vehicles, the dynamics often represent kinematic equations and laws of motion or energy transfer. In graphics, it is often the rendering process that is differentiable; differentiable rendering is at the heart of techniques like neural radiance fields and Gaussian splatting. We've previously explored how differentiable simulation can be used for control and world modeling, yet one key aspect is missing to complete the trilogy: test-time search. Here we'll see how differentiable simulation can be used to design efficient procedures that search for the best action sequence at test time.
-
Episodes From The Land of Ice and Fire
We've always wanted to take a trip to Iceland. We've talked about it for at least 5 years now. Plans got serious last year, but we didn't manage to get organized in time. This year it finally happened, and it was spectacular. Definitely a once-in-a-lifetime experience to remember. How can this little island in the middle of the ocean pack in so much scenery, biodiversity, and grandeur?
-
Options Pricing
A European call option is a contract that gives its holder the right, but not the obligation, to buy a specific quantity of an underlying asset at a pre-specified strike price on a specified future date. Thus, if you believe that, say, crude oil will increase in price next summer, you can buy a call option from a seller. If the oil price next summer does skyrocket, you can exercise the option and buy the pre-specified amount at the strike price, which is lower than the market price. If, instead, the oil price plummets, you don't have to exercise the option. You'll only lose the price you paid for the option. So, effectively, by buying an option you are buying, literally, an option, an opportunity for a better transaction. Yet, even this opportunity has to have a market price, given that somebody is willing to sell it or buy it. What should it be? How are options priced?
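To make the two scenarios above concrete, here is a minimal sketch of the buyer's profit at expiry. The function name `call_option_profit` and all the numbers are hypothetical and chosen only for illustration; the sketch ignores interest rates, fees, and contract size.

```python
def call_option_profit(spot_at_expiry: float, strike: float, premium: float) -> float:
    """Profit per unit for the buyer of a European call option at expiry.

    Illustrative only: ignores discounting, fees, and contract size.
    """
    # The holder exercises only if the market price exceeds the strike,
    # so the payoff can never be negative.
    payoff = max(spot_at_expiry - strike, 0.0)
    # The premium (the price paid for the option) is lost in either case.
    return payoff - premium


# Hypothetical numbers: strike of 80, premium of 5.
profit_if_price_rises = call_option_profit(spot_at_expiry=100.0, strike=80.0, premium=5.0)
profit_if_price_falls = call_option_profit(spot_at_expiry=60.0, strike=80.0, premium=5.0)
print(profit_if_price_rises, profit_if_price_falls)
```

In the falling-price case the loss is capped at the premium, which is exactly the "you'll only lose the price you paid" property described above.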
-
How To Intercept a Missile
Today, missile guidance sits at the crossroads of classical control theory, real-time signal processing and modern AI-driven state estimation. Whether you’re tuning fins on a supersonic interceptor or programming a drone swarm to shadow evasive targets, the same principles of geometry, feedback and computational efficiency govern success under extreme time pressure. Let's explore the basics of this hugely important topic.
Every subset of less than half the total number of vertices has a proportionally large boundary of edges.