Recent posts:
-
Formalizing Trial And Error
The performance of a predictor trained with supervised learning is bounded by the quality of the data. The performance of a policy trained with reinforcement learning (RL), on the other hand, is practically unbounded. Thus, RL has the potential to reach superhuman performance in multiple domains. Yet the fundamental idea of RL is quite primitive and intuitive: reinforce the behaviours that work and penalize those that don't. This is the basic premise of policy gradient methods. Let's explore them a bit.
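As a rough preview of how this "reinforce what works" idea gets formalized, one common estimator is the REINFORCE form of the policy gradient, which weights the log-probability gradient of each action by the return of its trajectory:

$$
\nabla_\theta J(\theta) \;=\; \mathbb{E}_{\tau \sim \pi_\theta}\!\left[\sum_{t} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, R(\tau)\right]
$$

Actions on trajectories with high return $R(\tau)$ are made more likely, those on poor trajectories less so.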
-
The Open Economy
We've previously explored the basic workings of closed economies. Now it's time to consider different national economies in relation to each other. This will lead us to topics like exchange rates, imports and exports, and currency wars. It is an important topic that provides a convenient frame of reference in which to ground current world developments and historical events. Finally, it features curious intellectual concepts like the impossible trinity.
-
Learning and Searching
Some ideas are stirring. A few days ago Ilya Sutskever said at NeurIPS, paraphrasing, that we are running out of high-quality human-generated text data with which to train LLMs. As a result, the pre-training paradigm of scaling up model and dataset sizes cannot go on forever. Performance of pure LLMs seems to be plateauing. The scaling hypothesis says that given a suitable architecture, one that sufficiently mixes all the features together, it is only a matter of scaling up training data and model size to get arbitrarily good, even superhuman, performance. Is this the case? It's hard to say. But a more important question is: if accuracy plateaus, at what level will it do so? People who argue against the scaling hypothesis believe that intelligence depends on a critical, highly sought-after algorithm that we have yet to find. It may not be precisely an architecture, but it is a design that we are currently missing.
-
DINO, CLIP, PaliGemma
Here we take a look at some basic use-cases with DINOv2, CLIP, and Google's new vision-language model PaliGemma 2. This is not meant to be a comprehensive overview of vision-language models. We will only dip our toes into this vast topic, which is becoming ever more prominent in current AI. All these models can be considered foundational.
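To give a flavour of such a basic use-case, here is a minimal sketch of zero-shot image classification with CLIP through the Hugging Face transformers library; the image path and candidate labels are placeholders, not taken from the post itself.

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Load a pretrained CLIP model and its paired preprocessor.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Placeholder image and candidate labels for zero-shot classification.
image = Image.open("example.jpg")
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]

# Embed image and texts jointly, then compare them via the similarity logits.
inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)  # one probability per label
print(dict(zip(labels, probs[0].tolist())))
```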
-
Analytic Policy Gradients
Current methods to learn controllers for autonomous vehicles (AVs) focus on behavioural cloning, where agents learn to mimic the actions obtained from an expert trajectory. Being trained only on exact historic data, the resulting agents often generalize poorly when deployed to novel scenarios. Simulators provide the opportunity to go beyond offline datasets, but they are still treated as complicated black boxes, only used to update the global simulation state. As a result, the RL algorithms built on top of them are slow, sample-inefficient, and prior-agnostic. Instead, if these simulators are differentiable, they can be included in an end-to-end training loop, turning their hard-coded world dynamics into useful inductive biases for the model. Here we explore the differentiable Waymax simulator and use analytic policy gradients to train AV controllers on the large-scale Waymo Open Motion Dataset (WOMD) [1]. This allows us to learn robust, accurate, and fast policies using only gradient descent.
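To illustrate the core mechanism, here is a toy sketch of analytic policy gradients in PyTorch: a hand-written differentiable "simulator" (single-integrator dynamics standing in for Waymax), a small policy rolled out through it, and a loss against a placeholder expert trajectory. Everything here is an assumption for illustration, not the Waymax API or the actual training setup.

```python
import torch

def step(state, action, dt=0.1):
    # Toy differentiable dynamics: next position = current position + dt * velocity command.
    return state + dt * action

# Tiny policy mapping a 2-D state to a 2-D action.
policy = torch.nn.Sequential(torch.nn.Linear(2, 32), torch.nn.Tanh(), torch.nn.Linear(32, 2))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

# Placeholder expert trajectory (stand-in for logged WOMD positions).
expert_traj = torch.randn(20, 2)

for _ in range(1000):
    state = expert_traj[0]
    loss = 0.0
    for t in range(1, len(expert_traj)):
        action = policy(state)
        state = step(state, action)  # gradients flow through the simulator dynamics
        loss = loss + ((state - expert_traj[t]) ** 2).sum()
    opt.zero_grad()
    loss.backward()  # analytic policy gradient: backprop through the whole rollout
    opt.step()
```

The key point is that the loss is differentiated through the dynamics themselves, so the policy receives gradient signal from the entire rollout rather than from a black-box reward.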