Course Information
- Course: Probabilistic Artificial Intelligence (PAI)
- Semester: Fall 2025
- University: ETH Zurich
- Status: Completed
- 📄 Course Materials: PDF Notes
PAI Core Principle: A key aspect of intelligence is not just making decisions, but reasoning about the uncertainty behind those decisions and taking that uncertainty into account when acting. This is what PAI is about.
Welcome to my deep dive into Probabilistic Artificial Intelligence! This course is fundamentally changing how I think about machine learning and AI systems. It's not just about making machines smart, but about making them humble: systems that know what they don't know, and act cautiously when uncertainty is high.
Part I: Probabilistic Approaches to Machine Learning
The first part of the course covers probabilistic approaches to machine learning. A crucial distinction throughout is between two types of uncertainty:
- Epistemic uncertainty - due to lack of data (reducible)
- Aleatoric uncertainty - inherent noise in observations and outcomes (irreducible)
Then we discuss concrete approaches toward probabilistic inference, including some fascinating methods:
1. Bayesian Linear Regression
Think of it as linear regression with a twist: instead of just one line through the data, we put a probability distribution over all possible lines, updating our belief as we see more points. The result is not just a prediction, but a quantified uncertainty around that prediction.
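The "distribution over all possible lines" has a closed form. Here is a minimal NumPy sketch (the prior precision `alpha` and noise precision `beta` are illustrative values I chose, not anything specific to the course):

```python
import numpy as np

def blr_posterior(X, y, alpha=1.0, beta=25.0):
    """Posterior N(mu, Sigma) over weights w for y = X w + noise,
    with prior w ~ N(0, alpha^-1 I) and noise variance beta^-1."""
    Sigma_inv = alpha * np.eye(X.shape[1]) + beta * X.T @ X
    Sigma = np.linalg.inv(Sigma_inv)
    mu = beta * Sigma @ X.T @ y
    return mu, Sigma

def blr_predict(x_star, mu, Sigma, beta=25.0):
    """Predictive mean and variance at a new input x_star."""
    mean = x_star @ mu
    # aleatoric (noise) term + epistemic (parameter) term
    var = 1.0 / beta + x_star @ Sigma @ x_star
    return mean, var

# Fit a line with intercept: features are [1, x]
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=20)
X = np.stack([np.ones_like(x), x], axis=1)
y = 0.5 + 2.0 * x + rng.normal(0, 0.2, size=20)

mu, Sigma = blr_posterior(X, y)
mean, var = blr_predict(np.array([1.0, 0.0]), mu, Sigma)
```

Each new data point tightens `Sigma`, which is exactly the "updating our belief as we see more points" described above.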
2. Gaussian Process Models
A Gaussian process is like saying: "I don't know the exact function, but I'll assume smoothness and let the data tell me the shape." It's a powerful non-parametric model that gives both predictions and confidence intervals in a mathematically elegant way.
3. Bayesian Neural Networks
Neural networks, but probabilistic: instead of fixed weights, we learn distributions over weights. This lets the network say "I'm confident here" or "I'm uncertain there", which is critical for safe decision-making in the real world.
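One cheap way to see "distributions over weights" in action is Monte Carlo dropout: predict many times with random dropout masks and read off the spread. This toy sketch uses an untrained random network purely to show the mechanism (all sizes and the dropout rate are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
# A tiny 1-hidden-layer network with random (untrained) weights
W1 = rng.normal(size=(1, 50)); b1 = np.zeros(50)
W2 = rng.normal(size=(50, 1)) / np.sqrt(50); b2 = np.zeros(1)

def predict_mc(x, n_samples=200, p_drop=0.2):
    """Monte Carlo predictions under random dropout masks:
    each mask is a different sampled 'weight configuration'."""
    preds = []
    for _ in range(n_samples):
        mask = rng.random(50) > p_drop              # random subnetwork
        h = np.tanh(x @ W1 + b1) * mask / (1 - p_drop)
        preds.append((h @ W2 + b2).item())
    preds = np.array(preds)
    return preds.mean(), preds.std()                # prediction + uncertainty

mean, std = predict_mc(np.array([[0.3]]))
```

The standard deviation across samples is the network's way of saying "I'm uncertain here"; a proper Bayesian neural network replaces the ad-hoc masks with learned weight distributions.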
Part II: Uncertainty in Sequential Tasks
The second part is about uncertainty in sequential tasks. We consider active learning and optimization approaches that collect data by proposing experiments chosen to reduce epistemic uncertainty.
4. Bayesian Optimization
When experiments are expensive (think: running clinical trials or tuning giant ML models), Bayesian optimization helps choose the next experiment to run by balancing exploration and exploitation. It actively seeks the most informative data points to shrink uncertainty fastest.
How Do We Know Which Experiments Are Informative?
Great question! We measure informativeness through epistemic uncertainty — the uncertainty that comes from not having enough data. By asking "where am I most ignorant?", the algorithm proposes experiments that are maximally clarifying.
When we say an algorithm knows which experiments are informative, we're talking about information-theoretic criteria that measure how much an experiment would reduce epistemic uncertainty:
- Posterior variance: In Gaussian processes, uncertainty at a point is quantified by the posterior variance. The algorithm chooses new inputs where this variance is largest.
- Expected information gain: "If I ran this experiment, how much would it shrink my uncertainty about the model?" This is formalized as maximizing the expected reduction in entropy.
- Acquisition functions: Functions like Upper Confidence Bound (UCB) or Expected Improvement (EI) trade off exploration and exploitation:
  - UCB: "Try points with high predicted value plus high uncertainty."
  - EI: "Try points likely to improve upon the best outcome so far."
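Both acquisition functions above are a few lines of code once you have a posterior mean μ(x) and standard deviation σ(x). In this sketch the GP posterior is mocked with hand-picked values so the trade-off is visible (all numbers are made up for illustration):

```python
import math
import numpy as np

def ucb(mu, sigma, beta=2.0):
    """Upper Confidence Bound: predicted value plus an exploration bonus."""
    return mu + beta * sigma

def expected_improvement(mu, sigma, best):
    """EI: expected amount by which this point beats the incumbent `best`."""
    if sigma == 0.0:
        return 0.0
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)   # standard normal pdf
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))            # standard normal cdf
    return (mu - best) * cdf + sigma * pdf

# Mocked GP posterior over three candidate points
mus = np.array([0.1, 0.5, 0.4])
sigmas = np.array([0.9, 0.1, 0.5])
best_so_far = 0.45

next_ucb = int(np.argmax([ucb(m, s) for m, s in zip(mus, sigmas)]))
next_ei = int(np.argmax([expected_improvement(m, s, best_so_far)
                         for m, s in zip(mus, sigmas)]))
```

Note how both criteria can prefer the first candidate: its mean is lowest, but its large uncertainty makes it the most informative experiment.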
Reinforcement Learning & MDPs
Then we cover reinforcement learning (RL), a rich formalism for modeling agents that learn to act in uncertain environments.
5. Markov Decision Process (MDP)
The MDP is the mathematical backbone of reinforcement learning. It models the world as states, actions, and rewards, capturing the idea that decisions today affect both what you see tomorrow and what long-term payoff you'll get.
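Value iteration makes "decisions today affect long-term payoff" concrete: repeatedly back up the Bellman equation until the values converge. The 3-state toy MDP below is entirely made up for illustration:

```python
import numpy as np

n_states, n_actions, gamma = 3, 2, 0.9
# P[a, s, s'] = probability of moving from s to s' under action a
P = np.array([
    [[0.8, 0.2, 0.0], [0.0, 0.8, 0.2], [0.0, 0.0, 1.0]],   # action 0
    [[0.2, 0.8, 0.0], [0.2, 0.0, 0.8], [0.2, 0.0, 0.8]],   # action 1
])
R = np.array([0.0, 0.0, 1.0])   # reward received in each state

V = np.zeros(n_states)
for _ in range(500):                  # Bellman backups until convergence
    Q = R + gamma * P @ V             # Q[a, s]: act now, then follow V
    V_new = Q.max(axis=0)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new
policy = Q.argmax(axis=0)             # greedy policy w.r.t. the final values
```

State 2 is absorbing under action 0, so its value converges to 1/(1−γ) = 10, and the rewardless states inherit discounted value from their paths toward it.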
6. RL with Neural Network Approximations
Modern RL uses deep neural networks to approximate value functions or policies in huge state spaces. This gives us "deep RL" — the engine behind agents that can play Go, control robots, or learn strategies in complex, high-dimensional environments.
Model-Based RL and Safety
We close by discussing modern approaches in model-based RL, which use epistemic and aleatoric uncertainty to guide exploration, while also reasoning about safety.
The Big Picture: Probabilistic AI is not just about making machines smart, but about making them humble: systems that know what they don't know, and act cautiously when uncertainty is high. That's the difference between a reckless model and a trustworthy intelligent agent.
Why This Matters
In a world where AI systems are making increasingly important decisions — from medical diagnoses to autonomous vehicle control — the ability to quantify and reason about uncertainty isn't just mathematically elegant, it's ethically essential.
PAI gives us the tools to build AI systems that:
- Make better decisions under uncertainty
- Know when to ask for more data
- Fail safely when confidence is low
- Learn more efficiently by being strategic about what to explore
This course is reshaping how I think about intelligence itself — not just as the ability to be right, but as the wisdom to know when you might be wrong.
Completed Tasks & Projects
Below are the key tasks I've completed for the PAI exam, demonstrating practical applications of the theoretical concepts covered in the course.
Task 1: Bayesian Linear Regression with Model Selection
The Problem
Given a dataset with input-output pairs, we need to:
- Implement Bayesian Linear Regression to model the relationship between inputs and outputs
- Perform model selection by comparing different polynomial feature transformations
- Use the marginal likelihood (evidence) to select the best model complexity
- Quantify prediction uncertainty using the posterior predictive distribution
Our Approach
We tackled this problem using a fully Bayesian framework:
- Feature Engineering: Applied polynomial transformations of varying degrees (1 to 10) to capture non-linear relationships
- Bayesian Inference: Instead of point estimates, we computed full posterior distributions over model parameters using conjugate priors (Gaussian-Gamma)
- Model Evidence: Calculated the marginal likelihood for each polynomial degree, which naturally penalizes overfitting through the Bayesian Occam's Razor principle
- Posterior Predictive: Generated predictions with credible intervals that reflect both aleatoric (data noise) and epistemic (parameter) uncertainty
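The evidence-based model selection above can be sketched compactly. For a Gaussian weight prior and Gaussian noise, the marginal likelihood of the targets is itself Gaussian, so comparing polynomial degrees is one log-density evaluation each. This sketch uses a simpler fixed-precision prior (`alpha`, `beta` are my illustrative choices, not the course's Gaussian-Gamma setup):

```python
import numpy as np

def log_evidence(X, y, alpha=1.0, beta=100.0):
    """log p(y | X) for w ~ N(0, alpha^-1 I) and noise variance beta^-1:
    marginally, y ~ N(0, beta^-1 I + alpha^-1 X X^T)."""
    n = len(y)
    C = np.eye(n) / beta + X @ X.T / alpha
    sign, logdet = np.linalg.slogdet(C)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(C, y))

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 40)
y = 1.0 - 2.0 * x + 3.0 * x**2 + rng.normal(0, 0.1, 40)   # true degree: 2

# Score each polynomial degree by its evidence; no validation split needed
scores = {d: log_evidence(np.vander(x, d + 1, increasing=True), y)
          for d in range(1, 8)}
best_degree = max(scores, key=scores.get)
```

Degree 1 is punished for underfitting through the data-fit term, while high degrees are punished through the determinant term: Bayesian Occam's Razor in two lines of linear algebra.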
Key Insights
- Marginal Likelihood as Model Selection: The evidence automatically trades off model fit and complexity, selecting simpler models when data doesn't justify complexity
- Uncertainty Quantification: Unlike traditional regression, Bayesian methods provide confidence intervals that widen in regions with sparse data
- Occam's Razor in Action: More complex models are penalized unless they significantly improve fit, preventing overfitting
- No Cross-Validation Needed: The marginal likelihood handles model selection in one pass, without requiring train/validation splits
Task 2: Gaussian Processes for Regression
The Problem
Implement Gaussian Process (GP) regression to model complex, non-linear functions with uncertainty quantification:
- Build a Gaussian Process model from scratch using kernel functions
- Handle noisy observations while maintaining smooth predictions
- Implement different kernel functions (RBF, Matérn, Periodic) to capture various function behaviors
- Optimize hyperparameters (length scale, variance, noise) using maximum likelihood estimation
- Provide uncertainty bounds that adapt to data density
Our Approach
We implemented a complete Gaussian Process regression framework:
- Kernel Design: Implemented multiple kernel functions including RBF (Radial Basis Function), Matérn, and Periodic kernels to model different smoothness and periodicity assumptions
- GP Posterior Computation: Derived and implemented the exact posterior distribution over functions given observed data, using the kernel trick to avoid explicit feature space computation
- Hyperparameter Optimization: Maximized the log marginal likelihood with respect to kernel hyperparameters using gradient-based optimization (e.g., L-BFGS)
- Numerical Stability: Applied Cholesky decomposition for efficient and numerically stable inversion of covariance matrices
- Predictive Distribution: Computed mean and variance of the posterior predictive distribution at test points, providing both point predictions and uncertainty estimates
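The pipeline above fits in a short sketch: an RBF kernel, a Cholesky factorization instead of an explicit matrix inverse, and the exact posterior mean and variance at test points. Hyperparameters are fixed by hand here rather than optimized by marginal likelihood:

```python
import numpy as np

def rbf(A, B, lengthscale=0.3, variance=1.0):
    """RBF kernel matrix between row-vector inputs A and B."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(X, y, X_star, noise=1e-2):
    K = rbf(X, X) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)                        # stable: never form inv(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    K_s = rbf(X, X_star)
    mean = K_s.T @ alpha                             # posterior mean
    v = np.linalg.solve(L, K_s)
    var = np.diag(rbf(X_star, X_star)) - np.sum(v**2, axis=0)
    return mean, var

X = np.linspace(-1, 1, 10)[:, None]
y = np.sin(3 * X).ravel()
X_star = np.array([[0.0], [5.0]])                    # near vs. far from data
mean, var = gp_posterior(X, y, X_star)
```

The variance at x = 5 reverts to the prior variance while the variance at x = 0 collapses toward the noise level: exactly the "honest uncertainty far from data" behavior discussed below.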
Key Insights
- Non-Parametric Flexibility: GPs define distributions over functions rather than parameters, allowing them to adapt to arbitrary function complexity without fixed model structure
- Kernel as Prior: The choice of kernel encodes our prior beliefs about function smoothness, periodicity, and correlation structure—critical for good predictions
- Uncertainty-Aware Predictions: Predictive variance naturally increases in regions far from training data, providing honest uncertainty quantification
- Computational Challenges: GP inference scales as O(n³) due to matrix inversion, requiring approximations (e.g., inducing points) for large datasets
- Automatic Relevance Determination: Hyperparameter optimization naturally performs feature selection by learning which input dimensions are most relevant
- Connection to Bayesian Linear Regression: A GP can be viewed as Bayesian linear regression with infinitely many basis functions, making it the non-parametric limit of the parametric model
Task 3: Safe Bayesian Optimization with Constraints
The Problem
Optimize a black-box objective function while respecting safety constraints—a critical challenge in real-world applications like drug design:
- Maximize bioavailability f(x) of a drug formulation
- Ensure surface area v(x) ≤ 4.0 to remain safe for patients
- Handle noisy observations from expensive experiments
- Never violate safety constraints during the optimization process (safe exploration)
- Balance exploration vs. exploitation within the safe set
The key challenge: we cannot afford to test unsafe configurations, so we must learn both the objective and constraint functions conservatively.
Our Approach
We implemented a dual-GP safe Bayesian optimization framework:
- Dual Gaussian Processes: Maintained two independent GPs—one modeling the objective f(x) and another modeling the constraint violation g(x) = v(x) - 4.0
- Conservative Safe Set: Defined a point as safe if μ_g(x) + β·σ_g(x) ≤ -ε, where β = 3.0 provides high-probability safety guarantees (analogous to UCB but for constraints)
- Kernel Engineering: Used Matérn kernels for f(x) with fixed noise (σ_f = 0.15) and a composite kernel (Linear + Matérn + RBF) for v(x) to capture both global trends and local variations
- Safe Expected Improvement: Maximized Expected Improvement (EI) acquisition function only over the safe set, ensuring all proposed experiments respect safety constraints
- Exploration Bonus: Added a small variance term to the acquisition to avoid premature convergence and encourage safe exploration
- Graceful Degradation: Implemented fallback mechanisms when the GP becomes overly pessimistic—re-sample near previously safe observations
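The selection logic at the heart of this loop can be isolated from the GP fitting. In this sketch the two GP posteriors are mocked as arrays over a candidate grid; β and the safe-set rule follow the description above, while the candidate values are invented for illustration:

```python
import math
import numpy as np

def safe_ei_select(mu_f, sigma_f, mu_g, sigma_g, best, beta=3.0, eps=1e-3):
    """Expected Improvement on f, restricted to the conservative safe set
    where the constraint g(x) <= 0 holds with high probability."""
    safe = mu_g + beta * sigma_g <= -eps            # pessimistic safety check
    if not safe.any():
        return None                                  # trigger a fallback step
    ei = np.zeros_like(mu_f)
    for i in np.flatnonzero(safe):
        z = (mu_f[i] - best) / sigma_f[i]
        pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
        cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
        ei[i] = (mu_f[i] - best) * cdf + sigma_f[i] * pdf
    return int(np.argmax(ei))

# Mocked posteriors over 4 candidates: index 3 has the best objective mean
# but its constraint estimate is unsafe, so it must never be proposed.
mu_f = np.array([0.2, 0.6, 0.5, 0.9])
sigma_f = np.array([0.3, 0.2, 0.4, 0.2])
mu_g = np.array([-1.0, -0.8, -0.9, 0.5])
sigma_g = np.array([0.1, 0.1, 0.2, 0.1])
choice = safe_ei_select(mu_f, sigma_f, mu_g, sigma_g, best=0.55)
```

The tempting candidate with the highest predicted objective is excluded outright, which is the "never leave the safe set" guarantee in miniature.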
Key Insights
- Safety Through Uncertainty: High uncertainty in constraint predictions forces conservative behavior—the algorithm won't risk unsafe regions until it has more data
- Exploration-Safety Tradeoff: Safe BO must balance three objectives: maximize reward, explore the space, and never violate constraints—a harder problem than unconstrained BO
- Dual Modeling: Separate GPs for objective and constraint allow independent uncertainty quantification, crucial since we need conservative constraint estimates but optimistic objective estimates
- Never Leave the Safe Set: Unlike penalty methods or barrier functions, safe BO provides hard guarantees: if the initial point is safe and the GP is well-calibrated, all queries remain safe with high probability
- Beta Parameter Tuning: The safety parameter β controls risk aversion—higher β means more conservative (larger safe set margin), lower β allows more aggressive exploration
- Real-World Applications: This framework is critical for domains where constraint violations are unacceptable: medical treatments, autonomous systems, chemical processes, robotics
- Sample Efficiency: Safe BO achieves near-optimal solutions in fewer evaluations than random search or grid search while maintaining safety, crucial when experiments are expensive or time-consuming
Task 4: Maximum a Posteriori Policy Optimization (MPO)
The Problem
Implement a state-of-the-art deep reinforcement learning algorithm for continuous control:
- Train an agent to solve a continuous-action cartpole control task using only raw observations
- Handle continuous action spaces with Gaussian stochastic policies
- Balance exploration and exploitation while learning from experience
- Ensure stable learning by constraining policy updates to avoid catastrophic forgetting
- Use off-policy learning with a replay buffer for sample efficiency
- Implement actor-critic architecture with twin Q-functions for value estimation
Our Approach
We implemented MPO, an EM-style algorithm that constrains policy updates using KL divergence:
- Twin Critics (Clipped Double Q-Learning): Maintained two independent Q-networks (Q₁, Q₂) and used their minimum for target computation to reduce overestimation bias
- Gaussian Stochastic Actor: Policy network outputs mean μ and log-std for each action dimension, creating a diagonal Gaussian distribution. Actions are sampled using the reparameterization trick and squashed through tanh for bounded control
- E-Step (Critic Update): Updated critics toward Bellman targets computed by sampling K actions from the target policy at next states and averaging their Q-values
- M-Step (Policy Update): Updated policy to maximize expected Q-value using importance-weighted maximum likelihood:
- Sample K actions from current policy at observed states
- Compute importance weights w_k ∝ exp(Q(s,a_k)/η) using softmax
- Maximize weighted log-likelihood: Σ_k w_k log π(a_k|s)
- Temperature Parameter η: Adaptively controlled via Lagrangian optimization so that the KL divergence between the reweighted action distribution and the old policy stays near ε_KL, preventing drastic policy changes
- Soft Target Updates: Applied Polyak averaging (τ=0.005) to slowly update target networks, stabilizing learning
- Replay Buffer: Stored 50k transitions for off-policy sampling, enabling data reuse and breaking temporal correlations
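The M-step reweighting is worth seeing in isolation: sample K actions, turn their Q-values into softmax weights with temperature η, and those weights then multiply the log-likelihood in the policy update. The Q-values below are mocked; in the real agent they come from the twin critics:

```python
import numpy as np

def mpo_weights(q_values, eta=1.0):
    """Importance weights w_k proportional to exp(Q(s, a_k) / eta),
    normalized per state (a softmax over the K sampled actions)."""
    z = q_values / eta
    z -= z.max(axis=-1, keepdims=True)        # subtract max for stability
    w = np.exp(z)
    return w / w.sum(axis=-1, keepdims=True)

# One state, K = 4 sampled actions with mocked Q-values
q = np.array([[1.0, 2.0, 0.5, 3.0]])
w_sharp = mpo_weights(q, eta=0.5)   # low temperature: near-greedy weights
w_flat = mpo_weights(q, eta=10.0)   # high temperature: near-uniform weights
# The policy loss would then be  -(w * log_pi(a_k | s)).sum()
```

η is precisely the knob the Lagrangian tunes: small η concentrates all weight on the best sampled action (aggressive updates), large η spreads it out (conservative updates).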
Key Insights
- EM Perspective on RL: MPO frames policy improvement as an EM algorithm—E-step evaluates actions, M-step improves policy toward high-value actions weighted by their advantage
- Trust Region via KL Constraint: Unlike PPO's clipping or TRPO's line search, MPO uses a temperature parameter to implicitly enforce trust regions, ensuring smooth policy evolution
- Sample Reweighting: Importance weights naturally implement advantage-weighted regression: actions with high Q-values get more influence on policy updates
- Continuous Control Challenges: Tanh squashing ensures actions stay within bounds while maintaining differentiability for gradient-based optimization
- Twin Q-Functions: Taking min(Q₁, Q₂) for target computation reduces overoptimistic Q-value estimates that plague single-critic methods, improving stability
- Off-Policy Efficiency: Replay buffer enables multiple gradient updates per environment step, dramatically improving sample efficiency compared to on-policy methods like PPO
- Reparameterization Trick: Sampling a ~ N(μ, σ) as a = μ + σ·ε (ε ~ N(0,1)) allows backpropagation through stochastic sampling, critical for policy gradient estimation
- Stabilization Techniques: Combination of target networks, twin critics, soft updates, and KL constraints creates a remarkably stable learning algorithm suitable for complex control tasks