“The derivative, as this notion appears in the elementary differential calculus, is a familiar mathematical example of a function for which both [the domain and the range] consist of functions. Or, turning to the integral calculus, if in the expression we take the function as independent variable, we are led to a function for which the range of arguments consists of functions and the range of values, of numbers.”
In ancient times, our ancestors used small pebbles to encode and
manipulate abstract quantities according to a small set of rules. In
effect, they were trading mental for mechanical labor. With the advent
of pen and paper, those same rules could be translated into written
symbols and more recently, electrical impulses and transistors.
Today we make the same trade when implementing mathematical algorithms:
one can either spend a great deal of effort manipulating an expression
with pen and paper, then implement the result as a computer program, or
we could implement the rules, then let the computer manipulate the
expression on our behalf. We call this exchange calculus.
Modern calculus is a topic with a variety of applications in classical
and statistical computing. Broadly, it encompasses any formal system for
manipulating symbolic expressions according to a fixed set of rules. We
are specifically interested in its applications to computer programming.
Each week, we will read a paper and hold a short discussion. All are
welcome to attend.
|Feb. 5, 9:30 am||Automatic Differentiation in PyTorch |
-Slides, notebook, picograd
|Feb. 12, 1:00 pm||Programming with Probabilities |
|Feb. 19, 12:00 pm||Introduction to Sequent Calculus |
|Feb. 26, 10:00 am||Quantitative Equational Reasoning |
-Recording, slides, textbook #1, textbook #2
We develop a quantitative analogue of equational reasoning which we call quantitative algebra. We define an equality relation indexed by rationals: a =ε b which we think of as saying that “a is approximately equal to b up to an error of ε”. We have 4 interesting examples where we have a quantitative equational theory whose free algebras correspond to well known structures. The most interesting example comes from the case of the Kantorovich metric. I will present this as a tutorial on equational reasoning.
|Mar. 1 – Mar. 5||Study Break / Période d’activités libres||N/A|
|Mar. 12, 11:00 am||Drasil: Generate all the Things! |
Generative techniques are useful for more than just code generation: many of the artifacts that make up ‘software’ can be generated too. The question then becomes, which ones, what is an adequate ‘source’ language for describing them, and when is such an approach effective. This talk will meander through these questions, giving answers that have emerged from our work on the Drasil system. While far from a silver bullet, an interesting subset of software does emerge that can be effectively attacked in this way.
|Mar. 19, 9:30 am||Automatic Reparameterisation of Probabilistic Programs |
-Recording, slides, Stan tutorial
Markov chain Monte Carlo (MCMC) algorithms can be used to approximate a probability distribution by continuously sampling from it. Some MCMC strategies, such as Hamiltonian Monte Carlo (HMC), use the gradient of the unnormalised density function to increase the quality of the obtained samples and the speed of convergence of the algorithm. But such algorithms can fail, and often silently, if the curvature of this target density function varies. One way to work around this problem is to reparameterise the distribution of interest, meaning to express it in terms of different parameters.
In this talk, I will describe the practical challenges of finding a suitable reparameterisation, and demonstrate how we can use mechanisms available in recent probabilistic programming languages to implement a family of parameterisations often used in practice. In particular, we will look at a continuous relaxation of the question of what parameterisation to use and combine it with variational inference to obtain robust and efficient MCMC samplers.
|Mar. 26, 12:00 pm||Inside Every Calculus Is A Little Algebra Waiting To Get Out |
-Recording, textbook #1, textbook #2
Because of deep learning, there has been a surge in interest in automatic differentiation, especially from the functional programming community. As a result there are many recent papers that look at AD from a Category Theory perspective. However, Category Theorists have already been looking at differentiation and calculus in general since the late 60’s in the context of Synthetic Differential Geometry, but it seems that this work is largely ignored by those interested in AD. In this talk, we will provide a gentle introduction to the ideas of SDG, by relating them to dual numbers, and show how it provides a simple and purely algebraic approach to (automatic) differentiation.
|Apr. 1, 9:30 am||Tensor Networks, Probabilistic Modeling, and Formal Grammar |
-Recording, slides, FGs, Codes on graphs (thanks to Tristan)
Tensor networks (TNs) are a general formalism for efficiently modeling complex higher-order tensors, whose broad applications within quantum many-body physics have led them to be described as the natural framework for modeling quantum states of matter. Spurred on by their independent rediscovery in applied mathematics, TNs have increasingly been used for machine learning, where they have unlocked a growing number of novel theoretical and practical results. In this talk, we discuss one such result, concerning the use of TN models for modeling probabilistic sequence data. We show how a simple recurrent TN model enables a novel algorithm for conditional sampling from collections of strings defined by formal grammars, a significant generalization of the autoregressive sampling of typical language models. By developing this unexpected link between quantum physics and language, we expect our results to contribute to the search for mathematically principled methods for unsupervised grammatical inference in NLP.
|Apr. 9, 10:00 am||Computing derivatives of matrix and tensor expressions |
Expressions involving matrix and tensors are often encountered in the area of machine learning. For such expressions, it is often necessary to obtain first and second-order derivatives. In my talk, I will present two algorithmic approaches for computing such matrix and tensor derivatives. Both approaches can be used for symbolic as well as for forward and reverse mode algorithmic differentiation. Experiments show a speedup between two and three orders of magnitude over state-of-the-art frameworks like TensorFlow or PyTorch when evaluating higher-order derivatives.
An online interface to our approach can be found at http://www.MatrixCalculus.org.
|Apr. 16, 3:00 pm||Harnessing second order optimizers from deep learning frameworks |
-Recording, slides, GitHub, PyPI, Read the Docs
Have you ever wanted to use a second order optimizer on code written in TensorFlow or PyTorch? What about optimizing a dictionary of tensors using SciPy minimize? If so, a lot of troublesome glue code was probably needed. For another way, look no further than the dict-minimize package, which takes care of everything and lets users easily optimize objectives implemented in TensorFlow, PyTorch, or JAX.
|Apr. 22, 1:30 pm||Factor Graph Grammars for Probabilistic Modeling |
-Recording, slides / HRG survey, Graph & AMR parsing, MT
Probabilistic graphical models (PGMs) are a powerful method for specifying probability distributions. However, they are unable to represent some of the most popular models used in NLP, such as HMMs and PCFGs (since PGMs assume a fixed number of random variables, while HMMs and PCFGs can generate sentences of arbitrary length). In this talk, I will present factor graph grammars (FGGs), an extension of PGMs that is able to capture these models and many others. A FGG is a type of grammar, called a hyperedge replacement graph grammar, which generates a (possibly infinite) set of factor graphs. Conveniently, inference can be done over the FGG without enumerating all the factor graphs in the set. For finite variable domains (but possibly infinite sets of graphs), a generalization of variable elimination to FGGs allows exact and tractable inference in many situations. This talk will introduce the FGG formalism and present that inference algorithm.
|Apr. 30, 1:00 pm||Gradual tensor typing for practical machine learning in Haskell with Hasktorch |
-Recording, Haskell [Mask], Twitch / Lambda Montreal, Chris Olah’s blog
Hasktorch is a library for neural networks and tensor math in Haskell. It is leveraging the C++ backend of PyTorch for fast numerical computation with GPU support. Hasktorch’s goal is to provide a platform for machine learning using typed functional programming. Gradually typed Hasktorch is a new API for tensors and neural networks that interpolates between the already existing unchecked (that is, untyped) and checked (typed) Hasktorch APIs. Thus far, users have to choose whether they want to commit fully to either typed or untyped tensors and models. The new gradually typed API relaxes this black-and-white tradeoff and makes the decision more granular. In gradually typed Hasktorch, users can choose whether or not they want type checking for every individual type variable, like a tensor’s compute device (e.g. the CPU or a GPU), its precision (Bool, Int64, Float, etc.), or its shape (the names and sizes of its dimensions). Thus, users can enjoy the flexibility of an unchecked API for rapid prototyping, while they can also add as much type checking as they want later on. Alternatively, users can start with fully checked tensor and model types and relax them when and where they get in the way. Thus, gradually typed Hasktorch combines the best of both worlds, of checked and of unchecked Hasktorch. The talk will use simple examples to demonstrate the benefits of gradual typing for tensor math and machine learning.
If you would like to sign up to present or wish to receive updates from
the mailing list, please reach out to the organizer via the following
breandan dot considine at mail dot mcgill dot ca.
A few recent papers on differential, integral and logical calculi are listed for illustrative purposes below, following the general theme of topics we plan to discuss. Other suggested reading materials are also welcome!
Differentiable programming concerns the calculus of infinitesimal change and change propagation through a computation graph. This subject is the original and primary focus of this reading group.
- Getting to the Point. Index Sets and Parallelism-Preserving Autodiff for Pointful Array Programming, Paszke et al. (2021)
- Learners’ languages, Spivak (2021)
- CHAD: Combinatory Homomorphic Automatic Differentiation, Vakar (2021)
- Symbolic and Automatic Differentiation of Languages, Elliott (2021)
- Swift for TensorFlow: A portable, flexible platform for deep learning, Saeta et al. (2021)
- ∂ is for Dialectica: Typing Differentiable Programming, Kerjean and Pédrot (2021)
- λS: Computable semantics for differentiable programming, Sherman et al. (2020)
- Instead of Rewriting Foreign Code for Machine Learning, Automatically Synthesize Fast Gradients, Moses and Churavy (2020)
- Differentiable Weighted Finite-State Transducers, Hannun et al. (2020)
- A simple and efficient tensor calculus, Laue et al. (2020)
- Differentiate Everything with a Reversible Domain-Specific Language, Liu and Zhao (2020)
- Gradient Descent: The Ultimate Optimizer, Chandra et al. (2019)
- The Differentiable Curry, Abadi et al. (2019)
- PyTorch: An Imperative Style, High-Performance Deep Learning Library, Paszke et al. (2019)
- The Simple Essence of Automatic Differentiation, Elliott (2018)
- Automatic differentiation in ML: Where we are and where we should be going, Merriënboer et al. (2018)
- Automatic differentiation in machine learning: a survey, Baydin et al. (2017)
We will also discuss some topics in automatic integration or probabilistic programming, i.e. the study of volume, density and how to derive their exact or approximate quantity.
- Probabilistic Programming Semantics for Name Generation, Sabok et al. (2021)
- Quantitative Equational Reasoning, Bacci, Mardare, Panangaden, and Plotkin (2020)
- Dice: Compiling Discrete Probabilistic Programs for Scalable Inference, Holtzen et al. (2020)
- Bean Machine: A Declarative Probabilistic Programming Language For Efficient Programmable Inference, Tehrani et al. (2020)
- miniKanren as a Tool for Symbolic Computation in Python, Willard (2020)
- PPL Bench: Evaluation Framework For Probabilistic Programming Languages, Kulkarni et al. (2020)
- Probabilistic Circuits: A Unifying Framework for Tractable Probabilistic Models, Choi et al. (2020)
- Probabilistic Programming with CuPPL, Collins and Grover (2020)
- Symbolic Exact Inference for Discrete Probabilistic Programs, Holtzen et al. (2019)
- Automatic Reparameterisation of Probabilistic Programs, Gorinova et al. (2019)
- A domain theory for statistical probabilistic programming, Vákár et al. (2019)
- Deep Learning for Symbolic Mathematics, Lample and Charton (2019)
- Exact Bayesian inference by symbolic disintegration, Shan and Ramsey (2017)
- Simplifying Probabilistic Programs Using Computer Algebra, Carette and Shan (2016)
- Augur: Data-Parallel Probabilistic Modeling, Tristan et al. (2014)
Finally, we plan to discuss some related topics in other logical calculi, including combinatory logic, sequent calculus and other formal systems. This material will be primarily introductory and no prior experience is expected or required.
- A general definition of dependent type theories, Bauer (2020)
- Inductive Types Deconstructed: The Calculus of United Constructions, Monnier (2019)
- An introduction to Differential Linear Logic, Erhard (2016)
- Introduction to the Calculus of Inductive Constructions, Paulin-Mohring (2014)
- Lecture Notes on the Lambda Calculus, Selinger (2013)
- Binary Lambda Calculus and Combinatory Logic, Tromp (2006)
- An introduction to the π-calculus, Parrow (2001)
- The differential lambda-calculus, Ehrhard and Regnier (2001)
- The SKI Combinator Calculus, a universal formal system, O’Donnell
- Lambda calculi with types, Barendregt (1992)
- The calculus of constructions, Coquand and Huet (1986)
- Constructive Mathematics and Computer Programming, Per Martin-Löf (1976)
- On the building blocks of mathematical logic, Schönfinkel (1924)