Welcome to the UVic AI Safety Reading Group!
As the name suggests, we focus on AI safety research, including value alignment, interpretability, AI governance, and societal impact. Anyone interested in these topics is more than welcome to join our online meetings, regardless of background knowledge or location!
Every week we select a paper, blog post, or article to read individually and then discuss. Suggestions and conversation outside of the online meeting take place on our Discord channel, but if you don’t have Discord, you’re more than welcome to join our reading group mailing list to receive updates on the weekly reading and the meeting link.
Every Monday at 8pm, online
- Session 18 - Oct 16th: Zoom In: An Introduction to Circuits (from Claim 2 until the end)
- Session 17 - Oct 9th: Zoom In: An Introduction to Circuits (until the end of Claim 1)
- Session 16 - Oct 2nd: 200 Concrete Open Problems in Mechanistic Interpretability: Introduction
- Session 15 - Sept 25th: Preventing an AI-related catastrophe (sections 4 - 6)
- Session 14 - Sept 18th: Preventing an AI-related catastrophe (sections 1 - 3)
- Session 13 - Sept 11th: An AI Pause Is Humanity’s Best Bet For Preventing Extinction
- Session 12 - Aug 8th: OpenAI’s Superalignment team/goal (This week we will discuss this topic generally, so choose whichever reading or podcast you like to learn more about it)
- Session 11 - Aug 1st: (My understanding of) What Everyone in Technical Alignment is Doing and Why (round 2)
- Session 10 - July 25th: (My understanding of) What Everyone in Technical Alignment is Doing and Why
- Session 9 - July 18th: Self-selected skimming from our list of suggested readings. This week we will each choose a few suggested readings from this document to skim. During our discussion we will each pitch some of the readings we surveyed and collectively decide which one sounds the most interesting to read in full for the following week.
- Session 8 - July 11th: LOVE in a simbox is all you need
- Session 7 - July 4th: Self-selected AI-optimist reading(s). This week we will each choose one or more readings that argue against the “AI doomer” view that AGI poses a sufficient risk of existential catastrophe to warrant taking AI safety very seriously.
- Session 6 - June 27th: How likely is deceptive alignment?
- Session 5 - June 20th: Mechanistic Interpretability, Variables, and the Importance of Interpretable Bases
- Session 4 - June 13th: Core Views on AI Safety: When, Why, What, and How
- Session 3 - June 6th: Seeing is Believing: Brain-Inspired Modular Training for Mechanistic Interpretability (round 2) and Alignment of Language Agents
- Session 2 - May 30th: Seeing is Believing: Brain-Inspired Modular Training for Mechanistic Interpretability
- Session 1 - May 23rd: Paul Christiano: Current work in AI alignment