Reading Group
Welcome to the UVic AI Safety Reading Group!
As the name suggests, we focus on AI safety research, including value alignment, interpretability, AI governance, and societal impact. Anyone interested in these topics is more than welcome to join our online meetings, regardless of background knowledge or location!
Every week we select a paper, blog post, or article to read individually and then discuss together. Suggestions and conversations outside of the online meeting take place on our Discord channel, but if you don’t have Discord, you’re more than welcome to join our reading group mailing list to receive updates on the weekly reading and the meeting link.
Reading Group Meetings: Every Monday at 8pm, online
Reading Group Mailing List: Sign up
Next Readings:
- Session 18 - Oct 16th: Zoom In: An Introduction to Circuits (from Claim 2 until the end)
- Session 17 - Oct 9th: Zoom In: An Introduction to Circuits (until the end of Claim 1)
Reading Session History:
- Session 16 - Oct 2nd: 200 Concrete Open Problems in Mechanistic Interpretability: Introduction
- Session 15 - Sept 25th: Preventing an AI-related catastrophe (sections 4 - 6)
- Session 14 - Sept 18th: Preventing an AI-related catastrophe (sections 1 - 3)
- Session 13 - Sept 11th: An AI Pause Is Humanity’s Best Bet For Preventing Extinction
- Session 12 - Aug 8th: OpenAI’s Superalignment team/goal (This week we will discuss this topic generally, so choose whichever reading or podcast you wish in order to learn more about it)
- Session 11 - Aug 1st: (My understanding of) What Everyone in Technical Alignment is Doing and Why (round 2)
- Session 10 - July 25th: (My understanding of) What Everyone in Technical Alignment is Doing and Why
- Session 9 - July 18th: Self-selected skim reading from our list of suggested readings. This week we will each choose a few suggested readings from this document to skim. During our discussion, we will each pitch some of the readings we surveyed and collectively decide which one sounds the most interesting to read in full the following week.
- Session 8 - July 11th: LOVE in a simbox is all you need
- Session 7 - July 4th: Self-selected AI-optimist reading(s). This week we will each choose one or more readings that push back against the “AI doomer” view that AGI poses a high enough risk of existential catastrophe that we need to take AI safety very seriously.
- Session 6 - June 27th: How likely is deceptive alignment?
- Session 5 - June 20th: Mechanistic Interpretability, Variables, and the Importance of Interpretable Bases
- Session 4 - June 13th: Core Views on AI Safety: When, Why, What, and How
- Session 3 - June 6th: Seeing is Believing: Brain-Inspired Modular Training for Mechanistic Interpretability (round 2) and Alignment of Language Agents
- Session 2 - May 30th: Seeing is Believing: Brain-Inspired Modular Training for Mechanistic Interpretability
- Session 1 - May 23rd: Paul Christiano: Current work in AI alignment