Shubhra Mishra

Hi! I’m Shubhra, a first-year PhD candidate at KTH Royal Institute of Technology in Stockholm, Sweden, advised by Dr. David Broman and Dr. Martin Monperrus and funded by the Wallenberg AI, Autonomous Systems and Software Program. I graduated from Stanford University with a B.S. and M.S. in Computer Science (AI track).

There are two research directions I’m incredibly excited about: automated mathematical discovery (AMD) and reliable code generation. The two areas share two key challenges:

Raising the abstraction. Strong mathematical discoveries generally make mathematics more compressible by building better mathematical abstractions; similarly, good code abstracts away low-level details.
Building intermediate representations that can be iteratively refined and then verifiably compiled. The primary approach to code generation (which is also part of theorem proving in AMD) has been to translate natural language directly into code. I believe that stronger intermediate representations, which can be checked against executable or proof requirements, will be key to making code generation more reliable.

If these problems excite you, please reach out! I would love to chat.

(trains of thought/lines of research inspired by Kartik Chandra)

Publications

2025

A Matter of Interest: Understanding Interestingness Judgments of Math Problems in Humans and LLMs

[Oral] Nordic AI Meet 2025
Shubhra Mishra, Yuka Machino, Gabriel Poesia, Albert Jiang, Joy Hsu, Adrian Weller, Challenger Mishra, David Broman, Joshua B. Tenenbaum, Mateja Jamnik, Cedegao E. Zhang, Katherine M. Collins
[arXiv]

DiagramIR: An Automatic Pipeline for Math Diagram Evaluation

In submission
Vishal Kumar, Shubhra Mishra, Rebecca Hao, Rizwaan Malik, David Broman, Dorottya Demszky
[arXiv]

[2025] From Next-Token to Mathematics: The Learning Dynamics of Mathematical Reasoning in Language Models

[2024] MathCAMPS: Fine-grained Synthesis of Mathematical Problems From Human Curricula

Conference on Language Modeling 2025, NeurIPS Math-AI Workshop 2024
Shubhra Mishra, Gabriel Poesia, Noah Goodman
[Website] [arXiv]

Training Language Models with the Human Curriculum

[Oral] NeurIPS Workshop on Continual and Compatible Foundation Model Updates
Pavan Kalyan, Shubhra Mishra, Satya Lokam, Navin Goyal
Paper coming soon!

2024

An Evaluation Benchmark for Autoformalization

ICLR 2024
Aryan Gulati*, Devanshu Ladsaria*, Shubhra Mishra*, Jasdeep Sidhu*, Brando Miranda
[arXiv]

Families of Harris Graphs

In submission
Shubhra Mishra, Doug Shaw, Francesca Gandini
[Website] [arXiv]

Projects

2024

🏆 Can Symbolic Scaffolding and DPO Enhance Mathematical Problem-Solving Skills in LLMs?

CS 329H: Machine Learning from Human Preferences
🏆 Outstanding Project Award for CS 329H
Paper coming soon!

Improving Counting Abilities in Stable Diffusion Models

CS 468: Topics in Geometric Computing - 3D and 4D Foundation Models
[Paper]

Self-Improvement in Small Language Models

🏆 Self-Improvement for Math Problem-Solving in Small Language Models

CS 224N: Natural Language Processing with Deep Learning
🏆 Outstanding Project Award for CS 224N
[Paper]

Synthetic Data Generation for Visual Math Reasoning

GAMMAS: Improving Mathematical Reasoning in Vision Language Models Through Synthetic Data Generation

CS 231N: Deep Learning for Computer Vision
[Paper]
