Pitt-CMU Colloquium: Yuan-Sen Ting (Ohio State University)

December 9, 2024 - 3:30pm

102 Thaw Hall

Expediting Astronomical Discovery with Large Language Models: Progress, Challenges, and Future Directions

---------------------

The expansive, interdisciplinary nature of astronomy, combined with its open-access culture, makes it an ideal testing ground for exploring how Large Language Models (LLMs) can accelerate scientific discovery. In this talk, I will present our recent advances in applying LLMs to real-world astronomical challenges. Through self-play reinforcement learning, we demonstrate how LLM agents can conduct end-to-end research tasks in galaxy spectra fitting, encompassing data analysis, strategy refinement, and outlier detection, effectively mimicking human intuition and deep domain knowledge. Our agent, named Mephisto, rediscovered and analyzed the Little Red Dots, a new class of galaxies recently identified by the James Webb Space Telescope. While autonomous research agents like Mephisto could in principle help analyze all observed sources, the cost of closed-source solutions remains prohibitive for large-scale applications involving billions of objects. To address this limitation, we are developing lightweight, open-source specialized models and evaluating them against carefully curated astronomical benchmarks. Our research shows that specialized 8B-parameter LLMs can match GPT-4o's performance on specific tasks when properly pretrained and fine-tuned. Despite ongoing challenges, we see immense potential in scaling up automated astronomical inference, which could transform how astronomical research is conducted.