Astronomer Meredith Rawls was in an astronomy master's program at San Diego State University in 2008 when her professor threw a curveball. “We’re going to need to do some coding,” he said to her class. “Do you know how to do that?”

Not really, the students said.

And so he taught them—at lunch, working around their regular class schedule. But what he meant by “coding” was Fortran, a language IBM developed in the 1950s. Later, working on her PhD at New Mexico State, Rawls decided her official training wasn’t going to cut it. She set out to learn a more modern language called Python, which she saw other astronomers switching to. “It's going to suck,” she remembers telling herself, “but I'm just going to do it.”

And so she started teaching herself, and signed up for a workshop called SciCoder. “I basically lost the better part of a year of standard research productivity time largely due to that choice, to switch my tools,” she says, “but I don't think I could have succeeded without that, either.”

That’s probably true. Rawls’s educational experience is still typical: Fledgling astronomers take maybe one course in coding and then informally learn whatever languages their leaders happen to use, because those are the ones the leaders know how to teach. They usually don't take meaningful courses in modern coding, data science, or their best practices.

But today’s astronomers don't just need to know how stars form and black holes burst. They also need to know how to pry that information from the many terabytes of data that will stream from next-generation telescopes like the Large Synoptic Survey Telescope and the Square Kilometer Array. So they're largely teaching themselves, using a suite of open-source training tools, focused workshops, and fellowship programs that aim to actually prepare astronomers for the universe they’re entering.

Segmentation

Back when telescopes produced less data, astronomers could get by on teaching themselves. “The old model was you go to your telescope—or you log in remotely because you're fancy—you get your data, you download it on your computer, you make a plot, you write a paper, and you’re a scientist,” says Rawls, who is now a postdoc at the University of Washington. “Now, it's not practical to download all the data.” And “a plot” is laughable. You just try using graph paper to nail down the correlation function that shows the distribution of millions of galaxies (go ahead; I'll wait).
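For the curious, here is roughly what that calculation looks like in code rather than on graph paper. This is a minimal sketch, not anyone's production pipeline. It assumes the simple "natural" estimator (pair counts in the data divided by pair counts in a random catalog, minus one), a toy catalog of fake clustered points standing in for galaxies, and hypothetical function names (`pair_counts`, `correlation_function`) invented for illustration. Real surveys use far faster pair counting and more careful estimators.

```python
import math
import random


def pair_counts(points, edges):
    """Count unique pairs whose separation lands in each [edges[i], edges[i+1]) bin."""
    counts = [0] * (len(edges) - 1)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            for b in range(len(counts)):
                if edges[b] <= d < edges[b + 1]:
                    counts[b] += 1
                    break
    return counts


def correlation_function(data, randoms, edges):
    """Natural estimator: xi = (DD / RR) * normalization - 1, per separation bin."""
    dd = pair_counts(data, edges)
    rr = pair_counts(randoms, edges)
    # Rescale for the different number of possible pairs in each catalog.
    norm = (len(randoms) * (len(randoms) - 1)) / (len(data) * (len(data) - 1))
    return [norm * d / r - 1.0 if r else float("nan") for d, r in zip(dd, rr)]


# Toy data (an assumption for illustration): 20 clumps of 10 fake "galaxies" each.
rng = random.Random(42)
centers = [(rng.random(), rng.random()) for _ in range(20)]
data = [
    (cx + rng.gauss(0, 0.02), cy + rng.gauss(0, 0.02))
    for cx, cy in centers
    for _ in range(10)
]
# Unclustered comparison catalog scattered uniformly over the same unit square.
randoms = [(rng.random(), rng.random()) for _ in range(200)]

edges = [0.0, 0.05, 0.1, 0.2, 0.4]
xi = correlation_function(data, randoms, edges)
for lo, hi, value in zip(edges, edges[1:], xi):
    print(f"{lo:.2f}-{hi:.2f}: xi = {value:+.2f}")
```

With clumpy fake galaxies like these, the smallest-separation bin should come out well above zero, which is the signature of clustering. The double loop is fine for 200 points and hopeless for millions, which is rather the point.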

There are social costs to that inadequate education. First, it gives a boost to people who knew, early, both that they wanted to be astronomers and that astronomy meant typing into your computer all day. You know, the kinds of kids who sat in Algebra I “hacking” their TI-83s, the ones with access to autodidactic materials and the free time to do that didacting. That kind of favoring is a good way to, on average, keep astronomy’s usual suspects (white guys!) on top.

Beyond the social costs, though, lie scientific ones. Let’s say a scientist writes a program that analyzes quakes inside the sun (that happens!). But there’s no documentation on how the program works, and its kludgy, coagulated subroutines are opaque. No second scientist, then, can run that code to see if they get the same result, or if the program actually does what Scientist 1 claims. “Reproducibility is held up as the gold standard for what is real or not,” says Lucianne Walkowicz, an astrophysicist at the Adler Planetarium. “You need the materials upon which the experiment was performed, and you need the tools. Code is the equivalent of our beakers and Bunsen burners.”

Plus, the way astrophysics programming has historically worked is inefficient. Out on overheating desktops across Earth’s universities are dozens of programs that do the same thing—catch those quakes, comb for exoplanets—different research groups having made their own. Instead of applying increasingly refined algorithms to their research problems, ill-trained astronomer-coders sometimes spend their time reinventing the wheel.

Data Drama

Walkowicz wants to help fix these problems before they get worse, which they’re about to. She is the science collaboration coordinator for the Large Synoptic Survey Telescope, which will essentially make a 10-year-long HD movie of the sky, so astronomers can see (and, ideally, understand) what changes from day to day. “Part of the reason we could all get by on being self-taught is that datasets, even when they're on the fairly big side, are pretty small,” says Walkowicz. “They're not as large and complex as the data from LSST will be. Problems will be amplified.”

Knowing this, and knowing that astronomer apprentices are getting essentially the same training astronomers have gotten since always, she and LSST colleagues decided to help prepare those apprentices. The LSST Corporation (LSSTC) Data Science Fellowship program was born, bringing cohorts of students to six weeklong workshops over two years. To select fellows, they use a program called Entrofy, which optimizes diversity within each class.

The idea doesn’t always go over well with professors. “Reactions that I’ve gotten run the gamut from ‘That's a good point, but our students don't have time’ to ‘Stop trying to turn our astronomers into computer scientists,’” says Walkowicz.

But for their part, the students—perhaps more aware of the future of their field than the more senior researchers—feel more like astronomers. “Before being in this program, I already knew my thesis and my thesis hasn't changed,” says Charee Peters, a grad student at the University of Wisconsin, “but I feel more comfortable now being able to approach it. I feel more like a scientist.”

Read more: Wired Magazine