dergraduate level (and some participants left comments
to that effect). While graduate students are more likely
to have had a little training, it seems that few gradu-
ate programs offer and/or require CS courses (otherwise
junior astronomers would have a higher level of significant
training). Overall, ∼ 90% of the survey participants have
received only a little bit of training at best, despite all
being software users, and most being writers of their own
software.
3.4. What is in the Astronomer Software Tool Stack?
In this section we consider the most common software
tools for professional astronomers. We refer to the full
set of software tools an astronomer uses as their “stack”.
In the survey form we suggested 19 software tools and
allowed participants to add any options we missed. The
input was edited to standardize spelling and capitaliza-
tion of tools. In total, participants added 64 custom
options. Ten respondents did not provide an answer to
this question. While “C” was an option, “C++” was not
part of our suggestions. Some participants noted in the
comments that they chose “C” even though they actually
use “C++”. For this reason we consider C and C++
together in our analysis. Within the top-20 most used
software tools there are four items that were not on our
original list: C++, Mathematica, gnuplot and awk.
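The standardization step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure actually applied to the survey data: the alias table, the function names `normalize_tool` and `tool_counts`, and the sample responses are all hypothetical.

```python
from collections import Counter

# Hypothetical alias table: maps lower-cased free-text entries to one label.
# "c" and "c++" share a label because the analysis treats them together.
ALIASES = {
    "python": "Python",
    "idl": "IDL",
    "c": "C/C++",
    "c++": "C/C++",
    "gnu plot": "gnuplot",
    "gnuplot": "gnuplot",
}


def normalize_tool(raw):
    """Return a standardized tool label for one free-text entry."""
    key = " ".join(raw.strip().lower().split())  # trim, collapse whitespace
    return ALIASES.get(key, raw.strip())  # unknown entries pass through


def tool_counts(responses):
    """Count respondents per tool; a respondent counts once per tool."""
    counts = Counter()
    for tools in responses:
        # The set removes duplicates, e.g. someone listing both C and C++.
        counts.update({normalize_tool(t) for t in tools})
    return counts


# Made-up responses, for illustration only.
responses = [
    ["Python", "C", "c++"],
    ["python", "Gnu Plot"],
    ["IDL", " C++ "],
]
counts = tool_counts(responses)
print(counts["Python"], counts["C/C++"])  # 2 2
```

Normalizing before counting matters here because a respondent who lists both “C” and “C++” should contribute only once to the combined C/C++ total.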
The overall astronomer stack is rather narrow (Figure
10, first panel). Only ten of the software tools are used
by more than 10% of the survey participants. These
are (from most popular to least popular): Python, shell
scripting, IDL, C/C++, Fortran, IRAF, spreadsheets,
HTML/CSS, SQL and Supermongo. Across all participants
the most common programming language is Python
(67 ± 2%), followed by IDL (44 ± 2%), C/C++ (37 ± 2%)
and Fortran (28 ± 2%). Shell scripting is the second most
popular tool for astronomers (47 ± 2%). The IRAF (Im-
age Reduction and Analysis Facility) environment is used
by 24 ± 1% of the survey participants.
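This section does not restate how the quoted uncertainties are computed. As a rough cross-check, and purely as an assumption, a Poisson counting error of sqrt(n)/N reproduces the ± 2% quoted for Python and IDL. The sketch below uses the user counts given in Section 3.5 (764 and 497) and a total of N = 1142 participants, inferred from the 984 users of either language plus the 158 users of neither. The function name `fraction_with_error` is hypothetical.

```python
import math


def fraction_with_error(n_users, n_total):
    """Return (fraction, error), assuming a Poisson error on the count."""
    fraction = n_users / n_total
    error = math.sqrt(n_users) / n_total  # assumption: sigma_n = sqrt(n)
    return fraction, error


# Counts quoted in Section 3.5; N_TOTAL is inferred as 984 + 158.
N_TOTAL = 984 + 158
for tool, n_users in [("Python", 764), ("IDL", 497)]:
    frac, err = fraction_with_error(n_users, N_TOTAL)
    print(f"{tool}: {100 * frac:.0f} +/- {100 * err:.0f}%")
# Python: 67 +/- 2%
# IDL: 44 +/- 2%
```

Other error models, such as a binomial standard error, would give somewhat smaller values for the same counts, so this agreement is suggestive rather than conclusive.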
Across the different career stages, we notice that se-
nior astronomers have a broader tool stack, i.e. they
utilize a wider variety of tools in their research. Only
eight tools are used by more than 10% of graduate stu-
dents, nine tools are used by more than 10% of post-
docs and 11 tools are used by more than 10% of faculty
and scientists. Python is the most popular tool at all
career levels, and it is most popular among junior re-
searchers. Four out of five graduate students use Python
(80 ± 5%), as do 70 ± 5% of postdocs and half of faculty
and scientists (53 ± 4%). IDL, IRAF and compiled lan-
guages have a more uniform user base across all career
levels. Some tools are favored by particular demographics.
Graduate students have the highest fraction of Matlab
users (11%), while faculty and research scientists dom-
inate HTML/CSS (21%), Supermongo (16%) and Perl
(16%).
Unsurprisingly, software tools depend strongly on the
research area (Figure 11). Without attempting to be ex-
haustive, we note some interesting differences between
fields. Observational astronomers have the highest frac-
tions of IDL (48 ± 2%) and IRAF (31 ± 2%) users. Theo-
retical researchers have the highest fractions of compiled
language users: C/C++ with 56 ± 4% and Fortran with
50 ± 4%. Researchers in instrumentation have a high
fraction of C/C++ (52 ± 6%) and spreadsheet (28 ± 5%)
users. Other tools, however, show little field-to-field variation.
Python use is consistently high across all fields at
60–70%, as is shell scripting at ∼ 50%.
Finally, in Figure 12 we consider the software stack
for researchers in different countries. Researchers in the
USA have the highest fractions of IDL (49 ± 3%) and
IRAF (25 ± 2%) users, while Australia has the lowest
fraction of users of these tools, 32 ± 7% and 12 ± 4%,
for IDL and IRAF respectively. The UK has the highest
fraction of SQL users (21 ± 5%); Germany has the highest
fraction of C/C++ users (48 ± 5%); and Australia has
the highest fraction of Matlab users (13 ± 4%). However,
these results can be strongly influenced by the research
areas represented for each country within our sample so
we caution against drawing far-reaching conclusions.
We can also compare the USA and non-USA survey re-
spondents, since those two samples are comparable in size
(Figure 12, second and sixth panels). Overall, the rankings
and fractions of users of different tools are very similar,
as can be expected given the global mobility of many astronomers.
The only notable exceptions are IDL and R.
The fraction of IDL users in the USA is 10% larger than
that of non-USA participants. The user base of the statistical
package R is reversed: 8 ± 1% of non-USA researchers
choose this option vs. only 3 ± 1% of USA researchers.
Considering the widespread use of R in other scientific
fields, its popularity among astronomers is strikingly low.
3.5. Python vs. IDL?
A recent shift in astronomy concerns the favored
choice of interpreted programming language for day-to-day
analysis work. In the previous section we showed
that Python has overtaken IDL in popularity. This may
not have been true three to five years ago, but today
Python is, by a wide margin, the most popular inter-
preted language in astronomy (at least insofar as this
survey is representative). Still, there is a significant over-
lap between the users of both languages as many people
are either transitioning from one to the other or using
both in their research. In Figure 13 we show a Venn di-
agram of the Python and IDL users. In total 984 (86%)
of the survey participants use either Python or IDL. Of
those, 764 use Python and 497 use IDL. Both are chosen
by 277 participants (25% of all survey participants). This indicates
substantial overlap: 36% of Python users also use IDL
and 55% of IDL users also use Python. Finally, 158 sur-
vey participants (14% of the full sample) chose neither
option.
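The counts in this paragraph are tied together by inclusion-exclusion, which the short Python sketch below makes explicit. The function name `venn_summary` and its dictionary keys are hypothetical, and only the four counts quoted above are used as input. With straightforward rounding, the derived percentages agree with the quoted ones to within about one percentage point.

```python
def venn_summary(n_a, n_b, n_either, n_neither):
    """Derive overlap statistics for two sets from four survey counts."""
    n_both = n_a + n_b - n_either  # inclusion-exclusion
    n_total = n_either + n_neither
    return {
        "both": n_both,
        "total": n_total,
        "only_a": n_a - n_both,
        "only_b": n_b - n_both,
        "frac_a_also_b": n_both / n_a,  # share of A users who also use B
        "frac_b_also_a": n_both / n_b,  # share of B users who also use A
    }


# A = Python users, B = IDL users, with the counts quoted in the text.
summary = venn_summary(n_a=764, n_b=497, n_either=984, n_neither=158)
print(summary["both"], summary["total"])     # 277 1142
print(summary["only_a"], summary["only_b"])  # 487 220
```

The two exclusive regions and the intersection (487 + 220 + 277) sum to the 984 participants who use either language, which gives a quick consistency check on the diagram.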
3.6. Interactive Visualization of Software Tools
To help readers explore how use of the various software
tools overlaps in this multi-dimensional dataset, we provide
an interactive visualization. It is available within the
Authorea version of the paper, by downloading the software
repository described in Section 1, or at this link. In this
visualization, the tools
respondents use are shown as sectors in a radial layout.
Users of multiple tools are represented as stacked sectors:
for example, the fraction of users who use only Python
and IDL is represented as the fraction of the third ring
labeled “idl” with “python” and “idl” as the lower two
layers. Hovering over that sector shows the number of
respondents to the left of the page (for Python and IDL