Preparing and Submitting Tabular Data

The CDS and other astronomical data centers store and distribute astronomical data to promote their use, primarily by professional astronomers.
In order to ensure the scientific quality of the data, we therefore require that the data be related to a publication in a refereed journal, either as tables or catalogues actually published, or as a paper describing the data and their context.

For a quick overview of the guidelines and recommendations for publishing your data at CDS, please have a look at the "Make your data visible" brochure.
See also the Best Practices for Data Publication in the Astronomical Literature (T. Chen, 2022). The article is intended for authors and lays out the good practices expected by journals and data centers.

Training material summarizing how to publish your data in VizieR following the best practices is available in "The journey of your data through the Virtual Observatory and the European Open Science Cloud".

In order to facilitate the usability of the data, and to allow their processing by the data centers, we require that:

the data are described accurately enough to allow an unambiguous interpretation of the data, as well as a comprehension of the context in which the data were acquired and/or processed; a single ascii file, named ReadMe, is designed for this role.
the data are in a format which allows their use by the tools currently in use in our discipline, normally flat ascii files; other formats can be accepted, but are converted into flat files.

A full description of the standard conventions used for the documentation of the catalogues is available at URL http://cds.unistra.fr/doc/catstd.htx. The present document just tries to answer some frequently asked questions about how to prepare the data for their inclusion in the Data Center documents. The following topics are covered:

1 The new submission interface

Since January 2018, the new submission interface has been online; it includes a FITS ingestion procedure to improve the discoverability of images and spectra.

See the publication notes

2 How to prepare the Data files

It is assumed that each component of the data set is stored in a file; each file can represent a table, a spectrum (1-D data), or an image (2-D data). As a general rule, plain ascii data files (also called flat files) are preferred, simply because such files can always be processed. More explicitly, the following formats can be used:

for tables and catalogues: ascii (simple flat files), with their structure (description of columns) detailed in the ReadMe file. Some other data formats can be accepted, but are converted into flat files: LaTeX, FITS, or TSV / CSV. TSV (tab-separated values) and CSV (character-separated values) are presentations in which a dedicated character (the tab in TSV, or a punctuation mark in CSV, typically the semicolon) is used as a column separator; this is one of the formats available for the output of spreadsheets.
What cannot be used: postscript, or the internal document formats of word processors and spreadsheets (word/excel).
for spectra (1-D data): either FITS file(s), or 2-column ascii tables.
What cannot be used: postscript, word/excel documents, GIF or JPEG images.
for images (2-D data): FITS is the preferred format; for images of the sky, the inclusion of the FITS-WCS (World Coordinate System) parameters describing the conversion between celestial coordinates and pixel position is strongly encouraged.
What cannot be used: postscript, word/excel documents.

Therefore: never use postscript files; postscript is a language designed for printers, not for storing scientific data!
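As an illustration of what the conversion from CSV to a flat file involves, here is a minimal sketch using only the Python standard library. The function name `csv_to_flat` and the sample rows are hypothetical, the semicolon separator follows the typical case mentioned above, and left-justifying every column is a simplification; this is not the conversion actually run at CDS.

```python
import csv
import io

def csv_to_flat(csv_text, delimiter=";"):
    """Convert character-separated text into aligned fixed-width lines."""
    rows = [row for row in csv.reader(io.StringIO(csv_text), delimiter=delimiter) if row]
    ncols = max(len(row) for row in rows)
    # Strip cells and pad short rows so every record has the same number of fields.
    rows = [[cell.strip() for cell in row] + [""] * (ncols - len(row)) for row in rows]
    # Each column is as wide as its longest value.
    widths = [max(len(row[i]) for row in rows) for i in range(ncols)]
    lines = []
    for row in rows:
        fields = [cell.ljust(width) for cell, width in zip(row, widths)]
        lines.append(" ".join(fields))
    return lines

# Hypothetical data rows (no header line: columns are described in the ReadMe).
sample = "HD 1;12.345;0.02\nHD 22;9.1;\nHD 333;10.25;0.1\n"
for line in csv_to_flat(sample):
    print(line)
```

Every value of a given column then starts at the same position on every line, which is what makes a column-by-column description in the ReadMe possible.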

A short word about file naming conventions: according to ISO 9660 standard, file names are restricted to 8 + 3 characters: 8 characters in the set [a-z0-9_-], followed by a dot and an extension made of 3 characters with the following conventions: .dat for data files, .fit for FITS files, .tex for TeX/LaTeX files, and .txt for text files (ascii files containing only printable text).
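The naming rule can be checked before submission with a short script. The sketch below is an assumption-laden illustration: the helper `is_valid_name` is hypothetical, it accepts only the four extensions listed above, and it applies to data files only (the ReadMe file itself has no extension).

```python
import re

# 1 to 8 characters from [a-z0-9_-], a dot, then one of the listed 3-character extensions.
NAME_RE = re.compile(r"[a-z0-9_-]{1,8}\.(dat|fit|tex|txt)")

def is_valid_name(filename):
    """Return True if the file name follows the 8 + 3 convention."""
    return NAME_RE.fullmatch(filename) is not None

for name in ["table1.dat", "Table1.dat", "verylongname.dat", "refs.dat"]:
    print(name, is_valid_name(name))
```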

Full details about the files and directories structures can be found in the Adopted Standards for Catalogues document.

The CDS provides tools and services for author submissions:

building the ReadMe and tables:
- cdspyreadme package (Python): pip install cdspyreadme
- anafile package (C)
FITS spectra/images validation service: FITS validator

3 How to fill the `ReadMe` description file

This file is aimed at describing all data files stored in a catalogued data set, and at providing the necessary explanations and references to the stored material.

All catalogues available at CDS and in associated astronomical data centers have such an associated file, and numerous examples can be found on the FTP directories at CDS.

A full description of the conventions used in this ReadMe file can be found in the Standards for Astronomical Catalogues, and a template is readily accessible for all journals. A typical illustration is J/A+A/382/389/ReadMe. Short explanations about how to fill the ReadMe file:

the volume and page numbers: for papers accepted for publication in A&A, but not yet published, these will be added directly at CDS as soon as we get these from the publisher. For papers accepted for publication in other journals, it is recommended to mail them (to cds-cats(at)unistra.fr) when you get these details from the publisher.
the Keywords: part lists the following keywords:
- ADC_Keywords introduces the list of data-related keywords, out of a controlled set
- Keywords: introduces the list of keywords as in the printed publication
Unlike the Keywords: set, which is generally related to the scientific goal of a paper, the ADC_Keywords are strictly related to the tabular material collected in the paper.
the Description: section is expected to describe the context of the data, like the instrumentation used or the observing conditions — it therefore differs from the Abstract which tends to describe the scientific results that the author derived from the data.
the File Summary: section describes the files making up the set: for each file are specified its filename, the length of the longest line (lrecl), the number of records (number of lines), and a caption (short title of the file). Lengthy notes can be added if necessary.
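The two numbers needed for the File Summary: section (lrecl and number of records) can be computed directly from the data file. This is a minimal sketch; the function name `file_summary`, the file name `table1.dat`, and the caption are hypothetical, and the printed line only approximates a summary entry rather than reproducing the exact ReadMe layout.

```python
def file_summary(lines):
    """Return (lrecl, records) for an iterable of text lines."""
    lrecl = 0
    records = 0
    for line in lines:
        record = line.rstrip("\r\n")  # the line terminator is not part of the record
        lrecl = max(lrecl, len(record))
        records += 1
    return lrecl, records

# Hypothetical content of a small data file.
sample = ["HD 1   12.345 0.02\n", "HD 22  9.1\n", "HD 333 10.25  0.1\n"]
lrecl, records = file_summary(sample)
print(f"{'table1.dat':<12}{lrecl:>6}{records:>9}  Sample photometry")
```

For a real file, the same function can be called on an open file object, for example `file_summary(open("table1.dat"))`.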

the Byte-by-byte Description of file: section describes the structure of each of the data files (files with the .dat extension). This description is made in a tabular form, each row describing one field (column) of the data file. The description contains the following columns:

the starting column of the data field

the format of the field as a fortran-like format:

An	for a character column made of n characters;
In	for a column containing an integer number of n digits;
Fn.d	for a column containing a number occupying n characters, with up to d digits in the fractional part;
En.d or Dn.d	for a number using the exponential notation.

the units used in the field; the use of SI units is strongly encouraged, and CGS units should be avoided (for instance, use mW/m² instead of erg/s/cm²).
the label (heading) of the field, made of a single word (no embedded blank); a few basic conventions are used for usual parameters (e.g. positions) and related quantities (e.g. mean errors).
the explanations can start with the following special characters related to some important data characteristics:

* (the asterisk) indicating a lengthy note

[...] (square brackets) indicating data ranges

? (question mark) indicating a possibility of blank or NULL (unspecified) values

the References: section contains the necessary references; the use of the bibcode is strongly encouraged. For large sets of references, it is suggested to gather them into a dedicated reference file named refs.dat.
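To make the byte-by-byte description concrete, the sketch below reads fields from a fixed-width record using a starting column and a fortran-like format, and returns `None` for a blank numeric field (the case announced by the ? marker). The function `parse_field` and the three-column layout are hypothetical and serve only as an illustration; this is not the checking code used at CDS.

```python
import re

# Format letter, width n, and optional number of fractional digits d.
FORMAT_RE = re.compile(r"([AIFED])(\d+)(?:\.(\d+))?")

def parse_field(record, start, fmt):
    """Extract one field; `start` is the 1-based starting column."""
    match = FORMAT_RE.fullmatch(fmt)
    if match is None:
        raise ValueError(f"unsupported format: {fmt}")
    kind, width = match.group(1), int(match.group(2))
    text = record[start - 1:start - 1 + width].strip()
    if kind == "A":
        return text
    if text == "":
        return None  # blank numeric field, as allowed by the '?' marker
    if kind == "I":
        return int(text)
    # F, E and D formats; a fortran 'D' exponent is rewritten as 'E'.
    return float(text.replace("D", "E").replace("d", "e"))

# Hypothetical layout: (starting column, format, label).
columns = [(1, "A6", "Name"), (8, "F6.3", "Vmag"), (15, "F4.2", "e_Vmag")]

for record in ["HD 1   12.345 0.02", "HD 22  9.1"]:
    row = {label: parse_field(record, start, fmt) for start, fmt, label in columns}
    print(row)
```

Running such a reader over every line of a data file is a simple way to confirm that the description and the data agree before submission.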

4 How to deposit the data

If not too bulky, the ascii (text) data files with their ReadMe file can be uploaded from

https://cdsarc.cds.unistra.fr/vizier.submit/

where some basic checks on the ReadMe and data files are performed. The checking procedure is also available as the anafile package, which can be installed with the standard configure and make Linux procedures (man page).

Alternatively (needed for binary files like FITS) you can:

upload the files with their ReadMe via ftp (recommended for large files)

IMPORTANT NOTE: The FTP deposit procedure changed in October 2021.
FTP uploads now require login authentication. A web application is available for authors to obtain a temporary login/password:
https://cds.unistra.fr/ftp/token/
(include the FTP instruction)
e-mail your files to the e-mail address cds-cats(at)unistra.fr if these are not too bulky (< a few Megabytes).
contact us for other possibilities, like download from your site or DVD posting, at

Centre de Données astronomiques
11, rue de l'Université
67000 STRASBOURG, France
cds-cats(at)unistra.fr

5 What happens to your data

At the CDS, some checking procedures are executed to verify the compatibility between the data files and their description. This can lead to interactions with the authors, but we try to keep these to a minimum. Once the data are public, they are accessible as plain files in FTP directories at CDS and other participating data centers (e.g. at NOAJ/ADAC, Japan). The data are also added to the CDS service, with mirrors at CfA/Harvard (USA), NOAJ/ADAC (Japan), IUCAA (India), INASAN (Russia), NAO (China), and IDIA (South Africa).

6 Contacts

For any question related to the preparation of the data, for problems related to non-standard data formatting, or any other difficulty in the management or the transfer of the electronic tables, either send a mail to cds-cats(at)unistra.fr or contact the VizieR team directly.
