Preparing Datasets

Structuring Datasets

The ACSC uses a standard directory structure to identify files associated with molecular dynamics simulations. Each dataset submitted to the ACSC must conform to the following directory structure:

└── System_Name
    ├── atbrepo.yaml
    ├── control
    │   └── System_Name_control_00001.fileextension
    ├── energy
    │   └── System_Name_energy_00001.fileextension
    ├── final-coordinates
    │   └── System_Name_final-coordinates_00001.fileextension
    ├── input-coordinates
    │   └── System_Name_input-coordinates_00001.fileextension
    ├── forcefield-files
    │   └── System_Name_forcefield-files.fileextension
    ├── log
    │   └── System_Name_log_00001.fileextension
    ├── miscellaneous
    │   └── System_Name_miscellaneous-files.fileextension
    ├── reference-coordinates
    │   └── System_Name_reference-coordinates.fileextension
    ├── topology
    │   └── System_Name_topology.fileextension
    └── trajectory
        └── System_Name_trajectory_00001.fileextension

In all cases listed above, .fileextension should be replaced with the appropriate file extension for the given file type. In most cases, you will not need to modify the file extension.

File names containing 00001 belong to file types that can be numbered sequentially (00001, 00002, and so on) to indicate continuation runs.
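The naming template above can be sketched as a small Python helper. This is only an illustration: the function name dataset_file_name and its parameters are hypothetical, and the five-digit zero padding is inferred from the 00001 shown in the template.

```python
from typing import Optional


def dataset_file_name(system_name: str, label: str, extension: str,
                      index: Optional[int] = None) -> str:
    """Build a file name of the form System_Name_label[_NNNNN].extension.

    `label` is the middle part of the name as shown in the template,
    e.g. "trajectory" or "forcefield-files". Pass `index` only for file
    types that can be numbered sequentially.
    """
    stem = f"{system_name}_{label}"
    if index is not None:
        # Assumed from the template: a zero-padded, five-digit counter.
        stem += f"_{index:05d}"
    # Accept the extension with or without a leading dot.
    return f"{stem}.{extension.lstrip('.')}"
```

For example, dataset_file_name("my_system", "trajectory", "xtc", 2) returns "my_system_trajectory_00002.xtc", which would name the second segment of a continued run.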

Warning

All top-level directories (control, energy, etc.) must be present, even if no files of a given type are provided. The exceptions are the forcefield-files and miscellaneous directories, which are optional. No other files or directories, hidden or otherwise, are permitted at the top level. Use ls -a to confirm that no hidden files are present.
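The top-level rules in this warning can be checked before submission with a short script. The following is a minimal sketch, not an official ACSC tool: the function name check_top_level is hypothetical, and it assumes that atbrepo.yaml is required because it appears in the template.

```python
from pathlib import Path

# Directory names taken from the template above.
REQUIRED_DIRS = {
    "control", "energy", "final-coordinates", "input-coordinates",
    "log", "reference-coordinates", "topology", "trajectory",
}
OPTIONAL_DIRS = {"forcefield-files", "miscellaneous"}
METADATA_FILE = "atbrepo.yaml"  # assumed to be required


def check_top_level(dataset_dir):
    """Return a list of problems found at the top level of a dataset."""
    root = Path(dataset_dir)
    # iterdir() also yields hidden entries such as .DS_Store.
    entries = {p.name: p for p in root.iterdir()}
    problems = []

    for name in sorted(REQUIRED_DIRS):
        if name not in entries or not entries[name].is_dir():
            problems.append(f"missing required directory: {name}")

    if METADATA_FILE not in entries or not entries[METADATA_FILE].is_file():
        problems.append(f"missing metadata file: {METADATA_FILE}")

    allowed = REQUIRED_DIRS | OPTIONAL_DIRS | {METADATA_FILE}
    for name in sorted(set(entries) - allowed):
        problems.append(f"unexpected entry: {name}")

    return problems
```

An empty list means that the top level matches the template. The sketch does not inspect the contents of the subdirectories.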

System_Name

The name of the dataset directory should be a descriptive name for the simulation run. While this is not the name that will be displayed on the ACSC website, it is the name that will be used to generate URLs pertaining to the dataset and should be used when naming all dataset files, as outlined in the template above.

Warning

Dataset directory names must be globally unique across the entire ACSC site and may contain only alphanumeric characters and the symbols _ and -. Choose a name that is descriptive enough to be unlikely to conflict with other datasets.
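The character restriction can be checked locally with a regular expression. The sketch below uses a hypothetical function name, is_valid_dataset_name, and assumes that "alphanumeric" means ASCII letters and digits. It cannot check global uniqueness, which depends on the names already registered with the ACSC.

```python
import re

# Assumption: only ASCII letters and digits count as alphanumeric.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_dataset_name(name: str) -> bool:
    """Return True if `name` uses only letters, digits, '_' and '-'."""
    # fullmatch requires the whole string to match, so an empty name
    # or a name containing any other character is rejected.
    return _NAME_PATTERN.fullmatch(name) is not None
```

Under these assumptions, a name such as lysozyme_300K-run1 passes, while a name containing a space or a dot does not.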

atbrepo.yaml

This is the metadata file for the dataset, the contents of which will be explained in a later section.

control

This directory should contain control, parameter, or settings files for the simulation run or, in the case of continuation runs, segments thereof (e.g., .mdp files for GROMACS, .imd files for GROMOS, .mdin files for AMBER).

Note

At least one control or log file must be provided for a dataset to be included in the ACSC database.

energy

This directory should contain energy trajectory files for the simulation run (or segments thereof, in the case of continuation runs).

final-coordinates

This directory should contain output coordinate files for the simulation run (or segments thereof, in the case of continuation runs).

input-coordinates

This directory should contain input coordinate files for the simulation run (or segments thereof, in the case of continuation runs).

Note

At least one input coordinates file must be provided for a dataset to be included in the ACSC database.

forcefield-files

This directory should contain force field modification files for the simulation run (e.g., .itp files for GROMACS, .ifp files for GROMOS).

log

This directory should contain log files for the simulation run (or segments thereof, in the case of continuation runs).

Note

At least one control or log file must be provided for a dataset to be included in the ACSC database.

miscellaneous

This directory should contain files that do not fit into any of the other categories (e.g., GROMACS .ndx files).

reference-coordinates

This directory should contain reference coordinates for the simulation run (or other coordinate files which do not meet the criteria for input or output coordinates).

topology

This directory should contain topology files for the simulation run.

Note

At least one topology file must be provided for a dataset to be included in the ACSC database.

trajectory

This directory should contain coordinate trajectory files for the simulation run (or segments thereof, in the case of continuation runs).
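The notes in the sections above set three minimum requirements: at least one control or log file, at least one input coordinates file, and at least one topology file. A rough pre-submission check might look like the following. The function names are hypothetical, and the sketch only counts regular files without examining their names or extensions.

```python
from pathlib import Path


def _has_files(directory: Path) -> bool:
    """Return True if `directory` exists and holds at least one regular file."""
    return directory.is_dir() and any(p.is_file() for p in directory.iterdir())


def missing_minimum_files(dataset_dir):
    """Return a list describing which minimum file requirements are unmet."""
    root = Path(dataset_dir)
    problems = []

    # A file in either control or log satisfies the first requirement.
    if not (_has_files(root / "control") or _has_files(root / "log")):
        problems.append("no control or log file")
    if not _has_files(root / "input-coordinates"):
        problems.append("no input-coordinates file")
    if not _has_files(root / "topology"):
        problems.append("no topology file")

    return problems
```

An empty list indicates that the minimum files are present. Whether the ACSC applies further checks to the files themselves is not covered by this sketch.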