Science Keyword Rules

by Lola M. Olsen

The following rules have been used in determining TERMs and the three levels of VARIABLEs (Variable_Level1, Variable_Level2, and Variable_Level3) for the GCMD keywords. GCMD's procedures for modifying TOPICs, TERMs, and VARIABLEs apply these rules so that users can more easily locate Earth science data sets of interest. These VALIDs are expected to remain fairly stable over time, although suggestions for additions and changes will always be considered.

In addition, an uncontrolled level of keywords is available for "detailed variables". This list is uncontrolled except for spelling. Data set producers and DIF writers are encouraged to populate this field with "detailed variables" for the data sets being documented. None of the rules below apply to this uncontrolled set of "detailed variables". [The field will be searchable through a Lucene fielded search.] Be aware that not all keywords currently have data set descriptions behind them.
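As a rough illustration of the levels named above, a keyword can be pictured as a small record with one slot per level. This is only a sketch: the class name `ScienceKeyword`, its field names, and the `path()` helper are hypothetical and are not part of any GCMD software. The " > " separator is borrowed from the keyword paths written out under rule 18.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ScienceKeyword:
    """Hypothetical record holding one keyword at each level."""

    topic: str
    term: str
    variable_level1: Optional[str] = None
    variable_level2: Optional[str] = None
    variable_level3: Optional[str] = None
    # Uncontrolled free text; the rules in this document do not apply to it.
    detailed_variable: Optional[str] = None

    def controlled_levels(self) -> List[str]:
        """Return the controlled levels that are filled in, broadest first."""
        levels = [
            self.topic,
            self.term,
            self.variable_level1,
            self.variable_level2,
            self.variable_level3,
        ]
        return [level for level in levels if level]

    def path(self) -> str:
        """Join the controlled levels with ' > '."""
        return " > ".join(self.controlled_levels())
```

For example, `ScienceKeyword("Atmosphere", "Atmospheric Radiation", "Optical Depth/Optical Thickness").path()` gives the path used in the rule 18 example.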

1. At any level of the keyword taxonomy, all topics, terms, and variables should be chosen to be mutually exclusive, minimizing overlap as much as possible.

2. At any level within the taxonomy, the keywords should be parallel. For example, one would not include a broader or narrower keyword within any one level of the taxonomy.

3. Terms may be prefixed with TOPIC level modifiers, if they do not "stand alone" well.

4. Terms/Variables should be plural when singular vs. plural is in question.

    • Count nouns answering the question, "How many?" are plural: for example, chemical reactions, penguins, ecosystems. (Exceptions to this rule exist on a discipline-specific basis.)

    • Noncount nouns answering the question "How much?", abstract concepts, and unique entities are singular: for example, copper, snow, water, digestion, and conductivity.

5. No "data center-specific" prefixed or suffixed variables should be used.

6. Chemical symbols may be used at Variable_Level2 or Variable_Level3 or as detailed variables.

7. Variables may be prefixed with TERM level modifiers; however, this is not required. TERM modifiers are suggested to identify variables that do not "stand alone" well. A generic variable such as "motion" should not be used if a more accurate and descriptive variable such as "sea ice motion" is what the user will find in the search. [Variables should generally not be prefixed with TOPIC level modifiers, although there are exceptions.]

8. Statistical modifiers may be used at Variable_Level2 or Variable_Level3 or captured as detailed variables.

    • Example: Mean stream discharge

9. Extended modifiers should be reserved for Variable_Level2 or Variable_Level3 or captured in the uncontrolled (detailed variable) keyword list.

    • Example:

      Integrated Precipitable Water Vapor

      "Intercepted" Photosynthetically Active Radiation

10. Scientifically meaningless, overly complex modifiers or internal organization prefixes should be avoided in the variable list.

    • Example:
      "1_BUTENE" and "Langley_8_year_SRB_SW_Radiation" would be appropriate only in the uncontrolled detailed variable list.

11. If a generic VARIABLE has been used to describe a contingent of variables, a repetitive generic term should NOT be used at the same level of the variable list. (When multiple expressions for the same variable exist, the variable level should indicate these as "xxx Expressions", signifying that the expressions following can be used interchangeably with appropriate conversions.)

    • Example: Variable = water vapor.

      Do not include another variable indicating the same quantity, such as "humidity", at the same level. All water vapor-derived values (which can be converted from one expression to another), such as "absolute humidity", "specific humidity", "relative humidity", "vapor pressure", and "mixing ratio", should be listed at the variable level below the common identifier, "Water Vapor Expressions".
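The grouping described in rule 11 can be sketched as a lookup table from a common identifier to its interchangeable expressions. The table name `EXPRESSION_GROUPS` and the function `heading_for` are hypothetical, introduced only for illustration; the single entry is taken from the water vapor example above.

```python
from typing import Dict, List, Optional

# Hypothetical grouping table: common identifier -> interchangeable expressions.
EXPRESSION_GROUPS: Dict[str, List[str]] = {
    "Water Vapor Expressions": [
        "absolute humidity",
        "specific humidity",
        "relative humidity",
        "vapor pressure",
        "mixing ratio",
    ],
}


def heading_for(keyword: str) -> Optional[str]:
    """Return the common identifier a keyword belongs under, or None."""
    wanted = keyword.strip().lower()
    for heading, expressions in EXPRESSION_GROUPS.items():
        if wanted in expressions:
            return heading
    return None
```

A keyword for which `heading_for` returns a heading belongs one level below that heading rather than beside the generic variable.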

12. Duplicate variables should be avoided if one serves as a euphemism or surrogate for another.

    • Possible example: sea ice stage development vs. sea ice form

13. Variable descriptors that add only nebulous information should be avoided. For instance, "lower" in "lower troposphere" raises the question, "How low is low?"

14. If the science community uses terms interchangeably, the more commonly used variable in the field should be chosen.

    • Examples:

      Variable: Use Planetary Boundary Layer (PBL) or Atmospheric Boundary Layer (ABL)

      Reserve "peplosphere" (the ground-based layer of the atmosphere that is affected by convection) for the Detailed Variable level.

15. Modifiers that only describe the spatial domain should generally be reserved for Variable_Level2 or Variable_Level3.

    • Example:

      Variable: Heat Flux

      Variable Level 2 or 3: Global Heat Flux

16. Variables should be mutually exclusive, minimizing overlap as much as possible.

17. Keywords should not be associated with "value judgments". For example, "Air Quality" should be used in preference to "Pollution".

18. Any "slashed" keywords must be clarified so that each side of the slashed word(s) can stand alone for searching by the user.

    • Example:

      Use: Atmosphere > Atmospheric Radiation > Optical Depth/Optical Thickness

      Not: Atmosphere > Atmospheric Radiation > Optical Depth/Thickness
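A rough way to screen for the problem described in rule 18 is sketched below. It rests on a heuristic that is an assumption of this sketch, not a GCMD rule: if the sides of a slash differ in word count, the shorter side probably depends on the longer one for context. The function names are hypothetical, and a flagged keyword still needs human review.

```python
from typing import List


def split_keyword_path(path: str) -> List[str]:
    """Split 'A > B > C' into its levels, trimming whitespace."""
    return [level.strip() for level in path.split(">")]


def suspicious_slashed_levels(path: str) -> List[str]:
    """Return levels whose slash-separated sides differ in word count.

    Heuristic only: a side with fewer words than its sibling, as in
    'Optical Depth/Thickness', likely cannot stand alone in a search.
    """
    flagged = []
    for level in split_keyword_path(path):
        sides = [side.strip() for side in level.split("/")]
        if len(sides) < 2:
            continue  # no slash at this level
        word_counts = {len(side.split()) for side in sides}
        if len(word_counts) > 1:
            flagged.append(level)
    return flagged
```

Applied to the two paths in the example, the "Use" path produces an empty list and the "Not" path flags "Optical Depth/Thickness".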

19. An overriding goal is to have the keywords as "parallel" (in terms of "detail") as possible within any one level of the hierarchy.

To suggest a modification to the GCMD keyword valids, please contact one of the GCMD science staff or e-mail your suggestion to GCMD User Support.

