AQDx v3 Metadata Reference Guide
Download: AQDx_metadata_form_v3.yaml (Template). Note: right-click the link and choose "Save Link As..." to download the YAML file.
This document details the fields and definitions required for the AQDx Version 3 Metadata YAML form. Metadata provides essential context for air quality datasets, including project ownership, site locations, instrument specifications, and quality assurance procedures.
To streamline data submission, the metadata is organized into three distinct sections: Overview, Sites, and Instruments. This standardized structure avoids repetitive data entry by allowing submitters to define a monitoring site once, and then link multiple individual instruments to that location.
(Note: Data submitters may enter this information either in a standard three-tab spreadsheet (e.g., Microsoft Excel, Google Sheets) or by editing the YAML file directly in a text editor. See the accompanying guide, Using the AQDx Excel Template and Conversion Script, for step-by-step instructions on data entry.)
Metadata Organization
The schema relies on unique text identifiers to link data across the different sections:
- Sites: Each physical monitoring location is defined once and identified by a unique site_name. This section holds all geographic and location-based data.
- Instruments: Each instrument is identified by a combination of its device_id and the specific parameter_code it measures. This allows you to define different detection limits and calibrations for different pollutants measured by the same physical sensor package.
- The Link: Every instrument entry must include a site_name that exactly matches a location defined in the Sites list.
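The linking rules above can be checked mechanically before submission. The following is a minimal sketch, assuming the Sites and Instruments sections have already been loaded into Python lists of dictionaries. The function name check_links and the sample values are illustrative only and are not part of the AQDx schema.

```python
from collections import Counter

def check_links(sites, instruments):
    """Return a list of human-readable problems found in the site/instrument links."""
    problems = []

    # Each site_name should be defined exactly once.
    site_counts = Counter(site["site_name"] for site in sites)
    for name, count in site_counts.items():
        if count > 1:
            problems.append(f"site_name {name!r} is defined {count} times")

    # Each (device_id, parameter_code) pair should appear exactly once.
    pair_counts = Counter((i["device_id"], i["parameter_code"]) for i in instruments)
    for (device, parameter), count in pair_counts.items():
        if count > 1:
            problems.append(f"instrument ({device!r}, {parameter!r}) is defined {count} times")

    # Every instrument must point at an existing site (exact, case-sensitive match).
    for inst in instruments:
        if inst["site_name"] not in site_counts:
            problems.append(
                f"instrument {inst['device_id']!r} references unknown site_name {inst['site_name']!r}"
            )
    return problems

# Hypothetical sample entries: one sensor package measuring two parameters.
sites = [{"site_name": "Denver Rooftop A"}]
instruments = [
    {"device_id": "sensor-001", "parameter_code": "88101", "site_name": "Denver Rooftop A"},
    {"device_id": "sensor-001", "parameter_code": "44201", "site_name": "denver rooftop a"},
]
for problem in check_links(sites, instruments):
    print(problem)
```

In this sample the second instrument is reported because its site_name differs from the defined site in letter case, which an exact match does not tolerate.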
Data Quality Documentation
Data quality information is divided into two levels to accurately reflect different monitoring practices:
- Dataset-Level Quality: Broad procedures that apply to the entire project (e.g., QAPP links, general data review workflows, project-wide automated QC).
- Instrument/Parameter-Level Quality: Specific performance metrics (e.g., precision, bias, detection limits, and correction methods) that apply uniquely to a specific parameter measured by a specific device.
1. Dataset Header
These root-level fields define the entire submission package and ensure the metadata maps correctly to the accompanying row-level data file.
| Field | Type | Description |
|---|---|---|
| dataset_id | String | Required. Unique identifier for the dataset. Must exactly match the dataset_id in your tabular data file. |
| aqdx_metadata_version | String | Required. Version of the metadata schema used (e.g., "3.0"). |
| aqdx_data_version | String | Required. Version of the attached tabular data format (e.g., "3.0"). |
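As an illustration of these header rules, the sketch below checks that the three fields are non-empty strings and that the dataset_id values found in the tabular file match the metadata exactly. It assumes the YAML form has already been parsed into a dictionary; check_header and the sample identifiers are hypothetical names. One practical point: an unquoted version such as 3.0 is typically read by YAML parsers as a number, so quoting it as "3.0" keeps it a string.

```python
REQUIRED_HEADER_FIELDS = ("dataset_id", "aqdx_metadata_version", "aqdx_data_version")

def check_header(metadata, tabular_dataset_ids):
    """Return problems with the root-level header fields.

    metadata: dict parsed from the YAML form.
    tabular_dataset_ids: dataset_id strings read from the tabular data file.
    """
    problems = []
    for field in REQUIRED_HEADER_FIELDS:
        value = metadata.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"{field} must be a non-empty string, got {value!r}")

    # The match is exact: case and whitespace differences count as mismatches.
    expected = metadata.get("dataset_id")
    mismatched = sorted(v for v in set(tabular_dataset_ids) if v != expected)
    if mismatched:
        problems.append(f"tabular dataset_id values {mismatched} do not match {expected!r}")
    return problems

# Hypothetical example: the version was left unquoted and parsed as a float,
# and one tabular row uses different letter case.
metadata = {
    "dataset_id": "example_pm25_2024",
    "aqdx_metadata_version": 3.0,
    "aqdx_data_version": "3.0",
}
for problem in check_header(metadata, ["example_pm25_2024", "EXAMPLE_PM25_2024"]):
    print(problem)
```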
2. Overview & Data Steward
This section provides context about the organization responsible for the data.
| Field | Type | Description |
|---|---|---|
| data_steward_name | String | Required. Max 64 chars. Must exactly match data_steward_name in the tabular data. |
| contact_name | String | Required. First and Last name. Max 64 chars. |
| contact_email | String | Required. Max 64 chars. |
| contact_phone | String | Optional. Phone number. |
| organization_type | Integer | Required. 1-Gov, 2-NGO, 3-Community, 4-Academic, 5-Industry, 6-Consulting, 7-Education, 8-Other. |
| organization_name_full | String | Required. Full name (e.g., Colorado Department of Public Health and Environment). Max 128 chars. |
| address | String | Optional. Physical address of the organization. Max 128 chars. |
| last_update_date | Integer | Required. Date the metadata was last modified. Format: YYYYMMDD. |
| is_regulatory_data | Integer | Required. Is this dataset regulatory? 1 (Yes) or 0 (No). |
| data_abstract | String | Optional. Brief description of the dataset/project. Max 500 chars. |
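The constraints in this table (character limits, code ranges, and the YYYYMMDD date format) lend themselves to a simple pre-submission check. The following sketch assumes the Overview fields are available as a Python dictionary. The helper names are hypothetical, and the check is not an official AQDx validator.

```python
from datetime import datetime

# Character limits taken from the Overview table above.
MAX_LENGTHS = {
    "data_steward_name": 64, "contact_name": 64, "contact_email": 64,
    "organization_name_full": 128, "address": 128, "data_abstract": 500,
}

def is_plain_int(value):
    """True for integers, excluding booleans (which Python treats as ints)."""
    return isinstance(value, int) and not isinstance(value, bool)

def is_valid_yyyymmdd(value):
    """True if value is an integer such as 20240131 naming a real calendar date."""
    if not is_plain_int(value) or len(str(value)) != 8:
        return False
    try:
        datetime.strptime(str(value), "%Y%m%d")
    except ValueError:
        return False
    return True

def check_overview(overview):
    """Return a list of problems found in the Overview fields."""
    problems = []
    for field, limit in MAX_LENGTHS.items():
        value = overview.get(field)
        if isinstance(value, str) and len(value) > limit:
            problems.append(f"{field} exceeds {limit} characters")
    org_type = overview.get("organization_type")
    if not is_plain_int(org_type) or not 1 <= org_type <= 8:
        problems.append("organization_type must be an integer from 1 to 8")
    if not is_plain_int(overview.get("is_regulatory_data")) or overview["is_regulatory_data"] not in (0, 1):
        problems.append("is_regulatory_data must be 1 (Yes) or 0 (No)")
    if not is_valid_yyyymmdd(overview.get("last_update_date")):
        problems.append("last_update_date must be a valid date in YYYYMMDD format")
    return problems
```

For example, a last_update_date of 20240230 would be flagged because February 30 is not a real date, even though it has eight digits.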
3. Dataset-Level Quality
General quality assurance procedures applied to the dataset as a whole.
| Field | Type | Description |
|---|---|---|
| automated_qc_applied | Boolean | Required. true or false. Should align with 'auto_qc' in the tabular data. |
| automated_qc_methods | String | Optional. Comma-separated exact options: bounds check, sticking values, comparison to other signals, low values check, high values check, completeness, comparison to co-located duplicate sensor or sample, automated bump checks, automated zero checks, other algorithmic checks. |
| automated_qc_description | String | Optional. Free text summary of automated QC. |
| data_review_undergone | Boolean | Required. true or false. Should align with 'review_level_code' in the tabular data. |
| data_review_methods | String | Optional. Comma-separated options: qualitative review, quantitative review, external party reviewing - [add here], based on established methods associated with an official Program - [add name here]. |
| data_review_description | String | Optional. Free text summary of the review process. |
| official_monitoring_programs | String | Optional. E.g., US EPA NATTS Program. |
| other_processing_desc | String | Optional. Any other processing users should be aware of. |
| useful_links | String | Optional. Comma-separated URLs (e.g., project webpages, QAPPs, publications, procedures). |
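Because automated_qc_methods expects exact option strings separated by commas, a small check can catch typos or capitalization differences. This is a minimal sketch with hypothetical helper names. It treats the match as case-sensitive, which is an assumption about how "exact" is interpreted.

```python
# Exact options listed for automated_qc_methods in the table above.
AUTOMATED_QC_OPTIONS = {
    "bounds check", "sticking values", "comparison to other signals",
    "low values check", "high values check", "completeness",
    "comparison to co-located duplicate sensor or sample",
    "automated bump checks", "automated zero checks", "other algorithmic checks",
}

def split_methods(text):
    """Split a comma-separated field into trimmed, non-empty entries."""
    return [part.strip() for part in text.split(",") if part.strip()]

def unknown_qc_methods(text):
    """Return the entries that do not exactly match a listed option."""
    return [m for m in split_methods(text) if m not in AUTOMATED_QC_OPTIONS]

print(unknown_qc_methods("bounds check, Completeness, sticking values"))
```

Here the capitalized "Completeness" is the only entry returned, since the other two match listed options exactly.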
4. Sites
Defines the physical locations of the monitoring stations. Provide one entry per site.
Special Case: Handling Mobile and Wearable Data
AQDx fully supports mobile platforms (e.g., regulatory vans) and wearable sensors. Because the precise GPS coordinates for each mobile measurement are recorded in your tabular data file, the Sites metadata entry is used to define the general study area or the mobile platform itself.
If your data is mobile or wearable, fill out the Sites section using these guidelines:
- site_name: Name the route, the bounding area, or the platform (e.g., "Denver Mobile Van 1", "I-25 Route", "Wearable Subject A").
- latitude / longitude: Enter the centroid (center point) of your study area, the starting point of your route, or the "home base" of the instrument.
- surroundings_type: Choose the code that best represents the overall area, or select 10 (Mixed) if the route covers diverse environments.
(Note: Setting monitoring_approach on the Instruments tab to 4 (Mobile) or 5 (Wearable) signals to data users and databases that the exact timestamps and moving coordinates are found in your tabular data.)
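One way to obtain a centroid for the latitude / longitude fields is to average the GPS points already present in the tabular data. The sketch below is a rough approach under stated assumptions: it uses a plain arithmetic mean, which is reasonable for a compact study area but not for routes that cross the 180° meridian or span very large distances. The function name route_centroid and the sample coordinates are hypothetical.

```python
def route_centroid(points):
    """Return the (latitude, longitude) mean of a list of (lat, lon) pairs.

    Uses a simple arithmetic mean, rounded to 6 decimal places.
    """
    if not points:
        raise ValueError("at least one GPS point is required")
    lat = sum(p[0] for p in points) / len(points)
    lon = sum(p[1] for p in points) / len(points)
    return round(lat, 6), round(lon, 6)

# Hypothetical route with two recorded positions.
print(route_centroid([(39.0, -105.0), (40.0, -104.0)]))
```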
| Field | Type | Description |
|---|---|---|
| site_name | String | Required. Unique identifier for the location. Max 64 chars. |
| latitude / longitude | Decimal | Required. Coordinates in decimal degrees. |
| original_gis_datum | String | Required. Expected coordinate system: WGS84. Use this field to indicate if the data was converted to WGS84. |
| address | String | Optional. Physical address of the site. |
| state_code / county_code | Integer | Required. AQS state and county codes. |
| site_owner | String | Required. Person or organization owning the site. Max 128 chars. |
| site_photos_url | String | Optional. Link to site imagery. Max 200 chars. |
| surroundings_type | Integer | Required. 1-Urban, 2-Rural, 3-Suburban, 4-Industrial, 5-Residential, 6-Ag, 7-Natural, 8-Rec, 9-Water, 10-Mixed, 11-Other. |
| nearby_sources | String | Optional. Free text description of nearby pollution sources. |
| reg_aqs_id | Integer | Required if regulatory. 9-digit AQS monitoring location code. |
| reg_monitoring_scale | Integer | Required if regulatory. 1-Micro, 2-Middle, 3-Neighborhood, 4-Urban, 5-Regional, 6-National, 7-Global. |
| reg_site_type | Integer | Required if regulatory. Objective category code. |
| reg_groundcover | Integer | Required if regulatory. Dominant groundcover code. |
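The site rules above combine range checks with fields that become required only for regulatory datasets. The following sketch shows one way to express that, assuming each site is a dictionary with separate latitude and longitude keys. The function name check_site is hypothetical, and the geographic ranges are the standard bounds for decimal degrees rather than something stated in the table.

```python
# Fields marked "Required if regulatory" in the Sites table.
REG_SITE_FIELDS = ("reg_aqs_id", "reg_monitoring_scale", "reg_site_type", "reg_groundcover")

def check_site(site, is_regulatory):
    """Return a list of problems for one Sites entry."""
    problems = []
    lat, lon = site.get("latitude"), site.get("longitude")
    if not isinstance(lat, (int, float)) or not -90 <= lat <= 90:
        problems.append("latitude must be decimal degrees between -90 and 90")
    if not isinstance(lon, (int, float)) or not -180 <= lon <= 180:
        problems.append("longitude must be decimal degrees between -180 and 180")
    surroundings = site.get("surroundings_type")
    if not isinstance(surroundings, int) or not 1 <= surroundings <= 11:
        problems.append("surroundings_type must be an integer from 1 to 11")
    if is_regulatory:
        for field in REG_SITE_FIELDS:
            if site.get(field) is None:
                problems.append(f"{field} is required for regulatory data")
    return problems

# Hypothetical non-regulatory site: the reg_ fields may be omitted.
site = {"site_name": "Denver Rooftop A", "latitude": 39.74, "longitude": -104.99, "surroundings_type": 1}
print(check_site(site, is_regulatory=False))
print(check_site(site, is_regulatory=True))
```

The same entry passes as non-regulatory but yields four missing-field messages when is_regulatory is true.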
5. Instruments & Parameter-Level Quality
Defines the specific devices, configurations, and chemical/physical parameters measured. Provide one entry per unique combination of device_id and parameter_code.
| Field | Type | Description |
|---|---|---|
| device_id | String | Required. The device identifier. Matches tabular data. |
| parameter_code | String | Required. 5-digit AQS parameter code. Matches tabular data. |
| site_name | String | Required. Must exactly match a site_name defined in the Sites section. |
| manufacturer_name | String | Required. Manufacturer of the instrument. |
| device_model | String | Required. Model name assigned by the manufacturer. |
| firmware_version | String | Optional. Instrument firmware version. |
| method_code | String | Required. 3-digit method code. Matches tabular data. |
| monitor_start_date | Integer | Required. Format: YYYYMMDD. |
| probe_height_m | Decimal | Required. Height above ground in meters. |
| monitoring_approach | Integer | Required. 1-Stationary Cont, 2-Stationary Integrated, 3-Intermittent, 4-Mobile, 5-Wearable. |
| monitoring_objective | Integer | Required. 1-Ambient, 2-Near-source, 3-Fenceline, 4-Community, 5-Personal, 6-Indoor, 7-Other. |
| expanded_objective | String | Required. Free text expanded objective. |
| sampling_frequency_sec | Decimal | Required. Frequency in seconds (e.g., 60, 3600). |
| residence_time_sec | Decimal | Optional. Residence time for reactive parameters. |
| airflow_arc_degrees | Integer | Required. Unrestricted airflow (0–360). |
| instrument_photos_url | String | Optional. URL to instrument installation photos. |
| dist_obstructions_m | Decimal | Required. Distance from obstructions not on roof (meters). |
| dist_roof_obstructions_m | Decimal | Optional. Distance from obstructions on roof (meters). |
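Several instrument fields have a fixed shape (a 5-digit parameter_code, a 3-digit method_code, an arc between 0 and 360). The sketch below checks those shapes for a single entry, assuming it is a dictionary. check_instrument is a hypothetical name. Note that some YAML parsers may read unquoted digit strings as numbers and drop leading zeros, so quoting codes in the YAML file is a sensible precaution.

```python
import re

def check_instrument(inst):
    """Return a list of format problems for one Instruments entry."""
    problems = []
    parameter = inst.get("parameter_code")
    if not (isinstance(parameter, str) and re.fullmatch(r"[0-9]{5}", parameter)):
        problems.append("parameter_code must be a 5-digit string")
    method = inst.get("method_code")
    if not (isinstance(method, str) and re.fullmatch(r"[0-9]{3}", method)):
        problems.append("method_code must be a 3-digit string")
    arc = inst.get("airflow_arc_degrees")
    if not isinstance(arc, int) or not 0 <= arc <= 360:
        problems.append("airflow_arc_degrees must be an integer from 0 to 360")
    approach = inst.get("monitoring_approach")
    if not isinstance(approach, int) or not 1 <= approach <= 5:
        problems.append("monitoring_approach must be an integer from 1 to 5")
    return problems

# Hypothetical entry where method_code lost its leading zero and became a number.
entry = {"device_id": "sensor-001", "parameter_code": "88101", "method_code": 47,
         "airflow_arc_degrees": 270, "monitoring_approach": 1}
print(check_instrument(entry))
```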
Instrument/Parameter-Specific Data Quality
These fields apply only to the specific device and parameter defined in the row.
| Field | Type | Description |
|---|---|---|
| corrections_applied | Boolean | Required. true or false. Matches tabular 'corr_code'. |
| corrections_methods | String | Optional. Comma-separated options: manual correction, automated correction, global correction model applied, unit/device-specific corrections applied, corrections derived from [frequency] co-locations, published model used [add name/description], corrections based on periodic calibration via standards, correction after re-processing/re-analysis. |
| corrections_description | String | Optional. Free text including frequency. |
| detection_limit_methods | String | Optional. Comma-separated options: method detection limit, instrument detection limit, determined empirically, taken from manufacturer specifications, method used - [add name/description]. |
| detection_limit_desc | String | Optional. Free text limit description. |
| precision_quantified | Boolean | Optional. true or false. |
| precision_desc | String | Optional. Metrics (e.g., standard deviation, CV) and dataset used. |
| bias_linearity_quantified | Boolean | Optional. true or false. |
| bias_linearity_desc | String | Optional. Metrics (e.g., simple linear regression) and dataset used. |
| accuracy_error_quantified | Boolean | Optional. true or false. |
| accuracy_error_desc | String | Optional. Metrics (e.g., RMSE) and dataset used. |
| maintenance_procedures_desc | String | Optional. Frequency and date last performed for this device. |
Regulatory Fields
Complete these fields only if is_regulatory_data is set to 1. Otherwise, leave null or blank.
| Field | Type | Description |
|---|---|---|
| reg_monitor_type | String/Int | Optional. AQS monitor type. |
| reg_method_type | String/Int | Optional. AQS method type. |
| reg_network_affiliation | String | Optional. Network affiliation. Max 64 chars. |
| reg_collecting_agency | String | Optional. Collecting agency name. Max 64 chars. |
| reg_agency_code | Integer | Optional. Agency code. |
| reg_analysis_method | String | Optional. Analysis method used. Max 64 chars. |
| reg_analytical_lab | String | Optional. Analytical lab name. Max 64 chars. |
| reg_probe_material | String | Optional. Material of the probe. Max 64 chars. |
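The rule that regulatory fields stay null or blank unless is_regulatory_data is 1 can be checked with a short helper. This sketch assumes an entry is a dictionary and that regulatory fields are those whose names begin with reg_, as in the tables of this guide. The function name is hypothetical.

```python
def unexpected_regulatory_fields(entry, is_regulatory_data):
    """Return the names of populated reg_ fields in a non-regulatory dataset."""
    if is_regulatory_data == 1:
        return []
    return sorted(
        key for key, value in entry.items()
        if key.startswith("reg_") and value not in (None, "")
    )

# Hypothetical non-regulatory entry with one regulatory field filled in by mistake.
entry = {"device_id": "sensor-001", "reg_agency_code": 123, "reg_probe_material": ""}
print(unexpected_regulatory_fields(entry, is_regulatory_data=0))
```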