Field Dictionary
This page defines the standard vocabulary for the AQDx format. Regardless of whether you are using a tabular file (CSV/Excel) or a JSON stream, these field names and data types serve as the single source of truth.
1. Time & Measurement
These fields define what was measured, when it was measured, and how much was found.
| Field Name | Data Type | Value Required | Description |
|---|---|---|---|
| datetime | ISO 8601 String (29) | Yes | The date and time of the measurement (start of the sampling period). |
| parameter_code | String (5) | Yes | The 5-digit AQS code identifying the pollutant or variable. |
| parameter_value | Decimal (12,5) | No | The actual measured value. |
| unit_code | String (3) | Yes | The 3-digit AQS code identifying the unit of measure. |
| method_code | String (3) | No | The 3-digit code for the measurement method. |
| duration | Decimal (12,3) | Yes | The duration of the sample in seconds. |
| aggregation_code | Integer (1) | Yes | Indicates the mathematical or physical method used to represent the data over time. |
datetime
Format: ISO 8601 String (29)
Example: 2008-10-08T12:00:43-06:00
The date and time of the data value. It must follow the "Date and time with the offset" ISO 8601 format YYYY-MM-DDThh:mm:ssTZD, where TZD is the Time Zone Designator (offset from UTC). See also: https://en.wikipedia.org/wiki/ISO_8601
- Precision: For data reported faster than 1 Hz (sampling intervals shorter than one second), report seconds with a decimal (ss.sss). The maximum allowed precision is milliseconds, which gives a maximum string length of 29 characters.
- Timing: The timestamp corresponds to the beginning of the averaging or sampling period.
- Time Zone: Must include the offset (e.g., -06:00 for CST, +00:00 for UTC). Do not use "Z" for UTC.
- See more details in Data Types & Conventions.
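As an illustration of the rules above, the following is a minimal sketch of producing a datetime string with Python's standard library. The helper name `to_aqdx_datetime` is hypothetical and not part of AQDx; the sketch assumes the input is already a timezone-aware `datetime`.

```python
from datetime import datetime, timedelta, timezone


def to_aqdx_datetime(dt: datetime) -> str:
    """Format a timezone-aware datetime as YYYY-MM-DDThh:mm:ss[.sss]+hh:mm."""
    if dt.tzinfo is None or dt.utcoffset() is None:
        raise ValueError("datetime must be timezone-aware so the offset can be written")
    # Whole seconds stay compact; sub-second data is truncated to milliseconds.
    timespec = "milliseconds" if dt.microsecond else "seconds"
    # isoformat() writes UTC as +00:00 rather than "Z".
    return dt.isoformat(timespec=timespec)


cst = timezone(timedelta(hours=-6))
print(to_aqdx_datetime(datetime(2008, 10, 8, 12, 0, 43, tzinfo=cst)))
print(to_aqdx_datetime(datetime(2008, 10, 8, 18, 0, 43, 250000, tzinfo=timezone.utc)))
```

Note that `isoformat` truncates rather than rounds when limiting the output to milliseconds.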
parameter_code
Format: String (5)
Example: 44201 (Ozone)
A 5-digit numerical code that identifies the parameter being measured. These codes are based on the EPA's Air Quality System (AQS) parameter code library.
- Common Codes:
  - 44201: Ozone (O3)
  - 88101: PM2.5 - Local Conditions
  - 61101: Wind Speed
- Note: Only list one parameter code per data record.
- Note: The parameter code cannot be blank. If a parameter code or method_code is missing for your project, please open a GitHub issue to have the code or codes added.
parameter_value
Format: Decimal (12,5)
Example: 35.5
The actual data value of the specified parameter.
- Precision: Round data to the 5th decimal place if the measured value has more than 5 decimal places of precision.
- Formatting: Do not use commas (e.g., use 1500, not 1,500).
- Can be blank: Leave the field blank if the measurement could not be taken or is missing.
- Zero vs. Null: Distinct meanings must be preserved.
  - 0.0: A valid measurement indicating zero concentration.
  - Blank / Null: The absence of a measurement (e.g., power failure, maintenance, sensor error).
- Validation Rules:
  - If parameter_value is blank, the validity_code must be 9 (Invalid/Missing) for processed data or 0 (Not Validated) for raw data.
  - If parameter_value is blank, it is recommended to provide a qualifier code in qualifier_codes (e.g., AM for Miscellaneous Void) to explain the missing data.
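The blank-value rules above could be checked along the following lines. This is only a sketch: the function name, the `processed` flag, and the message wording are assumptions made for illustration, and values are taken as the raw strings read from a file.

```python
from typing import Optional


def check_blank_value(parameter_value: Optional[str], validity_code: int,
                      qualifier_codes: str, processed: bool) -> list[str]:
    """Return messages for one record; an empty list means nothing to report."""
    messages = []
    is_blank = parameter_value is None or parameter_value.strip() == ""
    if not is_blank:
        return messages  # "0.0" is a real measurement and never counts as blank
    # Blank values need validity_code 9 in processed data, 0 in raw data.
    required = 9 if processed else 0
    if validity_code != required:
        messages.append(f"ERROR: blank parameter_value requires validity_code {required}")
    # A qualifier is recommended (not required), so this is only a warning.
    if not qualifier_codes.split():
        messages.append("WARNING: consider adding a qualifier code (e.g., AM)")
    return messages


print(check_blank_value("", 1, "", processed=True))
```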
unit_code
Format: String (3)
Example: 008 (ppb)
A 3-digit code associated with the units of the measurement. These codes are based on the EPA's Air Quality System (AQS) unit code library.
- Common Codes:
  - 008: Parts per billion (ppb)
  - 001: Micrograms/cubic meter (µg/m³) at 25°C
  - 105: Micrograms/cubic meter (µg/m³) at Local Conditions
  - 017: Degrees Centigrade (°C)
- View Unit Codes
method_code
Format: String (3)
Example: 170 (Met One BAM-1020)
A 3-digit code associated with the reference method used to perform an EPA-designated FRM/FEM or other officially prescribed measurement. These codes are based on the EPA's Air Quality System (AQS) method code library.
Value Required: No
- Regulatory-Grade Instruments (e.g., FRM/FEM): Strongly Recommended. If the instrument is an EPA Federal Reference Method (FRM), Federal Equivalent Method (FEM), or a Compendium Method, you should provide the specific 3-digit code defined by the EPA (e.g., 170 for BAM-1020).
- Low-Cost Sensors / Non-Regulatory: Leave Blank. If the device is a low-cost sensor or has not been EPA-designated, leave this field blank.
- View Method Codes
duration
Format: Decimal (12,3)
Example: 3600 or 1.500
The duration of the sampling period or mathematical aggregation window (see aggregation_code below) in seconds.
General Guidelines
- Integers Preferred: For standard intervals, use whole numbers without decimal padding (e.g., use 3600, not 3600.000).
- Precision: Fractional seconds are allowed up to milliseconds (3 digits after the decimal point) if high-precision timing is required.
- Variable Duration: If a sensor's physical sampling duration varies slightly row-to-row (e.g., fluctuating between 90 and 92 seconds), you may use a consistent, approximated nominal duration (e.g., 90) for the entire dataset to reduce computational burden and make the data easier to query and compare.
- Instantaneous / Unknown: Use 0 to explicitly flag a measurement where the duration is near-instantaneous, highly inconsistent (sub-minute), or completely unknown. This may be especially relevant to data coming from a low-cost sensor rather than a precision regulatory or research-grade monitor.
For long-term aggregations (like months or years), standard generalized timeframes are recommended to maintain consistency across leap years and varying month lengths, unless the exact physical duration of a specific period is required.
Common Duration Values:
- 0 = Instantaneous / Unknown (Low-Cost Sensor flag)
- 60 = 1 Minute
- 3600 = 1 Hour
- 86400 = 1 Day (24 Hours)
- 604800 = 1 Week (7 Days)
- 2592000 = 1 Month (Standardized 30-Day Period)
- 31536000 = 1 Year (Standardized 365-Day Period)
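One way to apply the "integers preferred, milliseconds at most" guidance when writing a file is sketched below. The helper name `format_duration` is hypothetical, and padding fractional values to three digits (as in the 1.500 example) is an assumption rather than a stated requirement.

```python
from decimal import Decimal, ROUND_HALF_UP


def format_duration(seconds) -> str:
    """Render a duration in seconds: whole numbers unpadded, fractions to milliseconds."""
    d = Decimal(str(seconds)).quantize(Decimal("0.001"), rounding=ROUND_HALF_UP)
    if d < 0:
        raise ValueError("duration cannot be negative")
    if d == d.to_integral_value():
        return str(int(d))  # e.g. 3600, not 3600.000
    return str(d)           # e.g. 1.500


for value in (3600.0, 1.5, "90.1234", 0):
    print(format_duration(value))
```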
aggregation_code
Format: Integer (1)
Example: 1 (Mean)
Indicates the mathematical or physical method used to represent the data over the specified duration.
- 0: None / Native Resolution. The data is reported at the instrument's native sampling frequency.
- 1: Mean (Average). The mathematical average of measurements over the specified duration.
- 2: Time-Integrated (Physical). A single physical sample accumulated over the duration (e.g., a 24-hour PM filter or VOC canister).
- 3: Maximum. The highest single value recorded within the duration.
- 4: Median (50th Percentile). The middle value of the measurements within the duration.
- 5: Rolling / Moving Average. A mathematical average calculated over a moving window (e.g., an 8-hour rolling ozone average). Timestamp is assumed to be the beginning of the window unless otherwise specified in the metadata form.
- 6: Spatial Aggregation. Data grouped by a geographic boundary rather than strictly by time (e.g., binning mobile data into 50-meter road segments).
  - Specific details of the method used must be documented in the accompanying AQDx metadata form.
  - For duration, report the total integration time (sum of durations) of all observations included in the spatial bin.
- 7: Other. Any aggregation method not listed above, including specific statistical percentiles (e.g., 90th, 98th). Specific details of the method used must be documented in the accompanying AQDx metadata form.
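To make the duration rule for spatial aggregation (code 6) concrete, here is a small sketch that bins mobile observations by a segment identifier. The `segment_id` key, the tuple layout, and the use of a simple mean within each bin are assumptions for illustration; the actual binning method is whatever the metadata form documents.

```python
from collections import defaultdict


def aggregate_by_segment(observations):
    """observations: iterable of (segment_id, parameter_value, duration_seconds)."""
    bins = defaultdict(list)
    for segment_id, value, duration_s in observations:
        bins[segment_id].append((value, duration_s))

    records = []
    for segment_id, items in bins.items():
        values = [v for v, _ in items]
        records.append({
            "segment_id": segment_id,
            "parameter_value": round(sum(values) / len(values), 5),
            # duration is the total integration time of all observations in the bin
            "duration": sum(d for _, d in items),
            "aggregation_code": 6,
        })
    return records


obs = [("seg-1", 10.0, 1), ("seg-1", 14.0, 1), ("seg-2", 8.0, 2)]
for record in aggregate_by_segment(obs):
    print(record)
```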
2. Location
These fields define where the measurement was taken.
| Field Name | Data Type | Value Required | Description |
|---|---|---|---|
| latitude | Decimal (9,5) | Conditional | Latitude in WGS84 decimal degrees. |
| longitude | Decimal (9,5) | Conditional | Longitude in WGS84 decimal degrees. |
| elevation | Decimal (8,2) | No | Elevation of the device in meters. |
latitude
Format: Decimal (9,5)
Example: 39.7392
Latitude in decimal degrees (WGS84).
- Positive: North of the Equator.
- Negative: South of the Equator.
- Precision: Report to the 5th decimal place (~1 meter precision).
- Conditional: latitude is required except in the following circumstances:
  - Mobile Monitoring Exception: latitude may be left blank due to a temporary loss of GPS fix on a mobile platform, but you must include the IG (GPS invalid) code in the qualifier_codes field.
longitude
Format: Decimal (9,5)
Example: -104.9903
Longitude in decimal degrees (WGS84).
- Positive: East of the Prime Meridian.
- Negative: West of the Prime Meridian (e.g., USA).
- Precision: Report to the 5th decimal place.
- Conditional: longitude is required except in the following circumstances:
  - Mobile Monitoring Exception: longitude may be left blank due to a temporary loss of GPS fix on a mobile platform, but you must include the IG (GPS invalid) code in the qualifier_codes field.
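The conditional rule for latitude and longitude can be sketched as a per-record check. The function name and message text are hypothetical; coordinates are taken as the raw strings from the file, and the range limits are the usual bounds for decimal degrees.

```python
def check_coordinates(latitude, longitude, qualifier_codes):
    """Return a list of problems for one record; an empty list means the record passes."""
    problems = []
    qualifiers = qualifier_codes.split()
    for name, raw, limit in (("latitude", latitude, 90.0), ("longitude", longitude, 180.0)):
        if raw is None or raw.strip() == "":
            # A blank coordinate is only acceptable when flagged with IG (GPS invalid).
            if "IG" not in qualifiers:
                problems.append(f"{name} is blank but qualifier_codes does not include IG")
            continue
        if abs(float(raw)) > limit:
            problems.append(f"{name} {raw} is outside the valid range")
    return problems


print(check_coordinates("39.73920", "-104.99030", ""))
print(check_coordinates("", "", "IG"))
print(check_coordinates("", "-104.99030", ""))
```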
elevation
Format: Decimal (8,2)
Example: 1609.3
Elevation of the device in meters above mean sea level (MSL). Can be left blank.
3. Device & Organization
These fields define who collected the data and with what hardware.
| Field Name | Data Type | Value Required | Description |
|---|---|---|---|
| data_steward_name | String (64) | Yes | The organization responsible for the data. |
| device_id | String (64) | Yes | An internal identifier used by the data steward. |
| measurement_technology_code | String (14) | Yes | Categorizes the physical measurement technology of an instrument. |
| instrument_classification | Integer (1) | Yes | Regulatory standing or operational tier of the instrument. |
| dataset_id | String (128) | Yes | Unique identifier to connect dataset to metadata form. |
data_steward_name
Format: String (64)
Example: CityOfDenver or city_of_denver
Name of the party responsible for data oversight.
- Formatting: Use PascalCase or snake_case to separate words.
- Forbidden: Do not use commas, spaces, or periods.
device_id
Format: String (64)
Example: A123-Sensor-01
An internal identifier used by the data steward to uniquely distinguish this specific instrument within the dataset. Its primary purpose is to link the measurements in the data file to the instrument's details in the accompanying metadata form.
- Internal Identification: This is a localized text field to differentiate measurements. It is not intended to be a globally searchable, standardized hardware ID.
- Recommended Convention: We recommend using a combination of [device model]-[ID#]-[sensor type or operating principle]. The ID# can be an internal project number, a device serial number, or a device MAC address.
  - Example: atmotube_pro_01_ls (where "ls" stands for light scattering)
  - Example: nodeA_macaddress_pms5003
- Other Valid Formats: A simple hardware serial number, MAC address, or custom project ID (e.g., Monitor_1) is also acceptable.
- Allowed Characters: Spaces and hyphens.
- Forbidden: Do not use commas or periods.
measurement_technology_code
Format: String (14)
Examples: DA-00-SC, ICep-GCca-MS
A structured, hierarchical code that chronologically categorizes the physical journey of a sample from acquisition to the final signal.
- Acquisition: How the sample is acquired (e.g., in-situ, canister, remote sensing, etc.)
- Conditioning: The most significant physical or chemical treatment step applied to the sample (e.g., gas chromatography, de-humidification, thermal desorption, chemical ionization, etc.)
- Detection: The actual method of detection (e.g., mass spectrometry, PID sensor)
Code Structure: [Acquisition]-[Conditioning]-[Detection]
Each of the three steps requires a 2-character broad uppercase code (XX). You can optionally append two lowercase characters (xx) to designate a specific hardware subtype (e.g., ICep for Integrated Canister, electropolished). The blocks must be separated by hyphens.
Key Rules:
- System Boundary: The code describes the end-to-end measurement system. For integrated or passive methods, it includes both the field acquisition AND the downstream laboratory analysis. Deep analytical nuances belong in the accompanying AQDx Metadata Form (YAML).
- The "00" Conditioning block: Use 00 for the Conditioning block ONLY if no intentional physical or chemical transformation occurred before the detector. If the system intentionally changes humidity, removes interferents, selects a size fraction, or chemically ionizes the sample, it is not 00.
- Conditioning Priority: If multiple conditioning steps exist, encode the one that most constrains what physically reaches the detector (e.g., a size cut or preconcentration) and document the rest in the YAML metadata form.
Note: You must use approved vocabulary. Please refer to the Measurement Technology Code Builder Tool in the code lookup tables to find the exact tokens permitted for your setup.
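The code structure described above lends itself to a quick shape check before looking tokens up in the approved vocabulary. The sketch below only verifies the layout (three hyphen-separated blocks, two uppercase characters plus an optional two-character lowercase subtype); it does not know which tokens are approved, and treating 00 as valid only in the Conditioning position is an assumption drawn from the rules above.

```python
import re

# One block: two uppercase letters, optionally followed by two lowercase letters.
_BLOCK = r"[A-Z]{2}(?:[a-z]{2})?"
_PATTERN = re.compile(rf"({_BLOCK})-(00|{_BLOCK})-({_BLOCK})")


def split_technology_code(code: str):
    """Return (acquisition, conditioning, detection) or raise ValueError."""
    match = _PATTERN.fullmatch(code)
    if match is None:
        raise ValueError(f"not a well-formed measurement_technology_code: {code!r}")
    return match.groups()


print(split_technology_code("DA-00-SC"))
print(split_technology_code("ICep-GCca-MS"))
```

The longest string this pattern accepts is 14 characters, which matches the String (14) format.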
instrument_classification
Format: Integer (1)
Example: 3
Indicates the objective regulatory standing or operational tier of the instrument generating the data.
Allowed Values:
- 1 = Regulatory-Grade Monitor: The instrument is operating under a formal, active designation from a recognized environmental authority (e.g., the US EPA) for the specific parameter being reported. This would include FRM/FEM-type instruments being operated according to their respective EPA methods as well as sampling taking place under a program such as the NATTS (National Air Toxics Trends Sites) Program, adhering to EPA procedures. Note: If this code is used, the exact FRM/FEM designation should ideally be recorded in the method_code field if applicable.
- 2 = Research-Grade Analytical Monitor: High-fidelity instruments or methods that do not hold a formal regulatory designation but are widely accepted for rigorous scientific study. This includes advanced non-designated continuous monitors, as well as physical samples collected in the field and transported to a laboratory for discrete analytical analysis (e.g., GC/MS on canisters, XRF on filter tape).
- 3 = Consumer-Grade Monitor: Continuous monitors or indicative devices that actively measure ambient air but lack formal regulatory designation or research-grade analytical rigor. These devices are highly valuable for spatial mapping, identifying local trends, and supplemental public awareness.
dataset_id
Format: String (128)
Example: CDPHE_DowntownStation_20260213 or 123e4567-e89b-12d3-a456-426614174000
A unique identifier that explicitly links this specific row of data to its corresponding AQDx dataset-level metadata file (e.g., AQDx_metadata_form_v3.yaml). This exact string must be present on every row of the tabular data file and must perfectly match the dataset_id field defined at the top of the accompanying metadata file.
- Forbidden: Do not use spaces, commas, or special characters other than hyphens (-), underscores (_), and periods (.).
To ensure global uniqueness across the AQDx ecosystem without relying on a central registry, data creators must generate this ID using one of the following three approved methods.
- Method 1: Semantic Namespace (Recommended). Create a self-documenting, human-readable string by combining your organization's metadata fields with high-resolution temporal or spatial identifiers.
  - Formula: [data_steward_name]_[project_or_device_id]_[YYYYMMDD]
  - Single Sensor Example: CleanAirVision_A123-Sensor-01_20260213
    - Note: If you are submitting a dataset with multiple sensors, use the project notation below.
  - Network/Project Example: CDPHE_WinterInversion_20260213
- Method 2: UUID v4. Generate a standard Universally Unique Identifier. This is ideal for automated, programmatic data pipelines.
  - Tooling: Specialists can generate these natively in Python (import uuid; uuid.uuid4()) or R (uuid::UUIDgenerate()).
  - Web Generator: If generating manually, use a trusted standard generator such as https://www.uuidgenerator.net/version4.
  - Example: 123e4567-e89b-12d3-a456-426614174000 (do not use this UUID!)
- Method 3: External DOI / URI. If the dataset is published to an academic or government repository (e.g., Zenodo, Dataverse), use its assigned permanent identifier (DOI) or record number.
  - Example: 10.5281/zenodo.1234567
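Methods 1 and 2 can be sketched in a few lines of Python. The function names are hypothetical; the formula and the `uuid.uuid4()` call are the ones given above.

```python
import uuid
from datetime import date


def semantic_dataset_id(data_steward_name: str, project_or_device_id: str, day: date) -> str:
    """Method 1: [data_steward_name]_[project_or_device_id]_[YYYYMMDD]."""
    return f"{data_steward_name}_{project_or_device_id}_{day:%Y%m%d}"


def uuid_dataset_id() -> str:
    """Method 2: a random UUID v4 rendered as a 36-character string."""
    return str(uuid.uuid4())


print(semantic_dataset_id("CDPHE", "WinterInversion", date(2026, 2, 13)))
print(uuid_dataset_id())
```

Whichever method is used, the resulting string is generated once per dataset and then repeated on every row, so it should be stored alongside the metadata file rather than regenerated.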
4. Quality Control (QC)
These fields describe the level of processing applied to the data. Note that the presence of specific codes does not necessarily indicate higher data quality; rather, these fields are intended to give the end user the details needed to inform their use or interpretation of the data.
| Field Name | Data Type | Value Required | Description |
|---|---|---|---|
| validity_code | Integer (1) | Yes | The assessed validity of the individual measurement. |
| calibration_code | Integer (1) | Yes | Indicates whether the data has been systematically corrected or the instrument has been calibrated. |
| review_level_code | Integer (1) | Yes | Indicates the level of human review the dataset has undergone. |
| detection_limit | Decimal (12,5) | No | Detection limit for the method used to measure parameter_value. |
| qualifier_codes | String (254) | No | Space-separated codes explaining why data was flagged or describing specific events. |
validity_code
Format: Integer (1)
Example: 1 (Valid)
The assessed validity of the individual measurement. Validation extends beyond simple statistical outlier detection; it evaluates physical limits, hardware faults, "sticking" (unchanging) values, sensor degradation, and data completeness. Validation under this code includes both automated and manual processes.
- 0: Validation not performed. Raw data directly from the device. No QC checks have been applied to verify if a blank value is a true outage or a transmission error.
  - Note: Use this code for gaps in raw, real-time streams (parameter_value is blank) where no post-processing has occurred.
- 1: Valid. Data passed all QC (quality control) checks and is considered accurate for analysis.
- 3: Estimated. Data is considered valid, but the value was mathematically derived or interpolated rather than directly measured at this exact timestamp.
- 5: Suspect. Data is physically possible but exhibits anomalous behavior (e.g., unexplained spikes, deviation from neighboring sensors, or operation during extreme weather). There is insufficient evidence to invalidate it entirely, but it should be used with caution.
- 8: QA/QC data. Legitimate measurements taken during quality control procedures, such as zero/span checks, flow audits, or calibration events. While these values are "valid" representations of the instrument's response to a reference standard, they do not represent ambient air quality and must be excluded from environmental statistics (e.g., daily averages, AQI calculations).
- 9: Invalid or Missing. Data that should not be used for analysis. Includes missing values (e.g., power failures, maintenance gaps, lost data), instrument malfunctions, failed range checks, or data failing completeness criteria (e.g., insufficient uptime for an hourly average). Note: if parameter_value is blank in a processed dataset, this code must be used.
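As an example of how a data user might act on these codes, the sketch below computes a mean while leaving out QA/QC (8) and invalid or missing (9) records. Whether to also exclude unvalidated (0) or suspect (5) records is an analysis decision this sketch does not make; the record layout and function name are assumptions.

```python
# Codes that do not represent usable ambient measurements.
EXCLUDED_FROM_AMBIENT_STATS = {8, 9}


def ambient_mean(records):
    """Mean of parameter_value over records that may enter ambient statistics."""
    values = [
        r["parameter_value"]
        for r in records
        if r["validity_code"] not in EXCLUDED_FROM_AMBIENT_STATS
        and r["parameter_value"] is not None
    ]
    return sum(values) / len(values) if values else None


records = [
    {"parameter_value": 10.0, "validity_code": 1},
    {"parameter_value": 400.0, "validity_code": 8},  # span check, not ambient air
    {"parameter_value": None, "validity_code": 9},   # missing
    {"parameter_value": 14.0, "validity_code": 3},   # estimated
]
print(ambient_mean(records))
```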
calibration_code
Format: Integer (1)
Example: 2 (Established)
Indicates the approach and documentation of any post-processing corrections or calibrations applied by the Data Steward. This field tracks adjustments applied to the parameter_value that are in addition to any internal processing performed by a sensor-type device's firmware. It also tracks conventional calibrations applied to higher-grade air monitoring instruments, where direct adjustments to the parameter_value output may be made following a calibration to a standard. Note that the code indicates the type of calibration applied to the data; it does not necessarily reflect the efficacy of the calibration. The performance of the specific calibration method should be elaborated on in the metadata documentation. In addition, "method" here includes all procedures associated with calibration through the deployment and maintenance of the instrument; it is not limited to the specific calibration model applied.
- 0: None / Default. The data is reported exactly as output by the device using the manufacturer's default factory calibration. No post-collection mathematical adjustments have been made.
- 1: Provisional. The data was mathematically adjusted using a custom, project-specific, or experimental method. While the method may be highly effective for the specific project, it has not been demonstrated to be robust and widely applicable across a range of locations and long-term use. This designation is likely to apply to a large proportion of air quality sensor data. If this designation is selected, providing additional details in the metadata form is highly recommended.
- 2: Established. The data was corrected using a robust, widely applicable methodology. To qualify for this code, the entire method's performance must be quantified across a range of locations as well as long-term use (e.g., more than a year). Given the impact variable temperatures, humidities, and presence of confounding pollutants can have on a sensor device's performance, it is important that an "Established" method's performance be understood across a range of conditions. Furthermore, this method should be explicitly documented in a Quality Assurance Project Plan (QAPP), described in a peer-reviewed scientific publication, or accepted for use by a government environmental agency. Note, under this category, a calibration model for individual sensors may rely on local information; however, if the entire set of procedures (e.g., timeframes associated with the deployment, procedures for conducting co-locations, acceptance criteria for reviewing and applying updated calibration factors, etc.) is well-established, then this designation would be appropriate. If this designation is selected, providing additional details in the metadata form is required.
- 3: Conventional. The instrument was directly calibrated against a certified physical reference standard or secondary standard (e.g., physically adjusted using National Institute of Standards and Technology (NIST) traceable zero-air and span gas checks). This designation is likely to include the majority of higher grade research and regulatory instruments that are being calibrated according to official or conventional approaches. This designation also applies where rigorous calibrations and QA (quality assurance) are applied to instruments used to analyze physical samples. If this designation is selected, providing additional details in the metadata form is required.
Calibration method examples and associated codes for air quality sensors:
- Scenario 1: A community group has deployed a sensor network in their neighborhood. These sensors are periodically co-located with regulatory-grade instruments and an academic partner uses this data to develop sensor-specific calibration models to improve the data quality. Data quality is being monitored as the deployment occurs and the team plans to write up a report detailing the data and methods at the conclusion of the study. [CODE = 1]
- Scenario 2: A community group has deployed a sensor network in their neighborhood. The manufacturer has developed procedures to ensure consistent data quality over time, which entail periodic updates to the values in the calibration models being used to improve the sensor data's accuracy. These periodic updates may be based on data collected remotely or via physical co-location. However, the procedures for completing calibrations are predetermined, there are clear thresholds to guide the acceptance of a calibration update, and the performance of the overall method is well-understood across a range of locations. [CODE = 2]
- Scenario 3: A research group is calibrating sensors prior to deployment in a controlled chamber. These sensors are then deployed to locations in the field where their performance is evaluated. The objective of the project is to develop a robust method for calibrating sensors in the lab that applies well to the field. [CODE = 1]
- Scenario 4: A research group is testing a new method of correcting sensor data that uses machine learning. This method is considerably effective at enhancing data quality. However, it has not yet been tested in different locations across different seasons. [CODE = 1]
- Scenario 5: A general correction for a particular sensor has been developed. This correction has been tested across many different urban and rural locations. Furthermore, the performance of this correction model has been studied over long periods of time (e.g., one or more years). This has resulted in a correction approach whose performance is well understood, including its strengths and limitations. An example is the US EPA's extended U.S.-wide correction for PurpleAir sensor data. [CODE = 2]
review_level_code
Format: Integer (1)
Example: 1 (Internal Review)
Indicates the level of review the data has undergone. The levels follow the typical review processes, where data collectors would first review the data internally and then depending on the dataset possibly seek an external review. The highest level of review is "Certified" which follows the EPA requirements and processes for certification and is only applicable to data that has been collected for compliance with NAAQS, using federal reference or equivalent method instruments operated by a Tribal, state, or local air monitoring agency. Providing supporting details on these reviews in the metadata form is recommended.
- 0: Raw. Direct from device, no human review.
- 1: Internal. Reviewed by the data creator/project team.
- 2: External. Audited by an independent third party.
- 3: Certified. Legally certified for regulatory use (requires FRM/FEM).
detection_limit
Format: Decimal (12,5)
Example: 0.50000
Lists the detection limit for the measurement, expressed in the same units as parameter_value (i.e., the record’s unit_code). This field is intended to provide the measurements with additional context. Note that it is recommended that no substitutions or replacements be made with respect to the detection limit (e.g., substituting values less than the detection limit with 0.5*DL). Rather, it is recommended that data be shared alongside the detection limit so the end user can decide how to process the data for a given analysis.
- This field should be left blank/omitted when a detection limit is unknown or not applicable.
- Please note the method used to determine the detection limit in the metadata form included with the submission.
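On the data user's side, the "share the value and the limit, do not substitute" recommendation might look like the sketch below: values are flagged relative to the detection limit but left unchanged. The function name and the `below_dl` key are hypothetical.

```python
def flag_below_detection_limit(records):
    """Add a below_dl flag to each record without altering parameter_value."""
    flagged = []
    for r in records:
        value, limit = r.get("parameter_value"), r.get("detection_limit")
        # The flag is None when either the value or the limit is blank.
        below = None if value is None or limit is None else value < limit
        flagged.append({**r, "below_dl": below})
    return flagged


rows = [
    {"parameter_value": 0.2, "detection_limit": 0.5},
    {"parameter_value": 3.1, "detection_limit": 0.5},
    {"parameter_value": 3.1, "detection_limit": None},
]
for row in flag_below_detection_limit(rows):
    print(row)
```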
qualifier_codes
Format: String (254)
Example: IM
Space-separated codes explaining why data is flagged or describing specific events. These qualifiers use the current AQS qualifier codes; see the U.S. EPA’s Qualifiers List for a comprehensive, up-to-date listing. Supplemental codes for additional monitoring scenarios (e.g., mobile monitoring applications) are also available in this GitHub repository. A few examples are provided below.
- Examples:
  - IM (Prescribed Fire)
  - LJ (High Winds)
  - AA AG BG ND (Multiple qualifier codes in one measurement)
  - (Can be blank)
- View Qualifier Codes
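Because the field is space-separated, reading and writing it is straightforward. The following sketch (with hypothetical helper names) splits a field into individual codes and joins codes back while respecting the 254-character limit; it does not check the codes against the AQS list.

```python
MAX_QUALIFIER_FIELD_LENGTH = 254


def parse_qualifier_codes(field: str) -> list[str]:
    """Split a qualifier_codes field into individual codes; blank gives an empty list."""
    return field.split()


def join_qualifier_codes(codes: list[str]) -> str:
    """Join codes with single spaces, enforcing the String (254) limit."""
    joined = " ".join(codes)
    if len(joined) > MAX_QUALIFIER_FIELD_LENGTH:
        raise ValueError("qualifier_codes exceeds 254 characters")
    return joined


print(parse_qualifier_codes("AA AG BG ND"))
print(join_qualifier_codes(["IM", "LJ"]))
```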