From 4ea7d16f3c8f4f089af5b58c0fd7d5fcd36ad1e7 Mon Sep 17 00:00:00 2001 From: OumaimaAMAL <117732363+OumaimaAMAL@users.noreply.github.com> Date: Wed, 5 Mar 2025 11:23:15 +0100 Subject: [PATCH] Update README.md --- README.md | 179 +++++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 178 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index b932d55..f34ece4 100644 --- a/README.md +++ b/README.md @@ -1 +1,178 @@ -# dqa-ontology \ No newline at end of file +# Data Quality Assessment Ontology (DQA) + +## Table of Contents +1. [About the Ontology](#about-the-ontology) + - [Overview](#overview) + - [Ontology Classes](#ontology-classes) + - [Object Properties](#object-properties) + - [Data Properties](#data-properties) + - [usage](#usage) +2. [Download](#Download) +3. [Contributors](#contributors) +4. [Contact](#contact) + +## About the ontology + +### Overview + +The **Data Quality Assessment Ontology (DQA)** is a suggested extension of the [Data Quality Vocabulary (DQV)](https://www.w3.org/ns/dqv#) designed to enhance the assessment of data quality (DQ). It introduces structured evaluation processes, traceability of applied methods, granularities for assessments, and results of applying DQ indicators. + +Below is a diagram that represents the DQ Assessment Ontology (DQA): + + + +### Ontology Classes + +The ontology defines the following key classes: + +- **QualityAssessmentProcess**: Represents the process of evaluating DQ. +- **AssessmentGranularity**: Represents the level at which DQ is assessed (e.g., unique observation, set of observations). +- **AssessmentLabel**: Categorizes the result of a quality evaluation (e.g., "High Quality", "Medium Quality"). +- **AssessmentMethod**: Defines the method used for DQ assessment, either formula-based or source-based. +- **FormulaBasedMethod**: A specific type of assessment method based on predefined formulas. +- **SourceBasedMethod**: A specific type of assessment method based on the source of the data. + +### Object Properties + +The ontology includes the following object properties to define relationships between the classes: + +- **assessesIndicator**: Links a quality assessment process to the indicator it evaluates. +- **findsSource**: Links a quality assessment process to the data source identified in the knowledge graph. +- **hasGranularity**: Specifies the granularity level at which DQ is assessed. +- **hasLabel**: Links a quality measurement to an assessment label. +- **producesResult**: Associates a quality assessment process with its resulting evaluation. +- **usesMethod**: Links a quality assessment process to the specific method used. +- **usesMetric**: Defines the metric used in a formula-based method. + +### Data Properties + +The ontology defines the following data properties: + +- **labelThreshold**: Specifies the numerical threshold for assigning a quality label (e.g., "High Quality" threshold = 80%). + +## Usage + +### Use Case + +We consider a simple baseline example. The use case involves five devices installed in a classroom within a campus building. These devices include five sensors: two presence sensors and three integrated into three platforms—an air conditioner, a projector, and a ventilation system—all of which track their energy consumption (in watts) as observations. The use case is described using SOSA. + + + +### Steps of DQA based approach and examples + +#### Step 1: Enriching the Observation + +The first step involves receiving an observation and enriching it using the **SOSA** ontology. This phase does not require describing the processing of the observation in terms of representational indicators, as these mainly concern the proper representation of data rather than its qualitative assessment. Once enrichment is completed, the observation is stored in the GC repository to be used in the evaluation phases. + +ex:obs_001 a sosa:Observation ; + rdfs:comment "Observation of energy consumption from airconditioner#32" ; + sosa:hasFeatureOfInterest ex:room016 ; + sosa:hasSimpleResult "1900.0"^^xsd:float . + sosa:madeBySensor <http://www.w3.org/ns/sosa/examples#PZEM-004T#53> ; + sosa:observedProperty ex:energyConsumption ; + sosa:resultTime "2025-01-31T08:00:00"^^xsd:dateTime ; + sosa:usedProcedure ex:RoomPowerConsumption . + + +#### Step 2: Evaluating DQ Indicators at the UO Level + +At this step, DQ indicators that can be assessed on a single observation (**UO**) are applied. These include **accuracy**, **provenance**, **relevance**, and **freshness**. Each indicator is evaluated individually, and the resulting data is stored in the repository with a complete description of the evaluation process using the **SOSA**, **DQV**, and **DQA** ontologies. The **DQV** ontology is used to describe the evaluation of DQ dimensions (indicators), while **DQA** extends this descriptive model by integrating metadata associated with the evaluation process. + +##### Formula-Based Method (Accuracy) : + +ex:AccuracyAssessmentobs_018 a dqa:QualityAssessmentProcess ; + dqa:assessesIndicator ex:Accuracy ; + dqa:hasGranularity dqa:GranularityUniqueObservation ; + dqa:producesResult ex:Accuracy_obs_018Result ; + dqa:usesMethod ex:AccuracyMethod . + +ex:AccuracyMethod a dqa:FormulaBasedMethod ; + dqa:usesMetric ex:MetricAccuracyMethod . + +ex:MetricAccuracyMethod a dqv:Metric ; + skos:defintion "100*(1 - (absolute(observed - expected) / expected))" ; + dqv:expectedDataType xsd:double ; + dqv:inDimension ex:Accuracy . + +ex:Accuracy a dqv:Dimension ; + skos:definition "The considered data value must be close to the realistic expected value, ..." ; + skos:prefLabel "Accuracy"@en . + +ex:Accuracy_obs_018Result a dqv:QualityMeasurement ; + sdmx-attribute:unitMeasure <http://www.w3.org/vocabularies/om-1.8/Percentage> ; + dqv:isMeasurementOf ex:MetricAccuracyMethod ; + dqv:value 9.8e+01 . + + +##### Source-Based Method (Provenance) : + + +ex:ProvenanceAssessmentobs_018 a dqa:QualityAssessmentProcess ; + dqa:assessesIndicator ex:Provenance ; + dqa:hasGranularity dqa:GranularityUniqueObservation ; + dqa:findsSource <http://www.w3.org/ns/sosa/examples#PZEM-004T#08> ; + dqa:usesMethod ex:ProvenanceMethod . + +ex:ProvenanceMethod a dqa:SourceBasedMethod ; + dqv:inDimension ex:Provenance . + +ex:Provenance a dqv:Dimension ; + skos:definition "Provenance refers to the source of the data values and its description." ; + skos:prefLabel "Provenance"@en . + + +#### Step 3: Evaluating DQ Indicators at the SO-SD and SO-MD Levels + +In this final step, the evaluation focuses on sets of observations (**SO-SD** and **SO-MD**). Two types of evaluations can be performed: + +- **Task-Agnostic Evaluation**: Some DQ indicators can be evaluated independently of a user-defined objective, only considering the context of the observations’ generation. These indicators include **precision**, **duplication**, and **consistency**. +- **Task-Specific Evaluation**: When a user defines a specific task, additional DQ indicators may be necessary, alongside the three previously mentioned indicators. These include **completeness**, **timeliness**, and **relevance**. + +To structure this step, the **OBOE** ontology is used to define the sets of observations on which the evaluation is performed. Then, the evaluation results are documented using **DQV**, and the evaluation process is described using **DQA**. + +##### Formula-Based Method (Duplication) : + +ex:ObservationCollection_1 a obeo:ObservationCollection ; + obeo:hasMember ex:obs_001, ex:obs_002, ex:obs_003, ex:obs_004, ex:obs_005 . + +ex:DuplicationAssessement_set_obs_1 a dqa:QualityAssessementProcess ; + dqa:assessesIndicator ex:Duplication ; + dqa:hasGranularity dqa:GranularitySetObservationsSingleDevice ; + dqa:producesResult ex:Duplication_set_obs_1_Result ; + dqa:usesMethod ex:MetricDuplication . + +ex:DuplicationMethod a dqa:FormulaBasedMethod ; + dqa:usesMetric ex:MetricDuplication . + +ex:MetricDuplication a dqv:Metric ; + skos:defintion "100 * (duplication_count / total)"@en ; + dqv:expectedDataType xsd:float ; + dqv:inDimension ex:Duplication . + +ex:Duplication a dqv:Dimension ; + skos:definition "The proportion of duplicated observations in a set." ; + skos:prefLabel "Duplication"@en . + +ex:Duplication_set_obs_1_Result a dqv:QualityMeasurement ; + sdmx-attribute:unitMeasure <http://www.w3.org/vocabularies/om-1.8/Percentage> ; + dqv:isMeasurementOf ex:MetricDuplication ; + dqv:value "0.0"^^xsd:float . + + + +## Download + +To use the DQA ontology in your project, **Download or Clone the Repository**: You can download the RDF file containing the DQA ontology or clone the repository from GitHub. + + ```bash + git clone https://github.com/yourusername/dqa-ontology.git + ``` + +## Contributors + +- Oumaima Amal - [LinkedIn](https://www.linkedin.com/in/oumaima-amal/) + +## Contact + +For further questions or support, please contact Oumaima Amal via email: oumaima.amal63@gmail.com. + -- GitLab