Interoperability

Département Informatique et Systèmes intelligents - Institut Henri Fayol

Data interoperability and Semantics (20h)

Organization

Syllabus

This course aims to explain the data interoperability issues that arise when existing applications are integrated or when systems evolve.

The students will understand and use the main data formats. They will learn to use query languages and schema definition languages. They will be given pointers to use these data formats in Java applications.

They will understand the importance of standardized vocabularies, and the importance of relaxing data schemas to enable the injection of additional information in documents.

  • Data formats: XML, JSON, CSV, YAML, CBOR, EXI, EXPRESS, …

  • Datatypes

  • Query languages (XPath/JSONPath)

  • Schema definition languages (XML Schema, JSON Schema)

  • Namespaces and vocabularies

  • Standardized vocabularies and semantics

  • JSON-LD

On completion of the unit, the student will be capable of the following (classification level, priority):

  • Know the main data formats (Knowledge, Essential)

  • Understand and use the different datatypes (Apply, Important)

  • Know how to use the data query languages for the main data formats (Apply, Important)

  • Know how to use the data schema description language (Apply, Important)

  • Understand the importance of using standardized vocabularies (Summarise, Useful)

Self Assessment form and credentials

Evaluation

Program

  • Fri 27/09/2019 08:00 - 12:15 - EF Lab room S2.21

  • Fri 04/10/2019 08:00 - 12:15 - EF Lab room S2.21

  • Fri 18/10/2019 08:00 - 12:15 - EF Lab room S2.21

  • Fri 22/11/2019 08:00 - 12:15 - EF Lab room S2.21

  • Fri 29/11/2019 08:00 - 10:00 - Exam in room EF S2.24

  • Fri 29/11/2019 10:15 - 12:15 - EF Lab room S2.21

Variety of document types

Here is an exhaustive list of knowledge and competencies you should have after this course:

  • You understand the importance of using structured, open, interoperable, data formats;

  • You know that many data formats exist, you can recognize some of them;

  • You have the instinctive habit of checking what documents look like, in a text editor or with commands (Linux: more, head, tail; Windows PowerShell: type);

  • You have the instinctive habit of trying to extract what seem to be archives (e.g. Java .jar documents are just zip archives);

  • You know what media types are, their history, and where they are registered;

  • You know how media types are structured: type, subtype, parameters;

  • You know how media types are used on the web.
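The type/subtype/parameters structure mentioned above can be illustrated with a minimal Python sketch. The function name parse_media_type is hypothetical, and the parsing is deliberately simplified (quoted parameter values that contain a semicolon are not handled); a real application would rather rely on an HTTP or MIME library.

```python
def parse_media_type(value):
    """Split a media type string into (type, subtype, parameters).

    Simplified on purpose: quoted parameter values that contain ';'
    are not handled.
    """
    essence, *raw_params = [part.strip() for part in value.split(";")]
    main_type, _, subtype = essence.partition("/")
    parameters = {}
    for raw in raw_params:
        if not raw:
            continue  # tolerate a trailing ';'
        name, _, val = raw.partition("=")
        parameters[name.strip().lower()] = val.strip().strip('"')
    # Type and subtype are case-insensitive, so they are lowercased.
    return main_type.lower(), subtype.lower(), parameters


if __name__ == "__main__":
    print(parse_media_type("text/csv; charset=utf-8; header=present"))
    print(parse_media_type("application/ld+json"))
```

On the web, such a string typically appears in the Content-Type and Accept HTTP headers.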

Some data formats

Tabular data

  • Exports from Excel, OpenOffice Calc?

  • CSV, TSV: no single standard that everyone follows (RFC 4180 is only informational), just many different implementation choices

For the following documents, download the tabular files, compare them, and open them both with a text editor and with Excel:

Here is an exhaustive list of knowledge and competencies you should have after this course:

  • You know about different tabular data formats;

  • You know what CSV and TSV are, and what the acronyms mean;

  • You know about the various CSV and TSV dialects, the parameters;

  • You know how to load various tabular data into Excel or OpenOffice Calc;

  • You know how to export CSV documents from Excel or OpenOffice Calc;

  • You have experience in parsing and generating tabular data with some programming language.
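As an illustration of dialects and their parameters, here is a minimal sketch using Python's standard csv module. The sample table is made up, and the helper names read_rows and write_rows are hypothetical; only the delimiter is varied here, whereas real dialects also differ in quoting, escaping, encoding, and line endings.

```python
import csv
import io

# Three dialects carrying the same (made-up) table.
COMMA_CSV = 'name,age\n"Simpson, Homer",39\n'
SEMICOLON_CSV = 'name;age\n"Simpson, Homer";39\n'
TSV = "name\tage\nSimpson, Homer\t39\n"


def read_rows(text, delimiter):
    """Parse delimited text into a list of rows (lists of strings)."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


def write_rows(rows, delimiter):
    """Serialize rows to delimited text, quoting only where needed."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


rows = read_rows(COMMA_CSV, ",")
# The same rows come out of all three dialects once the delimiter is known.
assert rows == read_rows(SEMICOLON_CSV, ";") == read_rows(TSV, "\t")
# All values are strings: the file itself carries no datatypes.
print(rows)
print(write_rows(rows, ";"))
```

Reading the comma-separated text with the wrong delimiter would silently produce a single column per row, which is a common symptom of a dialect mismatch.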

JSON

Write small JSON documents

  • Use the service https://openweathermap.org to find today’s weather forecast. Describe the information in JSON;

  • Describe the Simpsons family with their age, height, weight, distinctive features

Here is an exhaustive list of knowledge and competencies you should have after this course:

  • You know what JSON is and what the JSON acronym means;

  • You can recognize JSON files, and identify format errors;

  • You can write JSON documents and model simple situations wisely in JSON;

  • You know online tools to validate and prettify your JSON documents;

  • You know how to use a text editor that provides JSON syntax coloring, validation, and prettifying.
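Checking and prettifying a JSON document can also be done in a few lines of code. The sketch below uses Python's standard json module; the function name check_json and the sample documents are hypothetical.

```python
import json


def check_json(text):
    """Return (True, parsed value) or (False, error message)."""
    try:
        return True, json.loads(text)
    except json.JSONDecodeError as error:
        return False, f"line {error.lineno}, column {error.colno}: {error.msg}"


GOOD = '{"family": "Simpson", "members": [{"name": "Bart", "age": 10}]}'
BAD = "{'family': 'Simpson',}"  # single quotes and a trailing comma

ok, value = check_json(GOOD)
# Prettify: re-serialize the parsed value with indentation.
print(json.dumps(value, indent=2, ensure_ascii=False))
# Format errors are reported with their position.
print(check_json(BAD))
```

One caveat: by default Python's parser also accepts NaN and Infinity, which are not valid JSON, so a strict validator remains useful.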

XML

Write small XML documents

  • Describe a fictitious rock/paper/scissors game between two members of the Simpsons family

  • Describe your schedule for today.

Here is an exhaustive list of knowledge and competencies you should have after this course:

  • You know what XML is and what the XML acronym means;

  • You can recognize XML files, and identify format errors;

  • You can write XML documents and model simple situations wisely in XML;

  • You know online tools to validate and prettify your XML documents;

  • You know how to use a text editor that provides XML syntax coloring, validation, and prettifying.
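Format errors in XML can be detected programmatically as well. The sketch below uses Python's standard xml.etree.ElementTree; the element and attribute names are invented for the example, and the check only covers well-formedness, not validity against a schema.

```python
import xml.etree.ElementTree as ET


def check_xml(text):
    """Return (True, root element) or (False, error message)."""
    try:
        return True, ET.fromstring(text)
    except ET.ParseError as error:
        return False, str(error)


GOOD = (
    '<game><round winner="Lisa">'
    '<move player="Bart">rock</move>'
    '<move player="Lisa">paper</move>'
    "</round></game>"
)
BAD = "<game><round></game>"  # <round> is never closed

ok, root = check_xml(GOOD)
for move in root.iter("move"):
    print(move.get("player"), "plays", move.text)
# The error message includes the line and column of the problem.
print(check_xml(BAD))
```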

YAML

Write small YAML documents, following specifications

  • Using the OpenAPI 3 specification https://swagger.io/docs/specification/basic-structure/ , describe a small API at http://api.example.org/ where one can get the list of temperature sensors in the building, and for each temperature sensor one can get the current temperature reading.

Here is an exhaustive list of knowledge and competencies you should have after this course:

  • You know what YAML is and what the YAML acronym means;

  • You can recognize YAML files, and identify format errors;

  • You can write YAML documents and model simple situations wisely in YAML;

  • You know online tools to validate and prettify your YAML documents;

  • You know how to use a text editor that provides YAML syntax coloring, validation, and prettifying.

Some primitive types

Dates and Times

Here is an exhaustive list of knowledge and competencies you should have after this course:

  • You know about different formats for Dates and Times;

  • You know about epoch timestamps: you can recognize one expressed in seconds or in milliseconds, and you know how to convert it to human-readable data and vice versa;

  • You know about the XML Datatypes 1.1 for dates and times (starting point: the XML Schema Definition Language 1.1 Datatypes spec);

  • You have experienced Date and Time manipulation and conversion in your favorite programming language.
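The conversion between epoch timestamps and human-readable dates can be sketched as follows with Python's standard datetime module. The function names are hypothetical, and the seconds/milliseconds detection is only a heuristic based on magnitude (an assumption of this sketch, not a rule from any specification).

```python
from datetime import datetime, timezone

# Heuristic: 1e11 seconds is thousands of years ahead, so larger
# values are assumed to be milliseconds.
MILLISECONDS_THRESHOLD = 100_000_000_000


def epoch_to_iso(value):
    """Convert an epoch timestamp (seconds or milliseconds) to ISO 8601 UTC."""
    seconds = value / 1000 if abs(value) >= MILLISECONDS_THRESHOLD else value
    return datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()


def iso_to_epoch(text):
    """Convert an ISO 8601 string with an explicit UTC offset to epoch seconds."""
    return int(datetime.fromisoformat(text).timestamp())


print(epoch_to_iso(1569571200))      # seconds
print(epoch_to_iso(1569571200000))   # milliseconds, same instant
print(iso_to_epoch("2019-09-27T10:00:00+02:00"))
```

Always passing an explicit time zone avoids a classic trap: without one, the conversion silently uses the local time zone of the machine running the code.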

Binary data encoded in text

Here is an exhaustive list of knowledge and competencies you should have after this course:

  • You know about Base64, Base32, Base16 literals;

  • You know where the actual specification for these formats is, and you browsed it;

  • You can recognize a Base64 image embedded in an HTML document;

  • You know how to convert or encode data in one of these formats;

  • You have experienced manipulation of such data in your favorite programming language. For example, you have developed a program that reads the Base64 string in the document base64.txt and generates the PNG document.

  • You can discuss the pros and cons of embedding binary data as Base64 text vs. sending it in a separate document. By the way, how would we do this?
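Here is a minimal sketch of these encodings with Python's standard base64 module. The helper decode_data_uri is hypothetical and only handles the simple "data:<media type>;base64,<payload>" form, as found in the src attribute of an embedded image.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def decode_data_uri(uri):
    """Return (media type, bytes) from a 'data:<type>;base64,<payload>' URI."""
    header, _, payload = uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a Base64 data URI")
    media_type = header[len("data:"):-len(";base64")]
    return media_type, base64.b64decode(payload)


data = b"Hi!"
print(base64.b16encode(data))  # Base16: hexadecimal digits
print(base64.b32encode(data))
print(base64.b64encode(data))

# Base64 of the PNG signature: an embedded PNG starts with these characters.
print(base64.b64encode(PNG_SIGNATURE))

media_type, raw = decode_data_uri("data:image/png;base64,iVBORw0KGgo=")
print(media_type, raw == PNG_SIGNATURE)
```

Writing the decoded bytes to a file opened in binary mode yields the original document. Note the cost of the text encoding: Base64 produces 4 characters for every 3 bytes of input.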

Some query languages

JSONPath

Write small JSONPath queries, test with the service http://jsonpath.com/

  • From the weather forecast document at https://www.prevision-meteo.ch/services/json/lausanne , write JSONPath queries to retrieve:

    • The temperature at 20h on the second day of the forecast

    • The list of temperatures for the second day of the forecast

    • The list of temperatures at 20h for every day of the forecast

    • The condition keys when the humidity is above 65%

Here is an exhaustive list of knowledge and competencies you should have after this course:

  • You know what JSONPath is and what it is used for;

  • You can recognize JSONPath queries, and identify format errors;

  • You can write JSONPath queries to answer simple questions on a given JSON document;

  • You can tell what a JSONPath query would match on a given JSON document;

  • You know online tools to test and validate JSONPath queries.
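To get a feeling for what a JSONPath query matches, it helps to write the equivalent traversal by hand. The sketch below does this in plain Python on a hypothetical forecast document: its structure is invented for the example and is not the structure returned by the service above. The JSONPath expressions in the comments use the common bracket and filter syntax; implementations differ in the details they support.

```python
# Hypothetical forecast document, invented for this sketch.
FORECAST = {
    "days": [
        {"date": "2019-09-27", "hours": [
            {"time": "08h", "temperature": 12, "humidity": 80, "condition": "fog"},
            {"time": "20h", "temperature": 15, "humidity": 60, "condition": "clear"},
        ]},
        {"date": "2019-09-28", "hours": [
            {"time": "08h", "temperature": 10, "humidity": 70, "condition": "rain"},
            {"time": "20h", "temperature": 14, "humidity": 55, "condition": "cloudy"},
        ]},
    ]
}

# $.days[1].hours[*].temperature   (indexes start at 0)
second_day = [h["temperature"] for h in FORECAST["days"][1]["hours"]]

# $.days[1].hours[?(@.time == '20h')].temperature
second_day_20h = [h["temperature"] for h in FORECAST["days"][1]["hours"]
                  if h["time"] == "20h"]

# $.days[*].hours[?(@.time == '20h')].temperature
every_day_20h = [h["temperature"] for d in FORECAST["days"] for h in d["hours"]
                 if h["time"] == "20h"]

# $.days[*].hours[?(@.humidity > 65)].condition
humid_conditions = [h["condition"] for d in FORECAST["days"] for h in d["hours"]
                    if h["humidity"] > 65]

print(second_day, second_day_20h, every_day_20h, humid_conditions)
```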

XPath

Write small XPath queries, test with an online tester

Here is an exhaustive list of knowledge and competencies you should have after this course:

  • You know what XPath is and what it is used for;

  • You can recognize XPath queries, and identify format errors;

  • You can write XPath queries to answer simple questions on a given XML document;

  • You can tell what an XPath query would match on a given XML document;

  • You know online tools to test and validate XPath queries.
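XPath queries can also be tested locally. The sketch below uses Python's standard xml.etree.ElementTree, which supports only a limited subset of XPath (child and descendant steps, attribute tests, positions). The forecast document and the helper name temperatures are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical forecast document, invented for this sketch.
FORECAST_XML = """
<forecast city="Lausanne">
  <day date="2019-09-27">
    <hour time="08h" temperature="12"/>
    <hour time="20h" temperature="15"/>
  </day>
  <day date="2019-09-28">
    <hour time="08h" temperature="10"/>
    <hour time="20h" temperature="14"/>
  </day>
</forecast>
"""

root = ET.fromstring(FORECAST_XML)


def temperatures(path):
    """Evaluate a path relative to the root element <forecast>."""
    return [hour.get("temperature") for hour in root.findall(path)]


# In full XPath, the first query would be /forecast/day[2]/hour.
print(temperatures("./day[2]/hour"))               # positions start at 1
print(temperatures(".//hour[@time='20h']"))        # any depth, attribute test
print(temperatures("./day[2]/hour[@time='20h']"))
```

Unlike JSONPath array indexes, XPath positions start at 1, so day[2] is the second day.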

Some schema definition languages

JSONSchema

Understand and manipulate existing JSONSchema data models

Here is an exhaustive list of knowledge and competencies you should have after this course:

  • You know what JSONSchema is and what it is used for;

  • You can recognize JSONSchema documents, and validate a JSON document against the JSONSchema;

  • You can write simple JSONSchema documents;

  • You know online tools to test and validate JSONSchema documents.
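To make concrete what validating a document against a schema means, here is a toy validator in Python. It is only a sketch: it handles just three keywords (type, required, properties), whereas real validators implement the whole specification. The schema and the function name validate are hypothetical.

```python
# Hypothetical schema for a sensor reading, invented for this sketch.
SCHEMA = {
    "type": "object",
    "required": ["name", "temperature"],
    "properties": {
        "name": {"type": "string"},
        "temperature": {"type": "number"},
    },
}

PYTHON_TYPES = {
    "object": dict, "array": list, "string": str, "boolean": bool,
    "number": (int, float), "integer": int, "null": type(None),
}


def validate(instance, schema, path="$"):
    """Return a list of error messages (an empty list means valid).

    Toy subset: only 'type', 'required' and 'properties' are checked.
    """
    expected = schema.get("type")
    if expected is not None:
        # In Python, bool is a subclass of int; JSON keeps them apart.
        is_bool = isinstance(instance, bool)
        type_ok = isinstance(instance, PYTHON_TYPES[expected])
        if not type_ok or (is_bool and expected != "boolean"):
            return [f"{path}: expected {expected}"]
    errors = []
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing required property '{key}'")
        for key, subschema in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(validate(instance[key], subschema, f"{path}.{key}"))
    return errors


print(validate({"name": "s1", "temperature": 21.5, "unit": "Cel"}, SCHEMA))
print(validate({"name": "s1", "temperature": "hot"}, SCHEMA))
```

The first document is accepted even though it has an extra "unit" property: JSON Schema allows additional properties unless the schema explicitly forbids them, which is what makes it possible to inject additional information in documents.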

XMLSchema

We will not cover XMLSchema.

Semantic interoperability

Issues

Compare weather documents from:

Different modeling choices were made, which make these two services completely non-interoperable:

  • the lat/long coordinates: string vs number

  • the UNIX timestamps vs dates and times

  • the choice of keys and the semantics (meaning) of the values

  • the units of temperature, pressure, wind speed, …​

  • the semantics of wind direction

  • the value for "icon": "03n" - (if we follow our nose on the website, we may figure out it refers to http://openweathermap.org/img/w/03n.png )

  • the country codes ISO 3166-1 ALPHA-2 and ISO 3166-1 ALPHA-3 (for example, Australia is AU/AUS and Austria is AT/AUT)
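The differences listed above can be made concrete with a small normalization sketch in Python. Both documents are hypothetical: they only imitate the kind of modeling choices discussed here and are not actual responses from any service. The target model and all function names are assumptions of this sketch too.

```python
from datetime import datetime, timezone

# Hypothetical documents imitating two services with different conventions.
SERVICE_A = {"coord": {"lat": 45.44, "lon": 4.39}, "dt": 1569571200,
             "main": {"temp": 288.15}, "sys": {"country": "FR"}}
SERVICE_B = {"latitude": "45.44", "longitude": "4.39",
             "date": "2019-09-27", "hour": "08:00",
             "tmp": 15, "country": "FRA"}

ALPHA3_TO_ALPHA2 = {"FRA": "FR", "AUS": "AU", "AUT": "AT"}  # excerpt only


def normalize_a(doc):
    """Numbers for coordinates, UNIX timestamp, temperature in kelvin."""
    return {
        "lat": float(doc["coord"]["lat"]),
        "lon": float(doc["coord"]["lon"]),
        "time": datetime.fromtimestamp(doc["dt"], tz=timezone.utc).isoformat(),
        "temperature_celsius": round(doc["main"]["temp"] - 273.15, 2),
        "country": doc["sys"]["country"],
    }


def normalize_b(doc):
    """Strings for coordinates, separate date and hour, degrees Celsius."""
    # Assumption: service B reports UTC; the document itself does not say so.
    moment = datetime.fromisoformat(f"{doc['date']}T{doc['hour']}")
    return {
        "lat": float(doc["latitude"]),
        "lon": float(doc["longitude"]),
        "time": moment.replace(tzinfo=timezone.utc).isoformat(),
        "temperature_celsius": round(float(doc["tmp"]), 2),
        "country": ALPHA3_TO_ALPHA2[doc["country"]],
    }


print(normalize_a(SERVICE_A))
print(normalize_b(SERVICE_B))
```

Every line of these functions encodes knowledge that is absent from the documents themselves (units, time zone, code list), which is exactly the semantic interoperability problem.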

Some libraries to manipulate JSON

Update 2019-12-10: You are not required to do this section. Or put differently: whatever you do in your report for this section will be considered bonus.

Explanation: the key-value pairs "ref": "http://xxxxxxx" in the FIWARE JSON-Schemas are not handled properly by the jsonschema2pojo library, so the library fails at generating classes. There is no direct solution to generate the classes properly with this library without modifying the schemas themselves.

From the FIWARE JSONSchema data model for Weather forecast:

  • find tools and tutorials to generate a set of classes in your favorite programming language

  • find tools and tutorials to generate documents from instances of your classes, or instances of your classes from documents

For example:

  • for Java

    • generate classes from JSONSchema using http://www.jsonschema2pojo.org/ either manually, or with a Maven plugin if you know Maven ( ;-) )

    • generate instances from a document and vice versa using Jackson or GSON

  • for C++, PHP, Ruby, Node.js, Python: Google searches such as "jsonschema to class python" or "jsonschema to class ruby" confirm that a lot of different tools exist out there for this task. This shows that it is useful to many developers.

Here is an exhaustive list of knowledge and competencies you should have after this course:

  • You can write a set of simple classes that look like a JSON Schema;

  • You have experienced the generation of classes from a JSONSchema

  • You have experienced the generation of documents from an instance, and vice versa.
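As a hand-written counterpart to generated classes, here is a minimal Python sketch of the round trip between documents and instances. The WeatherForecast class is hypothetical and much simplified; it is not the FIWARE data model, and no code generation tool is involved.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class WeatherForecast:
    """Hypothetical, much simplified class; not the FIWARE data model."""
    id: str
    temperature: float
    relative_humidity: float

    @classmethod
    def from_json(cls, text):
        """Build an instance from a JSON document."""
        return cls(**json.loads(text))

    def to_json(self):
        """Serialize the instance back to a JSON document."""
        return json.dumps(asdict(self))


DOCUMENT = '{"id": "forecast-1", "temperature": 15.0, "relative_humidity": 0.6}'

forecast = WeatherForecast.from_json(DOCUMENT)
print(forecast)
print(forecast.to_json())
```

With this naive mapping, a document containing an unknown key makes from_json raise a TypeError: rigid classes do not tolerate the additional information that a relaxed schema would allow.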

The JSON-LD format for structured data

Read carefully the following pages about the usage of JSON-LD for integrating metadata in webpages:

Here is an exhaustive list of knowledge and competencies you should have after this course:

  • You understand the importance and interest of using JSON-LD;

  • You understand the importance and interest of using schema.org;

  • You can use the schema.org documentation to write a JSON-LD snippet that can be included in a web page;

  • Here are two webpages about upcoming events in Saint-Etienne. You have merged these two event descriptions in one webpage, and added some description using JSON-LD in this webpage, with the https://schema.org/ vocabulary.

Update 2019-12-10: one of the links was broken. I changed the event. Do not re-do the work if you have already done it.
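A JSON-LD snippet for a web page can be generated rather than typed by hand. The sketch below builds one in Python with the schema.org vocabulary; the event details are placeholders, not one of the actual events, and the function name to_script_tag is hypothetical.

```python
import json

# Placeholder event: the name, date, and venue are invented for this sketch.
event = {
    "@context": "https://schema.org",
    "@type": "Event",
    "name": "Example concert",
    "startDate": "2019-12-20T20:00:00+01:00",
    "location": {
        "@type": "Place",
        "name": "Example venue",
        "address": {
            "@type": "PostalAddress",
            "addressLocality": "Saint-Etienne",
            "addressCountry": "FR",
        },
    },
}


def to_script_tag(data):
    """Wrap a JSON-LD document in the script element used in web pages."""
    body = json.dumps(data, indent=2, ensure_ascii=False)
    # "</" is escaped so that the JSON cannot close the script element early;
    # "\/" is a valid JSON escape for "/".
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(to_script_tag(event))
```

The resulting element can be placed in the head or body of the page; describing two events would mean emitting two such objects, or one JSON array containing both.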