Implementation of the proposed approach on the dataset for the DI2KG challenge.
Luca Maria Lauricella |
Valerio Marini |
The Schema matching task consists in identifying mappings between source attributes (e.g. the attribute "brand" from source "www.ebay.com") and a set of target attributes (e.g. "brand", "dimensions", "screen_size", etc.) defined in a given mediated schema.
Participants to the Schema matching task are provided with the mediated schema (in TXT format, one target attribute per row) and a labelled dataset in CSV format (i.e.,
- the "source_attribute_id" is a global identifier for an attribute at source level. For instance, the source_attribute_id "www.ebay.com//screen size" refers to all the "screen size" attributes from specs in source "www.ebay.com". All "source_attribute_id" in the labelled dataset
$Y^{SM}_v$ refer to one or multiple target attributes (or properties) of the given mediated schema. Thus, the dataset$Y^{SM}_v$ provides a subset of mappings from source attributes in the specs dataset$X_v$ and target attributes in the mediated schema; - the "target_attribute_name" is the name of the target attribute in the mediated schema (e.g. "brand", "screen_size", etc.).
Example of
source_attribute_id, target_attribute_name
www.ebay.com//producer name, brand
www.ebay.com//brand, brand
www.odsi.co.uk//device type, screen_type
www.odsi.co.uk//device type, screen_size_diagonal
Note that some source attribute have values refer to multiple target attributes. Therefore, there might be source attributes with mappings to more than one target attribute. For instance, if the set of values related to the source attribute "www.odsi.co.uk//device type" is the following:
- value1: "LED-backlit LCD monitor - 23''"
- value2: "23''"
- value3: "LED LCD"
Then this source attribute is mapped with target attributes "screen_type" (because of value1 and value3) and "screen_size_diagonal" (because of value1 and value2).
The goal is to find mappings between source attributes in the dataset