Inlämning av Examensarbete / Submission of Thesis

Muhammad Atif Riaz; Sameer Munir, pp. 60. COM/School of Computing, 2012.

The work

Författare / Author: Muhammad Atif Riaz, Sameer Munir
atifriaz83@gmail.com, samgul.32@hotmail.com
Titel / Title: An Instance based Approach to Find the Types of Correspondence between the Attributes of Heterogeneous Datasets
Abstrakt / Abstract:

Context: Determining attribute correspondence is the most important, time-consuming, and knowledge-intensive part of database integration. It is also used in other data manipulation applications such as data warehousing, data design, the semantic web, and e-commerce.

Objectives: This thesis investigates how to find the types of correspondence between the attributes of heterogeneous datasets when the schema design information of the datasets is unknown.

Methods: A literature review was conducted to gather knowledge about existing approaches for finding correspondences between the attributes of heterogeneous datasets. The knowledge extracted from the review was used to develop an instance-based approach for finding the types of correspondence between such attributes when schema design information is unknown. To validate the proposed approach, an experiment was conducted in a real environment using data provided by Ericsson (telecom industry), Karlskrona. The results were evaluated using three well-known and widely used measures from the information retrieval field: precision, recall, and F-measure.
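
For readers unfamiliar with the three evaluation measures named in the Methods paragraph, the following is a minimal Python sketch of their standard information retrieval definitions. It is not taken from the thesis: the function name, the representation of correspondences as attribute-name pairs, and the sample data are all assumptions made for illustration.

```python
def precision_recall_f_measure(found, expected):
    """Compute precision, recall, and F-measure for a set of found
    attribute correspondences against a reference set.

    Both arguments are iterables of hashable items; here each item is
    assumed to be a (left_attribute, right_attribute) pair.
    """
    found, expected = set(found), set(expected)
    true_positives = len(found & expected)

    # Precision: share of reported correspondences that are correct.
    precision = true_positives / len(found) if found else 0.0
    # Recall: share of correct correspondences that were reported.
    recall = true_positives / len(expected) if expected else 0.0

    # F-measure: harmonic mean of precision and recall.
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical example data, not results from the thesis.
found_pairs = {("name", "label"), ("city", "town"), ("zip", "phone")}
reference_pairs = {("name", "label"), ("city", "town"), ("zip", "postcode")}
p, r, f = precision_recall_f_measure(found_pairs, reference_pairs)
```

In this hypothetical example two of the three reported pairs are correct and two of the three reference pairs are recovered, so all three measures equal 2/3.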

Results: Good results in finding the types of correspondence between the attributes of heterogeneous datasets depend on the ability of the algorithm to avoid unmatched pairs of rows during the Row Similarity Phase. The proposed approach was evaluated experimentally and achieved an F-measure of 96.7% (average of three experiments).

Conclusions: The analysis showed that the proposed approach is feasible and gives users a means to find corresponding attributes and the types of correspondence between them, based on information extracted from similar pairs of rows in the heterogeneous datasets, where row similarity is based on common primary key values.
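
The general idea summarized in the abstract (pair rows that share a primary key value, then compare attribute values across the paired rows) can be sketched roughly as follows. This is an illustrative simplification, not the algorithm of the thesis: the function names, the exact-match comparison, and the 0.9 threshold are assumptions, and the sketch only scores candidate attribute pairs without classifying the type of correspondence.

```python
def pair_rows_by_key(left_rows, right_rows, left_key, right_key):
    """Pair rows from two datasets whose primary key values are equal.

    Rows are dicts. Rows without a partner are skipped, which mirrors
    the need to avoid unmatched pairs of rows. If a key value repeats
    in right_rows, the last row with that value is used (assumption).
    """
    right_index = {row[right_key]: row for row in right_rows}
    return [
        (row, right_index[row[left_key]])
        for row in left_rows
        if row[left_key] in right_index
    ]


def attribute_agreement(pairs, left_attr, right_attr):
    """Fraction of paired rows whose two attribute values are equal."""
    if not pairs:
        return 0.0
    matches = sum(
        1 for left, right in pairs
        if str(left[left_attr]) == str(right[right_attr])
    )
    return matches / len(pairs)


def corresponding_attributes(left_rows, right_rows, left_key, right_key,
                             threshold=0.9):
    """Return (left_attr, right_attr, score) for attribute pairs whose
    agreement over the paired rows reaches the hypothetical threshold."""
    pairs = pair_rows_by_key(left_rows, right_rows, left_key, right_key)
    if not pairs:
        return []
    left_attrs = [a for a in pairs[0][0] if a != left_key]
    right_attrs = [a for a in pairs[0][1] if a != right_key]
    result = []
    for left_attr in left_attrs:
        for right_attr in right_attrs:
            score = attribute_agreement(pairs, left_attr, right_attr)
            if score >= threshold:
                result.append((left_attr, right_attr, score))
    return result


# Hypothetical example data with different attribute names per dataset.
left = [
    {"id": 1, "name": "A", "city": "X"},
    {"id": 2, "name": "B", "city": "Y"},
    {"id": 3, "name": "C", "city": "Z"},
]
right = [
    {"key": 1, "label": "A", "town": "X"},
    {"key": 2, "label": "B", "town": "Y"},
    {"key": 4, "label": "D", "town": "W"},
]
matches = corresponding_attributes(left, right, "id", "key")
```

Here only the rows with key values 1 and 2 are paired, and the attribute pairs ("name", "label") and ("city", "town") agree on every paired row, so they are the only ones reported.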

Ämnesord / Subject: Datavetenskap - Computer Science\General
Mathematics\General
Nyckelord / Keywords: Attribute Correspondence, Heterogeneous databases schema matching, Instance based matching.

Publication info

Dokument id / Document id: houn-927m3b
Program / Programme: Datavetenskapligt program/Computer Science
Mathematical Modelling and Simulation
Registreringsdatum / Date of registration: 11/19/2012
Uppsatstyp / Type of thesis: Masterarbete/Master's Thesis (120 credits)

Context

Handledare / Supervisor: Bengt Carlsson, Håkan Lennerstad
bengt.carlsson@bth.se, hakan.lennerstad@bth.se
Examinator / Examiner: Lars Lundberg, Mattias Dahl
Organisation / Organisation: Blekinge Institute of Technology
Institution / School: COM/School of Computing

+46 455 38 50 00
I samarbete med / In co-operation with: Ericsson AB Karlskrona

Files & Access

Bifogad uppsats fil(er) / Files attached: bth2012atifriaz.pdf (1898 kB, opens in new window)