Dariusz Matyja MSE-2007:20, pp. 104. TEK/avd. för programvaruteknik, 2007.
Medical datasets have reached enormous sizes.
This data may contain valuable information that awaits
extraction. The knowledge may be encapsulated in
various patterns and regularities that may be hidden in
the data. Such knowledge may prove to be priceless in
future medical decision making. The data analyzed here
comes from the Polish National Breast Cancer
Prevention Program, run in Poland in 2006.
The aim of this master's thesis is to evaluate
the analytical data from the Program to see whether the
domain is suitable for data mining. The next step is
to evaluate several data mining methods with respect to
their applicability to the given data. This is to show
which of the techniques are particularly usable for the
given dataset. Finally, the research aims at extracting
some tangible medical knowledge from the set.
The research utilizes a data warehouse to store the
data. The data is assessed via the ETL process. The
performance of the data mining models is measured
using lift charts and confusion
(classification) matrices. The medical knowledge is
extracted based on the indications of the majority of the
models. The experiments are conducted in
Microsoft SQL Server 2005.
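As a rough illustration of the two evaluation measures named above, the following Python sketch computes a binary confusion matrix and the lift in the top-scored fraction of cases. It is not the thesis's SQL Server 2005 workflow; the function names, the 0.5 decision threshold and the toy labels and scores are all assumptions made for the example.

```python
from typing import List, Tuple


def confusion_matrix(actual: List[int], predicted: List[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels, where 1 means malignant."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return tp, fp, fn, tn


def lift_at(actual: List[int], scores: List[float], fraction: float) -> float:
    """Lift in the top `fraction` of cases ranked by model score.

    Lift is the positive rate among the top-ranked cases divided by the
    positive rate in the whole dataset; 1.0 means no better than random.
    """
    base_rate = sum(actual) / len(actual)
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no positive cases")
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    k = max(1, round(len(ranked) * fraction))
    top_rate = sum(label for _, label in ranked[:k]) / k
    return top_rate / base_rate


# Hypothetical toy data: 10 cases, 3 of them malignant.
actual = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.6, 0.2, 0.1, 0.05, 0.5]
predicted = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

tp, fp, fn, tn = confusion_matrix(actual, predicted)
top30_lift = lift_at(actual, scores, 0.3)
print(tp, fp, fn, tn, round(top30_lift, 2))
```

A lift chart plots this ratio (or the cumulative share of positives captured) for a range of fractions, so that models can be compared by how quickly they concentrate the malignant cases at the top of the ranking.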
The results of the analyses have shown that the
Program did not deliver good-quality data. Numerous
missing values and various discrepancies make it
especially difficult to build good models and draw any
medical conclusions. It is very hard to decide unequivocally
which method is particularly suitable for the given data.
It is advisable to test a set of methods prior to their
application in real systems.
The data mining models were not unanimous about
patterns in the data. Thus the medical knowledge is not
certain and requires verification by medical
experts. However, most of the models strongly
associated the patient's age, tissue type, hormonal therapies
and family history of the disease with the malignancy of cancers.
The next step of the research is to present the
findings to medical experts for verification. In the
future the outcomes may constitute a good background
for development of a Medical Decision Support