Anna Tarczyńska MSE-2012-93, pp. 95. COM/School of Computing, 2012.
Context The huge amount of existing digital video files needs to provide indexing to make it available for customers (easier searching). The indexing can be provided by text information extraction. In this thesis we have analysed and compared methods of text information extraction in digital videos. Furthermore, we have evaluated them in the new context proposed by us, namely usefulness in sports news indexing and information retrieval.
Objectives The objectives of this thesis are as follows: providing a better understanding of the nature of text extraction; performing a systematic literature review on various methods of text information extraction in digital videos of TV sports news; designing and executing an experiment in the testing environment; evaluating available and promising methods of text information extraction from digital video files in the proposed context associated with video sports news indexing and retrieval; providing an adequate solution in the proposed context described above.
Methods This thesis consists of three research methods: Systematic Literature Review, Video Content Analysis with the checklist, and Experiment. The Systematic Literature Review has been used to study the nature of text information extraction, to establish the methods and challenges, and to specify the effective way of conducting the experiment. The video content analysis has been used to establish the context for the experiment. Finally, the experiment has been conducted to answer the main research question: How useful are the methods of text information extraction for indexation of video sports news and information retrieval?
Results Through the Systematic Literature Review we identified 29 challenges of the text information extraction methods, and 10 chains between them. We extracted 21 tools and 105 different methods, and analyzed the relations between them. Through Video Content Analysis we specified three groups of probability of text extraction from video, and 14 categories for providing video sports news indexation with the taxonomy hierarchy. We have conducted the Experiment on three videos files, with 127 frames, 8970 characters, and 1814 words, using the only available MoCA tool. As a result, we reported 10 errors and proposed recommendations for each of them. We evaluated the tool according to the categories mentioned above and offered four advantages, and nine disadvantages of the Tool mentioned above.
Conclusions It is hard to compare the methods described in the literature, because the tools are not available for testing, and they are not compared with each other. Furthermore, the values of recall and precision measures highly depend on the quality of the text contained in the video. Therefore, performing the experiments on the same indexed database is necessary. However, the text information extraction is time consuming (because of huge amount of frames in video), and even high character recognition rate gives low word recognition rate. Therefore, the usefulness of text information extraction for video indexation is still low. Because most of the text information contained in the videos news is inserted in post-processing, the text extraction could be provided in the root: during the processing of the original video, by the broadcasting company (e.g. by automatically saving inserted text in separate file). Then the text information extraction will not be necessary for managing the new video files