Semi-Structured Data Structured Data Conversion Using Data Mining Methods
International Journal of Emerging Trends in Science and Technology,
Vol. 4 No. 10 (2017), 1 October 2017, Pages 6272-6278
Abstract
Emerging technologies for semi-structured data have attracted wide attention in areas such as networks, e-commerce,
information retrieval, and databases. In these applications, the data are modeled not as static collections but
as transient data streams, where the data source is an unbounded stream of individual data items. It is
becoming increasingly popular to send heterogeneous and ill-structured data through networks. Since
traditional database technologies are not directly applicable to such data streams, it is important to study
efficient information extraction methods for semi-structured data. Hence there is increasing demand
for automatic methods for extracting useful information, in particular for discovering rules or patterns
from large collections of semi-structured data, a task known as semi-structured data mining. We introduce a class
of simple combinatorial patterns over texts, such as proximity phrase association patterns and ordered and
unordered tree patterns, that model unstructured texts and semi-structured data on the Web. In addition,
we consider the problem of finding the patterns that optimize a given statistical measure within the whole
class of patterns in a large collection of unstructured texts. For these classes of patterns, we develop fast
and robust text mining algorithms based on techniques in computational geometry, string matching, and
combinatorial optimization. We implemented the developed text and semi-structured data mining
algorithms and applied them in experiments on interactive document browsing in a large text database and on
keyword and common-structure discovery from the Web.
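
To make the notion of a proximity phrase association pattern concrete, the following is a minimal sketch and not the algorithm developed in the paper. It assumes one common formulation: a pattern is an ordered list of phrases plus a proximity bound k, and a document matches if the phrases occur in order with at most k tokens between consecutive phrases. The function names `matches` and `support`, the whitespace tokenization, and the use of support (fraction of matching documents) as the statistical measure are all illustrative assumptions.

```python
from typing import Sequence


def matches(tokens: Sequence[str], phrases: Sequence[Sequence[str]], k: int) -> bool:
    """Naive check (assumed definition): the phrases occur in order in `tokens`,
    with at most k tokens between the end of one phrase and the start of the next."""

    def occurs_at(pos: int, phrase: Sequence[str]) -> bool:
        return list(tokens[pos:pos + len(phrase)]) == list(phrase)

    def search(idx: int, start: int, limit: int) -> bool:
        # Try to place phrase number `idx` at a start position in [start, limit].
        if idx == len(phrases):
            return True
        phrase = phrases[idx]
        last = min(limit, len(tokens) - len(phrase))
        for pos in range(start, last + 1):
            if occurs_at(pos, phrase):
                end = pos + len(phrase)
                # The next phrase must begin within k tokens after this one ends.
                if search(idx + 1, end, end + k):
                    return True
        return False

    # The first phrase may start anywhere in the document.
    return search(0, 0, len(tokens))


def support(docs: Sequence[Sequence[str]], phrases: Sequence[Sequence[str]], k: int) -> float:
    """Fraction of documents matched by the pattern (a simple illustrative measure)."""
    if not docs:
        return 0.0
    return sum(matches(d, phrases, k) for d in docs) / len(docs)


# Hypothetical toy collection, tokenized on whitespace.
docs = [
    "data mining of semi structured data".split(),
    "mining large text data".split(),
    "semi structured web data".split(),
]
pattern = [["semi", "structured"], ["data"]]
print(support(docs, pattern, k=1))
```

This brute-force check scans every document for every candidate pattern, so it only illustrates what a match means. Finding the best pattern over the whole pattern class in a large collection is the harder optimization problem the abstract refers to.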