Many applications involve the generation and analysis of a new kind of data, called stream data, where data flow in and out of an observation platform (or window) dynamically. Such data streams have the following unique features:
- huge or possibly infinite volume
- dynamically changing
- flowing in and out in a fixed order
- allowing only one or a small number of scans
- demanding fast (often real-time) response time.
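The one-scan and bounded-memory constraints listed above rule out algorithms that store the whole stream and revisit it. As a minimal illustration (not a method taken from this text), the Python sketch below maintains a running mean and variance with Welford's update rule, touching each value exactly once and using constant memory. The class name `RunningStats` and the function `summarize` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """One-pass summary of a numeric stream using Welford's update rule.

    Memory use stays constant however many values arrive, and each value
    is examined exactly once, matching the single-scan constraint.
    """
    count: int = 0
    mean: float = 0.0
    _m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        # Uses the already-updated mean, as Welford's rule requires.
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; undefined (NaN) for fewer than two observations.
        return self._m2 / (self.count - 1) if self.count > 1 else float("nan")


def summarize(stream) -> RunningStats:
    stats = RunningStats()
    for value in stream:  # a single pass; no value is stored
        stats.update(value)
    return stats
```

For example, `summarize([2, 4, 4, 4, 5, 5, 7, 9])` yields a count of 8, a mean of 5, and a sample variance of 32/7 ≈ 4.571 (up to floating-point rounding).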
Typical examples of data streams include various kinds of scientific and engineering data, time-series data, and data produced in other dynamic environments, such as power supply, network traffic, stock exchange, telecommunications, Web click streams, video surveillance, and weather or environment monitoring.
Because data streams are normally not stored in any kind of data repository, effective and efficient management and analysis of stream data pose great challenges to researchers. Currently, many researchers are investigating various issues relating to the development of data stream management systems.
A typical query model in such a system is the continuous query model, where predefined queries constantly evaluate incoming streams, collect aggregate data, report the current status of data streams, and respond to their changes. Mining data streams involves the efficient discovery of general patterns and dynamic changes within stream data.
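A continuous query of the kind just described can be sketched as a generator that re-evaluates a predefined aggregate every time a tuple arrives. This is a minimal Python illustration under assumed conditions: tuples are `(timestamp, value)` pairs arriving in non-decreasing timestamp order, the window is time-based with a positive width, and the function name `sliding_window_avg` is hypothetical rather than part of any particular data stream management system.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def sliding_window_avg(
    stream: Iterable[Tuple[float, float]], window: float
) -> Iterator[Tuple[float, int, float]]:
    """Continuously report the count and average over the last `window` time units.

    Assumes (timestamp, value) tuples with non-decreasing timestamps.
    After each arrival, yields (timestamp, count, average) for the
    half-open window (timestamp - window, timestamp].
    """
    if window <= 0:
        raise ValueError("window must be positive")
    buf: Deque[Tuple[float, float]] = deque()
    total = 0.0
    for ts, value in stream:
        buf.append((ts, value))
        total += value
        # Expire tuples that have slid out of the window; the newest
        # tuple always stays because window > 0.
        while buf[0][0] <= ts - window:
            _, old = buf.popleft()
            total -= old
        yield ts, len(buf), total / len(buf)
```

With `window=3` and input `[(0, 10), (1, 20), (2, 30), (5, 40)]`, the query reports averages of 10.0, 15.0, 20.0, and then 40.0, because by time 5 the three earlier tuples have expired. Keeping a running total instead of re-summing the buffer means each update costs time proportional only to the number of tuples that expire.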
For example, we may wish to detect intrusions into a computer network based on anomalies in message flow, which may be discovered by clustering data streams, dynamically constructing stream models, or comparing the current frequent patterns with those of a certain previous time. Most stream data reside at a rather low level of abstraction, whereas analysts are often more interested in higher and multiple levels of abstraction. Thus, multilevel, multidimensional on-line analysis and mining should be performed on stream data as well.
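The last approach in the intrusion example, comparing current frequent patterns with those of an earlier period, can be illustrated in its simplest form using single items (for instance, destination ports in network messages) in place of general patterns. This Python sketch rests on several assumptions: the function names, the `min_support` threshold, and the use of two fixed windows are hypothetical, and it counts exactly, whereas a real stream miner would typically need approximate counting to stay within bounded memory.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Set, Tuple


def frequent_items(window: Iterable[Hashable], min_support: float) -> Dict[Hashable, float]:
    """Return the items whose relative frequency in the window is >= min_support."""
    counts = Counter(window)
    n = sum(counts.values())
    if n == 0:
        return {}
    return {item: c / n for item, c in counts.items() if c / n >= min_support}


def pattern_changes(
    previous: Iterable[Hashable], current: Iterable[Hashable], min_support: float
) -> Tuple[Set[Hashable], Set[Hashable]]:
    """Compare the frequent items of an earlier window with those of the current one.

    Returns (emerged, vanished): items frequent now but not before,
    and items frequent before but not now.
    """
    prev = set(frequent_items(previous, min_support))
    curr = set(frequent_items(current, min_support))
    return curr - prev, prev - curr
```

For example, if the earlier window holds ports `[80, 80, 443, 80, 443, 22]` and the current one holds `[80, 23, 23, 23, 443, 23]`, then with `min_support=0.3` port 23 emerges as newly frequent while 80 and 443 drop out, a shift an analyst might inspect as a possible anomaly.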