Sequential pattern mining
Sequential pattern mining is a topic of
There are several key traditional computational problems addressed within this field. These include building efficient databases and indexes for sequence information, extracting the frequently occurring patterns, comparing sequences for similarity, and recovering missing sequence members. In general, sequence mining problems can be classified as string mining which is typically based on string processing algorithms and itemset mining which is typically based on association rule learning. Local process models [3] extend sequential pattern mining to more complex patterns that can include (exclusive) choices, loops, and concurrency constructs in addition to the sequential ordering construct.
String mining
String mining typically deals with a limited
A survey and taxonomy of the key algorithms for sequence comparison for bioinformatics is presented by Abouelhoda & Ghanem (2010), which include:[4]
- Repeat-related problems: that deal with operations on single sequences and can be based on exact string matching or approximate string matchingmethods for finding dispersed fixed length and maximal length repeats, finding tandem repeats, and finding unique subsequences and missing (un-spelled) subsequences.
- Alignment problems: that deal with comparison between strings by first aligning one or more sequences; examples of popular methods include ClustalW for multiple alignments. Alignment algorithms can be based on either exact or approximate methods, and can also be classified as global alignments, semi-global alignments and local alignment. See sequence alignment.
Itemset mining
Some problems in sequence mining lend themselves to discovering frequent itemsets and the order they appear, for example, one is seeking rules of the form "if a {customer buys a car}, he or she is likely to {buy insurance} within 1 week", or in the context of stock prices, "if {Nokia up and Ericsson up}, it is likely that {Motorola up and Samsung up} within 2 days". Traditionally, itemset mining is used in marketing applications for discovering regularities between frequently co-occurring items in large transactions. For example, by analysing transactions of customer shopping baskets in a supermarket, one can produce a rule which reads "if a customer buys onions and potatoes together, he or she is likely to also buy hamburger meat in the same transaction".
A survey and taxonomy of the key algorithms for item set mining is presented by Han et al. (2007).[5]
The two common techniques that are applied to sequence databases for frequent itemset mining are the influential apriori algorithm and the more-recent FP-growth technique.
Applications
With a great variation of products and user buying behaviors, shelf on which products are being displayed is one of the most important resources in retail environment. Retailers can not only increase their profit but, also decrease cost by proper management of shelf space allocation and products display. To solve this problem, George and Binu (2013) have proposed an approach to mine user
Algorithms
Commonly used algorithms include:
- GSP algorithm
- Sequential Pattern Discovery using Equivalence classes (SPADE)
- FreeSpan
- PrefixSpan
- MAPres[7]
- Seq2Pat (for constraint-based sequential pattern mining)[8][9]
See also
- Collocation extraction – Computational technique to find word sequences
- Process mining – Data mining technique using event logs
- Sequence analysis – process of analysis of one or more known biological sequences
- Sequence analysis in social sciences
- Sequence clustering – algorithm
- Sequence labeling – pattern recognition
References
- S2CID 207180619.
- .
- S2CID 10872379.
- ISBN 978-3-642-02787-1.
- .
- .
- S2CID 22362167.
- S2CID 53427299.
- ^ "Seq2Pat: Sequence-to-Pattern Generation Library". GitHub. 9 April 2022.
External links
- SPMF includes open-source implementations of GSP, PrefixSpan, SPADE, SPAM and many others.