Sequential pattern mining

Sequential pattern mining is a topic of

structured data mining

.

There are several key traditional computational problems addressed within this field. These include building efficient databases and indexes for sequence information, extracting the frequently occurring patterns, comparing sequences for similarity, and recovering missing sequence members. In general, sequence mining problems can be classified as string mining which is typically based on string processing algorithms and itemset mining which is typically based on association rule learning. Local process models ^[3] extend sequential pattern mining to more complex patterns that can include (exclusive) choices, loops, and concurrency constructs in addition to the sequential ordering construct.

String mining

String mining typically deals with a limited

mutations

occur in a string.

A survey and taxonomy of the key algorithms for sequence comparison for bioinformatics is presented by Abouelhoda & Ghanem (2010), which include:^[4]

Repeat-related problems: that deal with operations on single sequences and can be based on
exact string matching or approximate string matching
methods for finding dispersed fixed length and maximal length repeats, finding tandem repeats, and finding unique subsequences and missing (un-spelled) subsequences.

Alignment problems: that deal with comparison between strings by first aligning one or more sequences; examples of popular methods include
ClustalW for multiple alignments. Alignment algorithms can be based on either exact or approximate methods, and can also be classified as global alignments, semi-global alignments and local alignment. See sequence alignment
.

Itemset mining

Some problems in sequence mining lend themselves to discovering frequent itemsets and the order they appear, for example, one is seeking rules of the form "if a {customer buys a car}, he or she is likely to {buy insurance} within 1 week", or in the context of stock prices, "if {Nokia up and Ericsson up}, it is likely that {Motorola up and Samsung up} within 2 days". Traditionally, itemset mining is used in marketing applications for discovering regularities between frequently co-occurring items in large transactions. For example, by analysing transactions of customer shopping baskets in a supermarket, one can produce a rule which reads "if a customer buys onions and potatoes together, he or she is likely to also buy hamburger meat in the same transaction".

A survey and taxonomy of the key algorithms for item set mining is presented by Han et al. (2007).[5]

The two common techniques that are applied to sequence databases for frequent itemset mining are the influential apriori algorithm and the more-recent FP-growth technique.

Applications

With a great variation of products and user buying behaviors, shelf on which products are being displayed is one of the most important resources in retail environment. Retailers can not only increase their profit but, also decrease cost by proper management of shelf space allocation and products display. To solve this problem, George and Binu (2013) have proposed an approach to mine user

buying patterns using PrefixSpan algorithm and place the products on shelves based on the order of mined purchasing patterns.^[6]

Algorithms

Commonly used algorithms include:

GSP algorithm
Sequential Pattern Discovery using Equivalence classes (SPADE)
FreeSpan
PrefixSpan
MAPres^[7]
Seq2Pat (for constraint-based sequential pattern mining)^[8]^[9]

References

S2CID 207180619
.

doi:10.3934/aci.2023004
.

S2CID 10872379
.

ISBN 978-3-642-02787-1
.

doi:10.1007/s10618-006-0059-1
.

doi:10.1016/j.jksuci.2012.07.001
.

S2CID 22362167
.

S2CID 53427299
.

^ "Seq2Pat: Sequence-to-Pattern Generation Library". GitHub. 9 April 2022.

External links

SPMF includes open-source implementations of GSP, PrefixSpan, SPADE, SPAM and many others.

v
t
e
Strings
String metric

Approximate string matching

Bitap algorithm

Damerau–Levenshtein distance

Edit distance

Gestalt pattern matching

Hamming distance

Jaro–Winkler distance

Lee distance

Levenshtein automaton

Levenshtein distance

Wagner–Fischer algorithm

String-searching algorithm

Apostolico–Giancarlo algorithm

Boyer–Moore string-search algorithm

Boyer–Moore–Horspool algorithm

Knuth–Morris–Pratt algorithm

Rabin–Karp algorithm

Raita algorithm

Trigram search

Two-way string-matching algorithm

Zhu–Takaoka string matching algorithm

Multiple string searching

Aho–Corasick

Commentz-Walter algorithm

Regular expression

Comparison of regular-expression engines

Regular grammar

Thompson's construction

Nondeterministic finite automaton

Sequence alignment

BLAST

Hirschberg's algorithm

Needleman–Wunsch algorithm

Smith–Waterman algorithm

Data structure

DAFSA

Suffix array

Suffix automaton

Suffix tree

Generalized suffix tree

Rope

Ternary search tree

Trie

Other

Parsing

Pattern matching

Compressed pattern matching

Longest common subsequence

Longest common substring

Sequential pattern mining

Sorting

String rewriting systems

String operations

Retrieved from "https://en.wikipedia.org/w/index.php?title=Sequential_pattern_mining&oldid=1189419466"

[1] S2CID 207180619
.

[2] :10.3934/aci.2023004
.

[3] S2CID 10872379
.

[4] ISBN 978-3-642-02787-1
.

[5] :10.1007/s10618-006-0059-1
.

[6] :10.1016/j.jksuci.2012.07.001
.

[7] S2CID 22362167
.

[hosseininasab2019-8] S2CID 53427299
.

[9] "Seq2Pat: Sequence-to-Pattern Generation Library". GitHub. 9 April 2022.

[3]

[4]

[6]

[7]

[8]

[9]

String mining

Itemset mining

Applications

Algorithms

See also

References

External links