Regexpcount, a symbolic package for counting problems on regular expressions and words.
In a previous work [P. Nicodème, B. Salvy and P. Flajolet, Theor. Comput. Sci. 287, 593–617 (2002; Zbl 1061.68118)], we considered algorithms related to the statistics of matches with words and regular expressions in texts generated by Bernoulli or Markov sources. In this work these algorithms are extended for two purposes: to determine the statistics of simultaneous counting of different motifs, and to compute the waiting time for the first match with a motif in a model which may be constrained. This extension also handles matches with errors. The package is fully implemented and gives access to both high-level and low-level commands. We also consider an example corresponding to a practical biological problem: obtaining the statistics for the number of matches of words of size 8 in a genome (modeled as a Markovian sequence), given that a pattern named Chi (overrepresented and DNA-protecting) occurs a given number of times.
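To make the counting problem concrete, here is a minimal sketch in Python of the simplest case mentioned above: the exact distribution of the number of (possibly overlapping) matches of a single word in a text of length n generated by a Bernoulli source. It is not Regexpcount itself and does not reproduce the package's commands or its symbolic methods; the function names `build_automaton` and `match_count_distribution` are hypothetical, and the approach shown is a plain dynamic program over a word-matching automaton.

```python
from fractions import Fraction


def build_automaton(word, alphabet):
    """Build a KMP-style automaton for `word`.

    State k means: the longest prefix of `word` that is a suffix of the
    text read so far has length k. State len(word) signals a match.
    """
    m = len(word)
    delta = [{} for _ in range(m + 1)]
    for state in range(m + 1):
        for ch in alphabet:
            s = word[:state] + ch
            k = min(m, len(s))
            # Shrink k until word[:k] is a suffix of s (k = 0 always works).
            while k > 0 and s[-k:] != word[:k]:
                k -= 1
            delta[state][ch] = k
    return delta


def match_count_distribution(word, probs, n):
    """Return {count: probability} for overlapping matches of `word`
    in a random text of length n with independent letters (Bernoulli source).

    `probs` maps each letter to its probability (ideally Fractions).
    """
    m = len(word)
    delta = build_automaton(word, list(probs))
    # dist maps (automaton state, matches seen so far) to a probability.
    dist = {(0, 0): Fraction(1)}
    for _ in range(n):
        nxt = {}
        for (state, count), p in dist.items():
            for ch, q in probs.items():
                t = delta[state][ch]
                key = (t, count + (1 if t == m else 0))
                nxt[key] = nxt.get(key, 0) + p * q
        dist = nxt
    # Marginalize over the automaton state.
    result = {}
    for (_, count), p in dist.items():
        result[count] = result.get(count, 0) + p
    return result
```

For example, `match_count_distribution("aa", {"a": Fraction(1, 2), "b": Fraction(1, 2)}, 3)` enumerates the eight equally likely texts of length 3 implicitly and reports how many of them contain zero, one, or two occurrences of "aa". A Markov source could be handled in the same spirit by also tracking the previous letter in the state, at the cost of a larger table.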
References in zbMATH (referenced in 6 articles, 1 standard article)
- Lladser, Manuel E.; Chestnut, Stephen R.: Approximation of sojourn-times via maximal couplings: motif frequency distributions (2014)
- Marschall, Tobias; Rahmann, Sven: An algorithm to compute the character access count distribution for pattern matching algorithms (2011)
- Lladser, Manuel E.; Betterton, M. D.; Knight, Rob: Multiple pattern matching: a Markov chain approach (2008)
- Bassino, F.; Clément, J.; Fayolle, J.; Nicodème, P.: Counting occurrences for a finite set of words: an inclusion-exclusion approach (2007)
- Gheorghiciuc, Irina; Ward, Mark Daniel: On correlation polynomials and subword complexity (2007)
- Nicodème, Pierre: Regexpcount, a symbolic package for counting problems on regular expressions and words (2003)
Further publications can be found at: http://algo.inria.fr/papers/bibgen/algobib.html