r-tokenizers
conda · v0.3.0

Convert natural language text into tokens. Includes tokenizers for shingled n-grams, skip n-grams, words, word stems, sentences, paragraphs, characters, shingled characters, lines, tweets, Penn Treebank, and regular expressions, as well as functions for counting characters, words, and sentences, and a function for splitting longer texts into separate documents, each with the same number of words. The tokenizers have a consistent interface, and the package is built on the 'stringi' and 'Rcpp' packages for fast yet correct tokenization in 'UTF-8'.
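A minimal sketch of that consistent interface, using a few of the package's documented functions (each tokenize_* function takes a character vector and returns a list of token vectors):

    library(tokenizers)

    text <- "Convert natural language text into tokens."

    tokenize_words(text)          # word tokens
    tokenize_ngrams(text, n = 2)  # shingled bigrams
    tokenize_sentences(text)      # sentence tokens
    count_words(text)             # counting helper, returns an integer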
r-tokenizers@0.3.0 is safe to use (health: 52/100)
Get this data programmatically (free, no authentication):

    curl https://depscope.dev/api/check/conda/r-tokenizers

First published · 2020-07-14 12:02:53 UTC
Last updated · 2025-09-12 05:31:04 UTC
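The same report can also be pulled from within R; a sketch assuming the endpoint returns JSON (the response fields are not documented here, so the result is simply inspected):

    library(jsonlite)

    # Fetch the health report for r-tokenizers and inspect whatever comes back
    report <- fromJSON("https://depscope.dev/api/check/conda/r-tokenizers")
    str(report)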