r-tokenizers

conda · v0.3.0

Convert natural language text into tokens. Includes tokenizers for shingled n-grams, skip n-grams, words, word stems, sentences, paragraphs, characters, shingled characters, lines, tweets, Penn Treebank, regular expressions, as well as functions for counting characters, words, and sentences, and a function for splitting longer texts into separate documents, each with the same number of words. The tokenizers have a consistent interface, and the package is built on the 'stringi' and 'Rcpp' packages for fast yet correct tokenization in 'UTF-8'.
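As a minimal sketch of that consistent interface (assuming the tokenizers package described above, which follows the convention of one tokenize_* function per unit and returns one list element per input document):

# Every tokenizer takes a character vector, one element per document,
# and returns a list of character vectors of tokens.
library(tokenizers)

text <- "The quick brown fox jumps over the lazy dog."
tokenize_words(text)              # word tokens
tokenize_ngrams(text, n = 2)      # shingled bigrams
count_words(text)                 # word count per document
chunk_text(text, chunk_size = 5)  # split into 5-word chunks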

License: MIT (permissive) · 4 versions · 1 maintainer · 0 dependencies · 510 weekly downloads
Health score: 52/100 · r-tokenizers@0.3.0 is safe to use.

Health breakdown (0–100):
- maintenance: 10/25
- popularity: 3/20
- security: 25/25
- maturity: 12/15
- community: 2/15
Vulnerabilities: 0 (none known)

API access

Get this data programmatically — free, no authentication.

curl https://depscope.dev/api/check/conda/r-tokenizers
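
The same report can be fetched and inspected from R. This is a sketch under assumptions: the endpoint is taken to return JSON, jsonlite is just one parser that would work, and the response fields are not documented here, so the code only prints whatever comes back.

# Fetch the DepScope report for r-tokenizers; no authentication needed.
# Assumes a JSON response with undocumented field names.
library(jsonlite)
report <- fromJSON("https://depscope.dev/api/check/conda/r-tokenizers")
str(report)  # inspect the returned structure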

First published · 2020-07-14 12:02:53 UTC

Last updated · 2025-09-12 05:31:04 UTC
