words
Returns the list of all words in a string.
Syntax
- words(s)
  - s is a string
- words(s, preset)
  - s is a string
  - preset is a string
- words(s, options)
  - s is a string
  - options is a structure
Description
In general, words(s[, ...]) returns the ordered list of all words in the text s. The additional arguments specify options for the word-extraction algorithm.
- words(s) uses default settings, which are suitable for most forms of natural-language text (but not source code). This is equivalent to the “English” preset (see below).
- words(s, preset) uses a named preset of options:
  - English: options suitable for text in English and many other human languages.
  - Math: options suitable for English text with interspersed mathematical formulae.
  - Source code: options suitable for source code.
- words(s, options) uses the settings in options, a structure with the following members (any member can be omitted, in which case its default value is used):
  - WordSeps: a string containing word-separating characters. For example, if “-” is included, “word-separating” is treated as two words: “word” and “separating”.
  - MathSeps: a boolean value indicating whether the algorithm should attempt to isolate mathematical symbols.
  - TrimChrs: a string containing characters to be trimmed from words. For example, if “.” is included, the potential word “words.” is trimmed to “words”.
  - TrimPunct: a boolean value indicating whether punctuation characters should automatically be trimmed from words.
  - LetterRequired: a boolean value indicating whether a word must contain at least one letter.
  - LetterOrDigitRequired: a boolean value indicating whether a word must contain at least one letter or digit.
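To make the option semantics concrete, here is a minimal Python sketch of an option-driven word extractor. It is an illustration under stated assumptions, not the function's actual implementation: the WordOptions class and its snake_case fields are hypothetical stand-ins for the documented members, the default values are guesses (the real defaults are not stated here), "punctuation" is taken to mean any character whose Unicode category starts with P, and MathSeps is not modelled. Leaving a field out of WordOptions(...) falls back to its default, mirroring the rule that any member of options can be omitted.

```python
import unicodedata
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WordOptions:
    # Hypothetical analogues of the documented members. The defaults are
    # assumptions for this sketch, not the real defaults of words().
    word_seps: str = " \t\r\n"             # WordSeps
    trim_chrs: str = ""                    # TrimChrs
    trim_punct: bool = True                # TrimPunct
    letter_required: bool = False          # LetterRequired
    letter_or_digit_required: bool = True  # LetterOrDigitRequired
    # MathSeps is not modelled in this sketch.


def _trimmable(ch: str, opts: WordOptions) -> bool:
    # A character is trimmed if it is listed in trim_chrs, or if trim_punct
    # is set and its Unicode category is punctuation (Pc, Pd, Ps, Pe, Pi, Pf, Po).
    return ch in opts.trim_chrs or (
        opts.trim_punct and unicodedata.category(ch).startswith("P")
    )


def words(s: str, opts: Optional[WordOptions] = None) -> List[str]:
    """Return the words of s in order of appearance, duplicates included."""
    opts = opts or WordOptions()  # no options given: use the assumed defaults

    # 1. Split s into candidate tokens at every word-separating character.
    tokens: List[str] = []
    current: List[str] = []
    for ch in s:
        if ch in opts.word_seps:
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))

    result: List[str] = []
    for tok in tokens:
        # 2. Trim unwanted characters from both ends of the token.
        start, end = 0, len(tok)
        while start < end and _trimmable(tok[start], opts):
            start += 1
        while end > start and _trimmable(tok[end - 1], opts):
            end -= 1
        tok = tok[start:end]

        # 3. Discard empty tokens and tokens failing the letter/digit rules.
        if not tok:
            continue
        if opts.letter_required and not any(c.isalpha() for c in tok):
            continue
        if opts.letter_or_digit_required and not any(c.isalnum() for c in tok):
            continue
        result.append(tok)
    return result
```

With these assumed defaults, words("word-separating text.") gives ["word-separating", "text"], while words("word-separating text.", WordOptions(word_seps=" -")) splits at the hyphen and gives ["word", "separating", "text"], matching the WordSeps example above.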
Examples
Alice ≔ ExampleData("Alice in Wonderland") \ 50
ALICE’S ADVENTURES IN WONDERLAND Lewis Carroll …
SortBy(frequencies(words(Alice)), (x ↦ −x[2])) \ 20
(the, 1515) (and, 774) (to, 717) (a, 610) (she, 498) (of, 494) (it, 482) (said, 456) (I, 400) (Alice, 385) (in, 353) (was, 352) (you, 308) (that, 257) (as, 246) (her, 243) (at, 199) (on, 189) (had, 177) (with, 176) ⋮
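The second example ranks words by how often they occur. The same ranking step can be sketched in Python, assuming that frequencies yields (word, count) pairs as the output suggests. The helper top_frequencies is hypothetical, and the tie order (first occurrence) is an assumption. Note that x[2] in the example indexes from 1, so it corresponds to pair[1] below. The trailing \ 20 appears to keep only the first 20 results.

```python
from collections import Counter
from typing import List, Tuple


def top_frequencies(ws: List[str], n: int) -> List[Tuple[str, int]]:
    """Return the n most frequent words as (word, count) pairs."""
    # Counter preserves insertion order, i.e. the order of first occurrence.
    counts = Counter(ws)
    # Sort by the negated count, like the key x ↦ −x[2]. sorted() is stable,
    # so words with equal counts keep their first-occurrence order.
    ranked = sorted(counts.items(), key=lambda pair: -pair[1])
    # Keep only the first n pairs, as the trailing \ 20 appears to do.
    return ranked[:n]
```

For instance, top_frequencies(["a", "b", "a", "c", "b", "a"], 2) returns [("a", 3), ("b", 2)]. Applied to the output of a word extractor, it produces a list of the same shape as the example output, although the actual counts depend on how words are extracted.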