Example usage
This example demonstrates how to use the Python package textfeatureinfo
to extract information from text that can serve as features:
Imports
from textfeatureinfo import textfeatureinfo
from textfeatureinfo.textfeatureinfo import count_punc
from textfeatureinfo.textfeatureinfo import avg_word_len
from textfeatureinfo.textfeatureinfo import perc_cap_words
from textfeatureinfo.textfeatureinfo import remove_stop_words
[nltk_data] Downloading package stopwords to
[nltk_data] /Users/beilinwu/nltk_data...
[nltk_data] Package stopwords is already up-to-date!
Count punctuation marks
We can use count_punc()
to count the number of punctuation marks in the text.
count_punc("Hello, World!")
2
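To make the result above concrete, here is a minimal standalone sketch of one way such a count could be computed. It is not the package's implementation: the name count_punc_sketch is hypothetical, and it assumes "punctuation" means the ASCII characters in Python's string.punctuation.

```python
import string


def count_punc_sketch(text: str) -> int:
    """Count characters that appear in string.punctuation (ASCII only).

    Hypothetical helper for illustration; not part of textfeatureinfo.
    """
    return sum(1 for ch in text if ch in string.punctuation)


# The comma and the exclamation mark are the two punctuation characters.
print(count_punc_sketch("Hello, World!"))  # prints 2
```

Under this assumption, non-ASCII marks such as curly quotes would not be counted.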
Get average word length
We can use avg_word_len()
to calculate the average length of the words within the text.
avg_word_len("Hello, World!")
5.0
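The result of 5.0 suggests that punctuation is not counted toward word length. A minimal sketch that reproduces this behaviour is shown here, assuming ASCII punctuation is stripped before splitting on whitespace. The name avg_word_len_sketch is hypothetical, and the package itself may work differently.

```python
import string


def avg_word_len_sketch(text: str) -> float:
    """Average length of whitespace-separated words, ignoring ASCII punctuation.

    Hypothetical helper for illustration; not part of textfeatureinfo.
    """
    # Remove punctuation so "Hello," is measured as "Hello".
    cleaned = text.translate(str.maketrans("", "", string.punctuation))
    words = cleaned.split()
    if not words:
        # Assumption: text without words has an average length of 0.0.
        return 0.0
    return sum(len(word) for word in words) / len(words)


print(avg_word_len_sketch("Hello, World!"))  # prints 5.0
```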
Get percentage of capitalised words
We can use perc_cap_words()
to calculate the percentage of fully capitalised words in the text.
perc_cap_words("THIS is a SPAm MESSage.")
20.0
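In the example above, only THIS is fully capitalised, so one word out of five gives 20.0. The sketch below shows one way to arrive at that figure, assuming words are split on whitespace and tested with str.isupper(). The name perc_cap_words_sketch is hypothetical and this is not the package's own code.

```python
def perc_cap_words_sketch(text: str) -> float:
    """Percentage of whitespace-separated words whose letters are all uppercase.

    Hypothetical helper for illustration; not part of textfeatureinfo.
    """
    words = text.split()
    if not words:
        # Assumption: text without words yields 0.0 rather than an error.
        return 0.0
    # str.isupper() is False for mixed-case words such as "SPAm".
    capitalised = sum(1 for word in words if word.isupper())
    return 100 * capitalised / len(words)


print(perc_cap_words_sketch("THIS is a SPAm MESSage."))  # prints 20.0
```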
Remove stopwords
We can use remove_stop_words()
to remove stop words from the text and get a list of the remaining words.
In this example, the returned words are lowercased and keep any attached punctuation.
remove_stop_words("Tomorrow is a big day!")
['tomorrow', 'big', 'day!']
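The import output earlier shows that the package downloads the NLTK stopwords corpus. To keep the following sketch self-contained, it uses a small hard-coded stop-word set instead; both that set (STOP_WORDS_SKETCH) and the function name are hypothetical, and the package's actual logic may differ.

```python
# A tiny illustrative stop-word set; the real NLTK English list is much longer.
STOP_WORDS_SKETCH = {"a", "an", "the", "is", "are", "to", "of", "and"}


def remove_stop_words_sketch(text: str) -> list[str]:
    """Lowercase the text, split on whitespace, and drop stop words.

    Hypothetical helper for illustration; not part of textfeatureinfo.
    """
    # Punctuation is left attached, so "day!" is kept as-is.
    return [word for word in text.lower().split() if word not in STOP_WORDS_SKETCH]


print(remove_stop_words_sketch("Tomorrow is a big day!"))
# prints ['tomorrow', 'big', 'day!']
```

Because punctuation is not stripped in this sketch, a stop word followed directly by punctuation (for example "is!") would not be removed.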