Example usage

Here we demonstrate how to use the Python package textfeatureinfo to extract information from text for use as text features.

Imports

from textfeatureinfo import textfeatureinfo
from textfeatureinfo.textfeatureinfo import count_punc
from textfeatureinfo.textfeatureinfo import avg_word_len
from textfeatureinfo.textfeatureinfo import perc_cap_words
from textfeatureinfo.textfeatureinfo import remove_stop_words
[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/beilinwu/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!

Count punctuation marks

We can use count_punc() to count the number of punctuation marks in the text.

count_punc("Hello, World!")
2
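
To make the idea behind this count concrete, here is a minimal standard-library sketch. It is not the package's implementation: the name count_punc_sketch is hypothetical, and it assumes that "punctuation" means the ASCII characters in string.punctuation.

```python
import string


def count_punc_sketch(text: str) -> int:
    """Count characters that belong to string.punctuation.

    Hypothetical helper for illustration only; assumes ASCII punctuation.
    """
    return sum(1 for ch in text if ch in string.punctuation)


print(count_punc_sketch("Hello, World!"))  # 2 (the comma and the exclamation mark)
```

Under this assumption, non-ASCII marks such as curly quotes are not counted.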

Get average word length

We can use avg_word_len() to calculate the average length of the words in the text. In the example below, the punctuation marks do not count toward word length.

avg_word_len("Hello, World!")
5.0
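
A result of 5.0 is what you get if punctuation is stripped before the words are measured. The sketch below shows one way to compute such an average with the standard library only. It is not the package's implementation: avg_word_len_sketch is a hypothetical name, and splitting on whitespace after removing ASCII punctuation is an assumption.

```python
import string


def avg_word_len_sketch(text: str) -> float:
    """Average word length after removing ASCII punctuation.

    Hypothetical helper for illustration only.
    """
    # Drop every ASCII punctuation character, then split on whitespace.
    cleaned = text.translate(str.maketrans("", "", string.punctuation))
    words = cleaned.split()
    if not words:
        return 0.0  # assumption: empty input yields 0.0 rather than an error
    return sum(len(word) for word in words) / len(words)


print(avg_word_len_sketch("Hello, World!"))  # 5.0
```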

Get percentage of capitalised words

We can use perc_cap_words() to calculate the percentage of fully capitalised words in the text. In the example below, only "THIS" is fully capitalised, so one word out of five gives 20.0.

perc_cap_words("THIS is a SPAm MESSage.")
20.0
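
The following sketch shows one way such a percentage could be computed with the standard library. It is not the package's implementation: perc_cap_words_sketch is a hypothetical name, and it assumes that a word counts as fully capitalised when str.isupper() is true after surrounding punctuation is stripped.

```python
import string


def perc_cap_words_sketch(text: str) -> float:
    """Percentage of words written entirely in upper case.

    Hypothetical helper for illustration only.
    """
    # Strip leading and trailing punctuation so "MESSage." becomes "MESSage".
    words = [token.strip(string.punctuation) for token in text.split()]
    words = [word for word in words if word]
    if not words:
        return 0.0  # assumption: empty input yields 0.0 rather than an error
    upper_count = sum(1 for word in words if word.isupper())
    return 100 * upper_count / len(words)


print(perc_cap_words_sketch("THIS is a SPAm MESSage."))  # 20.0
```

Mixed-case words such as "SPAm" and "MESSage" are not counted here, because str.isupper() requires every cased character to be upper case.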

Remove stopwords

We can use remove_stop_words() to remove stop words from the text and get the remaining words as a list. As the example below shows, the returned words are lowercased and punctuation stays attached to them.

remove_stop_words("Tomorrow is a big day!")
['tomorrow', 'big', 'day!']
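
The filtering step can be sketched without any dependencies as follows. It is not the package's implementation: remove_stop_words_sketch and STOP_WORDS_SKETCH are hypothetical names, and the tiny hard-coded stop-word set stands in for a full list such as the NLTK stopwords corpus that is downloaded on import.

```python
# Hypothetical, deliberately small stop-word set used only for illustration.
STOP_WORDS_SKETCH = {"a", "an", "the", "is", "are", "was", "of", "to", "and", "in"}


def remove_stop_words_sketch(text: str) -> list[str]:
    """Lowercase the text, split on whitespace, and drop stop words.

    Punctuation is left attached to the words, as in the output above.
    """
    return [word for word in text.lower().split() if word not in STOP_WORDS_SKETCH]


print(remove_stop_words_sketch("Tomorrow is a big day!"))  # ['tomorrow', 'big', 'day!']
```

Because punctuation is not stripped in this sketch, a stop word followed directly by punctuation (for example "is,") would not be removed.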