| # | Problem | Pass Rate (passed user / total user) |
|---|---|---|
| 12309 | WF-ILF_Python (Advanced) | |
Description
TF-IDF is a popular way to identify the important keywords in documents. In this problem, you will implement a simplified variant of it called WF-ILF.
Given several lines of text, parse them and compute the values described below.
Hint: Because the input spans multiple lines, read it through sys.stdin rather than input(); otherwise your submission will not be accepted.
Input
The input consists of several lines of text.
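Below is a minimal sketch of reading the input. It assumes two things the problem does not say: words are separated by whitespace, and blank lines are ignored rather than counted as lines. The helper name `read_lines` is hypothetical. By default it iterates over `sys.stdin`, and passing an `io.StringIO` instead makes local testing easy.

```python
import io
import sys
from typing import List, Optional, TextIO


def read_lines(stream: Optional[TextIO] = None) -> List[List[str]]:
    """Split each input line into words (hypothetical helper).

    Assumptions not stated in the problem: words are separated by
    whitespace, and blank lines are skipped instead of being counted.
    """
    if stream is None:
        stream = sys.stdin   # the judge feeds all lines through stdin
    lines = []
    for raw in stream:       # a text stream yields one line per iteration
        words = raw.split()  # split() with no argument also drops the newline
        if words:            # ignore blank lines (assumption)
            lines.append(words)
    return lines


# Local test without a judge: feed the example through io.StringIO.
demo = io.StringIO("Hello Hello John\nHello Bob\n")
print(read_lines(demo))  # [['Hello', 'Hello', 'John'], ['Hello', 'Bob']]
```

In the actual submission, calling `read_lines()` with no argument reads from `sys.stdin`, as the hint requires.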
Output
Read the input and calculate the WF-ILF value of every word that appears in it.
Then print only the bottom 3 words, i.e. the three with the lowest WF-ILF values.
WF-ILF = WF * ILF
WF (Word Frequency): the total number of times the word occurs across all input lines.
ILF (Inverse Line Frequency): (number of lines) / (number of lines that contain the word)
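The two definitions above map directly onto counters. In this sketch, `wf_ilf` is a hypothetical helper that takes the input already split into words per line. It uses `fractions.Fraction` so that ILF values such as 3/2 stay exact. This also means mathematically equal scores compare as equal, which floating-point division does not always guarantee.

```python
from collections import Counter
from fractions import Fraction
from typing import Dict, List


def wf_ilf(lines: List[List[str]]) -> Dict[str, Fraction]:
    """Return the WF-ILF score of every word, keyed in first-appearance order.

    `lines` is a list of tokenized lines (one list of words per line).
    """
    n_lines = len(lines)
    wf: Counter = Counter()  # WF: total occurrences of each word in all lines
    lf: Counter = Counter()  # number of lines containing each word at least once
    for words in lines:
        wf.update(words)
        lf.update(set(words))  # a word counts once per line for ILF
    # dict.fromkeys keeps only the first occurrence of each word, in order
    first_seen = dict.fromkeys(w for words in lines for w in words)
    # WF-ILF = WF * (number of lines / number of lines containing the word)
    return {w: wf[w] * Fraction(n_lines, lf[w]) for w in first_seen}


scores = wf_ilf([["Hello", "Hello", "John"], ["Hello", "Bob"]])
print({w: str(s) for w, s in scores.items()})  # {'Hello': '3', 'John': '2', 'Bob': '2'}
```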
Example
Suppose the input consists of two lines:
'Hello Hello John'
'Hello Bob'
WF value:
'Hello' is equal to 3
'John' is equal to 1
'Bob' is also equal to 1
ILF value:
'Hello' is equal to 2/2 = 1 (number of lines = 2, and 'Hello' appears in both lines)
'John' is equal to 2/1 = 2 (number of lines = 2, and 'John' appears in 1 line)
'Bob' is equal to 2/1 = 2 (number of lines = 2, and 'Bob' appears in 1 line)
WF-ILF value:
'Hello' is equal to 3 * 1 = 3
'John' is equal to 1 * 2 = 2
'Bob' is equal to 1 * 2 = 2
After computing the WF-ILF value of each word, choose the bottom 3 and list them from lowest to highest value.
(If two words have the same value, order them by their first appearance in the input.)
Therefore, the order for the example above is 'John', 'Bob', 'Hello'.
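One way to select the bottom 3 with the required tie-break is sketched below. It assumes the scores are stored in a dict filled in first-appearance order, which works because Python dicts preserve insertion order since 3.7. Python's `sorted` is stable, so sorting by value alone keeps tied words in their appearance order. The helper name `bottom_three` is hypothetical, and the problem does not specify the exact output format.

```python
from fractions import Fraction
from typing import Dict, List


def bottom_three(scores: Dict[str, Fraction]) -> List[str]:
    """Return the three words with the lowest WF-ILF values, lowest first.

    `scores` must be ordered by first appearance. sorted() is stable, so
    words with equal values keep that order, which is the required tie-break.
    """
    return sorted(scores, key=scores.__getitem__)[:3]


example = {"Hello": Fraction(3), "John": Fraction(2), "Bob": Fraction(2)}
print(bottom_three(example))  # ['John', 'Bob', 'Hello']
```

Slicing with `[:3]` also handles inputs that contain fewer than three distinct words.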