- Clean.py takes a CSV file and performs all of the text cleaning outlined in section 1.2.2 of the report.
- choose.py selects the subset of tweets we analyze, as well as the test tweets used later for prediction.
- betterDate.py converts each tweet's publish date into a Python datetime object.
- Visualize.py takes a CSV file and builds a word cloud from the corpus of tweets.
- analysisk.py, where k = 1, 2, 3, are the scripts that answer questions 1, 2, and 3, respectively. All three construct T matrices, which are used to compare tweets via dot products.
- results.py quantifies the quality of our predictions using confusion-matrix statistics.
- timePlot.py produces a histogram of consecutively similar tweets posted within a range of time frames.
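Section 1.2.2 of the report is not reproduced here, so the exact steps in Clean.py are unknown. The following is a minimal sketch that assumes typical tweet cleaning: lowercasing, stripping URLs, @mentions and punctuation, and then dropping a small stopword list. `clean_tweet` and `STOPWORDS` are hypothetical names and do not come from Clean.py.

```python
import re
from typing import List

# Hypothetical stopword list; the report's actual list is not known.
STOPWORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is", "rt"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")


def clean_tweet(text: str) -> List[str]:
    """Return cleaned tokens for one tweet (assumed cleaning steps)."""
    text = text.lower()
    text = URL_RE.sub(" ", text)        # drop links
    text = MENTION_RE.sub(" ", text)    # drop @mentions
    text = NON_ALPHA_RE.sub(" ", text)  # drop digits, punctuation and '#'
    return [tok for tok in text.split() if tok not in STOPWORDS]
```

For example, `clean_tweet("RT @user: The Economy is GREAT! https://t.co/abc #MAGA")` returns `['economy', 'great', 'maga']`. Under this sketch a hashtag's word is kept and only the `#` symbol is removed.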
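The README does not state the publish-date format that betterDate.py reads. If the column holds strings such as `10/1/2017 19:58` (month/day/year with 24-hour time, a hypothetical format), Python's standard `datetime.strptime` can perform the conversion. `parse_publish_date` and `DATE_FORMAT` are illustrative names only.

```python
from datetime import datetime

# Assumed format of the publish-date column; adjust to match the actual CSV.
DATE_FORMAT = "%m/%d/%Y %H:%M"


def parse_publish_date(raw: str) -> datetime:
    """Convert one publish-date string into a datetime object."""
    return datetime.strptime(raw.strip(), DATE_FORMAT)
```

After parsing, subtracting two datetimes gives a `timedelta`, which makes time-based comparisons between tweets easy.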
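The README does not define the T matrices used by the analysisk.py scripts. One common construction, assumed here, is a tweet-by-term matrix: each row is a tweet, each column is a vocabulary word, and each entry is a word count. Once every row is scaled to unit length, the dot product of two rows equals their cosine similarity. The function names are hypothetical, and the sketch uses plain Python lists so that it needs no third-party packages.

```python
import math
from typing import List, Tuple


def build_t_matrix(docs: List[List[str]]) -> Tuple[List[List[float]], List[str]]:
    """Build a tweet-by-term count matrix with unit-length rows (assumed construction)."""
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {word: j for j, word in enumerate(vocab)}
    matrix = []
    for doc in docs:
        row = [0.0] * len(vocab)
        for tok in doc:
            row[index[tok]] += 1.0          # raw term count
        norm = math.sqrt(sum(x * x for x in row))
        if norm > 0:
            row = [x / norm for x in row]   # scale row to unit length
        matrix.append(row)
    return matrix, vocab


def dot(u: List[float], v: List[float]) -> float:
    """Dot product; equals cosine similarity when both rows have unit length."""
    return sum(a * b for a, b in zip(u, v))
```

With `docs = [["tax", "cut", "jobs"], ["tax", "cut"], ["fake", "news"]]`, the first two tweets share two of their terms and get a dot product of 2/√6 ≈ 0.816. The first and third tweets share no terms and get 0.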
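The README does not list the statistics that results.py reports. Assuming a binary outcome for each test tweet (prediction correct or not, for example), the standard confusion-matrix counts and derived rates can be computed as shown below. `confusion_stats` is a hypothetical helper and is not the repository's code.

```python
from typing import Dict, Sequence


def confusion_stats(actual: Sequence[bool], predicted: Sequence[bool]) -> Dict[str, float]:
    """Binary confusion-matrix counts plus accuracy, precision, recall and F1."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    tn = sum(1 for a, p in pairs if not a and not p)
    fp = sum(1 for a, p in pairs if not a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    total = tp + tn + fp + fn
    # Guard each ratio against division by zero when a class is absent.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

For `actual = [True, True, False, False, True]` and `predicted = [True, False, False, True, True]`, the counts are tp = 2, tn = 1, fp = 1, fn = 1. That gives an accuracy of 0.6, and precision, recall and F1 all equal to 2/3.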
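The README does not define "consecutively similar" precisely. For illustration only, assume that two tweets adjacent in publish time count as similar when a caller-supplied predicate says so. The time gap between each such pair is then counted into fixed-width hour bins. The predicate, the bin width and `gap_histogram` are all hypothetical. Plotting (with matplotlib, for instance) is left out so that the sketch has no dependencies.

```python
from collections import Counter
from datetime import datetime
from typing import Callable, Dict, List, Tuple


def gap_histogram(
    tweets: List[Tuple[datetime, str]],
    similar: Callable[[str, str], bool],
    bin_hours: float = 1.0,
) -> Dict[int, int]:
    """Count time gaps between consecutive similar tweets into hour-wide bins."""
    ordered = sorted(tweets, key=lambda t: t[0])   # sort by publish time
    counts: Counter = Counter()
    for (t1, text1), (t2, text2) in zip(ordered, ordered[1:]):
        if similar(text1, text2):
            hours = (t2 - t1).total_seconds() / 3600
            counts[int(hours // bin_hours)] += 1   # bin index = floor(gap / width)
    return dict(counts)
```

Suppose the predicate treats two tweets as similar when they share at least one word. Then "tax cut" at 09:00 followed by "tax reform" at 09:30 falls into bin 0, and "fake news" at 12:00 followed by "fake media" at 15:15 falls into bin 3.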
JoetheManHowie/TwitterData
About
assignment 3 for my data modelling class