User:Kranthi206
HADOOP DATA DICTIONARY
1) Is it internal or external? Which system does it come from?
The data is external. It comes from Kaggle, a third-party data source platform, and was originally extracted from the Twitter network.
Kaggle is a platform for predictive modelling and analytics competitions in which statisticians and data miners compete to produce the best models for predicting and describing the datasets uploaded by companies and users. This crowdsourcing approach relies on the fact that there are countless strategies that can be applied to any predictive modelling task and it is impossible to know beforehand which technique or analyst will be most effective.
2) Is it going to change? What data are you going to use?
The data is static: it is not real-time data, so it will not change. More precisely, it was real-time data when it was extracted from Twitter, but no further changes will be made to it for the duration of the project.
The data set we are using is "How ISIS Uses Twitter". It lists users and their followers along with the content they tweeted on the Twitter platform. By analysing the data, we aim to identify ISIS supporters and predict attacks.
3) Data Description (Describe the different data types?)
Field Name | Data Type | Field Length | Description |
name | String | 15 | Name shown on the user's Twitter homepage; 112 unique names in total |
username | String | 15 | Twitter username, usually similar to the actual name |
description | String | 60 | Subject of a tweet with video link |
location | String | 20 | Location of the user |
followers | Integer | 3 | Number of followers the user had at the time of the individual tweet |
Numberstatuses | Integer | 3 | Count of statuses posted by the individual user's account |
time | Date | 10 | Timestamp of the tweet |
tweets | String | 140 | Content of the tweet, with a maximum of 140 characters |
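
The data dictionary above can be turned into a simple check that is run on the file before it is loaded into Hadoop. The following is only a minimal sketch in Python. It assumes the data is a CSV file whose header row uses the field names from the table, and it keeps "time" as a plain string because the table does not state a date format. The names DATA_DICTIONARY, check_row and check_csv are hypothetical, and the sample rows are made up for illustration.

```python
import csv
import io

# Field name -> (Python type, maximum length), transcribed from the table above.
# "time" is kept as a string because the table does not state a date format.
DATA_DICTIONARY = {
    "name": (str, 15),
    "username": (str, 15),
    "description": (str, 60),
    "location": (str, 20),
    "followers": (int, 3),
    "Numberstatuses": (int, 3),
    "time": (str, 10),
    "tweets": (str, 140),
}


def check_row(row):
    """Return a list of human-readable problems found in one CSV row."""
    problems = []
    for field, (kind, max_len) in DATA_DICTIONARY.items():
        value = row.get(field)
        if value is None:
            problems.append(f"{field}: missing")
            continue
        # Integer fields must consist of digits only.
        if kind is int and not value.strip().isdigit():
            problems.append(f"{field}: expected an integer, got {value!r}")
        # Every field is checked against the length given in the dictionary.
        if len(value) > max_len:
            problems.append(f"{field}: length {len(value)} exceeds {max_len}")
    return problems


def check_csv(handle):
    """Yield (line_number, problems) for every row that violates the dictionary."""
    # Line 1 is the header, so data rows are numbered from 2
    # (assuming no field spans more than one line).
    for line_number, row in enumerate(csv.DictReader(handle), start=2):
        problems = check_row(row)
        if problems:
            yield line_number, problems


# Made-up sample rows for illustration only; they are not taken from the data set.
SAMPLE = io.StringIO(
    "name,username,description,location,followers,Numberstatuses,time,tweets\n"
    "Example Name,example_user,Sample text,Nowhere,120,45,2016-01-01,Hello world\n"
    "Example Name,example_user,Sample text,Nowhere,many,45,2016-01-01,Hello world\n"
)

for line_number, problems in check_csv(SAMPLE):
    print(line_number, problems)
```

Rows that are reported this way can be corrected, or the field lengths in the dictionary can be revised, before the file is loaded.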