User:Kithira/Course Pages/CSCI 12/Assignment 2/Group 5/Homework 4
Identification and Removal
To make our data more robust and filter out spikes, we first compressed the file by summarizing over seconds and averaging over minutes, leaving one data point per minute. We then calculated the standard deviation of all of these per-minute accelerations. We labeled a minute as a "spike" if the difference between it and the minute before it, or the minute after it, was greater than two standard deviations. Such a difference is very large compared to the standard deviation, which measures the typical spread of the whole data set. If, however, the change from the previous minute to the current one and the change from the current minute to the next one had the same sign, we did not label the minute as a "spike," because this indicated a trend in one direction rather than a one-minute aberration. Once we had identified a "spike," we filtered it out by replacing its value with the average of the immediately preceding and following minutes.
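The rule described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the code actually used for the assignment: the function name `remove_spikes` and the parameter `k` are hypothetical, the population standard deviation is assumed (the text does not say which variant was used), the first and last minutes are skipped because they lack a neighbor on one side, and the list is processed left to right so that a replaced value is what the next minute is compared against.

```python
from statistics import pstdev


def remove_spikes(minutes, k=2.0):
    """Return (filtered copy, indices of replaced minutes).

    minutes: per-minute averages, one value per minute.
    k: number of standard deviations used as the threshold (2 in the text).
    """
    # Assumption: threshold is computed once, from the unfiltered data.
    threshold = k * pstdev(minutes)
    filtered = list(minutes)
    replaced = []
    for i in range(1, len(filtered) - 1):
        before = filtered[i] - filtered[i - 1]  # change from previous minute
        after = filtered[i + 1] - filtered[i]   # change into next minute
        big_jump = abs(before) > threshold or abs(after) > threshold
        trend = before * after > 0              # both changes have the same sign
        if big_jump and not trend:
            # Replace the spike with the mean of its two neighbors.
            filtered[i] = (filtered[i - 1] + filtered[i + 1]) / 2
            replaced.append(i)
    return filtered, replaced


# Made-up example: a steady rise with a one-minute aberration at index 3.
series = [1.0, 2.0, 3.0, 50.0, 5.0, 6.0, 7.0]
cleaned, replaced = remove_spikes(series)
# replaced == [3]; cleaned[3] == 4.0, the mean of its neighbors 3.0 and 5.0
```

Read literally, the rule can also flag the minute just before a spike when the series dips slightly and then jumps, since the two changes then have opposite signs. The text does not say how such cases were treated, so the sketch leaves that behavior as is.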
Effects
The effects on our data set suggest that this acts as an appropriate filter without modifying intensity classifications too much. In total, the filter edited 112 data points, which is just over 1% of the whole data set. This seems to us a reasonable number of spikes for a data set of this size. It also has almost no impact on our categorization of time spent in each of the three states. We categorized minutes with values below 1 as low intensity, values from 1 to 10 as medium intensity, and values above 10 as high intensity. With this system, the person spent 82 hrs 6 minutes in low intensity, 80 hrs 25 minutes in medium intensity, and 5 hrs 29 minutes in high intensity throughout the week. After filtering, the numbers are almost identical: 82 hrs 2 minutes in low intensity, 80 hrs 27 minutes in medium intensity, and 5 hrs 31 minutes in high intensity.
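The classification and the figures above can be checked with a short sketch. The function names `classify` and `time_per_state` are hypothetical, and the boundaries follow the thresholds as stated (exactly 1 and exactly 10 count as medium). The reported durations add up to 168 hours, i.e. 10,080 one-minute values for the week, so 112 edited points is about 1.1% of the data.

```python
from collections import Counter


def classify(value):
    """Map a per-minute value to an intensity state."""
    if value < 1:
        return "low"
    if value <= 10:
        return "medium"
    return "high"


def time_per_state(minutes):
    """Return {state: (hours, minutes)} spent in each intensity state."""
    counts = Counter(classify(v) for v in minutes)
    return {s: divmod(counts[s], 60) for s in ("low", "medium", "high")}


# Consistency check on the reported (hours, minutes) before filtering.
reported = [(82, 6), (80, 25), (5, 29)]
total_minutes = sum(h * 60 + m for h, m in reported)  # 10080, one full week
share_edited = 112 / total_minutes                    # roughly 0.011
```

Running `time_per_state` on the series before and after filtering would give the two sets of totals compared above.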
Conclusion
In conclusion, we think the filter is an effective way of identifying and removing aberrations that are significantly different from the minutes around them but not part of a trend, and it is especially good because it hardly changes the intensity classification at all.