User:Kithira/Course Pages/CSCI 12/Assignment 2/Group 2/Homework 4

Over the course of our week, we determined that our participant had 6016 sedentary minutes, 3579 minutes of moderate exercise, and 473 minutes of vigorous exercise.

We reached this result using three programs. The first reads the raw data and outputs one summarized value for every minute of the week. The code is below:

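from sys import argv
from math import sqrt

def median(k):
    # Middle value of the sorted list, or the mean of the two middle values.
    sorts = sorted(k)
    length = len(sorts)
    if not length % 2:
        return (sorts[length // 2] + sorts[length // 2 - 1]) / 2.0
    return sorts[length // 2]

inputfile = open(argv[1], "r")

# Skip the first 100 lines of the data file.
for i in range(100):
    line = inputfile.readline()

e = []  # one summarized value per minute
for minute in range(10080):  # 10080 minutes in a week
    d = []  # one value per second
    for second in range(60):
        b = []  # 40 sub-second values
        for sample in range(40):
            line = inputfile.readline()
            s = line.split(",")
            x = float(s[1])
            y = float(s[2])
            z = float(s[3])
            a = abs(sqrt(x*x + y*y + z*z) - 1)
            b.append(a)
        c = sum(b)
        d.append(c)
    e.append(median(d))
inputfile.close()

f = open("project2.txt", "w")
for i in e:
    f.write(str(i) + "\n")
f.close()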

This program begins by skipping the first 100 lines of the data file. It computes a value for each sub-second sample using the given formula and adds each batch of 40 sub-second values together to create the data point for each second. It then summarizes each minute by taking the median of its 60 per-second values. The program ends by writing these per-minute values to a text file.
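The per-minute summary described above can be sketched as a pair of small functions. This is an illustrative sketch rather than the program we ran: it is written for Python 3, it uses the standard library's `statistics.median` in place of our own `median`, and the function names and the sample data (a device reading a magnitude of exactly 1 at rest, with a 5-second burst of movement) are hypothetical.

```python
from math import sqrt
from statistics import median

SAMPLES_PER_SECOND = 40  # batch size used by the program above

def second_value(samples):
    """Sum |sqrt(x^2 + y^2 + z^2) - 1| over one second of (x, y, z) samples."""
    return sum(abs(sqrt(x * x + y * y + z * z) - 1) for x, y, z in samples)

def minute_value(samples):
    """Median of the per-second values in one minute of samples."""
    per_second = [
        second_value(samples[i:i + SAMPLES_PER_SECOND])
        for i in range(0, len(samples), SAMPLES_PER_SECOND)
    ]
    return median(per_second)

# Hypothetical data: 55 seconds at rest followed by a 5-second burst.
rest = [(0.0, 0.0, 1.0)] * (SAMPLES_PER_SECOND * 55)
burst = [(0.0, 0.0, 2.0)] * (SAMPLES_PER_SECOND * 5)
print(minute_value(rest + burst))  # the short burst does not move the median
```

Because only 5 of the 60 per-second values are affected, the median stays at the resting value.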

Our next program filtered the values for each minute.

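from sys import argv

inputfile = open(argv[1], "r")

values = []
for i in range(10080):
    line = inputfile.readline()
    values.append(float(line))
inputfile.close()

# Drop isolated spikes: points that differ from both neighbours by more than the cutoff.
n = 2
while n < len(values) - 1:
    if abs(values[n] - values[n-1]) > 4 and abs(values[n] - values[n+1]) > 4:
        del values[n]  # delete by index; remove() would delete the first equal value
    n = n + 1

f = open("project3.txt", "w")
for i in values:
    f.write(str(i) + "\n")
f.close()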

We believe most of the spikes had already been removed by our use of the median instead of the mean in the previous step, since the median is a more robust measure of central tendency than the mean, especially in the presence of outliers. However, this program removed another 12 data points: those that differed from both their preceding and following values by more than 3.
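The neighbour comparison can also be written as a function that builds a new list instead of deleting from the list while walking it. The sketch below is one possible formulation, not the program we ran: the function name `remove_spikes` is hypothetical, the cutoff is a parameter, and each point is compared against its original neighbours, so a removal never shifts the indices of later points.

```python
def remove_spikes(values, threshold):
    """Return a copy of values without isolated spikes.

    A point is dropped when it differs from both the previous and the
    next original value by more than threshold. The first and last
    points have only one neighbour and are always kept.
    """
    kept = [values[0]] if values else []
    for prev, cur, nxt in zip(values, values[1:], values[2:]):
        if abs(cur - prev) > threshold and abs(cur - nxt) > threshold:
            continue  # isolated spike: skip it
        kept.append(cur)
    if len(values) > 1:
        kept.append(values[-1])
    return kept

# Made-up minute values: 5.0 is an isolated spike, 4.0 and 4.2 are sustained.
print(remove_spikes([0.2, 0.3, 5.0, 0.4, 0.1], 3))
print(remove_spikes([0.2, 4.0, 4.2, 0.3], 3))
```

In the first list only the 5.0 is dropped. In the second list nothing is dropped, because each high value has a neighbour close to it, which is the behaviour one would want for a genuine stretch of activity.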

The last program sorts the data into the levels of exercise intensity.

from sys import argv

inputfile = open(argv[1], "r")

values = []
for i in range(10068):
    line = inputfile.readline()
    values.append(float(line))
inputfile.close()

a = []  # sedentary
b = []  # moderate
c = []  # vigorous
for v in values:
    if v < 0.75:
        a.append(v)
    elif v < 2.25:  # a value of exactly 0.75 counts as moderate
        b.append(v)
    else:  # a value of exactly 2.25 counts as vigorous
        c.append(v)

print("%d sedentary minutes, %d minutes of moderate exercise, %d minutes of vigorous exercise" % (len(a), len(b), len(c)))

Our filtering procedure was fairly simple. We believe we filtered out most of the spikes by using the median to summarize the per-second values over each minute. If a spike lasted for more than 30 seconds, however, this filter would not work, because the spike would then make up more than half of the minute's values. Thus our second program removed a minute's data point if its value differed by more than 3 from the values both before and after it. For reference, the vast majority of our values were under .75, so a difference of more than 3 between consecutive data points indicates a significant change and likely denotes a spike.
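The 30-second limit of the median filter can be checked with a small experiment. The sketch below is illustrative only: it is written for Python 3, the function name `minute_summaries` is hypothetical, and the baseline value of 0.5 and spike value of 20.0 are made-up numbers, not measurements from our data.

```python
from statistics import mean, median

def minute_summaries(spike_seconds, baseline=0.5, spike=20.0):
    """Mean and median of 60 per-second values, spike_seconds of which are spikes."""
    seconds = [spike] * spike_seconds + [baseline] * (60 - spike_seconds)
    return mean(seconds), median(seconds)

# Compare a short spike, one just under 30 seconds, and one just over.
for n in (5, 29, 31):
    avg, med = minute_summaries(n)
    print(n, round(avg, 3), med)
```

With 5 or 29 spiking seconds the median stays at the baseline while the mean is pulled upward. With 31 spiking seconds the median jumps to the spike value, which is the case the second program is meant to catch.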