0

Files and Dictionaries

I'm doing a Coursera course (just practice), and no matter what I try with this question I cannot get it to give me the key/value pair with the highest count. Here is the question:

Write a program to read through the mbox-short.txt and figure out who has sent the greatest number of mail messages. The program looks for 'From ' lines and takes the second word of those lines as the person who sent the mail. The program creates a Python dictionary that maps the sender's mail address to a count of the number of times they appear in the file. After the dictionary is produced, the program reads through the dictionary using a maximum loop to find the most prolific committer.

So far I can print the email addresses on their own, as well as the dict(). The last part of the question just does not seem to be anywhere in this chapter or the previous ones. I'm not even sure what I did... lol. My code:

    fname = input("Enter file name: ")
    fh = open(fname)
    count = dict()
    for handle in fh:
        if not handle.startswith("From "):
            continue
        handle = handle.rstrip()
        words = handle.split()
        for word in words:
            count[word] = count.get(word,0) + 1
        print(words[1])

7th Nov 2017, 1:54 AM
Aric Dunn
17 Answers
+ 11
This is why I asked earlier whether you only added email addresses as keys. In your first loop, when you iterate through the list of words (after splitting each line read from the file), you only need to add the second item of each line as the dictionary key. Apparently, the first item is always "From" and the second item is the email address. So you don't need the second loop at all (the one with for word in words). You just do:

    count[words[1]] = count.get(words[1], 0) + 1

Once the dictionary contains only the email addresses as keys, you run the bigword/bigcount part and print out the pair:

    print(bigword, bigcount)

That should work fine.
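Putting the two pieces together, the whole exercise might look roughly like the sketch below. It is only an illustration, not the course's reference solution: the function name most_prolific_sender and the sample lines are made up for the example, and the lines are passed in as a list so the snippet runs without the real file.

```python
def most_prolific_sender(lines):
    """Return (address, count) for the sender seen most often on 'From ' lines."""
    count = dict()
    for line in lines:
        if not line.startswith("From "):
            continue
        words = line.split()
        if len(words) < 2:
            continue  # guard against a bare "From " line
        # Only the second word (the address) becomes a key
        count[words[1]] = count.get(words[1], 0) + 1

    # Maximum loop over the finished dictionary
    bigword = None
    bigcount = None
    for word, counts in count.items():
        if bigcount is None or counts > bigcount:
            bigword = word
            bigcount = counts
    return bigword, bigcount


# Hypothetical sample lines in the style of mbox-short.txt
sample = [
    "From zqian@umich.edu Fri Jan  4 16:10:39 2008",
    "From: zqian@umich.edu",
    "From cwen@iupui.edu Fri Jan  4 11:37:30 2008",
    "Subject: [sakai] svn commit",
    "From cwen@iupui.edu Fri Jan  4 11:35:08 2008",
]

bigword, bigcount = most_prolific_sender(sample)
print(bigword, bigcount)
```

With a real file, something like most_prolific_sender(open(fname)) should behave the same way, since a file object yields its lines one at a time.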
7th Nov 2017, 4:57 PM
Kuba Siekierzyński
+ 9
As far as I can see (I remember that lesson), you do print out the emails (each being the second element of the 'words' list of strings). However, it seems that you add *all* the words to the dictionary, not only the email addresses. Is that right? Once the count dictionary is set up correctly, you can simply iterate through its .items() and assign the current key-value pair to a variable if its value is greater than the largest one found so far (or, for a more advanced case, append the key to a list if its value is equal to the largest so far, and clear the list if you find a still greater one).
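The "advanced case" with ties could be sketched as follows. The function name top_senders and the example dictionaries are invented for illustration; the exercise itself only asks for a single winner.

```python
def top_senders(count):
    """Return (keys, value): every key that shares the highest value."""
    best_keys = []
    best_val = None
    for key, val in count.items():
        if best_val is None or val > best_val:
            best_keys = [key]       # new maximum: start the list over
            best_val = val
        elif val == best_val:
            best_keys.append(key)   # tie with the current maximum
    return best_keys, best_val


# Hypothetical counts, one clear winner and one tie
count = {"zqian@umich.edu": 4, "cwen@iupui.edu": 5, "david.horwitz@uct.ac.za": 4}
tied = {"a@example.com": 2, "b@example.com": 2, "c@example.com": 1}

print(top_senders(count))
print(top_senders(tied))
```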
7th Nov 2017, 7:30 AM
Kuba Siekierzyński
+ 8
You made a typo in the for loop body. It should go:

    bigword = word
    bigcount = counts  # instead of count, which is the dictionary
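To see why that one letter matters, here is a small sketch that runs the same maximum loop both ways. The function find_max and its buggy flag are made up purely for the demonstration. With the typo, the whole dictionary is stored in bigcount on the first pass, so the next comparison is an int against a dict and Python raises TypeError.

```python
def find_max(count, buggy=False):
    bigword = None
    bigcount = None
    for word, counts in count.items():
        if bigcount is None or counts > bigcount:
            bigword = word
            # buggy=True reproduces the typo: the dictionary is stored, not the number
            bigcount = count if buggy else counts
    return bigword, bigcount


# Hypothetical counts
count = {"cwen@iupui.edu": 5, "zqian@umich.edu": 4}

try:
    find_max(count, buggy=True)
except TypeError as err:
    print("buggy version failed:", err)

print(find_max(count))
```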
7th Nov 2017, 4:21 PM
Kuba Siekierzyński
+ 8
Great :) If you don't mind and think my answer helped you, could you please tick a little tick next to it and mark it as best? Means a lot to me and keeps me motivated to help ;)
7th Nov 2017, 5:49 PM
Kuba Siekierzyński
+ 7
Hmm... I assume you already have a proper 'count' dictionary with email addresses as keys and their frequencies as values. What you can do is the following:

    max_key = max_val = 0
    for k, v in count.items():
        if v > max_val:
            max_key = k
            max_val = v
    print(' {} is the most prolific committer having posted {} emails.'.format(max_key, max_val))
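As a cross-check only (the exercise explicitly asks for a maximum loop), the built-in max() can find the same key when given count.get as its key function. The dictionary below is a hypothetical subset of the counts discussed in this thread. Note that max() raises ValueError on an empty dictionary, so this assumes at least one sender was found.

```python
# Hypothetical subset of the sender counts
count = {
    "stephen.marquard@uct.ac.za": 2,
    "zqian@umich.edu": 4,
    "cwen@iupui.edu": 5,
}

# max() iterates over the keys and compares them by their counts
max_key = max(count, key=count.get)
max_val = count[max_key]
print(max_key, max_val)
```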
7th Nov 2017, 1:35 PM
Kuba Siekierzyński
+ 7
For the future: it is always good to print out a variable once in a while to visualize and understand what it represents. If you had printed each line beginning with "From" after doing the .split(), you'd have noticed the pattern ;) Those lessons are cool because they do not just make you repeat what was lectured in the video, but push you to look up the answer yourself :D
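A quick way to try that tip, as a rough sketch: the sample lines below are reconstructed for illustration from the words that appear in the dictionary output in this thread, and the seen list exists only so the result can be inspected afterwards.

```python
# Sample lines in the style of mbox-short.txt (illustrative, not read from the file)
sample = [
    "From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008",
    "From: stephen.marquard@uct.ac.za",
    "From louis@media.berkeley.edu Fri Jan  4 18:10:48 2008",
]

seen = []
for line in sample:
    if not line.startswith("From "):
        continue
    words = line.split()
    print(words)        # shows the pattern: 'From' first, the address second
    seen.append(words)
```

Each printed list starts with 'From' and has the address at index 1, which is why words[1] is the only item worth counting.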
7th Nov 2017, 5:39 PM
Kuba Siekierzyński
+ 7
No problem. Let me know later if you managed to pass the autograder successfully ;)
7th Nov 2017, 5:44 PM
Kuba Siekierzyński
+ 6
Thanks, man! 👍
7th Nov 2017, 5:52 PM
Kuba Siekierzyński
+ 2
yeah I am stuck...
7th Nov 2017, 4:03 PM
Aric Dunn
+ 2
Thank you :)
7th Nov 2017, 5:41 PM
Aric Dunn
+ 2
I did, actually. Thank you again! I am sure I will have more questions, because for some reason coding is not logical to me.
7th Nov 2017, 5:47 PM
Aric Dunn
+ 2
done:)
7th Nov 2017, 5:50 PM
Aric Dunn
+ 1
Yeah, I don't have a clue how to get that to work. The video shows .items(), and I tried it even though it is not in the chapter, but I'm not sure why it is used.
7th Nov 2017, 1:19 PM
Aric Dunn
+ 1
The chapter says nothing about going from a dict to a list. Each chapter has something with no reference to follow and no way for me to figure it out myself. I keep having to ask, and it is harder to learn that way, no matter how many times I go back to reread or redo the assignments.
7th Nov 2017, 1:29 PM
Aric Dunn
+ 1
So the bottom half of this (the top half is mine) is pretty close to exactly what the video has for a similar question (again, nowhere have we learned to switch between a dict and a list):

    fname = input("Enter file name: ")
    fh = open(fname)
    count = dict()
    for handle in fh:
        handle = handle.rstrip()
        words = handle.split()
        if not handle.startswith("From "):
            continue
        for word in words:
            count[word] = count.get(word,0) + 1
    bigcount = None
    bigword = None
    for word,counts in count.items():
        if bigcount is None or counts > bigcount:
            bigword = word
            bigcount = count
    print(bigword,bigcount)

This does not work. I get this:

    if bigcount is None or counts > bigcount:
    TypeError: '>' not supported between instances of 'int' and 'dict'

The output should only be "cwen@iupui.edu 5". Nowhere in the chapter or the videos do they properly show us how to print out key/value pairs in this type of situation.
7th Nov 2017, 2:48 PM
Aric Dunn
+ 1
OK, with that fixed I still do not have the correct output. This is an output I have had multiple times, and I don't know what else to try. I get "From 27". The count dict is correct and shows what I need, but I can't seem to extract it:

    {'From': 27, 'stephen.marquard@uct.ac.za': 2, 'Sat': 1, 'Jan': 27, '5': 1, '09:14:16': 1, '2008': 27, 'louis@media.berkeley.edu': 3, 'Fri': 20, '4': 20, '18:10:48': 1, 'zqian@umich.edu': 4, '16:10:39': 1, 'rjlowe@iupui.edu': 2, '15:46:24': 1, '15:03:18': 1, '14:50:18': 1, 'cwen@iupui.edu': 5, '11:37:30': 1, '11:35:08': 1, 'gsilver@umich.edu': 3, '11:12:37': 1, '11:11:52': 1, '11:11:03': 1, '11:10:22': 1, 'wagnermr@iupui.edu': 1, '10:38:42': 1, '10:17:43': 1, 'antranig@caret.cam.ac.uk': 1, '10:04:14': 1, 'gopal.ramasammycook@gmail.com': 1, '09:05:31': 1, 'david.horwitz@uct.ac.za': 4, '07:02:32': 1, '06:08:27': 1, '04:49:08': 1, '04:33:44': 1, '04:07:34': 1, 'Thu': 6, '3': 6, '19:51:21': 1, '17:18:23': 1, 'ray@media.berkeley.edu': 1, '17:07:00': 1, '16:34:40': 1, '16:29:07': 1, '16:23:48': 1}

This is what I need to pull out: 'cwen@iupui.edu': 5. I have to somehow iterate through whatever that is (a dict or a list) for the max value and its corresponding key. That is what I thought the last for loop was for?
7th Nov 2017, 4:46 PM
Aric Dunn
0
Wow, thank you. Sorry, I did not know what you were asking earlier... How am I supposed to know to do that? It does not make any sense from the lessons.
7th Nov 2017, 5:28 PM
Aric Dunn