MapReduce

Moderator: Concepts and Technologies for DS and BDP

ms77vyve
Newbie
Posts: 5
Registered: 26 Apr 2016 11:47

MapReduce

Post by ms77vyve » 21 May 2018 17:28

Each order is given in the following format:

Code:

[ [pk, (id, quantity, price per unit), ..., (id, quantity, price per unit)], ... ]

A dataset in this format is given below:

Code:

[ 
[1, ("5464", 4, 9.99), ("8274",18,12.99), ("9744", 9, 44.95)], 
[2, ("5464", 9, 9.99), ("9744", 9, 44.95)],
[3, ("5464", 9, 9.99), ("88112", 11, 24.99)],
[4, ("8732", 7, 11.99), ("7733",11,18.99), ("88112", 5, 39.95)] 
]
Using MapReduce, compute a list of tuples containing the accumulated order value (quantity times price per unit, summed over all orders) for each id, e.g.:

Code:

 [ ( '5464', 219.78 ), ... ] 
Please show all the steps either in pseudo code or in any language of your choice.

Re: MapReduce

Post by ms77vyve » 21 May 2018 17:45

My solution in Python 3 (incomplete):

Code:

from functools import reduce

orders = [
    [1, ("5464", 4, 9.99), ("8274", 18, 12.99), ("9744", 9, 44.95)],
    [2, ("5464", 9, 9.99), ("9744", 9, 44.95)],
    [3, ("5464", 9, 9.99), ("88112", 11, 24.99)],
    [4, ("8732", 7, 11.99), ("7733", 11, 18.99), ("88112", 5, 39.95)],
]

# Step 1 (map): drop the primary key and keep only the line items of each order.
step1 = list(map(lambda x: x[1:], orders))

#[[('5464', 4, 9.99), ('8274', 18, 12.99), ('9744', 9, 44.95)], [('5464', 9, 9.99), ('9744', 9, 44.95)], [('5464', 9, 9.99), ('88112', 11, 24.99)], [('8732', 7, 11.99), ('7733', 11, 18.99), ('88112', 5, 39.95)]]

# Step 2 (reduce): concatenate the per-order lists into one flat list of line items.
step2 = list(reduce(lambda x, y: x + y, step1))

#[('5464', 4, 9.99), ('8274', 18, 12.99), ('9744', 9, 44.95), ('5464', 9, 9.99), ('9744', 9, 44.95), ('5464', 9, 9.99), ('88112', 11, 24.99), ('8732', 7, 11.99), ('7733', 11, 18.99), ('88112', 5, 39.95)]

# Step 3 (map): turn each (id, quantity, price) into (id, quantity * price).
# Printed values may differ in the last digits because of floating-point arithmetic.
step3 = list(map(lambda x: (x[0], reduce(lambda a, b: a * b, x[1:])), step2))

#[('5464', 39.96), ('8274', 233.82), ('9744', 404.55), ('5464', 89.91), ('9744', 404.55), ('5464', 89.91), ('88112', 274.89), ('8732', 83.93), ('7733', 208.89), ('88112', 199.75)]

# Step 4 (reduce): adds a value only when its id matches the id of the first pair;
# pairs with any other id are dropped, so no grouping takes place here.
step4 = reduce(lambda a, b: (a[0], a[1] + b[1]) if a[0] == b[0] else (a[0], a[1]), step3)

#('5464', 219.78)
By "incomplete" I mean the following: I flattened and transformed the data in step2 and step3, but in step4 I should have grouped the pairs by id, which I couldn't achieve using only map/reduce/filter. As a result I get just one tuple instead of one tuple per id (six in this dataset).
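
To make the problem with step4 concrete, here is a small sketch that traces the same combining logic on a shortened list. The names `pairs` and `combine` are illustrative and not part of the original code, and the values are rounded samples taken from the step3 output.

```python
from functools import reduce

# Shortened sample of (id, order value) pairs in the same shape as step3.
pairs = [("5464", 39.96), ("8274", 233.82), ("5464", 89.91), ("9744", 404.55)]

def combine(acc, item):
    """Same logic as the step4 lambda: add only when the ids match."""
    if acc[0] == item[0]:
        return (acc[0], acc[1] + item[1])
    return acc  # an item with a different id is silently dropped

# Without an initial value, reduce uses the first pair as the accumulator,
# so only the id of the first pair can ever survive.
key, total = reduce(combine, pairs)
print(key, round(total, 2))
```

Because the accumulator is a single tuple, it has room for one id only. Grouping needs an accumulator that can hold one entry per id, for example a dictionary.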

I am looking forward to a solution or hints, especially on how to group/regroup the data.

Thanks
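
One possible way to do the grouping, sketched here with the usual map, shuffle and reduce phases and using only `map` and `functools.reduce`. The helper names `mapper` and `shuffle` are assumptions made for this sketch, and rounding the totals to two decimals is a display choice, not something the task requires.

```python
from functools import reduce

orders = [
    [1, ("5464", 4, 9.99), ("8274", 18, 12.99), ("9744", 9, 44.95)],
    [2, ("5464", 9, 9.99), ("9744", 9, 44.95)],
    [3, ("5464", 9, 9.99), ("88112", 11, 24.99)],
    [4, ("8732", 7, 11.99), ("7733", 11, 18.99), ("88112", 5, 39.95)],
]

def mapper(order):
    """Map phase: emit one (id, quantity * price) pair per line item."""
    return [(item_id, qty * price) for item_id, qty, price in order[1:]]

# Flatten the per-order pair lists into one list of (id, value) pairs.
mapped = reduce(lambda acc, emitted: acc + emitted, map(mapper, orders), [])

def shuffle(groups, pair):
    """Shuffle phase: collect all values that share the same id."""
    key, value = pair
    return {**groups, key: groups.get(key, []) + [value]}

# The accumulator is a dict, so it can hold one entry per id.
grouped = reduce(shuffle, mapped, {})

# Reduce phase: sum the values of each group (rounded for display only).
result = list(
    map(lambda kv: (kv[0], round(reduce(lambda a, b: a + b, kv[1]), 2)),
        grouped.items())
)
print(result)
```

The first tuple of `result` is `('5464', 219.78)`, matching the example in the task. Copying the dictionary in every `shuffle` call keeps the function free of side effects but is quadratic in the number of pairs; for larger inputs a `collections.defaultdict(list)` filled in a plain loop, or sorting followed by `itertools.groupby`, would be the more practical choice.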
