Grouping Dictionary List by Key

It is shown in code snippets below how to group a dictionary list based on a specific key.

First of all, let’s import the data from a csv file.

from astropy.io.ascii import read

selected = ["origin", "dest", "distance", "carrier"]

### IMPORT CSV FILE INTO ASTROPY TABLE ###
tbl = read("Downloads/nycflights.csv", format = 'csv', data_end = 11)[selected]

### CONVERT ASTROPY TABLE TO DICTIONARY LIST ###
lst = map(lambda x: dict(zip(x.colnames, x)), tbl)

### DISPLAY DATA CONTENTS ###
from tabulate import tabulate

print tabulate([lst[i] for i in range(3)], headers = "keys", tablefmt = "fancy_grid")

╒══════════╤════════╤═══════════╤════════════╕
│ origin   │ dest   │ carrier   │   distance │
╞══════════╪════════╪═══════════╪════════════╡
│ EWR      │ IAH    │ UA        │       1400 │
├──────────┼────────┼───────────┼────────────┤
│ EWR      │ ORD    │ UA        │        719 │
├──────────┼────────┼───────────┼────────────┤
│ EWR      │ FLL    │ B6        │       1065 │
╘══════════╧════════╧═══════════╧════════════╛

In the first approach, only standard Python modules and data structures are used.

### APPROACH 1: HOMEBREW GROUPING ###

from operator import itemgetter

### GET UNIQUE VALUES OF GROUP KEY ###
g_key = set([x["origin"] for x in lst])

### GROUPING LIST BY GROUP KEY ###
g_lst1 = sorted(map(lambda x: (x, [i for i in lst if i["origin"] == x]), g_key), key = itemgetter(0))

for i in g_lst1:
  print tabulate(i[1], headers = "keys", tablefmt = "fancy_grid")

In the second approach, we first sort the list by the key and then group the list with the itertools.groupby() function.

### APPROACH 2: ITERTOOLS.GROUPBY  ###

### SORTING DICTIONARY BEFORE GROUPING ###
s_lst = sorted(lst, key = itemgetter('origin'))

### GROUPING DICTIONARY BY "ORIGIN" ###
from itertools import groupby

g_lst2 = [(k, list(g)) for k, g in groupby(s_lst, itemgetter("origin"))]

for i in g_lst2:
  print tabulate(i[1], headers = "keys", tablefmt = "fancy_grid")

In the third approach, we use the defaultdict class in the collections module.

### APPROACH 3: DEFAULTDICT ###

from collections import defaultdict

### CREATE KEY-VALUE PAIRS FROM LIST ###
ddata = [(x["origin"], x) for x in lst]

### CREATE DEFAULTDICT ###
ddict = defaultdict(list)

for key, value in ddata:
  ddict[key].append(value)

g_lst3 = sorted(ddict.items(), key = itemgetter(0))

for i in g_lst3:
  print tabulate(i[1], headers = "keys", tablefmt = "fancy_grid")

In the fourth approach, we use the ordereddict class also in the collections module.

### APPROACH 4: ORDEREDDICT ###
from collections import OrderedDict

odict = OrderedDict()

for key, value in ddata:
  if key in odict: odict[key].append(value)
  else: odict[key] = [value]

g_lst4 = sorted(odict.items(), key = itemgetter(0))

for i in g_lst4:
  print tabulate(i[1], headers = "keys", tablefmt = "fancy_grid")

In the output below, it is shown that four grouped lists are identical.


g_lst1 == g_lst2 == g_lst3 == g_lst4
# True

╒══════════╤════════╤═══════════╤════════════╕
│ origin   │ dest   │ carrier   │   distance │
╞══════════╪════════╪═══════════╪════════════╡
│ EWR      │ IAH    │ UA        │       1400 │
├──────────┼────────┼───────────┼────────────┤
│ EWR      │ ORD    │ UA        │        719 │
├──────────┼────────┼───────────┼────────────┤
│ EWR      │ FLL    │ B6        │       1065 │
╘══════════╧════════╧═══════════╧════════════╛
╒══════════╤════════╤═══════════╤════════════╕
│ origin   │ dest   │ carrier   │   distance │
╞══════════╪════════╪═══════════╪════════════╡
│ JFK      │ MIA    │ AA        │       1089 │
├──────────┼────────┼───────────┼────────────┤
│ JFK      │ BQN    │ B6        │       1576 │
├──────────┼────────┼───────────┼────────────┤
│ JFK      │ MCO    │ B6        │        944 │
╘══════════╧════════╧═══════════╧════════════╛
╒══════════╤════════╤═══════════╤════════════╕
│ origin   │ dest   │ carrier   │   distance │
╞══════════╪════════╪═══════════╪════════════╡
│ LGA      │ IAH    │ UA        │       1416 │
├──────────┼────────┼───────────┼────────────┤
│ LGA      │ ATL    │ DL        │        762 │
├──────────┼────────┼───────────┼────────────┤
│ LGA      │ IAD    │ EV        │        229 │
├──────────┼────────┼───────────┼────────────┤
│ LGA      │ ORD    │ AA        │        733 │
╘══════════╧════════╧═══════════╧════════════╛