Creating a Python Dictionary from Excel Data: A Step-by-Step Guide

Introduction

In this article, we will explore how to create a dictionary in Python using data imported from an Excel file. We will go through the process step-by-step, explaining each part and providing examples.

Requirements

To follow along with this tutorial, you’ll need:

  • Python 3.x installed on your computer
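  • The xlrd library, which can be installed using pip: pip install xlrd. Note that xlrd 2.0 and later reads only the legacy .xls format; .xlsx files need a different library, such as openpyxl.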

Excel Data Structure

Before diving into the code, let’s take a look at how data is structured in an Excel file. The data is stored in rows and columns, with each cell containing a value.

For example, suppose we have an Excel file with three sheets (MM, DD, FF). The first column of the MM and FF sheets might look like this:

MM    FF
s1    c1
s2    c2
s3    c3

Using xlrd to Import Data

To start, we’ll use the xlrd library to open and read our Excel file. We can import xlrd in Python using the following code:

import xlrd

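file_location = "data.xls"  # xlrd 2.0+ reads only the legacy .xls format
workbook = xlrd.open_workbook(file_location)

In this example, replace "data.xls" with the actual path to your Excel file. If your file is an .xlsx workbook, xlrd 2.0 and later will refuse to open it; either save it as .xls or read it with a library that supports .xlsx, such as openpyxl.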

Defining Sheets and Data Structures

We can then select which sheets we want to use by calling sheet_by_name:

M_Sheet = workbook.sheet_by_name("MM")
D_Sheet = workbook.sheet_by_name("DD")
F_Sheet = workbook.sheet_by_name("FF")

Here, M_Sheet, D_Sheet, and F_Sheet are the sheet objects we’ll use to access the data.

We define lists M, D, and F to store the values from each sheet. These will be used to create our final dictionary:

# Collect the first-column value of every row in the MM sheet
M = []
for i in range(M_Sheet.nrows):
    M.append(M_Sheet.cell(i, 0).value)

# Same for the DD sheet
D = []
for j in range(D_Sheet.nrows):
    D.append(D_Sheet.cell(j, 0).value)

# Same for the FF sheet
F = []
for f in range(F_Sheet.nrows):
    F.append(F_Sheet.cell(f, 0).value)

This code loops through each row of the sheet and appends the value from column 0 to our respective lists.
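
The three loops differ only in the sheet they read, so they can be folded into one helper. The sketch below is an illustration, not part of the original code: first_column is a hypothetical name, and FakeSheet is a stand-in that mimics only the nrows attribute and the cell(row, col).value access used above, so the example runs without a workbook. With a real xlrd sheet, sheet.col_values(0) should return the same list in a single call.

```python
from types import SimpleNamespace


class FakeSheet:
    """Stand-in for an xlrd sheet (hypothetical, for illustration only)."""

    def __init__(self, rows):
        self._rows = rows
        self.nrows = len(rows)

    def cell(self, rowx, colx):
        # Return an object exposing .value, like an xlrd cell
        return SimpleNamespace(value=self._rows[rowx][colx])


def first_column(sheet):
    """Return the column-0 value of every row in the sheet."""
    return [sheet.cell(i, 0).value for i in range(sheet.nrows)]


M_Sheet = FakeSheet([["s1"], ["s2"], ["s3"]])
M = first_column(M_Sheet)
print(M)
```

The same helper can then be called once per sheet, for example D = first_column(D_Sheet).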

However, three separate lists do not record which value belongs to which pair of keys; that relationship exists only implicitly through list positions. A nested dictionary makes it explicit: the keys come from two of the lists and the values from the third.

Creating a Dictionary

Our goal now is to create a nested dictionary, dico_s, whose outer keys come from sheet MM and whose inner keys come from sheet DD, with the values taken from sheet FF. The code below shows how we can achieve this:

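# S and C are the names used in the rest of this article
S = M  # outer keys, from sheet MM
C = F  # values, from sheet FF

dico_s = {}
for s in S:
    dico_d = {}
    for d in D:
        # Position of the (s, d) pair in the flat list C
        idx = D.index(d) + len(D) * S.index(s)
        dico_d[d] = C[idx]
    dico_s[s] = dico_d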

print(dico_s)

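In this code, S is the list of values from sheet MM, D is the list of values from sheet DD, and C is the list of values from sheet FF. C is treated as a flattened table: the values for the first entry of S come first, one per entry of D, followed by the values for the second entry of S, and so on. The expression D.index(d) + len(D) * S.index(s) computes the position in C for a given pair, so C must contain at least len(S) * len(D) values. Because index returns the first match, the entries of S and D must also be unique.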
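
To make the index computation concrete, the sketch below prints idx for every (s, d) pair. The lists are small made-up values, not data read from Excel, and C is assumed to hold exactly len(S) * len(D) entries.

```python
# Illustrative lists (assumed values, not read from a workbook)
S = ["s1", "s2"]
D = ["d1", "d2", "d3"]
C = ["c1", "c2", "c3", "c4", "c5", "c6"]

indices = {}
for s in S:
    for d in D:
        # Offset within the block for s, plus the start of that block
        idx = D.index(d) + len(D) * S.index(s)
        indices[(s, d)] = idx
        print(s, d, idx, C[idx])
```

Each entry of S owns a block of len(D) consecutive positions in C: positions 0 to 2 belong to s1 and positions 3 to 5 to s2.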

This way, each value is stored under an explicit pair of keys and can be retrieved directly as dico_s[s][d].

Example Walkthrough

Let’s walk through an example with three outer keys (s1, s2, s3) and two inner keys (d1, d2). To keep the example self-contained, the lists are generated in code rather than read from Excel. Here’s the code:

nb_s = 3
nb_d = 2

S = ['s' + str(x) for x in range(1, nb_s + 1)]
D = ['d' + str(x) for x in range(1, nb_d + 1)]
C = ['c' + str(x) for x in range(1, (len(S) * len(D)) + 1)]

print(S)
print(D)
print(C)

dico_s = {}
for s in S:
    dico_d = {}
    for d in D:
        idx = D.index(d) + len(D) * S.index(s)
        dico_d[d] = C[idx]
    dico_s[s] = dico_d

print(dico_s)

This code creates three lists S, D, and C using list comprehensions; they stand in for the values read from sheets MM, DD, and FF. It then builds a nested dictionary in which each outer key comes from S, each inner key from D, and each value from the matching position in C.

The final output should look like this:

['s1', 's2', 's3']
['d1', 'd2']
['c1', 'c2', 'c3', 'c4', 'c5', 'c6']

{'s1': {'d1': 'c1', 'd2': 'c2'},
 's2': {'d1': 'c3', 'd2': 'c4'},
 's3': {'d1': 'c5', 'd2': 'c6'}}
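
Once the nested dictionary exists, a value is retrieved by giving the outer key and then the inner key. A short sketch of this, with the dictionary from the output above written out literally so the snippet runs on its own:

```python
dico_s = {
    "s1": {"d1": "c1", "d2": "c2"},
    "s2": {"d1": "c3", "d2": "c4"},
    "s3": {"d1": "c5", "d2": "c6"},
}

# Direct lookup: outer key first, then inner key
value = dico_s["s2"]["d1"]
print(value)  # c3

# .get() avoids a KeyError when a key may be absent
missing = dico_s.get("s4", {}).get("d1")
print(missing)  # None

# Flatten back into (outer key, inner key, value) triples
triples = [(s, d, c) for s, inner in dico_s.items() for d, c in inner.items()]
print(triples[:2])
```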

Creating a Dictionary with a Different Number of Rows

If the sheets contain a different number of rows, the same loop works unchanged; only the lengths of the lists differ.

Here’s how we could do it:

nb_s = 4
nb_d = 6

S = ['s' + str(x) for x in range(1, nb_s + 1)]
D = ['d' + str(x) for x in range(1, nb_d + 1)]
C = ['c' + str(x) for x in range(1, (len(S) * len(D)) + 1)]

print(S)
print(D)
print(C)

dico_s = {}
for s in S:
    dico_d = {}
    for d in D:
        idx = D.index(d) + len(D) * S.index(s)
        dico_d[d] = C[idx]
    dico_s[s] = dico_d

print(dico_s)

The output will look like this:

['s1', 's2', 's3', 's4']
['d1', 'd2', 'd3', 'd4', 'd5', 'd6']
['c1', 'c2', 'c3', 'c4', 'c5', 'c6', 'c7', 'c8', 'c9', 'c10', 'c11', 'c12', 
 'c13', 'c14', 'c15', 'c16', 'c17', 'c18', 'c19', 'c20', 'c21', 'c22', 'c23', 'c24']

{'s1': {'d1': 'c1', 'd2': 'c2', 'd3': 'c3', 'd4': 'c4', 'd5': 'c5', 'd6': 'c6'},
 's2': {'d1': 'c7', 'd2': 'c8', 'd3': 'c9', 'd4': 'c10', 'd5': 'c11', 'd6': 'c12'},
 's3': {'d1': 'c13', 'd2': 'c14', 'd3': 'c15', 'd4': 'c16', 'd5': 'c17', 'd6': 'c18'},
 's4': {'d1': 'c19', 'd2': 'c20', 'd3': 'c21', 'd4': 'c22', 'd5': 'c23', 'd6': 'c24'}}

In this example, the outer dictionary has four keys (s1 to s4), and each maps to an inner dictionary of six entries (d1 to d6) whose values are taken in order from C.
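
Since the two examples differ only in the list sizes, the loop can be wrapped in a function. The following is one possible sketch, not the article's original code: build_nested is a hypothetical name, it uses enumerate and slicing instead of index (which avoids repeated list searches and does not rely on index to locate each label), and it assumes the flat list holds exactly len(outer) * len(inner) values.

```python
def build_nested(outer, inner, values):
    """Build {outer_key: {inner_key: value}} from a flat, row-major list."""
    if len(values) != len(outer) * len(inner):
        raise ValueError(
            "expected %d values, got %d"
            % (len(outer) * len(inner), len(values))
        )
    result = {}
    for i, key in enumerate(outer):
        # Block of len(inner) consecutive values belonging to this outer key
        start = i * len(inner)
        block = values[start:start + len(inner)]
        result[key] = dict(zip(inner, block))
    return result


S = ['s' + str(x) for x in range(1, 4)]
D = ['d' + str(x) for x in range(1, 3)]
C = ['c' + str(x) for x in range(1, len(S) * len(D) + 1)]

dico_s = build_nested(S, D, C)
print(dico_s)
```

The length check makes a mismatch between the sheets fail early with a clear message rather than with an IndexError partway through the loop.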


Last modified on 2024-08-19