This post walks you through creating a Materials Information File (MIF). The MIF is a flexible, JSON-based schema developed to impose structure on materials data. More information on this file format can be found here.
Citrine provides a Python toolkit for working with MIF files called mifkit (source code and installation instructions are available here). We use mifkit throughout this post, along with Python's built-in csv module for parsing CSV files.
In this post, we will convert the following table of bulk and shear moduli to the MIF schema. The table also provides information about the materials themselves and the conditions at which the measurements were taken.
Before we start, we will export this spreadsheet to a CSV, which would look like this:
Let's get started writing a script that will convert this data to a MIF. As we are working with information about a material-measurement pair, we will use the Sample object, part of the MIF's core schema, to store this data.
1. Import your modules
Create a Python file and import mifkit and any additional modules you will need to parse the data.
# -*- coding: utf-8 -*-
# Set the coding here, as the input file contains non-ASCII characters.
from mifkit import mif
from mifkit.objects import *  # using import * imports all possible MIF objects
import csv
2. Open your data file
Open the data file and parse its content using the csv module. As the first two rows in the sample file are headers, we call next() twice to skip them before iterating over the data rows.
# Note: "rU" is a Python 2 idiom; the "U" flag was removed in Python 3.11, where "r" should be used instead.
with open("input_table.csv", "rU") as f:  # open our data file 'input_table.csv' in universal read mode
    reader = csv.reader(f)  # parse the data using the csv module
    next(reader)  # skip row one
    next(reader)  # skip row two
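To see the header-skipping step in isolation, here is a minimal sketch that uses only the standard library. The CSV text is a hypothetical stand-in for input_table.csv with placeholder values (not real measurements), and read_data_rows is an illustrative helper name, not part of mifkit. The column order is an assumption taken from the comments in this post's snippets: structure, chemical formula, bulk modulus, shear modulus, DOI.

```python
import csv
import io

# Hypothetical stand-in for input_table.csv: two header rows, then data rows.
# Every cell value below is a placeholder, not a real measurement.
SAMPLE_CSV = (
    "Single crystal cubic materials at standard conditions,,,,\n"
    "Structure,Formula,K0 (GPa),G0 (GPa),DOI\n"
    "structure-a,AB,100.0,50.0,10.0000/placeholder-1\n"
    "structure-b,CD,200.0,75.0,10.0000/placeholder-2\n"
)


def read_data_rows(handle, header_rows=2):
    """Return the data rows of a CSV, skipping the leading header rows."""
    reader = csv.reader(handle)
    for _ in range(header_rows):
        next(reader)  # discard one header row
    return list(reader)


rows = read_data_rows(io.StringIO(SAMPLE_CSV))
for row in rows:
    # Assumed column order: structure, formula, bulk modulus, shear modulus, DOI
    structure, formula, bulk, shear, doi = row
    print(structure, formula, bulk, shear, doi)
```

Note that csv.reader returns every cell as a string, so the moduli arrive as text rather than numbers.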
3. Create a list to store information from each row and loop through each row of the CSV after the headers
    samples = []  # one Sample object per data row will be stored here
    for row in reader:
4. Store the reference information
Reference objects store information about the source of the data. There are a number of fields that you can use to store this information, the most common being doi, title, and url (all of which are strings). If possible, use the doi field since this is a unique identifier that can be used to unambiguously look up sources.
        reference = Reference()
        # row[4] references the fifth column of the current row;
        # the string from this cell will be stored in the doi field
        reference.doi = row[4]
5. Store the material information
Material objects store information about the material under investigation. Available fields are chemical_formula, common_name, and condition. In this example, we have the chemical formula as well as several conditions of the material (structure, crystallinity, and crystal system).
First, store the chemical formula in the relevant field:
        material = Material()
        material.chemical_formula = row[1]  # row[1] references the second column of the current row
The material's condition field stores Value objects, so we create one Value object for each of the conditions.
        structure = Value()
        structure.name = "Structure"
        structure.scalar = row[0]  # the structure from the first column is stored as a scalar

        # Crystallinity and crystal system are the same for every row and are only
        # provided in the heading, so they can be hard coded.
        crystallinity = Value()
        crystallinity.name = "Crystallinity"
        crystallinity.scalar = "Single Crystal"

        crystal_system = Value()
        crystal_system.name = "Crystal System"
        crystal_system.scalar = "Cubic"
We store each of these conditions in a list:
        material.condition = [structure, crystallinity, crystal_system]  # store a list of Value objects
6. Store the measurement information
The Measurement object stores information about a measurement and the conditions under which it was taken. In this example, we have two measurements: shear modulus (G0) and bulk modulus (K0). The table also states that the measurements were taken at standard conditions; we will save this information as conditions of each measurement.
First, store the information about the measurement conditions; these apply to both measurements.
        # The header states that the measurements were taken at standard conditions,
        # so both values can be hard coded for each row.
        temperature = Value()
        temperature.name = "Temperature"
        temperature.scalar = "Standard"

        pressure = Value()
        pressure.name = "Pressure"
        pressure.scalar = "Standard"
Next, create Value objects to store the information about the measurement properties, bulk and shear modulus.
        # Citrination uses LaTeX notation to represent symbols, superscripts and subscripts.
        # Units are given in the heading and can be hard coded.
        bulk_modulus = Value()
        bulk_modulus.name = "Bulk Modulus K$_0$"
        bulk_modulus.scalar = row[2]  # bulk modulus is given in the third column of each row
        bulk_modulus.units = "GPa"

        shear_modulus = Value()
        shear_modulus.name = "Shear Modulus G$_0$"
        shear_modulus.scalar = row[3]  # shear modulus is given in the fourth column of each row
        shear_modulus.units = "GPa"
Then, we will create a Measurement object for each measurement and store the property and condition information in the relevant fields.
        bulk_modulus_measurement = Measurement()
        bulk_modulus_measurement.property = bulk_modulus
        bulk_modulus_measurement.condition = [temperature, pressure]

        shear_modulus_measurement = Measurement()
        shear_modulus_measurement.property = shear_modulus
        shear_modulus_measurement.condition = [temperature, pressure]
We will also indicate that the data is experimental. The data_type field can store only one of two strings: "Experimental" or "Computational".
        shear_modulus_measurement.data_type = "Experimental"
        bulk_modulus_measurement.data_type = "Experimental"
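Because only two strings are accepted, a typo such as "experimental" is easy to make when the value comes from a spreadsheet rather than being hard coded. The sketch below shows one way you might guard against that before assigning the field. The helper check_data_type and the constant ALLOWED_DATA_TYPES are hypothetical names introduced for illustration; they are not part of mifkit, and this post does not say whether mifkit performs a similar check itself.

```python
# The two strings this post says data_type may hold.
ALLOWED_DATA_TYPES = ("Experimental", "Computational")


def check_data_type(value):
    """Return value unchanged if it is an allowed data_type string.

    Hypothetical helper for illustration; raises ValueError otherwise.
    """
    if value not in ALLOWED_DATA_TYPES:
        raise ValueError(
            "data_type must be one of %s, got %r" % (ALLOWED_DATA_TYPES, value)
        )
    return value


data_type = check_data_type("Experimental")  # passes through unchanged
```

The comparison is case-sensitive on purpose, so "experimental" is rejected rather than silently accepted.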
7. Combine the row's information into a sample
Once you have stored all the information from a given row, combine it into a Sample object by assigning each piece to the relevant field.
        sample = Sample()
        sample.reference = reference
        sample.material = material
        sample.measurement = [bulk_modulus_measurement, shear_modulus_measurement]
8. Store the sample in the samples list
For each row in the CSV file, append the Sample object for that row to the samples list.
        samples.append(sample)
9. Dump the samples to a JSON file using the mif.dump method
mif.dump works much like the json.dump method in Python's json module, and it accepts all the same arguments as json.dump.
with open("output.json", "w") as output_file:  # create an output file
    # dump the sample list to JSON; an indent of 4 makes the file easier to review
    mif.dump(samples, output_file, indent=4)
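Since mif.dump is described as taking the same arguments as json.dump, it can help to see what those arguments do on their own. The following is a minimal sketch using only the standard json module; the records list holds plain placeholder dicts, not MIF objects, and an in-memory buffer stands in for output.json.

```python
import io
import json

# Placeholder records standing in for the samples list (plain dicts, not MIF objects).
records = [{"name": "placeholder-1"}, {"name": "placeholder-2"}]

buffer = io.StringIO()  # in-memory stand-in for the output file
# indent=4 pretty-prints the output; sort_keys=True gives a stable key order.
json.dump(records, buffer, indent=4, sort_keys=True)
text = buffer.getvalue()
print(text)

# Reading the text back confirms that it is well-formed JSON.
round_trip = json.loads(text)
```

Loading the file back with json.load is a quick way to confirm that a script produced well-formed JSON before you upload it.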
10. Run your Python script
Your script is now complete! You can run it and view the results in the file output.json.
Create your own MIF and contribute to Citrination
Now that you are familiar with mifkit, feel free to share some of your data on Citrination! By uploading data to this platform, you will be contributing to a growing dataset which is making materials data more open, accessible, and useful.
To contribute data, go to https://citrination.com/data_uploads/new.
If you have any questions or suggestions regarding this post, please contact us. Come back soon for our post on using our CSV template to structure your data.
The full script can be downloaded from here.