The dawn of the Moneyball era in materials and manufacturing

Moneyball movie poster

This is a special time of year for me: the beginning of a new baseball season, and the hope against hope that the Chicago Cubs can finally win a World Series after a 107-year championship drought (here’s a realistic view of what that would look like).

While my research career and work at Citrine focus on materials informatics, I also do sports analytics as a hobby. Baseball stands out among American professional sports for being particularly data-obsessed, and the book and movie Moneyball have elevated baseball analytics to a pop culture phenomenon. Billy Beane, general manager of the Oakland Athletics, famously used advanced data analytics to gain a competitive edge against perennial titans such as the New York Yankees, despite having one of the smallest payrolls in baseball.

We founded Citrine because we want to help customers unlock a Moneyball edge in materials and manufacturing. Just as the Oakland Athletics became four times more efficient (in terms of payroll dollars per win) than the Boston Red Sox by harnessing the power of data, materials and manufacturing companies can make R&D and production dramatically more efficient by analyzing large-scale data about the materials and chemicals they use.

Comparing my passions for baseball stats and materials informatics, I am struck by how absolutely integral data analytics is to baseball in comparison to the status quo in materials. To make the point, I’ll give some examples of statistically-derived facts that are commonplace in baseball, and analogous questions in materials that are impossible to answer without inordinate effort:


Most and least efficient Major League teams in terms of payroll dollars per win in the 2012 baseball season. Source:

  • Colorado Rockies rookie Trevor Story became the first player in Major League history to hit home runs in his first three games as a professional player
  • On average, a team will win a game for every 10 total runs they score
  • My childhood idol Mark Grace was the last Chicago Cubs player to hit for the cycle (a single, double, triple, and home run in the same game), and did so on May 9, 1993


  • What is the highest-reported superconducting critical temperature as a function of year? How about by journal and by year?
  • Which commercial aluminum alloys have elastic modulus above 75 GPa and yield strength above 250 MPa?
  • How does the adsorption energy of organic molecules on Au(111) vary according to the molecular masses of those molecules?
  • What chemical features are most important in governing the viscosity of paint?

Correlation between relative payroll and regular season win percent for all non-Oakland Major League teams from 2000-2013, where each point represents a binned average of 15 team-seasons. The Oakland Athletics’ performance is shown in green. Source:


Once a discipline becomes data-driven, it never reverts to the old pure-intuition way of doing things. Consider the quantitative revolution on Wall Street that upended the finance industry in the 1980s and 1990s. In baseball, Moneyball thinking has permanently transformed the game from the front office on down. Teams are locked in an arms race for analytics talent, managers are shifting their defensive formations and making frequent, subtle in-game adjustments, and star players like Zack Greinke openly aim to optimize their performance using advanced statistical analyses.

No such transformation has yet occurred in materials. The above materials questions, while comparable in complexity to the example baseball insights, would each take tens or hundreds of hours of manual data collection and analysis to answer satisfactorily. As a result, groundbreaking materials insights remain hidden in data, awaiting discovery. Our vision at Citrine is to instantly reveal these Moneyball insights to our customers using large-scale data aggregation and machine learning.

The sharp contrast between the data-intensiveness of baseball discourse and the comparatively analytics-starved state of materials science raises an obvious question: What causes this difference? I outline some important distinctions below.

Centralization of data
  • Baseball: Exhaustive, clean historical data sets are available from industry-standard companies such as Elias Sports Bureau or Stats LLC
  • Materials: No entity has collected all materials and chemical data, though Citrine is working toward this goal
Standardization of data
  • Baseball: The baseball community has agreed on which aspects of games should be recorded, and how to record them
  • Materials: The materials community lacks data standards, but Citrine is working on changing this situation
Variability and gaps in data
  • Baseball: Baseball data are unambiguous: a hit is a hit, no matter who is keeping score, and key in-game events are always recorded properly
  • Materials: All experiments have innate uncertainty, researchers may not fully document aspects of their work, and materials phenomena are best described as probability distributions, not scalar facts
Relational nature of data
  • Baseball: A finite set of entities such as players, positions, and teams have well-defined properties such as games, at-bats, and strikeouts that can be readily mined with simple SQL-style queries
  • Materials: How similar is iron-deficient Fe0.95O to FeO? Which materials are sensibly described with the notion of a chemical formula? Across which dimensions could we compare polyethylene to Inconel?
Relevance of qualitative data
  • Baseball: Baseball teams still utilize scouting reports, which are qualitative evaluations of player performance by domain experts
  • Materials: The idea of “chemical resistance” is incredibly important to polymers, even though quantifying it (how resistant to which chemicals?) is challenging
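
To make the "simple SQL-style queries" point about baseball concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and every row are invented placeholders for illustration, not real statistics or any real provider's schema.

```python
import sqlite3

# Invented placeholder rows (player, team, at_bats, strikeouts); not real statistics.
rows = [
    ("Player A", "Team X", 500, 100),
    ("Player B", "Team X", 400, 120),
    ("Player C", "Team Y", 450, 45),
]

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE batting (player TEXT, team TEXT, at_bats INTEGER, strikeouts INTEGER)"
)
conn.executemany("INSERT INTO batting VALUES (?, ?, ?, ?)", rows)

# Strikeout rate per team, lowest first.
query = """
    SELECT team, 1.0 * SUM(strikeouts) / SUM(at_bats) AS k_rate
    FROM batting
    GROUP BY team
    ORDER BY k_rate
"""
team_rates = conn.execute(query).fetchall()
conn.close()
```

The query works because every row has the same well-defined columns. The materials questions in the list above have no such agreed-upon table to run against.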

To illustrate just how stark the difference is between baseball and materials, here is a hypothetical baseball data dystopia: Imagine if officials in different stadiums all chose to record different aspects of their games, using a range of non-standard nomenclatures (is a "home run" the same thing as a "four bagger" and a “round trip?”) and then published these data in idiosyncratic box score formats in hundreds of different newspapers, months or years after the game. Further, many key observations from the game would only be recorded privately and never published. Unfortunately, this is precisely the reality we face in the materials community today.

Given the above set of facts, the challenges to unleashing the Moneyball era in materials are daunting. But the opportunities for radical data-driven advancement are even more exciting, and that is precisely why we created the Citrine platform. We have developed a data extraction pipeline that turns documents about materials and chemicals into a highly structured, machine-readable database of facts and relationships. We are building grassroots support around our open MIF (Materials Information File) and next-generation PIF (Physical Information File) standards for representing materials data. We use a combination of heuristics and machine learning to resolve gaps and ambiguities in materials datasets. We have created a toolset that enables extremely powerful searches and predictive AI-based models of large-scale materials data in spite of the complex, non-relational nature of those data. And finally, we are engaging the global community of materials researchers to help us organize and curate the public data on our platform.

We are convinced that a combination of state-of-the-art software and a brilliant user community can bring about the data-driven future of materials and manufacturing that will launch entire industries forward. We’re proud to play our part in this transformation, and to put cutting-edge data analytics capabilities into the hands of visionary Beanesian materials scientists and engineers. Citrine can’t help the Cubs win the World Series, but our platform can crunch huge volumes of data to optimize the properties of the advanced materials in the helmets and shoes they’ll wear when they finally do break through. That’s enough for at least an honorary championship ring, right?



Why Sharing Data is Important

At Citrine Informatics, we believe that everyone should have free access to the materials-related data they need. Our team is working hard to make materials data open, accessible, and useful, fostering a data-themed community of researchers and furthering materials science in the process. 

To achieve these goals, we need your help. Our team has built the infrastructure and done a significant amount of initial data aggregation to make Citrination the largest open access database for the physical world. Currently, it contains over 3 million data points and is continually growing. To keep up the momentum, however, we require users, like you, to contribute and engage with our platform. We encourage all researchers in materials science and engineering to share some (or all) of their data with us. Every bit helps, whether it be just a few property values or gigabytes of research output. You can also interact with the platform by commenting on data points, creating your own user profile, or viewing others' profiles.

Making your research results available to others in the materials science and engineering field is good for the whole community and will allow others to learn from and build upon what you have done. Of course, you can also benefit from the data you find on Citrination, using it to supplement or validate the work you are doing yourself. In addition, there are a number of other benefits to using and contributing to Citrination. When you contribute to Citrination, your data will be: 

  • Structured, organized, and searchable. Powerful semantic search allows you to find the data you need quickly and easily. 
  • Available online. Use Citrination as a place to host your data, to comply with data management requirements, or as a backup of your work. 
  • Linkable. Share links to data points or entire datasets with your colleagues and others. 
  • Visible. Once on Citrination, your data is accessible to people all around the world. Popular datasets on the site receive a similar number of page views as a Nature Materials paper 2-3 months after publication. 

We currently have users from around 2,000 institutions worldwide who use Citrination to find, access, and store data. Send us an email if you would like to join them and contribute, learn more, or chat with us about your research and ideas.


My Internship Experience at Citrine


Working as an intern in a small, fast-moving team has been especially rewarding as projects I've worked on actually see the light of day. Work that I've done has been delivered to customers, and demo'd to investors and potential customers.



Analytics Platform: Build or Buy?

As more materials and manufacturing companies consider acquiring data analytics capabilities, a question we often hear from customers is, "How is Citrine's analytics platform better than what we could build in-house?" 



Create a MIF: Materials Information File

This blog post will walk you through the steps you will need to follow in order to create a Materials Information File (MIF). The MIF is a flexible, JSON-based schema that has been developed to impose structure on materials data. More information on this file format can be found here
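
As a rough picture of what "JSON-based" means here, the sketch below builds one record as plain Python dictionaries and serializes it with the standard json module. The field names follow the objects used later in this post (sample, reference, material, measurement), but the exact nesting is an assumption rather than the official schema, and the bracketed values are placeholders.

```python
import json

# Hypothetical layout: field names follow the objects used later in this post,
# but the nesting and placeholder values are assumptions, not the official schema.
record = {
    "sample": {
        "reference": {"doi": "<doi of the source>"},
        "material": {
            "chemical_formula": "<formula>",
            "condition": [{"name": "Crystal System", "scalar": "Cubic"}],
        },
        "measurement": [
            {
                "property": {"name": "Bulk Modulus", "scalar": "<value>", "units": "GPa"},
                "data_type": "Experimental",
            }
        ],
    }
}

text = json.dumps(record, indent=4)   # human-readable JSON string
round_tripped = json.loads(text)      # parse it back into Python objects
```

Because the structure is nested rather than tabular, one material can carry any number of conditions and measurements without changing the layout.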

Setting up 

Citrine provides a Python toolkit for working with MIF files called mifkit (source code and installation instructions are available here). We use mifkit throughout this post, as well as Python's built-in csv module for parsing CSV files.

The Data

In this post, we will convert the following table of bulk and shear moduli to the MIF schema. The table also provides information about the materials themselves and the conditions at which the measurements were taken. 


Before we start, we will export this spreadsheet to a CSV, which would look like this:

Let's get started writing a script that will convert this data to a MIF. As we are working with information about a material-measurement pair, we will use the Sample object, part of the MIF's core schema, to store this data. 

1. Import your modules 

Create a Python file and import mifkit and any additional modules you will need to parse the data. 

# -*- coding: utf-8 -*-
# Set the coding here because the input file contains non-ASCII characters.
from mifkit import mif
from mifkit.objects import *  # import * pulls in every MIF object class
import csv

2. Open your data file 

Open the data file and parse its content using the CSV module. As the first two rows in the sample file are headers, we will use the next() function to skip over these when iterating through the input file. 

with open("input_table.csv", "rU") as f:  # open our data file 'input_table.csv' in universal read mode
    reader = csv.reader(f)  # parse the data using the csv module
    next(reader)  # skip header row one
    next(reader)  # skip header row two
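
To try the header-skipping pattern without the real spreadsheet, the sketch below runs the same two next() calls against an in-memory stand-in for the CSV. The column order matches the indices used later in this post (structure, formula, bulk modulus, shear modulus, DOI), but the header wording and all cell values are made-up placeholders.

```python
import csv
import io

# Stand-in for input_table.csv: two header rows, then data rows.
# The cell values are placeholders, not the table from this post.
csv_text = (
    "Single crystal cubic materials at standard conditions,,,,\n"
    "Structure,Formula,K0 (GPa),G0 (GPa),DOI\n"
    "structure-1,formula-1,100,50,doi-1\n"
    "structure-2,formula-2,200,80,doi-2\n"
)

reader = csv.reader(io.StringIO(csv_text))
next(reader)  # skip header row one
next(reader)  # skip header row two
data_rows = list(reader)  # only the data rows remain
```

Note that the csv module returns every cell as a string, so a value such as row[2] arrives as "100" rather than the number 100.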

3. Create a list to store information from each row and loop through each row of the CSV after the headers 

    samples = []  # one Sample object per data row
    for row in reader:  # still inside the with block, so the file is open

4. Store the reference information 

Reference objects store information about the source of the data. There are a number of fields that you can use to store this information, the most common being doi, title, and url (all of which are strings). If possible, use the doi field since this is a unique identifier that can be used to unambiguously look up sources. 

        reference = Reference()
        reference.doi = row[4]  # row[4] is the fifth column of the current row; its string is stored in the doi field

5. Store the material information 

Material objects store information about the material which is under investigation. Available fields are chemical_formula, common_name, and condition. In this example, we have the chemical formula as well as several conditions (structure, crystallinity, and crystal system) of the material. 

First, the chemical formula can be stored in the relevant field 

        material = Material()
        material.chemical_formula = row[1]  # row[1] is the second column of the current row

The material condition field stores Value objects and we'll create one object for each of the conditions. 

        structure = Value()
        structure.name = "Structure"
        structure.scalar = row[0]  # the structure from the first column is stored as a scalar

        crystallinity = Value()
        crystallinity.name = "Crystallinity"
        crystallinity.scalar = "Single Crystal"  # the same for every row and only given in the heading, so it can be hard coded

        crystal_system = Value()
        crystal_system.name = "Crystal System"
        crystal_system.scalar = "Cubic"  # the same for every row and only given in the heading, so it can be hard coded

We store each of these conditions in a list: 

        material.condition = [structure, crystallinity, crystal_system]  # store a list of Value objects

6. Store the measurement information 

The Measurement object is used to store information about a measurement and the conditions under which it was taken. In this example, we have two measurements: shear modulus (G0) and bulk modulus (K0). We also know that the measurements were taken at standard conditions; these will be saved as conditions of each measurement.

First, store the information about the measurement conditions; these apply to both measurements. 

        temperature = Value()
        temperature.name = "Temperature"
        temperature.scalar = "Standard"  # the header states standard conditions, so this can be hard coded for each row

        pressure = Value()
        pressure.name = "Pressure"
        pressure.scalar = "Standard"  # the header states standard conditions, so this can be hard coded for each row

Next, create Value objects to store the information about the measurement properties, bulk and shear modulus. 

        bulk_modulus = Value()
        bulk_modulus.name = "Bulk Modulus K$_0$"  # Citrination uses LaTeX notation for symbols, superscripts and subscripts
        bulk_modulus.scalar = row[2]  # bulk modulus is given in the third column of each row
        bulk_modulus.units = "GPa"  # units are given in the heading and can be hard coded

        shear_modulus = Value()
        shear_modulus.name = "Shear Modulus G$_0$"  # Citrination uses LaTeX notation for symbols, superscripts and subscripts
        shear_modulus.scalar = row[3]  # shear modulus is given in the fourth column of each row
        shear_modulus.units = "GPa"  # units are given in the heading and can be hard coded

Then, we will create a Measurement object for each measurement and store the property and condition information in the relevant fields. 

        bulk_modulus_measurement = Measurement()
        bulk_modulus_measurement.property = bulk_modulus
        bulk_modulus_measurement.condition = [temperature, pressure]

        shear_modulus_measurement = Measurement()
        shear_modulus_measurement.property = shear_modulus
        shear_modulus_measurement.condition = [temperature, pressure]

We will also indicate that the data is experimental. The data_type field can only store either the string "Experimental" or the string "Computational".

        shear_modulus_measurement.data_type = "Experimental"
        bulk_modulus_measurement.data_type = "Experimental"
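
Since data_type accepts only two exact strings, a small guard can catch typos before they reach the output file. The helper below is a hypothetical convenience function written for this post; it is not part of mifkit.

```python
# Hypothetical helper, not part of mifkit: the two allowed strings come from the text above.
ALLOWED_DATA_TYPES = ("Experimental", "Computational")


def check_data_type(value):
    """Return value unchanged if it is an allowed data_type, else raise ValueError."""
    if value not in ALLOWED_DATA_TYPES:
        raise ValueError(
            "data_type must be one of %s, got %r" % (ALLOWED_DATA_TYPES, value)
        )
    return value
```

It could be used as `bulk_modulus_measurement.data_type = check_data_type("Experimental")`. The comparison is case-sensitive, so "experimental" would be rejected.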

7. Combine the row's information into a sample 

Once you have stored all the information from a given row, you will need to combine this into a sample object by storing the information in the relevant fields. 

        sample = Sample()
        sample.reference = reference
        sample.material = material
        sample.measurement = [bulk_modulus_measurement, shear_modulus_measurement]

8. Store the sample in the samples list 

For each row in the CSV file, append the sample object for that row to the samples list.

        samples.append(sample)


9. Dump the samples to a JSON file using the mif.dump method 

mif.dump works much like the json.dump function in Python's json module, and it accepts all the same arguments as json.dump.

with open("output.json", "w") as output_file:  # create an output file
    mif.dump(samples, output_file, indent=4)  # dump the sample list to JSON, indented by 4 so the file is easier to review
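
For readers unfamiliar with json.dump, the sketch below shows the arguments in play using only the standard library. DemoValue and to_serializable are stand-ins invented for illustration; this is not how mifkit implements mif.dump, only an example of the json.dump behavior it is said to mirror.

```python
import io
import json


class DemoValue(object):
    """Stand-in for illustration only; not mifkit's Value class."""

    def __init__(self, name, scalar, units=None):
        self.name = name
        self.scalar = scalar
        self.units = units


def to_serializable(obj):
    # json calls this for objects it cannot encode; fall back to the attribute dict.
    return obj.__dict__


# Placeholder values, not data from this post's table.
values = [DemoValue("Bulk Modulus", "100", "GPa"), DemoValue("Temperature", "Standard")]

buffer = io.StringIO()  # in-memory file, so the sketch leaves nothing on disk
json.dump(values, buffer, default=to_serializable, indent=4, sort_keys=True)
output = buffer.getvalue()
```

The indent argument only changes whitespace, so an indented file and a compact one parse to the same data.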

10. Run your Python script 

Your script is now complete! You can run it and view the results in the file output.json. 

Create your own MIF and contribute to Citrination 

Now that you are familiar with mifkit, feel free to share some of your data on Citrination! By uploading data to this platform, you will be contributing to a growing dataset which is making materials data more open, accessible, and useful. 

To contribute data, go to 

If you have any questions or suggestions regarding this post, please contact us. Come back soon for our post on using our CSV template to structure your data. 

The full script can be downloaded from here



Insights from the Intersection: Applying Data Science Thinking to Materials

I’ve spent the summer working at Citrine fresh out of an undergraduate degree at Stanford, where I studied both Materials Science and Computer Science. Though I thoroughly enjoyed studying both fields, I found limited opportunities to apply the two together until beginning work here. While companies in entertainment and shopping have reaped the benefits of massive data sets, many fields in the scientific community, notably materials science, have remained largely separate from data science even as they amass huge quantities of data. Working with materials data at Citrine has made me reflect on how differently data scientists and materials scientists can perceive data, and on how insights from data science can benefit materials research.

When we need accuracy:

A first insight from data science comes from reframing how information can be used in the research process. In academic materials science, the focus of study tends to be very narrow. Researchers write papers around a single breakthrough result, and can spend years studying a single material. In this frame of mind, the accuracy of a single data point is vital for making progress. However, maintaining this focus on individual data points at all times during the research process can slow the pace of development by constraining research to known areas. In the view of a data scientist, understanding larger patterns in the data is more important than the accuracy of any single point. Citrine’s technology creates value by finding hidden patterns in the data and applying those patterns to generate new insights. Because machine learning algorithms offer high throughput and require minimal computing resources, they can generate large amounts of information to direct research into valuable new areas that a narrower view of the data would not have considered.

Garbage in, garbage out:

In taking a data-driven approach to understanding problems, one of the most important challenges that data scientists face is ensuring data quality. The quality of any insights can only ever be as good as the quality of the data on which they are based. This is especially true for machine-learning-based models, which cannot fall back on physics if there are problems with the data. Here at Citrine, our solution to the issue of data reliability has been Citrination, a community-driven common repository for all different types of materials data. Having a trusted, comprehensive location for materials data would also be extremely useful for researchers, who could save valuable time by quickly validating data by surveying similar results or checking results against models built on existing data.

Follow the numbers:

The core concept of data science is potentially the most valuable for materials science: the belief that there is a wealth of important information hidden in patterns that can be uncovered given enough quality data. Reframing materials data analysis as pattern detection means that Citrine’s technology is not bound by the current limits in scientific knowledge. We are able to tackle problems that scientific intuition does not yet have the means to explain or understand by finding complicated patterns and relationships through machine learning. Not only can these patterns help accelerate development, but they can also lend insight to scientific understanding by uncovering connections between things that do not immediately seem relevant.

Citrine brings together people and ideas from both data science and materials science and applies these insights to make research and development faster and smarter.




Data Highlight: Elastic Constants for Single-Crystal Oxides

Many thanks go out to David Teter, Ph.D., of Teter Engineering, for contributing a great data set of elastic constants for some single-crystal oxides. 

Anisotropic single crystal materials have direction-dependent physical properties such as thermal expansion or elasticity that can't adequately be expressed by scalar values. This is why sometimes you will see a matrix of values for a material property on Citrination. Anisotropy is particularly important when a material must endure exposure to extreme forces, temperatures, or a combination of the two. Jet engine components, for example, often require specific engineering of crystal orientations to achieve performance targets. The list of mechanical properties and constants contributed by Dr. Teter is exactly the kind of data that we strive to have readily accessible to the community for efficient materials selection and development.
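
To illustrate why elasticity needs a matrix, here is a minimal sketch for the simplest case, a cubic crystal, whose stiffness in Voigt notation is fully described by three constants (C11, C12, C44). Cubic symmetry is an assumption of this example, and the input numbers are illustrative round values, not taken from Dr. Teter's data set.

```python
def cubic_stiffness_matrix(c11, c12, c44):
    """Build the 6x6 Voigt-notation stiffness matrix for a cubic crystal."""
    matrix = [[0.0] * 6 for _ in range(6)]
    for i in range(3):
        for j in range(3):
            # Upper-left 3x3 block: C11 on the diagonal, C12 elsewhere.
            matrix[i][j] = c11 if i == j else c12
        # Lower-right 3x3 block: C44 on the diagonal (shear terms).
        matrix[i + 3][i + 3] = c44
    return matrix


def bulk_modulus(c11, c12):
    """Bulk modulus of a cubic crystal: K = (C11 + 2*C12) / 3."""
    return (c11 + 2.0 * c12) / 3.0


def zener_ratio(c11, c12, c44):
    """Zener anisotropy ratio: A = 2*C44 / (C11 - C12); A = 1 means elastically isotropic."""
    return 2.0 * c44 / (c11 - c12)


# Illustrative round numbers in GPa, not taken from the contributed data set.
C = cubic_stiffness_matrix(300.0, 100.0, 150.0)
K = bulk_modulus(300.0, 100.0)
A = zener_ratio(300.0, 100.0, 150.0)
```

A Zener ratio different from 1 signals that the response to shear depends on crystal direction, which is exactly the information a single scalar modulus would hide.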


Lessons from the lab: ALL data matters

Every day, graduate students in science and engineering fields generate data of varying quality, most of which – especially negative results – are never published. Journal referees and editors are the primary arbiters of what is the most interesting or novel to the research community, and the nature of the peer review and journal acceptance process inevitably leads to the exclusion of some potentially valuable results. Excluding a sometimes-significant portion of results from publications is a detriment to researchers and to research progress because others can't glean a comprehensive view of all the work done to learn from past mistakes. 

Every chemistry graduate student pursuing a PhD must pass a candidacy exam to be considered a PhD candidate. This is usually in the form of a presentation or written paper that is reviewed by a committee of four or five professors. Passing this exam indicates that the committee has confidence in your abilities and direction to obtain a PhD, leaving you with the task of making a contribution to science over the next two to four years. My candidacy presentation started at 8am on a Monday. I spent an hour and a half being questioned by my five-professor committee. I was promptly sent out of the room to allow for deliberation. Twenty long minutes later, I was told that I had conditionally passed my candidacy exam, with an emphasis on conditionally. My committee informed me that the few positive results I had presented were not indicative of two years of work. Years of reading positive results in the literature and seeing post-doctoral researchers successfully pump out fantastic results had shown me what was valued, so I made my presentation with a major focus on the few positive results I had obtained. In subsequent talks with committee members, I learned that they were expecting to see all the negative results that I had generated, and how I had overcome experimental hurdles to obtain my few positive results. Had I included a summary of my negative results as well, it could have been a very different exam. This experience changed my perception of the importance of negative results, and the process by which you learn from them in pursuing positive results. This notion became even more apparent in the lab when I took on a project that required reproducing a previously published work from our research group.

Reproducing results from past scientific publications is a common starting point for many research projects. It provides a basis for comparison and often validates a material or process for further application in the project. As a graduate student in chemistry, I started a solar-to-fuel conversion project by trying to reproduce a seminal paper published in our own research group a decade earlier. The fabrication method involved a number of steps that would produce uniquely shaped silver nanowires in an array that held promise as a light harvesting material. My professors remembered the process as being robust, with straightforward methods that should take only a week or two to reproduce and extrapolate to other materials. My initial attempts to utilize the process to produce the structures were unsuccessful. Together with two undergraduates, I spent months changing half a dozen experimental parameters and purchasing fresh precursor materials, and we still were not able to reliably obtain the structures. Even speaking with the first author over the phone didn't solve the problem. He said he didn't remember it being particularly difficult and that there wasn't any trick to consistently producing the structures. What did become clear was that hundreds of samples had been produced over the course of perfecting the process prior to publishing the results. Ultimately, the answer turned out to be a longer aging step than reported in the publication. In the end, reproducing the work cost hundreds of lab hours and thousands of dollars in microscopy characterization time. The biggest cause of this was figuring out how each parameter in the process explicitly impacted the structures. If only I could have seen the data from the hundreds of samples analyzed in producing the original work, then I might have gleaned some valuable insights into what variables to modify.

All data matters: they are an essential part of the research process and should be accessible to anyone viewing a published work. In the early days, scientists and engineers came together to dispute and validate claims made by others in the field. Today, the digital revolution in data makes it easier than ever to communicate, organize, and access data. This leaves perception as the biggest barrier to change. Here at Citrine, we care deeply about transparency through research data, and provide a platform to store, organize, and access all the results generated in producing great research. 


Data Highlight: Plasmon-Enhanced Upconversion

Many thanks go out to Diane Wu, a PhD candidate in Stanford's Department of Chemistry, for contributing a data set from her recent review paper on Plasmon-Enhanced Upconversion

Upconversion is becoming a more commonly used method to improve light absorption in photovoltaic/photocatalytic systems as well as for background-free bioimaging. The concept of upconversion involves converting lower energy photons to higher energies using paired absorber and emitter materials. This is particularly valuable when the efficiency of photovoltaics or the utilization of light within biological samples can be improved by upconverting two lower energy photons into one higher energy photon. Improving the efficiency of this process has led to numerous unique methods including plasmonic enhancement of the emitter via nano-structured gold or silver. Diane Wu's review takes a very multidisciplinary area of materials research and comprehensively surveys the methods of improving upconversion efficiency and the current state of the art. Contributing the results of her extensive survey to Citrination benefits all those working in the field as well as those selecting optimal materials for device and imaging applications.
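
The energy bookkeeping behind "two lower energy photons into one higher energy photon" can be sketched in a few lines. This assumes an ideal, lossless process, so the result is an upper bound on the emitted photon energy (a lower bound on wavelength). The 980 nm input is just an example wavelength chosen for illustration, not a value from the review.

```python
HC_EV_NM = 1239.84  # Planck constant times speed of light, in eV*nm (rounded)


def photon_energy_ev(wavelength_nm):
    """Photon energy in eV from wavelength in nm: E = hc / wavelength."""
    return HC_EV_NM / wavelength_nm


def shortest_upconverted_wavelength_nm(wavelength_1_nm, wavelength_2_nm):
    """Wavelength of a photon carrying the combined energy of two input photons."""
    total_ev = photon_energy_ev(wavelength_1_nm) + photon_energy_ev(wavelength_2_nm)
    return HC_EV_NM / total_ev


# Two identical near-infrared photons combine to, at best, half the wavelength.
limit_nm = shortest_upconverted_wavelength_nm(980.0, 980.0)
```

Real systems emit at somewhat longer wavelengths than this limit because some energy is lost along the way.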



Data Highlight: Temperature Programmed Desorption Data

Many thanks go out to Josh Buffon, from UCSB's Department of Chemistry and Biochemistry, for contributing some temperature programmed desorption (TPD) data from his model catalyst research. 

This week's user data highlight illustrates the increasing variety of data that users continue to submit to Citrination on a regular basis. Josh Buffon's TPD data shows certain mass fractions of a model catalytic reaction as a function of temperature studied under ultra-high vacuum in a custom-built characterization setup. Having diverse data types such as these TPD results adds to the growing Citrination community and to the discussions that spark when such model studies are performed and shared. 


Data Highlight: Hydrotalcite Heterogeneous Catalyst

Many thanks go out to Jacob Barrett, from UCSB's Department of Chemistry and Biochemistry, for contributing some recent powder X-ray Diffraction patterns

Heterogeneous catalysts have continued to be one of the most applicable and diverse areas of materials science at an industrial scale. From hydrocarbon cracking using zeolites to noble metal catalysts in your car's exhaust system, the development of more effective catalysts continues to be an active area of research. Often, earth-abundant minerals serve as the inspiration for heterogeneous catalysts given their relative abundance and low cost. This week's data highlight shows a great example of a hydrotalcite and doped hydrotalcite heterogeneous catalyst synthesized in the Ford Group at UCSB for use in biomass conversion to valuable chemical feedstocks.


Data Highlight: Semiconductor bandgap, conduction band, and valence band values

Many thanks go out to Nirala Singh, from UCSB's Department of Chemical Engineering, for contributing a curated semiconductor band gap data set

Aqueous electrochemistry and photo-electrochemistry have become increasingly important areas of research for groups pursuing water splitting or solar-to-fuel conversion. The energy storage theme continues this week with valuable oxide, sulfide, and phosphide semiconductor materials data curated by Nirala Singh and others in the McFarland group at UCSB. This particular dataset highlights the conduction and valence band levels reported throughout the literature vs. vacuum as well as vs. normal hydrogen electrode (NHE) at neutral pH. Exemplary group-wide efforts to generate data sets such as these truly benefit the greater research community on Citrination, saving others time from searching for materials data throughout literature. 
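
For readers moving between the two reference scales mentioned above, the conversion is a constant offset. The sketch below assumes the commonly quoted absolute potential of the hydrogen electrode of about 4.44 V; other sources use slightly different values, and the data set itself may follow a different convention. It also ignores any pH dependence of the band edges.

```python
# Assumption of this sketch: a commonly quoted offset between the two scales.
SHE_ABSOLUTE_POTENTIAL_V = 4.44


def nhe_to_vacuum_ev(potential_vs_nhe_v):
    """Convert an electrode potential vs. NHE (V) to an electron energy vs. vacuum (eV)."""
    return -(potential_vs_nhe_v + SHE_ABSOLUTE_POTENTIAL_V)


def vacuum_to_nhe_v(energy_vs_vacuum_ev):
    """Convert an electron energy vs. vacuum (eV) to an electrode potential vs. NHE (V)."""
    return -energy_vs_vacuum_ev - SHE_ABSOLUTE_POTENTIAL_V
```

Under this convention, 0 V vs. NHE sits at -4.44 eV on the vacuum scale, and more positive potentials correspond to deeper (more negative) energies.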


President Obama's Clean Power Plan Demands Materials Innovation



Today, the Obama Administration announced its Clean Power Plan, which seeks a 32% reduction in greenhouse gas emissions from 2005 levels[1]. The plan has many intricacies, including building blocks that can be deployed to achieve state-specific reduction targets and an incentive program to prevent a short-term rush to natural gas over renewables. Perhaps most interesting, and largely overlooked as a secondary issue, is how the Obama Administration’s broader efforts over the last 6.5 years have laid the technological and infrastructural groundwork for these policies to succeed in the long term. This has included building a project finance community that understands renewable energy financing, a grid that can respond to both changing demand and changing supply, and fundamental technological innovation that can push clean electrical generation down the cost/performance curve. The latter two rely on new generations of materials innovation that can only be enabled by following new research paths. Citrine is using its proprietary materials data mining platform and working with partners in government, academia, and industry to help solve these problems.

The Challenge of Renewables

The most obvious path to a reduction in carbon emissions is a strong shift toward renewable energy. Solar, wind, tidal power and others produce no carbon emissions, but present real challenges to the wide adoption that would help achieve the President’s emissions reduction goals.


Many cultural changes might reduce Americans’ hunger for energy, but one thing is sure: we expect electricity to be there when we plug into an outlet. Unfortunately, most renewable energy sources are intermittent, so the only way they could cover a large percentage of our power needs would be to store energy for when it is needed. Grid storage, sadly, is not yet broadly deployable. As of 2013, all deployed storage amounts to less than 0.00062% of our electrical grid’s capacity[2,3]. Being able to store and deploy energy from intermittent sources requires a major change in storage technologies: a combination of batteries, kinetic storage (flywheels, etc.), and pumped hydro. The two of these three that can be deployed without a conveniently placed mountain are batteries and kinetic storage, and both need major materials breakthroughs before they can be deployed at scale.


Even if the storage problem were solved, renewable power has only just reached cost parity with fossil energy, so without incentives, some power producers could still choose to install more fossil capacity[4]. Further reducing the cost of renewable capacity would make that decision go away. Reducing the cost will require new breakthroughs in material and device technology as well as the creation of systems that are cheaper to install, maintain, and operate than current technologies. While this is most certainly an interdisciplinary opportunity, materials innovation plays a big role: cheaper and stronger magnets for wind turbines, lighter turbine blade materials, more efficient solar panels, coatings to prevent weathering, etc. Each of these would have a profound stimulus effect on the deployment of renewable capacity.

Fossil Isn’t Dead Yet

Though some outlets have been saying that these new policies mean the end for coal generation, and they may indeed lead to such an outcome over the course of decades, the coal industry isn’t going anywhere quite yet. Coal is on the decline, but it still accounts for about 37% of our total power production[3,5]. When it comes down to it, burning ancient organic matter to power a turbine is a very efficient way to generate electricity. Fortunately, there is a materials opportunity in traditional fossil power. Coal, when burned at high temperature and pressure, can have substantially lower emissions than in standard boiler designs, but alloys that can withstand the pressure and heat need to be developed, tested, and manufactured. Similarly, natural gas turbines could operate with lower emissions if built from materials optimized for extreme conditions. This is an area of active research for the Department of Energy's Office of Fossil Energy (

Exciting Future Ahead

There is an exciting future ahead for power generation. President Obama’s announcement today opens the door to a future when we, as humans, are producing electricity in a way that does not unduly damage our planet. But it is also an ambitious goal that needs innovation at all levels to be reached. The materials community stands ready. Already, new innovation is taking place, led in the government by ARPA-E ( the Materials Genome Initiative ( We at Citrine are proud to be at the forefront of solving just such challenges with great partners across the spectrum.





Data Highlight: Lithium ion battery electrode materials

Energy storage in the form of Li-ion batteries has become commonplace in our electronic devices and is now enabling transportation, appearing in dozens of different car models. Research groups around the world are pushing the materials used for these batteries to improve capacity and cyclability. With hundreds of publications showcasing the latest electrode material performance improvements, it can be difficult to assess the state of these materials from performance and natural resource perspectives. The recent work of Leila Ghadbeigi, Jaye Harada, Bethany Lettier, and Professor Taylor Sparks, published in Energy & Environmental Science earlier this year, analyzes over 16,000 data points from 200+ publications to shed light on cutting-edge, data-driven battery material design. The authors have made this extensive dataset available to the public, and Jaye Harada has worked with the Citrine team to make these data readily searchable on the Citrination platform. Check it out





Data Highlight: Arkema Polyamide Materials

Citrination continues to grow to include many more commercial polymers. This week, we highlight the inclusion of Arkema Rilsan AMN D and many more. We include break strength, melting properties, moduli, specific gravity, and other mechanical properties. Keep an eye here and on Citrination for some more exciting announcements about datasets going into the system!



Data Highlight: Enabling Organic Electronics

Organic electronics are appearing in more and more consumer devices, from OLEDs (organic light emitting diodes) in smartphones to OPVs (organic photovoltaics) for future flexible and inexpensive solar panels. The organic semiconductors that drive these devices are an active area of research, one that Citrine has started to support with new datasets such as the recently uploaded 1-D diffusion length and bandgap sets below.

Organic Semiconductor Exciton Diffusion Lengths and Diffusion Coefficients

Organic Semiconductor HOMO/LUMO/Transition State/Bandgap and Photoluminescence/Phosphorescence Lifetime Data Set 

Many thanks go out to Alex Mikhnenko, a Post-Doc in the Nguyen Group in UC Santa Barbara's Department of Chemistry and Biochemistry, for contributing both of these fantastic datasets.