Have you ever faced a TypeError: Object of type ndarray is not JSON serializable,
TypeError: Object of type int64 is not JSON serializable or something along those
lines? I certainly have! This happens because you are trying to serialize a numpy
datatype using the default JSONEncoder included in the Python standard library.
I frequently use pandas and numpy in my projects and I have to store the results in a NoSQL database. The results are usually nested dictionaries generated dynamically in the algorithm. Initially I was explicitly casting the numpy datatypes to pure Python datatypes before converting to a JSON string. But it quickly became cumbersome, and the code started to look... let's say, not so clean. So I started looking for a better solution. I found a Stack Overflow answer 1 and a linked GitHub discussion 2 that suggested using a custom JSONEncoder to handle the numpy numeric and array datatypes.
While most np.number objects and np.ndarray objects raise a TypeError when you try to
serialize them, np.nan and np.inf are gracefully serialized as NaN and Infinity
respectively. But according to the JSON specification 3, NaN and Infinity are not
valid JSON values. So if we want a valid JSON string which we can eventually save into
a NoSQL database, we need to handle these values as well. Pure Python float('inf')
and float('nan') face the same issue. Let's see how we can handle all of these
cases.
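To make the NaN and Infinity problem concrete, here is a small sketch that uses only pure Python floats, so it runs without numpy. The reject_constant helper is a hypothetical name introduced for illustration; it emulates a strict parser by refusing the non-standard constants through the parse_constant hook of json.loads.

```python
import json

# Pure Python floats only, so this runs without numpy installed.
values = {"nan": float("nan"), "inf": float("inf"), "-inf": float("-inf")}

encoded = json.dumps(values)
print(encoded)  # {"nan": NaN, "inf": Infinity, "-inf": -Infinity}


def reject_constant(name):
    """Hypothetical helper: emulate a strict parser that refuses NaN/Infinity."""
    raise ValueError(f"{name} is not valid JSON")


try:
    json.loads(encoded, parse_constant=reject_constant)
    strict_ok = True
except ValueError:
    strict_ok = False

print(strict_ok)  # False
```

Python's own json.loads accepts these constants by default, which is why the problem often only shows up once the string reaches a stricter consumer.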
> To use a custom JSONEncoder subclass (e.g. one that overrides the .default() method to serialize additional types), specify it with the cls kwarg; otherwise JSONEncoder is used.

This is from the docstring of the dumps method in the json module. If we have a
custom datatype that is not serializable by the default JSONEncoder, we
can create a custom encoder class to define how to serialize that datatype. This custom
class inherits from JSONEncoder and overrides its default method.
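Before turning to numpy, the mechanism can be sketched with a type from the standard library. SetEncoder below is a hypothetical example, not something from the json module: it assumes we want Python sets written out as sorted lists.

```python
import json


class SetEncoder(json.JSONEncoder):
    """Hypothetical encoder: serializes Python sets as sorted lists."""

    def default(self, o):
        # default() is only called for objects the base encoder cannot handle.
        if isinstance(o, (set, frozenset)):
            return sorted(o)
        # Defer to the base class, which raises TypeError for unknown types.
        return super().default(o)


payload = {"tags": {"numpy", "json", "pandas"}}
encoded = json.dumps(payload, cls=SetEncoder)
print(encoded)  # {"tags": ["json", "numpy", "pandas"]}
```

Falling through to the base class at the end matters: it keeps the usual TypeError for anything the encoder does not explicitly know about.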
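Here are the numpy datatypes the encoder has to cover and the Python datatypes they
most closely resemble. All of them are rejected by the default JSONEncoder except
numpy.float64, which subclasses Python's float and is listed for completeness.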
| Numpy Datatype | Pure Python Datatype |
|---|---|
| numpy.int8 | int |
| numpy.int16 | int |
| numpy.int32 | int |
| numpy.int64 | int |
| numpy.float16 | float |
| numpy.float32 | float |
| numpy.float64 | float |
| numpy.ndarray | list |
We can create a custom encoder class that will convert the numpy datatypes to the corresponding python datatypes.
```python
import json
import numpy as np


class NumpyEncoder(json.JSONEncoder):
    """
    This encoder can be used to convert incompatible numpy data types
    to types compatible with json.dumps()
    Use like json.dumps(output, cls=NumpyEncoder)
    """

    def default(self, o):
        if isinstance(o, np.integer):
            return int(o)
        elif isinstance(o, np.floating):
            return float(o)
        elif isinstance(o, np.ndarray):
            return o.tolist()
        # Anything else falls through to the base class, which raises TypeError
        return json.JSONEncoder.default(self, o)
```

```python
import json
import numpy as np

data = {
    'int': np.int64(42),
    'float': np.float64(3.14),
    'array': np.array([1, 2, 3, 4, 5])
}

try:
    json_data = json.dumps(data)  # This will raise a TypeError
except TypeError as e:
    print(e)

json_data = json.dumps(data, cls=NumpyEncoder)  # This will work
```

This still serializes np.nan and np.inf as NaN and Infinity respectively. The
dumps method has an allow_nan argument which can be set to False, but this just
raises a ValueError if the input contains nan or inf.
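A quick sketch of that allow_nan behaviour, again with a pure Python float so that it runs without numpy. The data dictionary and its score key are made up for illustration.

```python
import json

# Illustrative input: a single non-finite value is enough to trigger the error.
data = {"score": float("nan")}

default_output = json.dumps(data)
print(default_output)  # {"score": NaN}

try:
    json.dumps(data, allow_nan=False)
    error_message = None
except ValueError as exc:
    # The whole call fails; no partial JSON string is produced.
    error_message = str(exc)

print(error_message)
```

So allow_nan=False is useful as a guard against emitting invalid JSON, but it does not help when the goal is to store the record anyway.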
simplejson is an externally maintained JSON encoder/decoder with a similar interface. 4
It can be installed with pip:

```sh
pip install simplejson
```

The dumps method in simplejson has an argument ignore_nan which can be set to
True to serialize nan, inf and -inf to null. null is a valid JSON value. So
let's swap out the json module with simplejson and see how it works.
```python
import simplejson as json
import numpy as np


class NumpyEncoder(json.JSONEncoder):
    """
    This encoder can be used to convert incompatible data types
    to types compatible with json.dumps()
    Use like json.dumps(output, ignore_nan=True, cls=NumpyEncoder)
    """

    def default(self, o):
        if isinstance(o, np.integer):
            return int(o)
        elif isinstance(o, np.floating):
            return float(o)
        elif isinstance(o, np.ndarray):
            return o.tolist()
        return json.JSONEncoder.default(self, o)
```

```python
import simplejson as json
import numpy as np

data = {
    'int': np.int64(42),
    'float': np.float64(3.14),
    'array': np.array([1, 2, 3, 4, 5]),
    'nan': np.nan,
    'inf': np.inf,
    '-inf': -np.inf
}

json_data = json.dumps(data, ignore_nan=True, cls=NumpyEncoder)
print(json_data)
```

This will output a valid JSON string.
```json
{
    "int": 42,
    "float": 3.14,
    "array": [1, 2, 3, 4, 5],
    "nan": null,
    "inf": null,
    "-inf": null
}
```

We have seen how to create a custom JSONEncoder to handle numpy datatypes and how to
serialize nan, inf and -inf as null using simplejson. This is a simple solution to the
problem of serializing numpy datatypes, and it keeps the calling code cleaner and more
readable than casting each value by hand.
Note that the numpy datatypes are not restored when you load the string back with
json.loads. You will have to explicitly convert the Python datatypes back to numpy
datatypes.
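The explicit conversion back could look something like the sketch below. The target dtypes (int64, float64) and the choice to map null back to NaN are assumptions made for this example; JSON itself records neither the original numpy dtype nor which non-finite value a null used to be.

```python
import json

import numpy as np

# A JSON string like the one produced by the encoder above.
json_data = '{"int": 42, "float": 3.14, "array": [1, 2, 3, 4, 5], "nan": null}'

loaded = json.loads(json_data)
print(type(loaded["int"]).__name__, type(loaded["array"]).__name__)  # int list

# Convert back explicitly; the dtypes are chosen here, not read from the JSON.
restored = {
    "int": np.int64(loaded["int"]),
    "float": np.float64(loaded["float"]),
    "array": np.asarray(loaded["array"], dtype=np.int64),
    # null comes back as None; mapping it to NaN is one possible convention.
    "nan": np.nan if loaded["nan"] is None else np.float64(loaded["nan"]),
}
print(restored["array"].dtype)  # int64
```

If the original dtypes matter, they have to be stored alongside the data or known from the schema of the consuming code.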