After manipulating and analyzing numerical data with NumPy, you'll often need to save your arrays to disk for later use, sharing, or as part of a larger data processing pipeline. Conversely, you'll need to load existing data from files into NumPy arrays. NumPy provides straightforward and efficient functions for these input/output (I/O) operations, handling both simple text formats and optimized binary formats.
Text files, such as comma-separated values (CSV), are human-readable and easily shared across different applications. NumPy provides np.savetxt() and np.loadtxt() for writing and reading them.
The np.savetxt() function saves a NumPy array to a text file. At minimum, it needs a filename and the array to save.
```python
import numpy as np
# Create a sample 2D array
data_array = np.arange(12, dtype=np.float32).reshape(3, 4)
print("Original Array:")
print(data_array)
# Save the array to a text file (e.g., CSV)
np.savetxt('data_array.csv', data_array, delimiter=',')
print("\nArray saved to data_array.csv")
```
By default, np.savetxt() separates values with a single space, so we passed delimiter=',' to produce a standard CSV file. The fmt argument controls how each number is written. Its default, '%.18e', uses scientific notation with 18 digits after the decimal point. For example, fmt='%.4f' writes floating-point numbers with four decimal places instead.
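The snippet below is a sketch of how these arguments combine. It writes the same array with four decimal places and a column header. The file name data_fmt.csv and the column names are illustrative choices. header and comments are standard np.savetxt() parameters, and setting comments='' stops NumPy from prefixing the header line with '# '.

```python
import numpy as np

data_array = np.arange(12, dtype=np.float32).reshape(3, 4)

# fmt='%.4f' writes every value with four decimal places;
# header adds a first line, and comments='' removes the default '# ' prefix.
np.savetxt('data_fmt.csv', data_array, delimiter=',', fmt='%.4f',
           header='col0,col1,col2,col3', comments='')

# Print the raw file contents to see exactly what was written
with open('data_fmt.csv') as f:
    print(f.read())
```

The first line of the file is the header. Each following line holds one row, such as 0.0000,1.0000,2.0000,3.0000.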
To load data from a text file into a NumPy array, use np.loadtxt(). Pass the filename, plus the delimiter used in the file unless the values are separated by whitespace (the default).
```python
# Load the array back from the text file
loaded_array_text = np.loadtxt('data_array.csv', delimiter=',')
print("\nArray loaded from data_array.csv:")
print(loaded_array_text)
print("Data type:", loaded_array_text.dtype)
```
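Note that np.loadtxt() converts every value to a float (float64) by default. This is why the loaded array reports float64 even though we saved a float32 array: a text file stores only the digits, not the data type. Your file may have header lines or columns that are not floating-point numbers. In that case, use arguments such as skiprows (to skip leading lines) or dtype (to request a different type). Alternatively, use Pandas for more flexible text parsing, which we'll cover in the next chapter.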
Text files are convenient to inspect and share, but they are inefficient for large arrays. Each number takes many characters to write out, so text files are larger and slower to read and write than the raw binary data. In addition, np.savetxt() only supports 1D and 2D arrays.
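The sketch below illustrates the dimensionality limit. np.savetxt() rejects a 3D array with a ValueError. A common workaround is to reshape the array to 2D before saving and restore the shape after loading. You have to remember the original shape separately (here it is hard-coded), because the text file does not record it. The file name cube.txt is illustrative.

```python
import numpy as np

cube = np.arange(24, dtype=np.float64).reshape(2, 3, 4)

raised = False
try:
    np.savetxt('cube.txt', cube)
except ValueError as err:
    raised = True
    print('np.savetxt refused the 3D array:', err)

# Workaround: collapse the trailing axes so the array is 2D (2 rows x 12 columns)
np.savetxt('cube.txt', cube.reshape(cube.shape[0], -1))

# The file only holds a 2x12 table, so we restore the 3D shape ourselves
restored = np.loadtxt('cube.txt').reshape(2, 3, 4)
print(np.array_equal(restored, cube))
```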
For efficient storage and faster I/O, especially within Python-based workflows, NumPy's native binary format (.npy) is preferred.
The np.save() function saves a single NumPy array to a binary file with a .npy extension. This format preserves the array's shape, data type, and contents exactly.
```python
# Create another sample array
array_to_save = np.linspace(0, 1, 10).reshape(2, 5)
print("\nArray to save in binary format:")
print(array_to_save)
# Save the array to a .npy file
np.save('single_array.npy', array_to_save)
print("\nArray saved to single_array.npy")
```
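The resulting .npy file is not human-readable. It consists of a short header describing the array's shape and dtype, followed by the array's raw bytes, so NumPy can read it back with very little parsing.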
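To make the storage difference concrete, the sketch below saves the same 1000 x 100 float64 array twice. One copy is a CSV written with np.savetxt()'s default fmt, the other is a .npy file. It then compares the two file sizes using os.path.getsize(). The exact numbers depend on the data and the format string. The .npy file should be close to the raw 8 bytes per value plus a small header, while each CSV value takes about 25 characters. The file names are illustrative.

```python
import os
import numpy as np

rng = np.random.default_rng(0)
big = rng.random((1000, 100))  # 100,000 float64 values, 800,000 bytes of raw data

np.savetxt('big.csv', big, delimiter=',')  # default fmt='%.18e'
np.save('big.npy', big)

text_size = os.path.getsize('big.csv')
binary_size = os.path.getsize('big.npy')
print(f'CSV:  {text_size:,} bytes')
print(f'.npy: {binary_size:,} bytes ({big.nbytes:,} bytes of data plus header)')
```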
Use np.load() to load an array from a .npy file. NumPy reads the metadata (shape and dtype) from the file header, then reads the data itself.
```python
# Load the array from the .npy file
loaded_array_binary = np.load('single_array.npy')
print("\nArray loaded from single_array.npy:")
print(loaded_array_binary)
print("Data type:", loaded_array_binary.dtype)
print("Shape:", loaded_array_binary.shape)
```
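Saving a float32 array is a quick way to see how this differs from the text round trip. np.load() returns float32, whereas np.loadtxt() returns float64 by default. The sketch also relies on np.save() appending .npy when the filename has no extension. The name no_extension is purely illustrative.

```python
import os
import numpy as np

arr32 = np.linspace(0, 1, 10, dtype=np.float32).reshape(2, 5)

# No extension given: np.save writes 'no_extension.npy'
np.save('no_extension', arr32)
print(os.path.exists('no_extension.npy'))

# dtype, shape, and values all survive the binary round trip unchanged
restored = np.load('no_extension.npy')
print(restored.dtype, restored.shape)
print(np.array_equal(restored, arr32))
```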
If you need to save multiple arrays into a single file, use np.savez(). It stores the arrays in an uncompressed archive with a .npz extension, which is internally a ZIP file containing one .npy file per array. Pass the arrays as keyword arguments. The keyword names become the keys you use to retrieve the arrays later.
```python
# Create multiple arrays
array_a = np.array([1, 2, 3, 4])
array_b = np.random.rand(2, 3) # A 2x3 array of random numbers
print("\nSaving multiple arrays (array_a, array_b) to arrays.npz")
# Save multiple arrays into an .npz archive
np.savez('arrays.npz', first_array=array_a, second_array=array_b)
```
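Keyword arguments are optional. Arrays passed positionally are stored under automatic names arr_0, arr_1, and so on. Because an .npz file is a ZIP archive, Python's standard zipfile module can also list its contents, showing one .npy member per array. The file name positional.npz is illustrative.

```python
import zipfile
import numpy as np

x = np.array([1, 2, 3])
y = np.zeros((2, 2))

# Positional arguments get default names 'arr_0', 'arr_1', ...
np.savez('positional.npz', x, y)

# Each array is stored as its own .npy member inside the ZIP archive
with zipfile.ZipFile('positional.npz') as zf:
    members = zf.namelist()
print(members)

# np.load exposes the same arrays under names without the .npy suffix
with np.load('positional.npz') as archive:
    names = archive.files
    first = archive['arr_0']
print(names, first)
```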
Loading data from an .npz file also uses np.load(). Here it returns an NpzFile object, which behaves like a read-only dictionary: you access the individual arrays using the keys you provided when saving. Each array is read from the archive only when you access it.
```python
# Load the .npz archive
loaded_data = np.load('arrays.npz')
print("\nArrays loaded from arrays.npz:")
# Access individual arrays using keys
loaded_a = loaded_data['first_array']
loaded_b = loaded_data['second_array']
print("Loaded 'first_array':")
print(loaded_a)
print("\nLoaded 'second_array':")
print(loaded_b)
# You can list the keys (array names)
print("\nKeys in the archive:", loaded_data.files)
# Important: Close the file handle when done
loaded_data.close()
```
It's good practice to call .close() on the NpzFile object when you're finished with it, or to use a with statement, which closes the file automatically:
```python
with np.load('arrays.npz') as data:
    arr1 = data['first_array']
    arr2 = data['second_array']
    print("\nLoading within 'with' statement - arr1 shape:", arr1.shape)
```
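The NpzFile reads from the open archive. One common pattern, shown here as a sketch rather than the only option, is to copy every array into an ordinary Python dict inside the with block, so the arrays remain usable after the file is closed. The archive name arrays_demo.npz is illustrative.

```python
import numpy as np

np.savez('arrays_demo.npz',
         first_array=np.array([1, 2, 3, 4]),
         second_array=np.ones((2, 3)))

# Copy every array into a plain dict while the archive is open;
# the dict stays usable after the with block closes the file.
with np.load('arrays_demo.npz') as data:
    arrays = {name: data[name] for name in data.files}

for name, arr in arrays.items():
    print(name, arr.shape)
```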
If file size is a significant concern, use np.savez_compressed(). It works like np.savez() but compresses each array in the .npz archive, using ZIP deflate compression through Python's zipfile module. The resulting files are smaller, especially for repetitive data, but saving and loading take longer.
```python
# Save multiple arrays into a compressed .npz archive
np.savez_compressed('arrays_compressed.npz', first_array=array_a, second_array=array_b)
print("\nArrays saved to compressed file: arrays_compressed.npz")
# Loading is the same as with np.savez
with np.load('arrays_compressed.npz') as data_comp:
    loaded_a_comp = data_comp['first_array']
    print("\nLoaded 'first_array' from compressed file:", loaded_a_comp)
```
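How much compression helps depends on the data. In the illustrative sketch below, the grid is almost entirely zeros, so the compressed archive should be far smaller. Random floating-point data, by contrast, usually shrinks very little. The file names are illustrative.

```python
import os
import numpy as np

# A 1000 x 1000 grid that is mostly zeros: highly repetitive, so it compresses well
grid = np.zeros((1000, 1000))
grid[::100, ::100] = 1.0

np.savez('plain.npz', grid=grid)
np.savez_compressed('compressed.npz', grid=grid)

plain_size = os.path.getsize('plain.npz')
compressed_size = os.path.getsize('compressed.npz')
print(f'np.savez:            {plain_size:,} bytes')
print(f'np.savez_compressed: {compressed_size:,} bytes')

# Compression is lossless: the loaded array matches the original exactly
with np.load('compressed.npz') as archive:
    print(np.array_equal(archive['grid'], grid))
```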
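In summary, use:
- np.savetxt() / np.loadtxt() when you need human-readable files or must exchange 1D or 2D data with other tools, such as spreadsheets or programs written in other languages.
- np.save() / np.load() when you are storing a single array for later use in NumPy and want fast I/O with its shape and dtype preserved exactly.
- np.savez() / np.savez_compressed() when you want to bundle several arrays into one file. Choose the compressed variant when file size matters more than save and load speed.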
Being able to persist and reload your numerical data efficiently is fundamental in machine learning. Whether saving processed datasets, intermediate calculations, or model parameters like weights and biases learned during training, these NumPy I/O functions provide the necessary tools.
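The sketch below illustrates the last use case. It saves the weights and biases of a hypothetical two-layer network, plus the training epoch, into one .npz file and then restores them. The parameter names (W1, b1, W2, b2, epoch) and the file name checkpoint.npz are hypothetical. np.savez() simply uses whatever keyword names it receives, supplied here by unpacking a dict with **.

```python
import numpy as np

# Hypothetical parameters of a small two-layer network
rng = np.random.default_rng(42)
params = {
    'W1': rng.normal(size=(4, 8)),
    'b1': np.zeros(8),
    'W2': rng.normal(size=(8, 1)),
    'b2': np.zeros(1),
}
epoch = 10

# ** turns each dict key into a keyword argument, i.e. an archive name;
# the scalar epoch is stored as a 0-dimensional array
np.savez('checkpoint.npz', epoch=epoch, **params)

with np.load('checkpoint.npz') as checkpoint:
    restored_epoch = int(checkpoint['epoch'])
    restored = {name: checkpoint[name] for name in params}

print(restored_epoch)
print(all(np.array_equal(params[name], restored[name]) for name in params))
```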
© 2025 ApX Machine Learning