To manage the distinct components of a graph (nodes, edges, and their associated features) within a single, cohesive unit, PyTorch Geometric introduces a specialized structure: the Data object. This object acts as the fundamental building block for all graph-based machine learning in PyG, packaging everything a model needs to know about a graph into one convenient container.

Let's examine the primary attributes of a Data object, which you will interact with constantly when building GNNs.

Core Attributes of the Data Object

A Data object can hold several attributes, but a few are central to defining the graph's structure and features. All of these attributes are stored as PyTorch tensors.

- data.x: The Node Feature Matrix. This tensor holds the features for each node in the graph. It has a shape of [num_nodes, num_node_features], where num_nodes is the total number of nodes and num_node_features is the dimensionality of the feature vector for each node. This is the same node feature matrix X we discussed in earlier chapters.
- data.edge_index: Graph Connectivity. This is perhaps the most significant attribute. Instead of using a dense adjacency matrix, whose memory cost grows quadratically with the number of nodes no matter how few edges exist, PyG represents graph connectivity in the Coordinate (COO) format. edge_index is a tensor of shape [2, num_edges] with a torch.long dtype. The first row contains the source node indices for each edge, and the second row contains the corresponding target node indices, so column i describes the edge from edge_index[0, i] to edge_index[1, i]. Because only existing edges are stored, this representation is efficient for the sparse graphs commonly found in practice.
- data.edge_attr: Edge Features. In some graphs, the connections themselves have attributes. For example, in a molecular graph, edges might represent different types of chemical bonds. edge_attr is an optional tensor of shape [num_edges, num_edge_features] that stores these features. Row i of edge_attr must describe the edge in column i of edge_index.
- data.y: Labels.
This optional attribute stores the target labels for training your model. The shape of y depends on the task. For node-level tasks like node classification, it might have a shape of [num_nodes]. For graph-level tasks, it would typically have a shape of [1].

Creating a Data Object

Let's construct a Data object for a simple, directed graph to see these attributes in action. Consider a graph with four nodes and four directed edges forming a cycle.

    digraph G {
        rankdir=TB;
        graph [bgcolor="transparent"];
        node [style=filled, shape=circle, fontname="sans-serif", margin=0.2, color="#4263eb", fillcolor="#bac8ff"];
        edge [color="#868e96"];
        0 -> 1;
        1 -> 2;
        2 -> 3;
        3 -> 0;
    }

Figure: A directed graph with four nodes. The connectivity is defined by edges from node 0 to 1, 1 to 2, 2 to 3, and 3 back to 0.

We can represent this graph in PyG as follows. Assume each node has a 2-dimensional feature vector, and we have labels for a node classification task.

    import torch
    from torch_geometric.data import Data

    # Define the graph connectivity (COO format)
    # Edges: 0->1, 1->2, 2->3, 3->0
    edge_index = torch.tensor([
        [0, 1, 2, 3],  # Source nodes
        [1, 2, 3, 0]   # Target nodes
    ], dtype=torch.long)

    # Define node features (4 nodes, 2 features each)
    x = torch.tensor([
        [-1, 1],   # Features for Node 0
        [1, 1],    # Features for Node 1
        [1, -1],   # Features for Node 2
        [-1, -1]   # Features for Node 3
    ], dtype=torch.float)

    # Define node labels
    y = torch.tensor([0, 1, 0, 1], dtype=torch.long)

    # Create the Data object
    data = Data(x=x, edge_index=edge_index, y=y)

    print(data)

Running this code will produce an output that neatly summarizes the graph's contents:

    Data(x=[4, 2], edge_index=[2, 4], y=[4])

This summary tells us at a glance that our graph has 4 nodes, each with 2 features (x=[4, 2]), 4 edges (edge_index=[2, 4]), and 4 corresponding node labels (y=[4]).

Handling Undirected Graphs

A common question is how to represent an undirected graph, where an edge between nodes u and v implies a connection in both directions.
In PyG, you must represent this explicitly. For every undirected edge, you add two entries to edge_index: one for u -> v and one for v -> u.

For instance, if our example graph were undirected, the edge_index would be:

    # For an undirected graph
    undirected_edge_index = torch.tensor([
        [0, 1, 1, 2, 2, 3, 3, 0],
        [1, 0, 2, 1, 3, 2, 0, 3]
    ], dtype=torch.long)

This convention ensures that during the message passing process, information can flow in both directions between connected nodes.

Useful Properties and Methods

The Data object is more than just a passive container. It comes with several helpful properties and methods for inspecting the graph:

- data.num_nodes: Returns the number of nodes in the graph (inferred from x when node features are present).
- data.num_edges: Returns the number of edges in the graph, that is, the number of columns in edge_index. An undirected graph stored with both directions therefore counts each connection twice (8 for the undirected example above).
- data.num_node_features: Returns the number of features per node.
- data.is_directed(): Checks if the graph is directed. It returns False if for every edge (u, v), an edge (v, u) also exists.
- data.is_undirected(): The opposite of is_directed().

By encapsulating a graph's entire structure and features into a single, well-defined object, PyG provides a clean and efficient foundation for building models. Now that you understand how to represent a single graph, the next step is to see how PyG helps you work with entire collections of them using its Dataset class.
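
As a rough, library-free illustration of the undirected convention and of the symmetry check that is_directed() is described as performing, the sketch below works on plain Python lists instead of tensors. The helper names mirror_edges and is_symmetric are hypothetical and are not part of PyG; they only mimic the idea on the four-node cycle used in this section.

```python
def mirror_edges(edges):
    """Turn undirected (u, v) pairs into a two-row COO list with both directions."""
    sources, targets = [], []
    for u, v in edges:
        # Store u -> v and v -> u as two separate columns.
        sources += [u, v]
        targets += [v, u]
    return [sources, targets]


def is_symmetric(edge_index):
    """Return True if every edge (u, v) has a matching reverse edge (v, u)."""
    pairs = set(zip(edge_index[0], edge_index[1]))
    return all((v, u) in pairs for u, v in pairs)


# The four-node cycle from this section, given as undirected pairs.
undirected = mirror_edges([(0, 1), (1, 2), (2, 3), (3, 0)])
# The directed version keeps only one direction per edge.
directed = [[0, 1, 2, 3], [1, 2, 3, 0]]

print(undirected)                # two rows: sources, then targets
print(len(undirected[0]))        # number of columns, i.e. stored edges
print(is_symmetric(undirected))  # both directions present for every edge
print(is_symmetric(directed))    # reverse edges are missing
```

Passing either nested list to torch.tensor(..., dtype=torch.long) yields a tensor in the [2, num_edges] layout that edge_index expects. In real code you would more likely rely on PyG itself: data.is_undirected() for the check, and the to_undirected utility in torch_geometric.utils for adding the reverse edges.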