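When you move from Keras to PyTorch, you will find that layers, the core building blocks of a neural network, have direct counterparts. The underlying principles are the same, but naming conventions, parameter settings, and some default behaviors differ. This section implements common layer types in PyTorch, namely fully connected layers (Linear), convolutional layers (Conv2d), and recurrent layers (LSTM), and compares each with its Keras counterpart.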

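Fully connected layers: `tf.keras.layers.Dense` vs. `torch.nn.Linear`

The most basic layer is the fully connected layer, which applies a linear transformation ( $y = Wx + b$ ). In Keras this is tf.keras.layers.Dense; in PyTorch it is torch.nn.Linear.

Differences and similarities:

Output units: Keras uses the units argument to set the dimensionality of the output space. PyTorch uses out_features.
Input features: PyTorch's torch.nn.Linear requires you to specify in_features, the dimensionality of the input. Keras usually infers this from the input shape the first time the layer is called (or from input_shape when it is given for the first layer).
Activation function: In Keras, the activation can be passed directly to the Dense layer (for example, activation='relu'). In PyTorch, the activation is usually applied after the linear layer, either as a separate module (for example, torch.nn.ReLU()) or through a function in torch.nn.functional.
Bias: Keras includes a bias term with use_bias=True (the default). PyTorch uses bias=True (also the default).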
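
To make the input-size difference concrete, the sketch below builds the same 64-to-128 layer twice: once with an explicit in_features, and once with torch.nn.LazyLinear, which defers creating its weights until the first forward pass, much as Keras does. The variable names are illustrative, and the PyTorch documentation describes the lazy modules as an API that may still change, so treat this as an optional convenience rather than the standard approach.

```python
import torch
import torch.nn as nn

# Explicit: in_features must be known when the layer is constructed.
explicit_layer = nn.Linear(in_features=64, out_features=128)

# Lazy: only out_features is given; in_features is inferred on the first call.
lazy_layer = nn.LazyLinear(out_features=128)

x = torch.randn(32, 64)  # batch of 32 samples, 64 features each
explicit_out = explicit_layer(x)
lazy_out = lazy_layer(x)  # the weight is materialized during this call

print(explicit_out.shape)       # torch.Size([32, 128])
print(lazy_layer.weight.shape)  # torch.Size([128, 64])
```

Once the lazy layer has seen its first input, its shape is fixed and it behaves like an ordinary nn.Linear.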

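Common parameters compared:

Keras (`tf.keras.layers.Dense`)	PyTorch (`torch.nn.Linear`)	Description
`units`	`out_features`	Output size
(inferred, or `input_shape`)	`in_features`	Input size
`activation`	(applied separately)	Activation function
`use_bias`	`bias`	Whether to include a bias term
`kernel_initializer`	(set after construction via `torch.nn.init`)	Weight initialization strategy
`bias_initializer`	(set after construction via `torch.nn.init`)	Bias initialization strategy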
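
The last two rows deserve a brief illustration. Keras takes initializers as constructor arguments, whereas in PyTorch you typically construct the layer first and then overwrite its parameters in place with functions from torch.nn.init. The sketch below assumes you want to reproduce the Keras Dense defaults (Glorot uniform weights, zero biases); that choice is for illustration only.

```python
import torch
import torch.nn as nn

layer = nn.Linear(in_features=64, out_features=128)

# In-place initializers end with an underscore and modify the tensor directly.
nn.init.xavier_uniform_(layer.weight)  # Xavier is another name for Glorot initialization
nn.init.zeros_(layer.bias)

# The weight is stored as (out_features, in_features).
print(layer.weight.shape)             # torch.Size([128, 64])
print(layer.bias.abs().sum().item())  # 0.0
```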

Example:

Let's create a fully connected layer that takes 64 input features and produces 128 output features.

TensorFlow (Keras):

import tensorflow as tf

# Keras fully connected layer
keras_dense_layer = tf.keras.layers.Dense(units=128, input_shape=(64,), activation='relu')

# Example with dummy data
dummy_input_keras = tf.random.normal(shape=(32, 64)) # batch size 32, 64 features
output_keras = keras_dense_layer(dummy_input_keras)
print("Keras output shape:", output_keras.shape) # (32, 128)

PyTorch:

import torch
import torch.nn as nn

# PyTorch linear layer
pytorch_linear_layer = nn.Linear(in_features=64, out_features=128)
pytorch_relu = nn.ReLU()

# Example with dummy data
dummy_input_pytorch = torch.randn(32, 64) # batch size 32, 64 features
linear_output_pytorch = pytorch_linear_layer(dummy_input_pytorch)
output_pytorch = pytorch_relu(linear_output_pytorch) # apply the activation separately
print("PyTorch output shape:", output_pytorch.shape) # torch.Size([32, 128])

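In the PyTorch example, nn.ReLU() is instantiated as a module; you can also call torch.nn.functional.relu() instead. PyTorch initializes weights and biases automatically, but you can customize this, as described in the "Weight initialization strategies" section.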
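
The two ways of applying the activation can be compared directly. The sketch below (variable names are illustrative) applies ReLU as a module, as a function from torch.nn.functional, and as part of an nn.Sequential container that bundles the linear layer with its activation, which is the closest analogue of Dense(..., activation='relu').

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
linear = nn.Linear(64, 128)
x = torch.randn(32, 64)

out_module = nn.ReLU()(linear(x))   # activation as a module
out_functional = F.relu(linear(x))  # activation as a plain function

# Bundle the layer and activation so they behave like a single block.
dense_block = nn.Sequential(linear, nn.ReLU())
out_block = dense_block(x)

print(torch.equal(out_module, out_functional))  # True
print(torch.equal(out_module, out_block))       # True
print(out_block.min().item() >= 0)              # True: ReLU clamps negatives to zero
```

The module form is convenient inside containers such as nn.Sequential, while the functional form is common inside a custom forward method.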

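Convolutional layers: `tf.keras.layers.Conv2D` vs. `torch.nn.Conv2d`

Convolutional layers are central to computer vision tasks. Keras provides tf.keras.layers.Conv2D for 2D convolution, and PyTorch provides torch.nn.Conv2d.

Differences and similarities:

Output channels/filters: Keras uses filters to set the number of output channels (the depth of the convolution output). PyTorch uses out_channels.
Input channels: PyTorch's torch.nn.Conv2d requires in_channels. Keras usually infers this value.
Kernel size: Both use kernel_size (an integer, or a tuple for non-square kernels).
Stride: Keras uses strides (an integer or a tuple, for example (1, 1)). PyTorch uses stride (an integer or a tuple).
Padding: Keras takes a string: 'valid' (no padding) or 'same' (pad so that the output keeps the input's spatial dimensions when the stride is 1). PyTorch's padding argument accepts an integer (the same padding on every side), a tuple (one padding value per spatial dimension, applied to both sides of that dimension), or the strings 'valid' and 'same' (similar to Keras, although numeric padding gives finer control).
Data format: This is an important difference.
- Keras (tf.keras.layers.Conv2D) defaults to the 'channels_last' data format, so input tensors should have shape (batch_size, height, width, channels).
- PyTorch (torch.nn.Conv2d) requires the 'channels_first' data format: (batch_size, channels, height, width). You need to make sure your input data follows this layout.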
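
When image data arrives in the Keras layout, a permute call is usually enough to adapt it. The sketch below assumes a hypothetical batch stored as (N, H, W, C) and reorders it to (N, C, H, W) before passing it to nn.Conv2d; the tensor sizes are arbitrary.

```python
import torch
import torch.nn as nn

# Hypothetical batch in channels_last layout: (N, H, W, C)
images_nhwc = torch.randn(32, 28, 28, 3)

# Reorder the axes to channels_first: (N, C, H, W)
images_nchw = images_nhwc.permute(0, 3, 1, 2).contiguous()

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)
features = conv(images_nchw)

print(images_nchw.shape)  # torch.Size([32, 3, 28, 28])
print(features.shape)     # torch.Size([32, 16, 26, 26]) because no padding is used
```

permute returns a view of the same data, so the call to contiguous() is optional here; it only makes the memory layout match the new axis order.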

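Parameters compared:

Keras (`tf.keras.layers.Conv2D`)	PyTorch (`torch.nn.Conv2d`)	Description
`filters`	`out_channels`	Number of output filters/channels
(inferred, or `input_shape`)	`in_channels`	Number of input channels
`kernel_size`	`kernel_size`	Size of the convolution kernel
`strides`	`stride`	Convolution stride
`padding`	`padding`	Padding mode or amount
`data_format`	(always 'channels_first')	Tensor data format
`activation`	(applied separately)	Activation function
`use_bias`	`bias`	Whether to include a bias term

Example:

A 2D convolutional layer with 32 output filters, a 3x3 kernel, and a stride of 1. The input images are assumed to be grayscale (1 channel).

TensorFlow (Keras):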

import tensorflow as tf

# Keras Conv2D layer
# Input: (batch, height, width, channels), e.g. (N, 28, 28, 1)
keras_conv_layer = tf.keras.layers.Conv2D(filters=32, kernel_size=(3, 3), strides=(1, 1), padding='same', activation='relu', input_shape=(28, 28, 1))

# Usage example
dummy_input_keras = tf.random.normal(shape=(32, 28, 28, 1)) # N, H, W, C
output_keras = keras_conv_layer(dummy_input_keras)
print("Keras Conv2D output shape:", output_keras.shape) # (32, 28, 28, 32) because of 'same' padding

PyTorch:

import torch
import torch.nn as nn

# PyTorch Conv2d layer
# Input: (batch, channels, height, width), e.g. (N, 1, 28, 28)
pytorch_conv_layer = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, stride=1, padding=1) # padding=1 with a 3x3 kernel gives the "same" effect
pytorch_relu = nn.ReLU()

# Usage example
dummy_input_pytorch = torch.randn(32, 1, 28, 28) # N, C, H, W
conv_output_pytorch = pytorch_conv_layer(dummy_input_pytorch)
output_pytorch = pytorch_relu(conv_output_pytorch)
print("PyTorch Conv2d output shape:", output_pytorch.shape) # torch.Size([32, 32, 28, 28])

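A note on PyTorch Conv2d padding: To get behavior similar to Keras "same" padding in PyTorch (output spatial dimensions match the input when stride=1), set padding = (k-1) // 2 for an odd kernel size $k$. For example, kernel_size=3 needs padding=1, and kernel_size=5 needs padding=2. Recent PyTorch versions (1.9 and later) also accept the string padding='same', which simplifies the setup, although PyTorch supports it only when stride=1.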
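
The equivalence between the manual rule and the string value can be checked directly. The sketch below builds two 5x5 convolutions, one with padding=(k-1)//2 and one with padding='same', copies the parameters from the first into the second, and compares the outputs. It assumes stride=1 and a PyTorch version that accepts string padding; the channel counts are arbitrary.

```python
import torch
import torch.nn as nn

k = 5
conv_manual = nn.Conv2d(1, 8, kernel_size=k, stride=1, padding=(k - 1) // 2)
conv_same = nn.Conv2d(1, 8, kernel_size=k, stride=1, padding='same')

# Share parameters so any difference could only come from the padding.
conv_same.load_state_dict(conv_manual.state_dict())

x = torch.randn(4, 1, 28, 28)
with torch.no_grad():
    y_manual = conv_manual(x)
    y_same = conv_same(x)

print(y_manual.shape)  # torch.Size([4, 8, 28, 28])
print(torch.allclose(y_manual, y_same, atol=1e-5))
```

With an odd kernel the required padding is symmetric, so the two layers should agree; the comparison on the last line lets you confirm that on your own installation.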

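Recurrent layers: `tf.keras.layers.LSTM` vs. `torch.nn.LSTM`

For sequence modeling, long short-term memory (LSTM) networks are a common choice. Keras provides tf.keras.layers.LSTM, and PyTorch provides torch.nn.LSTM.

Differences and similarities:

Hidden units: Keras uses units for the dimensionality of the hidden state, which is also the dimensionality of the output at each time step. PyTorch uses hidden_size.
Input features: PyTorch's torch.nn.LSTM requires input_size, the number of features in the input sequence at each time step.
Batch first:
- A Keras LSTM expects input shaped (batch_size, timesteps, features).
- PyTorch's torch.nn.LSTM defaults to batch_first=False, which means it expects input shaped (timesteps, batch_size, features). You can set batch_first=True to use the more familiar (batch_size, timesteps, features) layout. This is a frequent stumbling block when converting code.
Return values:
- A Keras LSTM has return_sequences (return the full output sequence) and return_state (also return the final hidden state and cell state).
- The forward method of PyTorch's torch.nn.LSTM always returns output, (h_n, c_n).
  - output: the output features of the last LSTM layer at every time step. Its shape depends on batch_first. With batch_first=True, the shape is (batch_size, seq_len, num_directions * hidden_size).
  - h_n: the final hidden state for each element in the batch. Shape: (num_layers * num_directions, batch_size, hidden_size).
  - c_n: the final cell state for each element in the batch. Shape: (num_layers * num_directions, batch_size, hidden_size).
Number of layers: PyTorch's num_layers argument makes it easy to stack LSTMs. In Keras, you stack LSTM layers one after another.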
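
The shape rules above are easiest to absorb by running them. The sketch below uses illustrative sizes: a two-layer bidirectional LSTM with batch_first=True, followed by a single-layer LSTM in the default time-major layout to show how a batch-major input would need to be transposed.

```python
import torch
import torch.nn as nn

batch, seq_len, features, hidden = 32, 10, 20, 128
x = torch.randn(batch, seq_len, features)

# Two stacked layers, both directions: num_layers=2, num_directions=2
lstm = nn.LSTM(input_size=features, hidden_size=hidden,
               num_layers=2, bidirectional=True, batch_first=True)
output, (h_n, c_n) = lstm(x)

print(output.shape)  # torch.Size([32, 10, 256]): num_directions * hidden_size
print(h_n.shape)     # torch.Size([4, 32, 128]): num_layers * num_directions
print(c_n.shape)     # torch.Size([4, 32, 128])

# With the default batch_first=False, the input must be (seq_len, batch, features).
lstm_time_major = nn.LSTM(input_size=features, hidden_size=hidden)
output_tm, _ = lstm_time_major(x.transpose(0, 1))
print(output_tm.shape)  # torch.Size([10, 32, 128])
```

Note that h_n and c_n keep the batch in their second dimension regardless of batch_first; only the input and output tensors change layout.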

Example:

An LSTM layer with 128 hidden units that processes sequences of length 10 with 20 features per time step.

TensorFlow (Keras):

import tensorflow as tf

# Keras LSTM layer
# Input shape: (batch size, time steps, features)
keras_lstm_layer = tf.keras.layers.LSTM(units=128, return_sequences=True, input_shape=(10, 20))

# Usage example
dummy_input_keras = tf.random.normal(shape=(32, 10, 20)) # batch, time steps, features
output_keras = keras_lstm_layer(dummy_input_keras)
print("Keras LSTM output shape (sequence):", output_keras.shape) # (32, 10, 128)

keras_lstm_layer_last_step = tf.keras.layers.LSTM(units=128, return_sequences=False)
output_keras_last = keras_lstm_layer_last_step(dummy_input_keras)
print("Keras LSTM output shape (last step):", output_keras_last.shape) # (32, 128)

PyTorch:

import torch
import torch.nn as nn

# PyTorch LSTM layer
# input_size: features per time step
# hidden_size: number of LSTM units
pytorch_lstm_layer = nn.LSTM(input_size=20, hidden_size=128, num_layers=1, batch_first=True)

# Usage example
dummy_input_pytorch = torch.randn(32, 10, 20) # batch, time steps, features (because batch_first=True)
output_pytorch, (h_n, c_n) = pytorch_lstm_layer(dummy_input_pytorch)

print("PyTorch LSTM full output shape:", output_pytorch.shape) # (batch_size, seq_len, hidden_size)
print("PyTorch LSTM final hidden state shape (h_n):", h_n.shape) # (num_layers, batch_size, hidden_size)
print("PyTorch LSTM final cell state shape (c_n):", c_n.shape) # (num_layers, batch_size, hidden_size)

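Important: If your data is laid out as (batch, sequence, feature), which is the common case, remember to pass batch_first=True to PyTorch's nn.LSTM. Without it, PyTorch expects (sequence, batch, feature). h_n and c_n have shape (num_layers * num_directions, batch, hidden_size), so for a single-layer, unidirectional LSTM their shape is (1, batch, hidden_size). If you only need (batch, hidden_size), squeeze the first dimension, for example with h_n.squeeze(0).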
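
To reproduce what Keras returns with return_sequences=False, you can take either the last time step of output or the last layer's entry in h_n. The sketch below assumes a single-direction LSTM, where the two coincide; for a bidirectional LSTM they differ, because the backward direction finishes at the first time step.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=20, hidden_size=128, num_layers=1, batch_first=True)
x = torch.randn(32, 10, 20)

with torch.no_grad():
    output, (h_n, c_n) = lstm(x)

last_from_output = output[:, -1, :]  # final time step of the full sequence
last_from_state = h_n[-1]            # final hidden state of the last layer
squeezed = h_n.squeeze(0)            # equivalent here because num_layers == 1

print(last_from_state.shape)                              # torch.Size([32, 128])
print(torch.allclose(last_from_output, last_from_state))  # True
print(torch.equal(last_from_state, squeezed))             # True
```

Indexing with h_n[-1] also works for stacked unidirectional LSTMs, whereas squeeze(0) only removes the first dimension when it has size 1.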

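Other common layers

Many other layers translate directly:

Pooling layers:
- Keras: tf.keras.layers.MaxPool2D, tf.keras.layers.AvgPool2D
- PyTorch: torch.nn.MaxPool2d, torch.nn.AvgPool2d
- The Keras pool_size argument corresponds to kernel_size in PyTorch. strides and padding behave much as they do for convolutional layers. Remember that PyTorch's 2D pooling layers expect the channels_first data format.
Dropout layers:
- Keras: tf.keras.layers.Dropout(rate)
- PyTorch: torch.nn.Dropout(p)
- Both rate in Keras and p in PyTorch are the probability that an element is zeroed during training.
Flatten layers:
- Keras: tf.keras.layers.Flatten()
- PyTorch: torch.nn.Flatten(start_dim=1, end_dim=-1)
- PyTorch's Flatten is more flexible; the default start_dim=1 flattens every dimension except the batch dimension, which matches the Keras default behavior.
Batch normalization:
- Keras: tf.keras.layers.BatchNormalization(axis=-1, ...) (the axis is usually the channel axis)
- PyTorch: torch.nn.BatchNorm1d(num_features), torch.nn.BatchNorm2d(num_features), torch.nn.BatchNorm3d(num_features)
- In PyTorch you choose the BatchNorm variant by input dimensionality. num_features is the number of channels for BatchNorm2d (data shaped (N, C, H, W)) and the number of features or channels for BatchNorm1d (data shaped (N, C) or (N, C, L)).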
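
These correspondences can be combined into a small model. The sketch below is a hypothetical network for 28x28 grayscale inputs, with arbitrarily chosen layer sizes, that uses each of the layers listed above and shows that Dropout and BatchNorm behave differently in train() and eval() modes.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),  # (N, 16, 28, 28)
    nn.BatchNorm2d(num_features=16),             # num_features = channel count
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),                 # (N, 16, 14, 14); stride defaults to kernel_size
    nn.Flatten(),                                # (N, 16 * 14 * 14); start_dim=1 by default
    nn.Dropout(p=0.5),
    nn.Linear(16 * 14 * 14, 10),
)

x = torch.randn(32, 1, 28, 28)

model.train()  # dropout active, batch norm uses batch statistics
train_out = model(x)

model.eval()   # dropout disabled, batch norm uses running statistics
with torch.no_grad():
    eval_out_1 = model(x)
    eval_out_2 = model(x)

print(train_out.shape)                      # torch.Size([32, 10])
print(torch.equal(eval_out_1, eval_out_2))  # True: eval mode is deterministic
```

Keras handles this switch through the training argument of a layer call, while in PyTorch you call model.train() or model.eval() explicitly.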

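By learning these correspondences and paying attention to details such as input shapes and parameter names, you can carry your knowledge of Keras layers over to building models with torch.nn. The next step is to assemble these layers into complete model architectures.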



