These practical exercises apply the unsupervised learning techniques covered so far. You'll use scikit-learn to run K-Means and DBSCAN for clustering, and Isolation Forest for anomaly detection, on a generated dataset. Working through them will help solidify your understanding of how these algorithms behave and how to interpret their results.

Let's start by generating a synthetic dataset. We'll use `make_blobs` from scikit-learn to create distinct groups of points, and then add some randomly scattered points that can be considered outliers or noise.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler
import plotly.express as px

# Generate sample data (y_true labels only these 400 blob points)
X, y_true = make_blobs(n_samples=400, centers=4, cluster_std=0.80, random_state=42)

# Add noise points drawn uniformly from a box that extends 5 units beyond the data range.
# Most land far from the clusters, but a few may fall inside or near one.
rng = np.random.RandomState(42)
n_outliers = 30
outliers = rng.uniform(low=np.min(X) - 5, high=np.max(X) + 5, size=(n_outliers, 2))
X = np.vstack([X, outliers])

# Standardize the features for algorithms sensitive to scale (like K-Means and DBSCAN)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Create a DataFrame for easier visualization
df = pd.DataFrame(X_scaled, columns=['Feature 1', 'Feature 2'])

# Initial visualization of the data
fig_initial = px.scatter(df, x='Feature 1', y='Feature 2',
                         title='Synthetic Dataset with Potential Outliers',
                         color_discrete_sequence=['#495057'])  # Gray for unclustered points
fig_initial.update_layout(showlegend=False)
# fig_initial.show()  # Display the plot in a Python environment
```

[Figure: Synthetic Dataset with Potential Outliers (x-axis: Feature 1, y-axis: Feature 2)]

Initial scatter plot of the generated dataset features after scaling. The distinct groups are visible, along with some scattered points.

Applying K-Means Clustering

K-Means aims to partition the data into $k$ distinct, non-overlapping clusters. Each data point belongs to the cluster with the nearest mean (cluster centroid). We need to specify the number of clusters, $k$.
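
The "nearest mean" rule can be sketched in a few lines of NumPy. This is only an illustration of a single assign-then-update iteration on a toy array, not scikit-learn's implementation; the function name `kmeans_step` and the toy data are assumptions made for this example.

```python
import numpy as np

def kmeans_step(points, centroids):
    """One iteration: assign each point to its nearest centroid, then recompute the means."""
    # Pairwise Euclidean distances, shape (n_points, n_centroids)
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    labels = distances.argmin(axis=1)
    # New centroid = mean of the points assigned to it; keep the old one if a cluster is empty
    new_centroids = np.array([
        points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
        for j in range(len(centroids))
    ])
    return labels, new_centroids

# Hypothetical toy data: two tight groups and two rough initial centroids
toy_points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
toy_centroids = np.array([[1.0, 1.0], [9.0, 9.0]])

toy_labels, toy_centroids = kmeans_step(toy_points, toy_centroids)
print(toy_labels)     # [0 0 1 1]
print(toy_centroids)  # rows are (0, 0.5) and (10, 10.5)
```

Repeating this step until the labels stop changing gives the basic K-Means loop. Note that every point receives a label, however far it is from its centroid.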
Based on the visualization (and the way we generated the data), $k=4$ seems like a reasonable starting point.

```python
from sklearn.cluster import KMeans

# Instantiate and fit K-Means
# n_init is set explicitly because its default differs across scikit-learn versions
# (10 in older releases, 'auto' in newer ones)
kmeans = KMeans(n_clusters=4, random_state=42, n_init=10)
kmeans.fit(X_scaled)

# Get cluster assignments
df['KMeans Cluster'] = kmeans.labels_.astype(str)  # Convert to string for discrete colors

# Centroids in the original (unscaled) feature space, for reporting only
centroids = scaler.inverse_transform(kmeans.cluster_centers_)

# Visualize K-Means results
fig_kmeans = px.scatter(df, x='Feature 1', y='Feature 2', color='KMeans Cluster',
                        title='K-Means Clustering Results (k=4)',
                        color_discrete_sequence=px.colors.qualitative.Pastel)

# Add centroids to the plot. The plot uses scaled coordinates, so we use
# kmeans.cluster_centers_ directly rather than the inverse-transformed values.
fig_kmeans.add_scatter(x=kmeans.cluster_centers_[:, 0], y=kmeans.cluster_centers_[:, 1],
                       mode='markers',
                       marker=dict(color='#d6336c', size=12, symbol='x'),
                       name='Centroids')
# fig_kmeans.show()
```

[Figure: K-Means Clustering Results (k=4) (x-axis: Feature 1, y-axis: Feature 2)]

K-Means clustering results with k=4. Points are colored by their assigned cluster, and cluster centroids are marked with 'x'. Notice how the outliers get assigned to the nearest cluster.

K-Means successfully identifies the main groups, but it forces every point, including the clear outliers we added, into one of the clusters. This happens because K-Means has no concept of noise: it assigns every point to its closest centroid, no matter how far away that centroid is.

Applying DBSCAN Clustering

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) groups together points that are closely packed and marks points that lie alone in low-density regions as outliers.
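
The idea of "closely packed" versus "low-density" can be made concrete by counting how many points fall within a fixed radius of each point. The sketch below is a simplified illustration of that density test only, not the full DBSCAN algorithm; `neighbor_counts`, `radius`, the threshold of 3, and the toy data are all assumptions chosen for this example.

```python
import numpy as np

def neighbor_counts(points, radius):
    """For each point, count how many points (itself included) lie within `radius`."""
    distances = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    return (distances <= radius).sum(axis=1)

# Hypothetical toy data: three points close together and one far away
toy = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]])

counts = neighbor_counts(toy, radius=0.5)
dense = counts >= 3  # illustrative density threshold

print(counts)  # [3 3 3 1]
print(dense)   # [ True  True  True False]
```

The isolated point has only itself in its neighborhood, so it fails the density test, while the three nearby points pass it.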
DBSCAN doesn't require specifying the number of clusters beforehand but relies on two parameters: `eps` (the maximum distance between two samples for one to be considered as in the neighborhood of the other) and `min_samples` (the number of samples in a neighborhood, counting the point itself, for a point to be considered a core point).

Choosing appropriate `eps` and `min_samples` values often requires some experimentation or domain knowledge. Let's try some values. A smaller `eps` or a larger `min_samples` will result in more points being classified as noise.

```python
from sklearn.cluster import DBSCAN

# Instantiate and fit DBSCAN
# These parameters might need tuning depending on the dataset density
dbscan = DBSCAN(eps=0.3, min_samples=5)
dbscan.fit(X_scaled)

# Get cluster assignments (-1 indicates noise/outliers)
df['DBSCAN Cluster'] = dbscan.labels_.astype(str)  # Convert to string for discrete colors

# Visualize DBSCAN results
fig_dbscan = px.scatter(
    df, x='Feature 1', y='Feature 2', color='DBSCAN Cluster',
    title=f'DBSCAN Clustering Results (eps={dbscan.eps}, min_samples={dbscan.min_samples})',
    color_discrete_map={"-1": "#adb5bd"},  # Gray color for noise
    category_orders={"DBSCAN Cluster": sorted(df['DBSCAN Cluster'].unique(), key=int)},  # Ensure -1 is first
    color_discrete_sequence=px.colors.qualitative.Pastel)  # Colors for actual clusters
# fig_dbscan.show()
```

[Figure: DBSCAN Clustering Results (eps=0.3, min_samples=5) (x-axis: Feature 1, y-axis: Feature 2)]

DBSCAN clustering results. Points labeled '-1' (gray) are identified as noise/outliers because they don't belong to any dense region according to the chosen `eps` and `min_samples`.

Compare this to the K-Means plot. DBSCAN successfully identifies the four main clusters and, importantly, flags most of the synthetic outliers (and potentially some points on the fringes of the main clusters) as noise (cluster label -1). This ability to find noise points is a significant advantage of density-based clustering when dealing with datasets containing outliers.

Basic Anomaly Detection

While DBSCAN inherently identifies noise points, which can be considered anomalies, other algorithms are specifically designed for anomaly detection.
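
To treat DBSCAN's noise label as an anomaly flag, it is enough to build a boolean mask from the label array. The snippet below is a small sketch on a made-up label array in the same format DBSCAN produces (-1 for noise); `example_labels` is hypothetical and stands in for a fitted model's `labels_`.

```python
import numpy as np

# Hypothetical labels in DBSCAN's format: cluster ids, with -1 marking noise
example_labels = np.array([0, 0, 1, -1, 1, 2, -1, 0])

# Points flagged as noise can be treated as anomaly candidates
noise_mask = example_labels == -1

# Size of each real cluster, ignoring noise
cluster_ids, cluster_counts = np.unique(example_labels[~noise_mask], return_counts=True)
sizes = dict(zip(cluster_ids.tolist(), cluster_counts.tolist()))

print(int(noise_mask.sum()))  # 2
print(sizes)                  # {0: 3, 1: 2, 2: 1}
```

The same mask can be used to index the feature matrix and inspect the flagged points directly.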
Let's try the Isolation Forest algorithm. It works by randomly partitioning the data and explicitly identifying observations that are isolated, meaning they require fewer partitions to be separated from the rest.

    from sklearn.ensemble import IsolationForest

    # Instantiate and fit Isolation Forest
    # 'contamination' is the expected proportion of outliers, set to 'auto' or a specific value
    # Let's estimate based on the number of noise points we added (30 / 430) ~ 0.07
    iso_forest = IsolationForest(contamination=0.07, random_state=42)
    iso_forest.fit(X_scaled)

    # Predict anomalies (-1 for anomalies, 1 for inliers)
    df['Anomaly'] = iso_forest.predict(X_scaled)
    df['Anomaly'] = df['Anomaly'].map({1: 'Inlier', -1: 'Anomaly'})  # Map to readable labels

    # Visualize Anomaly Detection results
    fig_anomaly = px.scatter(df, x='Feature 1', y='Feature 2', color='Anomaly',
                             title='Anomaly Detection using Isolation Forest',
                             color_discrete_map={'Inlier': '#1f77b4', 'Anomaly': '#d62728'},  # Standard blue, distinct red
                             category_orders={"Anomaly": ["Inlier", "Anomaly"]})  # Ensure consistent legend order
    # fig_anomaly.show()

Figure: Anomaly Detection using Isolation Forest (x-axis: Feature 1, y-axis: Feature 2). Isolation Forest results, highlighting points identified as anomalies (red).

The Isolation Forest identifies many of the same points as DBSCAN's noise points. However, the exact set might differ based on the algorithm's logic and parameters (like the contamination factor). Isolation Forest is specifically designed to find outliers, while DBSCAN finds them as a byproduct of identifying dense regions. Depending on the specific goal, one might be preferred over the other.

This practice session demonstrated how to apply K-Means and DBSCAN for clustering and Isolation Forest for anomaly detection. You saw how K-Means assigns all points to clusters, while DBSCAN can identify noise. Isolation Forest provides a targeted approach for finding outliers.
Experimenting with parameters (k for K-Means, eps and min_samples for DBSCAN, contamination for Isolation Forest) is often necessary to achieve the desired results for a specific dataset and analysis goal.
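
As a rough illustration of that kind of experimentation, the sketch below rebuilds the dataset from the start of this exercise and sweeps eps for DBSCAN and contamination for Isolation Forest. For each setting it prints how many points are flagged and how many of the 30 injected points are among them. The grids of values and min_samples=5 are assumptions chosen for illustration, not settings taken from this exercise.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

# Rebuild the dataset used throughout this exercise
X, _ = make_blobs(n_samples=400, centers=4, cluster_std=0.80, random_state=42)
rng = np.random.RandomState(42)
n_outliers = 30
outliers = rng.uniform(low=np.min(X) - 5, high=np.max(X) + 5, size=(n_outliers, 2))
X_scaled = StandardScaler().fit_transform(np.vstack([X, outliers]))

# The injected points are the last n_outliers rows
injected = np.zeros(len(X_scaled), dtype=bool)
injected[-n_outliers:] = True

# DBSCAN: vary eps with min_samples fixed (illustrative values, not tuned)
dbscan_noise_counts = []
for eps in (0.1, 0.2, 0.3, 0.5):
    labels = DBSCAN(eps=eps, min_samples=5).fit_predict(X_scaled)
    noise = labels == -1
    n_clusters = len(set(labels[~noise]))
    dbscan_noise_counts.append(int(noise.sum()))
    print(f"DBSCAN eps={eps}: {n_clusters} clusters, {noise.sum()} noise points, "
          f"{(noise & injected).sum()} of them injected")

# Isolation Forest: vary the expected share of outliers (illustrative values)
iso_anomaly_counts = []
for contamination in (0.03, 0.07, 0.15):
    model = IsolationForest(contamination=contamination, random_state=42)
    anomalies = model.fit_predict(X_scaled) == -1
    iso_anomaly_counts.append(int(anomalies.sum()))
    print(f"IsolationForest contamination={contamination}: {anomalies.sum()} anomalies, "
          f"{(anomalies & injected).sum()} of them injected")
```

Keep in mind that the injected points are drawn uniformly, so a few of them may land inside a cluster and are not outliers in any meaningful sense. A setting that misses some injected points is therefore not necessarily a worse one.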
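
To see how closely DBSCAN's noise points and Isolation Forest's anomalies agree, one option is to cross-tabulate the two sets of flags on the same data. This is a minimal sketch: eps=0.3 and min_samples=5 are assumed values for DBSCAN (substitute the ones you used earlier), and the dataset is rebuilt so the block runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

# Rebuild the dataset used throughout this exercise
X, _ = make_blobs(n_samples=400, centers=4, cluster_std=0.80, random_state=42)
rng = np.random.RandomState(42)
outliers = rng.uniform(low=np.min(X) - 5, high=np.max(X) + 5, size=(30, 2))
X_scaled = StandardScaler().fit_transform(np.vstack([X, outliers]))

# Boolean flags from each method (DBSCAN parameters are assumptions)
db_noise = DBSCAN(eps=0.3, min_samples=5).fit_predict(X_scaled) == -1
iso_anomaly = IsolationForest(contamination=0.07, random_state=42).fit_predict(X_scaled) == -1

# 2x2 table: rows are DBSCAN's verdict, columns are Isolation Forest's
agreement = pd.crosstab(
    pd.Series(db_noise, name="DBSCAN noise"),
    pd.Series(iso_anomaly, name="Isolation Forest anomaly"),
)
print(agreement)

# Jaccard overlap of the two flagged sets: 1.0 means identical sets
flagged_by_both = int((db_noise & iso_anomaly).sum())
flagged_by_either = int((db_noise | iso_anomaly).sum())
jaccard = flagged_by_both / flagged_by_either if flagged_by_either else float("nan")
print(f"Flagged by both: {flagged_by_both}, by either: {flagged_by_either}, "
      f"Jaccard overlap: {jaccard:.2f}")
```

The points flagged by only one method are often the most informative ones to inspect, since they show where the density-based and isolation-based notions of "unusual" diverge.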