python - Sklearn.KMeans(): Get class centroid labels and reference to a dataset


scikit-learn KMeans and PCA dimensionality reduction

I have a dataset, 2M rows by 7 columns, with different measurements of home power consumption and a date for each measurement.

  • Date,
  • Global_active_power,
  • Global_reactive_power,
  • Voltage,
  • Global_intensity,
  • Sub_metering_1,
  • Sub_metering_2,
  • Sub_metering_3

I put the dataset into a pandas DataFrame, selecting all columns but the date column, and perform a cross-validation split.

    import pandas as pd
    # formerly sklearn.cross_validation (removed in scikit-learn 0.20)
    from sklearn.model_selection import train_test_split

    data = pd.read_csv('household_power_consumption.txt', delimiter=';')
    power_consumption = data.iloc[0:, 2:9].dropna()
    pc_toarray = power_consumption.values
    hpc_fit, hpc_fit1 = train_test_split(pc_toarray, train_size=.01)
    power_consumption.head()

power table

I then use k-means classification followed by PCA dimensionality reduction for display.

    from sklearn.cluster import KMeans
    import matplotlib.pyplot as plt
    import numpy as np
    from sklearn.decomposition import PCA

    # reduce the sampled rows to 2 components, then cluster the projection
    hpc = PCA(n_components=2).fit_transform(hpc_fit)
    k_means = KMeans()
    k_means.fit(hpc)

    # build a grid over the projected data and label every grid point
    x_min, x_max = hpc[:, 0].min() - 5, hpc[:, 0].max() - 1
    y_min, y_max = hpc[:, 1].min(), hpc[:, 1].max() + 5
    xx, yy = np.meshgrid(np.arange(x_min, x_max, .02), np.arange(y_min, y_max, .02))
    Z = k_means.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    plt.figure(1)
    plt.clf()
    plt.imshow(Z, interpolation='nearest',
               extent=(xx.min(), xx.max(), yy.min(), yy.max()),
               cmap=plt.cm.Paired,
               aspect='auto', origin='lower')

    # overlay the data points and mark the centroids with white crosses
    plt.plot(hpc[:, 0], hpc[:, 1], 'k.', markersize=4)
    centroids = k_means.cluster_centers_
    inert = k_means.inertia_
    plt.scatter(centroids[:, 0], centroids[:, 1],
                marker='x', s=169, linewidths=3,
                color='w', zorder=8)
    plt.xlim(x_min, x_max)
    plt.ylim(y_min, y_max)
    plt.xticks(())
    plt.yticks(())
    plt.show()

pca output

Now I would like to find out which rows fell under a given class, and then which dates fell under a given class.

  • Is there a way to relate the points on the graph to an index in my dataset, after PCA?
  • Is there some method I don't know of?
  • Or is my approach fundamentally flawed?
  • Any recommendations?

I am new to this field and am trying to read through lots of code; this is a compilation of several examples I've seen documented.

My goal is to classify the data and then get the dates that fall under each class.

Thank you

KMeans().predict(X) (docs here)


Predict the closest cluster each sample in X belongs to.

In the vector quantization literature, cluster_centers_ is called the code book, and each value returned by predict is the index of the closest code in the code book.

Parameters: (new data to predict)
    X : {array-like, sparse matrix}, shape = [n_samples, n_features]

Returns: (index of the cluster each sample belongs to)
    labels : array, shape [n_samples,]
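To make the "index of the closest code" wording concrete, here is a minimal sketch on synthetic data. The three blobs, n_clusters=3, n_init=10 and random_state=0 are assumptions chosen for the illustration, not values from the question. It compares what predict returns with the index of the nearest row of cluster_centers_ computed by hand.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in data: three well-separated 2-D blobs (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=(0, 0), scale=0.5, size=(50, 2)),
    rng.normal(loc=(10, 0), scale=0.5, size=(50, 2)),
    rng.normal(loc=(0, 10), scale=0.5, size=(50, 2)),
])

k_means = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# New samples to assign to the already fitted clusters.
X_new = np.array([[0.2, -0.1], [9.5, 0.3], [0.1, 10.4]])
labels = k_means.predict(X_new)

# Euclidean distance from every new sample to every centroid (the "code book").
distances = np.linalg.norm(
    X_new[:, None, :] - k_means.cluster_centers_[None, :, :], axis=2
)
nearest = distances.argmin(axis=1)

# Both arrays hold, per sample, the row index into cluster_centers_.
print(labels, nearest)
```

The label numbers themselves are arbitrary and can change between runs or seeds; only the mapping from a label to its row in cluster_centers_ is meaningful.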

The problem with the code submitted is the use of train_test_split(), which returns 2 arrays of random rows from the data set. This effectively ruins the dataset order and makes it difficult to correlate the labels returned by the KMeans classification with the sequential dates in the data set.
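If a random sample is still wanted, one possible workaround is to split the DataFrame itself rather than its .values array, so that every sampled row keeps its index. The sketch below uses synthetic data; the column names feature_a and feature_b, the daily date range and n_clusters=3 are placeholders made up for the example.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the power data: one row per day, two columns.
rng = np.random.default_rng(0)
index = pd.date_range("2007-01-01", periods=200, freq="D")
df = pd.DataFrame(rng.normal(size=(200, 2)), index=index,
                  columns=["feature_a", "feature_b"])

# Splitting the DataFrame (not df.values) keeps the datetime index attached
# to every sampled row, even though the rows come back shuffled.
train, rest = train_test_split(df, train_size=0.1, random_state=0)

k_means = KMeans(n_clusters=3, n_init=10, random_state=0).fit(train.values)

# labels_ follows the order of the rows passed to fit, so it can be aligned
# with the index of the sampled rows.
train_labels = pd.Series(k_means.labels_, index=train.index, name="cluster")
print(train_labels.sort_index().head())
```

Sorting train_labels by its index puts the sampled rows back in date order.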


Here's an example:

    import pandas as pd
    import numpy as np
    from sklearn.cluster import KMeans

    # read the data into a pandas DataFrame
    df = pd.read_csv('household_power_consumption.txt', delimiter=';')

raw dataset head

    # merge the Date and Time columns and convert them to datetime objects
    df['DateTime'] = pd.to_datetime(df['Date'] + ' ' + df['Time'])
    df.set_index(pd.DatetimeIndex(df['DateTime']), inplace=True)
    df.drop(['Date', 'Time'], axis=1, inplace=True)

    # put the last column first
    cols = df.columns.tolist()
    cols = cols[-1:] + cols[:-1]
    df = df[cols]
    df = df.dropna()

preprocessed dates

    # convert the DataFrame to a data array, removing the date column so it is not processed
    sliced = df.iloc[0:, 1:8].dropna()
    hpc = sliced.values

    k_means = KMeans()
    k_means.fit(hpc)

    # array of indexes corresponding to the classes around the centroids, in the order of the dataset
    classified_data = k_means.labels_

    # copy the DataFrame (may be memory intensive, but this is just for illustration)
    df_processed = df.copy()
    df_processed['Cluster Class'] = pd.Series(classified_data, index=df_processed.index)

finished


  • Now you can see the result matched to your data set, on the right side.
  • Now that it's classified, it's up to you to derive meaning.
  • This is a good overall example of how it can be used, from start to finish.
  • For displaying the result, look at PCA or make other graphs dependent on the class.
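As a rough sketch of those last two points, the snippet below collects the dates per class and then projects the data with PCA for display only. It runs on synthetic data: the frame df_demo, the column names f1 to f4, the daily index, the lowercase column name "cluster class" and n_clusters=2 are all placeholders invented for the illustration. It relies on PCA's fit_transform returning rows in input order, so the labels and the datetime index carry over to the projection unchanged.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Synthetic stand-in for the classified frame: datetime index, numeric columns.
rng = np.random.default_rng(0)
index = pd.date_range("2007-01-01", periods=300, freq="D")
features = rng.normal(size=(300, 4))
features[150:] += 8.0  # shift the second half so two clear groups exist
df_demo = pd.DataFrame(features, index=index, columns=["f1", "f2", "f3", "f4"])

# Cluster on the full feature set, in dataset order.
k_means = KMeans(n_clusters=2, n_init=10, random_state=0)
df_demo["cluster class"] = k_means.fit_predict(features)

# Dates that fell under each class.
dates_by_class = {
    label: group.index
    for label, group in df_demo.groupby("cluster class")
}

# PCA is used only for display; row i of the projection is still row i of
# df_demo, so the index and the labels can be attached directly.
projected = pd.DataFrame(
    PCA(n_components=2).fit_transform(features),
    index=df_demo.index,
    columns=["pc1", "pc2"],
)
projected["cluster class"] = df_demo["cluster class"]
print(projected.head())
```

From here, projected["pc1"] and projected["pc2"] can be handed to a scatter plot coloured by "cluster class", and any point can be traced back to its date through the index.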
