python - Create plot with Pandas and show similar output as with Matplotlib directly
I have a query whose output is a list of (date string, count) tuples:
date_cnts = [(u'2014-06-27', 1), (u'2014-06-29', 3), (u'2014-06-30', 1), (u'2014-07-01', 1), (u'2014-07-02', 1), (u'2014-07-09', 1), (u'2014-07-10', 3), (u'2014-07-11', 1), (u'2014-07-12', 2), (u'2014-07-14', 1), (u'2014-07-15', 2), (u'2014-07-17', 3), (u'2014-07-18', 1), (u'2014-07-20', 1), (u'2014-07-21', 1), (u'2014-07-23', 2), (u'2014-07-26', 2), (u'2014-07-27', 2), (u'2014-07-28', 7), (u'2014-07-29', 3), (u'2014-07-31', 2), (u'2014-08-01', 1), (u'2014-08-05', 4), (u'2014-08-07', 2), (u'2014-08-08', 1), (u'2014-08-13', 1), (u'2014-08-14', 3), (u'2014-08-15', 1), (u'2014-08-16', 6), (u'2014-08-17', 1), (u'2014-08-18', 1), (u'2014-08-20', 1), (u'2014-08-24', 1), (u'2014-08-25', 3), (u'2014-08-29', 1), (u'2014-08-30', 1), (u'2014-09-03', 3), (u'2014-09-13', 1), (u'2014-09-14', 1), (u'2014-09-24', 3), (u'2014-10-20', 1), (u'2014-10-24', 1), (u'2014-11-05', 3), (u'2014-11-09', 1), (u'2014-11-12', 1), (u'2014-11-13', 1), (u'2014-11-14', 1), (u'2014-11-18', 1), (u'2014-11-19', 4), (u'2014-11-22', 1), (u'2014-11-26', 3), (u'2014-11-28', 3), (u'2014-12-01', 2), (u'2014-12-02', 2), (u'2014-12-04', 2), (u'2014-12-05', 1), (u'2014-12-06', 5), (u'2014-12-11', 1), (u'2014-12-15', 10)]
Notice that there are date gaps in the data set; the missing dates have a value of 0.
My working (non-pandas) version of the code looks like this:
from datetime import datetime
from matplotlib import pyplot as plt

# Parse the date strings and separate out the counts
x_val = [datetime.strptime(x[0], '%Y-%m-%d') for x in date_cnts]
y_val = [x[1] for x in date_cnts]

plt.bar(x_val, y_val)
plt.grid(True)
plt.show()
This outputs the following image:
Now, if I convert the query results to a pandas DataFrame:
          date  count
0   2014-06-27      1
1   2014-06-29      3
2   2014-06-30      1
3   2014-07-01      1
4   2014-07-02      1
5   2014-07-09      1
6   2014-07-10      3
7   2014-07-11      1
8   2014-07-12      2
9   2014-07-14      1
10  2014-07-15      2
11  2014-07-17      3
12  2014-07-18      1
13  2014-07-20      1
14  2014-07-21      1
15  2014-07-23      2
16  2014-07-26      2
17  2014-07-27      2
18  2014-07-28      7
19  2014-07-29      3
20  2014-07-31      2
21  2014-08-01      1
22  2014-08-05      4
23  2014-08-07      2
24  2014-08-08      1
25  2014-08-13      1
26  2014-08-14      3
27  2014-08-15      1
28  2014-08-16      6
29  2014-08-17      1
30  2014-08-18      1
31  2014-08-20      1
32  2014-08-24      1
33  2014-08-25      3
34  2014-08-29      1
35  2014-08-30      1
36  2014-09-03      3
37  2014-09-13      1
38  2014-09-14      1
39  2014-09-24      3
40  2014-10-20      1
41  2014-10-24      1
42  2014-11-05      3
43  2014-11-09      1
44  2014-11-12      1
45  2014-11-13      1
46  2014-11-14      1
47  2014-11-18      1
48  2014-11-19      4
49  2014-11-22      1
50  2014-11-26      3
51  2014-11-28      3
52  2014-12-01      2
53  2014-12-02      2
54  2014-12-04      2
55  2014-12-05      1
56  2014-12-06      5
57  2014-12-11      1
58  2014-12-15     10
and use pandas' simple plotting wrapper to plot it:
plt.figure()
df.plot(kind='bar', grid=True, legend=False, x='date', y=u'count')
plt.show()
I get this result. Notice that the missing days do not appear in the graph.
How do I re-add the gaps (and 0 values) for the dates that do not exist in the DataFrame?
The reason I want to use pandas is to take advantage of its other features (most importantly, the rolling average).
I wrote a working version, though it may not be the most elegant. It is based on reindexing the original DataFrame so that it has a sample for every day.
import pandas as pd
import matplotlib.pyplot as plt

#%% make data
df = pd.DataFrame(date_cnts)
df.columns = ['date', 'count']

#%% make dataframe with everyday sampling
df.index = pd.to_datetime(df['date'])
startdate = df.index[0]
enddate = df.index[-1]
# days missing from the original data become NaN rows
df_new = df.reindex(pd.date_range(startdate, enddate, freq='1D'))

#%% plot results
df_new['count'].plot(kind='bar')
# decrease number of days shown as x ticks (keep every 10th)
new_xticks = plt.xticks()[0][1:-1:10]
plt.xticks(new_xticks)
For further formatting of the xticks, I recommend this question: Pandas timeseries plot setting x-axis major and minor ticks and labels
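As a follow-up to the reindexing step, here is a minimal sketch of how the missing days could be filled with explicit zeros and then fed into a rolling average, which is the feature the question mentions wanting. It is only an illustration under some assumptions: `sample_cnts` is a hypothetical five-row subset of `date_cnts`, the names `counts`, `daily`, and `rolling` are made up for this example, and the 3-day window is arbitrary.

```python
import pandas as pd

# Hypothetical subset: the first few (date string, count) pairs from date_cnts
sample_cnts = [('2014-06-27', 1), ('2014-06-29', 3), ('2014-06-30', 1),
               ('2014-07-01', 1), ('2014-07-02', 1)]

df = pd.DataFrame(sample_cnts, columns=['date', 'count'])

# Parse the date strings and use them as the index of the count series
df['date'] = pd.to_datetime(df['date'])
counts = df.set_index('date')['count']

# Reindex onto every calendar day in the range; days that were absent
# get an explicit 0 instead of NaN
all_days = pd.date_range(counts.index.min(), counts.index.max(), freq='D')
daily = counts.reindex(all_days, fill_value=0)

# Rolling mean over a 3-day window (window size chosen only for illustration);
# the first two entries are NaN because the window is not yet full
rolling = daily.rolling(window=3).mean()

print(daily)
print(rolling)
```

With zeros in place of NaN, the rolling mean counts the empty days rather than skipping them, which is usually what is wanted for daily event counts. The zero-filled series can be plotted the same way as in the answer, for example with `daily.plot(kind='bar')`.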