seaborn, a Python library for beautiful plots, makes drawing graphs fun

I came across seaborn, a Python library for drawing good-looking graphs, so I gave it a try.
Here is the official site:
Seaborn: statistical data visualization — seaborn 0.7.1 documentation
It can draw far more than basic line charts, such as the density and regression plots covered below.

Installation

conda install seaborn

It can also be installed with pip (pip install seaborn).

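Basic usage

Basically, you just import seaborn and then use matplotlib as usual. In seaborn 0.7.1, the version used here, the import alone applies seaborn's style; from 0.8 onward you also need to call sns.set().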

import matplotlib.pyplot as plt
import seaborn as sns

x = y = range(10)
plt.plot(x,y)
plt.show()

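The snippet above depends on seaborn restyling matplotlib at import time. As a rough sketch for newer releases (assuming seaborn 0.11 or later, where sns.set_theme() is available), the same plot with the style applied explicitly might look like this:

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Assumption: seaborn >= 0.11. Importing seaborn alone does not restyle
# matplotlib there, so the default theme (darkgrid) is applied explicitly.
sns.set_theme()

x = y = range(10)
fig, ax = plt.subplots()
ax.plot(x, y)
# plt.show()  # uncomment to display the figure
```

Using fig, ax = plt.subplots() instead of plt.plot() is only a stylistic choice here; both pick up the seaborn theme.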

Changing the background style

For example, to get a white background with grid lines:

sns.set_style("whitegrid")

The function accepts one of darkgrid, whitegrid, dark, white, or ticks. The default is darkgrid. ticks gives a white background without a grid and adds tick marks to the x and y axes.
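To see which of the five styles actually turn the grid on, one option is to inspect the settings each style would apply. This is a small sketch using sns.axes_style(), which returns a style's matplotlib parameters without changing the current settings:

```python
import seaborn as sns

# axes_style(name) returns the matplotlib settings a style would apply,
# without modifying the active plot settings.
styles = ["darkgrid", "whitegrid", "dark", "white", "ticks"]
grid_enabled = {name: sns.axes_style(name)["axes.grid"] for name in styles}

for name, has_grid in grid_enabled.items():
    print(f"{name:10s} grid={has_grid}")
```

Only the two *grid styles enable axes.grid; ticks differs from white in its tick marks, not in the grid.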

To remove the top and right borders (spines), insert this before plt.show():

sns.despine()

Sample

import numpy as np  # needed for np.arange, np.sin and np.cos below
sns.set_style("ticks")
x = np.arange(0, 20, 0.01*np.pi)
y1 = np.sin(x)
y2 = np.cos(x)
plt.plot(x, y1)
plt.plot(x, y2)
sns.despine()
plt.show()

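To check what sns.despine() does to a plot, the spine visibility can be read back from the Axes. A minimal sketch, passing the Axes explicitly through the ax argument:

```python
import matplotlib.pyplot as plt
import seaborn as sns

fig, ax = plt.subplots()
ax.plot(range(10), range(10))

# Hide the top and right spines of this specific Axes (the default sides).
sns.despine(ax=ax)

# Read back which of the four borders are still drawn.
visible = {side: ax.spines[side].get_visible()
           for side in ("top", "right", "left", "bottom")}
print(visible)
```

Passing left=True or bottom=True to despine() hides those sides as well.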

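Applications

seaborn is not just for visualization: hand it the data and it will fit a distribution or run a linear regression and plot the result, with no processing on your part. Even Excel would be impressed.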

Preparing the data

import numpy as np
import pandas as pd

x = np.random.normal(size=100)
mean, cov = [0, 1], [(1, .5), (.5, 1)]
data = np.random.multivariate_normal(mean, cov, 200)
df = pd.DataFrame(data, columns=["x", "y"])
group = np.random.normal(size=(20, 6)) + np.arange(10,16) / 2

Histograms and probability densities

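1. distplot()
Draws a histogram and overlays the density function estimated by kernel density estimation (KDE). In seaborn 0.11 and later, distplot is deprecated in favor of histplot and displot.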

sns.distplot(x)


The display parameters hist, kde, and rug (the raw data points) each control whether that element is drawn.

sns.distplot(x, rug=True, hist=False)


With fit you can also fit a distribution to the data:

from scipy import stats
sns.distplot(x,color="g")
sns.distplot(x,hist=False,kde=False,fit=stats.norm)

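What fit=stats.norm computes can be approximated by hand with scipy. The sketch below is not seaborn's code; it assumes the fitted curve comes from the distribution's fit() method followed by its pdf(), and uses its own sample data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=100)

# Maximum-likelihood estimates of the normal distribution's parameters
mu, sigma = stats.norm.fit(x)

# The fitted density evaluated on a grid: this is the curve to draw
grid = np.linspace(x.min(), x.max(), 200)
fitted_pdf = stats.norm.pdf(grid, loc=mu, scale=sigma)

print(f"mu={mu:.3f}, sigma={sigma:.3f}")
```

For the normal distribution, the fitted parameters are simply the sample mean and the (population) standard deviation.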

2. kdeplot()
Plots a kernel density estimate (KDE) on its own. With this function you can change the kernel bandwidth.

sns.kdeplot(x, label="bw: default")  # no bw given, so the bandwidth is chosen automatically
sns.kdeplot(x, bw=.1, label="bw: 0.1")
sns.kdeplot(x, bw=2, label="bw: 2", shade=True)
plt.legend()

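To make the effect of the bandwidth concrete, here is a small hand-written Gaussian KDE in NumPy. It is an illustration rather than seaborn's implementation, and it assumes the bandwidth is the standard deviation of each Gaussian kernel; gaussian_kde_1d is a hypothetical helper name:

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Average of Gaussian bumps of width `bandwidth` centered on each sample."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernels.sum(axis=1) / (len(samples) * bandwidth)

rng = np.random.default_rng(0)
samples = rng.normal(size=100)
grid = np.linspace(-12, 12, 1201)
dx = grid[1] - grid[0]

narrow = gaussian_kde_1d(samples, grid, 0.1)  # spiky, follows individual points
wide = gaussian_kde_1d(samples, grid, 2.0)    # smooth, flattens the shape

# Both curves are densities, so each should integrate to roughly 1
area_narrow = narrow.sum() * dx
area_wide = wide.sum() * dx
```

A small bandwidth gives a jagged curve with high peaks, while a large one spreads the same probability mass into a low, wide hump.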

It also works for two-dimensional KDE:

sns.kdeplot(*data.T, shade=True, shade_lowest=False)

shade_lowest controls whether the lowest contour level is filled.

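3. jointplot()
Draws the joint distribution of two variables (a scatter plot by default) together with the distribution of each variable on the margins.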

sns.jointplot(x="x", y="y", data=df)


Going further

g = sns.jointplot(x="x", y="y", data=df, kind="kde")
g.plot_joint(plt.scatter, c="c", s=30, linewidth=1, marker="+")
g.ax_joint.collections[0].set_alpha(0)
g.set_axis_labels("$X$", "$Y$")


Linear regression

regplot() fits a linear regression and plots it. The shaded band is the 95% confidence interval.

sns.regplot(x="x",y="y",data=df)

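seaborn estimates the confidence band by bootstrapping (see regplot's n_boot parameter). The following NumPy sketch shows the general idea on hypothetical data; it is an illustration of a percentile bootstrap, not seaborn's own code:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(scale=0.8, size=200)  # hypothetical data

grid = np.linspace(x.min(), x.max(), 50)
n_boot = 1000
boot_lines = np.empty((n_boot, grid.size))
for i in range(n_boot):
    idx = rng.integers(0, len(x), len(x))      # resample with replacement
    slope, intercept = np.polyfit(x[idx], y[idx], 1)
    boot_lines[i] = slope * grid + intercept

# 95% band: the 2.5th and 97.5th percentiles of the resampled lines
lower, upper = np.percentile(boot_lines, [2.5, 97.5], axis=0)

# Regression line fitted to the full data set
slope, intercept = np.polyfit(x, y, 1)
fit = slope * grid + intercept
```

Plotting fit with plt.plot and the band with plt.fill_between(grid, lower, upper, alpha=0.3) gives a picture similar to regplot's output.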

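There are many parameters, but ci and order deserve special mention. ci takes a value in [0, 100] or None, which sets the size of the confidence interval or hides it. order sets the degree of the polynomial to fit.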

regx = np.arange(-2,3,0.1)
regy = regx**2 + np.random.random(len(regx))*5
regdf = pd.DataFrame(data=[regx,regy],index=['x','y']).T
sns.regplot(x="x", y="y", data=regdf, ci=68, order=2)
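With order greater than 1, regplot estimates a polynomial regression via numpy.polyfit. The fit itself can be reproduced outside seaborn; this sketch uses a seeded generator in place of np.random.random, so the numbers differ from the snippet above:

```python
import numpy as np

rng = np.random.default_rng(0)
regx = np.arange(-2, 3, 0.1)
regy = regx**2 + rng.random(len(regx)) * 5  # y = x^2 plus uniform noise in [0, 5)

# Least-squares fit of a degree-2 polynomial; coefficients are returned
# highest degree first: a*x^2 + b*x + c
a, b, c = np.polyfit(regx, regy, 2)
curve = np.polyval([a, b, c], regx)

print(f"a={a:.2f}, b={b:.2f}, c={c:.2f}")
```

Since the data follow y = x^2 plus noise averaging 2.5, a should come out close to 1 and c somewhere around 2.5.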