[Python] pandas.DataFrame.sample() function

_Sun_ 2023. 6. 9. 15:40

Definition

Return a random sample of items from an axis of object.

In other words, this is the function to use when you want to draw a random sample of rows (or columns) from a DataFrame.

DataFrame.sample(n=None, frac=None, replace=False, weights=None,
                 random_state=None, axis=None, ignore_index=False)
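The signature also includes weights and axis, which the examples below don't use. Here is a minimal sketch that rebuilds the same fruit DataFrame used in the examples and assumes a reasonably recent pandas. weights can be the name of a column whose values serve as relative probabilities (a row with weight 0 is never drawn), and axis=1 samples columns instead of rows. The variable names weighted and two_cols are just for illustration.

```python
import pandas as pd

# Same DataFrame as in the examples section.
df = pd.DataFrame(
    {"size": [2, 4, 8, 1],
     "order_counts": [0, 5, 8, 2],
     "num_counts": [10, 2, 1, 8]},
    index=["plum", "peach", "watermelon", "blueberry"],
)

# weights: a column name (allowed on a DataFrame when axis=0) or an array-like.
# The values are normalized into probabilities; a row with weight 0 is never
# drawn, so 'plum' (order_counts == 0) cannot show up here.
weighted = df.sample(n=3, weights="order_counts", random_state=1)
print(weighted.index.tolist())

# axis=1 (or axis="columns") samples columns instead of rows.
two_cols = df.sample(n=2, axis=1, random_state=1)
print(two_cols.columns.tolist())
```

Since only three rows have a nonzero weight, n=3 without replacement always returns exactly peach, watermelon and blueberry, in some order.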

Examples

First, prepare a DataFrame.

>>> import pandas as pd
>>> df = pd.DataFrame({'size': [2, 4, 8, 1],
...                    'order_counts': [0, 5, 8, 2],
...                    'num_counts': [10, 2, 1, 8]},
...                   index=['plum', 'peach', 'watermelon', 'blueberry'])
>>> df
            size  order_counts  num_counts
plum           2             0          10
peach          4             5           2
watermelon     8             8           1
blueberry      1             2           8

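n: the number of items to draw at random. It cannot be combined with frac, and if neither is given, a single item is drawn.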

>>> df['size'].sample(n=3, random_state=1)
blueberry     1
watermelon    8
plum          2
Name: size, dtype: int64
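Here is a small sketch of how n interacts with random_state and replace, again rebuilding the same DataFrame. Passing the same integer random_state gives the same draw every time. Without replacement, n can't be larger than the number of rows (pandas raises a ValueError). With replace=True it can. The names first, second, too_many_ok and upsampled are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"size": [2, 4, 8, 1],
     "order_counts": [0, 5, 8, 2],
     "num_counts": [10, 2, 1, 8]},
    index=["plum", "peach", "watermelon", "blueberry"],
)

# The same random_state yields the same draw, so results are reproducible.
first = df["size"].sample(n=3, random_state=1)
second = df["size"].sample(n=3, random_state=1)
print(first.equals(second))  # True

# With neither n nor frac, a single item is returned.
one = df.sample(random_state=1)
print(len(one))  # 1

# Without replacement, n cannot exceed the number of rows (4 here).
try:
    df.sample(n=5)
    too_many_ok = True
except ValueError as err:
    too_many_ok = False
    print("ValueError:", err)

# With replace=True rows may repeat, so n can exceed the row count.
upsampled = df.sample(n=5, replace=True, random_state=1)
print(len(upsampled))  # 5
```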

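frac: the fraction of items to draw at random. The example below also passes replace=True, which samples with replacement, so the same row can be drawn more than once.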

>>> df.sample(frac=0.5, replace=True, random_state=1)
           size  order_counts  num_counts
peach         4             5           2
blueberry     1             2           8
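As a rough sketch of what frac means in practice: pandas converts it to a row count, which as far as I know is round(frac * number of rows) in current versions. So on this 4-row DataFrame, frac=0.5 gives 2 rows and frac=0.7 gives 3. The second half shows the effect of replace. Without it, each label appears at most once. With replace=True, drawing 20 rows from 4 (frac=5) guarantees repeats. The names no_repl and with_repl are just illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"size": [2, 4, 8, 1],
     "order_counts": [0, 5, 8, 2],
     "num_counts": [10, 2, 1, 8]},
    index=["plum", "peach", "watermelon", "blueberry"],
)

# frac becomes a row count: round(frac * len(df)).
for frac in (0.25, 0.5, 0.7):
    print(frac, len(df.sample(frac=frac, random_state=1)))
# 0.25 1
# 0.5 2
# 0.7 3

# Without replacement, every drawn label is unique.
no_repl = df.sample(frac=1, random_state=1)
print(no_repl.index.is_unique)  # True

# With replace=True labels can repeat; 20 draws from 4 rows must repeat.
with_repl = df.sample(frac=5, replace=True, random_state=1)
print(len(with_repl), with_repl.index.is_unique)  # 20 False
```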

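df.sample(frac=0.7)              # about 70% of the rows in random order (round(0.7 * 4) = 3 rows here)
df.sample(frac=1)                # 100% of the rows in random order, i.e. a shuffle
df.sample(frac=2, replace=True)  # 200% of the rows (8 here); frac > 1 requires replace=True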
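The frac=1 shuffle is a common idiom, so here is a minimal sketch of it on the same DataFrame. The shuffled rows keep their original labels. To renumber them 0..n-1 you can chain .reset_index(drop=True), or pass ignore_index=True, which I believe was added to sample in pandas 1.3. The sketch also confirms that frac=2 without replace=True raises a ValueError. The names shuffled, renumbered, upsample_ok and doubled are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {"size": [2, 4, 8, 1],
     "order_counts": [0, 5, 8, 2],
     "num_counts": [10, 2, 1, 8]},
    index=["plum", "peach", "watermelon", "blueberry"],
)

# frac=1: every row exactly once, in random order (a shuffle).
shuffled = df.sample(frac=1, random_state=42)
print(sorted(shuffled.index) == sorted(df.index))  # True

# Renumber the shuffled rows 0..n-1 (ignore_index=True does the same
# on pandas versions that support it).
renumbered = df.sample(frac=1, random_state=42).reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]

# frac > 1 without replace=True raises ValueError.
try:
    df.sample(frac=2)
    upsample_ok = True
except ValueError as err:
    upsample_ok = False
    print("ValueError:", err)

# With replace=True, frac=2 returns twice as many rows.
doubled = df.sample(frac=2, replace=True, random_state=1)
print(len(doubled))  # 8
```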

Reference

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sample.html