【Python66】Fetching and Searching Tweets with the Twitter and Tweepy Modules【Google Colab】

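Python66 series

The idea is to explain 66 modules (lol). I've already covered quite a few, though...

My Python articles are here:

Python

Counting them up, there are 11 so far, so this makes number 12 (lol).

So this time, it's the modules for working with the Twitter API.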

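Sample file

Jumping right in: I've already made a sample notebook in Google Colab.

Here it is on GitHub:

https://github.com/tom2rd/Googlecolabutils/blob/master/Twitter_API_sample.ipynb

I'd love to say "Go ahead, try it!", but you'll probably trip up on getting an API key (lol).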

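Getting an API key

Twitter API keys have become harder to get than they used to be, or rather more of a hassle: you now have to register for a Twitter Developer account.

https://developer.twitter.com/

Register from here.

The application process changes fairly often, but as of now I think this guide (in Japanese) is the most up to date: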

https://qiita.com/kngsym2018/items/2524d21455aac111cdee

Expect to wait about a day before your keys are actually issued (lol).
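Whichever guide you follow, you end up with four strings. Two are the consumer key and secret (called "API key" and "API secret key" on the developer site), and two are the access token and access token secret. Rather than pasting them straight into a notebook you might share, one option is to read them from environment variables. This is a minimal sketch. The variable names such as TWITTER_CONSUMER_KEY and the load_credentials helper are made up for illustration:

```python
import os

# Hypothetical environment variable names; use whatever names you like.
CREDENTIAL_VARS = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_credentials(env=None):
    """Return the four Twitter keys as a dict, read from environment variables."""
    env = os.environ if env is None else env
    # Collect every variable that is unset or empty so the error lists them all at once
    missing = [var for var in CREDENTIAL_VARS.values() if not env.get(var)]
    if missing:
        raise KeyError("missing credentials: " + ", ".join(missing))
    return {name: env[var] for name, var in CREDENTIAL_VARS.items()}
```

In Colab you could set them with os.environ["TWITTER_CONSUMER_KEY"] = "..." in a cell you keep to yourself, and then call load_credentials() in the shared part of the notebook.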

There are two main modules

Twitter module

To install it:

!pip install twitter
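The client itself is created from the module's Twitter and OAuth classes. This is a minimal sketch that assumes you already have the four keys. The name make_twitter_client is hypothetical. OAuth takes the access token pair first and the consumer key pair second, which is easy to get backwards:

```python
def make_twitter_client(consumer_key, consumer_secret, token, token_secret):
    """Return an authenticated client from the `twitter` package (sixohsix/twitter)."""
    # Imported inside the function so this cell can be defined
    # even before `!pip install twitter` has been run.
    from twitter import OAuth, Twitter

    # Argument order of OAuth: access token, access token secret,
    # consumer key, consumer secret.
    auth = OAuth(token, token_secret, consumer_key, consumer_secret)
    return Twitter(auth=auth)
```

Calling twitter = make_twitter_client(consumer_key, consumer_secret, token, token_secret) gives you a client object. It is named twitter in the examples in this article, for instance twitter.statuses.user_timeline(screen_name="tom2rd").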

The latest information is on GitHub, but it doesn't seem to be documented very thoroughly.

https://github.com/sixohsix/twitter/tree/master

To actually use it, you'll need to read it alongside the Twitter API reference:

https://developer.twitter.com/en/docs/api-reference-index
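You need the reference because the module turns the chain of attribute names into the request URL. twitter.statuses.user_timeline(...) becomes a request to statuses/user_timeline.json, and twitter.search.tweets(...) becomes one to search/tweets.json. That means the parameter lists in the reference apply directly as keyword arguments. The helper below is hypothetical and only mimics that mapping, assuming the default v1.1 API. It is not the library's own code:

```python
API_BASE = "https://api.twitter.com/1.1"


def endpoint_url(attribute_chain: str) -> str:
    """Map a `twitter` module call chain such as "statuses.user_timeline"
    to the REST URL you would look up in the API reference."""
    # Each attribute becomes one path segment; ".json" is appended at the end
    path = "/".join(attribute_chain.split("."))
    return f"{API_BASE}/{path}.json"
```

For example, endpoint_url("search.tweets") gives https://api.twitter.com/1.1/search/tweets.json, which is the page to look up for the q and geocode parameters.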

That said, the Twitter module on its own seems to be enough.

Tweepy module

This one has documentation in Japanese, which is a bit reassuring:

https://kurozumi.github.io/tweepy/index.html

To install it:

!pip install tweepy

Its return values might be a little easier to handle, but honestly there's not much to choose between the two.
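The difference in return values looks roughly like this. Tweepy hands back model objects such as Status, so you write status.user.name instead of ['user']['name']. The raw dictionary is still available as status._json. This is a minimal sketch for the Tweepy 3.x style used in this article. The names make_tweepy_api and summarize are hypothetical:

```python
def make_tweepy_api(consumer_key, consumer_secret, access_token, access_token_secret):
    """Return an authenticated tweepy.API object (OAuth 1.0a user context)."""
    # Imported here so this cell can be defined before `!pip install tweepy` runs
    import tweepy

    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    # Wait for the rate-limit window to reset instead of raising an error
    return tweepy.API(auth, wait_on_rate_limit=True)


def summarize(status):
    """Tweepy returns Status objects, so nested fields are attributes, not dict keys."""
    return f"@{status.user.screen_name} ({status.user.name}): {status.text}"
```

With api = make_tweepy_api(...), something like [summarize(s) for s in api.user_timeline(screen_name="tom2rd")] would give one line per tweet.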

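A closer look at the Twitter module

When I first used it, I got stuck on how to handle the data it returns, so here is a short explanation.

Extracting data

When you want the tweets on your timeline, or the tweets of a particular user, you fetch the data with something like

results = twitter.statuses.user_timeline(screen_name="tom2rd")

Here twitter is the authenticated client object created from the module's Twitter class, not the module itself. What comes back looks like one big wall of text (lol).

Each tweet can be picked out by index, like results[0].

This is a single tweet: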

{'contributors': None,
 'coordinates': None,
 'created_at': 'Sun Nov 17 07:34:29 +0000 2019',
 'entities': {'hashtags': [],
              'symbols': [],
              'urls': [{'display_url': 'instagram.com/p/B49VVoGA_iZ/…',
                        'expanded_url': 'https://www.instagram.com/p/B49VVoGA_iZ/?igshid=kdlptrj3zrmg',
                        'indices': [25, 48],
                        'url': 'https://t.co/F1JeAQKwt2'}],
              'user_mentions': []},
 'favorite_count': 0,
 'favorited': False,
 'geo': None,
 'id': 1195968436586696704,
 'id_str': '1195968436586696704',
 'in_reply_to_screen_name': None,
 'in_reply_to_status_id': None,
 'in_reply_to_status_id_str': None,
 'in_reply_to_user_id': None,
 'in_reply_to_user_id_str': None,
 'is_quote_status': False,
 'lang': 'ja',
 'place': None,
 'possibly_sensitive': False,
 'retweet_count': 0,
 'retweeted': False,
 'source': '<a href="http://instagram.com" rel="nofollow">Instagram</a>',
 'text': '超高加水パン と ベビーリーフ と\nタンスモーク https://t.co/F1JeAQKwt2',
 'truncated': False,
 'user': {'can_media_tag': True,
          'contributors_enabled': False,
          'created_at': 'Sun Oct 18 07:10:04 +0000 2009',
          'default_profile': False,
          'default_profile_image': False,
          'description': 'https://t.co/Jaqu2snTJk なやつです。あくまでも個人の活動です。 趣味はラジコンとギター（ベース）と・・・ 料理も好き・・・キャンプも・・・なんでも絡んでください。',
          'entities': {'description': {'urls': [{'display_url': 'tom2rd.sakura.ne.jp',
                                                 'expanded_url': 'http://tom2rd.sakura.ne.jp',
                                                 'indices': [0, 23],
                                                 'url': 'https://t.co/Jaqu2snTJk'}]},
                       'url': {'urls': [{'display_url': 'tom2rd.sakura.ne.jp/wp/',
                                         'expanded_url': 'https://tom2rd.sakura.ne.jp/wp/',
                                         'indices': [0, 23],
                                         'url': 'https://t.co/EGsotJJxzn'}]}},
          'favourites_count': 6535,
          'follow_request_sent': False,
          'followed_by': False,
          'followers_count': 894,
          'following': False,
          'friends_count': 1765,
          'geo_enabled': True,
          'has_extended_profile': True,
          'id': 83316254,
          'id_str': '83316254',
          'is_translation_enabled': False,
          'is_translator': False,
          'lang': None,
          'listed_count': 38,
          'location': 'Tokyo',
          'name': 'Tetsuya Tominaga',
          'notifications': False,
          'profile_background_color': '9AE4E8',
          'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme16/bg.gif',
          'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme16/bg.gif',
          'profile_background_tile': False,
          'profile_banner_url': 'https://pbs.twimg.com/profile_banners/83316254/1398508047',
          'profile_image_url': 'http://pbs.twimg.com/profile_images/1116497313/11427014_3721527804_normal.jpg',
          'profile_image_url_https': 'https://pbs.twimg.com/profile_images/1116497313/11427014_3721527804_normal.jpg',
          'profile_link_color': '0084B4',
          'profile_sidebar_border_color': 'BDDCAD',
          'profile_sidebar_fill_color': 'DDFFCC',
          'profile_text_color': '333333',
          'profile_use_background_image': True,
          'protected': False,
          'screen_name': 'tom2rd',
          'statuses_count': 11590,
          'time_zone': None,
          'translator_type': 'none',
          'url': 'https://t.co/EGsotJJxzn',
          'utc_offset': None,
          'verified': False}}
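The created_at value is a plain string in UTC, in the form 'Sun Nov 17 07:34:29 +0000 2019'. If you want it as a datetime, for example in Japan time, a minimal standard-library sketch could look like this. The names parse_created_at and JST are made up for illustration:

```python
from datetime import datetime, timedelta, timezone

# Layout of the 'created_at' string, e.g. 'Sun Nov 17 07:34:29 +0000 2019'
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"
JST = timezone(timedelta(hours=9))


def parse_created_at(value: str, tz=JST) -> datetime:
    """Parse a created_at string and convert it to the given time zone (JST by default)."""
    # %a and %b expect English day/month names, i.e. the default C/English locale
    return datetime.strptime(value, CREATED_AT_FORMAT).astimezone(tz)
```

For the tweet above, str(parse_created_at(results[0]['created_at'])) gives '2019-11-17 16:34:29+09:00'.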

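Each part enclosed in { } is a Python dictionary, and the dictionaries are nested inside one another, so you drill down key by key. For example,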

print(results[0]['user']['name'])
print(results[0]['text'])
print(results[0]['lang'])

Running this prints:

Tetsuya Tominaga
超高加水パン と ベビーリーフ と
タンスモーク https://t.co/F1JeAQKwt2
ja

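That's how you pull out individual fields. The text field contains a line break (\n), which is why it prints over two lines.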
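To do the same for every tweet rather than only results[0], you can loop over the list. This is a minimal sketch, and the function name format_tweets is hypothetical. One assumption to note: if you request tweet_mode='extended', the body comes back in full_text instead of text, so the sketch checks for both:

```python
def format_tweets(results):
    """Return one 'name (lang): text' line per tweet in a timeline result list."""
    lines = []
    for tweet in results:
        # With tweet_mode='extended' the body is in 'full_text' instead of 'text'
        body = tweet.get("full_text", tweet.get("text", ""))
        # Flatten line breaks so each tweet fits on a single line
        body = body.replace("\n", " ")
        lines.append(f"{tweet['user']['name']} ({tweet['lang']}): {body}")
    return lines
```

With for line in format_tweets(results): print(line), the tweet above would print as "Tetsuya Tominaga (ja): 超高加水パン と ベビーリーフ と タンスモーク https://t.co/F1JeAQKwt2".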

results[0]['user']['screen_name'] returns tom2rd.
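Some fields in the sample, such as 'place', 'coordinates' and 'geo', are None. A chain like results[0]['place']['full_name'] therefore raises a TypeError on tweets without location data. A small helper can walk the keys and fall back to a default. This is a sketch, and the name dig is hypothetical:

```python
def dig(data, *keys, default=None):
    """Follow nested dict keys / list indices; return `default` if any level
    is missing or None instead of raising an exception."""
    current = data
    for key in keys:
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            # Missing key, index out of range, or indexing a non-container
            return default
        if current is None:
            return default
    return current
```

For example, dig(results[0], 'user', 'screen_name') gives 'tom2rd'. dig(results[0], 'place', 'full_name') gives None for the sample tweet instead of raising an error.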

So the trick is to count the levels of { }, or rather to think about the nested structure, and extract from there. pandas may offer something for this as well...
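pandas does have pandas.json_normalize (pandas 1.0 and later). It turns a list of these dictionaries into a DataFrame with dotted column names such as user.name. If you'd rather avoid the dependency, the same flattening can be sketched with the standard library and written out as CSV. The names flatten and write_csv are hypothetical. Lists such as entities.urls are kept as-is rather than expanded:

```python
import csv


def flatten(data, parent_key="", sep="."):
    """Flatten nested dicts into one level: {'user': {'name': x}} -> {'user.name': x}."""
    items = {}
    for key, value in data.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dicts, prefixing their keys with the parent key
            items.update(flatten(value, new_key, sep))
        else:
            items[new_key] = value
    return items


def write_csv(tweets, path, columns):
    """Write the chosen flattened columns of each tweet to a UTF-8 CSV file."""
    rows = [flatten(tweet) for tweet in tweets]
    with open(path, "w", newline="", encoding="utf-8") as f:
        # extrasaction="ignore" skips any flattened key not listed in `columns`
        writer = csv.DictWriter(f, fieldnames=columns, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
```

For example, write_csv(results, 'tweets.csv', ['id', 'created_at', 'user.screen_name', 'text']) keeps just those four columns.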

Searching

With both the Twitter module and Tweepy, I found that specifying locale or lang made searches with geocode stop returning any hits.

twitter.search.tweets(q='tom2rd', geocode='35.7271559,139.4387704,10km')

api.search(q='tom2rd', geocode='35.7271559,139.4387704,10km')

The first line uses the Twitter module and the second uses Tweepy; they are almost identical.
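The geocode value is a "latitude,longitude,radius" string, with the radius in km or mi. The two calls also return different shapes. The Twitter module gives the raw response dictionary with the tweets under 'statuses', while Tweepy gives a list of Status objects. In Tweepy 4.x, api.search was renamed api.search_tweets. This is a small sketch that builds the string, drops lang/locale based on the behaviour described above, and normalizes the result. make_geocode, search_nearby and statuses_of are hypothetical names:

```python
def make_geocode(lat, lon, radius, unit="km"):
    """Build the 'latitude,longitude,radius' string for the geocode parameter."""
    if unit not in ("km", "mi"):
        raise ValueError("unit must be 'km' or 'mi'")
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("latitude/longitude out of range")
    return f"{lat},{lon},{radius}{unit}"


def search_nearby(search_fn, query, geocode, **params):
    """Call a search function (twitter.search.tweets or Tweepy's api.search)
    without lang/locale, which in my tests stopped geocode searches matching."""
    for key in ("lang", "locale"):
        params.pop(key, None)
    return search_fn(q=query, geocode=geocode, **params)


def statuses_of(result):
    """Twitter module: dict with a 'statuses' list. Tweepy: a list of Status objects."""
    if isinstance(result, dict):
        return result.get("statuses", [])
    return list(result)
```

For example, statuses_of(search_nearby(twitter.search.tweets, 'tom2rd', make_geocode(35.7271559, 139.4387704, 10))) searches with the same geocode as above, '35.7271559,139.4387704,10km'.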

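For example, to collect tweets about fires (火事) within 10 km of the center of Kodaira City, you'd write the following. If you run it every so often, you can build a program that tells you someone has tweeted about a fire whenever it returns results.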

api.search(q='火事', count=10, geocode='35.7271559,139.4387704,10km')
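For the "run it every so often" part, one way is to remember the largest tweet id seen so far and pass it as since_id, so that each round only reports new tweets. This is a minimal sketch, assuming Tweepy 3.x's api.search as in the line above. poll_new_tweets and make_fire_search are hypothetical names. New tweets are also filtered client-side in case since_id is not applied:

```python
import time


def make_fire_search(api, geocode="35.7271559,139.4387704,10km"):
    """Adapter around Tweepy 3.x api.search (search_tweets in Tweepy 4.x)."""
    def search(since_id):
        params = {"q": "火事", "count": 10, "geocode": geocode}  # 火事 = fire
        if since_id is not None:
            params["since_id"] = since_id  # only tweets newer than this id
        # Status objects keep the raw response dict in ._json
        return [status._json for status in api.search(**params)]
    return search


def poll_new_tweets(search_fn, on_new, rounds=3, interval=60.0):
    """Call search_fn(since_id) `rounds` times, `interval` seconds apart,
    and pass only tweets not seen before to on_new. Returns the last since_id."""
    since_id = None
    for i in range(rounds):
        tweets = search_fn(since_id)
        new = [t for t in tweets if since_id is None or t["id"] > since_id]
        if new:
            since_id = max(t["id"] for t in new)
            on_new(new)
        if i < rounds - 1:
            time.sleep(interval)
    return since_id
```

Something like poll_new_tweets(make_fire_search(api), lambda tweets: print(len(tweets), "new fire tweets"), rounds=10, interval=300) would check every five minutes. Keep the interval generous because of the search rate limits.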

The rest is up to you. Give it a try!