Commit 52ee7522 authored by Himanshu Dabas, committed by GitHub

fix(profile): ported user profile to v2 API endpoint (#955)

* fix for deprecation of v1.1 endpoints

* fix for cashtags

* typo

* fix(datetime): _formatDateTime tries %d-%m-%y

* fix(pandas): use new str-format Tweet.datetime data rep

* fix(pandas datetime): use ms

* fix(cashtags unwind): undo PRs field removals

* Revert "fix(cashtags unwind): undo PRs field removals"

This reverts commit dfa57c20186a969aa2bf010fbe198f5e0bbbbd01.

* fix(pandas): remove broken fields

* fix(cash): use provided field as suggested by pr review

* fix(cashtags): re-enable cashtags in output

* fix(db): remove broken fields

* fix(datetime): Y-m-d and factored out

* fixes #947

* fix(get.py): json exception in User

* to-do: added to-do tasks

added to-do tasks for --profile-full feature

* chore(test): PEP8 formatting

* fix(profile): ported user profile to v2 API

fixed user profile feature which was broken since v1 endpoints were deprecated

* updated Readme

* fix: fixes #965 inconsistent timezones

* fix: handle tombstone tweets

Tombstone tweets are tweets that Twitter has flagged as inappropriate, misleading, graphic, etc.
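The patch distinguishes tombstone entries from ordinary tweet entries by their position in the timeline JSON. A minimal sketch of that classification, using illustrative dicts shaped like the payloads this patch parses (not real API output):

```python
# Minimal sketch of classifying a v2 timeline entry, mirroring the shape
# this patch parses; the sample dicts are illustrative, not real API output.

def classify_entry(entry):
    """Return ('tweet', id), ('tombstone', id) or ('other', None)."""
    content = entry.get('content', {}).get('item', {}).get('content', {})
    if 'tweet' in content:
        return 'tweet', content['tweet']['id']
    tombstone = content.get('tombstone', {})
    if 'tweet' in tombstone:
        # flagged/withheld tweet: the ID survives, the body does not
        return 'tombstone', tombstone['tweet']['id']
    return 'other', None

normal = {'content': {'item': {'content': {'tweet': {'id': '111'}}}}}
flagged = {'content': {'item': {'content': {'tombstone': {'tweet': {'id': '222'}}}}}}

print(classify_entry(normal))   # ('tweet', '111')
print(classify_entry(flagged))  # ('tombstone', '222')
```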

* fixes #976: saving tweets to csv

This patch fixes the issue caused by #967, which broke the functionality of saving the retrieved data into a csv file.

* feature: port Lookup to v2 endpoint

Fixes #970: Lookup is ported to the v2 endpoint and can now be used to look up a given profile.
Co-authored-by: SiegfriedWagner <mateus.chojnowski@gmail.com>
Co-authored-by: lmeyerov <leo@graphistry.com>
parent ae5e7e11
@@ -111,3 +111,5 @@ ENV/
 *.csv
 *.json
 *.txt
+test_twint.py
@@ -28,6 +28,7 @@ Twitter limits scrolls while browsing the user timeline. This means that with `.
 - aiodns;
 - beautifulsoup4;
 - cchardet;
+- dataclasses
 - elasticsearch;
 - pysocks;
 - pandas (>=0.23.0);
@@ -65,7 +66,7 @@ pipenv install git+https://github.com/twintproject/twint.git#egg=twint
 ## CLI Basic Examples and Combos
 A few simple examples to help you understand the basics:
-- `twint -u username` - Scrape all the Tweets from *user*'s timeline.
+- `twint -u username` - Scrape all the Tweets of a *user* (doesn't include **retweets** but includes **replies**).
 - `twint -u username -s pineapple` - Scrape all Tweets from the *user*'s timeline containing _pineapple_.
 - `twint -s pineapple` - Collect every Tweet containing *pineapple* from everyone's Tweets.
 - `twint -u username --year 2014` - Collect Tweets that were tweeted **before** 2014.
@@ -83,7 +84,7 @@ A few simple examples to help you understand the basics:
 - `twint -u username --following` - Scrape who a Twitter user follows.
 - `twint -u username --favorites` - Collect all the Tweets a user has favorited (gathers ~3200 tweet).
 - `twint -u username --following --user-full` - Collect full user information a person follows
-- `twint -u username --profile-full` - Use a slow, but effective method to gather Tweets from a user's profile (Gathers ~3200 Tweets, Including Retweets).
+- `twint -u username --timeline` - Use an effective method to gather Tweets from a user's profile (Gathers ~3200 Tweets, including **retweets** & **replies**).
 - `twint -u username --retweets` - Use a quick method to gather the last 900 Tweets (that includes retweets) from a user's profile.
 - `twint -u username --resume resume_file.txt` - Resume a search starting from the last saved scroll-id.
......
@@ -2,7 +2,6 @@
 from setuptools import setup
 import io
 import os
-import sys

 # Package meta-data
 NAME = 'twint'
@@ -15,52 +14,52 @@ VERSION = None
 # Packages required
 REQUIRED = [
-    'aiohttp', 'aiodns', 'beautifulsoup4', 'cchardet',
+    'aiohttp', 'aiodns', 'beautifulsoup4', 'cchardet', 'dataclasses',
     'elasticsearch', 'pysocks', 'pandas', 'aiohttp_socks',
     'schedule', 'geopy', 'fake-useragent', 'googletransx'
 ]

 here = os.path.abspath(os.path.dirname(__file__))

 with io.open(os.path.join(here, 'README.md'), encoding='utf-8') as f:
     long_description = '\n' + f.read()

 # Load the package's __version__.py
 about = {}
 if not VERSION:
     with open(os.path.join(here, NAME, '__version__.py')) as f:
         exec(f.read(), about)
 else:
     about['__version__'] = VERSION

 setup(
     name=NAME,
     version=about['__version__'],
     description=DESCRIPTION,
     long_description=long_description,
     long_description_content_type="text/markdown",
     author=AUTHOR,
     author_email=EMAIL,
     python_requires=REQUIRES_PYTHON,
     url=URL,
     packages=['twint', 'twint.storage'],
     entry_points={
-        'console_scripts':[
+        'console_scripts': [
             'twint = twint.cli:run_as_command',
         ],
     },
     install_requires=REQUIRED,
     dependency_links=[
         'git+https://github.com/x0rzkov/py-googletrans#egg=googletrans'
     ],
     license='MIT',
     classifiers=[
         'License :: OSI Approved :: MIT License',
         'Programming Language :: Python',
         'Programming Language :: Python :: 3',
         'Programming Language :: Python :: 3.6',
         'Programming Language :: Python :: 3.7',
         'Programming Language :: Python :: 3.8',
         'Programming Language :: Python :: Implementation :: CPython',
     ],
 )
@@ -5,21 +5,25 @@ import os
 Test.py - Testing TWINT to make sure everything works.
 '''

 def test_reg(c, run):
     print("[+] Beginning vanilla test in {}".format(str(run)))
     run(c)

 def test_db(c, run):
     print("[+] Beginning DB test in {}".format(str(run)))
     c.Database = "test_twint.db"
     run(c)

 def custom(c, run, _type):
     print("[+] Beginning custom {} test in {}".format(_type, str(run)))
     c.Custom['tweet'] = ["id", "username"]
     c.Custom['user'] = ["id", "username"]
     run(c)

 def test_json(c, run):
     c.Store_json = True
     c.Output = "test_twint.json"
@@ -27,6 +31,7 @@ def test_json(c, run):
     print("[+] Beginning JSON test in {}".format(str(run)))
     run(c)

 def test_csv(c, run):
     c.Store_csv = True
     c.Output = "test_twint.csv"
@@ -34,52 +39,54 @@ def test_csv(c, run):
     print("[+] Beginning CSV test in {}".format(str(run)))
     run(c)

 def main():
     c = twint.Config()
     c.Username = "verified"
     c.Limit = 20
     c.Store_object = True

-    # Seperate objects are neccessary.
+    # Separate objects are necessary.
     f = twint.Config()
     f.Username = "verified"
     f.Limit = 20
     f.Store_object = True
     f.User_full = True

-    runs = [twint.run.Following,
-            twint.run.Followers,
-            twint.run.Search,
-            twint.run.Profile,
-            twint.run.Favorites
-            ]
+    runs = [
+        twint.run.Profile,  # this doesn't
+        twint.run.Search,  # this works
+        twint.run.Following,
+        twint.run.Followers,
+        twint.run.Favorites,
+    ]

     tests = [test_reg, test_json, test_csv, test_db]

     # Something breaks if we don't split these up
-    for run in runs[:2]:
-        for test in tests:
-            test(f, run)
-    for run in runs[2:]:
+    for run in runs[:3]:
         if run == twint.run.Search:
             c.Since = "2012-1-1 20:30:22"
             c.Until = "2017-1-1"
         else:
             c.Since = ""
             c.Until = ""
         for test in tests:
             test(c, run)
+    for run in runs[3:]:
+        for test in tests:
+            test(f, run)

     files = ["test_twint.db", "test_twint.json", "test_twint.csv"]
     for _file in files:
         os.remove(_file)
     print("[+] Testing complete!")

 if __name__ == '__main__':
     main()
@@ -16,12 +16,14 @@ from . import run
 from . import config
 from . import storage

 def error(_error, message):
     """ Print errors to stdout
     """
     print("[-] {}: {}".format(_error, message))
     sys.exit(0)

 def check(args):
     """ Error checking
     """
@@ -34,7 +36,12 @@ def check(args):
                   "--userid and -u cannot be used together.")
         if args.all:
             error("Contradicting Args",
-                  "--all and -u cannot be used together")
+                  "--all and -u cannot be used together.")
+    elif args.search and args.timeline:
+        error("Contradicting Args",
+              "--s and --tl cannot be used together.")
+    elif args.timeline and not args.username:
+        error("Error", "-tl cannot be used without -u.")
     elif args.search is None:
         if args.custom_query is not None:
             pass
@@ -53,6 +60,7 @@ def check(args):
     if args.min_wait_time < 0:
         error("Error", "Please specifiy a non negative value for min_wait_time")

 def loadUserList(ul, _type):
     """ Concatenate users
     """
@@ -67,6 +75,7 @@ def loadUserList(ul, _type):
             return un[15:]
     return userlist

 def initialize(args):
     """ Set default values for config from args
     """
@@ -100,7 +109,7 @@ def initialize(args):
     c.Essid = args.essid
     c.Format = args.format
     c.User_full = args.user_full
-    c.Profile_full = args.profile_full
+    # c.Profile_full = args.profile_full
     c.Pandas_type = args.pandas_type
     c.Index_tweets = args.index_tweets
     c.Index_follow = args.index_follow
@@ -119,7 +128,7 @@ def initialize(args):
     c.Tor_control_password = args.tor_control_password
     c.Retweets = args.retweets
     c.Custom_query = args.custom_query
     c.Popular_tweets = args.popular_tweets
     c.Skip_certs = args.skip_certs
     c.Hide_output = args.hide_output
     c.Native_retweets = args.native_retweets
@@ -136,6 +145,7 @@ def initialize(args):
     c.Min_wait_time = args.min_wait_time
     return c

 def options():
     """ Parse arguments
     """
@@ -180,7 +190,9 @@ def options():
     ap.add_argument("--proxy-host", help="Proxy hostname or IP.")
     ap.add_argument("--proxy-port", help="The port of the proxy server.")
     ap.add_argument("--tor-control-port", help="If proxy-host is set to tor, this is the control port", default=9051)
-    ap.add_argument("--tor-control-password", help="If proxy-host is set to tor, this is the password for the control port", default="my_password")
+    ap.add_argument("--tor-control-password",
+                    help="If proxy-host is set to tor, this is the password for the control port",
+                    default="my_password")
     ap.add_argument("--essid",
                     help="Elasticsearch Session ID, use this to differentiate scraping sessions.",
                     nargs="?", default="")
@@ -192,9 +204,16 @@ def options():
     ap.add_argument("--user-full",
                     help="Collect all user information (Use with followers or following only).",
                     action="store_true")
-    ap.add_argument("--profile-full",
-                    help="Slow, but effective method of collecting a user's Tweets and RT.",
-                    action="store_true")
+    # I am removing this feature for the time being, because it is no longer required, default method will do this
+    # ap.add_argument("--profile-full",
+    #                 help="Slow, but effective method of collecting a user's Tweets and RT.",
+    #                 action="store_true")
+    ap.add_argument(
+        "-tl",
+        "--timeline",
+        help="Collects every tweet from a User's Timeline. (Tweets, RTs & Replies)",
+        action="store_true",
+    )
     ap.add_argument("--translate",
                     help="Get tweets translated by Google Translate.",
                     action="store_true")
@@ -221,24 +240,28 @@ def options():
     ap.add_argument("-pc", "--pandas-clean",
                     help="Automatically clean Pandas dataframe at every scrape.")
     ap.add_argument("-cq", "--custom-query", help="Custom search query.")
-    ap.add_argument("-pt", "--popular-tweets", help="Scrape popular tweets instead of recent ones.", action="store_true")
+    ap.add_argument("-pt", "--popular-tweets", help="Scrape popular tweets instead of recent ones.",
+                    action="store_true")
     ap.add_argument("-sc", "--skip-certs", help="Skip certs verification, useful for SSC.", action="store_false")
     ap.add_argument("-ho", "--hide-output", help="Hide output, no tweets will be displayed.", action="store_true")
     ap.add_argument("-nr", "--native-retweets", help="Filter the results for retweets only.", action="store_true")
     ap.add_argument("--min-likes", help="Filter the tweets by minimum number of likes.")
     ap.add_argument("--min-retweets", help="Filter the tweets by minimum number of retweets.")
     ap.add_argument("--min-replies", help="Filter the tweets by minimum number of replies.")
-    ap.add_argument("--links", help="Include or exclude tweets containing one o more links. If not specified"+
+    ap.add_argument("--links", help="Include or exclude tweets containing one o more links. If not specified" +
                     " you will get both tweets that might contain links or not.")
     ap.add_argument("--source", help="Filter the tweets for specific source client.")
     ap.add_argument("--members-list", help="Filter the tweets sent by users in a given list.")
     ap.add_argument("-fr", "--filter-retweets", help="Exclude retweets from the results.", action="store_true")
-    ap.add_argument("--backoff-exponent", help="Specify a exponent for the polynomial backoff in case of errors.", type=float, default=3.0)
-    ap.add_argument("--min-wait-time", type=float, default=15, help="specifiy a minimum wait time in case of scraping limit error. This value will be adjusted by twint if the value provided does not satisfy the limits constraints")
+    ap.add_argument("--backoff-exponent", help="Specify a exponent for the polynomial backoff in case of errors.",
+                    type=float, default=3.0)
+    ap.add_argument("--min-wait-time", type=float, default=15,
+                    help="specifiy a minimum wait time in case of scraping limit error. This value will be adjusted by twint if the value provided does not satisfy the limits constraints")
     args = ap.parse_args()
     return args

 def main():
     """ Main
     """
@@ -283,7 +306,7 @@ def main():
                 run.Followers(c)
         else:
             run.Followers(c)
-    elif args.retweets or args.profile_full:
+    elif args.retweets:  # or args.profile_full:
         if args.userlist:
             _userlist = loadUserList(args.userlist, "profile")
             for _user in _userlist:
@@ -301,9 +324,12 @@ def main():
                 run.Lookup(c)
         else:
             run.Lookup(c)
+    elif args.timeline:
+        run.Profile(c)
     else:
         run.Search(c)

 def run_as_command():
     version = ".".join(str(v) for v in sys.version_info[:2])
     if float(version) < 3.6:
@@ -312,5 +338,6 @@ def run_as_command():
     main()

 if __name__ == '__main__':
     main()
@@ -6,6 +6,7 @@ class Config:
     Username: Optional[str] = None
     User_id: Optional[str] = None
     Search: Optional[str] = None
+    Lookup: bool = False
     Geo: str = ""
     Location: bool = False
     Near: str = None
@@ -38,7 +39,7 @@ class Config:
     Favorites: bool = False
     TwitterSearch: bool = False
     User_full: bool = False
-    Profile_full: bool = False
+    # Profile_full: bool = False
     Store_object: bool = False
     Store_object_tweets_list: list = None
     Store_object_users_list: list = None
@@ -83,3 +84,4 @@ class Config:
     Min_wait_time: int = 0
     Bearer_token: str = None
     Guest_token: str = None
+    deleted: list = None
@@ -2,10 +2,12 @@ import datetime
 import logging as logme

+from .tweet import utc_to_local

 class Datelock:
-    _until = None
-    _since = None
+    until = None
+    since = None
     _since_def_user = None
@@ -25,15 +27,18 @@ def Set(Until, Since):
     d = Datelock()

     if Until:
-        d._until = datetime.datetime.strptime(convertToDateTime(Until), "%Y-%m-%d %H:%M:%S")
+        d.until = datetime.datetime.strptime(convertToDateTime(Until), "%Y-%m-%d %H:%M:%S")
+        d.until = utc_to_local(d.until)
     else:
-        d._until = datetime.datetime.today()
+        d.until = datetime.datetime.today()

     if Since:
-        d._since = datetime.datetime.strptime(convertToDateTime(Since), "%Y-%m-%d %H:%M:%S")
+        d.since = datetime.datetime.strptime(convertToDateTime(Since), "%Y-%m-%d %H:%M:%S")
+        d.since = utc_to_local(d.since)
         d._since_def_user = True
     else:
-        d._since = datetime.datetime.strptime("2006-03-21 00:00:00", "%Y-%m-%d %H:%M:%S")
+        d.since = datetime.datetime.strptime("2006-03-21 00:00:00", "%Y-%m-%d %H:%M:%S")
+        d.since = utc_to_local(d.since)
         d._since_def_user = False

     return d
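The datelock change above routes `since`/`until` through `utc_to_local` so date comparisons happen in a consistent timezone (the inconsistent-timezones fix from #965). `utc_to_local` lives in `twint/tweet.py`, which is not part of this diff; the implementation below is an assumed stand-in showing the usual idiom:

```python
# utc_to_local is imported from twint/tweet.py, which this diff does not
# show; this assumed implementation converts a naive UTC datetime into an
# aware datetime in the machine's local timezone without moving the instant.
from datetime import datetime, timezone

def utc_to_local(utc_dt):
    return utc_dt.replace(tzinfo=timezone.utc).astimezone(tz=None)

since = datetime.strptime("2012-01-01 20:30:22", "%Y-%m-%d %H:%M:%S")
local_since = utc_to_local(since)
print(local_since.tzinfo is not None)  # True: comparisons are now tz-aware
```

The key property is that the represented instant is unchanged; only the offset attached to it differs, so `<`/`>` comparisons against other converted datetimes stay consistent.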
 import time
+from datetime import datetime

 from bs4 import BeautifulSoup
 from re import findall
 from json import loads
 import logging as logme

+from .tweet import utc_to_local, Tweet_formats

 class NoMoreTweetsException(Exception):
     def __init__(self, msg):
@@ -23,6 +28,7 @@ def Follow(response):
     return follow, cursor

+# TODO: this won't be used by --profile-full anymore. if it isn't used anywhere else, perhaps remove this in future
 def Mobile(response):
     logme.debug(__name__ + ':Mobile')
     soup = BeautifulSoup(response, "html.parser")
@@ -48,14 +54,15 @@ def MobileFav(response):
     return tweets, max_id

-def profile(response):
-    logme.debug(__name__ + ':profile')
-    json_response = loads(response)
-    html = json_response["items_html"]
-    soup = BeautifulSoup(html, "html.parser")
-    feed = soup.find_all("div", "tweet")
-
-    return feed, feed[-1]["data-item-id"]
+def _get_cursor(response):
+    try:
+        next_cursor = response['timeline']['instructions'][0]['addEntries']['entries'][-1]['content'][
+            'operation']['cursor']['value']
+    except KeyError:
+        # this is needed because after the first request location of cursor is changed
+        next_cursor = response['timeline']['instructions'][-1]['replaceEntry']['entry']['content']['operation'][
+            'cursor']['value']
+    return next_cursor

 def Json(response):
@@ -67,44 +74,49 @@ def Json(response):
     return feed, json_response["min_position"]

-def search_v2(response):
-    # TODO need to implement this
+def parse_tweets(config, response):
+    logme.debug(__name__ + ':parse_tweets')
     response = loads(response)
     if len(response['globalObjects']['tweets']) == 0:
-        msg = 'No more data. finished scraping!!'
+        msg = 'No more data!'
        raise NoMoreTweetsException(msg)
-    # need to modify things at the function call end
-    # timeline = response['timeline']['instructions'][0]['addEntries']['entries']
     feed = []
+    feed_set = set()
+    # here we need to remove the quoted and `to-reply` tweets from the list as they may or may not contain the
+    # for _id in response['globalObjects']['tweets']:
+    #     if 'quoted_status_id_str' in response['globalObjects']['tweets'][_id] or \
+    #             response['globalObjects']['tweets'][_id]['in_reply_to_status_id_str']:
+    #         try:
+    #             feed_set.add(response['globalObjects']['tweets'][_id]['quoted_status_id_str'])
+    #         except KeyError:
+    #             feed_set.add(response['globalObjects']['tweets'][_id]['in_reply_to_status_id_str'])
+    # i = 1
+    # for _id in response['globalObjects']['tweets']:
+    #     if _id not in feed_set:
+    #         temp_obj = response['globalObjects']['tweets'][_id]
+    #         temp_obj['user_data'] = response['globalObjects']['users'][temp_obj['user_id_str']]
+    #         feed.append(temp_obj)
     for timeline_entry in response['timeline']['instructions'][0]['addEntries']['entries']:
         # this will handle the cases when the timeline entry is a tweet
-        if timeline_entry['entryId'].find('sq-I-t-') == 0:
-            _id = timeline_entry['content']['item']['content']['tweet']['id']
-            temp_obj = response['globalObjects']['tweets'][_id]
+        if (config.TwitterSearch or config.Profile) and (timeline_entry['entryId'].startswith('sq-I-t-') or
+                                                         timeline_entry['entryId'].startswith('tweet-')):
+            if 'tweet' in timeline_entry['content']['item']['content']:
+                _id = timeline_entry['content']['item']['content']['tweet']['id']
+                # skip the ads
+                if 'promotedMetadata' in timeline_entry['content']['item']['content']['tweet']:
+                    continue
+            elif 'tombstone' in timeline_entry['content']['item']['content'] and 'tweet' in \
+                    timeline_entry['content']['item']['content']['tombstone']:
+                _id = timeline_entry['content']['item']['content']['tombstone']['tweet']['id']
+            else:
+                _id = None
+            if _id is None:
+                raise ValueError('Unable to find ID of tweet in timeline.')
+            try:
+                temp_obj = response['globalObjects']['tweets'][_id]
+            except KeyError:
+                logme.info('encountered a deleted tweet with id {}'.format(_id))
+                config.deleted.append(_id)
+                continue
             temp_obj['user_data'] = response['globalObjects']['users'][temp_obj['user_id_str']]
+            if 'retweeted_status_id_str' in temp_obj:
+                rt_id = temp_obj['retweeted_status_id_str']
+                _dt = response['globalObjects']['tweets'][rt_id]['created_at']
+                _dt = datetime.strptime(_dt, '%a %b %d %H:%M:%S %z %Y')
+                _dt = utc_to_local(_dt)
+                _dt = str(_dt.strftime(Tweet_formats['datetime']))
+                temp_obj['retweet_data'] = {
+                    'user_rt_id': response['globalObjects']['tweets'][rt_id]['user_id_str'],
+                    'user_rt': response['globalObjects']['tweets'][rt_id]['full_text'],
+                    'retweet_id': rt_id,
+                    'retweet_date': _dt,
+                }
             feed.append(temp_obj)
-    try:
-        next_cursor = response['timeline']['instructions'][0]['addEntries']['entries'][-1]['content'][
-            'operation']['cursor']['value']
-    except KeyError:
-        # this is needed because after the first request location of cursor is changed
-        next_cursor = response['timeline']['instructions'][-1]['replaceEntry']['entry']['content']['operation'][
-            'cursor']['value']
+    next_cursor = _get_cursor(response)
     return feed, next_cursor
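The retweet block in `parse_tweets` parses Twitter's `created_at` timestamp with the `'%a %b %d %H:%M:%S %z %Y'` format before reformatting it via `Tweet_formats['datetime']` (defined in `twint/tweet.py`, which this diff does not show). A sketch of that conversion, where the output format `'%Y-%m-%d %H:%M:%S'` is an assumed stand-in matching the style described in the commit messages:

```python
# Parse Twitter's created_at format and reformat it; the target format
# string is an assumption (Tweet_formats comes from twint/tweet.py).
from datetime import datetime

created_at = 'Wed Sep 02 10:39:10 +0000 2020'  # illustrative timestamp
dt = datetime.strptime(created_at, '%a %b %d %H:%M:%S %z %Y')
print(dt.isoformat())  # 2020-09-02T10:39:10+00:00

formatted = dt.strftime('%Y-%m-%d %H:%M:%S')
print(formatted)  # 2020-09-02 10:39:10
```

Because `%z` produces a timezone-aware datetime, the result can be handed straight to `utc_to_local` before formatting, which is exactly what the patch does for `retweet_date`.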
@@ -105,27 +105,21 @@ def get_connector(config):
     return _connector

-async def RequestUrl(config, init, headers=[]):
+async def RequestUrl(config, init):
     logme.debug(__name__ + ':RequestUrl')
     _connector = get_connector(config)
     _serialQuery = ""
     params = []
     _url = ""
-    _headers = {}
+    _headers = [("authorization", config.Bearer_token), ("x-guest-token", config.Guest_token)]

     # TODO : do this later
     if config.Profile:
-        if config.Profile_full:
-            logme.debug(__name__ + ':RequestUrl:Profile_full')
-            _url = await url.MobileProfile(config.Username, init)
-        else:
-            logme.debug(__name__ + ':RequestUrl:notProfile_full')
-            _url = await url.Profile(config.Username, init)
-        _serialQuery = _url
+        logme.debug(__name__ + ':RequestUrl:Profile')
+        _url, params, _serialQuery = url.SearchProfile(config, init)
     elif config.TwitterSearch:
         logme.debug(__name__ + ':RequestUrl:TwitterSearch')
         _url, params, _serialQuery = await url.Search(config, init)
-        _headers = [("authorization", config.Bearer_token), ("x-guest-token", config.Guest_token)]
     else:
         if config.Following:
             logme.debug(__name__ + ':RequestUrl:Following')
@@ -212,21 +206,25 @@ async def Tweet(url, config, conn):
         logme.critical(__name__ + ':Tweet:' + str(e))

-async def User(username, config, conn, bearer_token, guest_token, user_id=False):
+async def User(username, config, conn, user_id=False):
     logme.debug(__name__ + ':User')
     _dct = {'screen_name': username, 'withHighlightedLabel': False}
     _url = 'https://api.twitter.com/graphql/jMaTS-_Ea8vh9rpKggJbCQ/UserByScreenName?variables={}'\
         .format(dict_to_url(_dct))
     _headers = {
-        'authorization': bearer_token,
-        'x-guest-token': guest_token,
+        'authorization': config.Bearer_token,
+        'x-guest-token': config.Guest_token,
     }
     try:
         response = await Request(_url, headers=_headers)
         j_r = loads(response)
         if user_id:
-            _id = j_r['data']['user']['rest_id']
-            return _id
+            try:
+                _id = j_r['data']['user']['rest_id']
+                return _id
+            except KeyError as e:
+                logme.critical(__name__ + ':User:' + str(e))
+                return
         await Users(j_r, config, conn)
     except Exception as e:
         logme.critical(__name__ + ':User:' + str(e))
......
@@ -88,13 +88,13 @@ def _output(obj, output, config, **extra):
             logme.debug(__name__ + ':_output:Lowercase:tweet')
             obj.username = obj.username.lower()
             author_list.update({obj.username})
-            for i in range(len(obj.mentions)):
-                obj.mentions[i] = obj.mentions[i].lower()
+            for dct in obj.mentions:
+                for key, val in dct.items():
+                    dct[key] = val.lower()
             for i in range(len(obj.hashtags)):
                 obj.hashtags[i] = obj.hashtags[i].lower()
-            # TODO : dont know what cashtags are, <also modify in tweet.py>
-            # for i in range(len(obj.cashtags)):
-            #     obj.cashtags[i] = obj.cashtags[i].lower()
+            for i in range(len(obj.cashtags)):
+                obj.cashtags[i] = obj.cashtags[i].lower()
         else:
             logme.info('_output:Lowercase:hiddenTweetFound')
             print("[x] Hidden tweet found, account suspended due to violation of TOS")
@@ -128,49 +128,40 @@ def _output(obj, output, config, **extra):

 async def checkData(tweet, config, conn):
     logme.debug(__name__ + ':checkData')
     tweet = Tweet(tweet, config)
     if not tweet.datestamp:
         logme.critical(__name__ + ':checkData:hiddenTweetFound')
         print("[x] Hidden tweet found, account suspended due to violation of TOS")
         return
     if datecheck(tweet.datestamp + " " + tweet.timestamp, config):
         output = format.Tweet(config, tweet)
         if config.Database:
             logme.debug(__name__ + ':checkData:Database')
             db.tweets(conn, tweet, config)
         if config.Pandas:
             logme.debug(__name__ + ':checkData:Pandas')
             panda.update(tweet, config)
         if config.Store_object:
             logme.debug(__name__ + ':checkData:Store_object')
             if hasattr(config.Store_object_tweets_list, 'append'):
                 config.Store_object_tweets_list.append(tweet)
             else:
                 tweets_list.append(tweet)
         if config.Elasticsearch:
             logme.debug(__name__ + ':checkData:Elasticsearch')
             elasticsearch.Tweet(tweet, config)
         _output(tweet, output, config)
     # else:
     #     logme.critical(__name__+':checkData:copyrightedTweet')

-async def Tweets(tweets, config, conn, url=''):
+async def Tweets(tweets, config, conn):
     logme.debug(__name__ + ':Tweets')
-    if config.Favorites or config.Profile_full or config.Location:
+    if config.Favorites or config.Location:
         logme.debug(__name__ + ':Tweets:fav+full+loc')
         for tw in tweets:
-            if tw['data-item-id'] == url.split('?')[0].split('/')[-1]:
-                await checkData(tw, config, conn)
-    elif config.TwitterSearch:
+            await checkData(tw, config, conn)
+    elif config.TwitterSearch or config.Profile:
         logme.debug(__name__ + ':Tweets:TwitterSearch')
         await checkData(tweets, config, conn)
     else:
...
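The lowercasing change above follows the new data shape: `obj.mentions` is now a list of dicts (`screen_name`, `name`, `id`) instead of a list of plain strings, so every value inside each dict is lowered. A minimal sketch with made-up sample data:

```python
# obj.mentions after the v2 port: a list of dicts, not a list of strings.
# Sample data below is invented for illustration.
mentions = [{'screen_name': 'Jack', 'name': 'Jack Dorsey', 'id': '12'}]

for dct in mentions:
    for key, val in dct.items():
        dct[key] = val.lower()

print(mentions)  # [{'screen_name': 'jack', 'name': 'jack dorsey', 'id': '12'}]
```

Note that this lowers every field, including `id`; for numeric-string ids that is a no-op.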
@@ -91,7 +91,7 @@ def update(object, config):
             "photos": Tweet.photos,
             "video": Tweet.video,
             "thumbnail": Tweet.thumbnail,
-            #"retweet": Tweet.retweet,
+            "retweet": Tweet.retweet,
             "nlikes": int(Tweet.likes_count),
             "nreplies": int(Tweet.replies_count),
             "nretweets": int(Tweet.retweets_count),
@@ -100,11 +100,11 @@ def update(object, config):
             "near": Tweet.near,
             "geo": Tweet.geo,
             "source": Tweet.source,
-            #"user_rt_id": Tweet.user_rt_id,
-            #"user_rt": Tweet.user_rt,
-            #"retweet_id": Tweet.retweet_id,
+            "user_rt_id": Tweet.user_rt_id,
+            "user_rt": Tweet.user_rt,
+            "retweet_id": Tweet.retweet_id,
             "reply_to": Tweet.reply_to,
-            #"retweet_date": Tweet.retweet_date,
+            "retweet_date": Tweet.retweet_date,
             "translate": Tweet.translate,
             "trans_src": Tweet.trans_src,
             "trans_dest": Tweet.trans_dest
...
@@ -53,7 +53,7 @@ def Csv(obj, config):
         fieldnames, row = struct(obj, config.Custom[_obj_type], _obj_type)
         base = addExt(config.Output, _obj_type, "csv")
-        dialect = 'excel-tab' if config.Tabs else 'excel'
+        dialect = 'excel-tab' if 'Tabs' in config.__dict__ else 'excel'
         if not (os.path.exists(base)):
             with open(base, "w", newline='', encoding="utf-8") as csv_file:
...
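A sketch of the dialect selection above, using a plain dict as a hypothetical stand-in for twint's config object. Note the behavioral difference the patch introduces: a membership test (`'Tabs' in …`) fires whenever the attribute exists at all, even if it is set to `False`, whereas the old truthiness check (`if config.Tabs`) did not.

```python
import csv
import io

config = {'Tabs': True}  # hypothetical stand-in for the twint config object
dialect = 'excel-tab' if 'Tabs' in config else 'excel'

# write one row with the chosen dialect; 'excel-tab' is tab-separated
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=['id', 'tweet'], dialect=dialect)
writer.writeheader()
writer.writerow({'id': '1', 'tweet': 'hello'})
print(buf.getvalue())
```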
@@ -21,21 +21,21 @@ def tweetData(t):
         "hashtags": t.hashtags,
         "cashtags": t.cashtags,
         "link": t.link,
-        # "retweet": t.retweet,
+        "retweet": t.retweet,
         "quote_url": t.quote_url,
         "video": t.video,
         "thumbnail": t.thumbnail,
         "near": t.near,
         "geo": t.geo,
         "source": t.source,
-        # "user_rt_id": t.user_rt_id,
-        # "user_rt": t.user_rt,
-        # "retweet_id": t.retweet_id,
+        "user_rt_id": t.user_rt_id,
+        "user_rt": t.user_rt,
+        "retweet_id": t.retweet_id,
         "reply_to": t.reply_to,
-        # "retweet_date": t.retweet_date,
+        "retweet_date": t.retweet_date,
         "translate": t.translate,
         "trans_src": t.trans_src,
-        "trans_dest": t.trans_dest
+        "trans_dest": t.trans_dest,
     }
     return data
...
 from time import strftime, localtime
 from datetime import datetime, timezone
-import json
 import logging as logme

 from googletransx import Translator
@@ -22,33 +21,44 @@ def utc_to_local(utc_dt):
     return utc_dt.replace(tzinfo=timezone.utc).astimezone(tz=None)

-def getMentions(tw):
+Tweet_formats = {
+    'datetime': '%Y-%m-%d %H:%M:%S %Z',
+    'datestamp': '%Y-%m-%d',
+    'timestamp': '%H:%M:%S'
+}
+
+def _get_mentions(tw):
     """Extract mentions from tweet
     """
-    logme.debug(__name__ + ':getMentions')
-    mentions = []
+    logme.debug(__name__ + ':get_mentions')
     try:
-        for mention in tw['entities']['user_mentions']:
-            mentions.append(mention['screen_name'])
+        mentions = [
+            {
+                'screen_name': _mention['screen_name'],
+                'name': _mention['name'],
+                'id': _mention['id_str'],
+            } for _mention in tw['entities']['user_mentions']
+            if tw['display_text_range'][0] < _mention['indices'][0]
+        ]
     except KeyError:
         mentions = []
     return mentions

-def getQuoteURL(tw):
-    """Extract quote from tweet
-    """
-    logme.debug(__name__ + ':getQuoteURL')
-    base_twitter = "https://twitter.com"
-    quote_url = ""
+def _get_reply_to(tw):
     try:
-        quote = tw.find("div", "QuoteTweet-innerContainer")
-        quote_url = base_twitter + quote.get("href")
-    except:
-        quote_url = ""
-
-    return quote_url
+        reply_to = [
+            {
+                'screen_name': _mention['screen_name'],
+                'name': _mention['name'],
+                'id': _mention['id_str'],
+            } for _mention in tw['entities']['user_mentions']
+            if tw['display_text_range'][0] > _mention['indices'][1]
+        ]
+    except KeyError:
+        reply_to = []
+    return reply_to
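The split between `_get_mentions` and `_get_reply_to` hinges on `display_text_range`: mentions whose indices end before the visible text begins are the leading reply chain, while mentions that start inside the visible range are real in-text mentions. A self-contained sketch of the same filter, on a hand-made tweet dict (not real API output):

```python
# display_text_range[0] is the offset where the visible tweet text starts;
# leading @-mentions sit entirely before it and are treated as reply targets.
tw = {
    'display_text_range': [10, 24],  # visible text starts at offset 10
    'entities': {'user_mentions': [
        {'screen_name': 'alice', 'name': 'Alice', 'id_str': '1', 'indices': [0, 6]},
        {'screen_name': 'bob',   'name': 'Bob',   'id_str': '2', 'indices': [15, 19]},
    ]},
}

start = tw['display_text_range'][0]
reply_to = [m['screen_name'] for m in tw['entities']['user_mentions']
            if start > m['indices'][1]]
mentions = [m['screen_name'] for m in tw['entities']['user_mentions']
            if start < m['indices'][0]]

print(reply_to, mentions)  # ['alice'] ['bob']
```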
 def getText(tw):
@@ -63,107 +73,6 @@ def getText(tw):
     return text

-def getStat(tw, _type):
-    """Get stats about Tweet
-    """
-    logme.debug(__name__ + ':getStat')
-    st = f"ProfileTweet-action--{_type} u-hiddenVisually"
-    return tw.find("span", st).find("span")["data-tweet-stat-count"]
-
-def getRetweet(tw, _config):
-    """Get Retweet
-    """
-    logme.debug(__name__ + ':getRetweet')
-    if _config.Profile:
-        if int(tw["data-user-id"]) != _config.User_id:
-            return _config.User_id, _config.Username
-    else:
-        _rt_object = tw.find('span', 'js-retweet-text')
-        if _rt_object:
-            _rt_id = _rt_object.find('a')['data-user-id']
-            _rt_username = _rt_object.find('a')['href'][1:]
-            return _rt_id, _rt_username
-    return '', ''
-
-# def getThumbnail(tw):
-#     """Get Thumbnail
-#     """
-#     divs = tw.find_all("div", "PlayableMedia-player")
-#     thumb = ""
-#     for div in divs:
-#         thumb = div.attrs["style"].split("url('")[-1]
-#         thumb = thumb.replace("')", "")
-#     return thumb
-
-# def Tweet(tw, config):
-#     """Create Tweet object
-#     """
-#     logme.debug(__name__+':Tweet')
-#     t = tweet()
-#     t.id = int(tw["data-item-id"])
-#     t.id_str = tw["data-item-id"]
-#     t.conversation_id = tw["data-conversation-id"]
-#     t.datetime = int(tw.find("span", "_timestamp")["data-time-ms"])
-#     t.datestamp = strftime("%Y-%m-%d", localtime(t.datetime/1000.0))
-#     t.timestamp = strftime("%H:%M:%S", localtime(t.datetime/1000.0))
-#     t.user_id = int(tw["data-user-id"])
-#     t.user_id_str = tw["data-user-id"]
-#     t.username = tw["data-screen-name"]
-#     t.name = tw["data-name"]
-#     t.place = tw.find("a","js-geo-pivot-link").text.strip() if tw.find("a","js-geo-pivot-link") else ""
-#     t.timezone = strftime("%z", localtime())
-#     for img in tw.findAll("img", "Emoji Emoji--forText"):
-#         img.replaceWith(img["alt"])
-#     t.mentions = getMentions(tw)
-#     t.urls = [link.attrs["data-expanded-url"] for link in tw.find_all('a',{'class':'twitter-timeline-link'}) if link.has_attr("data-expanded-url")]
-#     t.photos = [photo_node.attrs['data-image-url'] for photo_node in tw.find_all("div", "AdaptiveMedia-photoContainer")]
-#     t.video = 1 if tw.find_all("div", "AdaptiveMedia-video") != [] else 0
-#     t.thumbnail = getThumbnail(tw)
-#     t.tweet = getText(tw)
-#     t.lang = tw.find('p', 'tweet-text')['lang']
-#     t.hashtags = [hashtag.text for hashtag in tw.find_all("a","twitter-hashtag")]
-#     t.cashtags = [cashtag.text for cashtag in tw.find_all("a", "twitter-cashtag")]
-#     t.replies_count = getStat(tw, "reply")
-#     t.retweets_count = getStat(tw, "retweet")
-#     t.likes_count = getStat(tw, "favorite")
-#     t.link = f"https://twitter.com/{t.username}/status/{t.id}"
-#     t.user_rt_id, t.user_rt = getRetweet(tw, config)
-#     t.retweet = True if t.user_rt else False
-#     t.retweet_id = ''
-#     t.retweet_date = ''
-#     if not config.Profile:
-#         t.retweet_id = tw['data-retweet-id'] if t.user_rt else ''
-#         t.retweet_date = datetime.fromtimestamp(((int(t.retweet_id) >> 22) + 1288834974657)/1000.0).strftime("%Y-%m-%d %H:%M:%S") if t.user_rt else ''
-#     t.quote_url = getQuoteURL(tw)
-#     t.near = config.Near if config.Near else ""
-#     t.geo = config.Geo if config.Geo else ""
-#     t.source = config.Source if config.Source else ""
-#     t.reply_to = [{'user_id': t['id_str'], 'username': t['screen_name']} for t in json.loads(tw["data-reply-to-users-json"])]
-#     t.translate = ''
-#     t.trans_src = ''
-#     t.trans_dest = ''
-#     if config.Translate == True:
-#         try:
-#             ts = translator.translate(text=t.tweet, dest=config.TranslateDest)
-#             t.translate = ts.text
-#             t.trans_src = ts.src
-#             t.trans_dest = ts.dest
-#         # ref. https://github.com/SuniTheFish/ChainTranslator/blob/master/ChainTranslator/__main__.py#L31
-#         except ValueError as e:
-#             raise Exception("Invalid destination language: {} / Tweet: {}".format(config.TranslateDest, t.tweet))
-#             logme.debug(__name__+':Tweet:translator.translate:'+str(e))
-#     return t
-
-Tweet_formats = {
-    'datetime': '%Y-%m-%d %H:%M:%S %Z',
-    'datestamp': '%Y-%m-%d',
-    'timestamp': '%H:%M:%S'
-}
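The removed legacy parser derived the retweet date directly from the tweet's snowflake ID: the top bits of the ID encode milliseconds since Twitter's custom epoch (`1288834974657`, as in the removed line above). A standalone sketch of that extraction, round-tripped through a synthetic ID so it needs no real data:

```python
from datetime import datetime, timezone

TWITTER_EPOCH_MS = 1288834974657  # constant used by the removed code

def snowflake_to_datetime(tweet_id: int) -> datetime:
    # the high bits of a snowflake ID are milliseconds since Twitter's epoch
    ms = (tweet_id >> 22) + TWITTER_EPOCH_MS
    return datetime.fromtimestamp(ms / 1000.0, tz=timezone.utc)

# round-trip check with a synthetic ID built from a known timestamp
ts_ms = 1577840400000  # 2020-01-01 01:00:00 UTC, in milliseconds
fake_id = (ts_ms - TWITTER_EPOCH_MS) << 22
print(snowflake_to_datetime(fake_id))  # 2020-01-01 01:00:00+00:00
```

The legacy code used `datetime.fromtimestamp` without a timezone, which is one source of the inconsistent-timezone bugs (#965) this commit touches; the sketch pins UTC explicitly.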
 def Tweet(tw, config):
     """Create Tweet object
     """
@@ -185,14 +94,10 @@ def Tweet(tw, config):
     t.user_id_str = tw["user_id_str"]
     t.username = tw["user_data"]['screen_name']
     t.name = tw["user_data"]['name']
-    t.place = tw['geo'] if tw['geo'] else ""
+    t.place = tw['geo'] if 'geo' in tw and tw['geo'] else ""
     t.timezone = strftime("%z", localtime())
-    # for img in tw.findAll("img", "Emoji Emoji--forText"):
-    #     img.replaceWith(img["alt"])
-    try:
-        t.mentions = [_mention['screen_name'] for _mention in tw['entities']['user_mentions']]
-    except KeyError:
-        t.mentions = []
+    t.mentions = _get_mentions(tw)
+    t.reply_to = _get_reply_to(tw)
     try:
         t.urls = [_url['expanded_url'] for _url in tw['entities']['urls']]
     except KeyError:
@@ -216,21 +121,27 @@ def Tweet(tw, config):
         t.hashtags = [hashtag['text'] for hashtag in tw['entities']['hashtags']]
     except KeyError:
         t.hashtags = []
-    # don't know what this is
-    t.cashtags = [cashtag['text'] for cashtag in tw['entities']['symbols']]
+    try:
+        t.cashtags = [cashtag['text'] for cashtag in tw['entities']['symbols']]
+    except KeyError:
+        t.cashtags = []
     t.replies_count = tw['reply_count']
     t.retweets_count = tw['retweet_count']
     t.likes_count = tw['favorite_count']
     t.link = f"https://twitter.com/{t.username}/status/{t.id}"
-    # TODO: someone who is familiar with this code, needs to take a look at what this is
-    # t.user_rt_id, t.user_rt = getRetweet(tw, config)
-    # t.retweet = True if t.user_rt else False
-    # t.retweet_id = ''
-    # t.retweet_date = ''
-    # if not config.Profile:
-    #     t.retweet_id = tw['data-retweet-id'] if t.user_rt else ''
-    #     t.retweet_date = datetime.fromtimestamp(((int(t.retweet_id) >> 22) + 1288834974657) / 1000.0).strftime(
-    #         "%Y-%m-%d %H:%M:%S") if t.user_rt else ''
+    try:
+        if 'user_rt_id' in tw['retweet_data']:
+            t.retweet = True
+            t.retweet_id = tw['retweet_data']['retweet_id']
+            t.retweet_date = tw['retweet_data']['retweet_date']
+            t.user_rt = tw['retweet_data']['user_rt']
+            t.user_rt_id = tw['retweet_data']['user_rt_id']
+    except KeyError:
+        t.retweet = False
+        t.retweet_id = ''
+        t.retweet_date = ''
+        t.user_rt = ''
+        t.user_rt_id = ''
     try:
         t.quote_url = tw['quoted_status_permalink']['expanded'] if tw['is_quote_status'] else ''
     except KeyError:
@@ -239,13 +150,10 @@ def Tweet(tw, config):
     t.near = config.Near if config.Near else ""
     t.geo = config.Geo if config.Geo else ""
     t.source = config.Source if config.Source else ""
-    # TODO: check this whether we need the list of all the users to whom this tweet is a reply or we only need
-    #  the immediately above user id
-    t.reply_to = {'user_id': tw['in_reply_to_user_id_str'], 'username': tw['in_reply_to_screen_name']}
     t.translate = ''
     t.trans_src = ''
     t.trans_dest = ''
-    if config.Translate == True:
+    if config.Translate:
         try:
             ts = translator.translate(text=t.tweet, dest=config.TranslateDest)
             t.translate = ts.text
@@ -253,6 +161,6 @@ def Tweet(tw, config):
             t.trans_dest = ts.dest
             # ref. https://github.com/SuniTheFish/ChainTranslator/blob/master/ChainTranslator/__main__.py#L31
         except ValueError as e:
-            raise Exception("Invalid destination language: {} / Tweet: {}".format(config.TranslateDest, t.tweet))
             logme.debug(__name__ + ':Tweet:translator.translate:' + str(e))
+            raise Exception("Invalid destination language: {} / Tweet: {}".format(config.TranslateDest, t.tweet))
     return t
@@ -5,7 +5,6 @@ from urllib.parse import urlencode
 from urllib.parse import quote

 mobile = "https://mobile.twitter.com"
-# base = "https://twitter.com/i"
 base = "https://api.twitter.com/2/search/adaptive.json"
@@ -65,18 +64,6 @@ async def MobileProfile(username, init):
     return url

-async def Profile(username, init):
-    logme.debug(__name__ + ':Profile')
-    url = f"{base}/profiles/show/{username}/timeline/tweets?include_"
-    url += "available_features=1&lang=en&include_entities=1"
-    url += "&include_new_items_bar=true"
-    if init != '-1':
-        url += f"&max_position={init}"
-    return url
-
 async def Search(config, init):
     logme.debug(__name__ + ':Search')
     url = base
@@ -123,7 +110,7 @@ async def Search(config, init):
         q += f" geocode:{config.Geo}"
     if config.Search:
-        q += f"{config.Search}"
+        q += f" {config.Search}"
     if config.Year:
         q += f" until:{config.Year}-1-1"
     if config.Since:
@@ -173,17 +160,18 @@ async def Search(config, init):
     if config.Custom_query:
         q = config.Custom_query
+    q = q.strip()
     params.append(("q", q))
     _serialQuery = _sanitizeQuery(url, params)
     return url, params, _serialQuery

-# maybe dont need this
-async def SearchProfile(config, init=None):
+def SearchProfile(config, init=None):
     logme.debug(__name__ + ':SearchProfile')
-    _url = 'https://api.twitter.com/2/timeline/profile/{}.json?'
-    q = ""
+    _url = 'https://api.twitter.com/2/timeline/profile/{user_id}.json'.format(user_id=config.User_id)
+    tweet_count = 100
     params = [
-        # some of the fields are not required, need to test which ones aren't required
         ('include_profile_interstitial_type', '1'),
         ('include_blocking', '1'),
         ('include_blocked_by', '1'),
@@ -205,14 +193,12 @@ async def SearchProfile(config, init=None):
         ('include_ext_media_availability', 'true'),
         ('send_error_codes', 'true'),
         ('simple_quoted_tweet', 'true'),
-        ('include_tweet_replies', 'false'),
-        ('count', '50'),
-        ('userId', '1934388686'),
-        ('ext', 'mediaStats,ChighlightedLabel'),
+        ('include_tweet_replies', 'true'),
+        ('count', tweet_count),
+        ('ext', 'mediaStats%2ChighlightedLabel'),
     ]
-    if init:
-        params.append(('cursor', init))
+    if type(init) == str:
+        params.append(('cursor', str(init)))
     _serialQuery = _sanitizeQuery(_url, params)
     return _url, params, _serialQuery
-    pass
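The `(url, params, _serialQuery)` triple returned by `SearchProfile` is ultimately serialized into a query string; `_sanitizeQuery` is internal to twint, but the effect can be sketched with `urlencode` from the same `urllib.parse` import the module already uses. The user id and cursor below are made up:

```python
# Sketch of how the v2 profile-timeline URL and its params combine.
# '12345' and 'ABC' are placeholder values, not real API data.
from urllib.parse import urlencode

_url = 'https://api.twitter.com/2/timeline/profile/{user_id}.json'.format(user_id='12345')
params = [('include_tweet_replies', 'true'), ('count', 100), ('cursor', 'ABC')]

full = _url + '?' + urlencode(params)  # urlencode str()s non-string values
print(full)
# https://api.twitter.com/2/timeline/profile/12345.json?include_tweet_replies=true&count=100&cursor=ABC
```

Passing params as a list of tuples (rather than a dict) preserves their order in the serialized query, matching how the code above builds them.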
@@ -2,12 +2,13 @@ import datetime
 import logging as logme

-class User:
+class user:
     type = "user"

     def __init__(self):
         pass

 User_formats = {
     'join_date': '%Y-%m-%d',
     'join_time': '%H:%M:%S %Z'
@@ -21,31 +22,31 @@ def User(ur):
         msg = 'malformed json! cannot be parsed to get user data'
         logme.fatal(msg)
         raise KeyError(msg)
-    _usr = User()
+    _usr = user()
     _usr.id = ur['data']['user']['rest_id']
-    _usr.name = ur['data']['user']['rest_id']['legacy']['name']
-    _usr.username = ur['data']['user']['rest_id']['legacy']['screen_name']
-    _usr.bio = ur['data']['user']['rest_id']['legacy']['description']
-    _usr.location = ur['data']['user']['rest_id']['legacy']['location']
-    _usr.url = ur['data']['user']['rest_id']['legacy']['screen_name']['url']
+    _usr.name = ur['data']['user']['legacy']['name']
+    _usr.username = ur['data']['user']['legacy']['screen_name']
+    _usr.bio = ur['data']['user']['legacy']['description']
+    _usr.location = ur['data']['user']['legacy']['location']
+    _usr.url = ur['data']['user']['legacy']['url']
     # parsing date to user-friendly format
-    _dt = ur['data']['user']['rest_id']['legacy']['created_at']
+    _dt = ur['data']['user']['legacy']['created_at']
     _dt = datetime.datetime.strptime(_dt, '%a %b %d %H:%M:%S %z %Y')
     # date is of the format year,
     _usr.join_date = _dt.strftime(User_formats['join_date'])
     _usr.join_time = _dt.strftime(User_formats['join_time'])
     # :type `int`
-    _usr.tweets = int(ur['data']['user']['rest_id']['legacy']['statuses_count'])
-    _usr.following = int(ur['data']['user']['rest_id']['legacy']['friends_count'])
-    _usr.followers = int(ur['data']['user']['rest_id']['legacy']['followers_count'])
-    _usr.likes = int(ur['data']['user']['rest_id']['legacy']['favourites_count'])
-    _usr.media_count = int(ur['data']['user']['rest_id']['legacy']['media_count'])
-    _usr.is_private = ur['data']['user']['rest_id']['legacy']['protected']
-    _usr.is_verified = ur['data']['user']['rest_id']['legacy']['verified']
-    _usr.avatar = ur['data']['user']['rest_id']['legacy']['profile_image_url_https']
-    _usr.background_image = ur['data']['user']['rest_id']['legacy']['profile_banner_url']
+    _usr.tweets = int(ur['data']['user']['legacy']['statuses_count'])
+    _usr.following = int(ur['data']['user']['legacy']['friends_count'])
+    _usr.followers = int(ur['data']['user']['legacy']['followers_count'])
+    _usr.likes = int(ur['data']['user']['legacy']['favourites_count'])
+    _usr.media_count = int(ur['data']['user']['legacy']['media_count'])
+    _usr.is_private = ur['data']['user']['legacy']['protected']
+    _usr.is_verified = ur['data']['user']['legacy']['verified']
+    _usr.avatar = ur['data']['user']['legacy']['profile_image_url_https']
+    _usr.background_image = ur['data']['user']['legacy']['profile_banner_url']
     # TODO : future implementation
     # legacy_extended_profile is also available in some cases which can be used to get DOB of user
     return _usr