Getting all tweets / replies #19

Open
thewh1teagle opened this issue Sep 29, 2024 · 14 comments

@thewh1teagle

Hey!

Thanks for creating such a great library!

I'm trying to retrieve all of my tweets and replies (I have thousands), but I couldn't find any mention of pagination to fetch beyond the maximum limit. Does the library support this feature?

Also, I don't see an option to get my own username or user ID after authentication. Could you clarify how to achieve that?

@imperatrona
Owner

imperatrona commented Sep 30, 2024

hi @thewh1teagle! saw you contributed UserTweetsAndReplies, thank you so much! i'll add tests and merge it tomorrow.

i suppose you already found out how to paginate.

this library hasn't yet implemented getting the current user from the cookie. you can get screen_name / user_id by making a GET request to https://api.twitter.com/1.1/account/multi/list.json with an empty body.

though if your cookie has multiple accounts logged in (i.e. there is an auth_multi cookie) this will return data for all accounts without flagging which one is currently active. to get the currently active screen_name you can make a GET request to https://api.twitter.com/1.1/account/settings.json
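
if you want to call it directly, here's a minimal net/http sketch of that settings.json request. the bearer constant is a placeholder for the public token the web client sends, and the exact header set is an assumption rather than something this library exposes:

package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// placeholder: the public Bearer token the twitter web client sends with every request
const bearer = "AAAA...WEB_CLIENT_BEARER..."

// activeScreenName asks account/settings.json which account the session is currently using.
func activeScreenName(authToken, ct0 string) (string, error) {
	req, err := http.NewRequest("GET", "https://api.twitter.com/1.1/account/settings.json", nil)
	if err != nil {
		return "", err
	}
	req.Header.Set("Authorization", "Bearer "+bearer)
	req.Header.Set("X-Csrf-Token", ct0)
	req.Header.Set("Cookie", fmt.Sprintf("auth_token=%s; ct0=%s", authToken, ct0))

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status: %s", resp.Status)
	}

	var out struct {
		ScreenName string `json:"screen_name"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", err
	}
	return out.ScreenName, nil
}

func main() {
	name, err := activeScreenName("AUTH_TOKEN_COOKIE_VALUE", "CT0_COOKIE_VALUE")
	if err != nil {
		panic(err)
	}
	fmt.Println("active account:", name)
}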

@thewh1teagle
Author

Thanks, now we have account endpoints for getting screen_name :)

I tried to use FetchTweetsAndRepliesByUserID by iterating it and sleeping 10 seconds between each iteration.

But got this error:

panic: response status 429 Too Many Requests: Rate limit exceeded
import (
	"log"
	"time"

	twitterscraper "github.com/imperatrona/twitter-scraper"
	// "auth" and "storage" are local helper packages, not part of the library
)

func run() {
	creds, err := auth.GetCredentials()
	if err != nil {
		panic(err)
	}
	scraper := twitterscraper.New()
	authToken := twitterscraper.AuthToken{Token: creds.AuthToken, CSRFToken: creds.Ct0}
	scraper.SetAuthToken(authToken)

	if !scraper.IsLoggedIn() {
		panic("Invalid AuthToken")
	}
	settings, err := scraper.GetAccountSettings()
	if err != nil {
		panic(err)
	}
	log.Println("Logged in as: ", settings.ScreenName)

	userId, err := scraper.GetUserIDByScreenName(settings.ScreenName)
	if err != nil {
		panic(err)
	}

	// Load cursors for posts and replies
	cursorPosts, err := storage.LoadCursor(".cursor_posts")
	log.Println("Current Cursor:", cursorPosts)
	if err != nil {
		log.Println("No cursor file found for posts, starting from the beginning.")
		cursorPosts = ""
	}

	// Counter for the number of fetched tweets
	fetchedCount := 0

	// First loop to fetch and save posts
	for {
		// Fetch tweets using the cursor
		tweets, newCursorPosts, err := scraper.FetchTweetsAndRepliesByUserID(userId, 20, cursorPosts)
		if err != nil {
			panic(err)
		}

		// If no new tweets are fetched, exit the loop
		if len(tweets) == 0 {
			log.Println("No new posts found. Exiting...")
			break
		}

		// Increment the fetched count by the number of newly fetched tweets
		fetchedCount += len(tweets)
		log.Printf("Fetched %d new tweets. Total fetched: %d\n", len(tweets), fetchedCount)

		// Save the new cursor state for posts
		if err := storage.SaveCursor(".cursor_posts", newCursorPosts); err != nil {
			panic(err)
		}

		// Save each tweet in JSONL format
		if err := storage.SaveTweetJSONL("posts.jsonl", tweets); err != nil {
			panic(err)
		}

		// Update cursor for the next iteration
		cursorPosts = newCursorPosts

		// Optional: Delay to avoid hitting rate limits
		time.Sleep(sleepBetweenRequests)
	}

	log.Printf("Total tweets fetched: %d\n", fetchedCount)
}

The default sleepBetweenRequests is 10*time.Second.

Did I make the requests too quickly?
How many tweets does it fetch by default? I noticed that in the Twitter UI it fetches 20 on each scroll.

@imperatrona
Owner

@thewh1teagle i was doing the same task recently and a 15 second delay was enough. each request usually returns 20 tweets, but sometimes it can be 15-90. this lib has a method scraper.WithDelay(15) which you can use instead of your sleepBetweenRequests
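
for example, a rough sketch of the relevant changes to your run() above: build the scraper with the built-in delay and drop the manual time.Sleep (the retry-on-429 handling here is just a suggestion, not something the lib does for you):

// create the scraper with a 15 second pause between requests
scraper := twitterscraper.New().WithDelay(15)
scraper.SetAuthToken(twitterscraper.AuthToken{Token: creds.AuthToken, CSRFToken: creds.Ct0})

for {
	tweets, newCursorPosts, err := scraper.FetchTweetsAndRepliesByUserID(userId, 20, cursorPosts)
	if err != nil {
		// a 429 can still slip through; back off and retry instead of panicking
		log.Println("fetch failed, retrying in 1 minute:", err)
		time.Sleep(time.Minute)
		continue
	}
	if len(tweets) == 0 {
		break
	}

	// ... save tweets and the cursor exactly as before ...
	cursorPosts = newCursorPosts
	// no manual time.Sleep here, WithDelay paces the requests
}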

@thewh1teagle
Author

thewh1teagle commented Oct 1, 2024

@imperatrona

Thanks! Good to hear that you did it recently, though note that I'm using the new endpoint from #20. It's almost the same as getTweets except that it returns basically everything the user posted - tweets / replies / reposts / quotes etc.
I changed it to use WithDelay instead of sleeping and increased the timeout. I'll check later; hope it will work without this error.

@wade-liwei

	creds, err := auth.GetCredentials()
	if err != nil {
		panic(err)
	}

Could you please give me an example of how to implement the func auth.GetCredentials() ?

@thewh1teagle
Author

Could you please give me an example of how to implement the func auth.GetCredentials() ?

I used Playwright in Go.
Instead of messing with complicated auth, I log in manually and then extract the cookies with Playwright and store / load them from a file.
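
For reference, a minimal sketch of what a file-backed auth.GetCredentials could look like, assuming the cookie values were saved to a cookies.json (the Credentials struct and file name here are just illustrative, not part of the library):

package auth

import (
	"encoding/json"
	"os"
)

// Credentials holds the two cookie values the scraper needs.
type Credentials struct {
	AuthToken string `json:"auth_token"`
	Ct0       string `json:"ct0"`
}

// GetCredentials loads cookie values previously saved to disk
// (e.g. by a Playwright login step, see below).
func GetCredentials() (*Credentials, error) {
	data, err := os.ReadFile("cookies.json")
	if err != nil {
		return nil, err
	}
	var creds Credentials
	if err := json.Unmarshal(data, &creds); err != nil {
		return nil, err
	}
	return &creds, nil
}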

@wade-liwei

thank you.
I will try the repository https://github.com/playwright-community/playwright-go.

@wade-liwei

Instead of messing with complicated auth, I log in manually and then extract the cookies with Playwright and store / load them from a file.

@thewh1teagle

Could you please provide an example of the steps, or a document?
I have read the Playwright documentation; it is so difficult to implement a Twitter login.

@thewh1teagle
Author

I have read the Playwright documentation; it is so difficult to implement a Twitter login.

I didn't implement automatic login.
I open the browser from the code, then wait for the user to log in and press Enter in the terminal.
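
Roughly like this with playwright-go. A minimal sketch, assuming the Playwright browsers are already installed (via the playwright CLI or playwright.Install()); the target login URL and the cookies.json format are my own choices, matching the GetCredentials sketch above:

package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"

	"github.com/playwright-community/playwright-go"
)

func main() {
	pw, err := playwright.Run()
	if err != nil {
		panic(err)
	}
	// headful browser so you can log in by hand
	browser, err := pw.Chromium.Launch(playwright.BrowserTypeLaunchOptions{
		Headless: playwright.Bool(false),
	})
	if err != nil {
		panic(err)
	}
	ctx, err := browser.NewContext()
	if err != nil {
		panic(err)
	}
	page, err := ctx.NewPage()
	if err != nil {
		panic(err)
	}
	if _, err := page.Goto("https://x.com/i/flow/login"); err != nil {
		panic(err)
	}

	fmt.Println("log in in the browser window, then press Enter here...")
	_, _ = bufio.NewReader(os.Stdin).ReadString('\n')

	// pull the session cookies out of the browser context
	cookies, err := ctx.Cookies()
	if err != nil {
		panic(err)
	}
	out := map[string]string{}
	for _, c := range cookies {
		if c.Name == "auth_token" || c.Name == "ct0" {
			out[c.Name] = c.Value
		}
	}
	data, _ := json.MarshalIndent(out, "", "  ")
	if err := os.WriteFile("cookies.json", data, 0600); err != nil {
		panic(err)
	}
	fmt.Println("saved auth_token and ct0 to cookies.json")

	_ = browser.Close()
	_ = pw.Stop()
}

The CSRFToken the scraper expects is just the value of the ct0 cookie.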

@wade-liwei

I found an example: https://github.com/go-numb/x-post-to-blue/blob/master/mod.go#L193

but I am not sure this is your way.

I also tried the code at https://github.com/playwright-community/playwright-go?tab=readme-ov-file#example, replacing https://news.ycombinator.com with "https://x.com/i/flow/login", but it does not open the browser from the code.

Could you please share an example of how to open the browser from the code,
and how to get the auth_token and CSRFToken?

@thewh1teagle
Author

@wade-liwei

wade-liwei commented Nov 7, 2024

hi @thewh1teagle

I used the following code to get the cookies file and finish the task: "I log in manually and then extract the cookie with playwright and store / load them from file."

Could you please help me figure out how to get the auth_token and CSRFToken?

@cmj

cmj commented Nov 7, 2024

Something like this should work too: https://gist.github.com/cmj/17fa133a948eedd0167bdcbff1dfff19

@wade-liwei

thank you.
