Getting all tweets / replies #19

Open
thewh1teagle opened this issue Sep 29, 2024 · 14 comments

@thewh1teagle

Hey!

Thanks for creating such a great library!

I'm trying to retrieve all of my tweets and replies (I have thousands), but I couldn't find any mention of pagination to fetch beyond the maximum limit. Does the library support this feature?

Also, I don't see an option to get my own username or user ID after authentication. Could you clarify how to achieve that?

@imperatrona
Owner

imperatrona commented Sep 30, 2024

hi @thewh1teagle! saw you contributed UserTweetsAndReplies, thank you so much! i'll add tests and merge it tomorrow.

i suppose you already found out how to paginate.

this library hasn't yet implemented getting the current user from the cookie. you can get screen_name / user_id by making a GET request to https://api.twitter.com/1.1/account/multi/list.json with an empty body.

though if your cookie has multiple accounts logged in (i.e. there is an auth_multi cookie) this will return data for all accounts without flagging which one is currently active. to get the currently active screen_name you can make a GET request to https://api.twitter.com/1.1/account/settings.json
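
if you want to call it directly, here's a minimal net/http sketch of that settings.json request. the bearer constant is a placeholder for the public token the web client sends, and the exact header set is an assumption rather than something this library exposes:

package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// placeholder: the public Bearer token the twitter web client sends with every request
const bearer = "AAAA...WEB_CLIENT_BEARER..."

// activeScreenName asks account/settings.json which account the session is currently using.
func activeScreenName(authToken, ct0 string) (string, error) {
	req, err := http.NewRequest("GET", "https://api.twitter.com/1.1/account/settings.json", nil)
	if err != nil {
		return "", err
	}
	req.Header.Set("Authorization", "Bearer "+bearer)
	req.Header.Set("X-Csrf-Token", ct0)
	req.Header.Set("Cookie", fmt.Sprintf("auth_token=%s; ct0=%s", authToken, ct0))

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status: %s", resp.Status)
	}

	var out struct {
		ScreenName string `json:"screen_name"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", err
	}
	return out.ScreenName, nil
}

func main() {
	name, err := activeScreenName("AUTH_TOKEN_COOKIE_VALUE", "CT0_COOKIE_VALUE")
	if err != nil {
		panic(err)
	}
	fmt.Println("active account:", name)
}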

@thewh1teagle
Author

Thanks, now we have account endpoints for getting screen_name :)

I tried to use FetchTweetsAndRepliesByUserID by iterating it and sleeping 10 seconds between each iteration.

But got this error:

panic: response status 429 Too Many Requests: Rate limit exceeded
import (
	"log"
	"time"

	twitterscraper "github.com/imperatrona/twitter-scraper"
	// "auth" and "storage" are local helper packages, not part of the library
)

func run() {
	creds, err := auth.GetCredentials()
	if err != nil {
		panic(err)
	}
	scraper := twitterscraper.New()
	authToken := twitterscraper.AuthToken{Token: creds.AuthToken, CSRFToken: creds.Ct0}
	scraper.SetAuthToken(authToken)

	if !scraper.IsLoggedIn() {
		panic("Invalid AuthToken")
	}
	settings, err := scraper.GetAccountSettings()
	if err != nil {
		panic(err)
	}
	log.Println("Logged in as: ", settings.ScreenName)

	userId, err := scraper.GetUserIDByScreenName(settings.ScreenName)
	if err != nil {
		panic(err)
	}

	// Load cursors for posts and replies
	cursorPosts, err := storage.LoadCursor(".cursor_posts")
	log.Println("Current Cursor:", cursorPosts)
	if err != nil {
		log.Println("No cursor file found for posts, starting from the beginning.")
		cursorPosts = ""
	}

	// Counter for the number of fetched tweets
	fetchedCount := 0

	// First loop to fetch and save posts
	for {
		// Fetch tweets using the cursor
		tweets, newCursorPosts, err := scraper.FetchTweetsAndRepliesByUserID(userId, 20, cursorPosts)
		if err != nil {
			panic(err)
		}

		// If no new tweets are fetched, exit the loop
		if len(tweets) == 0 {
			log.Println("No new posts found. Exiting...")
			break
		}

		// Increment the fetched count by the number of newly fetched tweets
		fetchedCount += len(tweets)
		log.Printf("Fetched %d new tweets. Total fetched: %d\n", len(tweets), fetchedCount)

		// Save the new cursor state for posts
		if err := storage.SaveCursor(".cursor_posts", newCursorPosts); err != nil {
			panic(err)
		}

		// Save each tweet in JSONL format
		if err := storage.SaveTweetJSONL("posts.jsonl", tweets); err != nil {
			panic(err)
		}

		// Update cursor for the next iteration
		cursorPosts = newCursorPosts

		// Optional: Delay to avoid hitting rate limits
		time.Sleep(sleepBetweenRequests)
	}

	log.Printf("Total tweets fetched: %d\n", fetchedCount)
}

The default sleepBetweenRequests is 10*time.Second.

Did I make the requests too quickly?
How many tweets does it fetch by default? I noticed that in the Twitter UI it fetches 20 on each scroll.

@imperatrona
Owner

@thewh1teagle i was doing the same task recently and a 15 second delay was enough. each request usually returns 20 tweets, but sometimes it can be 15-90. this lib has a method scraper.WithDelay(15) which you can use instead of your sleepBetweenRequests
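
for example, a rough sketch of the relevant changes to your run() above: build the scraper with the built-in delay and drop the manual time.Sleep (the retry-on-429 handling here is just a suggestion, not something the lib does for you):

// create the scraper with a 15 second pause between requests
scraper := twitterscraper.New().WithDelay(15)
scraper.SetAuthToken(twitterscraper.AuthToken{Token: creds.AuthToken, CSRFToken: creds.Ct0})

for {
	tweets, newCursorPosts, err := scraper.FetchTweetsAndRepliesByUserID(userId, 20, cursorPosts)
	if err != nil {
		// a 429 can still slip through; back off and retry instead of panicking
		log.Println("fetch failed, retrying in 1 minute:", err)
		time.Sleep(time.Minute)
		continue
	}
	if len(tweets) == 0 {
		break
	}

	// ... save tweets and the cursor exactly as before ...
	cursorPosts = newCursorPosts
	// no manual time.Sleep here, WithDelay paces the requests
}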

@thewh1teagle
Author

thewh1teagle commented Oct 1, 2024

@imperatrona

Thanks! Good to hear that you did it recently, though note that I'm using the new endpoint from #20. It's almost the same as getTweets except that it returns basically everything the user posted - tweets / replies / reposts / quotes etc.
I changed it to use WithDelay instead of sleeping and increased the timeout. I'll check later; hope it will work without this error.

@wade-liwei

	creds, err := auth.GetCredentials()
	if err != nil {
		panic(err)
	}

Could you please give me an example of how to implement the func auth.GetCredentials() ?

@thewh1teagle
Author

Could you please give me an example of how to implement the func auth.GetCredentials() ?

I used Playwright in Go.
Instead of messing with complicated auth, I log in manually and then extract the cookies with Playwright and store / load them from a file.
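
For reference, a minimal sketch of what a file-backed auth.GetCredentials could look like, assuming the cookie values were saved to a cookies.json (the Credentials struct and file name here are just illustrative, not part of the library):

package auth

import (
	"encoding/json"
	"os"
)

// Credentials holds the two cookie values the scraper needs.
type Credentials struct {
	AuthToken string `json:"auth_token"`
	Ct0       string `json:"ct0"`
}

// GetCredentials loads cookie values previously saved to disk
// (e.g. by a Playwright login step, see below).
func GetCredentials() (*Credentials, error) {
	data, err := os.ReadFile("cookies.json")
	if err != nil {
		return nil, err
	}
	var creds Credentials
	if err := json.Unmarshal(data, &creds); err != nil {
		return nil, err
	}
	return &creds, nil
}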

@wade-liwei

thank you.
I will try the repository https://github.com/playwright-community/playwright-go.

@wade-liwei

Instead of messing with complicated auth, I log in manually and then extract the cookies with Playwright and store / load them from a file.

@thewh1teagle

Could you please provide an example of the steps, or a document?
I have read the Playwright documentation; it is so difficult to implement a Twitter login.

@thewh1teagle
Author

I have read the Playwright documentation; it is so difficult to implement a Twitter login.

I didn't implement automatic login.
I open the browser from the code, then wait for the user to log in and press Enter in the terminal.
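
Roughly like this with playwright-go. A minimal sketch, assuming the Playwright browsers are already installed (via the playwright CLI or playwright.Install()); the target login URL and the cookies.json format are my own choices, matching the GetCredentials sketch above:

package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"

	"github.com/playwright-community/playwright-go"
)

func main() {
	pw, err := playwright.Run()
	if err != nil {
		panic(err)
	}
	// headful browser so you can log in by hand
	browser, err := pw.Chromium.Launch(playwright.BrowserTypeLaunchOptions{
		Headless: playwright.Bool(false),
	})
	if err != nil {
		panic(err)
	}
	ctx, err := browser.NewContext()
	if err != nil {
		panic(err)
	}
	page, err := ctx.NewPage()
	if err != nil {
		panic(err)
	}
	if _, err := page.Goto("https://x.com/i/flow/login"); err != nil {
		panic(err)
	}

	fmt.Println("log in in the browser window, then press Enter here...")
	_, _ = bufio.NewReader(os.Stdin).ReadString('\n')

	// pull the session cookies out of the browser context
	cookies, err := ctx.Cookies()
	if err != nil {
		panic(err)
	}
	out := map[string]string{}
	for _, c := range cookies {
		if c.Name == "auth_token" || c.Name == "ct0" {
			out[c.Name] = c.Value
		}
	}
	data, _ := json.MarshalIndent(out, "", "  ")
	if err := os.WriteFile("cookies.json", data, 0600); err != nil {
		panic(err)
	}
	fmt.Println("saved auth_token and ct0 to cookies.json")

	_ = browser.Close()
	_ = pw.Stop()
}

The CSRFToken the scraper expects is just the value of the ct0 cookie.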

@wade-liwei

I found an example: https://github.com/go-numb/x-post-to-blue/blob/master/mod.go#L193

but I am not sure this is your way.

I also tried the code at https://github.com/playwright-community/playwright-go?tab=readme-ov-file#example, replacing https://news.ycombinator.com with "https://x.com/i/flow/login", but it does not open the browser from the code.

Could you please share an example of how to open the browser from the code,
and how to get the auth_token and CSRFToken?

@thewh1teagle
Author

@wade-liwei

wade-liwei commented Nov 7, 2024

hi @thewh1teagle

I used the following code to get the cookies file and finish the task: "I log in manually and then extract the cookie with playwright and store / load them from file."

Could you please help me figure out how to get the auth_token and CSRFToken?

@cmj

cmj commented Nov 7, 2024

Something like this should work too: https://gist.github.com/cmj/17fa133a948eedd0167bdcbff1dfff19

@wade-liwei

thank you.
