How to Scrape Twitter (X) Profiles with Python Using Playwright

Learn how to scrape Twitter (X) profiles using Python and Playwright with cookie-based authentication. Extract tweets, timestamps, likes, reposts, views, and more using a reliable, fully working scraper.

Twitter — now known as X — has become one of the most valuable real-time data sources for tracking trends, analyzing conversations, doing market research, and understanding audience behavior. But manually copying tweet data is slow and impossible to scale when you need hundreds or thousands of entries.

This is where web scraping becomes essential.

In this guide, we’ll walk through how to scrape public X profiles using Python + Playwright, including how to authenticate using your own session cookies. Since X recently locked most content behind login, cookie-based authentication is the most reliable method.

We’ll cover everything from setting up your environment to extracting tweet text, timestamps, likes, reposts, views, and more.

Why Scrape Twitter Profiles?

X contains massive amounts of real-time public data, making it useful for:

  • Social media analytics
  • Sentiment analysis
  • Market and product research
  • Competitor monitoring
  • Trend tracking
  • Archiving public statements
  • AI and machine learning datasets

Public profiles are especially valuable because they provide curated, chronological activity from individuals, brands, and public figures.

But to access that data programmatically, we need authentication — and we’ll do it properly.

How Authentication Works on X

Since X blocks unauthenticated access to tweets, the scraper loads your existing login session into Playwright from a cookies.txt export.

This works because:

  • Cookies represent your logged-in session
  • Playwright loads them into the browser context
  • X treats your scraper like a real user

A typical Netscape-style cookies.txt file looks like this (fields are tab-separated):


.x.com    TRUE    /    TRUE    1798777260    auth_token    <value>
.x.com    TRUE    /    TRUE    1798548372    ct0           <value>

Once these are loaded, Playwright opens X as if you logged in manually.
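To make that concrete, here is roughly how the auth_token line above maps onto the cookie dict that Playwright's add_cookies() expects (the full cookies.txt parser appears in the script later in this guide):

# Rough mapping of the auth_token line above to a Playwright cookie.
# <value> is a placeholder for the real token from your cookies.txt.
cookie = {
    "name": "auth_token",
    "value": "<value>",
    "domain": "x.com",        # leading dot stripped
    "path": "/",
    "expires": 1798777260.0,  # Unix timestamp from the expiry column
    "secure": True,
    "httpOnly": False,
    "sameSite": "Lax",
}

# Inside an async context: await context.add_cookies([cookie])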

Tools You Need

Install the required packages:

pip install playwright
playwright install

We will use:

  • Python (async)
  • Playwright for browser automation
  • cookies.txt for session authentication

Everything else runs directly in the script.

How This Scraper Works

1. Load Your Cookies

We parse cookies from cookies.txt (you can export this with a browser extension that saves cookies in Netscape format).

This makes X believe we’re a normal logged-in user.

2. Launch Playwright

A Chromium browser is opened — either headless or visible.

3. Navigate to the Target Profile

For example:

https://x.com/MrScraper_

We wait for the first set of tweets to load.
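The full script below uses a short sleep for simplicity, but if you prefer an explicit readiness check, Playwright can wait for the first tweet container (the <article> elements described in step 5) to appear. A small sketch, meant to run inside the async scraper function:

# Sketch: replace the fixed sleep with an explicit wait for the
# first tweet container (<article>) to render on the page
await page.goto(url, wait_until="domcontentloaded", timeout=60000)
await page.wait_for_selector("article", timeout=30000)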

4. Auto-Scroll the Profile

X only loads a few tweets at first.

The script scrolls automatically to load more content.
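The full script below scrolls a fixed number of times. As a variation, you can stop early once the page height stops growing, which usually means no new tweets are loading. A sketch:

import asyncio

# Variation on the fixed-count loop in the full script: stop
# scrolling once document height stops growing (no new tweets)
async def scroll_until_stable(page, max_rounds=30, pause=3.0):
    last_height = 0
    for _ in range(max_rounds):
        await page.evaluate("window.scrollBy(0, document.body.scrollHeight)")
        await asyncio.sleep(pause)
        height = await page.evaluate("document.body.scrollHeight")
        if height == last_height:
            break  # nothing new rendered; we've hit the bottom (or a limit)
        last_height = height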

5. Extract Tweets

For each <article> (tweet container), we extract:

  • Text
  • Timestamp
  • Replies
  • Reposts
  • Likes
  • Bookmarks
  • Views

Newer versions of X store stats inside a single ARIA label, e.g.:

aria-label="1629 replies, 4089 reposts, 29401 likes, 1066 bookmarks, 2035963 views"

We capture these using regex.
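The script does this inside the browser with JavaScript, but the same idea is easy to test in plain Python against the example label above:

import re

label = ("1629 replies, 4089 reposts, 29401 likes, "
         "1066 bookmarks, 2035963 views")

stats = {}
for key in ("replies", "reposts", "likes", "bookmarks", "views"):
    # Counts may contain commas or K/M suffixes, e.g. "29.4K"
    m = re.search(rf"(\d[\d,.KkM]*) {key}", label)
    stats[key] = m.group(1) if m else None

print(stats)
# {'replies': '1629', 'reposts': '4089', 'likes': '29401',
#  'bookmarks': '1066', 'views': '2035963'}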

6. Save Everything to JSON

All output is stored in:

<username>_tweets.json

Perfect for analytics, dashboards, competitor research, and more.
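Once the JSON file exists, post-processing is straightforward. As a quick sketch (assuming you scraped MrScraper_, so the output file is MrScraper__tweets.json): counts usually come through as plain numbers in the ARIA label, but since the regex also allows K/M suffixes, this helper normalizes both before totaling:

import json

def to_int(count):
    """Convert '1,629', '29.4K', or '2M' style strings to integers."""
    if count is None:
        return None
    count = count.replace(",", "")
    suffix = {"K": 1_000, "M": 1_000_000}.get(count[-1].upper())
    if suffix:
        return int(float(count[:-1]) * suffix)
    return int(float(count))

with open("MrScraper__tweets.json", encoding="utf-8") as f:
    tweets = json.load(f)

total_likes = sum(to_int(t["likes"]) or 0 for t in tweets)
print(f"{len(tweets)} tweets, {total_likes:,} total likes")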

Python Code: Full Working X Profile Scraper

import asyncio
import json

from playwright.async_api import async_playwright

# ------------------------------------------------------
# Convert cookies.txt → Playwright cookies
# ------------------------------------------------------
def parse_cookies_txt(path):
    cookies = []
    with open(path, "r") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue

            parts = line.split("\t")
            if len(parts) != 7:
                continue

            domain, include_sub, p, secure, expiry, name, value = parts

            cookies.append({
                "name": name,
                "value": value,
                "domain": domain.lstrip("."),
                "path": p,
                "expires": float(expiry),
                "secure": secure.upper() == "TRUE",
                "httpOnly": False,
                "sameSite": "Lax"
            })
    return cookies


# ------------------------------------------------------
# Scrape profile timeline
# ------------------------------------------------------
async def scrape_profile(username):
    url = f"https://x.com/{username}"

    cookies = parse_cookies_txt("cookies.txt")

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=False)  # set headless=True to hide the window
        context = await browser.new_context()

        # Load the user's session cookies
        await context.add_cookies(cookies)

        page = await context.new_page()

        print(f"Opening profile: {url}")
        await page.goto(url, wait_until="domcontentloaded", timeout=60000)

        # Give time for initial tweets to render
        await asyncio.sleep(3)

        print("Scrolling...")
        for _ in range(20):  # scroll deeper if needed
            await page.evaluate("window.scrollBy(0, document.body.scrollHeight)")
            await asyncio.sleep(4)

        print("Extracting tweets...")

        tweets = await page.evaluate("""
            () => {
                const out = [];

                document.querySelectorAll("article").forEach(a => {
                    try {
                        // ---- Extract tweet text ----
                        const text = Array.from(a.querySelectorAll("div[dir='auto'] span"))
                            .map(s => s.innerText)
                            .join(" ")
                            .trim();

                        // ---- Extract date ----
                        const date = a.querySelector("time")?.getAttribute("datetime") || null;

                        // ---- Extract stats ----
                        let replies = null, reposts = null, likes = null, bookmarks = null, views = null;

                        const group = a.querySelector("div[role='group'][aria-label]");

                        if (group) {
                            const label = group.getAttribute("aria-label");

                            const matchReplies   = label.match(/(\\d[\\d,.KkM]*) replies/);
                            const matchReposts   = label.match(/(\\d[\\d,.KkM]*) reposts/);
                            const matchLikes     = label.match(/(\\d[\\d,.KkM]*) likes/);
                            const matchBookmarks = label.match(/(\\d[\\d,.KkM]*) bookmarks/);
                            const matchViews     = label.match(/(\\d[\\d,.KkM]*) views/);

                            replies   = matchReplies?.[1]   || null;
                            reposts   = matchReposts?.[1]   || null;
                            likes     = matchLikes?.[1]     || null;
                            bookmarks = matchBookmarks?.[1] || null;
                            views     = matchViews?.[1]     || null;
                        }

                        out.push({
                            text,
                            date,
                            replies,
                            reposts,
                            likes,
                            bookmarks,
                            views
                        });

                    } catch(e) {}
                });

                return out;
            }
        """)

        print(f"Collected {len(tweets)} tweets.")

        # Save as JSON
        file = f"{username}_tweets.json"
        with open(file, "w", encoding="utf-8") as f:
            json.dump(tweets, f, indent=2, ensure_ascii=False)

        print("Saved to:", file)
        await browser.close()


# ------------------------------------------------------
# Run
# ------------------------------------------------------
if __name__ == "__main__":
    username = "MrScraper_"  # change this
    asyncio.run(scrape_profile(username))

Conclusion

Scraping X profiles today requires:

  • Authentication
  • A real browser automation tool
  • Stable selectors
  • Smart handling of X’s dynamic DOM

By combining Playwright with your session cookies, you can reliably collect tweet text, engagement metrics, timestamps, and more.

This method can be extended to:

  • Hashtag scraping
  • Full timeline scraping
  • Thread and reply extraction
  • Image/video scraping
  • Bookmark analytics
  • DM automation (with caution)

This scraper is flexible, stable, and ideal for real-world research and analytics.
