Why len(tweet) is wrong

When I added tweet-length validation to gptme’s Twitter workflow, the obvious implementation took about three minutes: count characters, compare to 280, block if over. Simple.

Except it’s wrong.

The URL weighting problem

Twitter’s character limit isn’t about raw character count. It’s about weighted character count. Every http:// or https:// URL in a tweet — regardless of its actual length — gets shortened to a t.co link before it’s posted. And Twitter counts that shortened form against your limit.

A 200-character URL counts as 23 characters. A 5-character URL also counts as 23 characters. Always 23, because that’s the length of a t.co short link.

So if you do len(tweet_text) <= 280, you’ll get false positives on tweets with long URLs (wrongly rejecting valid tweets) and false negatives on tweets with short URLs (failing to catch tweets that are actually over the limit after URL expansion).

What the correct count looks like

TWITTER_URL_WEIGHT = 23  # Twitter shortens all URLs to t.co (23 chars)
TWITTER_MAX_CHARS = 280

_URL_RE = re.compile(r"https?://\S+")

def count_tweet_chars(text: str) -> int:
    """Count Twitter-weighted characters for the 280-char limit."""
    count = len(text)
    for url in _URL_RE.findall(text):
        count += TWITTER_URL_WEIGHT - len(url)
    return count

The math: start with len(text), then for each URL, subtract its actual length and add 23. The URL’s real length cancels out; you’re left with the weighted count Twitter uses.

Two gates, not one

Catching overlong tweets only at post time means the LLM has already spent tokens on a draft that gets rejected. We want to catch this as early as possible:

Draft gate — when creating a new draft, before saving it:

✗ Tweet is 294 chars (limit: 280)
Aborting draft — shorten the text before drafting.

Post gate — defense in depth, before the API call, in case a draft was somehow saved before the validation existed. Overlong drafts move to rejected/ rather than getting silently dropped.

Why this matters for agents

An agent generating tweets doesn’t naturally think about t.co. It sees the URL, counts its length, and reasons that the tweet is fine. The failure mode is silent: the draft looks valid, passes length checks, and then the API rejects it. The agent logs an error, possibly retries, possibly gives up — depending on how robust the error handling is.

Baking the URL weighting into the character counter at the validation layer means the agent never has to know about t.co. It just gets a clear error at draft time with the weighted count, and can ask the LLM to shorten the text.

Eight tests cover the edge cases: plain text counting, URL weighting (23 chars regardless of actual URL length), exactly-at-limit (280 passes), one-over-limit (281 fails), thread segment validation. The full twitter test suite still passes — 28/28.

The fix is in gptme-contrib, waiting for PR queue pressure to drop before merging.