“Indexed, though blocked by robots.txt” is not a myth
I used to think this was one of those SEO forum horror stories, like “my cousin’s website ranked without content” type stuff. But nope. “Indexed, though blocked by robots.txt” is very real, and honestly a bit annoying when you first see it in Search Console. You block a page, feeling all responsible, and Google still goes: cool, I’ll index it anyway… I just won’t read it. That’s basically what’s happening. Google knows the page exists, but it’s standing outside the door because robots.txt told it not to come in. The name itself sounds confusing, and yeah, it is.
What actually happens behind the scenes
Here’s the simplest way I explain it to clients, and sometimes to myself: imagine your house address is listed on Google Maps, but the door is locked and there’s a big “do not enter” sign. Google can still list the address; it just has no idea what your furniture looks like. When a page is “Indexed, though blocked by robots.txt,” Google has the URL from somewhere (internal links, backlinks, old crawls) but robots.txt stops it from crawling the content. So Google indexes the URL without context. That’s why you sometimes see blank titles or weird snippets for these pages.
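As a concrete sketch of what that looks like, here’s a minimal robots.txt rule (the `/private/` path is just a placeholder). It stops crawling of everything under that folder, but it says nothing at all about indexing:

```txt
# robots.txt — blocks crawling of /private/,
# but URLs under it can still end up indexed if Google finds links to them
User-agent: *
Disallow: /private/
```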
Why Google indexes pages it can’t crawl
This part feels unfair, I know. You block something and it still gets indexed. But Google has said (buried in docs nobody reads fully) that robots.txt is a crawling directive, not an indexing directive. If Google already knows a URL exists, it can still index it based on external signals like links. That’s why SEO Twitter keeps screaming that robots.txt is not a noindex. And yeah, they’re right for once. Blocking crawling doesn’t erase a page from existence.
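If you actually want a page out of the index, the usual fix is the opposite of blocking: let Googlebot crawl the page so it can see a noindex signal. A minimal sketch, either as an HTML tag or an HTTP header (use one or the other, and make sure robots.txt isn’t blocking the page, or the crawler never sees it):

```txt
<!-- In the page's HTML <head>: tells crawlers not to index this page -->
<meta name="robots" content="noindex">

# Or as an HTTP response header (handy for PDFs and other non-HTML files):
X-Robots-Tag: noindex
```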
Common mistakes that trigger this issue
Most of the time, this problem is self-inflicted. I’ve seen people block entire folders, like /blog/ or /wp-admin/, and then panic later. Sometimes developers block staging URLs and forget to remove the rule before going live. Another classic is blocking parameter URLs thinking it’s “cleaning up SEO,” when those URLs already have links pointing at them. Boom. “Indexed, though blocked by robots.txt” shows up, and everyone starts blaming Google instead of that one rushed robots.txt update at 2 a.m.
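One cheap sanity check before shipping a robots.txt change is to run your rules through Python’s built-in `urllib.robotparser` and see which URLs they actually block. A small sketch; the rules and URLs below are made-up examples:

```python
from urllib import robotparser

# Hypothetical rules, e.g. that rushed 2 a.m. robots.txt update
rules = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /blog/
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)  # parse the rules directly instead of fetching a live file

# URLs you'd want to double-check before going live
for url in ("https://example.com/blog/my-best-post",
            "https://example.com/about"):
    ok = rp.can_fetch("Googlebot", url)
    print(url, "->", "crawlable" if ok else "BLOCKED")
```

Here the /blog/ post comes back BLOCKED, which is exactly the kind of surprise you want to catch before deploy, not after Search Console flags it.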
How this affects rankings
Let me be blunt. These pages usually don’t rank well. Google can’t read the content, so it can’t judge relevance properly. It’s like applying for a job with only your name and no resume.

