Time to First Byte (TTFB) is one of those metrics that quietly sabotages your entire performance story. A sluggish TTFB inflates every metric that follows — FCP, LCP, and even INP — because the browser literally can't paint a single pixel until that first byte arrives. I've seen sites shave hundreds of milliseconds off LCP just by fixing their TTFB, and honestly, it's often the lowest-hanging fruit nobody's picking.
In this guide, we'll walk through how to measure, diagnose, and fix TTFB using server-side caching, CDN edge strategies, HTTP compression, 103 Early Hints, and the Server-Timing API.
What Is TTFB and Why Does It Matter?
TTFB measures the time between the browser sending an HTTP request and receiving the very first byte of the response. That window covers DNS resolution, TCP/TLS handshake, and the full server processing time needed to generate the response.
Here's the thing — TTFB isn't itself a Core Web Vital. But it directly influences the metrics Google actually uses for ranking. A high TTFB pushes your Largest Contentful Paint (LCP) beyond the 2.5-second "good" threshold because the browser can't begin parsing HTML, discovering subresources, or rendering content until that first byte lands.
Optimizing TTFB is often the single fastest way to improve your overall Lighthouse score.
TTFB Benchmarks You Should Target
- Good: Under 800 ms (p75)
- Needs improvement: 800 ms – 1800 ms
- Poor: Over 1800 ms
For competitive sites, you'll want to aim for sub-200 ms document TTFB. And with an edge-cached or edge-rendered architecture, sub-50 ms is absolutely achievable in 2026.
How to Measure TTFB Accurately
Before you can fix TTFB, you need to measure it correctly. There are two main approaches: lab testing and field (RUM) measurement.
Lab Tools
Use WebPageTest, Lighthouse, Chrome DevTools Network tab, or DebugBear to measure TTFB under controlled conditions. In DevTools, look at the "Waiting for server response" segment in the Network panel's timing breakdown — that's your TTFB right there.
Field Measurement with the Navigation Timing API
For real-user monitoring, use the PerformanceNavigationTiming API to capture TTFB from actual visitors:
```javascript
// Measure TTFB from real users
const observer = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    // responseStart = timestamp when the first byte arrived
    const ttfb = entry.responseStart;
    console.log(`TTFB: ${ttfb.toFixed(0)}ms`);

    // For sites using 103 Early Hints, measure the final response:
    if (entry.finalResponseHeadersStart) {
      const finalTTFB = entry.finalResponseHeadersStart;
      console.log(`Final response TTFB: ${finalTTFB.toFixed(0)}ms`);
    }
  }
});

observer.observe({ type: "navigation", buffered: true });
```
Note the finalResponseHeadersStart property — this one's critical if you use 103 Early Hints, because responseStart will reflect the early 103 response rather than the final document. Use finalResponseHeadersStart to measure your actual server processing time.
Diagnose TTFB Bottlenecks with the Server-Timing Header
A raw TTFB number tells you something is slow, but not what. That's where the Server-Timing response header comes in — it breaks that opaque number into actionable sub-metrics your backend emits.
How Server-Timing Works
Your server adds a header with named timing entries:
```http
Server-Timing: db;dur=53;desc="Database queries",
               app;dur=120;desc="Application logic",
               cache;dur=2;desc="Cache lookup",
               tmpl;dur=45;desc="Template rendering"
```
Chrome DevTools visualizes these timings under the "Server Timing" section of a request's Timing tab, so you can instantly spot whether your database, application code, or template engine is the culprit.
Implementing Server-Timing in Node.js
```javascript
import express from "express";

const app = express();

app.use((req, res, next) => {
  const timings = [];

  // Helper to record a timing (works for both sync and async work)
  req.recordTiming = async (name, desc, fn) => {
    const start = performance.now();
    const result = await fn();
    const dur = performance.now() - start;
    timings.push(`${name};dur=${dur.toFixed(1)};desc="${desc}"`);
    return result;
  };

  // Attach the header just before the response body is sent
  const originalSend = res.send.bind(res);
  res.send = (body) => {
    res.set("Server-Timing", timings.join(", "));
    return originalSend(body);
  };

  next();
});

// db and renderTemplate are illustrative stand-ins for your own code
app.get("/", async (req, res) => {
  const data = await req.recordTiming("db", "Database", () =>
    db.query("SELECT * FROM products LIMIT 20")
  );
  const html = await req.recordTiming("tmpl", "Render", () =>
    renderTemplate("home", data)
  );
  res.send(html);
});
```
You can read these timings in the browser via the PerformanceResourceTiming.serverTiming property and beacon them to your analytics for RUM-based backend monitoring. Super useful for catching regressions before your users start complaining.
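Here's a sketch of what that RUM snippet might look like. The `toPayload` helper name and the `/analytics/server-timing` endpoint are placeholders for your own collector, not a specific library's API:

```javascript
// Flatten serverTiming entries into a beacon payload (pure, easy to test).
function toPayload(url, serverTiming) {
  return {
    url,
    metrics: serverTiming.map(({ name, duration, description }) => ({
      name,
      duration,
      description,
    })),
  };
}

// Browser wiring: observe navigations and beacon their backend timings.
if (typeof PerformanceObserver !== "undefined" &&
    typeof navigator !== "undefined" && "sendBeacon" in navigator) {
  const observer = new PerformanceObserver((list) => {
    for (const entry of list.getEntries()) {
      if (entry.serverTiming?.length) {
        navigator.sendBeacon(
          "/analytics/server-timing",
          JSON.stringify(toPayload(entry.name, entry.serverTiming))
        );
      }
    }
  });
  observer.observe({ type: "navigation", buffered: true });
}
```

Note that `serverTiming` entries are only exposed to JavaScript for same-origin responses unless the server also sends a `Timing-Allow-Origin` header.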
Server-Side Caching: The Biggest Quick Win
If there's one thing you take away from this article, let it be this: caching is the most impactful optimization for TTFB. A well-configured cache can take you from 800 ms to under 50 ms because it eliminates server processing entirely for repeat requests.
Layered Caching Strategy
- Application-level cache (in-memory): Store rendered pages or expensive query results in memory using Redis or Memcached. This avoids repeated database queries and template rendering.
- Reverse proxy cache: Place Varnish or NGINX's `proxy_cache` in front of your application server to serve responses without touching application code at all.
- CDN edge cache: Cache full HTML responses at the edge so most visitors never hit your origin. Use `Cache-Control: public, max-age=60, stale-while-revalidate=300` for pages that can tolerate a one-minute cache window.
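To make the application-level layer concrete, here's a minimal read-through cache sketch. An in-memory `Map` stands in for Redis or Memcached, and the `cachedRender` name is illustrative, not any particular library's API:

```javascript
// Read-through cache: return the cached page if fresh, otherwise render
// once and store the result with an expiry timestamp.
const cache = new Map();

async function cachedRender(key, ttlMs, render) {
  const hit = cache.get(key);
  if (hit && hit.expires > Date.now()) {
    return hit.html; // cache hit: no DB queries, no template rendering
  }
  const html = await render(); // cache miss: do the expensive work once
  cache.set(key, { html, expires: Date.now() + ttlMs });
  return html;
}
```

With Redis, you'd swap the `Map` for `SET key value EX 60` and `GET key`; the surrounding logic stays the same.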
Cache-Control for Dynamic Content
```nginx
# NGINX configuration for a dynamic page with SWR.
# Requires a cache zone defined once in the http block, e.g.:
#   proxy_cache_path /var/cache/nginx keys_zone=app_cache:10m;
location /blog/ {
    proxy_pass http://app_server;
    proxy_cache app_cache;
    proxy_cache_valid 200 60s;
    add_header Cache-Control "public, max-age=60, stale-while-revalidate=300";
    add_header X-Cache-Status $upstream_cache_status;
}
```
The stale-while-revalidate directive is honestly one of my favorite performance features. It serves the stale cached response immediately (near-zero TTFB) while revalidating in the background. Your visitors get instant responses while content stays reasonably fresh.
CDN and Edge Computing Strategies
A CDN eliminates the geographic distance between your server and your users. Without one, a visitor in Tokyo requesting a page from a US East origin might wait 200+ ms on network latency alone — and that's before any server processing even begins.
Static Caching at the Edge
At minimum, serve all static assets (JS, CSS, images, fonts) from CDN edge nodes. But the real TTFB improvement comes from caching your HTML documents at the edge. Services like Cloudflare, Fastly, and CloudFront can cache full pages and serve them in under 20 ms globally.
Edge-Side Rendering (ESR)
This is where things get really interesting. Edge-side rendering moves your server-side rendering logic to CDN edge nodes. Instead of generating HTML on a centralized origin server, your SSR code runs at 300+ global points of presence.
The results speak for themselves:
| Architecture | Typical TTFB |
|---|---|
| Traditional SSR (origin) | 200 – 800 ms |
| Edge SSR (Cloudflare Workers, Vercel Edge) | 20 – 50 ms |
| Full-page edge cache | < 20 ms |
Cloudflare Workers use V8 isolates instead of containers, achieving sub-1 ms cold starts compared to 100–1000 ms for traditional Lambda cold starts. Frameworks like Next.js, Nuxt, and SolidStart now support edge runtimes out of the box, which makes adoption a lot more straightforward than it used to be.
Edge KV Stores for Personalization
The classic objection to edge caching is "but my pages are personalized." Fair enough — but edge KV stores (Cloudflare KV, Vercel Edge Config) solve this by replicating user segments and feature flags globally with sub-millisecond read latency. Your edge function reads the user segment from KV and renders the correct variant, no origin round-trip needed.
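As a sketch of how that looks on Cloudflare Workers, assuming a KV namespace bound as `SEGMENTS` and a hypothetical `pickVariant` helper:

```javascript
// Pure helper: map a stored segment to a page variant (names are illustrative).
function pickVariant(segment) {
  return segment === "beta" ? "home-beta.html" : "home.html";
}

// In a real Worker this object is the module's default export.
const worker = {
  async fetch(request, env) {
    const userId = new URL(request.url).searchParams.get("uid") ?? "anon";
    // Sub-millisecond read from the globally replicated edge KV store
    const segment = await env.SEGMENTS.get(`segment:${userId}`);
    return new Response(`Serving ${pickVariant(segment)}`, {
      headers: { "content-type": "text/html" },
    });
  },
};
```

The key point is that the segment lookup and the render both happen at the edge; the origin is never involved.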
HTTP Compression: Brotli, Zstandard, and When Each Wins
Compression reduces transfer size, which reduces time-on-the-wire — a direct component of TTFB. But here's the catch: the wrong compression strategy can actually increase TTFB by consuming too much server CPU.
Brotli vs. Zstandard in 2026
| Algorithm | Compression Ratio | Speed | Best For |
|---|---|---|---|
| Brotli (level 11) | 3.08:1 | Slow (offline only) | Pre-compressed static assets |
| Brotli (level 4–6) | ~2.8:1 | Moderate | Dynamic content on fast servers |
| Zstandard | 2.86:1 | 42% faster than Brotli | Dynamic content, APIs |
| gzip | 2.56:1 | Fast | Legacy fallback |
So here's the practical strategy for 2026: pre-compress static assets with Brotli level 11 at build time for the smallest possible file sizes. For dynamic HTML and API responses, use Zstandard — it compresses 42% faster than Brotli at comparable ratios, keeping your TTFB low under load. High Brotli levels (9–11) can eat 5–10x more CPU, which directly increases TTFB on dynamic pages.
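For the static half of that strategy, here's a build-step sketch using Node's built-in Brotli support; the `precompressAsset` helper name and the `.br` suffix convention are assumptions, though most servers understand the suffix:

```javascript
import zlib from "node:zlib";
import { readFileSync, writeFileSync } from "node:fs";

// Quality 11 is the slowest, smallest-output setting: acceptable in a build
// step that runs once, offline. Never use it per-request for dynamic HTML.
function brotliCompress(source) {
  return zlib.brotliCompressSync(source, {
    params: {
      [zlib.constants.BROTLI_PARAM_QUALITY]: 11,
      [zlib.constants.BROTLI_PARAM_SIZE_HINT]: source.length,
    },
  });
}

// Write main.css.br next to main.css so the server can send it directly.
function precompressAsset(path) {
  writeFileSync(`${path}.br`, brotliCompress(readFileSync(path)));
}
```

Configure your server (e.g. NGINX's brotli_static module) to serve the `.br` file whenever the client sends `Accept-Encoding: br`.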
Enabling Zstandard on Cloudflare
Cloudflare supports Zstandard on all plans now, including the free tier. Browsers that send Accept-Encoding: zstd will automatically receive Zstandard-compressed responses. No server configuration changes needed on your end — Cloudflare handles compression at the edge.
103 Early Hints: Start Fetching Before the Page Arrives
HTTP 103 Early Hints is one of those features that sounds almost too good to be true. When your server receives a request, it immediately fires off a 103 response with Link headers pointing to critical resources — then continues processing the full HTML response in the background.
```http
HTTP/2 103
Link: </styles/critical.css>; rel=preload; as=style
Link: </fonts/inter.woff2>; rel=preload; as=font; crossorigin
Link: <https://cdn.example.com>; rel=preconnect

HTTP/2 200
Content-Type: text/html
...
```
Think about what's happening here: while the server spends 300 ms querying the database and rendering HTML, the browser is already fetching your CSS and fonts. By the time the HTML arrives, critical resources are partially or fully loaded — dramatically improving FCP and LCP.
Implementation in NGINX 1.29+
```nginx
# NGINX 1.29+ native Early Hints support
location / {
    # Send 103 Early Hints before proxying to the backend
    early_hints "Link: </css/main.css>; rel=preload; as=style";
    early_hints "Link: </fonts/inter.woff2>; rel=preload; as=font; crossorigin";
    proxy_pass http://app_backend;
}
```
Key Caveats
- HTTP/2 or HTTP/3 required — browsers ignore 103 responses over HTTP/1.1 for security reasons.
- Limit to 1–3 hints — Shopify's testing found that preloading more resources can actually degrade LCP on mobile due to bandwidth contention. More isn't always better here.
- Measure real TTFB separately — because `responseStart` fires on the 103 response, you'll need to use `finalResponseHeadersStart` or the `Server-Timing` header to track actual backend processing time.
DNS and Network Optimization
DNS resolution happens before your server even knows a request is coming. A slow DNS provider adds 50–150 ms to every uncached navigation — and it's something a lot of developers just never think to check.
Practical DNS Improvements
- Use a fast DNS provider: Cloudflare DNS (1.1.1.1) and Google Public DNS (8.8.8.8) consistently benchmark under 15 ms globally. Many shared hosting packages come with slow default DNS that quietly adds 80–150 ms to every request.
- Reduce DNS lookups: Each unique hostname on your page requires a separate DNS resolution. Consolidate third-party resources where possible — four tracking scripts from four different domains means four extra DNS lookups.
- Use `dns-prefetch` for third-party origins: Add `<link rel="dns-prefetch" href="https://analytics.example.com">` in your `<head>` to resolve third-party DNS in the background before resources are actually requested.
Database and Application Optimization
If your Server-Timing header reveals that db or app timings dominate your TTFB, the fix lives in your backend code. No amount of CDN caching will help if your origin is painfully slow for uncacheable requests.
Database Quick Wins
- Add indexes on columns used in `WHERE`, `JOIN`, and `ORDER BY` clauses. A missing index can turn a 5 ms query into a 500 ms full table scan — I've seen it happen more times than I can count.
- Use connection pooling to avoid the overhead of establishing new database connections per request (that's 20–50 ms you're throwing away each time).
- Batch queries — if your page makes 10 sequential database calls, refactor them into 2–3 parallel queries or a single join. Sequential queries accumulate latency linearly, which adds up fast.
- Cache hot queries in Redis with a short TTL. For content that changes hourly, a 60-second cache eliminates 99% of database hits.
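The batching advice can be sketched with any promise-based client; `db.query` here is a stand-in, not a specific library:

```javascript
// Sequential awaits accumulate latency (a + b + c); Promise.all runs the
// independent queries concurrently, so the page waits only for the slowest.
async function loadHomePageData(db) {
  const [products, categories, banner] = await Promise.all([
    db.query("SELECT * FROM products ORDER BY sales DESC LIMIT 20"),
    db.query("SELECT id, name FROM categories"),
    db.query("SELECT * FROM banners WHERE active = true LIMIT 1"),
  ]);
  return { products, categories, banner };
}
```

This only works when the queries are independent; if one query's input depends on another's output, a join or a single round-trip is the better refactor.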
Application-Level Optimization
- Stream HTML responses — frameworks like React 18+ and SolidStart support streaming SSR, which sends the `<head>` and above-the-fold content immediately while the rest of the page renders. This dramatically reduces perceived TTFB.
- Avoid synchronous I/O in the request path. Every blocking file read or synchronous API call stalls the response.
- Profile your application with the Server-Timing header to identify which functions consume the most time, then optimize or cache those specific bottlenecks.
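The streaming idea reduces to "flush the head before the slow work begins." A framework-agnostic sketch, where `fetchData` and `renderBody` are hypothetical helpers:

```javascript
// Yield the <head> immediately so the browser starts fetching CSS and fonts,
// then yield the body once the slow data arrives.
async function* streamPage(fetchData, renderBody) {
  yield '<!doctype html><html><head><link rel="stylesheet" href="/css/main.css"></head><body>';
  const data = await fetchData(); // the browser is already fetching CSS here
  yield renderBody(data);
  yield "</body></html>";
}

// With Express: for await (const chunk of streamPage(...)) res.write(chunk);
```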
Putting It All Together: A TTFB Optimization Checklist
- Measure your baseline TTFB with WebPageTest (from multiple locations) and RUM via the Navigation Timing API.
- Add Server-Timing headers to identify whether the bottleneck is database, application, or network.
- Implement server-side caching with Redis or Memcached for expensive queries and rendered pages.
- Deploy a CDN and cache HTML documents at the edge with `stale-while-revalidate`.
- Enable Brotli for static assets (pre-compressed) and Zstandard for dynamic content.
- Implement 103 Early Hints for your 1–3 most critical subresources.
- Optimize DNS with a premium provider and minimize third-party hostname lookups.
- Profile and fix database queries — add indexes, use connection pooling, batch queries.
- Consider edge SSR for server-rendered applications to achieve sub-50 ms TTFB globally.
- Set up continuous monitoring — track p75 TTFB in your RUM dashboard and alert on regressions.
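For the monitoring step, the p75 math is simple enough to inline in your RUM pipeline. A sketch using the nearest-rank method (the function names are mine):

```javascript
// Nearest-rank percentile over beaconed TTFB samples (in milliseconds).
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, rank)];
}

function p75TTFB(samples) {
  return percentile(samples, 75);
}
```

Track this per page template and per geography; a global p75 can hide a region or route that is badly regressing.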
Frequently Asked Questions
Is TTFB a Core Web Vital?
No, it's not. TTFB is classified as a diagnostic metric, not a Core Web Vital. That said, it has a major indirect impact because a slow TTFB directly inflates both First Contentful Paint (FCP) and Largest Contentful Paint (LCP) — and LCP is a Core Web Vital. Optimizing TTFB is one of the fastest paths to better LCP scores.
What is a good TTFB for a website?
Google recommends a TTFB of 800 ms or less at the 75th percentile. For competitive performance, aim for under 200 ms. With edge caching or edge-side rendering, sub-50 ms TTFB is achievable in 2026. Anything above 1800 ms is considered poor and typically points to a server-side issue that needs immediate attention.
How do I check my website's TTFB?
You can measure TTFB with Chrome DevTools (Network tab → "Waiting for server response"), WebPageTest, GTmetrix, or Google PageSpeed Insights. For field data, use the Navigation Timing API with `performance.getEntriesByType("navigation")[0].responseStart` to capture TTFB from real users and send it to your analytics.
Does a CDN reduce TTFB?
Yes, and significantly so. A CDN caches content on servers physically close to your users, eliminating those long network round-trips to a distant origin. For cacheable content, a CDN can reduce TTFB from 500+ ms to under 20 ms. Even for uncacheable dynamic content, a CDN still reduces the TLS handshake and network latency portions of TTFB.
Can TTFB be zero?
Effectively, yes — with prerendering. The Speculation Rules API lets the browser prerender pages before the user navigates to them. When someone clicks the link, the page has already been fetched and rendered in a hidden background page, so the perceived TTFB is 0 ms. This works best for high-confidence navigations like search result clicks or prominent call-to-action buttons.
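A minimal speculation-rules sketch looks like this; the URL pattern and eagerness value are illustrative:

```html
<script type="speculationrules">
{
  "prerender": [
    { "where": { "href_matches": "/products/*" }, "eagerness": "moderate" }
  ]
}
</script>
```

Chrome picks this up and prerenders matching links it considers likely to be clicked; browsers without support simply ignore the script block.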