Extraction API · /extract + /generate

Stop building the extraction layer. Just call the endpoint.

Define a schema, pass a URL, and get back JSON that matches. No parsing code, no downstream LLM call, no extraction layer to maintain. The intelligence happens inside the call.


✓ Schema-matched JSON
✓ Clean Markdown output
✓ Reasoning with /generate

/extract/json

Request

url: nike.com/t/pegasus-premium-womens-road-running-shoes-OHKEqA2b/HQ2593-004

schema: { name, price, sizes: [{ size, in_stock }] }

Response

matches your schema

Nike Pegasus Premium · $220

sizes

W 7 / M 5.5: in stock

W 8 / M 6.5: sold out

W 9 / M 7.5: in stock

✓ One call. Down to per-size stock.
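
The shorthand schema in the demo above can be written out as a full JSON Schema. The sketch below is one possible expansion, not documented Tabstack output: the field types (price as a number, in_stock as a boolean) and the names productSchema, Product, and inStockSizes are assumptions made for illustration. The sample object copies the values shown in the demo.

```typescript
// Hypothetical expansion of { name, price, sizes: [{ size, in_stock }] }.
// Field types are assumptions; the demo does not state them.
const productSchema = {
  type: 'object',
  properties: {
    name: { type: 'string' },
    price: { type: 'number', description: 'Price in USD' },
    sizes: {
      type: 'array',
      items: {
        type: 'object',
        properties: {
          size: { type: 'string' },
          in_stock: { type: 'boolean' },
        },
      },
    },
  },
} as const

// TypeScript view of a response that matches the schema above.
interface ProductSize { size: string; in_stock: boolean }
interface Product { name: string; price: number; sizes: ProductSize[] }

// Sample response built from the demo values.
const sample: Product = {
  name: 'Nike Pegasus Premium',
  price: 220,
  sizes: [
    { size: 'W 7 / M 5.5', in_stock: true },
    { size: 'W 8 / M 6.5', in_stock: false },
    { size: 'W 9 / M 7.5', in_stock: true },
  ],
}

// Return the labels of the sizes that are currently in stock.
function inStockSizes(product: Product): string[] {
  return product.sizes.filter((s) => s.in_stock).map((s) => s.size)
}

console.log(inStockSizes(sample))
```

Because the response already has a fixed shape, downstream code like inStockSizes can work on typed fields directly instead of parsing page markup.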

Schema-driven

You define the shape. Tabstack returns it.

Pass a URL and the JSON schema you need. Tabstack enforces it on server-rendered, client-rendered, and JS-heavy pages, with no parsing code, no Zod pass, and no prompt engineering on your side. You get the output. You never touch what produced it.

Features

✓ Schema-driven extraction from any URL with /extract/json
✓ You define the json_schema; Tabstack enforces the shape
✓ Schema compliance on every call, even when the page changes

extract.ts

import Tabstack from '@tabstack/sdk'

const client = new Tabstack()

try {
  const pricing = await client.extract.json({
    url: 'https://competitor.com/pricing',
    json_schema: {
      type: 'object',
      properties: {
        plans: { type: 'array', items: {
          type: 'object', properties: {
            name: { type: 'string' },
            price: { type: 'number', description: 'Monthly USD' },
            features: { type: 'array', items: { type: 'string' } }
          }
        }}
      }
    }
  })
  console.log(pricing.plans)
} catch (err) { console.error(err) }

generate.ts

import Tabstack from '@tabstack/sdk'

const client = new Tabstack()

try {
  const analysis = await client.generate.json({
    url: 'https://competitor.com/pricing',
    instructions: 'Analyze the pricing. What segment does each tier target, and why?',
    json_schema: {
      type: 'object',
      properties: { tiers: { type: 'array', items: { type: 'object', properties: {
        name: { type: 'string' },
        target_segment: { type: 'string' },
        positioning_rationale: { type: 'string' }
      }}}}
    }
  })
  console.log(analysis.tiers)
} catch (err) { console.error(err) }

Reasoning

Go beyond fields. Get structured answers, not just values.

/generate/json adds instructions on top of the URL, so you get output that required reasoning, not just a pulled field. It is the move from getting the price to learning what that pricing reveals about who they sell to.

Features

✓ AI transformation with your custom instructions
✓ Reasoning over content, not just field extraction
✓ Clean Markdown for LLM input when you need the full page

Control

Keep your extraction running without babysitting it.

nocache forces fresh data for monitoring and change detection. effort scales cost to what the page needs. geo_target fetches pages as seen from any country. The call adapts to what the page requires, not what you hardcoded.

Features

✓ nocache: true bypasses cache for fresh data every call
✓ effort (min / standard / max): pay for what the page needs
✓ geo_target: fetch a page as seen from a specific country

monitor.ts

import Tabstack from '@tabstack/sdk'

const client = new Tabstack()

// Monitoring: fresh data every run, cost scaled to the page
try {
  const current = await client.extract.json({
    url: 'https://competitor.com/pricing',
    json_schema: { /* your schema */ },
    nocache: true,            // always fresh
    effort: 'standard',       // scale to page complexity
    geo_target: { country: 'US' }
  })
  console.log(current)
} catch (err) { console.error(err) }
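
For change detection, each fresh result still has to be compared with the previous one. The sketch below shows one way to do that on the client side, assuming each run returns a list of plans with a name and a price as in the pricing schema earlier. The names Plan, PriceChange, and diffPlans are hypothetical helpers, not part of the Tabstack SDK, and the two snapshots are made-up sample data.

```typescript
interface Plan { name: string; price: number }

// before/after is null when the plan did not exist in that snapshot.
interface PriceChange { name: string; before: number | null; after: number | null }

// Compare two snapshots and report plans that were added, removed, or repriced.
function diffPlans(previous: Plan[], current: Plan[]): PriceChange[] {
  const changes: PriceChange[] = []

  // Added or repriced plans, in the order they appear in the current snapshot.
  for (const plan of current) {
    const old = previous.filter((p) => p.name === plan.name)[0]
    if (old === undefined) {
      changes.push({ name: plan.name, before: null, after: plan.price })
    } else if (old.price !== plan.price) {
      changes.push({ name: plan.name, before: old.price, after: plan.price })
    }
  }

  // Plans that disappeared since the previous snapshot.
  for (const plan of previous) {
    if (!current.some((p) => p.name === plan.name)) {
      changes.push({ name: plan.name, before: plan.price, after: null })
    }
  }

  return changes
}

// Made-up snapshots from two consecutive monitoring runs.
const lastRun: Plan[] = [
  { name: 'Starter', price: 19 },
  { name: 'Pro', price: 49 },
  { name: 'Legacy', price: 99 },
]
const thisRun: Plan[] = [
  { name: 'Starter', price: 19 },
  { name: 'Pro', price: 59 },
  { name: 'Team', price: 129 },
]

console.log(diffPlans(lastRun, thisRun))
```

Matching plans by name is a simplification; a renamed plan shows up as one removal plus one addition.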

Who builds on this

Teams that turn web pages into structured data.

Price & catalog monitoring

Track competitor pricing and inventory as it changes.

Lead enrichment

Turn a URL into structured company and contact data.

Listings & marketplace data

Pull products, jobs, and listings into a fixed shape.

RAG & content ingestion

Clean, structured input for retrieval and indexing.
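
For the RAG and content ingestion case, Markdown output usually gets split into chunks before indexing. The sketch below shows one simple approach, splitting on headings; it is an illustration of a common technique, not something the Tabstack SDK provides. Chunk and chunkByHeading are hypothetical names, and the sample Markdown is made up.

```typescript
interface Chunk { heading: string; body: string }

// Split Markdown into one chunk per heading line (1 to 6 leading '#').
// Text before the first heading becomes a chunk with an empty heading.
// Limitation: a '#' line inside a fenced code block is treated as a heading too.
function chunkByHeading(markdown: string): Chunk[] {
  const chunks: Chunk[] = []
  let current: Chunk = { heading: '', body: '' }

  for (const line of markdown.split('\n')) {
    const match = /^#{1,6}\s+(.*)$/.exec(line)
    if (match) {
      // Close the chunk in progress, skipping it if it is completely empty.
      if (current.heading || current.body.trim()) chunks.push(current)
      current = { heading: match[1], body: '' }
    } else {
      current.body += line + '\n'
    }
  }
  if (current.heading || current.body.trim()) chunks.push(current)

  // Trim surrounding whitespace from each body.
  return chunks.map((c) => ({ heading: c.heading, body: c.body.trim() }))
}

// Made-up Markdown standing in for a fetched page.
const page = '# Pricing\nThree plans.\n\n## Pro\n$49 per month.\n'

console.log(chunkByHeading(page))
```

Heading-based chunks keep each section's title next to its text, which tends to help retrieval; fixed-size or overlapping chunks are equally valid choices.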

Mozilla-backed

Privacy, Transparency, and Control

When you build on /extract and /generate, the pages you fetch and the data you pull stay yours. Tabstack is a Mozilla-backed platform, and nothing you send is sold or used to train models.

Private by default. Requests and fetched pages are used to build your response and support you, then purged. Never sold, never used to train models.

Transparent by design. Mozilla-documented data practices and robots.txt compliance by default. See exactly how every endpoint sources and handles your data.

Yours to control. You set the schema, effort, and scope of every call. No retained corpus, no lock-in, just clean data you own.

See exactly how we source and handle data in the documentation.

Mozilla Manifesto