How We Translated 118 Pages Into 30 Languages for $22 Each
Last month we shipped a project that would have cost somewhere between $220,000 and $660,000 at typical translation agency rates. We did it for $660 total in API fees. That's 118 pages translated into 30 languages at roughly $22 per language. No, that's not a typo. And no, the quality wasn't garbage.
I want to walk through exactly how we pulled this off -- the architecture, the tooling, the prompt engineering, the quality assurance process, and the honest tradeoffs. Because cheap doesn't have to mean bad, but it does mean you need to be smart about where you invest your effort.
Table of Contents
- The Project Scope
- Why Traditional Translation Is So Expensive
- Our AI Translation Architecture
- The Prompt Engineering That Actually Matters
- Cost Breakdown: Where the $22 Goes
- Quality Assurance Without Native Speakers
- Technical Implementation in Next.js
- What AI Translation Gets Wrong
- When You Should Still Pay for Human Translation
- FAQ
The Project Scope
The client was a B2B SaaS company expanding into European, Asian, and Latin American markets. Their marketing site had 118 pages: landing pages, feature pages, blog posts, legal pages, and documentation. The content was originally in English.
The target languages included the usual suspects -- Spanish, French, German, Japanese, Korean, Mandarin Chinese -- plus some that are harder to find translators for, like Estonian, Latvian, Lithuanian, and Slovenian. Thirty languages total.
Some quick math on the content volume:
| Metric | Count |
|---|---|
| Total pages | 118 |
| Average words per page | ~620 |
| Total English words | ~73,160 |
| Total translated words | ~2,194,800 (73,160 × 30) |
| Languages | 30 |
| Total cost | ~$660 |
| Cost per language | ~$22 |
| Cost per word (translated) | $0.0003 |
For context, professional human translation typically runs $0.10 to $0.30 per word depending on the language pair. At the midpoint of $0.20/word, we'd be looking at $14,632 per language or $438,960 total. Even the budget agencies that use machine translation with light human review charge $0.05-0.08 per word.
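The comparison is plain arithmetic, so here is a quick sketch that reproduces it from the figures in the table. The function and variable names are ours, chosen for illustration.

```python
# Reproduce the agency-cost comparison from the content-volume table.
ENGLISH_WORDS = 73_160
LANGUAGES = 30
AI_TOTAL_COST = 660  # dollars, API fees only


def agency_cost(rate_per_word: float) -> tuple[float, float]:
    """Return (cost per language, cost for all languages) at a per-word rate."""
    per_language = ENGLISH_WORDS * rate_per_word
    return per_language, per_language * LANGUAGES


for rate in (0.10, 0.20, 0.30):
    per_language, total = agency_cost(rate)
    print(f"${rate:.2f}/word: ${per_language:,.0f} per language, ${total:,.0f} total")

# Our spend expressed per translated word.
ai_cost_per_word = AI_TOTAL_COST / (ENGLISH_WORDS * LANGUAGES)
print(f"AI pipeline: ${ai_cost_per_word:.4f} per translated word")
```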
Why Traditional Translation Is So Expensive
I don't want to bash the translation industry. Human translators do incredible work, and for certain content types, there's no substitute. But here's what drives the cost:
Per-word pricing models were designed for a world where every word required human cognitive effort. A translator might handle 2,000-3,000 words per day for technical content. At 73,160 words, that's 24-36 translator-days per language. Multiply by 30 languages and you're looking at 720-1,080 person-days of work.
Rare language pairs cost more. Finding a quality English-to-Latvian technical translator isn't easy. Supply and demand kicks in.
Project management overhead is real. Translation agencies have project managers coordinating between translators, reviewers, and clients. That overhead gets baked into the per-word rate.
Context switching costs time. A translator working on your marketing copy needs to understand your brand voice, your product terminology, and your audience. That ramp-up time gets amortized across the project, but it's real.
None of this is wasteful -- it's just expensive. And for a company testing new markets, spending $400K on translation before you've validated product-market fit in those regions is a hard pill to swallow.
Our AI Translation Architecture
Here's the system we built. It's not a single API call -- it's a pipeline.
Step 1: Content Extraction and Segmentation
The site was built with Next.js, which made our job easier. All the content lived in structured data files (MDX for blog posts, JSON for UI strings, and structured content from a headless CMS).
We wrote a script that crawled all content sources and produced a normalized intermediate format:
interface TranslationUnit {
  id: string;               // unique key like "homepage.hero.title"
  source: string;           // English text
  context: string;          // where this appears (page, section)
  type: 'heading' | 'paragraph' | 'ui-string' | 'legal' | 'meta';
  maxLength?: number;       // for UI strings with space constraints
  glossaryTerms: string[];  // product-specific terms found in this unit
}
This is critical. You don't want to throw entire pages at an LLM and hope for the best. Segmenting content into translation units gives you control over context, lets you handle different content types differently, and makes incremental updates possible later.
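To make the segmentation idea concrete, here is a minimal Python sketch that splits a markdown page into one unit per heading or paragraph. It is an illustration under simplifying assumptions, not our extraction script. The `TranslationUnit` dataclass mirrors a subset of the interface above, `segment_markdown` is a hypothetical helper, and the positional ids (`home.0`, `home.1`) stand in for the semantic keys a real pipeline would want.

```python
import re
from dataclasses import dataclass, field


@dataclass
class TranslationUnit:
    id: str
    source: str
    context: str
    type: str  # 'heading' or 'paragraph' in this sketch
    glossary_terms: list[str] = field(default_factory=list)


def segment_markdown(page_id: str, markdown: str, glossary: list[str]) -> list[TranslationUnit]:
    """Split a markdown page into one unit per blank-line-separated block."""
    blocks = [b.strip() for b in re.split(r"\n\s*\n", markdown) if b.strip()]
    units = []
    for index, block in enumerate(blocks):
        units.append(TranslationUnit(
            id=f"{page_id}.{index}",  # positional id; an assumption for brevity
            source=block,
            context=f"page={page_id}, block={index}",
            type="heading" if block.startswith("#") else "paragraph",
            # Record which glossary terms occur so the prompt can enforce them.
            glossary_terms=[t for t in glossary if t.lower() in block.lower()],
        ))
    return units


page = "# Welcome to CloudSync\n\nSync files across every device."
for unit in segment_markdown("home", page, ["CloudSync"]):
    print(unit.id, unit.type, unit.glossary_terms)
```

Positional ids are fine for a one-off run, but they shift whenever a block is inserted, so stable keys matter once you want incremental updates.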
Step 2: Glossary and Style Guide Generation
Before translating a single word, we built a glossary. This included:
- Product names (never translate these)
- Technical terms with preferred translations
- Brand-specific phrases
- Tone guidelines per content type
We actually used Claude to help build the initial glossary by analyzing the English content and identifying terms that would need consistent translation. Then we had the client review and approve it.
Step 3: Batch Translation with Claude API
We used the Claude 3.5 Sonnet API for the actual translation (Claude Sonnet 4 is now available and is even better for this). Why Claude over GPT-4o or Gemini? A few reasons:
- Better at following complex system prompts consistently
- More natural output in Romance and Germanic languages in our testing
- The 200K context window let us include full glossaries and style guides in every request
- Pricing was competitive for our use case
We batched translation units in groups of 20-30, organized by page and content type. Each batch included the glossary, style guide, and context about where the text appeared.
import json

import anthropic

# Reads ANTHROPIC_API_KEY from the environment.
client = anthropic.Anthropic()


def translate_batch(units: list[dict], target_lang: str, glossary: dict, style_guide: str) -> list[dict]:
    """Translate one batch of translation units into target_lang."""
    system_prompt = f"""You are a professional translator specializing in {target_lang}
localization for B2B software companies.

GLOSSARY (use these exact translations):
{json.dumps(glossary[target_lang], indent=2, ensure_ascii=False)}

STYLE GUIDE:
{style_guide}

RULES:
- Preserve all markdown formatting
- Never translate product names listed in the glossary
- Adapt idioms naturally -- don't translate literally
- For UI strings with maxLength, stay within the character limit
- Output valid JSON matching the input structure"""

    user_prompt = f"""Translate the following translation units to {target_lang}.
Return a JSON array with the same structure, replacing 'source' with 'translation'.

{json.dumps(units, indent=2, ensure_ascii=False)}"""

    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # Claude Sonnet 4; our original run used Claude 3.5 Sonnet
        max_tokens=8192,
        system=system_prompt,
        messages=[{"role": "user", "content": user_prompt}],
    )
    # Assumes the reply is bare JSON; anything else raises json.JSONDecodeError.
    return json.loads(response.content[0].text)
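The batching that feeds a function like `translate_batch` can be sketched as follows. This is a simplified illustration. `make_batches` is a hypothetical helper, and it assumes the page name is the first dotted segment of each unit id (as in "homepage.hero.title").

```python
from itertools import groupby


def make_batches(units: list[dict], max_size: int = 30) -> list[list[dict]]:
    """Group units by (page, content type), then split each group into batches."""
    def page_and_type(unit: dict) -> tuple[str, str]:
        # Assumption: the page is the first segment of the dotted id.
        return unit["id"].split(".")[0], unit["type"]

    batches = []
    # groupby needs sorted input; sorted() is stable, so order within a group is kept.
    for _, group in groupby(sorted(units, key=page_and_type), key=page_and_type):
        members = list(group)
        for start in range(0, len(members), max_size):
            batches.append(members[start:start + max_size])
    return batches


units = [{"id": f"homepage.body{i}", "type": "paragraph"} for i in range(45)]
units.append({"id": "homepage.hero.title", "type": "heading"})
print([len(batch) for batch in make_batches(units)])
```

Keeping a batch to one page and one content type means the context line in the prompt is true for every unit in it.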
Step 4: Automated Quality Checks
After translation, every unit ran through automated checks:
- Format preservation: Did markdown, HTML tags, and variables survive?
- Length validation: Are UI strings within their max length?
- Glossary compliance: Were product names left untranslated?
- Placeholder integrity: Are {variable} placeholders intact?
- Back-translation sampling: Translate 10% of output back to English and compare semantic similarity
About 3-4% of translation units failed one or more checks and went through a second pass with specific correction instructions.
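The deterministic checks are simple to write. Here is a minimal sketch of four of them. The regular expressions and the `check_unit` helper are our illustrative assumptions, and a production version would need to cover your own placeholder and markup syntax.

```python
import re
from typing import Optional

PLACEHOLDER = re.compile(r"\{[A-Za-z_][A-Za-z0-9_]*\}")  # e.g. {name}
TAG = re.compile(r"</?[A-Za-z][^>]*>")                   # e.g. <b>, </a>


def check_unit(source: str, translation: str, protected_terms: list[str],
               max_length: Optional[int] = None) -> list[str]:
    """Return the names of the checks this translation fails (empty list = pass)."""
    failures = []
    # Placeholders must survive unchanged, though their order may differ.
    if sorted(PLACEHOLDER.findall(source)) != sorted(PLACEHOLDER.findall(translation)):
        failures.append("placeholder_integrity")
    # The same set of HTML tags must appear on both sides.
    if sorted(TAG.findall(source)) != sorted(TAG.findall(translation)):
        failures.append("format_preservation")
    if max_length is not None and len(translation) > max_length:
        failures.append("length_validation")
    # Protected product names present in the source must appear verbatim.
    if any(term in source and term not in translation for term in protected_terms):
        failures.append("glossary_compliance")
    return failures


print(check_unit("Hello {name}", "Hola {nombre}", []))
```

Units with a non-empty failure list are the ones that go back for a second pass, with the failed check names included in the correction instructions.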
Step 5: Assembly and Integration
Translated units got assembled back into the format the Next.js app expected -- JSON locale files, translated MDX, and CMS entries. We used next-intl for the routing and locale management.
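For the JSON locale files, assembly mostly means turning flat dotted keys back into nested objects. Here is a rough sketch, assuming unit ids are dotted paths like "homepage.hero.title" and that each locale gets one nested messages file; `assemble_messages` is a hypothetical helper.

```python
import json


def assemble_messages(translations: dict[str, str]) -> dict:
    """Turn {"homepage.hero.title": "..."} into a nested messages object."""
    messages: dict = {}
    for key, text in translations.items():
        *parents, leaf = key.split(".")
        node = messages
        for part in parents:
            node = node.setdefault(part, {})  # create intermediate levels on demand
        node[leaf] = text
    return messages


french = {
    "homepage.hero.title": "Synchronisez vos fichiers",
    "nav.pricing": "Tarifs",
}
# ensure_ascii=False keeps accented and non-Latin characters readable in the file.
print(json.dumps(assemble_messages(french), indent=2, ensure_ascii=False))
```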
The Prompt Engineering That Actually Matters
I've seen people throw text at ChatGPT and call it "AI translation." That gives you maybe 70% quality. The gap between 70% and 95% is entirely in how you prompt.
Here's what moved the needle:
Context is everything
Telling the model "translate this to French" gives you generic output. Telling it "translate this hero headline for a B2B SaaS landing page targeting IT directors in France, maintaining a confident but not aggressive tone" gives you something usable.
We included the page type, the target audience, and the purpose of each content block in every request.
Few-shot examples per language
For each language, we created 5-10 example translations that captured the tone we wanted. These went into the system prompt. For languages where we had a native speaker on the team or in our network (about 8 of the 30), we had them write these examples. For the rest, we generated them and then refined through back-translation comparison.
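Put together, the context and the few-shot examples become one block of the system prompt. The sketch below shows one way to build it. The field labels and the `build_context_block` helper are illustrative assumptions, not the exact wording we shipped.

```python
def build_context_block(page_type: str, audience: str, purpose: str,
                        examples: list[tuple[str, str]]) -> str:
    """Render page context plus few-shot tone examples as prompt text."""
    lines = [
        f"PAGE TYPE: {page_type}",
        f"AUDIENCE: {audience}",
        f"PURPOSE: {purpose}",
        "",
        "EXAMPLES OF THE TONE WE WANT:",
    ]
    for source, target in examples:
        lines.append(f"EN: {source}")
        lines.append(f"TARGET: {target}")
    return "\n".join(lines)


block = build_context_block(
    page_type="B2B SaaS landing page, hero headline",
    audience="IT directors in France",
    purpose="Confident but not aggressive; drive trial signups",
    examples=[("Start your free trial", "Démarrez votre essai gratuit")],
)
print(block)
```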
Glossary enforcement
This sounds obvious but it's the most impactful thing you can do. Without a glossary, the model will translate your product name "CloudSync" to the equivalent of "cloud synchronization" in some languages. It'll use different terms for the same feature across pages. Inconsistency kills trust.
Chunking strategy
We found that translating 500-800 words at a time, grouped by page section, gave the best results. Too small (individual sentences) and you lose context. Too large (entire pages) and quality degrades toward the end of the output.
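A greedy packer is enough to hit that range. The sketch below groups consecutive paragraphs of one page section into chunks of at most 800 words. `chunk_by_words` is a hypothetical helper, and counting words by splitting on whitespace is an approximation that suits English source text.

```python
def chunk_by_words(paragraphs: list[str], max_words: int = 800) -> list[list[str]]:
    """Greedily pack consecutive paragraphs into chunks of at most max_words words."""
    chunks: list[list[str]] = []
    current: list[str] = []
    count = 0
    for paragraph in paragraphs:
        words = len(paragraph.split())
        # Start a new chunk when adding this paragraph would exceed the limit.
        if current and count + words > max_words:
            chunks.append(current)
            current, count = [], 0
        current.append(paragraph)
        count += words
    if current:
        chunks.append(current)
    return chunks


section = ["word " * 300, "word " * 300, "word " * 300, "word " * 100]
print([len(chunk) for chunk in chunk_by_words(section)])
```

A single paragraph longer than the limit still ends up in a chunk of its own; the sketch never splits paragraphs mid-sentence.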
Cost Breakdown: Where the $22 Goes
Let's get specific about the money.
| Cost Component | Per Language | Total (30 langs) |
|---|---|---|
| Claude API (translation) | $16.40 | $492.00 |
| Claude API (QA/back-translation) | $3.20 | $96.00 |
| Claude API (glossary generation) | $0.80 | $24.00 |
| Misc API calls (retries, corrections) | $1.60 | $48.00 |
| Total API costs | $22.00 | $660.00 |
This doesn't include engineering time to build the pipeline, which was about 40 hours. But that pipeline is now reusable. When the client adds a new blog post, translating it into all 30 languages costs about $2-4 in API fees and runs automatically in their CI/CD pipeline.
The Claude API pricing at the time of our project (using Claude 3.5 Sonnet) was $3 per million input tokens and $15 per million output tokens. With Claude Sonnet 4, the pricing is comparable but you get better quality, which means fewer retries.
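If you want to estimate your own bill, the formula is short. The sketch below uses the Claude 3.5 Sonnet prices quoted above. The token counts in the example are hypothetical, picked only to show the calculation, and are not measurements from our project.

```python
# Prices quoted above, in dollars per million tokens.
INPUT_PRICE = 3.00
OUTPUT_PRICE = 15.00


def api_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a request (or a whole run) from its token counts."""
    return (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1_000_000


# Hypothetical token counts for illustration only.
example = api_cost(input_tokens=400_000, output_tokens=150_000)
print(f"${example:.2f}")
```

Real counts come back on every API response (`response.usage.input_tokens` and `response.usage.output_tokens` in the Anthropic Python SDK), so you can sum them per language as the pipeline runs. Remember that the glossary and style guide are re-sent as input with every request, which is one reason batching units pays off.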
Quality Assurance Without Native Speakers
This is the part people are most skeptical about, and honestly, they should be. Here's our actual QA process:
Automated checks (catches ~60% of issues)
The format preservation, length, and glossary checks I mentioned. These are deterministic and catch the most embarrassing errors -- broken HTML, missing variables, translated brand names.
Back-translation comparison (catches ~25% of remaining issues)
We translated a random 10% sample of each language back to English using a different model (GPT-4o) and compared semantic similarity with the original. If the back-translation diverged significantly, we flagged it for review.
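The sampling and flagging logic looks roughly like this. Note the swap: `lexical_similarity` is a crude word-overlap stand-in built on Python's `difflib`, not a real semantic measure, and the 0.6 threshold is an arbitrary assumption. A production setup would more likely plug in an embedding-based score through the `similarity` parameter.

```python
import random
from difflib import SequenceMatcher
from typing import Callable


def lexical_similarity(a: str, b: str) -> float:
    """Word-sequence overlap in [0, 1]; a rough proxy for semantic similarity."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


def sample_ids(unit_ids: list[str], fraction: float = 0.10, seed: int = 0) -> list[str]:
    """Pick a reproducible random sample of unit ids to back-translate."""
    if not unit_ids:
        return []
    k = min(len(unit_ids), max(1, round(len(unit_ids) * fraction)))
    return random.Random(seed).sample(unit_ids, k)


def flag_divergent(originals: dict[str, str], back_translations: dict[str, str],
                   threshold: float = 0.6,
                   similarity: Callable[[str, str], float] = lexical_similarity) -> list[str]:
    """Return ids whose back-translation drifted too far from the original English."""
    return [uid for uid, back in back_translations.items()
            if similarity(originals[uid], back) < threshold]


originals = {"a": "Sync your files across every device", "b": "Start your free trial today"}
backs = {"a": "Sync your files across every device", "b": "The weather is nice"}
print(flag_divergent(originals, backs))
```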
Native speaker spot-checks (catches nuance issues)
For the 8 languages where we had access to native speakers (Spanish, French, German, Portuguese, Japanese, Korean, Mandarin, Dutch), we had them review 15-20 pages each. Their feedback was illuminating:
- Overall quality: 8-9/10 for informational content
- Marketing headlines: 6-7/10 (needed more creative adaptation)
- Technical documentation: 9/10
- Legal pages: 7/10 (acceptable but not perfect)
Based on their feedback, we did a second pass on marketing headlines with more creative prompting, which brought those up to 8/10.
Community feedback loop
The client added a small "Suggest a better translation" link on every page. In the first month after launch, they received about 140 suggestions across all languages -- roughly one for every 25 translated pages (118 pages × 30 languages = 3,540). Most suggestions were stylistic preferences rather than errors.
Technical Implementation in Next.js
The site uses Next.js App Router with next-intl for internationalization. Here's the high-level setup:
// middleware.ts
import createMiddleware from 'next-intl/middleware';

export default createMiddleware({
  // English plus the 30 target languages: 31 locales in total
  locales: ['en', 'es', 'fr', 'de', 'ja', 'ko', 'zh', /* ... 24 more */],
  defaultLocale: 'en',
  localePrefix: 'as-needed'
});
For the headless CMS integration, translated content gets stored as locale variants. Blog posts in MDX get separate files per locale. UI strings live in JSON message files.
The build generates static pages for all locale/page combinations. That's 118 × 31 (including English) = 3,658 pages. With ISR (Incremental Static Regeneration), this is totally manageable.
One thing worth noting: we implemented hreflang tags programmatically for SEO. Each page links to all its language variants. This is critical for Google to understand your multilingual site structure.
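// app/[locale]/layout.tsx
// Assumed module: exports the same locale array that middleware.ts uses.
import { locales } from '@/i18n/locales';

// Map every locale to its URL for a given path; Next.js renders these as hreflang links.
function languageAlternates(pathname: string) {
  return Object.fromEntries(locales.map((l) => [l, `/${l}${pathname}`]));
}

// A layout can't read the current pathname, so this covers the locale root;
// each page needs the same mapping built with its own path.
export function generateMetadata() {
  return { alternates: { languages: languageAlternates('') } };
}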
What AI Translation Gets Wrong
I'd be dishonest if I said AI translation is perfect. Here's where it consistently struggles:
Marketing wordplay and puns. If your headline is clever in English, the AI will either translate it literally (losing the cleverness) or attempt a target-language pun that doesn't quite land. We rewrote about 15% of marketing headlines manually with creative direction.
Cultural adaptation. Translation and localization aren't the same thing. The AI won't know that your American case study about a "401(k) provider" means nothing in Japan. It won't swap your dollar signs for local currency in examples. It won't know that red means luck in China but danger in the West. This requires human thinking.
Legal precision. For terms of service and privacy policies, AI translation gets you 90% there. But legal language needs to be precise, and in some jurisdictions, you need legally certified translations. We flagged legal pages for professional review in the 12 markets where the client was doing actual business (as opposed to the other 18 which were exploratory).
Honorific systems. Japanese, Korean, and Thai have complex systems of formality. The AI sometimes mixed formal and informal registers within the same page. Our glossary and style guide helped, but spot-checks caught a few inconsistencies.
Gender agreement in gendered languages. French, Spanish, German, Arabic -- when the source English is gender-neutral, the AI has to make choices. Sometimes it's inconsistent. Our automated checks caught most of these by comparing gender markers across related translation units.
When You Should Still Pay for Human Translation
AI translation at $22 per language is the right choice when:
- You're testing new markets and need speed over perfection
- Your content is primarily informational or technical
- You have 10+ target languages (the per-language savings compound)
- You need to translate frequently (blog posts, changelogs, docs)
Pay for human translation when:
- Legal liability is involved (contracts, compliance docs)
- Brand voice is critical (taglines, campaigns)
- You're in a regulated industry (medical, financial)
- You have 1-3 target languages and the budget for it
- Cultural adaptation matters as much as linguistic accuracy
The sweet spot we've found for most clients? AI translation for the bulk, human review for the critical 10-20%. That typically brings the total cost to $50-100 per language instead of $22, but with near-human quality across all content types.
If you're considering a multilingual website build, reach out to us -- we've refined this pipeline across several projects and can adapt it to your stack, whether that's Next.js, Astro, or another framework. Check our pricing page for how we scope internationalization projects.
FAQ
How does AI translation quality compare to human translation in 2025?
For informational and technical content, the gap has narrowed dramatically. In blind tests, native speakers rate Claude and GPT-4o translations at 85-92% of human translation quality for most European and East Asian languages. The gap is wider for creative marketing copy (70-80%) and legal text (75-85%). For less common languages like Latvian or Estonian, AI quality is actually comparable to what you'd get from budget human translation agencies, which often use machine translation with light editing anyway.
What's the cheapest way to translate a website in 2025?
The cheapest approach is direct API access to models like Claude or GPT-4o, which runs $0.0002-0.0005 per word. Services like Weglot ($15-50/month) or Lokalise are more expensive per word but handle the infrastructure for you. Google Translate API is cheaper per word (~$20 per million characters) but quality is noticeably lower than frontier LLMs. Our pipeline approach with Claude cost us about $0.0003 per translated word including QA passes.
Does AI translation work for right-to-left languages like Arabic and Hebrew?
Yes, but you need to handle the technical implementation carefully. The translation quality for Arabic and Hebrew from Claude is good -- our Arabic spot-check scored 8/10. The harder part is the RTL layout implementation in your frontend. CSS logical properties (margin-inline-start instead of margin-left) and proper dir="rtl" attributes are essential. Plan for UI elements that need to be mirrored.
How do you handle SEO for a website translated into 30 languages?
Three things matter most: proper hreflang tags on every page, locale-specific URLs (subdirectories like /fr/ or /de/ work well), and translated metadata (titles, descriptions, Open Graph tags). We generate all of this programmatically. Don't forget to submit locale-specific sitemaps to Google Search Console. Within 3 months of launching the 30-language site, the client saw organic traffic from non-English queries increase by 340%.
Can AI translate website content that includes technical jargon?
This is actually where AI translation shines. Technical jargon is usually consistent and well-defined, which plays to the model's strengths. The key is building a glossary of your specific terms with approved translations. Without a glossary, the model might translate "deployment pipeline" three different ways across your site. With one, it's rock-solid consistent.
How long does it take to AI-translate an entire website?
Our pipeline translated all 118 pages into all 30 languages in about 6 hours of compute time, running parallel API requests with rate limiting. The engineering time to build the pipeline was about 40 hours for the first project. Subsequent projects using the same pipeline take 8-15 hours of engineering time for setup and customization, plus the compute time.
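The parallelism itself is a small amount of code. Here is a minimal `asyncio` sketch that caps the number of requests in flight. `translate_all` and `fake_translate` are hypothetical names, and a concurrency cap is only an approximation of rate limiting: it bounds simultaneous requests, not requests per minute.

```python
import asyncio


async def translate_all(jobs, translate_one, max_concurrent: int = 8):
    """Run translate_one(job) for every job with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(job):
        async with semaphore:  # waits here while max_concurrent jobs are running
            return await translate_one(job)

    # gather returns results in the same order as the jobs.
    return await asyncio.gather(*(guarded(job) for job in jobs))


async def fake_translate(job: str) -> str:
    """Stand-in for a real API call so the sketch runs offline."""
    await asyncio.sleep(0.01)
    return f"{job}: done"


print(asyncio.run(translate_all(["fr", "de", "ja"], fake_translate, max_concurrent=2)))
```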
What happens when you need to update content on a translated site?
This is where the segmented translation unit approach pays off massively. When a page changes, we diff the translation units against the previous version. Only changed or new units get re-translated. Updating a blog post across all 30 languages costs pennies and happens automatically in CI/CD. We track translation unit hashes to know exactly what's stale.
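The staleness check comes down to comparing hashes of the English source. A minimal sketch follows. The choice of SHA-256 and the `find_stale` helper are illustrative assumptions rather than a description of our exact implementation.

```python
import hashlib


def unit_hash(source: str) -> str:
    """Stable fingerprint of a unit's English source text."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def find_stale(current: dict[str, str], stored_hashes: dict[str, str]) -> list[str]:
    """Return ids of units that are new or whose English source changed."""
    return [uid for uid, source in current.items()
            if stored_hashes.get(uid) != unit_hash(source)]


previous = {"home.title": "Welcome", "home.cta": "Start now"}
stored = {uid: unit_hash(text) for uid, text in previous.items()}

updated = {"home.title": "Welcome", "home.cta": "Start today", "home.sub": "New line"}
print(find_stale(updated, stored))
```

Only the ids this returns go back through translation, which is why a small edit to one page costs pennies rather than a full re-run.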
Is $22 per language realistic for any website, or just certain types?
The $22 figure is specific to our project's content volume (~73K words) and content type (B2B SaaS marketing and docs). Your mileage will vary. A content-heavy site with 500K words might cost $100-150 per language. A simple 10-page marketing site might cost $3-5 per language. The cost scales linearly with word count and slightly with complexity. The fixed cost is the engineering time to build or configure the pipeline.