# A/B Testing Traffic Split: Standards & Best Practices

### Executive Summary

**TL;DR:** Proper randomization cannot guarantee an exact 50.00%/50.00% split. Industry practice accepts a cumulative split within ±4% of 50% (46/54 to 54/46). Our hash-based system typically stays within ±2%, which rates as good.

***

### Table of Contents

1. What is Acceptable Traffic Split?
2. Why Perfect 50/50 is Impossible
3. Industry Standards
4. Our Implementation
5. Daily Variance Explained
6. How to Evaluate Split Quality
7. Red Flags
8. FAQ

***

### What is Acceptable Traffic Split?

#### Standard Acceptance Ranges

| Quality Rating | Split Range | Example       | Status            |
| -------------- | ----------- | ------------- | ----------------- |
| ✅ Excellent    | 49-51%      | 49.5% / 50.5% | Best practice     |
| ✅ Good         | 48-52%      | 48% / 52%     | Industry standard |
| ⚠️ Acceptable  | 46-54%      | 46% / 54%     | Still valid       |
| ❌ Problematic  | <46 or >54% | 44% / 56%     | Investigate       |

**Key Principle:** With large sample sizes (10,000+ sessions per variant), even a 48/52 split has minimal impact on statistical validity.
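
The rating table above can be expressed as a small helper. This is a minimal sketch, not part of our production code: `rateSplit` and the `SplitRating` labels are hypothetical names, and the thresholds are taken directly from the table.

```typescript
type SplitRating = 'excellent' | 'good' | 'acceptable' | 'problematic'

// Classify a cumulative split using the acceptance ranges from the table.
// The rating depends on how far the control share sits from 50%.
function rateSplit(controlSessions: number, variantSessions: number): SplitRating {
  const total = controlSessions + variantSessions
  if (total <= 0) {
    throw new Error('At least one session is required')
  }

  const controlPct = (controlSessions / total) * 100
  const deviation = Math.abs(controlPct - 50) // percentage points from 50%

  if (deviation <= 1) return 'excellent' // 49-51%
  if (deviation <= 2) return 'good' // 48-52%
  if (deviation <= 4) return 'acceptable' // 46-54%
  return 'problematic' // <46% or >54%
}
```

For example, `rateSplit(4850, 5150)` falls in the 48-52% band and returns `'good'`.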

***

### Why Perfect 50/50 is Impossible

#### Mathematical Reality

**1. Odd Numbers Cannot Split Evenly**

```
1,001 total users = ?
→ 500/501 = 49.95% / 50.05%
→ 501/500 = 50.05% / 49.95%

Perfect 50.00%/50.00% is impossible
```

**2. Randomization Creates Variance**

Think of coin flips:

| Flips     | Expected        | Actual Example  | Split           |
| --------- | --------------- | --------------- | --------------- |
| 10        | 5/5             | 6/4             | 60%/40%         |
| 100       | 50/50           | 52/48           | 52%/48%         |
| 10,000    | 5,000/5,000     | 5,023/4,977     | 50.23%/49.77%   |
| 1,000,000 | 500,000/500,000 | 500,234/499,766 | 50.023%/49.977% |

**The Law of Large Numbers:** Results *approach* 50/50 as the sample grows; they are not guaranteed to *equal* 50/50.
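
The coin-flip table can be reproduced with a short simulation. This is a minimal sketch: `simulateFlips` is a hypothetical helper, and because it uses `Math.random` by default the exact counts differ on every run.

```typescript
// Flip a fair coin `flips` times and return the share of heads (0 to 1).
// `rng` is injectable so a seeded generator can be supplied for repeatable runs.
function simulateFlips(flips: number, rng: () => number = Math.random): number {
  let heads = 0
  for (let i = 0; i < flips; i++) {
    if (rng() < 0.5) heads++
  }
  return heads / flips
}

// Larger samples tend to land closer to 50%, but rarely on it exactly.
for (const flips of [10, 100, 10000, 1000000]) {
  const headsPct = simulateFlips(flips) * 100
  console.log(`${flips} flips: ${headsPct.toFixed(3)}% / ${(100 - headsPct).toFixed(3)}%`)
}
```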

**3. Continuous Testing**

Real-world A/B tests run continuously:

* Users arrive at different times
* Traffic volume fluctuates
* You can't "pause at exactly 5,000 per variant"
* **Forcing exact splits corrupts randomization**

***

### Industry Standards

#### Major A/B Testing Platforms

| Platform        | Typical Variance | Methodology                  |
| --------------- | ---------------- | ---------------------------- |
| Google Optimize | ±2-5%            | Hash-based bucketing         |
| Optimizely      | ±2-4%            | Murmur hash                  |
| VWO             | ±3-5%            | Random assignment            |
| Adobe Target    | ±2-5%            | Token-based                  |
| AB Tasty        | ±2-5%            | Hashing algorithm            |
| **Our System**  | **±2%**          | **Hash-based (competitive)** |

#### What Google Says

From Google's A/B Testing Best Practices:

> "Traffic allocation may vary by ±5% due to randomization. This is expected behavior and does not impact statistical validity with adequate sample sizes."

#### What Optimizely Says

From Optimizely's documentation:

> "Visitor distribution will approach but rarely equal exactly 50/50. Variations of ±3-5% are normal and expected."

***

### Our Implementation

#### Hash-Based Bucketing

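```typescript
// Hash a user/test pair to a non-negative integer (31-based string hash).
function hashUserId(userId: string, testId: string): number {
  const str = `${userId}_${testId}`
  let hash = 0

  for (let i = 0; i < str.length; i++) {
    const char = str.charCodeAt(i)
    hash = (hash << 5) - hash + char // Equivalent to hash * 31 + char
    hash |= 0 // Truncate to a signed 32-bit integer
  }

  return Math.abs(hash)
}

// Assignment: buckets 0-49 go to control, buckets 50-99 go to variant
function assignVariant(userId: string, testId: string): 'control' | 'variant' {
  const hash = hashUserId(userId, testId)
  return hash % 100 < 50 ? 'control' : 'variant'
}
```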

#### Why This Approach?

* ✅ **Deterministic:** Same user always sees same variant
* ✅ **Random:** Hash function provides pseudo-random distribution
* ✅ **Scalable:** Works with any number of users
* ✅ **Fast:** O(n) complexity where n = string length
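
One way to sanity-check these properties before a test launches is to bucket a batch of IDs and count the result. The sketch below repeats the hash from above so that it runs standalone; `measureSplit`, the synthetic `user-<n>` IDs, and the `checkout-test` test ID are hypothetical, and real user IDs may distribute differently.

```typescript
// Same string hash as in the bucketing code, repeated so this sketch runs standalone.
function hashUserId(userId: string, testId: string): number {
  const str = `${userId}_${testId}`
  let hash = 0
  for (let i = 0; i < str.length; i++) {
    hash = (hash << 5) - hash + str.charCodeAt(i)
    hash |= 0 // Truncate to a signed 32-bit integer
  }
  return Math.abs(hash)
}

function assignVariant(userId: string, testId: string): 'control' | 'variant' {
  return hashUserId(userId, testId) % 100 < 50 ? 'control' : 'variant'
}

// Bucket a list of user IDs and report how many land in each arm.
function measureSplit(userIds: string[], testId: string) {
  let control = 0
  for (const id of userIds) {
    if (assignVariant(id, testId) === 'control') control++
  }
  const variant = userIds.length - control
  return { control, variant, controlPct: (control / userIds.length) * 100 }
}

// Synthetic IDs for illustration only; substitute a sample of real IDs.
const sampleIds: string[] = []
for (let i = 0; i < 100000; i++) {
  sampleIds.push(`user-${i}`)
}
console.log(measureSplit(sampleIds, 'checkout-test'))
```

Running the same IDs through `measureSplit` twice gives identical counts, which is the deterministic property in practice.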

#### Expected Performance

* **Small tests (<1,000 users):** ±5-10% variance expected
* **Medium tests (1,000-10,000 users):** ±3-5% variance expected
* **Large tests (>10,000 users):** ±2-3% variance expected
* **Very large tests (>100,000 users):** ±1-2% variance expected
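
For reference, the spread expected from chance alone can be estimated with the normal approximation to the binomial distribution. The sketch below is an idealized baseline, assuming every session is assigned independently with probability 0.5; `chanceMarginPct` is a hypothetical helper. Because our bucketing is per user and the number of sessions per user varies, observed session splits can be wider than this baseline.

```typescript
// Approximate 95% margin (in percentage points around 50%) for one arm's share
// when each of `n` sessions is assigned independently with probability 0.5.
// Normal approximation: 1.96 * sqrt(0.5 * 0.5 / n).
function chanceMarginPct(n: number): number {
  if (n <= 0) throw new Error('n must be positive')
  return 1.96 * Math.sqrt(0.25 / n) * 100
}

for (const n of [1000, 10000, 100000]) {
  console.log(`${n} sessions: about ±${chanceMarginPct(n).toFixed(2)} points`)
}
```

For 1,000, 10,000, and 100,000 sessions this gives roughly ±3.10, ±0.98, and ±0.31 points.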

***

### Daily Variance Explained

#### Why Splits Vary Day-to-Day

Real example from our tests:

| Date      | Control Sessions | Variant Sessions | Daily Split       | Gap (Variant minus Control)   |
| --------- | ---------------- | ---------------- | ----------------- | ----------------------------- |
| Jan 15    | 4,500            | 4,400            | 50.6% / 49.4%     | -1.2%                         |
| Jan 16    | 4,900            | 5,200            | 48.5% / 51.5%     | +3.0%                         |
| Jan 17    | 5,100            | 5,800            | 46.8% / 53.2%     | +6.4%                         |
| Jan 18    | 3,900            | 4,500            | 46.4% / 53.6%     | +7.2%                         |
| Jan 19    | 1,400            | 1,500            | 48.3% / 51.7%     | +3.4%                         |
| **Total** | **19,800**       | **21,400**       | **48.1% / 51.9%** | **+3.8%** (±1.9% from 50%) ✅ |

#### Key Insight

Individual days can show a gap of about 7% between variants (±3.6% from 50%), but **cumulative results converge** to within ±2% of 50%.

#### Why Daily Variance Occurs

1. **Different user populations** - Weekday vs weekend traffic patterns
2. **Traffic source mix** - Email campaigns, ads, organic search vary daily
3. **Geographic distribution** - Different regions browse at different times
4. **User ID patterns** - Sequential IDs and ID formats that change over time
5. **Sample size** - Smaller daily samples have higher variance

**This is normal and expected. Do not panic over single-day variance.**
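
The daily and cumulative figures in the table can be recomputed from the raw session counts. This is a minimal sketch using the five days above; `DailyCounts` and `controlSharePct` are hypothetical names.

```typescript
interface DailyCounts {
  date: string
  control: number
  variant: number
}

// Session counts copied from the table above.
const dailyCounts: DailyCounts[] = [
  { date: 'Jan 15', control: 4500, variant: 4400 },
  { date: 'Jan 16', control: 4900, variant: 5200 },
  { date: 'Jan 17', control: 5100, variant: 5800 },
  { date: 'Jan 18', control: 3900, variant: 4500 },
  { date: 'Jan 19', control: 1400, variant: 1500 },
]

// Control's share of sessions, as a percentage.
function controlSharePct(control: number, variant: number): number {
  return (control / (control + variant)) * 100
}

// Per-day shares: each is computed from a relatively small sample.
for (const day of dailyCounts) {
  console.log(day.date, controlSharePct(day.control, day.variant).toFixed(1))
}

// Cumulative share: sum the counts first, then divide.
// Averaging the daily percentages would weight low-traffic days too heavily.
const totalControl = dailyCounts.reduce((sum, d) => sum + d.control, 0)
const totalVariant = dailyCounts.reduce((sum, d) => sum + d.variant, 0)
console.log('Total', controlSharePct(totalControl, totalVariant).toFixed(1)) // 48.1
```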

***

### How to Evaluate Split Quality

#### Step 1: Check Cumulative Split

```
Total Control Sessions: A
Total Variant Sessions: B

Split = A / (A + B) × 100%

Good if:       48% ≤ Split ≤ 52%
Acceptable if: 46% ≤ Split ≤ 54%
```

#### Step 2: Calculate Sample Size Per Variant

```
Minimum recommended: 1,000 sessions per variant
Good: 10,000+ per variant
Excellent: 50,000+ per variant
```

#### Step 3: Check Test Duration

```
Minimum: 7 days (to capture weekly patterns)
Recommended: 14 days
```

#### Step 4: Ignore Daily Fluctuations

* ❌ Don't evaluate: "Jan 17 showed 47/53 - is this broken?"
* ✅ Do evaluate: "After 2 weeks, total split is 48.5/51.5"

#### Example Evaluation

```
Test Results:
- Duration: 14 days ✅
- Control: 45,230 sessions
- Variant: 47,890 sessions
- Total: 93,120 sessions
- Split: 48.6% / 51.4% ✅
- Per variant: 45,000+ sessions ✅

Verdict: GOOD split quality (within 48-52%)
```
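
Steps 1 through 3 can be combined into a single check. The sketch below is illustrative rather than our production tooling: `evaluateSplit` and its result shape are hypothetical, and the thresholds are the 48-52% range, the 1,000-session minimum, and the 7-day minimum from the steps above.

```typescript
interface SplitEvaluation {
  controlPct: number
  splitInRange: boolean // Step 1: cumulative split within 48-52%
  enoughSessions: boolean // Step 2: at least 1,000 sessions per variant
  longEnough: boolean // Step 3: at least 7 days
}

function evaluateSplit(control: number, variant: number, days: number): SplitEvaluation {
  const controlPct = (control / (control + variant)) * 100
  return {
    controlPct,
    splitInRange: controlPct >= 48 && controlPct <= 52,
    enoughSessions: Math.min(control, variant) >= 1000,
    longEnough: days >= 7,
  }
}

// The example evaluation above: 14 days, 45,230 control, 47,890 variant.
console.log(evaluateSplit(45230, 47890, 14))
```

All three flags come back `true` for the example, with a control share of about 48.6%.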

***

### Red Flags

#### 🚩 Claims That Should Make You Suspicious

**"We guarantee exactly 50.00%/50.00%"**

**Why it's wrong:** True randomization cannot guarantee an exact split, and an odd number of users cannot be divided evenly at all.

**What they're probably doing:**

* Sequential assignment (User 1→A, User 2→B, User 3→A...)
* Stopping test at arbitrary point to force balance
* Rounding display (showing 50.0% but actual is 50.3%)
* Manipulating data post-collection

**"Our algorithm adjusts in real-time to maintain 50/50"**

**Why it's wrong:** Real-time rebalancing introduces bias.

**The problem:**

```
Day 1: 100 Control, 110 Variant (bias toward Variant)
Day 2: Algorithm "corrects" by sending more to Control
Day 3: Different user population now overweighted to Control
Result: Biased sample, invalid test
```

**"Daily splits are always within ±1%"**

**Why it's suspicious:** Daily variance of ±5-10% is normal.

**Possible explanations:**

* Very large traffic (100,000+ daily sessions)
* Forced balancing (compromises randomization)
* Cherry-picked reporting

***

### FAQ

#### Q: Is 48/52 split good enough?

**A:** Yes, absolutely. This is well within industry standards and has minimal impact on statistical validity with adequate sample sizes (10,000+ per variant).

#### Q: Should I stop a test if one day shows 55/45?

**A:** No. Daily variance is normal. Evaluate based on cumulative results across the full test period.

#### Q: When IS traffic imbalance a real problem?

**A:** Investigate if:

* Cumulative split is beyond 46/54 (more than ±4% from 50%)
* One variant consistently gets 60%+ traffic every day
* Split gets worse over time instead of converging
* Sample sizes are drastically different (e.g., 1,000 vs 10,000)
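
Two of these conditions lend themselves to an automated check. This is a rough sketch, assuming daily session counts are available; `findImbalanceFlags` and its messages are hypothetical, and the check for a split that worsens over time is left out because it depends on how "worse" is defined.

```typescript
interface DayCounts {
  control: number
  variant: number
}

// Return warnings for the cumulative and every-day conditions listed above.
function findImbalanceFlags(days: DayCounts[]): string[] {
  const flags: string[] = []
  const active = days.filter(d => d.control + d.variant > 0) // skip empty days
  const control = active.reduce((sum, d) => sum + d.control, 0)
  const variant = active.reduce((sum, d) => sum + d.variant, 0)
  const total = control + variant
  if (total === 0) return ['No sessions recorded']

  // Cumulative split beyond 46/54
  const controlPct = (control / total) * 100
  if (controlPct < 46 || controlPct > 54) {
    flags.push(`Cumulative control share ${controlPct.toFixed(1)}% is outside 46-54%`)
  }

  // One variant receives 60% or more of traffic on every day
  const controlHeavy = active.every(d => d.control >= 0.6 * (d.control + d.variant))
  const variantHeavy = active.every(d => d.variant >= 0.6 * (d.control + d.variant))
  if (controlHeavy || variantHeavy) {
    flags.push('One variant received 60% or more of traffic every day')
  }

  return flags
}
```

An empty result means neither condition was triggered; it does not replace the other checks in this list.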

#### Q: Our competitor claims perfect 50/50 splits. Are we doing something wrong?

**A:** No. Either:

1. They're rounding the display (50.3% → "50%")
2. They're using sequential assignment (bad practice)
3. They're cherry-picking results
4. They're lying
5. Their sample size is tiny (easy to get 5/5 with 10 users)

Our ±2% variance is **at the tight end of the ranges listed for major platforms** under Industry Standards.

***

### Checklist: Is My Test Split Valid?

Use this checklist to evaluate any A/B test:

* [ ] Cumulative split is between 46-54%
* [ ] Each variant has 1,000+ sessions minimum
* [ ] Test has run for at least 7 days
* [ ] Users are randomly assigned (hash/random, not sequential)
* [ ] Same user always sees same variant (deterministic)
* [ ] Not manually adjusting splits mid-test
* [ ] Not stopping test to force exact balance
* [ ] Evaluating cumulative results, not daily snapshots

**If all boxes are checked:** Your test is valid regardless of whether it's 48/52 or 51/49.

***

### Conclusion

#### Key Takeaways

1. **Perfect 50.00%/50.00% is impossible** with proper randomization
2. **48-52% is the industry standard** (our system achieves this)
3. **Daily variance is normal** - evaluate cumulative results
4. **Large sample sizes matter more** than perfect balance
5. **Anyone guaranteeing exact 50/50 is using flawed methodology**

#### What Good Looks Like

```
✅ Test Duration: 14 days
✅ Total Sessions: 87,450
✅ Control: 42,180 (48.2%)
✅ Variant: 45,270 (51.8%)
✅ Split Quality: GOOD

Even with 3.6% imbalance, this test is statistically 
valid and reliable because:
- Large sample size (40,000+ per variant)
- Adequate test duration (2 weeks)
- Within acceptable variance (±4%)
```

#### What Bad Looks Like

```
❌ Test Duration: 2 days
❌ Total Sessions: 450
❌ Control: 280 (62.2%)
❌ Variant: 170 (37.8%)
❌ Split Quality: POOR

This test has problems:
- Small sample size (<500 per variant)
- Too short (2 days - not capturing weekly patterns)
- Significant imbalance (62/38, about 12% from 50/50)
```

***

### Contact

For questions about A/B testing methodology or traffic splits, contact:

* Engineering Team

**Remember:** Good A/B testing is about rigorous methodology, not perfect balance.


