Cost Tracking
Monitor and optimize your embedding costs
Catsu automatically tracks token usage and costs for all embedding requests.
Accessing Usage Data
Every embed() and aembed() response includes usage information:
import catsu

client = catsu.Client()

response = client.embed(
    model="voyage-3",
    input="Sample text"
)
# Access usage data
print(f"Tokens: {response.usage.tokens}")
print(f"Cost: ${response.usage.cost:.6f}")
print(f"Latency: {response.usage.latency:.3f}s")Cost Components
Cost Components
The usage object on every response has three fields:
class Usage:
    tokens: int     # Total tokens processed
    cost: float     # Cost in USD
    latency: float  # Request latency in seconds
Tracking Batch Costs
texts = ["Text 1", "Text 2", "Text 3"]
response = client.embed(model="voyage-3", input=texts)
# Total cost for batch
total_cost = response.usage.cost
print(f"Total: ${total_cost:.6f}")
# Cost per text
cost_per_text = total_cost / len(texts)
print(f"Per text: ${cost_per_text:.8f}")
# Tokens per text (approximate)
tokens_per_text = response.usage.tokens / len(texts)
print(f"Avg tokens per text: {tokens_per_text:.1f}")Accumulating Costs
Track costs across multiple requests:
total_cost = 0
total_tokens = 0
for batch in batches:
    response = client.embed(model="voyage-3", input=batch)
    total_cost += response.usage.cost
    total_tokens += response.usage.tokens
print(f"Total cost: ${total_cost:.4f}")
print(f"Total tokens: {total_tokens:,}")Local Tokenization
Estimate costs before making API calls using local tokenization:
from transformers import AutoTokenizer
# Load tokenizer for your model
tokenizer = AutoTokenizer.from_pretrained("model-name")
# Count tokens locally (no API call)
text = "Sample text to estimate cost"
tokens = len(tokenizer.encode(text))
# Estimate cost (example for voyage-3 at $0.12/M tokens)
estimated_cost = (tokens / 1_000_000) * 0.12
print(f"Estimated cost: ${estimated_cost:.8f}")For current pricing, check catsu.dev.
Cost Comparison
Compare costs across providers:
models_to_compare = ["voyage-3", "text-embedding-3-small", "embed-v4.0"]
sample_text = "Compare embedding costs across providers"
for model in models_to_compare:
    response = client.embed(model=model, input=sample_text)
    print(f"{model}:")
    print(f"  Cost: ${response.usage.cost:.8f}")
    print(f"  Tokens: {response.usage.tokens}")
    print(f"  Dimensions: {response.dimensions}")
Cost Optimization Strategies
Use Matryoshka Dimensions
# Full dimensions
full_response = client.embed(
    model="voyage-3",
    input="Text"
)

# Reduced dimensions (same API cost, lower storage/compute)
small_response = client.embed(
    model="voyage-3",
    input="Text",
    dimensions=256
)
print(f"API cost: same")
print(f"Storage cost: 75% reduction (256 vs 1024 dimensions)")Choose Cost-Effective Providers
Choose Cost-Effective Providers
Different providers have different pricing models. Compare current pricing on catsu.dev.
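Because providers price differently, it can help to normalize a measured request to an effective price per million tokens; a small sketch using the usage fields shown earlier:
response = client.embed(model="voyage-3", input="Normalize this request's cost")

# Observed rate for this request, in dollars per million tokens
effective_rate = response.usage.cost / response.usage.tokens * 1_000_000
print(f"Effective rate: ${effective_rate:.2f} per 1M tokens")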
Batch Efficiently
# Batching reduces per-text overhead
response = client.embed(
    model="voyage-3",
    input=texts  # Batch of many texts
)
cost_per_text = response.usage.cost / len(texts)
Monitoring in Production
import catsu

class EmbeddingService:
    def __init__(self):
        self.client = catsu.Client()
        self.total_cost = 0
        self.total_requests = 0

    def embed(self, text, model="voyage-3"):
        response = self.client.embed(model=model, input=text)
        # Track metrics
        self.total_cost += response.usage.cost
        self.total_requests += 1
        # Log to monitoring system
        self.log_metrics({
            "cost": response.usage.cost,
            "tokens": response.usage.tokens,
            "latency": response.usage.latency,
        })
        return response.embeddings[0]

    def log_metrics(self, metrics):
        # Placeholder: forward metrics to your monitoring system
        pass

    def get_stats(self):
        return {
            "total_cost": self.total_cost,
            "total_requests": self.total_requests,
            "avg_cost_per_request": self.total_cost / max(self.total_requests, 1),
        }
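A quick usage sketch of the service above:
service = EmbeddingService()

for doc in ["First document", "Second document"]:
    service.embed(doc)

stats = service.get_stats()
print(f"{stats['total_requests']} requests, "
      f"${stats['total_cost']:.6f} total, "
      f"${stats['avg_cost_per_request']:.6f} avg per request")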
Best Practices
- Monitor costs in real-time
- Set budget alerts
- Use Matryoshka dimensions to reduce downstream costs
- Compare provider costs for your workload
- Optimize batch sizes
- Consider cost vs quality trade-offs
- Track costs per user/tenant in multi-tenant apps (see the sketch below)
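For budget alerts and per-tenant tracking, a minimal sketch; the threshold and alerting hook are illustrative, not part of Catsu:
from collections import defaultdict

TENANT_BUDGET_USD = 5.00  # illustrative per-tenant budget

tenant_costs = defaultdict(float)

def embed_for_tenant(tenant_id, text, model="voyage-3"):
    response = client.embed(model=model, input=text)
    tenant_costs[tenant_id] += response.usage.cost
    if tenant_costs[tenant_id] > TENANT_BUDGET_USD:
        # Replace with your alerting integration (email, Slack, etc.)
        print(f"Budget alert: tenant {tenant_id} exceeded ${TENANT_BUDGET_USD:.2f}")
    return response.embeddings[0]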
Next Steps
- Models Catalog - Compare provider pricing
- Common Parameters: dimensions - Reduce dimensions to save costs
- Batch Processing - Optimize batching for cost