OpenAI
OpenAI embedding provider documentation
OpenAI's text embedding models are widely used, and the text-embedding-3 family supports Matryoshka embeddings, so output dimensions can be reduced on request.
Overview
- Models: 3 models (text-embedding-3-large, text-embedding-3-small, text-embedding-ada-002)
- Key Features: Matryoshka embeddings (text-embedding-3 models), industry-standard performance
- API Docs: OpenAI Embeddings
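Matryoshka embeddings are trained so that the leading components of a vector carry most of the information, which is what makes a reduced output size workable. The sketch below illustrates the idea locally, assuming you already hold a full-length vector: it keeps the first k values and rescales the result to unit length. `shorten_embedding` is a hypothetical helper, not part of catsu or the OpenAI SDK, and the sample vector is made up.

```python
import math
from typing import List, Sequence


def shorten_embedding(vector: Sequence[float], k: int) -> List[float]:
    """Keep the first k components and rescale to unit L2 norm.

    Hypothetical helper for illustration; not a catsu or OpenAI API.
    """
    if not 0 < k <= len(vector):
        raise ValueError(f"k must be between 1 and {len(vector)}, got {k}")
    head = list(vector[:k])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # an all-zero vector cannot be normalized
    return [x / norm for x in head]


# Made-up 4-dimensional vector standing in for a real embedding.
full = [0.6, 0.0, 0.8, 0.0]
short = shorten_embedding(full, 2)
print(short)
```

Requesting the smaller size through the `dimensions` parameter is usually simpler, since the service then returns the shortened vector directly; local truncation is mainly of interest when full-length vectors are already stored.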
Environment Variable
export OPENAI_API_KEY="your-openai-api-key"
Supported Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | str | Yes | Model identifier (openai:model-name) |
| input | str or List[str] | Yes | Text(s) to embed |
| dimensions | int | No | Output dimensions (text-embedding-3 models only) |
Note: OpenAI does not support input_type. This parameter is ignored if provided.
Examples
Basic Usage
from catsu import Client
client = Client()
response = client.embed(
    "openai:text-embedding-3-small",
    "Hello, OpenAI!"
)
print(f"Dimensions: {response.dimensions}")  # 1536
print(f"Tokens: {response.usage.tokens}")
With Custom Dimensions (text-embedding-3 only)
# Reduce dimensions for faster similarity search
response = client.embed(
    "openai:text-embedding-3-small",
    ["Sample text"],
    dimensions=512  # vs default 1536
)
print(f"Dimensions: {response.dimensions}")  # 512
# Large model with custom dimensions
response = client.embed(
    "openai:text-embedding-3-large",
    ["Sample text"],
    dimensions=1024  # vs default 3072
)
Batch Processing
texts = [
    "First document",
    "Second document",
    "Third document"
]
response = client.embed(
    "openai:text-embedding-3-small",
    texts
)
print(f"Processed {len(response.embeddings)} texts")
print(f"Total tokens: {response.usage.tokens}")
Async Usage
import asyncio
from catsu import Client
async def main():
    client = Client()
    response = await client.aembed(
        "openai:text-embedding-3-small",
        "Async embedding"
    )
    print(response.embeddings)
asyncio.run(main())
Model Variants
OpenAI offers three embedding models:
- text-embedding-3-large - Highest quality, 3072 dimensions
- text-embedding-3-small - Balanced performance, 1536 dimensions
- text-embedding-ada-002 - Legacy model, 1536 dimensions
For pricing and benchmarks, visit catsu.dev.
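The default sizes listed above translate directly into storage cost, which is one reason to consider a smaller `dimensions` value. As a rough sketch, assuming vectors are stored as 32-bit floats and ignoring any index overhead, raw storage can be estimated as follows. `index_size_bytes` is a hypothetical helper introduced here for illustration.

```python
# Default output sizes as listed on this page.
DEFAULT_DIMENSIONS = {
    "text-embedding-3-large": 3072,
    "text-embedding-3-small": 1536,
    "text-embedding-ada-002": 1536,
}

BYTES_PER_FLOAT32 = 4  # assumption: vectors are stored as 32-bit floats


def index_size_bytes(num_vectors: int, dimensions: int) -> int:
    """Raw vector storage in bytes, ignoring index overhead."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32


for model, dims in DEFAULT_DIMENSIONS.items():
    mib = index_size_bytes(1_000_000, dims) / 2**20
    print(f"{model}: {mib:.0f} MiB per million vectors")
```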
Special Notes
- input_type is NOT supported - OpenAI ignores this parameter
- Dimensions supported for text-embedding-3 models only
- text-embedding-ada-002 does not support custom dimensions
- Maximum 8191 tokens per input
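The notes above can be enforced before a request is sent. The following is a minimal sketch, not part of catsu: `build_embed_options` and `DIMENSION_AWARE_MODELS` are hypothetical names, and the model list is taken from this page. It rejects `dimensions` for models that do not support it and silently drops `input_type`, mirroring the fact that OpenAI ignores it.

```python
from typing import Any, Dict, Optional

# Models listed on this page that accept a custom `dimensions` value.
DIMENSION_AWARE_MODELS = {"text-embedding-3-large", "text-embedding-3-small"}


def build_embed_options(
    model: str,
    dimensions: Optional[int] = None,
    input_type: Optional[str] = None,
) -> Dict[str, Any]:
    """Return keyword options that are safe to send for an OpenAI model.

    Hypothetical helper; `model` uses the "openai:model-name" form.
    """
    provider, _, name = model.partition(":")
    if provider != "openai" or not name:
        raise ValueError(f"expected 'openai:model-name', got {model!r}")
    options: Dict[str, Any] = {}
    if dimensions is not None:
        if name not in DIMENSION_AWARE_MODELS:
            raise ValueError(f"{name} does not support custom dimensions")
        options["dimensions"] = dimensions
    # input_type is deliberately dropped: OpenAI ignores it.
    return options


print(build_embed_options("openai:text-embedding-3-small", dimensions=512))
```

The returned dictionary could then be unpacked into a call such as `client.embed(model, texts, **options)`, assuming keyword arguments are passed as in the examples on this page.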
Next Steps
- Providers Overview - Compare all providers
- Best Practices: Model Selection - Choose the right model