Catsu Docs

Jina AI

Jina AI embedding provider documentation

Jina AI provides multimodal and code-specific embedding models with context windows of up to 32,768 tokens.

Overview

  • Models: 6 models (v4 multimodal, v3, code models, v2 variants)
  • Key Features: Multimodal (text+image), code-optimized, up to 32,768 tokens context
  • API Docs: Jina AI Embeddings

Environment Variable

export JINA_API_KEY="your-jina-api-key"
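
A missing key usually surfaces only when the first request fails. The following is a minimal sketch of an up-front check, assuming the variable name shown above; `require_jina_key` is a hypothetical helper for illustration and is not part of Catsu.

```python
import os
from typing import Mapping, Optional


def require_jina_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Jina API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; not part of the Catsu API.
    """
    # Default to the real process environment; a dict can be passed for testing.
    env = os.environ if env is None else env
    key = env.get("JINA_API_KEY", "").strip()
    if not key:
        raise RuntimeError("JINA_API_KEY is not set; export it before embedding")
    return key
```

Calling `require_jina_key()` at startup turns a late authentication failure into an immediate, readable error.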

Supported Parameters

Parameter    Type             Required  Description
model        str              Yes       Model identifier
input        str | List[str]  Yes       Text(s) to embed
input_type   str              No        "query" or "document"
task         str              No        retrieval.query, retrieval.passage, text-matching, classification, separation
dimensions   int              No        Output dimensions (model-dependent)
normalized   bool             No        L2 normalize embeddings (default: True)
api_key      str              No        Override API key
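
Because `task` accepts only a fixed set of strings, a typo is easy to make and hard to spot. The sketch below checks the value locally before a request is sent. It uses only the parameter names and task values listed in the table; `build_embed_kwargs` is a hypothetical helper, and the assumption is that the resulting dictionary is passed on as keyword arguments to `client.embed`.

```python
from typing import Any, Dict, List, Optional, Union

# Task values copied from the parameter table above.
JINA_TASKS = {
    "retrieval.query",
    "retrieval.passage",
    "text-matching",
    "classification",
    "separation",
}


def build_embed_kwargs(
    model: str,
    input: Union[str, List[str]],
    task: Optional[str] = None,
    dimensions: Optional[int] = None,
    normalized: bool = True,
) -> Dict[str, Any]:
    """Assemble keyword arguments for an embed call, rejecting unknown tasks.

    Hypothetical helper for illustration; not part of the Catsu API.
    """
    if task is not None and task not in JINA_TASKS:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(JINA_TASKS)}")
    if dimensions is not None and dimensions <= 0:
        raise ValueError("dimensions must be a positive integer")

    kwargs: Dict[str, Any] = {"model": model, "input": input, "normalized": normalized}
    # Optional parameters are included only when explicitly set.
    if task is not None:
        kwargs["task"] = task
    if dimensions is not None:
        kwargs["dimensions"] = dimensions
    return kwargs
```

With a client in scope, the result would be used as `client.embed(**build_embed_kwargs(...))`.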

Examples

Basic Usage

import catsu

client = catsu.Client()  # expects JINA_API_KEY in the environment

response = client.embed(
    model="jina-embeddings-v3",
    input="Hello, Jina!"
)

Multimodal (v4)

response = client.embed(
    model="jina-embeddings-v4",
    input="Text that could be paired with images"
)

Code Embeddings

response = client.embed(
    model="jina-code-embeddings-1.5b",
    input="def calculate_similarity(a, b): return cosine(a, b)"
)

With Task Type

response = client.embed(
    model="jina-embeddings-v3",
    input="Search query",
    task="retrieval.query"
)
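
In a retrieval setup, queries are typically embedded with `retrieval.query` and documents with `retrieval.passage`, and the two sets of vectors are then compared. The sketch below covers only the comparison step in plain Python. It assumes the vectors have already been read from the responses and are L2-normalized (the default), so the dot product equals cosine similarity. `rank_documents` is a hypothetical helper, and the toy vectors stand in for real embeddings.

```python
from typing import List, Sequence, Tuple


def rank_documents(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs, best match first.

    Assumes all vectors are L2-normalized, so dot product == cosine similarity.
    """
    scores = [
        (i, sum(q * d for q, d in zip(query_vec, doc)))
        for i, doc in enumerate(doc_vecs)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy 2-dimensional unit vectors standing in for real embeddings.
query = [1.0, 0.0]
docs = [[0.0, 1.0], [0.6, 0.8], [1.0, 0.0]]

ranking = rank_documents(query, docs)
print([index for index, _ in ranking])  # [2, 1, 0]
```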

Long Context

# Jina supports up to 32,768 tokens
very_long_text = "..." * 10000

response = client.embed(
    model="jina-embeddings-v3",
    input=very_long_text
)
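
Inputs that exceed the context limit need to be split before they are embedded. The following is a rough sketch, assuming about 4 characters per token as a heuristic. The real count depends on the model's tokenizer, which this example does not use, so the estimate can be off in either direction. `split_for_context` is a hypothetical helper.

```python
from typing import List

MAX_TOKENS = 32_768   # context limit stated on this page
CHARS_PER_TOKEN = 4   # assumption: crude heuristic, not the real tokenizer


def split_for_context(
    text: str,
    max_tokens: int = MAX_TOKENS,
    chars_per_token: int = CHARS_PER_TOKEN,
) -> List[str]:
    """Split text into pieces whose estimated token count fits the limit."""
    if max_tokens <= 0 or chars_per_token <= 0:
        raise ValueError("max_tokens and chars_per_token must be positive")
    max_chars = max_tokens * chars_per_token
    # Fixed-width character windows; an empty string yields an empty list.
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
```

Each returned piece can be embedded separately. Fixed-width windows may cut through a word or sentence, so splitting on paragraph boundaries first is usually a better fit for real documents.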

Special Notes

  • ✅ Multimodal support in v4 (text + images)
  • ✅ Code-specific models for software development
  • Up to 32,768 tokens of context
  • Embeddings are L2-normalized by default
  • Supports Matryoshka dimensions (model-dependent)
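
Matryoshka-style embeddings can be shortened by keeping only the leading dimensions. The shortened vector is no longer unit length, so it should be re-normalized if dot products are meant to behave as cosine similarities. The sketch below does this on the client side; for supported models, the `dimensions` parameter requests a shorter vector directly. `truncate_and_normalize` is a hypothetical helper.

```python
import math
from typing import List, Sequence


def truncate_and_normalize(vec: Sequence[float], dims: int) -> List[float]:
    """Keep the first `dims` components and rescale the result to unit length."""
    if not 0 < dims <= len(vec):
        raise ValueError(f"dims must be between 1 and {len(vec)}")
    head = list(vec[:dims])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero prefix cannot be normalized; return it unchanged.
        return head
    return [x / norm for x in head]


# Toy vector standing in for a real embedding.
full = [3.0, 4.0, 12.0]
short = truncate_and_normalize(full, 2)
```

Here `short` keeps the direction of the first two components and has length 1.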
