Vision Models

Welcome to the Vision Models API documentation! The AI/ML API allows you to leverage vision capabilities to analyze and understand images through our models.

Key Features

Image Analysis: Understand and describe the content of images.
Flexible Input Methods: Supports both image URLs and base64 encoded images.
Multiple Image Inputs: Analyze multiple images in a single request.

Quick Start

Images can be provided to the model in two main ways: by passing an image URL or by passing the base64 encoded image directly in the request.

Currently we support gpt-4o vision and gpt-4-turbo vision.

Example: What's in this image?

Python Example

import requests
import json

url = "https://api.aimlapi.com/chat/completions"

payload = json.dumps({
  "model": "gpt-4o",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What’s in this image?"},
        {
          "type": "image_url",
          "image_url": {
            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
          }
        }
      ]
    }
  ],
  "max_tokens": 300
})

headers = {
  'Content-Type': 'application/json',
  'Authorization': 'Bearer YOUR_API_KEY'
}

response = requests.post(url, headers=headers, data=payload)
print(response.json())

Uploading Base64 Encoded Images

For local images, you can pass the base64 encoded image to the model.

Python Example

import base64
import requests

# Function to encode the image
def encode_image(image_path):
  with open(image_path, "rb") as image_file:
    return base64.b64encode(image_file.read()).decode('utf-8')

# Path to your image
image_path = "path_to_your_image.jpg"
base64_image = encode_image(image_path)

url = "https://api.aimlapi.com/chat/completions"
headers = {
  "Content-Type": "application/json",
  "Authorization": "Bearer YOUR_API_KEY"
}
payload = {
  "model": "gpt-4o",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What’s in this image?"},
        {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}}
      ]
    }
  ],
  "max_tokens": 300
}

response = requests.post(url, headers=headers, json=payload)
print(response.json())

Multiple Image Inputs

The API can process multiple images in a single request.

Python Example

import requests
import json

url = "https://api.aimlapi.com/chat/completions"

payload = json.dumps({
  "model": "gpt-4o",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What are in these images? Is there any difference between them?"},
        {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"}},
        {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"}}
      ]
    }
  ],
  "max_tokens": 300
})

headers = {
  'Content-Type': 'application/json',
  'Authorization': 'Bearer YOUR_API_KEY'
}

response = requests.post(url, headers=headers, data=payload)
print(response.json())

上一页Image Models 下一页Speech Models

最后更新于1年前

hashtagKey Features

hashtagQuick Start

hashtagMultiple Image Inputs

Key Features

Quick Start

Multiple Image Inputs