Items API in Wasabi AiR
- Updated on 21 May 2024
Once harvesting has begun, you can access the item data using the Items API.
Item Object
The data that makes up an Item is outlined in Item Object.
Getting Item Without Metadata
GET /api/data/v3/items/{id}
- {id} - (string) The ID of the item to get.
Response
{
"item": {
"id": "f66082c13a7f3a10ebf89405433bb80f",
"location_id": "5c7ec14434f90e13219e3ece821dce55",
"container_id": "da971705615c7d57583a57fa170ab696",
"file_size": 15327,
"etag": "cff1e0c414d74cbcc436e1502f61cc8f",
"file_extension": "jpg",
"path": "",
"file_path": "",
"folder_path": "",
"gm_item_type": "video",
"hash_c4id": "c42YSyLSKjGuuqoUQPiM4qHeYs5CBNhmt5DDoAbzdxxEprrQwD6YmAi3FurRK9tS2kgj5msCq8rUqk95YWeAqwB7CM",
"hash_md5": "cff1e0c414d74cbcc436e1502f61cc8f",
"hash_sha1": "06fdba756542993f0070f4a863eceeb1bcefda33",
"hash_sha512": "630fa3947ed13bb6cdc4a303dba65eb4edfbf66b7ba2670bbb14a744419e0ab101106d1a5a901f1e0d4f876142585b316f1e1fb461db9758500e2e26b677bc85",
"harvester_version": "2.0.3134",
"last_harvested": "2019-06-05T19:46:30.176133Z",
"last_modified": "2019-04-16T13:44:34Z",
"location_kind": "local",
"location_name": "local",
"mime_type": "video/mp4",
"mime_category": "video",
"name": "tears.mp4",
"parent_id": "",
"root_id": "0eed8099520a60a2bd3701655f0fbe81",
"segment_interval": 2,
"shared_link": "",
"stow_container_id": "/data/videos",
"stow_container_name": "",
"stow_url": "s3://https://s3-us-west-2.amazonaws.com/3item/kid.jpg",
"thumbnail": {
"path": "thumbnailer/sprite.jpg",
"type": "sprite",
"frame_count": 30,
"height": 152,
"width": 270
},
"stow_metadata": [
{
"name": "mtime",
"value": "2019-07-02T21:57:41Z"
},
{
"name": "mode",
"value": "644"
},
{
"name": "name",
"value": "Jeff_with_location.JPG"
},
...
],
"stow_tags": [],
"drm": false,
"created_at": "2019-06-05T19:46:30.266769Z",
"updated_at": "2019-06-05T19:46:30.266769Z",
"in_progress": false,
"preview": {
"path": "fb3b37ce2a06c3aba49e07c7eb87acae/video_previews/preview.mp4",
"mime_type": "video/mp4"
},
"duration": 734167
}
}
Status codes:
- 200 (success)
- 404 (item not found)
- 500 (unexpected error)
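As a rough illustration, the request above could be issued from Python's standard library along the following lines. The base URL, the bearer-token Authorization header, and the helper names build_get_item_request and get_item are assumptions made for this sketch; the source does not describe the host or the authentication scheme, so substitute whatever your deployment uses.

```python
import json
import urllib.error
import urllib.request

# Hypothetical host: replace with the address of your own deployment.
BASE_URL = "https://air.example.com"


def build_get_item_request(item_id, token=None):
    """Build (but do not send) a GET request for a single item."""
    url = f"{BASE_URL}/api/data/v3/items/{item_id}"
    request = urllib.request.Request(url, method="GET")
    request.add_header("Accept", "application/json")
    if token:
        # Assumed auth scheme; the source does not describe authentication.
        request.add_header("Authorization", f"Bearer {token}")
    return request


def get_item(item_id, token=None):
    """Return the "item" object, or None when the API answers 404."""
    try:
        with urllib.request.urlopen(build_get_item_request(item_id, token)) as response:
            return json.load(response)["item"]
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # item not found
        raise  # 500 and any other status is left to the caller
```

Separating request construction from sending keeps the URL and headers easy to inspect before any network traffic occurs.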
Deleting by ID
To delete an item by its ID:
DELETE /api/data/v3/items/{id}
Status codes:
- 204 (no content)
- 500 (unexpected error)
Bulk Reading by IDs
You can return several items by providing a list of their IDs.
POST /api/data/v3/items/bulk
{
"ids": ["76ef280bc613f9eb3dace1c89efe982e", "5187feb5044ef4485e7bc0e2e72f79d1"]
}
Status codes:
- 204 (no content)
- 422 (unprocessable entity)
- 500 (unexpected error)
Bulk Deleting by IDs
You can delete several items by providing a list of their IDs.
DELETE /api/data/v3/items/bulk
{
"ids": ["76ef280bc613f9eb3dace1c89efe982e", "5187feb5044ef4485e7bc0e2e72f79d1"]
}
Status codes:
- 204 (no content)
- 422 (unprocessable entity)
- 500 (unexpected error)
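The two bulk endpoints share the same path and body shape and differ only in the HTTP method, which the following sketch makes explicit. The host name, the Content-Type header, and the helper name build_bulk_request are assumptions for illustration. Rejecting an empty ID list on the client side is a choice made here, not documented API behavior.

```python
import json
import urllib.request

BASE_URL = "https://air.example.com"  # hypothetical host


def build_bulk_request(method, item_ids):
    """Build a bulk read (POST) or bulk delete (DELETE) request without sending it."""
    if method not in ("POST", "DELETE"):
        raise ValueError("method must be POST (read) or DELETE (delete)")
    ids = list(item_ids)
    if not ids:
        # Client-side guard chosen for this sketch.
        raise ValueError("at least one item ID is required")
    body = json.dumps({"ids": ids}).encode("utf-8")
    request = urllib.request.Request(
        f"{BASE_URL}/api/data/v3/items/bulk", data=body, method=method
    )
    # Assumed header; the source only shows the JSON body.
    request.add_header("Content-Type", "application/json")
    return request
```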
Updating an Item's Custom Asset Title
PATCH /api/data/v3/items/{id}
{
"gm_asset_title": "my custom asset title"
}
- {id} - (string) The ID of the item to update.
- gm_asset_title - (string) The custom asset title.
Response
{
"item": {
"id": "f66082c13a7f3a10ebf89405433bb80f",
"gm_asset_title": "my custom asset title"
}
}
Status codes:
- 200 (success)
- 403 (the user does not have permission to alter data)
- 404 (item not found)
- 500 (unexpected error)
Searching Within an Item
To search within the information associated with an item:
GET /api/data/v3/search/item/{item_id}?q={query}
- {item_id} - (string) The ID of the item within which to search.
- {query} - (string) The query string.
Response
The response is an object keyed by metadata field. Each field holds one or more timelines (histograms) made up of contiguous ranges in which the query value appears within the item. Fields that have no matches are not populated.
{
"advertising": {
"histogram": [
{
"start": 2,
"end": 4
},
...
{
"start": 80,
"end": 88
}
]
},
"audio_classification": {
"histogram": [
{
"start": 0,
"end": 4
},
...
{
"start": 60,
"end": 72
}
]
},
"caption": {
"histogram": [
{
"start": 0.03,
"end": 4.92
},
...
{
"start": 207.355,
"end": 211.436
}
]
},
"description": {
"histogram": [
{
"start": 8,
"end": 20
},
...
{
"start": 110,
"end": 122
}
]
},
"location": {
"histogram": [
{
"start": 10,
"end": 14
}
]
},
"logo": {
"histogram": [
{
"start": 20,
"end": 34
}
]
},
"keyword": {
"histogram": [
{
"start": 2,
"end": 134
}
]
},
"mature_content": {
"histogram": [
{
"start": 30,
"end": 34
}
]
},
"ocr": {
"histogram": [
{
"start": 16,
"end": 46
},
...
{
"start": 112,
"end": 122
}
]
},
"people": {
"Kim Ryan": [
{
"start": 8,
"end": 14
},
...
{
"start": 118,
"end": 122
}
],
"Kim Smith": [
{
"start": 68,
"end": 72
}
]
},
"sound": {
"histogram": [
{
"start": 16,
"end": 46
},
...
{
"start": 112,
"end": 122
}
]
},
"speech_to_text": {
"histogram": [
{
"start": 0.1,
"end": 50.53
},
...
{
"start": 110.32,
"end": 130.08
}
]
},
"sport": {
"histogram": [
{
"start": 100,
"end": 164
}
]
},
"tag": {
"histogram": [
{
"start": 64,
"end": 66
}
]
},
"text_content": {
"histogram": [
{
"start": 55,
"end": 101
}
]
}
}
A successful call returns a Status OK (200). If an unexpected error occurs, a Status Internal Server Error (500) is returned.
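The search response can be awkward to consume directly because most fields nest their ranges under "histogram" while "people" is keyed by person name. A minimal sketch of one way to flatten it follows; the helper names search_path and flatten_matches are hypothetical, and percent-encoding the query with urllib.parse.quote is an assumption about what the server accepts.

```python
from urllib.parse import quote


def search_path(item_id, query):
    """Return the request path with the item ID and query percent-encoded."""
    return f"/api/data/v3/search/item/{quote(item_id, safe='')}?q={quote(query, safe='')}"


def flatten_matches(response):
    """Yield (field, label, start, end) for every range in a search response.

    Most fields hold their ranges under "histogram"; "people" is keyed by
    person name instead, so the inner key is carried along as the label.
    """
    for field, groups in response.items():
        for label, ranges in groups.items():
            for r in ranges:
                yield field, label, r["start"], r["end"]
```

For example, list(flatten_matches(body)) on a decoded response gives one flat list that can be sorted by start time regardless of which field produced each match.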
Getting All Metadata
To get all metadata of an item by its ID, make the following request:
GET /api/data/items/{id}
- {id} - (string) The ID of the item to get.
- Do not include the only parameter.
Response
The response is a JSON document containing ALL metadata for the item.
Getting the metadata.json File for an Item
GET /files/{item_id}/metadata2.json
Response
The response is a JSON document containing the same data as the metadata.json file.
Selective Data
The Items API enables you to be selective about what data you get. You may:
- Use the only parameter to get specific leaf fields, or
- Use the include parameter to specify root objects to include.
You cannot use both only and include parameters at the same time.
Getting Specific Leaf Fields
To get only a list of specific fields, specify them using the only parameter.
GET /api/data/items/{id}?only=field1,field2,field3
- field1, field2, field3 - (comma-separated list) List of fields to include.
- Only leaf fields are supported, so you must know the full path to the fields to get.
- If you need to get entire groups of data, consider using the include parameter.
The only parameter is extremely efficient and is preferred over include. For a complete list of acceptable fields, perform a GET request without any parameters to see the entire data payload.
Getting Specific Groups of Data
You can get groups of data at a time using the include parameter.
GET /api/data/items/{id}?include=obj1,obj2,obj3
- obj1, obj2, obj3 - (comma-separated list) List of root objects to include.
- Only root fields are supported. To get only fields from within objects, use the only parameter instead.
The include parameter is not as efficient as the only parameter, but it is more convenient. For a complete list of acceptable fields, see Item Object.
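Because only and include are mutually exclusive, it can help to enforce that rule before a request is sent. The sketch below builds the request path for either parameter; the helper name selective_item_path is hypothetical, and the field names passed to it are placeholders rather than real Item Object fields.

```python
from urllib.parse import urlencode


def selective_item_path(item_id, only=None, include=None):
    """Build the item path with either only=... or include=..., never both."""
    if only and include:
        raise ValueError("only and include cannot be used at the same time")
    path = f"/api/data/items/{item_id}"
    if only:
        # safe="," keeps the literal commas shown in the request examples.
        return f"{path}?{urlencode({'only': ','.join(only)}, safe=',')}"
    if include:
        return f"{path}?{urlencode({'include': ','.join(include)}, safe=',')}"
    return path  # no parameters: the full metadata payload
```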
Associating an Item With Categories
An item can be associated with one or more categories. To associate an item with categories, use:
POST /api/data/items/{id}/categories
{
"categories": ["cat1", "cat2"]
}
To disassociate categories from an item, use:
DELETE /api/data/items/{id}/categories/{categories}
- {categories} - A URL-encoded, comma-separated list of categories (for example, cat1,other%20category).
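One way to produce the URL-encoded, comma-separated list is to percent-encode each category on its own and then join the results with literal commas, which reproduces the cat1,other%20category example. This is a sketch; the helper name category_delete_path is hypothetical.

```python
from urllib.parse import quote


def category_delete_path(item_id, categories):
    """Build the DELETE path for removing categories from an item."""
    # Encode each name separately so the separating commas stay literal.
    encoded = ",".join(quote(category, safe="") for category in categories)
    return f"/api/data/items/{item_id}/categories/{encoded}"
```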
Getting Timelines
A timeline is a set of contiguous blocks of time in which data is found. The response is separated by individual labels/identifiers.
Technical Cues
The technical cues endpoint is a wrapper for all the technical spanning metadata for a video.
GET /api/data/v3/items/{id}/timeline/technical-cues
Response
{
"technical_cues": {
"black_frames": {
"histogram": [
{
"start": 3.26993,
"end": 7.07373,
"start_frame": 99,
"end_frame": 213,
"start_control_time_code": "00:00:03:08",
"end_control_time_code": "00:00:07:02",
"start_relative_time_code": "00:13:25:10",
"end_relative_time_code": "00:13:29:04"
},
{
"start": 137.104,
"end": 137.471,
"start_frame": 4110,
"end_frame": 4121,
"start_control_time_code": "00:02:16:29",
"end_control_time_code": "00:02:17:10",
"start_relative_time_code": "00:15:39:01",
"end_relative_time_code": "00:15:39:12"
}
]
},
"color_bars": {
"histogram": [
{
"start": 32,
"end": 54
}
]
},
"credits": {
"histogram": [
{
"start": 3.26993,
"end": 7.07373,
"start_frame": 99,
"end_frame": 213,
"start_control_time_code": "00:00:03:08",
"end_control_time_code": "00:00:07:02",
"start_relative_time_code": "00:13:25:10",
"end_relative_time_code": "00:13:29:04"
},
{
"start": 137.104,
"end": 137.471,
"start_frame": 4110,
"end_frame": 4121,
"start_control_time_code": "00:02:16:29",
"end_control_time_code": "00:02:17:10",
"start_relative_time_code": "00:15:39:01",
"end_relative_time_code": "00:15:39:12"
}
]
},
"detected_shots": {
"histogram": [
{
"start": 3.26993,
"end": 7.07373,
"start_frame": 99,
"end_frame": 213,
"start_control_time_code": "00:00:03:08",
"end_control_time_code": "00:00:07:02",
"start_relative_time_code": "00:13:25:10",
"end_relative_time_code": "00:13:29:04"
},
{
"start": 137.104,
"end": 137.471,
"start_frame": 4110,
"end_frame": 4121,
"start_control_time_code": "00:02:16:29",
"end_control_time_code": "00:02:17:10",
"start_relative_time_code": "00:15:39:01",
"end_relative_time_code": "00:15:39:12"
}
]
},
"digital_slates": {
"histogram": [
{
"start": 3.26993,
"end": 7.07373,
"start_frame": 99,
"end_frame": 213,
"start_control_time_code": "00:00:03:08",
"end_control_time_code": "00:00:07:02",
"start_relative_time_code": "00:13:25:10",
"end_relative_time_code": "00:13:29:04"
},
{
"start": 137.104,
"end": 137.471,
"start_frame": 4110,
"end_frame": 4121,
"start_control_time_code": "00:02:16:29",
"end_control_time_code": "00:02:17:10",
"start_relative_time_code": "00:15:39:01",
"end_relative_time_code": "00:15:39:12"
}
]
},
"silence": {
"histogram": [
{
"start": 3.23333,
"end": 7.27165,
"start_frame": 98,
"end_frame": 219,
"start_control_time_code": "00:00:03:07",
"end_control_time_code": "00:00:07:08",
"start_relative_time_code": "00:13:25:09",
"end_relative_time_code": "00:13:29:10"
},
{
"start": 82.278,
"end": 83.9624,
"start_frame": 2467,
"end_frame": 2517,
"start_control_time_code": "00:01:22:06",
"end_control_time_code": "00:01:23:26",
"start_relative_time_code": "00:14:44:08",
"end_relative_time_code": "00:14:45:28"
}
]
},
"slates": {
"all": [
{
"start": 3.26993,
"end": 7.07373,
"start_frame": 99,
"end_frame": 213,
"start_control_time_code": "00:00:03:08",
"end_control_time_code": "00:00:07:02",
"start_relative_time_code": "00:13:25:10",
"end_relative_time_code": "00:13:29:04"
},
{
"start": 137.104,
"end": 137.471,
"start_frame": 4110,
"end_frame": 4121,
"start_control_time_code": "00:02:16:29",
"end_control_time_code": "00:02:17:10",
"start_relative_time_code": "00:15:39:01",
"end_relative_time_code": "00:15:39:12"
}
]
},
"start_end": {
"histogram": [
{
"start": 0,
"end": 137.471,
"start_frame": 1,
"end_frame": 4121,
"start_control_time_code": "00:00:00:00",
"end_control_time_code": "00:02:17:10",
"start_relative_time_code": "00:13:22:02",
"end_relative_time_code": "00:15:39:12"
}
]
},
"textless": {
"histogram": [
{
"start": 3.23333,
"end": 7.27165,
"start_frame": 98,
"end_frame": 219,
"start_control_time_code": "00:00:03:07",
"end_control_time_code": "00:00:07:08",
"start_relative_time_code": "00:13:25:09",
"end_relative_time_code": "00:13:29:10"
},
{
"start": 82.278,
"end": 83.9624,
"start_frame": 2467,
"end_frame": 2517,
"start_control_time_code": "00:01:22:06",
"end_control_time_code": "00:01:23:26",
"start_relative_time_code": "00:14:44:08",
"end_relative_time_code": "00:14:45:28"
}
]
}
}
}
A successful call returns a Status OK (200). If any unexpected errors occurred in the process of fulfilling the response, a Status Internal Server Error (500) is returned.
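A common use of the technical cues response is to total how much of a video each cue covers. The sketch below does that for the "technical_cues" object. It assumes that start and end are expressed in seconds, which the sample time codes suggest but the source does not state, and the helper name cue_durations is hypothetical.

```python
def cue_durations(technical_cues):
    """Map each cue name to the total time covered by its ranges.

    Assumes start/end are seconds (inferred from the sample time codes).
    """
    totals = {}
    for name, groups in technical_cues.items():
        # Ranges sit under "histogram" for most cues and under "all" for slates.
        ranges = [r for group in groups.values() for r in group]
        totals[name] = sum(r["end"] - r["start"] for r in ranges)
    return totals
```

Called as cue_durations(body["technical_cues"]), this returns a plain dictionary such as one entry per cue name, which is convenient for reporting or threshold checks.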
Audio
GET /api/data/v3/items/{id}/timeline/audio
Response
{
"audio": {
"Speech": [
{
"start": 10,
"end": 60
},
{
"start": 120,
"end": 130
}
],
"explosion": [
{
"start": 3.014,
"end": 4.56
}
]
}
}
A successful call returns a Status OK (200). If any unexpected errors occurred in the process of fulfilling the response, a Status Internal Server Error (500) is returned.
Color Bars
GET /api/data/v3/items/{id}/timeline/color-bars
Response
{
"color_bars": [
{
"start": 3.26993,
"end": 7.07373,
// if no frame information found the fields below will not be set
"start_frame": 99,
"end_frame": 213,
"start_control_time_code": "00:00:03:08",
"end_control_time_code": "00:00:07:02",
"start_relative_time_code": "00:13:25:10",
"end_relative_time_code": "00:13:29:04"
}
]
}
A successful call returns a Status OK (200). If any unexpected errors occurred in the process of fulfilling the response, a Status Internal Server Error (500) is returned.
Black Frames
GET /api/data/v3/items/{id}/timeline/black-frames
Response
{
"black_frames": [
{
"start": 3.26993,
"end": 7.07373,
"start_frame": 99,
"end_frame": 213,
"start_control_time_code": "00:00:03:08",
"end_control_time_code": "00:00:07:02",
"start_relative_time_code": "00:13:25:10",
"end_relative_time_code": "00:13:29:04"
},
{
"start": 137.104,
"end": 137.471,
"start_frame": 4110,
"end_frame": 4121,
"start_control_time_code": "00:02:16:29",
"end_control_time_code": "00:02:17:10",
"start_relative_time_code": "00:15:39:01",
"end_relative_time_code": "00:15:39:12"
}
...
]
}
A successful call returns a Status OK (200). If any unexpected errors occurred in the process of fulfilling the response, a Status Internal Server Error (500) is returned.
Credits
GET /api/data/v3/items/{id}/timeline/credits
Response
{
"credits": [
{
"start": 0.1,
"end": 1.1
}
]
}
A successful call returns a Status OK (200). If any unexpected errors occurred in the process of fulfilling the response, a Status Internal Server Error (500) is returned.
Custom Tags (Amazon Rekognition)
GET /api/data/v3/items/{id}/timeline/customtags/amazonrek
Response
{
"tags": {
"Tag1Name": [
{
"start": 10,
"end": 60
},
{
"start": 120,
"end": 130
}
],
"Tag2Name": [
{
"start": 3.014,
"end": 4.56
}
]
}
}
A successful call returns a Status OK (200). If any unexpected errors occurred in the process of fulfilling the response, a Status Internal Server Error (500) is returned.
Detected Shots (Valossa Extractor)
GET /api/data/v3/items/{id}/timeline/detected-shots
Response
{
"detected_shots": [
{
"start": 3.26993,
"end": 7.07373,
"start_frame": 99,
"end_frame": 213,
"start_control_time_code": "00:00:03:08",
"end_control_time_code": "00:00:07:02",
"start_relative_time_code": "00:13:25:10",
"end_relative_time_code": "00:13:29:04"
},
{
"start": 137.104,
"end": 137.471,
"start_frame": 4110,
"end_frame": 4121,
"start_control_time_code": "00:02:16:29",
"end_control_time_code": "00:02:17:10",
"start_relative_time_code": "00:15:39:01",
"end_relative_time_code": "00:15:39:12"
}
...
]
}
Digital Slates
GET /api/data/v3/items/{id}/timeline/digital-slates
Response
{
"digital_slates": [
{
"start": 0.1,
"end": 1.1
}
]
}
A successful call returns a Status OK (200). If any unexpected errors occurred in the process of fulfilling the response, a Status Internal Server Error (500) is returned.
Insights
GET /api/data/v3/items/{item_id}/insights/{insight_group_id}
Response
{
"insights": [
{
"group_name": "Supplier",
"color": "#4DD0E1",
"words": [
"content delivery",
"exclusive",
"hollywood",
"payments",
"pepsi",
"price increase",
"term",
"termination"
],
"matches": [
{
"type": "captions",
"timeline": [
{
"start_at": 30.03,
"end_at": 44.97,
"count": 1
}
],
"source": "2minuteVideo.srt"
},
{
"type": "captions",
"timeline": [
{
"start_at": 30.03,
"end_at": 44.97,
"count": 1
}
],
"source": "2minuteVideo.srt"
},
{
"type": "captions",
"timeline": [
{
"start_at": 30.03,
"end_at": 44.97,
"count": 1
}
],
"source": "2minuteVideo.srt"
}
]
}
]
}
A successful call returns a Status OK (200). If any unexpected errors occurred in the process of fulfilling the response, a Status Internal Server Error (500) is returned.
Mature Content
GET /api/data/v3/items/{id}/timeline/mature-content
Response
{
"mature_content": {
"nudity": [
{
"start": 0.1,
"end": 1.1
}
],
"gore": [
{
"start": 0.0,
"end": 3.5
}
]
}
}
A successful call returns a Status OK (200). If any unexpected errors occurred in the process of fulfilling the response, a Status Internal Server Error (500) is returned.
Locations
GET /api/data/v3/items/{id}/timeline/locations
Response
{
"locations": {
"Rome": [
{
"start": 0.1,
"end": 1.1
},
{
"start": 13.0,
"end": 15.5
}
],
"Paris": [
{
"start": 2.1,
"end": 4.3
}
]
}
}
A successful call returns a Status OK (200). If any unexpected errors occurred in the process of fulfilling the response, a Status Internal Server Error (500) is returned.
Logos
GET /api/data/v3/items/{id}/timeline/logos
Response
{
"logos": {
"Pepsi": [
{
"start": 0.1,
"end": 1.1
},
{
"start": 13.0,
"end": 15.5
}
],
"GrayMeta": [
{
"start": 2.1,
"end": 4.3
}
]
}
}
A successful call returns a Status OK (200). If any unexpected errors occurred in the process of fulfilling the response, a Status Internal Server Error (500) is returned.
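Timelines that are keyed by label, such as the logos response above, lend themselves to point-in-time lookups. The following sketch returns the labels whose ranges contain a given time; the helper name labels_at is hypothetical, and treating both range ends as inclusive is an assumption.

```python
def labels_at(timeline, seconds):
    """Return the sorted labels whose ranges contain the given time."""
    return sorted(
        label
        for label, ranges in timeline.items()
        # Both ends are treated as inclusive in this sketch.
        if any(r["start"] <= seconds <= r["end"] for r in ranges)
    )
```

For the logos sample, labels_at(body["logos"], 3.0) selects the label whose 2.1 to 4.3 range contains that time.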
Slates
GET /api/data/v3/items/{id}/timeline/slates
Response
{
"slates": {
"all": [
{
"start": 3.26993,
"end": 7.07373,
// if no frame information found the fields below will not be set
"start_frame": 99,
"end_frame": 213,
"start_control_time_code": "00:00:03:08",
"end_control_time_code": "00:00:07:02",
"start_relative_time_code": "00:13:25:10",
"end_relative_time_code": "00:13:29:04"
},
...
]
}
}
A successful call returns a Status OK (200). If any unexpected errors occurred in the process of fulfilling the response, a Status Internal Server Error (500) is returned.
Sports
GET /api/data/v3/items/{id}/timeline/sports
Response
{
"sport_events": {
"soccer": {
"penalties": [
{
"start": -0.001338,
"end": 3.998662
}
],
"shots on goal": [
{
"start": 4.998662,
"end": 4.998662
}
]
}
}
}
A successful call returns a Status OK (200). If any unexpected errors occurred in the process of fulfilling the response, a Status Internal Server Error (500) is returned.
Silence
GET /api/data/v3/items/{id}/timeline/silence
Response
{
"silence": {
"histogram": [
{
"start": 3.23333,
"end": 7.27165,
"start_frame": 98,
"end_frame": 219,
"start_control_time_code": "00:00:03:07",
"end_control_time_code": "00:00:07:08",
"start_relative_time_code": "00:13:25:09",
"end_relative_time_code": "00:13:29:10"
},
{
"start": 82.278,
"end": 83.9624,
"start_frame": 2467,
"end_frame": 2517,
"start_control_time_code": "00:01:22:06",
"end_control_time_code": "00:01:23:26",
"start_relative_time_code": "00:14:44:08",
"end_relative_time_code": "00:14:45:28"
}
]
}
}
A successful call returns a Status OK (200). If any unexpected errors occurred in the process of fulfilling the response, a Status Internal Server Error (500) is returned.
Start End
GET /api/data/v3/items/{id}/timeline/start-end
Response
{
"start_end": {
"histogram": [
{
"start": 0,
"end": 137.471,
"start_frame": 1,
"end_frame": 4121,
"start_control_time_code": "00:00:00:00",
"end_control_time_code": "00:02:17:10",
"start_relative_time_code": "00:13:22:02",
"end_relative_time_code": "00:15:39:12"
}
]
}
}
A successful call returns a Status OK (200). If any unexpected errors occurred in the process of fulfilling the response, a Status Internal Server Error (500) is returned.
Textless Material
GET /api/data/v3/items/{id}/timeline/textless
Response
{
"textless": {
"histogram": [
{
"start": 3.83129,
"end": 23.308867
}
]
}
}
Texted
GET /api/data/v3/items/{id}/timeline/texted
Response
{
"texted": [
{
"start": 3.26993,
"end": 7.07373,
// if no frame information found the fields below will not be set
"start_frame": 99,
"end_frame": 213,
"start_control_time_code": "00:00:03:08",
"end_control_time_code": "00:00:07:02",
"start_relative_time_code": "00:13:25:10",
"end_relative_time_code": "00:13:29:04"
}
]
}
A successful call returns a Status OK (200). If any unexpected errors occurred in the process of fulfilling the response, a Status Internal Server Error (500) is returned.
Getting Technical Metadata
Technical metadata is information encoded within files that Wasabi AiR can extract directly. Wasabi AiR makes this information available through either type-specific or batch APIs.
To get all technical metadata found within a given file, use the following API.
Request
GET /api/data/v3/items/{id}/technical
- id - (string) The identifier of the item.
Response
{
"audio_info": {
"streams": [
{
"avg_frame_rate": "0/0",
"bit_rate": "192000",
"bits_per_sample": 0,
"channel_layout": "stereo",
"channels": 2,
"codec_long_name": "AAC (Advanced Audio Coding)",
"codec_name": "aac",
"codec_tag": "0x000f",
"codec_tag_string": "[15][0][0][0]",
"codec_time_base": "1/48000",
"codec_type": "audio",
"index": 5,
"r_frame_rate": "0/0",
"sample_fmt": "fltp",
"sample_rate": "48000",
"start_pts": 725280,
"start_time": "8.058667",
"time_base": "1/90000",
"disposition": {
"attached_pic": 0,
"clean_effects": 0,
"comment": 0,
"default": 0,
"dub": 0,
"forced": 0,
"hearing_impaired": 0,
"karaoke": 0,
"lyrics": 0,
"original": 0,
"timed_thumbnails": 0,
"visual_impaired": 0
},
"tags": {
"encoder": "",
"language": "eng",
"title": ""
}
},
{
"avg_frame_rate": "0/0",
"bit_rate": "192000",
"bits_per_sample": 0,
"channel_layout": "stereo",
"channels": 2,
"codec_long_name": "AAC (Advanced Audio Coding)",
"codec_name": "aac",
"codec_tag": "0x000f",
"codec_tag_string": "[15][0][0][0]",
"codec_time_base": "1/48000",
"codec_type": "audio",
"index": 6,
"r_frame_rate": "0/0",
"sample_fmt": "fltp",
"sample_rate": "48000",
"start_pts": 725280,
"start_time": "8.058667",
"time_base": "1/90000",
"disposition": {
"attached_pic": 0,
"clean_effects": 0,
"comment": 0,
"default": 0,
"dub": 0,
"forced": 0,
"hearing_impaired": 0,
"karaoke": 0,
"lyrics": 0,
"original": 0,
"timed_thumbnails": 0,
"visual_impaired": 0
},
"tags": {
"encoder": "",
"language": "spa",
"title": ""
}
}
]
},
"audio_peak": {
"integrated_loudness": {
"i_lufs": -29.1,
"threshold_lufs": -39.1
},
"loudness_range": {
"lra_lu": 0.1,
"threshold_lufs": -49.1,
"lra_low_lufs": -29.1,
"lra_high_lufs": -29
},
"true_peak_dbfs": -19.2
},
"exiv2": {
"normalized": {
"resolution_x": 1024,
"resolution_y": 680,
"format": "image/jpeg",
"photo": {
"exif_version": "48 50 50 49",
"color_space": 1,
"pixel_x_dimension": 1024,
"pixel_y_dimension": 680
},
"application2": {},
"image": {
"image_width": 1024,
"image_length": 680,
"bits_per_sample": "8 8 8",
"photometric_interpretation": 2,
"orientation": 1,
"samples_per_pixel": 3,
"x_resolution": "720000/10000",
"y_resolution": "720000/10000",
"resolution_unit": 2,
"software": "Adobe Photoshop CC 2018 (Macintosh)",
"date_time": "2018:10:12 13:43:31",
"exif_tag": 236
},
"xmp": {
"create_date": "2018-10-12T13:34:17-07:00",
"modify_date": "2018-10-12T13:43:31-07:00",
"metadata_date": "2018-10-12T13:43:31-07:00"
}
}
},
"geocoding": {
"place_name": "Los Angeles",
"country_code": "US",
"admin_name1": "California",
"admin_name2": "Los Angeles"
},
"media_info": {
"general": {
"audio_codecs": "AAC LC / AAC LC / AAC LC / AAC LC / AAC LC / AAC LC",
"audio_format_list": "AAC LC / AAC LC / AAC LC / AAC LC / AAC LC / AAC LC",
"audio_format_with_hint_list": "AAC LC / AAC LC / AAC LC / AAC LC / AAC LC / AAC LC",
"audio_language_list": "English / / English / Spanish / English / Spanish",
"codec": "MPEG-TS",
"codecs_video": "AVC",
"commercial_name": "MPEG-TS",
"complete_name": "/tmp/d89a7c3bdaa3cdf23420cc4e905349f1.ts",
"count": 333,
"count_of_audio_streams": 6,
"count_of_stream_of_this_kind": 1,
"count_of_video_streams": 1,
"duration": 6034,
"duration_time": "00:00:06.035 (00:00:06;00)",
"file_extension": "ts",
"file_name": "d89a7c3bdaa3cdf23420cc4e905349f1",
"file_size": 4012484,
"folder_name": "/tmp",
"format": "MPEG-TS",
"format_extensions_usually_used": "ts m2t m2s m4t m4s tmf ts tp trp ty",
"frame_count": 360,
"frame_rate": 59.94,
"internet_media_type": "video/MP2T",
"kind_of_stream": "General",
"overall_bit_rate": 5083438,
"overall_bit_rate_mode": "VBR",
"video_format_list": "AVC",
"video_format_with_hint_list": "AVC"
},
"audio": {
"bit_rate": 112000,
"bit_rate_mode": "CBR",
"channels": 2,
"codec": "MPEG Audio",
"commercial_name": "MPEG Audio",
"compression_mode": "Lossy",
"count": 277,
"count_of_stream_of_this_kind": 1,
"duration": 89808,
"duration_time": "00:01:30:18",
"format": "MPEG Audio",
"format_profile": "Layer 3",
"frame_count": 3438,
"id": "1",
"kind_of_stream": "Audio",
"proportion_of_this_stream": 0.99971,
"samples_count": 3960576,
"sampling_rate": 44100,
"stream_order": "1",
"stream_size": 127325
},
"audio_tracks": [
{
"bit_rate": 112000,
"bit_rate_mode": "CBR",
"channels": 2,
"codec": "MPEG Audio",
"commercial_name": "MPEG Audio",
"compression_mode": "Lossy",
"count": 277,
"count_of_stream_of_this_kind": 1,
"duration": 89808,
"duration_time": "00:01:30:18",
"format": "MPEG Audio",
"format_profile": "Layer 3",
"frame_count": 3438,
"id": "1",
"kind_of_stream": "Audio",
"proportion_of_this_stream": 0.99971,
"samples_count": 3960576,
"sampling_rate": 44100,
"stream_order": "1",
"stream_size": 127325
},
{
"bit_rate": 112000,
"bit_rate_mode": "CBR",
"channels": 2,
"codec": "MPEG Audio",
"commercial_name": "MPEG Audio",
"compression_mode": "Lossy",
"count": 277,
"count_of_stream_of_this_kind": 1,
"duration": 89808,
"duration_time": "00:01:30:18",
"format": "MPEG Audio",
"format_profile": "Layer 3",
"frame_count": 3438,
"id": "2",
"kind_of_stream": "Audio",
"proportion_of_this_stream": 0.99971,
"samples_count": 3960576,
"sampling_rate": 44100,
"stream_order": "2",
"stream_size": 1257325
}
],
"video": {
"bit_depth": 8,
"bits_pixel_frame": 0.072,
"chroma_subsampling": "4:2:0",
"codec": "AVC",
"codec_id": "27",
"color_range": "Limited",
"color_space": "YUV",
"colour_description_present": "Yes",
"commercial_name": "AVC",
"count": 377,
"count_of_stream_of_this_kind": 1,
"display_aspect_ratio": 1.778,
"duration": 6006,
"duration_time": "00:00:06.006 (00:00:06;00)",
"format": "AVC",
"format_info": "Advanced Video Codec",
"format_profile": "High@L4.1",
"format_settings": "CABAC / 2 Ref Frames",
"format_settings_cabac": "Yes",
"format_url": "http://developers.videolan.org/x264.html",
"frame_count": 360,
"frame_rate": 59.94,
"height": 720,
"id": 481,
"internet_media_type": "video/H264",
"kind_of_stream": "Video",
"pixel_aspect_ratio": 1,
"scan_type": "Progressive",
"stream_order": "0-0",
"width": 1280
},
"image": {
"bit_depth": 8,
"chroma_subsampling": "4:4:4",
"codec": "JPEG",
"color_space": "YUV",
"commercial_name": "JPEG",
"compression_mode": "Lossy",
"count": 125,
"count_of_stream_of_this_kind": 1,
"format": "JPEG",
"height": 680,
"internet_media_type": "image/jpeg",
"kind_of_stream": "Image",
"proportion_of_this_stream": 1,
"stream_size": 424430,
"width": 1024
}
},
"pdf": {
"title": "Microsoft Word - Backup4all_network_backup_solution.doc",
"subject": "",
"keywords": "",
"author": "Administrator",
"creator": "Microsoft Word - Backup4all_network_backup_solution.doc",
"producer": "novaPDF Professional Server Ver 5.4 Build 260 (Windows XP x32)",
"creation_date": "2008-05-26T09:02:00Z",
"mod_date": "0001-01-01T00:00:00Z",
"pages": 4,
"javascript": false,
"encrypted": true,
"password_protected": false,
"page_size": "612 x 792 pts (letter)",
"optimized": false,
"pdf_version": 1.4,
"page_rotation": 0,
"tagged": false,
"form": false
}
}
Response codes:
- 200 (StatusOK) - Success.
- 404 (StatusNotFound) - Item not found.
- 500 (StatusInternalServerError) - An unexpected error occurred.
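The technical metadata payload is large, so client code usually picks out a few values. As an illustration, the sketch below lists the language tag of each audio stream under audio_info.streams. The helper name audio_stream_languages is hypothetical, and the defensive lookups assume that sections which do not apply to a file may be missing from the response, which the source does not confirm.

```python
def audio_stream_languages(technical):
    """Return (stream index, language tag) pairs from audio_info.streams."""
    # .get() guards are an assumption: sections may be absent for some files.
    streams = technical.get("audio_info", {}).get("streams", [])
    return [
        (stream.get("index"), stream.get("tags", {}).get("language", ""))
        for stream in streams
    ]
```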
Identifying Items
To determine the Wasabi AiR ID and Stow URL for an item, you need to know the location ID, container ID, and identifier for the item within the container. For more information about item IDs, see the Stow project.
You can make the following request:
POST /api/control/item-id
{
"location_id": "abc123",
"container_id": "MyContainer",
"item_id": "MyItem"
}
- location_id - (string) The Wasabi AiR location ID that indicates which storage location the item is in.
- container_id - (string) The container ID where the item is located (usually the bucket name).
- item_id - (string) The identifier of the item (usually its name within the storage).
Response
Provided that the location, container, and item values are all valid, you are given the following response:
{
"stow_url": "s3://unique/url/to/item",
"gm_item_id": "67779468b22af637e2dd6a2616264b6c"
}
- stow_url - (string) The Stow URL for the item.
- gm_item_id - (string) The internal Wasabi AiR ID for this item.
The item does not need to have been harvested for the ID to be returned. Once the item is harvested, you can trust that its ID matches the gm_item_id returned here.
Once you have obtained the identifiers for an item, you can use them in the Harvest API.
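The request body and response for /api/control/item-id are small enough to handle with two short helpers, sketched below. The names item_id_payload and parse_item_id_response are hypothetical, and checking for empty values before sending is a choice made for this example rather than documented behavior.

```python
import json


def item_id_payload(location_id, container_id, item_id):
    """Serialize the request body, refusing empty values."""
    fields = {
        "location_id": location_id,
        "container_id": container_id,
        "item_id": item_id,
    }
    missing = [name for name, value in fields.items() if not value]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    return json.dumps(fields).encode("utf-8")


def parse_item_id_response(raw):
    """Return (gm_item_id, stow_url) from the raw JSON response."""
    data = json.loads(raw)
    return data["gm_item_id"], data["stow_url"]
```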
Getting a List of Item Captions
Get a list of captions for an item:
GET /api/data/v3/items/{id}/captions
Response
A successful call returns a Status OK (200) with the following response body:
{
"captions": [
{
"id": "c57149e1f0b9387294e1f5efe6cb1ef0",
"item_id": "71ab3889e1c559865ed6bce99b349d4f",
"source": "captions",
"language": {
"code": "eng",
"confidence": 1
}
}
]
}
If any unexpected errors occurred in the process of fulfilling this request or response cycle, a Status Internal Server Error (500) is returned.
Getting a List of Text in an Item Caption
Get a list of text contained in an item caption, along with possible NLP data:
GET /api/data/v3/items/{id}/caption/{item-captions-id}?mask={mask}
- item-captions-id - (string) The ID value contained in the results of a captions request.
- mask - Enables you to mask the embedded NLP data for a caption text, which may return results faster. Set mask=nlp to omit NLP data from the response.
Response
A successful call returns a Status OK (200) with the following response body:
{
"caption": [
{
"id": "f27f081531c42d304c285dc7306f29e7",
"item_captions_id": "c57149e1f0b9387294e1f5efe6cb1ef0",
"start_at": 0.03,
"end_at": 4.92,
"text": "Mr. Jones will speak now",
"nlp_properties": {
"entities": [
{
"text": "Mr. Jones",
"confidence": 0.9995918273925781,
"type": "person"
}
],
"key_phrases": [
{
"text": "Mr. Jones",
"confidence": 0.9994778037071228
}
],
"sentiment": {
"text": "neutral",
"sentiment_confidence": {
"Mixed": 0.014400332234799862,
"Negative": 0.10051420331001282,
"Neutral": 0.8749107718467712,
"Positive": 0.010174600407481194
}
},
"language": {
"language": "en",
"confidence": 0.9737588763237
}
}
}
]
}
If any unexpected errors occurred in the process of fulfilling this request or response cycle, a Status Internal Server Error (500) is returned.
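When NLP data is present, a typical next step is to pull out the entities that meet a confidence threshold. The sketch below does this for the response shape shown above. The helper name confident_entities and the 0.9 default are illustrative choices, and the handling of a missing or null nlp_properties field is an assumption about what a masked response looks like.

```python
def confident_entities(caption_response, min_confidence=0.9):
    """Collect (start_at, text, type) for entities at or above a confidence floor."""
    found = []
    for line in caption_response.get("caption", []):
        # Assumed: nlp_properties may be missing or null when mask=nlp is used.
        nlp = line.get("nlp_properties") or {}
        for entity in nlp.get("entities", []):
            if entity["confidence"] >= min_confidence:
                found.append((line["start_at"], entity["text"], entity["type"]))
    return found
```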
Item Descriptions
Getting Descriptions for an Item
GET /api/data/v3/items/{id}/descriptions?page-token={page_token}&start={start}&window={window}&all={all}
- page_token - The next page token provided to page the results. When provided, a next page token is all that is needed to retrieve the next page of results. If page_token is set along with other query parameters, the page_token takes precedence.
- start - The time (video) or page (documents) at which to start retrieving results for a given item. This has no effect on an IMG item.
- window - The time (video) or page (documents) at which to stop retrieving results for a given item. This has no effect on an IMG item.
- all - Provides all entries for an item regardless of whether or not the description data is present.
If none are set, the full collection is returned without pagination. A valid page_token can be used without the addition of all, limit, and offset.
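Paging through descriptions amounts to repeating the request with the latest token until no token is returned. The sketch below shows that loop, assuming the token for the next request arrives in the next_page field of each response body, as in the sample responses. The fetch_page callable is a hypothetical, caller-supplied function that takes a dictionary of query parameters and returns the decoded JSON body, so the loop stays independent of any particular HTTP client.

```python
def iter_description_pages(fetch_page, start=None, window=None):
    """Yield each response body, following next_page tokens until none remain.

    fetch_page is a hypothetical caller-supplied callable: it receives a dict
    of query parameters and returns the decoded JSON response.
    """
    params = {}
    if start is not None:
        params["start"] = start
    if window is not None:
        params["window"] = window
    while True:
        page = fetch_page(params)
        yield page
        token = page.get("next_page")
        if not token:
            break  # an empty or missing token means this was the last page
        # The token alone is enough to retrieve the next page.
        params = {"page-token": token}
```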
Response
A successful call returns a Status OK (200) with the following response body.
For an asset that is an IMG:
{
"contents": {
"img": {
"description": {
"id": "b2fe0c8e0ee298eb07c3c5dce457c907",
"item_id": "fe154badd7b2d78349c214938f27547c",
"confidence": 0.443617173003186,
"language": {
"language": "en-US",
"confidence": 0.8
},
"text": "a drawing of a face"
}
},
"pages": null,
"video_frames": null
},
"next_page": ""
}
For an asset that is a video:
{
"contents": {
"img": null,
"pages": null,
"video_frames": [
{
"description": {
"id": "d4acd831bf8d56e6a6e1fcb054228f29",
"item_id": "dddcbe782d810eb80f531361b5799a53",
"confidence": 0.7532465814725752,
"language": {
"code": "en",
"confidence": 0.96
},
"text": "a close up of a person"
},
"frame_id": "59ad92a0ed0d873de84d2cc2bd080898",
"thumbnail_path": "video_main_frames/frame-0000000000.jpg",
"time": 0
},
{
"description": {
"id": "8a7f430ebb24d621021e2014ee05c6eb",
"item_id": "dddcbe782d810eb80f531361b5799a53",
"confidence": 0.2997128373277878,
"language": null,
"text": "a close up of a man with smoke coming out of it"
},
"frame_id": "677deeef7865c9e1b0bb497164aeca50",
"thumbnail_path": "video_main_frames/frame-0000000001.jpg",
"time": 2
},
{
"description": {
"id": "ff31ffc38b95e420dfca366ef02b550d",
"item_id": "dddcbe782d810eb80f531361b5799a53",
"confidence": 0.8331186858981754,
"language": null,
"text": "a blurry image of smoke"
},
"frame_id": "b767a35a35e5a873a109f2d3b4df5ec2",
"thumbnail_path": "video_main_frames/frame-0000000002.jpg",
"time": 4
}
]
},
"next_page": "NextPageTokenString"
}
For an asset that is a document:
{
"contents": {
"img": null,
"pages": [
{
"images": [
{
"description": {
"id": "4c51651ee15c7ce414ae381bdc252622",
"item_id": "0ca4a8e17b66d3946f611621936896c1",
"confidence": 0.7192287180775676,
"language": {
"code": "en",
"confidence": 0.86
},
"text": "a man standing in front of a mirror posing for the camera"
},
"image_id": "24235782fb645ba35c6410617f8c3527",
"image_index": 0,
"thumbnail_path": "document_pages/thumb-pg-00000-img-00000.png"
}
],
"page": 0,
"description": "optionally, a page can have a description as well, or it can be embedded in the images within the page",
"thumbnail_path": "optionally, a page can have a thumbnail path as well as well as each embedded images thumbnail",
"page_id": "a uuid to identify the page, optional, will be available when all query param is set"
},
{
"images": [
{
"description": {
"id": "62a0967c75af0b1f8ba65edd7b287929",
"item_id": "0ca4a8e17b66d3946f611621936896c1",
"confidence": 0.9312026737395315,
"language": null,
"text": "Robb Wells, John Paul Tremblay that are looking at the camera"
},
"image_id": "5fd88f4121c499aa04b9f77fa59e7788",
"image_index": 0,
"thumbnail_path": "document_pages/thumb-pg-00001-img-00000.png"
}
],
"page": 1
}
],
"video_frames": null
},
"next_page": "NextPageTokenString"
}
If start and window are provided, the results may be paginated. The next page token provides a stringified token that can be used to retrieve the next page of results. If the next page token is an empty string, there are no more results.
If any unexpected errors occurred in the process of fulfilling this request or response cycle, a Status Internal Server Error (500) is returned. If the item ID is for an item that does not exist, a 404 is returned indicating the item is not found.
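The next page token behavior can be followed with a small loop. This sketch assumes a caller-supplied fetch_page function (hypothetical) that accepts a page token or None and returns the decoded JSON body as a dict. How the HTTP call and authentication are performed is left to that function.

```python
def iter_pages(fetch_page):
    """Yield each response body until next_page comes back as an empty string.

    fetch_page is a hypothetical callable: token (str or None) -> dict.
    """
    token = None
    while True:
        body = fetch_page(token)
        yield body
        token = body.get("next_page", "")
        if not token:
            # An empty token means there are no more results.
            break
```

Passing the fetch function in keeps the loop independent of any particular HTTP library.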
Curating a Description for an Item
To create a curated description, you can post to this endpoint with a valid request body:
POST /api/data/v3/items/{id}/descriptions
{
"segment_index": float64,
"image_index": int,
"item_type": ENUM["image" | "video" | "document"],
"text": string
}
- item_type - The type of item.
- segment_index - Set to -1 if the item is an image.
- image_index - Set to -1 if the item is an image or a video.
- text - Must not be an empty string.
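The field rules above can be checked on the client before posting. This is a sketch with a hypothetical function name. Treating segment_index as the frame time for videos and the page number for documents is an assumption inferred from sample responses elsewhere on this page, not a documented rule.

```python
def build_curation_body(item_type, text, segment_index=-1, image_index=-1):
    """Return a request body dict that follows the documented -1 conventions."""
    if item_type not in ("image", "video", "document"):
        raise ValueError("item_type must be image, video, or document")
    if not text:
        raise ValueError("text must not be an empty string")
    if item_type == "image":
        # Images use -1 for both indexes.
        segment_index, image_index = -1, -1
    elif item_type == "video":
        # Videos have no embedded image index.
        image_index = -1
    return {
        "segment_index": float(segment_index),
        "image_index": int(image_index),
        "item_type": item_type,
        "text": text,
    }
```

The returned dict can be serialized with json.dumps and sent as the POST body.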
Response
A successful call returns a Status Created (201) with the following response body that includes the related metadata associated with that segment/image index.
{
"description": {
"id": string,
"item_id": string,
"confidence": float64,
"language": string,
"language_confidence": float64,
"text": string
}
}
If there is a conflict with the segment or image index, a Status Unprocessable Entity (422) is returned. If any unexpected errors occurred in the process of fulfilling this request or response cycle, a Status Internal Server Error (500) is returned.
Editing a Description
To edit an existing description text, use the following request:
PATCH /api/data/v3/items/{id}/descriptions/{desc_id}
{
"text": "new description text"
}
- id - The item ID.
- desc_id - The description ID.
- text - Empty string allowed.
Response
A successful call returns a Status OK (200) with the new description after updating.
{
"description": {
"id": string,
"item_id": string,
"confidence": float64,
"language": string,
"language_confidence": float64,
"text": string
}
}
If the description by the desc_id is not found, a Status Not Found (404) is returned. If any unexpected errors occurred in the process of fulfilling this request or response cycle, a Status Internal Server Error (500) is returned.
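A PATCH request for this endpoint could be prepared with urllib as sketched below. BASE_URL and the function name are hypothetical. The Content-Type header is an assumption, and authentication is omitted because it is not described here.

```python
import json
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "https://air.example.com"  # hypothetical host, not taken from this documentation


def build_edit_description_request(item_id, desc_id, text):
    """Prepare (but do not send) a PATCH request that replaces a description's text."""
    url = (
        f"{BASE_URL}/api/data/v3/items/{quote(item_id, safe='')}"
        f"/descriptions/{quote(desc_id, safe='')}"
    )
    # An empty string is allowed for text on this endpoint.
    data = json.dumps({"text": text}).encode("utf-8")
    return Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},  # assumed header
        method="PATCH",
    )
```

The prepared request can be passed to urllib.request.urlopen once any required credentials have been added.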
Item OCRs
Getting OCRs for an Item
GET /api/data/v3/items/{id}/ocrs?page-token={page_token}&start={start}&window={window}&all={all}
- page_token - The next page token provided to page the results. If page_token is set along with other query parameters, the page_token takes precedence.
- start - The time (video) or page (documents) to indicate where to start retrieving results for a given item. This has no effect on an IMG item.
- window - The time (video) or page (documents) to indicate where to end retrieving results for a given item. This has no effect on an IMG item.
- all - Provides all entries for an item regardless of whether or not the OCR data is present.
If none are set, the full collection is returned without pagination. A valid page_token can be used without the addition of all, limit, and offset.
Response
A successful call returns a Status OK (200) with the following response body.
For an asset that is an IMG:
{
"contents": {
"img": {
"ocrs": [
{
"id": "0a4877c31bd094a32a124a1c5571f751",
"item_id": "c21b3e9fe1fbe09814d6a7bfdf5ba313",
"bounding_box": {
"top": 441,
"left": 220,
"width": 52,
"height": 16
},
"confidence": 0,
"language": null,
"text": "adidas",
"text_type": "lines"
},
{
"id": "1afb7fa88d193d5d9ae766da04e1517f",
"item_id": "c21b3e9fe1fbe09814d6a7bfdf5ba313",
"bounding_box": {
"top": 462,
"left": 295,
"width": 86,
"height": 15
},
"confidence": 0,
"language": null,
"text": "SNALMUNCH 2012",
"text_type": "lines"
},
{
"id": "f4ee60381a3cd69a0787848b71255b62",
"item_id": "c21b3e9fe1fbe09814d6a7bfdf5ba313",
"bounding_box": {
"top": 497,
"left": 222,
"width": 243,
"height": 53
},
"confidence": 0,
"language": null,
"text": "SAMSUNG",
"text_type": "lines"
}
]
},
"pages": null,
"video_frames": null
},
"next_page": ""
}
For an asset that is a video:
{
"contents": {
"img": null,
"pages": null,
"video_frames": [
{
"frame_id": "5248fd3acceab63163a5bcc5ddc15d62",
"ocrs": [
{
"id": "e45aa09fd7d6121e474459e59e8a7d4c",
"item_id": "3b103330378acb3ad604045ba1f4aecd",
"bounding_box": {
"top": 12,
"left": 14,
"width": 169,
"height": 11
},
"confidence": 0.87,
"language": null,
"text": "HIT THAT LIKE BUTTON, NATION!",
"text_type": "lines"
}
],
"thumbnail_path": "video_main_frames/frame-0000000001.jpg",
"time": 2.002
},
{
"frame_id": "c5501c083166450e9885782aac29fad8",
"ocrs": [
{
"id": "7a5d590df38aa2494a1a854640ea6b47",
"item_id": "3b103330378acb3ad604045ba1f4aecd",
"bounding_box": {
"top": 19,
"left": 249,
"width": 33,
"height": 16
},
"confidence": 0,
"language": null,
"text": "BEA",
"text_type": "lines"
},
{
"id": "8485e7771e8f0ecdc871cf8856554be1",
"item_id": "3b103330378acb3ad604045ba1f4aecd",
"bounding_box": {
"top": 12,
"left": 14,
"width": 169,
"height": 11
},
"confidence": 0,
"language": null,
"text": "HIT THAT LIKE BUTTON, NATION!",
"text_type": "lines"
}
],
"thumbnail_path": "video_main_frames/frame-0000000002.jpg",
"time": 4.004
}
]
},
"next_page": "NP-BAwEBBVRva2VuAf-CAAEDAQVMaW1pdAEEAAEGT2Zmc2V0AQQAAQZQYXJhbXMB_4QAAAAh_4MEAQERbWFwW3N0cmluZ11zdHJpbmcB_4QAAQwBDAAAKP-CAwMDYWxsAAVzdGFydA0tMTExMTEwMC45OTk5BndpbmRvdwIxMAA="
}
For an asset that is a document:
{
"contents": {
"img": null,
"pages": [
{
"images": [
{
"image_id": "daea649e947b430acceccc089f653c3f",
"image_index": 0,
"ocrs": [
{
"id": "095a5ca47bda4e0a9db562e23070125c",
"item_id": "43b89caeb2fd3c23f25b8431e644cabc",
"bounding_box": {
"top": 15,
"left": 522,
"width": 217,
"height": 24
},
"confidence": 0,
"language": null,
"text": "Pilgrim Programming, LLC",
"text_type": "lines"
}
],
"thumbnail_path": "document_pages/thumb-pg-00000-img-00000.png"
}
],
"page": 0,
"ocrs": "optionally, a page can have OCR as well, or it can be embedded in the images within the page",
"thumbnail_path": "optionally, a page can have a thumbnail path as well as well as each embedded images thumbnail",
"page_id": "a uuid to identify the page, optional, will be available when all query param is set"
},
{
"images": [
{
"image_id": "18061410280b001f26f34890d0da6b04",
"image_index": 0,
"ocrs": [
{
"id": "2b6d00e3d4d014267a011d64d50144ff",
"item_id": "43b89caeb2fd3c23f25b8431e644cabc",
"bounding_box": {
"top": 1194,
"left": 270,
"width": 817,
"height": 23
},
"confidence": 0,
"language": null,
"text": "There are no liens , claims or encumbrances which might conflict with or otherwise affect",
"text_type": "lines"
}
],
"thumbnail_path": "document_pages/thumb-pg-00001-img-00000.png"
}
],
"page": 1
}
],
"video_frames": null
},
"next_page": "NP-BAwEBBVRva2VuAf-CAAEDAQVMaW1pdAEEAAEGT2Zmc2V0AQQAAQZQYXJhbXMB_4QAAAAh_4MEAQERbWFwW3N0cmluZ11zdHJpbmcB_4QAAQwBDAAAJ_-CAwMDYWxsAAVzdGFydA0tMTExMTEwOS45OTk5BndpbmRvdwExAA=="
}
If start and window are provided, the results may be paginated. The next page token provides a stringified token that can be used to retrieve the next page of results. If the next page token is an empty string, there are no more results.
If any unexpected errors occurred in the process of fulfilling this request or response cycle, a Status Internal Server Error (500) is returned. If the item ID is for an item that does not exist, a 404 is returned indicating the item is not found.
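Because the OCR entries sit at a different depth for images, video frames, and document pages, a single helper can flatten them. This is a sketch with a hypothetical function name. It assumes the response shapes shown in the samples above and skips page-level ocrs values that are not lists.

```python
def collect_ocr_text(body):
    """Return every OCR text string from an OCR response, in document order."""
    contents = body.get("contents") or {}
    texts = []

    img = contents.get("img")
    if img:
        texts += [ocr["text"] for ocr in img.get("ocrs") or []]

    for frame in contents.get("video_frames") or []:
        texts += [ocr["text"] for ocr in frame.get("ocrs") or []]

    for page in contents.get("pages") or []:
        page_ocrs = page.get("ocrs")
        if isinstance(page_ocrs, list):
            # Optional page-level OCR entries.
            texts += [ocr["text"] for ocr in page_ocrs]
        for image in page.get("images") or []:
            texts += [ocr["text"] for ocr in image.get("ocrs") or []]

    return texts
```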
Curating an OCR for an Item
To create a curated OCR, you can post to the following endpoint with a valid request body.
POST /api/data/v3/items/{id}/ocrs
{
"segment_index": float64,
"image_index": int,
"item_type": ENUM["image" | "video" | "document"],
"text": string
}
- item_type - The type of item.
- segment_index - Set to -1 if the item is an image.
- image_index - Set to -1 if the item is an image or a video.
- text - Must not be an empty string.
Response
A successful call returns a Status Created (201) with the following response body that includes the related metadata associated with that segment/image index:
{
"ocr": {
"id": string,
"item_id": string,
"bounding_box": {
"top": int,
"left": int,
"width": int,
"height": int
},
"confidence": float64,
"language": string,
"language_confidence": float64,
"text": string,
"text_type": string
}
}
If there is a conflict with the segment or image index, a Status Unprocessable Entity (422) is returned. If any unexpected errors occurred in the process of fulfilling this request or response cycle, a Status Internal Server Error (500) is returned.
Editing an OCR
To edit an existing OCR text, use the following request:
PATCH /api/data/v3/items/{id}/ocrs/{ocr_id}
{
"text": "new ocr text"
}
- id - The item ID.
- ocr_id - The OCR ID.
- text - An empty string is allowed.
Response
A successful call returns a Status OK (200) with the new OCR after updating.
{
"ocr": {
"id": string,
"item_id": string,
"bounding_box": {
"top": int,
"left": int,
"width": int,
"height": int
},
"confidence": float64,
"language": string,
"language_confidence": float64,
"text": string,
"text_type": string
}
}
If the OCR by the ocr_id is not found, a Status Not Found (404) is returned. If any unexpected errors occurred in the process of fulfilling this request or response cycle, a Status Internal Server Error (500) is returned.
Deleting an OCR
To delete an existing OCR text, use the following request:
DELETE /api/data/v3/items/{id}/ocrs/{ocr_id}
- id - The item ID.
- ocr_id - The OCR ID.
Response
A successful call returns a Status OK (200) with the OCR ID after deleting.
{
"id": "5218778fd97e4960ebfe40529985fc17"
}
If the OCR by the ocr_id is not found, a Status Not Found (404) is returned. If any unexpected errors occurred in the process of fulfilling this request or response cycle, a Status Internal Server Error (500) is returned.
Item Speech to Texts
To get speech to texts for an item:
GET /api/data/v3/items/{id}/speech-to-texts?mask={mask}
- mask - Enables you to mask the embedded NLP data for an STT entry, which may result in faster results. Set mask=nlp to omit NLP data from the response.
Response
A successful call returns a Status OK (200) with the following response body:
{
"transcripts": [
{
"source": "amazon_transcribe",
"track": 2,
"transcript": [
{
"id": "f83e777b8bf149438d61109a5d9dbf6f",
"item_id": "8447caddb4c1501a291bf343d6886586",
"start_at": 0,
"end_at": 10.05,
"text": "Wiggle room Small additions to Cuba with yourself So look at my harvest one file here I have a number of different",
"language": null,
"nlp_properties": {
"entities": [
{
"text": "Cuba",
"confidence": 0.9795709848403931,
"type": "location"
},
{
"text": "one file",
"confidence": 0.6889344453811646,
"type": "quantity"
}
],
"key_phrases": [
{
"text": "Wiggle room Small additions",
"confidence": 0.7353704571723938
},
{
"text": "Cuba",
"confidence": 0.9996484518051147
},
{
"text": "my harvest one file",
"confidence": 0.8326694965362549
},
{
"text": "a number",
"confidence": 0.9973674416542053
}
],
"sentiment": {
"text": "neutral",
"sentiment_confidence": {
"Mixed": 0.004104138817638159,
"Negative": 0.012511652894318104,
"Neutral": 0.934111475944519,
"Positive": 0.04927277937531471
}
},
"language": {
"language": "en",
"confidence": 0.9973103404045105
}
}
},
...
]
}
]
}
The next page token provides a stringified token that can be used to retrieve the next page of results. If the next page token is an empty string, there are no more results.
If any unexpected errors occurred in the process of fulfilling this request or response cycle, a Status Internal Server Error (500) is returned.
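The nested transcripts structure can be flattened into one row per spoken segment. This is a sketch with a hypothetical function name, assuming the field names shown in the sample response above.

```python
def transcript_rows(body):
    """Return (source, track, start_at, end_at, text) tuples for every segment."""
    rows = []
    for transcript in body.get("transcripts") or []:
        source = transcript.get("source")
        track = transcript.get("track")
        for segment in transcript.get("transcript") or []:
            rows.append(
                (source, track, segment["start_at"], segment["end_at"], segment["text"])
            )
    return rows
```

The helper reads only the timing and text fields, so it works the same whether or not mask=nlp was used.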
Downloading a Subtitles File for an Item
GET /api/data/v3/items/{id}/speech-to-texts/downloads?format={format}&source={source}
- format - Optional. vtt or srt. Generates the downloaded file in the specified format. (Defaults to srt if not specified.)
- source - Optional. If specified, it looks specifically for the given source. If not specified, it traverses all possible sources looking for a transcript.
Response
A successful call returns a Status OK (200) with the file in the body of the response.
If any unexpected errors occurred in the process of fulfilling this request or response cycle, a Status Internal Server Error (500) is returned.
If the source parameter is specified and there are no entries from the given source, a Not Found (404) error is returned.
If no source parameter is specified and a transcript cannot be found for any source, a Not Found (404) error is returned.
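Both query parameters are optional, so a URL builder only needs to add the ones that are set. This is a minimal sketch in which BASE_URL and the function name are hypothetical.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://air.example.com"  # hypothetical host, not taken from this documentation


def build_subtitles_url(item_id, fmt=None, source=None):
    """Build the download URL for an item's subtitles file."""
    if fmt is not None and fmt not in ("vtt", "srt"):
        raise ValueError("format must be vtt or srt")
    params = {}
    if fmt:
        params["format"] = fmt  # when omitted, srt is the documented default
    if source:
        params["source"] = source
    path = f"{BASE_URL}/api/data/v3/items/{quote(item_id, safe='')}/speech-to-texts/downloads"
    query = urlencode(params)
    return f"{path}?{query}" if query else path
```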
Getting Item Thumbnails
To get thumbnails for an item:
GET /api/data/v3/items/{id}/thumbnails
Response
A successful call returns a Status OK (200) with the following response body.
For an asset that is a video:
{
"contents": {
"thumbnail": {
"path": "thumbnailer/sprite.jpg",
"type": "sprite",
"frame_count": 30,
"height": 152,
"width": 270
},
"video_frames": [
{
"time": 1,
"frame_id": "16d63b1c711727218a44e4c0a8d43a20",
"thumbnail": "video_main_frames/frame-0000000000.jpg"
},
{
"time": 2,
"frame_id": "0b9d3ac13e51b621d535f812bcdd45fb",
"thumbnail": "video_main_frames/frame-0000000001.jpg"
},
...
]
}
}
For an asset that is a document:
{
"contents": {
"thumbnail": {
"path": "thumbnailer/thumb.png",
"type": "image",
"frame_count": 0,
"height": 152,
"width": 270
},
"pages": [
{
"page": 0,
"page_id": "28e5fa9e0b736baa7f2f7843a024adc9",
"thumbnail_path": "document_pages/thumb-pg-00000.png",
"images": [
{
"image_index": 0,
"image_id": "907b26c326a53e67f32f1cbf8ccbba54",
"thumbnail_path": "document_pages/thumb-pg-00000-img-00000.png"
}
]
},
{
"page": 1,
"page_id": "6b3446f091ad62dc1ea90e6b619666f8",
"thumbnail_path": "document_pages/thumb-pg-00001.png",
"images": [
{
"image_index": 0,
"image_id": "09ca78c5e94ed7cb328bc2980aacacfe",
"thumbnail_path": "document_pages/thumb-pg-00001-img-00000.png"
}
]
},
...
]
}
}
Images in a document show up when an embedded image is detected within the document and are nil if they are not detected.
For all other assets:
{
"contents": {
"thumbnail": {
"path": "thumbnailer/thumb.jpg",
"type": "image",
"frame_count": 0,
"height": 152,
"width": 270
}
}
}
If the thumbnail field is " ", the asset did not have a thumbnail created. This happens with text, caption, and archive files.
If any unexpected errors occurred in the process of fulfilling this request or response cycle, a Status Internal Server Error (500) is returned.
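In the samples above, video frames carry their path under thumbnail while pages and embedded images use thumbnail_path. The sketch below gathers every path from a thumbnails response. The function name is hypothetical and the key names are taken from the samples.

```python
def thumbnail_paths(body):
    """Return all thumbnail paths found in a thumbnails response."""
    contents = body.get("contents") or {}
    paths = []

    main = contents.get("thumbnail")
    if isinstance(main, dict) and main.get("path"):
        paths.append(main["path"])

    for frame in contents.get("video_frames") or []:
        if frame.get("thumbnail"):
            paths.append(frame["thumbnail"])

    for page in contents.get("pages") or []:
        if page.get("thumbnail_path"):
            paths.append(page["thumbnail_path"])
        # images may be absent when no embedded image was detected
        for image in page.get("images") or []:
            if image.get("thumbnail_path"):
                paths.append(image["thumbnail_path"])

    return paths
```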
Updating a Speech to Text Entry
PATCH /api/data/v3/items/{item_id}/speech-to-texts/{s2t_id}
{
"text": "my new speech to text"
}
Response
This returns the updated speech to text entry. Otherwise, an error is returned:
- Status Not Found (404) if passing an invalid item_id or s2t_id.
- Status Unprocessable Entity (422) if there is a validation error.
- Status Internal Server Error (500) if some other error occurs.
Deleting a Speech to Text Entry
DELETE /api/data/v3/items/{item_id}/speech-to-texts/{s2t_id}
Response
Upon success, a 204 No Content is returned. Otherwise, an error is returned:
- Status Not Found (404) if passing an invalid item_id or s2t_id.
- Status Internal Server Error (500) if some other error occurs.
Getting Item Custom Tags
To get custom tags for an item:
GET /api/data/v3/items/{id}/customtags/amazonrek?page-token={page_token}&start={start}&window={window}&all={all}
- page_token - The next page token provided to page the results. When provided, a next page token is all that is needed to retrieve the next page of results. If page_token is set along with other query parameters, the page_token takes precedence.
- start - The time (video) or page (documents) to indicate where to start retrieving results for a given item. This has no effect on an IMG item.
- window - The time (video) or page (documents) to indicate where to end retrieving results for a given item. This has no effect on an IMG item.
- all - Provides all entries for an item regardless of whether or not the custom tag data is present.
If none are set, the full collection is returned without pagination. A valid page_token can be used without the addition of all, limit, and offset.
Response
A successful call returns a Status OK (200) with the following response body.
For an asset that is an IMG:
{
"contents": {
"img": {
"custom_tags": [
{
"id": "8abdb55f697f0a28a8264f3f0a320d09",
"confidence": 0.851,
"name": "Les Paul",
"bounding_box": {
"top": 1.1,
"left": 2.2,
"width": 3.3,
"height": 4.4
}
}
]
}
},
"next_page_token": ""
}
For an asset that is a video:
{
"contents": {
"video_frames": [
{
"time": 4.004,
"frame_id": "adb7e0d8787d2da396ee6e79ab80b0b0",
"thumbnail": "video_main_frames/frame-0000000002.jpg",
"custom_tags": [
{
"id": "cb76d5633637c0d060026adaf87a2804",
"confidence": 0.8071,
"name": "Les Paul",
"bounding_box": {
"top": 1.1,
"left": 2.2,
"width": 3.3,
"height": 4.4
}
}
]
},
{
"time": 8.008,
"frame_id": "8e9ebfb5b7fa100a636fe1aeef01ea5f",
"thumbnail": "video_main_frames/frame-0000000004.jpg",
"custom_tags": [
{
"id": "a7be4458334a717525a902ce4df2f358",
"confidence": 0.81195,
"name": "Stratocaster",
"bounding_box": {
"top": 1.1,
"left": 2.2,
"width": 3.3,
"height": 4.4
}
}
]
}
]
},
"next_page_token": "NP-BAwEBBVRva2VuAf-CAAEDAQVMaW1pdAEEAAEGT2Zmc2V0AQQAAQZQYXJhbXMB_4QAAAAh_4MEAQERbWFwW3N0cmluZ11zdHJpbmcB_4QAAQwBDAAAIv-CAwMFc3RhcnQHMzYuMDAwMQZ3aW5kb3cCMTYDYWxsAAA="
}
For an asset that is a document:
{
"contents": {
"pages": [
{
"page": 22,
"images": [
{
"image_index": 11,
"image_id": "7f6d4b027ba6cf303a3cd4108e99b866",
"thumbnail_path": "document_pages/thumb-pg-00022-img-00011.png",
"custom_tags": [
{
"id": "72157a50ea72d788ca102171639a3f45",
"confidence": 0.8273,
"name": "Gibson SG",
"bounding_box": {
"top": 1.1,
"left": 2.2,
"width": 3.3,
"height": 4.4
}
}
]
}
]
},
{
"page": 34,
"images": [
{
"image_index": 0,
"image_id": "ebf1c28240735ad2b63a348a9f6671ee",
"thumbnail_path": "document_pages/thumb-pg-00034-img-00000.png",
"custom_tags": [
{
"id": "628cd34b9b13f7b5a16d19d62923ce47",
"confidence": 0.81385,
"name": "Paul Reed Smith",
"bounding_box": {
"top": 1.1,
"left": 2.2,
"width": 3.3,
"height": 4.4
}
}
]
}
]
}
]
},
"next_page_token": "NP-BAwEBBVRva2VuAf-CAAEDAQVMaW1pdAEEAAEGT2Zmc2V0AQQAAQZQYXJhbXMB_4QAAAAh_4MEAQERbWFwW3N0cmluZ11zdHJpbmcB_4QAAQwBDAAAIv-CAwMFc3RhcnQHMzYuMDAwMQZ3aW5kb3cCMTYDYWxsAAA="
}
The next page token provides a stringified token that can be used to retrieve the next page of results. If the next page token is an empty string, there are no more results.
If any unexpected errors occurred in the process of fulfilling this request or response cycle, a Status Internal Server Error (500) is returned.
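The custom tags samples above carry the token under next_page_token, while other samples on this page use next_page. A client that pages through several endpoints could read either key, as in this sketch with a hypothetical function name.

```python
def next_token(body):
    """Return the next page token from either key, or "" when there are no more results."""
    return body.get("next_page_token") or body.get("next_page") or ""
```

An empty return value signals the last page regardless of which key the endpoint used.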
Getting Item Logos
To get logos for an item:
GET /api/data/v3/items/{id}/logos?page-token={page_token}&start={start}&window={window}&all={all}
- page_token - The next page token provided to page the results. When provided, a next page token is all that is needed to retrieve the next page of results. If page_token is set along with other query parameters, the page_token takes precedence.
- start - The time (video) or page (documents) to indicate where to start retrieving results for a given item. This has no effect on an IMG item.
- window - The time (video) or page (documents) to indicate where to end retrieving results for a given item. This has no effect on an IMG item.
- all - Provides all entries for an item regardless of whether or not the logo data is present.
If none are set, the full collection is returned without pagination. A valid page_token can be used without the addition of all, limit, and offset.
Response
A successful call returns a Status OK (200) with the following response body.
For an asset that is an IMG:
{
"contents": {
"img": {
"logos": [
{
"id": "8abdb55f697f0a28a8264f3f0a320d09",
"confidence": 0.851,
"name": "Adidas",
"bounding_box": {
"top": 1083,
"left": 236,
"width": 62,
"height": 88
}
}
]
}
},
"next_page_token": ""
}
For an asset that is a video:
{
"contents": {
"video_frames": [
{
"time": 4.004,
"frame_id": "adb7e0d8787d2da396ee6e79ab80b0b0",
"thumbnail": "video_main_frames/frame-0000000002.jpg",
"logos": [
{
"id": "cb76d5633637c0d060026adaf87a2804",
"confidence": 0.8071,
"name": "eastern connecticut state university",
"bounding_box": {
"top": 7,
"left": 6,
"width": 180,
"height": 21
}
}
]
},
{
"time": 8.008,
"frame_id": "8e9ebfb5b7fa100a636fe1aeef01ea5f",
"thumbnail": "video_main_frames/frame-0000000004.jpg",
"logos": [
{
"id": "a7be4458334a717525a902ce4df2f358",
"confidence": 0.81195,
"name": "eastern connecticut state university",
"bounding_box": {
"top": 7,
"left": 5,
"width": 182,
"height": 21
}
}
]
}
]
},
"next_page_token": "NP-BAwEBBVRva2VuAf-CAAEDAQVMaW1pdAEEAAEGT2Zmc2V0AQQAAQZQYXJhbXMB_4QAAAAh_4MEAQERbWFwW3N0cmluZ11zdHJpbmcB_4QAAQwBDAAAIv-CAwMFc3RhcnQHMzYuMDAwMQZ3aW5kb3cCMTYDYWxsAAA="
}
For an asset that is a document:
{
"contents": {
"pages": [
{
"page": 22,
"images": [
{
"image_index": 11,
"image_id": "7f6d4b027ba6cf303a3cd4108e99b866",
"thumbnail_path": "document_pages/thumb-pg-00022-img-00011.png",
"logos": [
{
"id": "72157a50ea72d788ca102171639a3f45",
"confidence": 0.8273,
"name": "misako",
"bounding_box": {
"top": 0,
"left": 306,
"width": 1079,
"height": 782
}
}
]
}
]
},
{
"page": 34,
"images": [
{
"image_index": 0,
"image_id": "ebf1c28240735ad2b63a348a9f6671ee",
"thumbnail_path": "document_pages/thumb-pg-00034-img-00000.png",
"logos": [
{
"id": "628cd34b9b13f7b5a16d19d62923ce47",
"confidence": 0.81385,
"name": "colgate",
"bounding_box": {
"top": 436,
"left": 375,
"width": 172,
"height": 151
}
}
]
}
]
}
]
},
"next_page_token": "NP-BAwEBBVRva2VuAf-CAAEDAQVMaW1pdAEEAAEGT2Zmc2V0AQQAAQZQYXJhbXMB_4QAAAAh_4MEAQERbWFwW3N0cmluZ11zdHJpbmcB_4QAAQwBDAAAIv-CAwMFc3RhcnQHMzYuMDAwMQZ3aW5kb3cCMTYDYWxsAAA="
}
The next page token provides a stringified token that can be used to retrieve the next page of results. If the next page token is an empty string, there are no more results.
If any unexpected errors occurred in the process of fulfilling this request or response cycle, a Status Internal Server Error (500) is returned.
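Drawing or cropping a detected logo usually needs corner coordinates rather than top, left, width, and height. The sketch below converts one bounding_box. It assumes that top and left locate the upper-left corner and that all four values share one unit. Neither point is stated here, and the function name is hypothetical.

```python
def box_corners(box):
    """Convert a bounding_box dict to (left, top, right, bottom)."""
    left, top = box["left"], box["top"]
    # Assumption: width extends to the right and height extends downward.
    return (left, top, left + box["width"], top + box["height"])
```

For the Adidas sample above, this yields (236, 1083, 298, 1171).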
Getting Item Mature Content
Get mature content categories for an item:
GET /api/data/v3/items/{id}/mature-content?page-token={page_token}&start={start}&window={window}&all={all}
- page_token - The next page token provided to page the results. When provided, a next page token is all that is needed to retrieve the next page of results. If page_token is set along with other query parameters, the page_token takes precedence.
- start - The time (video) or page (documents) to indicate where to start retrieving results for a given item. This has no effect on an IMG item.
- window - The time (video) or page (documents) to indicate where to end retrieving results for a given item. This has no effect on an IMG item.
- all - Provides all entries for an item regardless of whether or not the mature content data is present.
If none are set, the full collection is returned without pagination. A valid page_token can be used without the addition of all, limit, and offset.
Response
A successful call returns a Status OK (200) with the following response body.
For an asset that is an IMG:
{
"contents": {
"img": {
"img_id": "0dc050b8997014a97a7585d57ba7a842",
"mature_content": [
{
"id": "89b0a7ceae5c157ffa2bb609112b13ac",
"item_id": "e926f9da91bb002aeb9eb4affcd0b885",
"segment_index": -1,
"image_index": -1,
"metadata_id": "0dc050b8997014a97a7585d57ba7a842",
"name": "adult",
"confidence": 0.9845221042633057,
"source": "azure"
},
{
"id": "1694acfc0920e4c5c1ed1818a7d374f6",
"item_id": "e926f9da91bb002aeb9eb4affcd0b885",
"segment_index": -1,
"image_index": -1,
"metadata_id": "0dc050b8997014a97a7585d57ba7a842",
"name": "racy",
"confidence": 0.9923509359359741,
"source": "azure"
}
]
}
},
"next_page": ""
}
For an asset that is a video:
{
"contents": {
"video_frames": [
{
"time": 10.01,
"frame_id": "8700e93f682bad0af6c568500041c381",
"thumbnail": "video_main_frames/frame-0000000005.jpg",
"mature_content": [
{
"id": "d5dcbcd6a5f9bbde9eb8b66d4ba96ff2",
"item_id": "f7f611715fe91c98abd241fd1f9567ba",
"segment_index": 10.01,
"image_index": -1,
"metadata_id": "8700e93f682bad0af6c568500041c381",
"name": "racy",
"confidence": 0.9602822661399841,
"source": "azure"
}
]
}
]
},
"next_page": ""
}
For an asset that is a document:
{
"contents": {
"pages": [
{
"page": 0,
"images": [
{
"image_index": 0,
"image_id": "88e096b93f357ce138ffb84c12971837",
"thumbnail_path": "document_pages/thumb-pg-00000-img-00000.png",
"mature_content": [
{
"id": "18198fe070802fca122b400de113b196",
"item_id": "738a3f8a44ad00b0a5a3d17ea4bd4673",
"segment_index": 0,
"image_index": 0,
"metadata_id": "88e096b93f357ce138ffb84c12971837",
"name": "racy",
"confidence": 0.9949294924736023,
"source": "azure"
},
{
"id": "4784891bf0b8c051a69edbce5bf4e605",
"item_id": "738a3f8a44ad00b0a5a3d17ea4bd4673",
"segment_index": 0,
"image_index": 0,
"metadata_id": "88e096b93f357ce138ffb84c12971837",
"name": "adult",
"confidence": 0.9898415207862854,
"source": "azure"
}
]
}
]
}
]
},
"next_page": ""
}
The next page token provides a stringified token that can be used to retrieve the next page of results. If the next page token is an empty string, there are no more results.
If any unexpected errors occurred in the process of fulfilling this request or response cycle, a Status Internal Server Error (500) is returned.
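A consumer might reduce a mature content response to the category names above a confidence cutoff. This sketch uses a hypothetical function name and an arbitrary default threshold of 0.9, which is an illustration rather than a recommended value.

```python
def flagged_categories(body, threshold=0.9):
    """Return sorted, unique category names whose confidence meets the threshold."""
    contents = body.get("contents") or {}
    entries = []

    img = contents.get("img")
    if img:
        entries += img.get("mature_content") or []
    for frame in contents.get("video_frames") or []:
        entries += frame.get("mature_content") or []
    for page in contents.get("pages") or []:
        for image in page.get("images") or []:
            entries += image.get("mature_content") or []

    return sorted({e["name"] for e in entries if e["confidence"] >= threshold})
```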
Item Tags
Getting Tags for an Item
GET /api/data/v3/items/{id}/tags?page-token={page_token}&start={start}&window={window}&all={all}
- page_token - The next page token provided to page the results. When provided, a next page token is all that is needed to retrieve the next page of results. If page_token is set along with other query parameters, the page_token takes precedence.
- start - The time (video) or page (documents) to indicate where to start retrieving results for a given item. This has no effect on an IMG item.
- window - The time (video) or page (documents) to indicate where to end retrieving results for a given item. This has no effect on an IMG item.
- all - Provides all entries for an item regardless of whether or not the tag data is present.
If none are set, the full collection is returned without pagination. A valid page_token can be used without the addition of all, limit, and offset.
Response
A successful call returns a Status OK (200) with the following response body.
For an asset that is an IMG:
{
"contents": {
"img": {
"tags": [
{
"id": "eec2accad1cc1b757f3035bd8253ac04",
"text": "window",
"confidence": 0.9015815854072571
},
{
"id": "e95fc00150dbdf08ec3eb2e75c638ac1",
"text": "stained glass",
"confidence": 0.9015815854072571
},
{
"id": "c06ee06a066475e692b8b56433ab371a",
"text": "light",
"confidence": 0.8380934019465514
},
{
"id": "7cfb5e8e12cdfaeb10086a5a9f693786",
"text": "sphere",
"confidence": 0.5156272603992966
},
{
"id": "ef689cbeb566845b46be786438cb8782",
"text": "church",
"confidence": 0.25103029243584873
}
]
},
"pages": null,
"video_frames": null
},
"next_page": ""
}
For an asset that is a video:
{
"contents": {
"img": null,
"pages": null,
"video_frames": [
{
"frame_id": "3a222b4f2e58529ce0dc321cf299dadb",
"tags": [
{
"id": "b12f152bda07508c9a482d74fef7dac3",
"text": "summer",
"confidence": 0.312589002008962
},
{
"id": "a090dafce32110bde870b89055e75939",
"text": "autumn",
"confidence": 0.20671001655433419
}
],
"thumbnail_path": "video_main_frames/frame-0000000000.jpg",
"time": 0
},
{
"frame_id": "6decb0045799aef3f95dbc6327a992fd",
"tags": [
{
"id": "8153baacdde17515bb7d89aa09e10347",
"text": "firefighter",
"confidence": 0.9791649580001832
},
{
"id": "e5c846b2a75496779eb7c32b1f567a86",
"text": "person",
"confidence": 0.9791649580001831
},
{
"id": "6dcff09bdec2deb23d355d89e48a8f34",
"text": "smoke",
"confidence": 0.5069368303763367
}
],
"thumbnail_path": "video_main_frames/frame-0000000001.jpg",
"time": 2
}
]
},
"next_page": "NP-BAwEBBVRva2VuAf-CAAEDAQVMaW1pdAEEAAEGT2Zmc2V0AQQAAQZQYXJhbXMB_4QAAAAh_4MEAQERbWFwW3N0cmluZ11zdHJpbmcB_4QAAQwBDAAAJ_-CAwMDYWxsAAVzdGFydA0tMTExMTEwOS45OTk5BndpbmRvdwExAA=="
}
For an asset that is a document:
{
"contents": {
"img": null,
"pages": [
{
"images": [
{
"image_id": "6ae8b0cd1a26070884f08dac7336a2c2",
"image_index": 0,
"tags": [
{
"id": "e6d5e65e71d93bbada112bfbcb6a5ed0",
"text": "person",
"confidence": 0.9971379041671753
},
{
"id": "5a322b094c87b670ea71bf8eea7cc7ed",
"text": "man",
"confidence": 0.9918940663337708
}
],
"thumbnail_path": "document_pages/thumb-pg-00000-img-00000.png"
}
],
"page": 0,
"tags": "optionally, a page can have tags as well, or it can be embedded in the images within the page as shown above",
"thumbnail_path": "optionally, a page can have a thumbnail path as well as well as each embedded images thumbnail above",
"page_id": "a uuid to identify the page, optional, is guaranteed to be available when all query param is set"
},
{
"images": [
{
"image_id": "b604734057d3462894d6c1e75a8517b3",
"image_index": 0,
"tags": [
{
"id": "2ca9919a848d662a254315f34de006ca",
"text": "standing",
"confidence": 0.8040973544120789
},
{
"id": "807f196d29b0094d558479d50fddbb74",
"text": "crowd",
"confidence": 0.0062681203708052635
}
],
"thumbnail_path": "document_pages/thumb-pg-00001-img-00000.png"
}
],
"page": 1
}
],
"video_frames": null
},
"next_page": "NP-BAwEBBVRva2VuAf-CAAEDAQVMaW1pdAEEAAEGT2Zmc2V0AQQAAQZQYXJhbXMB_4QAAAAh_4MEAQERbWFwW3N0cmluZ11zdHJpbmcB_4QAAQwBDAAAJ_-CAwMDYWxsAAVzdGFydA0tMTExMTEwOS45OTk5BndpbmRvdwExAA=="
}
The next page token provides a stringified token that can be used to retrieve the next page of results. If the next page token is an empty string, there are no more results.
If any unexpected errors occurred in the process of fulfilling this request or response cycle, a Status Internal Server Error (500) is returned.
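The same tag text can appear on several frames or images with different confidence values. The sketch below keeps the highest confidence seen for each tag text across the whole response. The function name is hypothetical, and page-level tags values are read only when they are lists.

```python
def best_tag_confidence(body):
    """Map each tag text to the highest confidence found anywhere in the response."""
    contents = body.get("contents") or {}
    groups = []

    img = contents.get("img")
    if img:
        groups.append(img.get("tags") or [])
    for frame in contents.get("video_frames") or []:
        groups.append(frame.get("tags") or [])
    for page in contents.get("pages") or []:
        if isinstance(page.get("tags"), list):
            groups.append(page["tags"])
        for image in page.get("images") or []:
            groups.append(image.get("tags") or [])

    best = {}
    for tags in groups:
        for tag in tags:
            if tag["confidence"] > best.get(tag["text"], -1.0):
                best[tag["text"]] = tag["confidence"]
    return best
```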
Adding Tags to an Item
POST /api/data/v3/items/{id}/tags
Request body:
{
"metadata_id": "metadata UUID",
"tags": ["tag1", "tag2","tag3"]
}
- "tags" - A list of tags you want to add. All tags are deduplicated before being applied to the segment.
Response
A successful call returns a Status Created (201) with the metadata segment (or segment parent) in its new state, including any added tags. The body looks identical to the get tags response, except that it includes only the edited IMG/timeframe/page data.
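A minimal Python sketch of preparing this request with the standard library follows. The host in BASE_URL, the function name, and the Content-Type header are assumptions for illustration; authentication is not covered in this section and is omitted. The client-side deduplication is optional, since the service deduplicates tags itself.

```python
import json
import urllib.request

BASE_URL = "https://air.example.com"  # hypothetical host, replace with your own


def build_add_tags_request(item_id: str, metadata_id: str, tags: list[str]) -> urllib.request.Request:
    """Build (but do not send) the POST request that adds tags to a segment."""
    # Drop duplicates while keeping first-seen order.
    unique_tags = list(dict.fromkeys(tags))
    body = json.dumps({"metadata_id": metadata_id, "tags": unique_tags}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/data/v3/items/{item_id}/tags",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},  # assumed, since the body is JSON
    )
```

The returned request can be passed to urllib.request.urlopen once any required credentials have been added. A 201 status indicates success.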
Deleting Tags From a Segment by Name
DELETE /api/data/v3/items/{id}/tags?metaID={meta_id}&tagName={tag_name}
- meta_id - The segment or image UUID. If an image metadata ID is provided, the tag name is removed from all sibling images under that segment.
- tag_name - The name of the tag(s) to be deleted. This may result in deletion of multiple tags from the segment if they share names.
Response
A successful call returns a Status No Content (204) with no response body.
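Because tag names can contain spaces or other reserved characters, the query string should be percent-encoded. The sketch below builds the DELETE URL with Python's standard library. The function name and the base_url argument are hypothetical.

```python
from urllib.parse import quote, urlencode


def build_delete_tag_url(base_url: str, item_id: str, meta_id: str, tag_name: str) -> str:
    """Return the URL for deleting a tag by name from a segment."""
    # quote_via=quote encodes spaces as %20 rather than "+".
    query = urlencode({"metaID": meta_id, "tagName": tag_name}, quote_via=quote)
    return f"{base_url}/api/data/v3/items/{quote(item_id, safe='')}/tags?{query}"
```

Send the resulting URL with the DELETE method and treat a 204 status as success.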
Getting Contents for a Document Item
GET /api/data/v3/items/{id}/text-contents?page-token={page_token}&start={start}&window={window}&all={all}&mask={mask}
- page_token - The next page token provided to page the results. When provided, a next page token is all that is needed to retrieve the next page of results. If page_token is set along with other query parameters, the page_token takes precedence.
- start - The page (documents) to indicate where to start retrieving results for a given item. This has no effect on an IMG item.
- window - The page (documents) to indicate where to end retrieving results for a given item. This has no effect on an IMG item.
- all - Provide all entries for an item regardless of whether or not the text content data is present.
- mask - Enables you to mask the embedded NLP data for a text content entry, which may return results faster. Set mask=nlp to omit NLP data from the response.
If none are set, the full collection is returned without pagination. A valid page_token can be used without the addition of all, start, and window.
Response
A successful call returns a Status OK (200) with the following response body.
For an asset that is a document:
{
"contents": {
"pages": [
{
"page": 0,
"page_id": "1a15b610dd82bdfec2ee8ecf5597cc97",
"thumbnail_path": "document_pages/thumb-pg-00000.png",
"text_content": {
"id": "65a5d28b38228ebce12f8bab67e0f386",
"metadatas_id": "1a15b610dd82bdfec2ee8ecf5597cc97",
"text": "This is an example pdf\n\n\f",
"language": {
"code": "en-US",
"confidence": 0.7
},
"nlp_properties": {
"entities": null,
"key_phrases": null,
"sentiment": {
"text": "neutral",
"sentiment_confidence": {
"Mixed": 0.013882513158023357,
"Negative": 0.16380225121974945,
"Neutral": 0.7018685936927795,
"Positive": 0.12044669687747955
}
},
"language": {
"language": "en",
"confidence": 0.9962568283081055
}
}
}
},
{
"page": 1,
"page_id": "1bf644159b1b1c1e3025df76f7c66110",
"thumbnail_path": "document_pages/thumb-pg-00001.png",
"text_content": {
"id": "62773d56d314a578f746a080460195a7",
"metadatas_id": "1bf644159b1b1c1e3025df76f7c66110",
"text": "This is page 2 of the example pdf\n\n\f",
"language": null,
"nlp_properties": {
"entities": [
{
"text": "page 2",
"confidence": 0.8359338045120239,
"type": "quantity"
}
],
"key_phrases": [
{
"text": "page 2",
"confidence": 0.9814304709434509
}
],
"sentiment": {
"text": "neutral",
"sentiment_confidence": {
"Mixed": 0.007130879443138838,
"Negative": 0.10643889009952545,
"Neutral": 0.783736526966095,
"Positive": 0.10269377380609512
}
},
"language": {
"language": "en",
"confidence": 0.9866665005683899
}
}
}
}
]
},
"next_page": ""
}
For the same asset but with the mask set to remove NLP information:
{
"contents": {
"pages": [
{
"page": 0,
"page_id": "1a15b610dd82bdfec2ee8ecf5597cc97",
"thumbnail_path": "document_pages/thumb-pg-00000.png",
"text_content": {
"id": "65a5d28b38228ebce12f8bab67e0f386",
"metadatas_id": "1a15b610dd82bdfec2ee8ecf5597cc97",
"text": "This is an example pdf\n\n\f",
"language": {
"code": "en-US",
"confidence": 0.7
}
}
},
{
"page": 1,
"page_id": "1bf644159b1b1c1e3025df76f7c66110",
"thumbnail_path": "document_pages/thumb-pg-00001.png",
"text_content": {
"id": "62773d56d314a578f746a080460195a7",
"metadatas_id": "1bf644159b1b1c1e3025df76f7c66110",
"text": "This is page 2 of the example pdf\n\n\f",
"language": null
}
}
]
},
"next_page": ""
}
If start and window are provided, then the results may be paginated. The next page token provides a stringified token that can be used to retrieve the next page of results. If the next page token is an empty string, there are no more results.
If any unexpected errors occurred in the process of fulfilling this request or response cycle, a Status Internal Server Error (500) is returned. If the item ID is for an item that does not exist, a 404 is returned indicating the item is not found.
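The paging behaviour described above can be sketched as a small Python generator. It takes a caller-supplied fetch_json function (hypothetical, standing in for whatever HTTP client and authentication you use) that accepts a request path and returns the decoded JSON body. The default start and window values are illustrative only.

```python
from typing import Any, Callable, Iterator, Optional
from urllib.parse import urlencode


def text_contents_path(item_id: str, **params: Any) -> str:
    """Build the text-contents path, skipping parameters that are None."""
    query = urlencode({key: value for key, value in params.items() if value is not None})
    path = f"/api/data/v3/items/{item_id}/text-contents"
    return f"{path}?{query}" if query else path


def iter_pages(
    item_id: str,
    fetch_json: Callable[[str], dict],
    start: int = 0,
    window: int = 10,
    mask: Optional[str] = None,
) -> Iterator[dict]:
    """Yield every page entry, following next_page until it is an empty string."""
    path = text_contents_path(item_id, start=start, window=window, mask=mask)
    while True:
        response = fetch_json(path)
        yield from (response.get("contents") or {}).get("pages") or []
        token = response.get("next_page") or ""
        if not token:
            break
        # The token alone is enough to request the next page.
        path = text_contents_path(item_id, **{"page-token": token})
```

Passing mask="nlp" requests the lighter response without the nlp_properties block.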
Getting Item Text Tokens
To get the content of text files (.txt):
GET /api/data/v3/items/{id}/tokens
Response
A successful call returns a Status OK (200) with the following response body:
{
"tokens": "The quick brown fox jumps over the lazy dog"
}
If any unexpected errors occurred in the process of fulfilling this request or response cycle, a Status Internal Server Error (500) is returned.
Getting Item Extractor Runs
To get extractor runs for an item:
GET /api/data/v3/items/{id}/extractors?run-type={run-type}
- run-type - One of the following values:
- all - Returns all the extractors run for an item in its lifetime.
- latest - Returns the extractors from the last run.
- erroneous - Returns the extractors that have errors registered from their last run.
- unique - Returns a list of unique extractors from the history of the item. The latest extractor run for each extractor type is returned.
Response
A successful call returns a Status OK (200) with the following response body:
{
"extractors_runtime": [
{
"request_id": String,
"err": String,
"success": Bool,
"skipped": Bool,
"runtime": Integer duration in nanoseconds,
"start_at": Zulu Timestamp,
"end_at": Zulu Timestamp,
"info": {
"name": String,
"version": Integer
}
}
...
]
}
If the item is not found, a Status Not Found (404) is returned. If any unexpected errors occurred in the process of fulfilling this request or response cycle, a Status Internal Server Error (500) is returned.
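As a rough illustration, the Python sketch below validates the run-type value before building the request path, and lists the runs in a response that neither succeeded nor were skipped, converting the nanosecond runtime to seconds. The function names are hypothetical, and the response fields are read exactly as named in the schema above.

```python
RUN_TYPES = {"all", "latest", "erroneous", "unique"}


def extractor_runs_path(item_id: str, run_type: str) -> str:
    """Build the extractor-runs path, rejecting run types not listed above."""
    if run_type not in RUN_TYPES:
        raise ValueError(f"run_type must be one of {sorted(RUN_TYPES)}, got {run_type!r}")
    return f"/api/data/v3/items/{item_id}/extractors?run-type={run_type}"


def failed_runs(response: dict) -> list[tuple[str, str, float]]:
    """Return (extractor name, error, runtime in seconds) for failed runs."""
    failures = []
    for run in response.get("extractors_runtime") or []:
        if not run.get("success") and not run.get("skipped"):
            name = (run.get("info") or {}).get("name", "")
            # "runtime" is an integer duration in nanoseconds.
            failures.append((name, run.get("err", ""), run.get("runtime", 0) / 1e9))
    return failures
```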
Getting Extractor Source Files
Item Amazon Transcribe Source Word File Download
To download the source words for an Amazon Transcribe transcription, use the following endpoint:
GET /api/files/{item_id}/sourcefiles/amazon_transcribe.json
Response
The response is a list of the words and punctuation marks that make up the dictation. The following example shows how each type looks in a source file.
{
"words": [
{
"start_time": "0",
"end_time": "0",
"type": "punctuation",
"alternatives": [
{
"confidence": "0",
"content": "."
}
]
},
{
"start_time": "0.28",
"end_time": "0.34",
"type": "pronunciation",
"alternatives": [
{
"confidence": "0.215",
"content": "Yeah"
}
]
},
...
]
}
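One way to turn this source file back into readable text is sketched below in Python. It picks the highest-confidence alternative for each entry and attaches punctuation to the preceding word. The function name is hypothetical. Note that the numeric fields arrive as strings, so they are converted before comparison.

```python
def transcript_text(source: dict) -> str:
    """Join the best alternative of each word into a single transcript string."""
    parts: list[str] = []
    for word in source.get("words") or []:
        alternatives = word.get("alternatives") or []
        if not alternatives:
            continue
        # Confidence values are strings in the source file.
        best = max(alternatives, key=lambda alt: float(alt.get("confidence", "0")))
        content = best.get("content", "")
        if word.get("type") == "punctuation" and parts:
            # Attach punctuation to the previous word without a space.
            parts[-1] += content
        else:
            parts.append(content)
    return " ".join(parts)
```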
Getting Distinct Item Types
Request
GET /api/data/v3/items/types
Response
A list of distinct item types from analyzed items in the system is returned.
{
"types": [
"audio",
"image/raster",
"video"
]
}
Listing Extractor History for an Item
Request
GET /api/data/v3/items/{id}/extractors/history
Response
The response includes a list of every extractor that has run, along with details about each historical run of that extractor.
{
"extractors": [
{
"id": "archive",
"runs": [
{
"request_id": "5df2a0e51fa4fd2993cf27a4ac4d26ab",
"error": "",
"success": true,
"skipped": false,
"start_at": "2019-12-12T20:19:50.094366Z",
"end_at": "2019-12-12T20:19:50.094429Z"
},
{
"request_id": "5deacee3b42b927cd17fc10094ab16d6",
"error": "",
"success": true,
"skipped": false,
"start_at": "2019-12-06T21:57:55.924024Z",
"end_at": "2019-12-06T21:57:55.924068Z"
}
]
},
{
"id": "document_pages",
"runs": [
{
"request_id": "5df2a0e51fa4fd2993cf27a4ac4d26ab",
"error": "",
"success": true,
"skipped": false,
"start_at": "2019-12-12T20:19:51.257149Z",
"end_at": "2019-12-12T20:19:51.257279Z"
},
{
"request_id": "5deacee3b42b927cd17fc10094ab16d6",
"error": "",
"success": true,
"skipped": false,
"start_at": "2019-12-06T21:58:06.406967Z",
"end_at": "2019-12-06T21:58:06.407036Z"
}
]
}
]
}
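As an illustrative sketch, the following Python helper reduces a history response like the one above to the most recent run per extractor, with its duration computed from the Zulu timestamps. The function names and the shape of the returned dictionary are assumptions made for this example.

```python
from datetime import datetime


def parse_zulu(timestamp: str) -> datetime:
    """Parse a timestamp such as 2019-12-12T20:19:50.094366Z as an aware datetime."""
    # Replacing the Z suffix keeps this working on Python versions before 3.11.
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def latest_runs(response: dict) -> dict[str, dict]:
    """Map each extractor id to a summary of its most recently started run."""
    latest = {}
    for extractor in response.get("extractors") or []:
        runs = extractor.get("runs") or []
        if not runs:
            continue
        newest = max(runs, key=lambda run: parse_zulu(run["start_at"]))
        duration = parse_zulu(newest["end_at"]) - parse_zulu(newest["start_at"])
        latest[extractor["id"]] = {
            "request_id": newest["request_id"],
            "success": newest["success"],
            "duration_seconds": duration.total_seconds(),
        }
    return latest
```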
Listing Frames (FrameDNA)
To list frames in an item available for FrameDNA:
GET /api/data/v3/items/{id}/frames
Response
A list of frames is returned. Use the "frame_id" value for the FrameDNA detail call.
{
"count": 2,
"frames": [
{
"frame_id": "b00fd699530f12452353f2532ebcefcf",
"time_seconds": 0,
"thumbnail": {
"path": "video_main_frames/frame-0000000000.jpg",
"type": "",
"frame_count": 0,
"height": 336,
"width": 624
}
},
{
"frame_id": "8dd3e5ec19525a1b764fcac5318fa4de",
"time_seconds": 2,
"thumbnail": {
"path": "video_main_frames/frame-0000000001.jpg",
"type": "",
"frame_count": 0,
"height": 336,
"width": 624
}
}
]
}
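To connect this listing to the FrameDNA detail call, the Python sketch below maps each returned frame to the path of its detail request. The function name is hypothetical, and the path follows the FrameDNA endpoint pattern GET /api/data/v3/items/{id}/frame-dna/{frame_id}.

```python
def frame_dna_paths(item_id: str, frames_response: dict) -> list[tuple[float, str]]:
    """Return (time in seconds, FrameDNA detail path) for every listed frame."""
    return [
        (frame["time_seconds"], f"/api/data/v3/items/{item_id}/frame-dna/{frame['frame_id']}")
        for frame in frames_response.get("frames") or []
    ]
```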
Getting FrameDNA for a Given Frame
GET /api/data/v3/items/{id}/frame-dna/{frame_id}
Response
A list of visual metadata is returned for the requested frame.
{
"frame_dna": {
"frame_id": "3a74a72d97414034ca3a23d0fb47fcbf",
"time_seconds": 2.002,
"thumbnail": {
"path": "video_main_frames/frame-0000000001.jpg",
"type": "",
"frame_count": 0,
"height": 360,
"width": 640
},
"adult_categories": [
{
"id": "812ee64889e16096fbe63a3bd0310a9e",
"category": "porn_detection"
},
{
"id": "f8d6138daf6d2f98d6f4360efcb9f517",
"category": "suggestive_nudity_detection"
}
],
"faces": null,
"ocr": [
{
"id": "9204e628d5ceb246eb1c25b40e65f335",
"text": "ALL OF THE",
"text_type": "lines",
"order": 0
},
{
"id": "d69999f160e966bc31e33d26f6b3a799",
"text": "GAME OF HRONES",
"text_type": "lines",
"order": 1
},
{
"id": "a5e65387116cc45e85dd17f36db14b46",
"text": "SEX & NUDITY",
"text_type": "lines",
"order": 2
},
{
"id": "546bc7fd9960a137719a08285fdeb23e",
"text": "SEASON FIVE",
"text_type": "lines",
"order": 3
}
],
"tags": [
{
"id": "6c399a5adb5d37e30c8995653d9d149b",
"text": "text"
},
{
"id": "550f30757f06d6a77c22326367a268e9",
"text": "design"
},
{
"id": "608dada3bb3f0377bb54d1061dd7d68c",
"text": "poster"
},
{
"id": "4cf6facfbbc7af5da784e913db542e38",
"text": "alcohol"
}
],
"description": {
"id": "346b1c99ee0f9de724c5ef65f3acc00f",
"text": "a close up of food",
"language": ""
},
"custom_tags": null,
"locations": null,
"logos": [
{
"id": "869dc02aa066b3d7f1f051af220759dc",
"name": "A Game of Thrones"
}
],
"technical_cues": {
"black_frame": true,
"color_bars": false,
"credits": false,
"digital_slate": false,
"slate": false,
"texted": false
}
}
}
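Several FrameDNA fields can be null (for example "faces" and "locations" above), so client code should guard against missing lists. The Python sketch below condenses a response into OCR lines in reading order, tag and logo names, and the technical cues that are set to true. The function name and the output keys are assumptions for illustration.

```python
def summarize_frame_dna(response: dict) -> dict:
    """Condense a FrameDNA response into a few flat lists."""
    dna = response.get("frame_dna") or {}
    # Sort OCR entries by their "order" field so lines read top to bottom.
    ocr = sorted(dna.get("ocr") or [], key=lambda entry: entry.get("order", 0))
    cues = dna.get("technical_cues") or {}
    return {
        "ocr_lines": [entry["text"] for entry in ocr],
        "tags": [tag["text"] for tag in dna.get("tags") or []],
        "logos": [logo["name"] for logo in dna.get("logos") or []],
        "active_cues": sorted(name for name, active in cues.items() if active),
    }
```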