Activity API in Wasabi AiR
    • 18 Jul 2024

    The Activity API enables you to view details about harvesting jobs and the items they processed. Its endpoints enable you to find extractor errors and the items that had a specific error message.

    Listing Extractor Errors

    To list the number of errors each extractor had in a harvest job:

    GET /api/data/v3/activity/request/{request_id}/extractors
    • {request_id} - (string) The harvest job request ID.

    Response

    {
    	"extractors": [{
    			"name": "captions",
    			"display_name": "Embedded Captions",
    			"num_errors": 0
    		},
    		{
    			"name": "document_pages",
    			"display_name": "Documents",
    			"num_errors": 0
    		},
    		{
    			"name": "drm",
    			"display_name": "DRM",
    			"num_errors": 2
    		},
    		{
    			"name": "exiv2",
    			"display_name": "EXIV2",
    			"num_errors": 0
    		}
    	]
    }
    • extractors[].name - (string) The actual name of the extractor that ran in this harvest. This value is used as {extractor} in all other Activity endpoint API requests.
    • extractors[].display_name - (string) The display name of the extractor that ran in this harvest.
    • extractors[].num_errors - (integer) The number of errors the extractor caught.
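
    A minimal Python sketch of calling this endpoint and keeping only the extractors that reported errors. The base URL and the Bearer-token Authorization header are assumptions (the article does not document the host or authentication), and fetch_json, extractor_errors_url, and extractors_with_errors are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical host; replace with the address of your Wasabi AiR deployment.
BASE_URL = "https://air.example.com"


def fetch_json(url, api_token):
    """GET a URL and decode the JSON body.

    Assumption: the API accepts a Bearer token in the Authorization header.
    """
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_token}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def extractor_errors_url(request_id):
    # Path from the article: /api/data/v3/activity/request/{request_id}/extractors
    rid = urllib.parse.quote(request_id, safe="")
    return f"{BASE_URL}/api/data/v3/activity/request/{rid}/extractors"


def extractors_with_errors(response):
    """Map extractor name -> num_errors, keeping only extractors with errors."""
    return {
        e["name"]: e["num_errors"]
        for e in response.get("extractors", [])
        if e.get("num_errors", 0) > 0
    }


# Usage (needs a reachable server and a valid token):
# data = fetch_json(extractor_errors_url("5dfd2a32e285d27ed8f17d46f29869d2"), "MY_TOKEN")
# print(extractors_with_errors(data))
```

    With the sample response above, only the drm extractor (2 errors) would be kept.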

    Listing Errors for an Extractor

    This API lists all errors that a specific extractor reported in the harvesting job. Each error has an error_hash that you can pass as {error_md5} to the next API endpoint to find the affected items.

    GET /api/data/v3/activity/request/{request_id}/extractors/{extractor}?page_token={token}&limit={limit}&offset={offset}
    • {request_id} - (string) The harvest job request ID.
    • {extractor} - (string) The extractor name.

    Optional Queries

    • limit - (integer) The maximum results to show.
    • offset - (integer) The offset number of results (used for pagination).
    • page-token - (string) The token for the page of results to fetch, taken from a next_page_token or previous_page_token value.
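
    The optional queries can be assembled with the standard library, leaving out any that are not set. This is a sketch under assumptions: the host is hypothetical, and because the article writes the pagination parameter both as page_token (in the URL template) and page-token (in the list above), the parameter name is configurable rather than fixed.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://air.example.com"  # hypothetical host


def extractor_errors_page_url(request_id, extractor, limit=None, offset=None,
                              page_token=None, token_param="page-token"):
    """Build the URL for one page of errors for a single extractor.

    Optional queries that are None are left out of the query string.
    token_param defaults to "page-token"; confirm which spelling your
    deployment accepts, since the article uses both.
    """
    path = (f"/api/data/v3/activity/request/{quote(request_id, safe='')}"
            f"/extractors/{quote(extractor, safe='')}")
    params = {"limit": limit, "offset": offset, token_param: page_token}
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE_URL}{path}?{query}" if query else f"{BASE_URL}{path}"
```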

    Response

    {
    	"extractor": "credits",
    	"extractor_name": "Credits",
    	"errors": [{
    		"error": "credits: error message ouchhh",
    		"error_hash": "8f1e07deac51451cc1ed2778312af796",
    		"num_files": 1
    	}],
    	"next_page_token": "",
    	"previous_page_token": ""
    }
    • extractor - (string) The actual name of the extractor being searched.
    • extractor_name - (string) The display name of the extractor being searched.
    • errors[].error - (string) The error message that occurred.
    • errors[].error_hash - (string) The MD5 hash of the error message.
    • errors[].num_files - (integer) The number of files that had this error message.
    • num_files - (integer) The total number of files that had errors with the extractor.
    • next_page_token - (string) The token for the next page of results; pass it as the page-token query value.
    • previous_page_token - (string) The token for the previous page of results; pass it as the page-token query value.

    Use the previous and next page tokens to retrieve the previous or next page of results. An empty token means there are no more results in that direction. To fetch another page, pass previous_page_token or next_page_token as the value of the page-token query parameter.
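
    The rule above (keep requesting until next_page_token is an empty string) can be sketched as a small loop. fetch_page is a hypothetical, caller-supplied function that performs the authenticated GET for a given token and returns the decoded JSON; it is injected so the paging logic stays independent of how you call the API.

```python
def iter_pages(fetch_page):
    """Yield every page of a paginated Activity API response.

    fetch_page(token) returns the decoded JSON for the page identified by
    token; token is None for the first page. Iteration stops when
    next_page_token is an empty string (no more results).
    """
    token = None
    while True:
        page = fetch_page(token)
        yield page
        token = page.get("next_page_token", "")
        if not token:
            break


def all_errors(fetch_page):
    """Collect errors[] across all pages for one extractor."""
    errors = []
    for page in iter_pages(fetch_page):
        errors.extend(page.get("errors", []))
    return errors
```

    The same loop works for the other endpoints in this article that return next_page_token; only the list field (errors, files, jobs, or activities) changes.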

    Listing Items That Have a Specific Error

    This API lists all items in the harvesting job that had a specific error message from the given extractor. The error_md5 is the MD5 hash of the error message string, returned as error_hash by the previous endpoint.

    GET /api/data/v3/activity/request/{request_id}/extractors/{extractor}/errors/{error_md5}/files?page_token={token}&limit={limit}&offset={offset}
    • {request_id} - (string) The harvest job request ID.
    • {extractor} - (string) The extractor name.
    • {error_md5} - (string) The MD5 hash of the error message (the error_hash value).
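
    If you only have the error message text, you may be able to compute {error_md5} locally. This sketch assumes the service hashes the UTF-8 bytes of the exact message string; that encoding is not stated in the article, so reusing the error_hash value from the previous response is the safer choice. error_md5 is a hypothetical helper name.

```python
import hashlib


def error_md5(message):
    """Return the hex MD5 digest of an error message.

    Assumption: the service hashes the UTF-8 bytes of the exact message
    string (for example "audioinfo: no audio").
    """
    return hashlib.md5(message.encode("utf-8")).hexdigest()
```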

    Optional Queries

    • limit - (integer) The maximum results to show.
    • offset - (integer) The offset number of results (used for pagination).
    • page-token - (string) The token for the page of results to fetch, taken from a next_page_token or previous_page_token value.

    Response

    {
        "files": [
            {
                "item_id": "a517fe747f698e1a62d382a4addbc825",
                "name": "smpte-color-bars.mp4"
            }
        ],
        "num_files": 1,
        "next_page_token": "",
        "previous_page_token": ""
    }
    • files[].item_id - (string) The Item ID that had this extractor error.
    • files[].name - (string) The full name of the item.
    • num_files - (integer) The total number of files that had this error.
    • next_page_token - (string) The token for the next page of results; pass it as the page-token query value.
    • previous_page_token - (string) The token for the previous page of results; pass it as the page-token query value.

    Use the previous and next page tokens to retrieve the previous or next page of results. An empty token means there are no more results in that direction. To fetch another page, pass previous_page_token or next_page_token as the value of the page-token query parameter.
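
    The three request-level endpoints are designed to be chained: extractor error counts, then the errors for each extractor, then the files behind each error_hash. A minimal sketch of that walk follows. get_json is a hypothetical, caller-supplied function that performs an authenticated GET for a path on your Wasabi AiR host; pagination is ignored for brevity, so only the first page of each list is read.

```python
from urllib.parse import quote

API = "/api/data/v3/activity/request"


def files_by_error(request_id, get_json):
    """Walk from a harvest request to the files behind each extractor error.

    Returns {(extractor_name, error_message): [file names]}.
    Only the first page of each endpoint is read (no pagination).
    """
    rid = quote(request_id, safe="")
    result = {}
    summary = get_json(f"{API}/{rid}/extractors")
    for ext in summary.get("extractors", []):
        if ext.get("num_errors", 0) == 0:
            continue  # nothing to drill into for this extractor
        name = quote(ext["name"], safe="")
        errors = get_json(f"{API}/{rid}/extractors/{name}")
        for err in errors.get("errors", []):
            # error_hash is reused as {error_md5} in the files endpoint.
            files = get_json(f"{API}/{rid}/extractors/{name}/errors/{err['error_hash']}/files")
            result[(ext["name"], err["error"])] = [f["name"] for f in files.get("files", [])]
    return result
```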

    Listing Multiple Extractor Errors

    To return the extractor error counts for several harvest jobs at once, send their request IDs as a JSON array of strings:

    POST /api/data/v3/activity/bulk/extractors
    
    {"jobs": ["{request_id}", "{request_id}"]}

    • jobs - (array of strings) The request IDs of the harvest jobs to query.
    • jobs[] - (string) The request ID of a specific job.
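
    A sketch of building this POST request with the standard library. The body follows the article ({"jobs": [...]}); the host, the Content-Type header, and the Bearer Authorization header are assumptions, and bulk_extractors_request is a hypothetical helper name.

```python
import json
import urllib.request

BASE_URL = "https://air.example.com"  # hypothetical host


def bulk_extractors_request(request_ids, api_token):
    """Build the POST request for /api/data/v3/activity/bulk/extractors.

    The JSON body has the shape {"jobs": [request_id, ...]}.
    Content-Type and Bearer Authorization headers are assumptions.
    """
    body = json.dumps({"jobs": list(request_ids)}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/data/v3/activity/bulk/extractors",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_token}"},
    )


# Usage (needs a reachable server and a valid token):
# ids = ["5dfd2a32e285d27ed8f17d46f29869d2", "5dfd1289dc77d22895667329961ceece"]
# with urllib.request.urlopen(bulk_extractors_request(ids, "MY_TOKEN")) as resp:
#     results = json.load(resp)
```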

    Response

    The response is an array with one object per request ID, in the same order as the request IDs in your jobs array.

    [
    	{
    		"request_id": "5dfd2a32e285d27ed8f17d46f29869d2",
    		"extractors": [
    			{
    				"name": "black_scenes",
    				"display_name": "Black Frames",
    				"num_errors": 0
    			},
    			{
    				"name": "drm",
    				"display_name": "DRM",
    				"num_errors": 0
    			},
    			{
    				"name": "exiv2",
    				"display_name": "EXIV2",
    				"num_errors": 0
    			},
    			{
    				"name": "hashes",
    				"display_name": "Hashes",
    				"num_errors": 0
    			}
    		]
    	},
    	{
    		"request_id": "5dfd1289dc77d22895667329961ceece",
    		"extractors": [
    			{
    				"name": "drm",
    				"display_name": "DRM",
    				"num_errors": 0
    			},
    			{
    				"name": "exiv2",
    				"display_name": "EXIV2",
    				"num_errors": 0
    			},
    			{
    				"name": "hashes",
    				"display_name": "Hashes",
    				"num_errors": 0
    			}
    		]
    	}
    ]
    • request_id - (string) The request ID for this object.
    • extractors[].name - (string) The actual name of the extractor that ran in this harvest. This value is used as {extractor} in all other Activity endpoint API requests.
    • extractors[].display_name - (string) The display name of the extractor that ran in this harvest.
    • extractors[].num_errors - (integer) The number of errors the extractor caught.
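
    Because results come back in the same order as the jobs array, they can be paired with the request IDs you sent. This sketch (total_errors_by_job is a hypothetical name) also checks each object's request_id field instead of relying on position alone, and sums num_errors per job.

```python
def total_errors_by_job(request_ids, response):
    """Sum num_errors per job from a bulk extractors response.

    Relies on the documented ordering (one result per request ID, in the
    order sent) and double-checks each object's request_id field.
    """
    if len(request_ids) != len(response):
        raise ValueError("expected one result per request ID")
    totals = {}
    for rid, job in zip(request_ids, response):
        if job["request_id"] != rid:
            raise ValueError(f"expected {rid}, got {job['request_id']}")
        totals[rid] = sum(e.get("num_errors", 0) for e in job.get("extractors", []))
    return totals
```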

    Listing Items Harvested for a Specific Request

    To show all the items associated with the request:

    GET /api/data/v3/activity/request/{request_id}/items?limit={limit}&page-token={page_token}
    • {request_id} - (string) The harvest job request ID.
    • {limit} - (int) Limit the number of results (default: 10, max: 1000).
    • {page-token} - (string) The token specifying the results page (for multi-page results).
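
    A sketch that builds this URL and checks limit against the documented maximum of 1000 before sending anything. The host is hypothetical, the lower bound of 1 is an assumption (the article only gives the default and maximum), and request_items_url is a hypothetical helper name.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://air.example.com"  # hypothetical host


def request_items_url(request_id, limit=None, page_token=None):
    """Build the URL that lists the items harvested by one request.

    limit is validated locally: max 1000 per the article; the minimum of 1
    is an assumption. Omitting limit lets the server apply its default (10).
    """
    if limit is not None and not 1 <= limit <= 1000:
        raise ValueError("limit must be between 1 and 1000")
    params = {"limit": limit, "page-token": page_token}
    query = urlencode({k: v for k, v in params.items() if v is not None})
    path = f"/api/data/v3/activity/request/{quote(request_id, safe='')}/items"
    return f"{BASE_URL}{path}" + (f"?{query}" if query else "")
```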

    Response

    {
        "files": [
            {
                "item_id": "a517fe747f698e1a62d382a4addbc825",
                "name": "smpte-color-bars.mp4"
            }
        ],
        "num_files": 1,
        "next_page_token": "",
        "previous_page_token": ""
    }
    • files[].item_id - (string) The ID of an item processed by this request.
    • files[].name - (string) The full name of the item.
    • num_files - (int) The total number of files associated with the request.
    • next_page_token - (string) The token for the next page of results; pass it as the page-token query value.
    • previous_page_token - (string) The token for the previous page of results; pass it as the page-token query value.

    Listing Jobs for a Specific Item

    To search the activity index for every job that processed a specific item_id and return the status of each job:

    GET /api/data/v3/activity/item/{item_id}?limit={limit}&page-token={page_token}
    • {item_id} - (string) The item ID for which to search.
    • {limit} - (int) Limit the number of results (default: 10, max: 100).
    • {page-token} - (string) The token specifying the results page (for multi-page results).

    Response

    {
        "jobs": [
            {
                "request_id": "5f69073f7a855cc76574acc96e105667",
                "container_id": "1ebc9ff9b9310d8ea5fe259b96a9fe38",
                "user_id": "5f69032a319d4f3aeb580c2946e3d1de",
                "walked_count": 1,
                "indexed_count": 1,
                "created": "2020-09-21T20:04:15.126019Z",
                "updated": "2020-09-21T20:05:00.495541Z",
                "job_type": "harvest",
                "cancelled": false,
                "walk_complete": true,
                "error_count": 1
            }
        ],
        "num_jobs": 1,
        "next_page_token": "",
        "previous_page_token": ""
    }
    • num_jobs - (int) The total number of jobs found for the item_id.
    • next_page_token - (string) The token for the next page of results; pass it as the page-token query value.
    • previous_page_token - (string) The token for the previous page of results; pass it as the page-token query value.
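
    A sketch of scanning this response for jobs that may need a closer look. It uses only fields shown in the example (error_count, cancelled, walk_complete); which combination counts as "needing attention" is an illustrative choice, not something the article defines, and jobs_needing_attention is a hypothetical name.

```python
def jobs_needing_attention(response):
    """Return request IDs of jobs for this item that reported errors,
    were cancelled, or have not finished walking.

    The selection criteria are illustrative; adjust them to your needs.
    """
    return [
        job["request_id"]
        for job in response.get("jobs", [])
        if job.get("error_count", 0) > 0
        or job.get("cancelled", False)
        or not job.get("walk_complete", True)
    ]
```

    Each returned request ID can then be used with the request-level extractor endpoints earlier in this article.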

    Getting All Activity for an item_id and request_id

    To return all the activity for a specific item ID and request ID:

    GET /api/data/v3/activity/item/{item_id}/{request_id}?limit={limit}&page-token={page_token}
    • {item_id} - (string) The item ID for which to search.
    • {request_id} - (string) The request ID for which to search.
    • {limit} - (int) Limit the number of results (default: 10, max: 100).
    • {page-token} - (string) The token specifying the results page (for multi-page results).

    Response

    {
        "activities": [
            {
                "ts": "2020-09-21T22:34:41.4519053Z",
                "kind": "harvestend",
                "request_id": "5f692a73b7872fa794f7c1f0159663f1",
                "location_id": "5f690733ea16ebd7e6de5f7cee8c5199",
                "container_id": "/data/files/colorbars",
                "harvest_id": "harvest:development:5f692a8155c9bc800ba794e54a91aceb",
                "profile_id": "default",
                "extractors": [
                    "audioinfo"
                ],
                "name": "smpte-color-bars.mp4",
                "source": "harvest",
                "item_id": "a517fe747f698e1a62d382a4addbc825",
                "count": 0,
                "bytes": 44909,
                "error": "",
                "error_hash": "",
                "duration": 97970200
            },
            {
                "ts": "2020-09-21T22:34:41.4126822Z",
                "kind": "extractor",
                "request_id": "5f692a73b7872fa794f7c1f0159663f1",
                "location_id": "5f690733ea16ebd7e6de5f7cee8c5199",
                "container_id": "/data/files/colorbars",
                "harvest_id": "harvest:development:5f692a8155c9bc800ba794e54a91aceb",
                "profile_id": "default",
                "extractors": [],
                "name": "smpte-color-bars.mp4",
                "source": "audioinfo",
                "item_id": "a517fe747f698e1a62d382a4addbc825",
                "count": 0,
                "bytes": 44909,
                "error": "audioinfo: no audio",
                "error_hash": "6199cf359b32392d186f07f7756e165c",
                "duration": 5820100
            },
            {
                "ts": "2020-09-21T22:34:41.3775794Z",
                "kind": "harvestdownload",
                "request_id": "5f692a73b7872fa794f7c1f0159663f1",
                "location_id": "5f690733ea16ebd7e6de5f7cee8c5199",
                "container_id": "/data/files/colorbars",
                "harvest_id": "harvest:development:5f692a8155c9bc800ba794e54a91aceb",
                "profile_id": "default",
                "extractors": [
                    "audioinfo"
                ],
                "name": "smpte-color-bars.mp4",
                "source": "harvest",
                "item_id": "a517fe747f698e1a62d382a4addbc825",
                "count": 0,
                "bytes": 44909,
                "error": "",
                "error_hash": "",
                "duration": 23643900
            },
            {
                "ts": "2020-09-21T22:34:27.5080515Z",
                "kind": "walk",
                "request_id": "5f692a73b7872fa794f7c1f0159663f1",
                "location_id": "5f690733ea16ebd7e6de5f7cee8c5199",
                "container_id": "/data/files/colorbars",
                "harvest_id": "",
                "profile_id": "default",
                "extractors": [
                    "audioinfo"
                ],
                "name": "smpte-color-bars.mp4",
                "source": "walkd",
                "item_id": "a517fe747f698e1a62d382a4addbc825",
                "count": 0,
                "bytes": 44909,
                "error": "",
                "error_hash": "",
                "duration": 0
            }
        ],
        "num_activities": 4,
        "next_page_token": "",
        "previous_page_token": ""
    }
    • num_activities - (int) The total number of activities found for the item_id and request_id.
    • next_page_token - (string) The token for the next page of results; pass it as the page-token query value.
    • previous_page_token - (string) The token for the previous page of results; pass it as the page-token query value.
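
    In the example above, only the activity whose kind is "extractor" and whose source is "audioinfo" carries an error. A minimal sketch (activity_errors is a hypothetical name) that pulls out just the activities with a non-empty error field, along with their error_hash for use with the extractor error endpoints:

```python
def activity_errors(response):
    """List (kind, source, error, error_hash) for activities that reported an error."""
    return [
        (a["kind"], a["source"], a["error"], a["error_hash"])
        for a in response.get("activities", [])
        if a.get("error")  # empty string means no error
    ]
```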

    Listing Extractor Errors for a Specific Item

    To show the number of errors each extractor had for a specific item:

    GET /api/data/v3/activity/item/{item_id}/{request_id}/extractors
    • {item_id} - (string) The item ID.
    • {request_id} - (string) The harvest job request ID.

    Response

    {
    	"extractors": [{
    			"name": "captions",
    			"display_name": "Embedded Captions",
    			"num_errors": 0
    		},
    		{
    			"name": "document_pages",
    			"display_name": "Documents",
    			"num_errors": 0
    		},
    		{
    			"name": "drm",
    			"display_name": "DRM",
    			"num_errors": 2
    		},
    		{
    			"name": "exiv2",
    			"display_name": "EXIV2",
    			"num_errors": 0
    		}
    	]
    }
    • extractors[].name - (string) The actual name of the extractor that ran in this harvest. This value is used as {extractor} in all other Activity endpoint API requests.
    • extractors[].display_name - (string) The display name of the extractor that ran in this harvest.
    • extractors[].num_errors - (integer) The number of errors the extractor caught.

    Listing Errors for an Extractor for a Specific Item

    To show the errors a specific extractor reported for a specific item_id and request_id:

    GET /api/data/v3/activity/item/{item_id}/{request_id}/extractors/{extractor}
    • {item_id} - (string) The item ID.
    • {request_id} - (string) The harvest job request ID.
    • {extractor} - (string) The extractor name.

    Response

    {
        "extractor": "audioinfo",
        "extractor_name": "Audio Info",
        "errors": [
            {
                "error": "audioinfo: no audio",
                "error_hash": "6199cf359b32392d186f07f7756e165c"
            }
        ]
    }
    • extractor - (string) The actual name of the extractor being searched.
    • extractor_name - (string) The display name of the extractor being searched.
    • errors[].error - (string) The error message that occurred.
    • errors[].error_hash - (string) The MD5 hash of the error message.
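
    Because each error_hash is the value the request-level files endpoint takes as {error_md5}, a per-item error can be used to find other items in the same request that hit the same error. A sketch that builds those URLs from this response; the host is hypothetical and related_files_urls is a hypothetical helper name.

```python
from urllib.parse import quote

BASE_URL = "https://air.example.com"  # hypothetical host


def related_files_urls(request_id, item_errors):
    """Build request-level "files with this error" URLs from a per-item response.

    item_errors is the decoded response of
    /api/data/v3/activity/item/{item_id}/{request_id}/extractors/{extractor}.
    Each error_hash is reused as {error_md5}.
    """
    rid = quote(request_id, safe="")
    ext = quote(item_errors["extractor"], safe="")
    return [
        f"{BASE_URL}/api/data/v3/activity/request/{rid}/extractors/{ext}"
        f"/errors/{e['error_hash']}/files"
        for e in item_errors.get("errors", [])
    ]
```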


