Insight Analytics API
The Insight Analytics API offers means to observe the work done with connected devices.
Use cases include:
- Providing operational dashboards to understand the current state
- Exploring data to discover process dynamics
- Exporting data (e.g. as .csv file) to process in custom BI tools
Time series data model
The analytics is done using the time series data where all data is stored with a specific point in time to which the data correlates to.
E.g. every scan of a connected device creates a datum at the time of the scan. The data persisted at this point in time is all the information retrieved about the scan: E.g. information about the devices used, scan-related metadata such as the decoded symbology, but also inferred information such as an estimation of the worker effort preceding such as step counts or missed scanning attempts.
The data model provides efficient access to the data persisted for a specific interval of time (e.g. a day, hour, etc.) Additionally, other attributes (e.g. scan symbology, device serial number, etc.) can be used to further filter or group the data.
Conceptionally, you can consider the data model a big table where the data is indexed by timestamps with additional columns:
Time | Attribute: Symbology | Attribute: Device Serial | Attribute: Scan Duration |
---|---|---|---|
2021.01.23 06:02:12 | QR Code | M2MD010101234 | 2.03 s |
2021.01.23 06:03:14 | DataMatrix | M2MB010105678 | 9.23 s |
2021.01.23 11:17:45 | QR Code | M2MD010101234 | 0.54 s |
2021.01.23 14:18:03 | QR Code | M2MD010101234 | 5.54 s |
Each of these columns is either a simple attribute or a measure.
A measure has a varying value of a specific type that is meaningful to observe. This can be a simple count like the number of steps taken or a continous measure such as the duration of triggering the scanner to perform a barcode scan.
An attribute is any kind of label that further identifies a measure. Examples include the device's metadata such as the serial number or the firmware version, process details (like the location, or a process group of a connectivity device), or just the symbology of the decoded barcode. Below, you can find the current list of potential attributes attached to measure.
Recorded measures
Measure | SQL Data Type | Description |
---|---|---|
device_battery |
DOUBLE | The percentage of the battery's charge levels |
scan_code |
VARCHAR | The scanned barcode value |
scan_duration |
DOUBLE | The trigger time in seconds needed to scan the barcode |
metrics_worker_steps |
DOUBLE | The approximate number of steps since the last scan |
bce_activation_total |
DOUBLE | The sum of trigger times for unsuccessful scans |
bce_num_activations |
DOUBLE | The number of trigger presses for unsuccessful scans |
trigger_effort_bce_activation_total |
DOUBLE | The barcode scanning effort measure in miliseconds |
trigger_effort_bce_num_activations |
DOUBLE | The scanning effort measured as a number of trigger button presses. |
time_effort |
DOUBLE | The time effort between the current scan and the previous successful scan made by the same device, measured in milliseconds (max. 5 min). |
Recorded attributes
Each measurement can be identified by a set of metadata attributes:
Attribute | Description |
---|---|
device_model |
Worker scanner device model |
device_manufacturer |
Worker scanner device manufacturer |
device_serial |
Worker scanner device serial number |
device_firmware |
Worker scanner device firmware revision |
gateway_model |
Connectivity device model |
gateway_manufacturer |
Connectivity device manufacturer |
gateway_id |
Connectivity device ID |
gateway_type |
Connectivity device deploymen type |
gateway_firmware |
Connectivity device software revision |
event_type |
Type of event: scan or telemetry |
event_id |
Unique event ID |
scan_decode_symbology |
The decoded barcode symbology |
customer_gateway_name |
Connectivity device name |
customer_l0_name |
Name of the first associated level in the connectivity device organization |
customer_l1_name |
Name of the second associated level in the connectivity device organization |
customer_l2_name |
Name of the third associated level in the connectivity device organization |
customer_l3_name |
Name of the forth associated level in the connectivity device organization |
gateway_l0_id |
ID of the first associated level in the connectivity device organization |
gateway_l1_id |
ID of the second associated level in the connectivity device organization |
gateway_l2_id |
ID of the third associated level in the connectivity device organization |
gateway_l3_id |
ID of the forth associated level in the connectivity device organization |
Time series query structure
To access the time-series data, a simple sub-dialect of SQL (Structured Query Language) offers basic capabilities to filter, group, and represent the data:
An analytics query can specify the following constraints:
- A time frame to extract data from with the inclusive start and end time stamps
- A resolution in the time domain by which it will be reported and its data grouped
- The selected measurement and how it is aggregated
- Optional additional grouping to further divide the data (e.g. to present distinct devices)
- Optional additional filtering to further restrict what data to report (e.g. to only show the data of the selected device)
- Optional ordering clauses that sort the resulting data
Example: Time series query structure
Retrieve the number of scans for a use case in time frames of one hour grouped by the used scanning device:
{
"range_start": "2021-02-20T00:00:00+0000",
"range_end": "2021-02-22T23:59:59+0000",
"resolution": "1h",
"select": [
{"name": "measure_value::varchar", "aggregate": "count", "alias": "scans"},
{"name": "gateway_l3_id"}
],
"where": [
{"name": "measure_name", "operator": "=", "value": "scan_code"},
{"name": "gateway_l2_id", "operator": "=", "value": "b2a931"}
],
"groupby": ["gateway_l3_id"]
}
The sections below describe the existing constraints:
1.Time frame
The time-series data is always selected within the bounds of a time frame.
This is specified by two inclusive points in time, range_start
and range_end
, as the boundaries.
Only data within the interval of these points in time is considered for the query and its result.
Calendar-sensitive resolutions: weeks and months
Most resolutions are constant durations expressed in seconds. They are insensitive to time zones and other calendar effects.
In contrast, the resolutions w
for week and M
for month are both sensitive to the calendar and time zone as they represent a full calendar week or month rather than the number of seconds a week or an average month would take.
For this reason, certain restrictions and special behavior apply to week and month resolutions:
- At the moment, the resolutions of weeks or months are only possible with a single week or month respectively.
- The start of a week or a month is adjusted by evaluating the time zone of the range of the timestamps.
Time zone adjustments
As outlined above both week and month resolution (w
and M
) are time zone sensitive.
When selecting them, these rules apply:
The time zone of the range_start
parameters is evaluated.
The UTC offset of the time zone is used to transpose the result times to match the local time zone.
Time Range Zones | Example | Effect |
---|---|---|
different | range_start : "2000-01-01:00:00+0000" |
warning |
range_end : "2000-01-01:00:00+0100" |
||
UTC | range_start : "2000-01-01:00:00+0000" |
no adjustment |
range_end : "2000-01-01:00:00+0000" |
||
local time zone | range_start : "2000-01-01:00:00+0100" |
adjustment +1h |
range_end : "2000-01-01:00:00+0100" |
This is done to allow the client to get results reflecting either the neutral UTC time zone or a local time zone of the users choice.
2.Resolution
The resulting time-series data is grouped into equidistant time slots. By picking a suitable resolution, the granularity of this grouping can be adjusted to fit the time range and observation requirements.
To group the data into time windows the following resolution values are available:
Resolution | Factor | Description | Example |
---|---|---|---|
s |
any | Seconds | 30s |
m |
any | Minutes | 10m |
h |
any | Hours | 12h |
d |
any | Days | 3d |
w |
1 | (One) week | 1w |
M |
1 | (One) month | 1M |
Example: Resolution
E.g. in the first query above, without any other grouping clauses, picking different resolution values yields these results: Consider we aggregate over the resolution windows by summing up the Scan Duration.
Resolution | Sum(Scan Duration) |
---|---|
1 min | 2.03 s , 9.23 s, 0.54 s , 5.54 s |
1 h | 11.26 s , 0.54 s , 5.54 s |
1 d | 17.34 s |
3.Measurement and aggregation
To select measurements:
- select
measure_value::double
ormeasure_value::varchar
according to the type in the table of Recorded measures - filter by the
measure_name
of the desired measure
E.g. this selects the scan_duration
measure:
{
"select": [
{"name": "measure_value::double", "aggregate": "sum"},
],
"where": [
{"name": "measure_name", "aggregate": "=", "value": "scan_duration"},
]
}
To aggregate data, you can use one of these functions:
Aggregation | Description |
---|---|
avg |
Arithmetic mean of all values |
sum |
Sum of all values |
count |
Number of all values |
These aggregation functions are applied to all selected data with a single group and time resolution.
4.Grouping
The resulting data can be grouped by:
- The resolution in the time domain
- And any grouping clauses
Example: Grouping
Additionally, the groups can be subdivided by attributes other than time. In the example above, if we additionally group by the Device Serial Number, the time windows of the resolution parameter are reported for each device:
Resolution | Sum(Scan Duration) |
---|---|
1 min | M2MD010101234:2.03 s , M2MB010105678:9.23 s, M2MD010101234:0.54 s , M2MD010101234:5.54 s |
1 h | M2MD010101234:2.03 s , M2MB010105678:9.23 s, M2MD010101234:0.54 s , M2MD010101234:5.54 s |
1 d | M2MD010101234:8.11 s , M2MB010105678:9.23 s |
5.Filtering
Dividing the data for analysis only into groups of time windows or attribute values is often not enough as it does not allow you to zoom in on specific data. Instead, before grouping and aggregating the data, a filtering pass is necessary.
E.g. consider the retrieved data should be that of a specific process or a set of devices. To achieve this, a filtering clause can be used to exclude any data not matching the criteria:
At the moment data can be matched only by comparing for equality of a string value. To match data attributes these string-valued operators can be used:
Operator | Description |
---|---|
= |
Test for equality |
Example: Filtering
In the example, if we filter by Device Serial to be equal to M2MB010105678
, we aggregate only on this row:
Time | Attribute: Symbology | Attribute: Device Serial | Attribute: Scan Duration |
---|---|---|---|
2021.01.23 06:03:14 | DataMatrix | M2MB010105678 | 9.23 s |
6.Ordering
The order of the returned data can be selected with an ordering clause: One or more attributes can be selected to order the data by. In addition, the data can be either returned in an ascending (lowest values first) or descending direction.
An important use of data ordering is selecting the tail or head of some data: E.g. the top 5 of some measurement or the lowest 10 of some datum can be implemented by ordering such that the first returned rows are the wanted ones and selecting a limit to get exactly the wanted rows.
Example: Ordering
If we were now interested in the longest scan duration we would take it from the top row with ordering set to duration
:
{
"range_start": "2021-02-20T00:00:00+0000",
"range_end": "2021-02-22T23:59:59+0000",
"select": [
{"name": "measure_value::double", "aggregate": "sum", "alias": "duration"},
{"name": "gateway_l3_id"}
],
"where": [
{"name": "measure_name", "aggregate": "=", "value": "scan_duration"},
],
"groupby": ["gateway_l3_id"],
"orderby": ["duration"],
}
The results would be following:
Time | Attribute: Symbology | Attribute: Device Serial | Attribute: Scan Duration |
---|---|---|---|
2021.01.23 06:03:14 | DataMatrix | M2MB010105678 | 9.23 s |
Time series query syntax
When calling an Analytics API endpoint, the time-series query is represented as a structured input such as a JSON object, rather than a DSL string. While this is not as expressive as using a parsed SQL DSL string it has several advantages:
The query syntax is directly specified in the OpenAPI specification as the only documentation of the API. There is no need to maintain another form of input specification for the query syntax. Furthermore, injecting malicious SQL is much harder as parsing and sanitizing requests are much simpler.
Query OpenAPI schema
To query the /{customer_id}/analytics/query
endpoint the OpenAPI 3 schema of the request is the following TimeSeriesQuery
:
openapi: 3.0.3
components:
schemas:
TimeSeriesQuery:
type: object
required:
- range_start
- range_end
- select
- resolution
properties:
range_start:
description: "TheiInclusive oldest time of the data from an ISO 8601 notation. If a UTC offset is given, the responded timestamps will also be in local time, using this parameter's offset."
type: string
format: date-time
example: "2021-01-08T08:14:52+0000"
range_end:
description: "The inclusive newest time of the data from an ISO 8601 notation. If a UTC offset is given, the responded timestamps will be also in local time, using the offset provided via range_start"
type: string
format: date-time
example: "2021-01-08T16:14:52+0000"
resolution:
description: "The time resolution to bin the data into equally sized buckets: Must be a positive integer followed by s,m,h,d for the second, minute, hour, and day. Additionally, 1w and 1M (one week and one month) are also supported."
type: string
pattern: '^\d+(s|m|h|d)|1w|1M$'
example: "1h"
select:
description: "The list of selected clauses determining what data columns to retrieve."
type: array
items:
$ref: "#/components/schemas/SelectClause"
minLength: 1
maxLength: 5
where:
description: "The list of filtering clauses determining the data records to retrieve the data from. Implicitly filled with the range_start and range_end even if unset otherwise."
type: array
items:
$ref: "#/components/schemas/FilterClause"
minLength: 0
maxLength: 5
groupby:
description: "The list of columns to group the data by, additional to the implicit binned time resolution. Results in one time series per column value."
type: array
items:
type: string
minLength: 0
maxLength: 1
order:
description: "Sort order"
type: string
enum:
- ASC
- DESC
default: DESC
orderby:
description: "Sorting fields. The results are always ordered by the implicit time first. Then we sort by the order of fields supplied here."
type: array
items:
type: string
minLength: 0
maxLength: 5
SelectClause:
type: object
required:
- name
properties:
name:
description: "Name of the column (measure, attribute or function) to select."
type: string
example: "measure_value::varchar"
aggregate:
description: "Aggregation function to use."
type: string
enum:
- avg
- count
- sum
example: "count"
alias:
description: "Name to map the column to for convenience."
type: string
example: "scans"
FilterClause:
type: object
required:
- name
- value
properties:
name:
description: "The name of the column (measure, attribute) to compare. Must be a string value for now"
type: string
example: "measure_name"
operator:
description: "Comparison operator. Must be '=' for now."
type: string
value:
description: "Value to compare against."
type: string
example: "scan_code"
Response OpenAPI schema
The obtained response TimeSeries
consists of a XAxis
listing the labels of each reported time in the series and the time series data itself:
A list with one element for each XAxis
element which itself is a list of all the selected fields.
components:
schemas:
TimeSeries:
type: object
description: "Time series data"
required:
- xaxis
- series
properties:
xaxis:
$ref: "#/components/schemas/XAxis"
series:
type: array
items:
$ref: "#/components/schemas/SeriesCollection"
warning:
type: string
description: "An optional warning, displayed if something is wrong or when an ambiguity is removed."
example: "Different UTC time offsets for the start and end time range detected. Reporting with applied start time range's offset."
XAxis:
type: object
description: "Time series data X-axis values: Either the binned times or other grouping selectors"
required:
- categories
- type
properties:
type:
type: string
example: "datetime"
categories:
type: array
items:
type: string
example: "2021-01-08 10:00:00.000000000"
SeriesCollection:
description: "The list of the time-series."
type: array
items:
$ref: "#/components/schemas/Series"
Series:
description: "The time-series data values matching the X-Axis positions and the selected clauses."
type: object
required:
- data
- name
properties:
data:
type: array
items:
type: string
description: "The requested data."
example: "9"
name:
type: string
description: "The name of the series. Corresponds to the first selected name or the value of the groupby attribute, if one is provided."
example: "scans"
Example queries
Retrieve the number of scans for a use case in time windows of one hour grouped by the used scanner device:
{
"range_start": "2021-02-20T00:00:00+0000",
"range_end": "2021-02-22T23:59:59+0000",
"resolution": "1h",
"select": [
{"name": "measure_value::varchar", "aggregate": "count", "alias": "scans"},
{"name": "device_serial"}
],
"where": [
{"name": "measure_name", "operator": "=", "value": "scan_code"},
{"name": "gateway_l2_id", "operator": "=", "value": "b2a931"}
],
"groupby": ["device_serial"]
}
Summarizes the number of steps in one day for a specific use case in all locations:
{
"range_start": "2021-01-08T00:00:00+0000",
"range_end": "2021-01-08T23:59:59+0000",
"select": [
{"name": "measure_value::double", "aggregate": "sum", "alias": "steps"},
],
"where": [
{"name": "measure_name", "operator": "=", "value": "metrics_worker_steps"},
{"name": "customer_l3_name", "operator": "=", "value": "picking"},
],
"resolution": "1h"
}
Retrieves the overhead of acquiring a successful scan measured by the combined durations of triggering the scan engine preceding a successful scan for a whole day:
{
"range_start": "2021-01-08T00:00:00+0000",
"range_end": "2021-01-08T23:59:59+0000",
"select": [
{"name": "measure_value::double", "aggregate": "sum", "alias": "activations"},
],
"where": [
{"name": "measure_name", "operator": "=", "value": "trigger_effort_bce_activation_total"},
],
"resolution": "1h"
}
Patterns and use cases
This is meant as a practicioners collection of emerging patterns and interesting use cases. The presentation favors the problem statements over cohesion and is meant to evolve over time with emerging patterns.
Local time zones in time resolutions
As described above, use a UTC offset in the time range fields to control how the responded times are time zone adjusted. Also
E.g. this would select a time zone shifted 2 hours in respect to UTC:
{
"range_start": "2021-01-02:00:00+0200",
"range_end": "2021-01-08T23:59:59+0200",
"select": [
{"name": "measure_value::double", "aggregate": "sum", "alias": "steps"},
],
"resolution": "1w"
}