haralyzer package

haralyzer.assets module

Provides all of the main functional classes for analyzing HAR files

class haralyzer.assets.HarEntry(entry: dict)[source]

Bases: haralyzer.mixins.MimicDict

An object that represents one entry in a HAR page

cache
Returns:Cached objects
Return type:str
cookies
Returns:Request and Response Cookies
Return type:list
pageref
Returns:Page for the entry
Return type:str
port
Returns:Port connection was made to
Return type:int
request
Returns:Request of the entry
Return type:Request
response
Returns:Response of the entry
Return type:Response
secure
Returns:Connection was secure
Return type:bool
serverAddress
Returns:IP Address of the server
Return type:str
startTime

Start time and date

Returns:Start time of entry
Return type:Optional[datetime.datetime]
status
Returns:HTTP Status Code
Return type:int
time
Returns:Time taken to complete entry
Return type:int
timings
Returns:Timing of the page load
Return type:dict
url
Returns:URL of Entry
Return type:str
class haralyzer.assets.HarPage(page_id: str, har_parser: Optional[haralyzer.assets.HarParser] = None, har_data: dict = None)[source]

Bases: object

An object representing one page of a HAR resource

actual_page

Returns the first entry object that does not have a redirect status, indicating that it is the actual page we care about (after redirects).

Returns:First entry of the page
Return type:HarEntry
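
The selection rule described for actual_page can be sketched in plain Python. This is an illustration only, not haralyzer's code: the entries are bare dicts shaped like HAR entries, first_non_redirect is a hypothetical helper, and treating every 3xx code as a "redirect status" is an assumption.

```python
from typing import Optional


def first_non_redirect(entries: list) -> Optional[dict]:
    """Return the first entry whose response status is not a redirect."""
    for entry in entries:
        status = entry["response"]["status"]
        # Assumption: "redirect status" means any 3xx code.
        if not 300 <= status <= 399:
            return entry
    return None


# Hypothetical entries: an HTTP to HTTPS redirect followed by the real page.
entries = [
    {"request": {"url": "http://example.com/"}, "response": {"status": 301}},
    {"request": {"url": "https://example.com/"}, "response": {"status": 200}},
    {"request": {"url": "https://example.com/app.js"}, "response": {"status": 200}},
]

page = first_non_redirect(entries)
print(page["request"]["url"])  # https://example.com/
```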
audio_files

All audio files for a page

Returns:Audio entries for a page
Return type:List[HarEntry]
audio_load_time

Audio load time

Returns:Load time for audio on a page
Return type:int
audio_size

Size of audio files from the page

Returns:Size of audio files on the page
Return type:int
audio_size_trans

Audio transfer size

Returns:Size of transfer data for audio
Return type:int
content_load_time

Content load time

Returns:Load time for all content
Return type:int
css_files

All CSS files for a page

Returns:CSS entries for a page
Return type:List[HarEntry]
css_load_time

CSS load time

Returns:Load time for CSS on a page
Return type:int
css_size

Size of CSS files from the page

Returns:Size of CSS files on the page
Return type:int
css_size_trans

CSS transfer size

Returns:Size of transfer data for CSS
Return type:int
duplicate_url_request

Returns a dict that maps each URL requested more than once to the number of times it was requested

Returns:URLs and the amount of times they were duplicated
Return type:dict
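
A minimal sketch of the kind of mapping duplicate_url_request describes, using collections.Counter on plain dicts. The helper name duplicate_urls and the sample entries are hypothetical, and this is not the library's implementation.

```python
from collections import Counter


def duplicate_urls(entries: list) -> dict:
    """Map each URL that appears more than once to its request count."""
    counts = Counter(entry["request"]["url"] for entry in entries)
    return {url: count for url, count in counts.items() if count > 1}


# Hypothetical entries shaped like HAR entries.
entries = [
    {"request": {"url": "https://example.com/logo.png"}},
    {"request": {"url": "https://example.com/app.js"}},
    {"request": {"url": "https://example.com/logo.png"}},
]

print(duplicate_urls(entries))  # {'https://example.com/logo.png': 2}
```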
entries
Returns:All entries that make up the page
Return type:List[HarEntry]
filter_entries(request_type: str = None, content_type: str = None, status_code: str = None, http_version: str = None, load_time__gt: int = None, regex: bool = True) → List[haralyzer.assets.HarEntry][source]

Generate a list of entries that match the given criteria

Parameters:
  • request_type (str) – The request type (e.g. GET or POST)
  • content_type (str) – Regex to use for finding content type
  • status_code (str) – The desired status code
  • http_version (str) – HTTP version of request
  • load_time__gt (int) – Load time in milliseconds. If provided, an entry whose load time is less than this value will be excluded from the results.
  • regex (bool) – Whether to use regex or exact match.
Returns:

List of entry objects based on the filtered criteria.

Return type:

List[HarEntry]
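
The filtering criteria above can be sketched on plain dicts as follows. This is an illustration under stated assumptions, not haralyzer's implementation: filter_entries_sketch and matches are hypothetical names, the entries are bare HAR-shaped dicts, and the use of re.search for the regex mode is an assumption.

```python
import re


def matches(value: str, pattern: str, regex: bool = True) -> bool:
    """Regex search when regex=True, exact string comparison otherwise."""
    if regex:
        return re.search(pattern, value) is not None
    return value == pattern


def filter_entries_sketch(entries, request_type=None, content_type=None,
                          status_code=None, load_time__gt=None, regex=True):
    """Keep only the entries that satisfy every criterion that was given."""
    results = []
    for entry in entries:
        response = entry["response"]
        if request_type is not None and not matches(
                entry["request"]["method"], request_type, regex):
            continue
        if content_type is not None and not matches(
                response["content"]["mimeType"], content_type, regex):
            continue
        # Status codes are compared as strings, not numbers.
        if status_code is not None and not matches(
                str(response["status"]), status_code, regex):
            continue
        # Entries faster than the threshold (in ms) are excluded.
        if load_time__gt is not None and entry["time"] < load_time__gt:
            continue
        results.append(entry)
    return results


# Hypothetical entries for illustration.
entries = [
    {"request": {"method": "GET"}, "time": 120,
     "response": {"status": 200, "content": {"mimeType": "image/png"}}},
    {"request": {"method": "GET"}, "time": 30,
     "response": {"status": 200, "content": {"mimeType": "text/css"}}},
    {"request": {"method": "POST"}, "time": 500,
     "response": {"status": 404, "content": {"mimeType": "application/json"}}},
]

images = filter_entries_sketch(entries, content_type="image")
slow_gets = filter_entries_sketch(entries, request_type="GET", load_time__gt=100)
exact = filter_entries_sketch(entries, content_type="image", regex=False)
```

With regex=False the content type must equal the given string exactly, so "image" no longer matches "image/png" in this sketch.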

get_load_time(request_type: str = None, content_type: str = None, status_code: str = None, asynchronous: bool = True, **kwargs) → int[source]

This method can return the TOTAL load time for the assets or the ACTUAL load time, the difference being that the actual load time takes asynchronous transactions into account. So, if you want the total load time, set asynchronous=False.

EXAMPLE:

I want to know the load time for images on a page that has two images, each of which took 2 seconds to download, but the browser downloaded them at the same time.

self.get_load_time(content_type='image') (returns 2)
self.get_load_time(content_type='image', asynchronous=False) (returns 4)

Parameters:
  • request_type (str) – The request type (e.g. GET or POST)
  • content_type (str) – Regex to use for finding content type
  • status_code (str) – The desired status code
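  • asynchronous (bool) – Whether to account for assets that load in parallel (True, the actual load time) or to sum the individual load times (False, the total load time)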
Returns:

Total load time

Return type:

int
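
The difference between TOTAL and ACTUAL load time can be sketched with overlapping time intervals. This is a conceptual illustration, not the library's algorithm: total_load_time and actual_load_time are hypothetical helpers, each asset is a (start, duration in ms) tuple, and the results here are in milliseconds.

```python
from datetime import datetime, timedelta


def total_load_time(assets) -> int:
    """Sum of every asset's own duration in ms (like asynchronous=False)."""
    return sum(duration_ms for _, duration_ms in assets)


def actual_load_time(assets) -> int:
    """Wall-clock ms during which at least one asset was loading."""
    intervals = sorted(
        (start, start + timedelta(milliseconds=duration_ms))
        for start, duration_ms in assets
    )
    covered = timedelta()
    current_start = current_end = None
    for start, end in intervals:
        if current_end is None or start > current_end:
            # A gap: close the previous merged interval and open a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlap: extend the merged interval if this asset ends later.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return int(covered / timedelta(milliseconds=1))


# Two images, 2 seconds each, downloaded at the same time.
t0 = datetime(2024, 1, 1, 12, 0, 0)
images = [(t0, 2000), (t0, 2000)]

print(actual_load_time(images))  # 2000
print(total_load_time(images))   # 4000
```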

get_requests

Returns a list of GET requests, each of which is a HarEntry object

Returns:All GET requests
Return type:List[HarEntry]
static get_total_size(entries: List[HarEntry]) → int[source]

Returns the total size of a collection of entries.

Parameters:entries (List[HarEntry]) – List of entries to calculate the total size of.
Returns:Total size of entries
Return type:int
static get_total_size_trans(entries: List[HarEntry]) → int[source]

Returns the total size of a collection of entries - transferred.

NOTE: Use with HAR files generated with chrome-har-capturer

Parameters:entries (List[HarEntry]) – List of entries to calculate the total size of.
Returns:Total size of entries that was transferred
Return type:int
hostname
Returns:Hostname of the initial request
Return type:str
html_files

All HTML files for a page

Returns:HTML entries for a page
Return type:List[HarEntry]
html_load_time

HTML load time

Returns:Load time for HTML on a page
Return type:int
image_files

All image files for a page

Returns:Image entries for a page
Return type:List[HarEntry]
image_load_time

Image load time

Returns:Load time for images on a page
Return type:int
image_size

Size of image files from the page

Returns:Size of image files on the page
Return type:int
image_size_trans

Image transfer size

Returns:Size of transfer data for images
Return type:int
initial_load_time

Initial load time

Returns:Initial load time of the page
Return type:int
js_files

All JS files for a page

Returns:JS entries for a page
Return type:List[HarEntry]
js_load_time

JS load time

Returns:Load time for JS on a page
Return type:int
js_size

Size of JS files from the page

Returns:Size of JS files on the page
Return type:int
js_size_trans

JS transfer size

Returns:Size of transfer data for JS
Return type:int
page_load_time

Load time of the page

Returns:Load time for the page
Return type:int
page_size

Size of the page

Returns:Size of the page
Return type:int
page_size_trans

Page transfer size

Returns:Size of transfer data for the page
Return type:int
post_requests

Returns a list of POST requests, each of which is a HarEntry object

Returns:All POST requests
Return type:List[HarEntry]
text_files

All text files for a page

Returns:Text entries for a page
Return type:List[HarEntry]
text_size

Size of text files from the page

Returns:Size of text files on the page
Return type:int
text_size_trans

Text transfer size

Returns:Size of transfer data for text
Return type:int
time_to_first_byte
Returns:Time to first byte of the page request in ms
Return type:int
url

The absolute URL of the initial request.

Returns:URL of first request
Return type:str
video_files

All video files for a page

Returns:Video entries for a page
Return type:List[HarEntry]
video_load_time

Video load time

Returns:Load time for video on a page
Return type:int
video_size

Size of video files from the page

Returns:Size of video files on the page
Return type:int
video_size_trans

Video transfer size

Returns:Size of transfer data for video
Return type:int
class haralyzer.assets.HarParser(har_data: dict = None)[source]

Bases: object

A basic HAR parser that also adds helpers for analyzing the performance of a web page.

browser

Browser of the HAR file

Returns:Browser of the HAR file
Return type:str
static create_asset_timeline(asset_list: List[HarEntry]) → dict[source]

Returns a dict of the timeline for the requested assets. The key is a datetime object (down to the millisecond) of ANY time where at least one of the requested assets was loaded. The value is a list of ALL assets that were loading at that time.

Parameters:asset_list (List[HarEntry]) – The assets to create a timeline for.
Returns:Milliseconds and assets that were loaded
Return type:dict
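
The shape of such a timeline can be sketched as a dict keyed by millisecond-resolution datetime objects. This is an illustration only: asset_timeline_sketch is a hypothetical helper, and each asset is a (name, start, duration in ms) tuple rather than a HarEntry.

```python
from datetime import datetime, timedelta


def asset_timeline_sketch(assets) -> dict:
    """Map each millisecond to the list of assets loading at that time."""
    timeline = {}
    for name, start, duration_ms in assets:
        for offset in range(duration_ms):
            moment = start + timedelta(milliseconds=offset)
            timeline.setdefault(moment, []).append(name)
    return timeline


# Hypothetical assets: b.png starts 2 ms after a.png and overlaps it by 1 ms.
start = datetime(2024, 1, 1, 12, 0, 0)
assets = [
    ("a.png", start, 3),
    ("b.png", start + timedelta(milliseconds=2), 2),
]

timeline = asset_timeline_sketch(assets)
for moment in sorted(timeline):
    print(moment.time(), timeline[moment])
```

In this sketch the number of keys equals the number of milliseconds during which at least one asset was loading.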
creator

Creator of the HAR file. Usually the same as the browser, but not always.

Returns:Program that created the HAR file
Return type:str
hostname

Hostname of first page

Returns:Hostname of the first known page
Return type:str
static match_content_type(entry: haralyzer.assets.HarEntry, content_type: str, regex: bool = True) → bool[source]

Matches the content type of a request using the mimeType metadata.

Parameters:
  • entry (HarEntry) – Entry to analyze
  • content_type (str) – Regex to use for finding content type
  • regex (bool) – Whether to use regex or exact match.
Returns:

Mime type matches

Return type:

bool

static match_headers(entry: haralyzer.assets.HarEntry, header_type: str, header: str, value: str, regex: bool = True) → bool[source]

Function to match headers.

Because header names can vary in case (for example, ‘content-type’ vs ‘Content-Type’), this function is case-insensitive.

Parameters:
  • entry (HarEntry) – Entry to analyze
  • header_type (str) – Header type. Valid values: ‘request’, or ‘response’
  • header (str) – The header to search for
  • value (str) – The value to search for
  • regex (bool) – Whether to use regex or exact match
Returns:

Whether a match was found

Return type:

bool
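
A minimal sketch of case-insensitive header matching on a HAR-style header list. The helper match_header_sketch is hypothetical, and the handling of values (a case-insensitive re.search in regex mode, exact comparison otherwise) is an assumption rather than documented behavior.

```python
import re


def match_header_sketch(headers: list, header: str, value: str,
                        regex: bool = True) -> bool:
    """Return True if a header with the given name has a matching value."""
    for item in headers:
        # Header names are compared without regard to case.
        if item["name"].lower() != header.lower():
            continue
        if regex:
            # Assumption: values are searched case-insensitively.
            if re.search(value, item["value"], flags=re.IGNORECASE):
                return True
        elif item["value"] == value:
            return True
    return False


# Hypothetical response headers in the HAR name/value layout.
headers = [
    {"name": "Content-Type", "value": "text/html; charset=utf-8"},
    {"name": "Cache-Control", "value": "no-cache"},
]

print(match_header_sketch(headers, "content-type", "text/html"))               # True
print(match_header_sketch(headers, "CONTENT-TYPE", "text/html", regex=False))  # False
```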

static match_http_version(entry: haralyzer.assets.HarEntry, http_version: str, regex: bool = True) → bool[source]

Helper function that checks whether the HTTP version of an entry matches the given http_version argument.

Parameters:
  • entry (HarEntry) – Entry to analyze
  • http_version (str) – HTTP version type to match
  • regex (bool) – Whether to use a regex or string match
Returns:

HTTP version matches

Return type:

bool

static match_request_type(entry: haralyzer.assets.HarEntry, request_type: str, regex: bool = True) → bool[source]

Helper function that checks whether the request type of an entry matches the given request_type argument.

Parameters:
  • entry (HarEntry) – Entry to analyze
  • request_type (str) – Request type to match
  • regex (bool) – Whether to use a regex or string match
Returns:

Request method matches

Return type:

bool

static match_status_code(entry: haralyzer.assets.HarEntry, status_code: str, regex: bool = True) → bool[source]

Helper function that checks whether the status code of an entry matches the given status_code argument.

NOTE: This is doing a STRING comparison NOT NUMERICAL

Parameters:
  • entry (HarEntry) – Entry to analyze
  • status_code (str) – Status code to search for
  • regex (bool) – Whether to use a regex or string match
Returns:

Status code matches

Return type:

bool
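
The string comparison noted above can be sketched as follows. The helper match_status_sketch is hypothetical, and the use of re.search for the regex mode is an assumption. Under re.search an unanchored pattern can match anywhere in the code, so the examples anchor their patterns.

```python
import re


def match_status_sketch(status: int, status_code: str, regex: bool = True) -> bool:
    """Compare a numeric status with a pattern as strings, not as numbers."""
    status_text = str(status)
    if regex:
        return re.search(status_code, status_text) is not None
    return status_text == status_code


print(match_status_sketch(200, r"^2\d\d$"))         # True
print(match_status_sketch(404, r"^2\d\d$"))         # False
print(match_status_sketch(200, "200", regex=False))  # True
# Under re.search, the unanchored pattern "2" also matches 302.
print(match_status_sketch(302, "2"))                # True
```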

pages

This is a list of HarPage objects, each of which represents a page from the HAR file.

Returns:HarPages in the file
Return type:List[HarPage]
version

HAR Version

Returns:Version of HAR used
Return type:str
haralyzer.assets.convert_to_entry(func)[source]

Wrapper function for converting dicts of entries to HarEntry objects

haralyzer.errors module

Custom exceptions for good ol haralyzer.

exception haralyzer.errors.PageNotFoundError[source]

Bases: AttributeError

Error raised when the page is not found

haralyzer.http module

Creates the Request and Response subclasses that are used by each entry

class haralyzer.http.Request(entry: dict)[source]

Bases: haralyzer.mixins.HttpTransaction

Request object for a HarEntry

accept
Returns:HTTP Accept header
Return type:str
bodySize
Returns:Body size of the request
Return type:int
cacheControl
Returns:HTTP CacheControl header
Return type:str
cookies
Returns:Cookies from the request
Return type:list
encoding
Returns:HTTP Accept-Encoding Header
Return type:str
headersSize
Returns:Headers size from the request
Return type:int
host
Returns:HTTP Host header
Return type:str
httpVersion
Returns:HTTP version used in the request
Return type:str
language
Returns:HTTP language header
Return type:str
method
Returns:HTTP method of the request
Return type:str
queryString
Returns:Query string from the request
Return type:list
url
Returns:URL of the request
Return type:str
userAgent
Returns:User Agent
Return type:str
class haralyzer.http.Response(url: str, entry: dict)[source]

Bases: haralyzer.mixins.HttpTransaction

Response object for a HarEntry

bodySize
Returns:Body Size
Return type:int
cacheControl
Returns:Cache Control Header
Return type:str
contentSecurityPolicy
Returns:Content Security Policy Header
Return type:str
contentSize
Returns:Content Size
Return type:int
contentType
Returns:Content Type
Return type:str
date
Returns:Date of response
Return type:str
headersSize
Returns:Header size
Return type:int
httpVersion
Returns:HTTP Version
Return type:str
lastModified
Returns:Last modified time
Return type:str
mimeType
Returns:Mime Type of response
Return type:str
redirectURL
Returns:Redirect URL
Return type:Optional[str]
status
Returns:HTTP Status
Return type:int
statusText
Returns:HTTP Status Text
Return type:str
text
Returns:Response body
Return type:str

haralyzer.mixins module

Mixin Objects that allow for shared methods

class haralyzer.mixins.GetHeaders[source]

Bases: object

Mixin to get a header

get_header_value(name: str) → Optional[str][source]

Returns the header value of the header defined in name

Parameters:name (str) – Name of the header to get the value of
Returns:Value of the header
Return type:Optional[str]
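
A header lookup that returns Optional[str] can be sketched on a HAR-style header list. The helper header_value_sketch is hypothetical, and the case-insensitive name comparison is an assumption, not confirmed by the text above.

```python
from typing import Optional


def header_value_sketch(headers: list, name: str) -> Optional[str]:
    """Return the value of the first header called `name`, or None."""
    for header in headers:
        # Assumption: header names are compared case-insensitively.
        if header["name"].lower() == name.lower():
            return header["value"]
    return None


# Hypothetical request headers in the HAR name/value layout.
headers = [
    {"name": "Host", "value": "example.com"},
    {"name": "Accept-Encoding", "value": "gzip, deflate"},
]

print(header_value_sketch(headers, "accept-encoding"))  # gzip, deflate
print(header_value_sketch(headers, "X-Missing"))        # None
```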
class haralyzer.mixins.HttpTransaction(entry: dict)[source]

Bases: haralyzer.mixins.GetHeaders, haralyzer.mixins.MimicDict

Class that represents a request or response

headers

Headers from the entry

Returns:Headers from both request and response
Return type:list
class haralyzer.mixins.MimicDict[source]

Bases: collections.abc.MutableMapping

Mixin for functions to mimic a dictionary for backward compatibility

haralyzer.multihar module

Contains the multihar parser object

class haralyzer.multihar.MultiHarParser(har_data, page_id=None, decimal_precision=0)[source]

Bases: object

An object that represents multiple HAR files OF THE SAME CONTENT. It is used to gather overall statistical data in situations where you have multiple runs against the same web asset, which is common in performance testing.

asset_types

Mimic the asset types stored in HarPage

Returns:Asset types from HarPage
Return type:dict
audio_load_time
Returns:Aggregate audio load time for all pages. Can be an int or float depending on the self.decimal_precision
Return type:int, float
css_load_time
Returns:Aggregate css load time for all pages. Can be an int or float depending on the self.decimal_precision
Return type:int, float
get_load_times(asset_type: str) → list[source]

Just a list of the load times of a certain asset type for each page

Parameters:asset_type (str) – The asset type to return load times for
Returns:List of load times
Return type:list
get_stdev(asset_type: str) → Union[int, float][source]

Returns the standard deviation for a set of a certain asset type.

Parameters:asset_type (str) – The asset type to calculate standard deviation for.
Returns:Standard deviation, which can be an int or float depending on the self.decimal_precision
Return type:int, float
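
How decimal_precision can turn the result into an int or a float is sketched below with the standard statistics module. The helper stdev_sketch is hypothetical, and both the choice of the sample standard deviation and the rounding rule are assumptions, not the library's confirmed behavior.

```python
import statistics
from typing import Union


def stdev_sketch(load_times: list, decimal_precision: int = 0) -> Union[int, float]:
    """Sample standard deviation, rounded to the requested precision."""
    spread = statistics.stdev(load_times)
    if decimal_precision == 0:
        # With no decimal places requested, return a plain int.
        return int(round(spread))
    return round(spread, decimal_precision)


# Hypothetical load times in ms for the same asset type across three runs.
load_times = [100, 110, 120]

print(stdev_sketch(load_times))                       # 10
print(stdev_sketch(load_times, decimal_precision=2))  # 10.0
```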
html_load_time
Returns:Aggregate html load time for all pages. Can be an int or float depending on the self.decimal_precision
Return type:int, float
image_load_time
Returns:Aggregate image load time for all pages. Can be an int or float depending on the self.decimal_precision
Return type:int, float
js_load_time
Returns:Aggregate javascript load time. Can be an int or float depending on the self.decimal_precision
Return type:int, float
page_load_time
Returns:Average total load time for all runs (not weighted). Can be an int or float depending on the self.decimal_precision
Return type:int, float
pages

Aggregate pages of all the parser objects.

Returns:All the pages from parsers
Return type:List[haralyzer.assets.HarPage]
time_to_first_byte
Returns:The aggregate time to first byte for all pages. Can be an int or float depending on the self.decimal_precision
Return type:int, float
video_load_time
Returns:Aggregate video load time for all pages. Can be an int or float depending on the self.decimal_precision
Return type:int, float
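
The aggregate properties above can be pictured as one summary value per asset type across several runs. The sketch below assumes the aggregate is an unweighted arithmetic mean, as stated for page_load_time; aggregate_sketch and the sample data are hypothetical.

```python
import statistics
from typing import Union


def aggregate_sketch(values: list, decimal_precision: int = 0) -> Union[int, float]:
    """Unweighted mean of per-run values, rounded to the given precision."""
    average = statistics.mean(values)
    if decimal_precision == 0:
        return int(round(average))
    return round(average, decimal_precision)


# Hypothetical load times in ms from three runs against the same page.
runs = {
    "page": [1200, 1300, 1250],
    "image": [100, 101, 103],
}

summary = {asset: aggregate_sketch(times, decimal_precision=1)
           for asset, times in runs.items()}
print(summary)
```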