niklib.data package

Submodules

niklib.data.constant module

niklib.data.constant.EXAMPLE_FINANCIAL_RATIOS = {'deposit2rent': 0.03, 'deposit2worth': 5.0, 'income2tax': 0.15, 'income2worth': 15.0, 'rent2deposit': 33.333333333333336, 'tax2income': 6.666666666666667, 'worth2deposit': 0.2, 'worth2income': 0.06666666666666667}

Ratios used to convert between rent, deposit, income, tax, and total worth

Note

This is one of the dictionaries containing factors used in heuristic calculations based on domain knowledge.

Info:

Although this was created as a code example, the values chosen here come from basic rules of thumb and can actually be used if no other reliable information is available.
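As an illustration of how such a dictionary might be used, here is a minimal sketch. It assumes that a key of the form `<source>2<target>` holds the factor that multiplies a `source` amount to estimate the `target` amount. That reading is consistent with the reciprocal pairs in the values above, but it is an assumption. The `convert` helper is hypothetical and not part of niklib, and the dictionary is copied locally so the snippet runs on its own.

```python
# Local copy of the documented values so the sketch runs without niklib installed.
EXAMPLE_FINANCIAL_RATIOS = {
    "deposit2rent": 0.03,
    "deposit2worth": 5.0,
    "income2tax": 0.15,
    "income2worth": 15.0,
    "rent2deposit": 33.333333333333336,
    "tax2income": 6.666666666666667,
    "worth2deposit": 0.2,
    "worth2income": 0.06666666666666667,
}


def convert(amount, source, target, ratios=EXAMPLE_FINANCIAL_RATIOS):
    """Hypothetical helper: estimate a `target` amount from a `source` amount.

    Assumes the key "<source>2<target>" stores a multiplicative factor.
    """
    key = f"{source}2{target}"
    if key not in ratios:
        # Only the pairs listed in the dictionary can be converted directly.
        raise KeyError(f"no ratio defined for {source} -> {target}")
    return amount * ratios[key]


estimated_rent = convert(1000.0, "deposit", "rent")
estimated_worth = convert(1000.0, "deposit", "worth")
```

Pairs that are not listed, such as rent to tax, raise `KeyError` in this sketch rather than being chained through intermediate quantities.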

class niklib.data.constant.ExampleFillna(value)

Bases: CustomNamingEnum

Values used to fill None values, depending on the form structure

Members follow the <field_name>_<form_name> naming convention. Each value has been extracted by manually inspecting the documents. Hence, for each form, the user must find and set this value manually.

Note

We do not use any heuristics here; we keep the options the form already uses and add only one extra option that represents the None state, i.e. None as a separate category of a categorical feature.

CHD_M_STATUS_5645E = 9
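
A minimal sketch of how such a fill value might be applied. It uses the standard library `Enum` as a stand-in for `CustomNamingEnum`, whose behavior is not shown here, and `fill_missing` is a hypothetical helper, not a niklib function.

```python
from enum import Enum


class ExampleFillna(Enum):
    # Stand-in for the documented class; the real one derives from CustomNamingEnum.
    CHD_M_STATUS_5645E = 9


def fill_missing(values, fill):
    """Hypothetical helper: replace None entries with the enum member's value."""
    # Compare against None explicitly so that valid falsy codes such as 0 survive.
    return [fill.value if v is None else v for v in values]


child_marital_status = [3, None, 0, None]
filled = fill_missing(child_marital_status, ExampleFillna.CHD_M_STATUS_5645E)
```

The explicit `is None` check matters because a form may use 0 as a legitimate code.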
class niklib.data.constant.ExampleDocTypes(value)

Bases: CustomNamingEnum

Contains all document types which can be used to customize ETL steps for each document type

Members follow the <country_name>_<document_type> naming convention. The value and its order are meaningless.

CANADA = 1
CANADA_5257E = 2
CANADA_5645E = 3
CANADA_LABEL = 4
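
One way to customize ETL steps per document type is to dispatch on the enum member itself. The sketch below uses the standard library `Enum` as a stand-in for `CustomNamingEnum`. The `steps_for` helper and every step name are hypothetical placeholders, not niklib APIs.

```python
from enum import Enum


class ExampleDocTypes(Enum):
    # Stand-in for the documented class; values and their order carry no meaning.
    CANADA = 1
    CANADA_5257E = 2
    CANADA_5645E = 3
    CANADA_LABEL = 4


# Hypothetical mapping from document type to placeholder step names.
ETL_STEPS = {
    ExampleDocTypes.CANADA_5257E: ["extract_fields", "clean_5257e"],
    ExampleDocTypes.CANADA_5645E: ["extract_fields", "clean_5645e"],
    ExampleDocTypes.CANADA_LABEL: ["load_labels"],
}


def steps_for(doc_type):
    """Hypothetical helper: look up steps by member identity, never by value."""
    return ETL_STEPS.get(doc_type, [])


selected = steps_for(ExampleDocTypes.CANADA_5645E)
```

Keying the mapping on members rather than on their integer values keeps the code valid if the values are renumbered.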
class niklib.data.constant.ExampleMarriageStatus(value)

Bases: CustomNamingEnum

Marital status values as used in a specific form

Note

The values of the members are the values used in the original forms. Hence, they should not be modified by any means, as they are tied to the dataset, transformations, and other domain-specific values.

Info:

The values in this class have been chosen for demonstration purposes and do not carry any meaning or information (El No Sabe). In real-world use, you must use meaningful ones.

COMMON_LAW = 69
DIVORCED = 3
SEPARATED = 4
MARRIED = 0
SINGLE = 7
WIDOWED = 85
UNKNOWN = 9
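
Because the member values mirror the codes used in the form, a raw code can be mapped to a member with an ordinary enum value lookup. The sketch below uses the standard library `Enum` as a stand-in for `CustomNamingEnum`. The `decode_status` helper and its fallback to `UNKNOWN` for unrecognized codes are assumptions made for illustration.

```python
from enum import Enum


class ExampleMarriageStatus(Enum):
    # Stand-in for the documented class, with the documented demonstration values.
    COMMON_LAW = 69
    DIVORCED = 3
    SEPARATED = 4
    MARRIED = 0
    SINGLE = 7
    WIDOWED = 85
    UNKNOWN = 9


def decode_status(raw):
    """Hypothetical helper: map a raw form code to a member."""
    try:
        return ExampleMarriageStatus(raw)
    except ValueError:
        # Assumed policy: codes not present in the form (or None) become UNKNOWN.
        return ExampleMarriageStatus.UNKNOWN


common_law = decode_status(69)
married = decode_status(0)
fallback = decode_status(42)
```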
class niklib.data.constant.ExampleSex(value)

Bases: CustomNamingEnum

Sex types in general

Note

The values of the enum members are not important; hence, no explicit values are assigned.

Info:

The names of the members have to be customized because of bad preprocessing (or, in some cases, domain-specific knowledge); hence, the names have been overridden.

FEMALE = 1
MALE = 2

niklib.data.functional module

niklib.data.logic module

niklib.data.pdf module

niklib.data.preprocessor module

Module contents