Simple python script that inserts data points read from a csv file into a influxdb database.

FloKra 46bdad14f6 2020-10-17 by Flo Kra 1 year ago
.gitignore 21e3e36ddb Fix bug where pytz won't let you localize a date that's already localized 2 years ago
CHANGELOG.md 46bdad14f6 2020-10-17 by Flo Kra 1 year ago
LICENSE 50792e74c1 Batch insert 4 years ago
README.md 46bdad14f6 2020-10-17 by Flo Kra 1 year ago
csv-to-influxdb.py 46bdad14f6 2020-10-17 by Flo Kra 1 year ago

README.md

csv-to-influxdb-ext

Simple python script that inserts data points read from a csv file into a influxdb database.

To create a new database, specify the parameter --create. This will drop any database with a name equal to the one supplied with --dbname.

Changes/Improvements compared to the original version:

  • Improved guessing of data types from the data found in the first row.

    • number without . --> integer
    • number that contains a . --> float
    • string that is "true" or "false" --> bool Unlike in the original version, these data types are then used for the entire file for this column. If a value in another row does not fit (i.e. is empty, is a string like "NaN" or "") it will be skipped. Unlike the original code, this does not stop the rest of the import. If a row does not contain any valid data it is entirely skipped and not written to database.
  • specify the data types on the commandline

  • Possibility to have a look on what´s going on before actually write anything to the database, by simply specifying --dryrun.

  • show data processed on the console

  • Possibility to specify a target retention policy rather than only the database name

Usage

usage: csv-to-influxdb.py [-h] -i [INPUT] [-d [DELIMITER]] [-s [SERVER]]
                          [--ssl] [-u [USER]] [-p [PASSWORD]] --dbname
                          [DBNAME] [--create] [-m [METRICNAME]]
                          [-tc [TIMECOLUMN]] [-tf [TIMEFORMAT]] [-tz TIMEZONE]
                          [--fieldcolumns [FIELDCOLUMNS]]
                          [--tagcolumns [TAGCOLUMNS]] [-g] [-b BATCHSIZE]

Csv to influxdb.

optional arguments:
  -h, --help            show this help message and exit
  -i [INPUT], --input [INPUT]
                        Input csv file.
  -d [DELIMITER], --delimiter [DELIMITER]
                        Csv delimiter. Default: ','.
  -s [SERVER], --server [SERVER]
                        Server address. Default: localhost:8086
  --ssl                 Use HTTPS instead of HTTP.
  -u [USER], --user [USER]
                        User name.
  -p [PASSWORD], --password [PASSWORD]
                        Password.
  --dbname [DBNAME]     Database name. Specify target Retention Policy: [DBNAME].[RPNAME]
  --create              Drop database and create a new one.
  -m [METRICNAME], --metricname [METRICNAME]
                        Metric column name. Default: value
  -tc [TIMECOLUMN], --timecolumn [TIMECOLUMN]
                        Timestamp column name. Default: timestamp.
  -tf [TIMEFORMAT], --timeformat [TIMEFORMAT]
                        Timestamp format. Default: '%Y-%m-%d %H:%M:%S' e.g.:
                        1970-01-01 00:00:00
  -tz TIMEZONE, --timezone TIMEZONE
                        Timezone of supplied data. Default: UTC
  -tp, --tspass         Pass the timestamp from CSV directly to InfluxDB (do no conversion).
                        Use only if the format is compatible to InfluxDB.
  --fieldcolumns [FIELDCOLUMNS]
                        List of csv columns to use as fields, separated by
                        comma, e.g.: value1,value2. Default: value
  --datatypes           Force specify data types for fields specified in --fieldcolumns: 
                        i.e. value1=int,value2=float,value3=bool,name=str ...
                        Valid types: int, float, str, bool
  --tagcolumns [TAGCOLUMNS]
                        List of csv columns to use as tags, separated by
                        comma, e.g.: host,data_center. Default: host
  -g, --gzip            Compress before sending to influxdb.
  -b BATCHSIZE, --batchsize BATCHSIZE
                        Batch size. Default: 5000.
  --showdata            Print detailed information to the console what will be done with the data (or is intended to, when using --dryrun).
  --dryrun              Do not change anything in the DB. Also enables --showdata.

Examples

1. Considering the csv file:


timestamp,value,computer
1970-01-01 00:00:00,51.374894,0
1970-01-01 00:00:01,74.562764,1
1970-01-01 00:00:02,17.833757,2
1970-01-01 00:00:03,40.125102,0
1970-01-01 00:00:04,88.160817,1
1970-01-01 00:00:05,28.401695,2
1970-01-01 00:00:06,98.670792,3
1970-01-01 00:00:07,69.532011,0
1970-01-01 00:00:08,39.198964,0

The following command will insert the file into a influxdb database:


python csv-to-influxdb.py --dbname test --input data.csv --tagcolumns computer --fieldcolumns value

2. Another example:


timestamp,temperature,humidity,sensor
1970-01-01 00:00:00,17.2,55,garden
1970-01-01 00:00:01,17.3,56,garden
1970-01-01 00:00:02,17.1,57,garden
1970-01-01 00:00:03,16.9,55,garden
1970-01-01 00:00:04,16.7,53,garden
1970-01-01 00:00:05,16.8,52,garden
1970-01-01 00:00:06,17.0,55,garden
1970-01-01 00:00:07,17.1,57,garden
1970-01-01 00:00:08,17.2,60,garden

Command:

python csv-to-influxdb.py --dbname test --input data.csv --tagcolumns sensor --fieldcolumns temperature,humidity --datatypes temperature=float,humidity=int

Where --datatypes cam be omitted if they are clearly to identify in the first data row.

3. Importing historic aggregated data to a different Retention Policy named "daily":


timestamp,temp_avg,hum_avg,sensor
2020-06-01 00:00:00,17.2,55,garden
2020-06-02 00:00:00,17.3,56,garden
2020-06-03 00:00:00,17.1,57,garden
2020-06-04 00:00:00,16.9,55,garden
2020-06-05 00:00:00,16.7,53,garden
2020-06-06 00:00:00,16.8,52,garden
2020-06-07 00:00:00,17.0,55,garden
2020-06-08 00:00:00,17.1,57,garden
2020-06-09 00:00:00,17.2,60,garden

Command:

python csv-to-influxdb.py --dbname test.daily --input data.csv --tagcolumns sensor --fieldcolumns temp_avg,hum_avg