Create a Python-Based Elasticsearch Input Pipeline
$30-250 USD
Paid on delivery
The script will run against a directory of zipped, tab-delimited CSV files. It should perform the following steps:
For every file:
Unzip the file
Ignore the data labels (the header row) in the text file. A separate set of labels will be provided (same number of columns, just cleaned up, with spaces removed, etc.)
Validate the input file to ensure it contains the appropriate number of columns
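The per-file steps above could be sketched roughly as follows, using only the standard library. The label list, the function names, and the assumption that each archive holds exactly one text file are illustrative and not part of the posting; the real cleaned-up labels would be supplied separately.

```python
import csv
import zipfile
from pathlib import Path

# Hypothetical replacement labels; the real cleaned-up set is provided separately.
LABELS = ["fips", "apn", "owner_name"]


def extract_first_member(zip_path, dest_dir):
    """Extract the first member of the archive and return its path.

    Assumes one tab-delimited text file per zip archive.
    """
    with zipfile.ZipFile(zip_path) as archive:
        member = archive.namelist()[0]
        return Path(archive.extract(member, path=dest_dir))


def read_validated_rows(text_path, labels=LABELS):
    """Yield one dict per record, keyed by the replacement labels."""
    with open(text_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader, None)  # original labels are read only to be discarded
        if header is None or len(header) != len(labels):
            found = 0 if header is None else len(header)
            raise ValueError(
                f"{text_path}: expected {len(labels)} columns, found {found}"
            )
        for line_no, row in enumerate(reader, start=2):
            if len(row) != len(labels):
                raise ValueError(
                    f"{text_path}, line {line_no}: expected {len(labels)} "
                    f"columns, found {len(row)}"
                )
            yield dict(zip(labels, row))
```

Raising on the first malformed row is one possible policy; logging and skipping bad rows would be an equally reasonable choice if partial files are acceptable.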
For each record contained within each file:
* Generate a UUID that will later be used as the record's unique identifier
* Transform the record into valid JSON
* PUT the JSON into Elasticsearch using the Index API
* Ensure that the PUT is successful and the record was written
* Attempt to resubmit records that fail
* Keep a log of submissions containing UUID, Submit Status (Success/Fail), FIPS/APN, and Date/Time of the submission
* Keep track of the total numbers of successful and failed attempts
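A minimal sketch of the per-record steps above, again with the standard library only. The URL form `PUT /<index>/_doc/<id>` is the Index API shape in recent Elasticsearch versions and is an assumption about the target cluster; the function names, the retry count, the backoff, and the `fips`/`apn` field names are hypothetical.

```python
import json
import time
import urllib.request
import uuid
from collections import Counter
from datetime import datetime, timezone


def put_document(base_url, index, doc_id, body, timeout=10):
    """PUT one JSON document via the Index API; True on HTTP 200/201."""
    request = urllib.request.Request(
        url=f"{base_url}/{index}/_doc/{doc_id}",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status in (200, 201)
    except OSError:  # URLError, HTTPError and socket timeouts derive from OSError
        return False


def submit_record(record, send, max_attempts=3, backoff_seconds=1.0):
    """Assign a UUID, serialize to JSON, and submit with simple retries.

    `send(doc_id, body)` must return True when the write was acknowledged.
    """
    doc_id = str(uuid.uuid4())
    body = json.dumps(record)
    ok = False
    attempts = 0
    for attempts in range(1, max_attempts + 1):
        ok = send(doc_id, body)
        if ok:
            break
        if attempts < max_attempts:
            time.sleep(backoff_seconds * attempts)  # wait a little longer each time
    return {
        "uuid": doc_id,
        "status": "Success" if ok else "Fail",
        "fips": record.get("fips"),
        "apn": record.get("apn"),
        "submitted_at": datetime.now(timezone.utc).isoformat(),
        "attempts": attempts,
    }


def process_records(records, send, **retry_options):
    """Submit every record; return the log entries and Success/Fail totals."""
    log = [submit_record(record, send, **retry_options) for record in records]
    totals = Counter(entry["status"] for entry in log)
    return log, totals
```

Passing `send` as a parameter keeps the retry and logging logic testable without a running cluster. Against a real node it could be bound with something like `functools.partial(put_document, "http://localhost:9200", "parcels")`, where both the URL and the index name are placeholders.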
Once a file is complete:
* Write the log file back to the directory the source files came from
* Delete the uncompressed version of the source file
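The end-of-file cleanup might look like the sketch below. The log file naming (`<archive name>_log.csv`), the column names, and the function name are assumptions chosen for illustration; the posting only requires that the log end up next to the source files and that the extracted copy be removed.

```python
import csv
from pathlib import Path

# Hypothetical log columns matching the fields the posting asks for.
LOG_FIELDS = ["uuid", "status", "fips", "apn", "submitted_at"]


def finish_file(zip_path, extracted_path, log_entries):
    """Write the submission log beside the source zip, then delete the extract."""
    zip_path = Path(zip_path)
    log_path = zip_path.with_name(zip_path.stem + "_log.csv")
    with open(log_path, "w", newline="", encoding="utf-8") as handle:
        # extrasaction="ignore" drops any extra keys carried on the log entries
        writer = csv.DictWriter(handle, fieldnames=LOG_FIELDS, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(log_entries)
    Path(extracted_path).unlink()  # remove only the uncompressed copy
    return log_path
```

The log is written before the extracted file is deleted, so a failure while writing leaves the uncompressed data available for a rerun.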
We'll be importing close to 20 million records. Ideally these operations would be threaded, but we have to be careful not to inundate Elasticsearch: it has to be able to keep up so that submissions do not fail.
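One way to thread the submissions without flooding the cluster is a small worker pool combined with a cap on pending work, sketched below. The function name, the pool size, and the in-flight limit are hypothetical values that would need tuning against the actual cluster.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def submit_throttled(records, handle_record, max_workers=4, max_in_flight=16):
    """Run handle_record over records on a thread pool with bounded backlog.

    At most max_workers calls run at once, and at most max_in_flight records
    are queued or running, so the producer cannot race ahead of the cluster.
    """
    slots = threading.BoundedSemaphore(max_in_flight)

    def run(record):
        try:
            return handle_record(record)
        finally:
            slots.release()  # free a slot whether the call succeeded or raised

    futures = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for record in records:
            slots.acquire()  # blocks here while too many records are pending
            futures.append(pool.submit(run, record))
    # Results come back in input order; the pool has already drained here.
    return [future.result() for future in futures]
```

Holding every future in a list is fine per file but would be heavy for all 20 million records at once, so a real run would probably call this once per source file or write each log entry as its future completes.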
Looking for a price of $225 on this initial work.
Project ID: #20079217
About the project
3 freelancers have bid an average of $132 for this job
Hello Sir/Ma'am, we are a group of software engineers (programmers) with 10+ years of experience. Expert in Java, C, C++, C#, Python, Android, iOS, MATLAB, Ionic. Done 40+ projects here on FREELANCER.COM. ...
Hi there, I am a Python developer with the following skills: engineering professional with 10 years of experience in software development. Mastering/leading the development of applications/tools using Python for 6 ...