Project for Dmitry A. -- phase 4 - (150 EUR)

€30-250 EUR

Completed
Posted about six years ago

Paid on delivery
a) Download PDF files from [login to view URL] and convert them to TXT.
b) The program should iterate through a list of ID numbers and download, convert and store each corresponding file as a separate file named OS+"ID number".
c) The files to be converted are "official statements" from municipal bond issues, accessible at [login to view URL]
d) Follow these steps to download the statements:
   Step 1: Append the ID number to the end of the following link: [login to view URL] Example: [login to view URL]
   Step 2: Go to the second tab, "official statement", located on the grey bar below the "Issue details" section.
e) Program requirements:
   i) The program needs to be fast. I need to convert hundreds of thousands of documents.
   ii) Storage space is very important. File sizes should be as small as possible. The files contain only American English characters. Images and maps are not important.
   iii) The program should have an option to set the maximum number of pages to be stored. The default should be to store every page.
   iv) The program should handle the following cases well:
      CASE 1: There is no PDF to download. [login to view URL]
      CASE 2: The whole PDF is an image. This is the typical case for "old" issues. Do whatever you can here, but don't waste your time if it is not possible to store the text. [login to view URL]
      CASE 3: The PDF is not an image, and you can copy and paste its text (Ctrl+C/Ctrl+V) directly into any text program. [login to view URL]
      CASE 4: The PDF is not an image, but you CANNOT copy and paste its text directly into a text program. [login to view URL]
      CASE 5: There are TWO PDF files.
         Sub-case 5.1: One is the official statement and the other is the preliminary official statement. Convert and store ONLY the official statement and NOT the "preliminary" one. [login to view URL]
         Sub-case 5.2: There are two "official statement posted..." files, i.e. the first three words of the file names are the same. THIS is what you should do:
            1. Append the texts of the official statements if the size difference between the files is more than 10% with respect to the larger one, AND the posted dates are not more than 1 year apart. Examples: [login to view URL] [login to view URL] [login to view URL] [login to view URL]
            2. OTHERWISE, keep only the most recently posted file. Examples: [login to view URL] [login to view URL]
         Sub-case 5.3: The second file is neither another "official statement posted..." nor a preliminary one. Always ignore files whose first three words are not "official statement posted...", unless the name contains any synonym of "amendment" or "supplement" (I didn't find an example), in which case you should proceed as in sub-case 5.2. Otherwise, always disregard them. [login to view URL]
      CASE 6: There are more than 2 files. In this case, proceed by combining sub-cases 5.1 to 5.3. I have provided some examples with 3 files in the sub-case 5.2 section.
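The file-selection rules in CASE 5 and CASE 6 could be sketched roughly as follows. This is only an illustration under stated assumptions, not the delivered program: the PdfEntry structure, the function names, and the extra synonym "addendum" are hypothetical; "1 year" is taken as 365 days; and for more than two files, each older file is compared against the newest one, which is one possible reading of "combining the sub-cases". Downloading and PDF-to-text conversion are left out.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Assumed synonym list; the posting names only "amendment" and "supplement".
AMENDMENT_WORDS = ("amendment", "supplement", "addendum")


@dataclass
class PdfEntry:
    """One PDF link on the "official statement" tab (hypothetical structure)."""
    title: str    # link text, e.g. "Official Statement posted ..."
    posted: date  # posting date shown next to the link
    size: int     # file size in bytes


def is_candidate(entry: PdfEntry) -> bool:
    """Sub-cases 5.1 and 5.3: drop preliminary and unrelated files."""
    title = entry.title.lower()
    if "preliminary" in title:
        return False
    if title.startswith("official statement posted"):
        return True
    return any(word in title for word in AMENDMENT_WORDS)


def should_append(a: PdfEntry, b: PdfEntry) -> bool:
    """Sub-case 5.2: sizes differ by more than 10% of the larger file
    and the posting dates are at most one year (taken as 365 days) apart."""
    larger = max(a.size, b.size)
    if larger == 0:
        return False
    size_gap = abs(a.size - b.size) / larger
    days_apart = abs((a.posted - b.posted).days)
    return size_gap > 0.10 and days_apart <= 365


def select_files(entries: List[PdfEntry]) -> List[PdfEntry]:
    """Return the files whose text should be stored, oldest first.

    The newest candidate is always kept; an older one is kept only if it
    passes should_append() against the newest (an assumed reading of CASE 6).
    """
    candidates = sorted(filter(is_candidate, entries), key=lambda e: e.posted)
    if not candidates:
        return []  # CASE 1, or nothing usable on the page
    newest = candidates[-1]
    kept = [e for e in candidates[:-1] if should_append(e, newest)]
    return kept + [newest]


def output_name(issue_id: str) -> str:
    """Requirement b): files are named OS + ID number."""
    return f"OS{issue_id}.txt"


def limit_pages(pages: List[str], max_pages: Optional[int] = None) -> List[str]:
    """Requirement e) iii): keep every page unless a maximum is given."""
    return pages if max_pages is None else pages[:max_pages]
```

With exactly two candidate files, select_files() reduces to the rule in sub-case 5.2: both are returned when the append condition holds, otherwise only the most recently posted one.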
Project ID: 16161560

About the project

4 bids
Remote project
Last active six years ago

Awarded to:
A proposal has not yet been provided
€150 EUR in 4 days
5.0 (23 reviews)
5.3

About the client

Segovia, Spain
5.0
2
Payment method verified
Member since May 4, 2017

Client verification

Other jobs from this client

Web Scraping
€30-250 EUR
Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)