Need PDFBox expert to help extract text from PDFs with coordinates and a flag for which parts of the text are visible

$250-750 USD

Closed
Posted over six years ago

Paid on delivery
I am looking for help understanding the PDFBox library. Please apply only if you have already worked with PDFBox, iText, or other PDF software.

What we need: a utility, JAR, or class that we can call from our Java web app, which runs under Tomcat with Java 8 on a Linux server (this may affect non-Java solutions).

Problem: we need to extract text from searchable (not scanned) PDFs and preserve text positions. Ideally the library should return words/tokens with their x/y start and end positions, as well as the start and end coordinates of vertical and horizontal line separators. We need only the text a user can see; or, if we get the full text, a clear indication of which parts are visible to the end user and which are not. Attached is an example of a PDF file that has hidden text.

We tried Apache PDFBox, but the default PDFTextStripper handles only simple cases, where all extracted text is visible on screen. In the attached files, some text is partially invisible because of PDF clipping or filling paths. To track this, you have to process the PDF instructions manually and work out whether each character is covered or overlapped by another element, such as an image or another filled field.

Other tools could be used, such as iText or Tika, but they look like they are built on top of PDFBox. We also considered the Acrobat SDK, but we are not familiar with it.
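To make the visibility requirement concrete, here is a minimal sketch of the geometric check only, not a PDFBox integration. It assumes the caller has already collected, in page coordinates, a bounding box per word, the clip rectangle active when the word was drawn, and the rectangles of anything painted after the word (in PDFBox these would presumably come from `TextPosition` values and a custom `PDFGraphicsStreamEngine` subclass, but that wiring is not shown). The names `VisibilitySketch`, `Visibility`, and `classify` are hypothetical, and real clip and fill paths need not be rectangles.

```java
import java.awt.geom.Rectangle2D;
import java.util.List;

/** Hypothetical helper: flags a word box as visible, partially visible, or hidden. */
final class VisibilitySketch {

    enum Visibility { VISIBLE, PARTIAL, HIDDEN }

    /**
     * @param box          bounding box of the word (assumed already in page coordinates)
     * @param clip         clip rectangle in effect when the word was drawn (assumed rectangular)
     * @param paintedLater rectangles filled or images drawn AFTER the word, so they can cover it
     */
    static Visibility classify(Rectangle2D box, Rectangle2D clip, List<Rectangle2D> paintedLater) {
        // Nothing of the word survives clipping.
        if (box.isEmpty() || !box.intersects(clip)) {
            return Visibility.HIDDEN;
        }
        // The part of the word that the clip lets through.
        Rectangle2D shown = box.createIntersection(clip);
        boolean partial = !clip.contains(box);

        for (Rectangle2D cover : paintedLater) {
            if (cover.contains(shown)) {
                return Visibility.HIDDEN;   // a single later element covers everything shown
            }
            if (cover.intersects(shown)) {
                partial = true;             // a later element overlaps part of the word
            }
        }
        return partial ? Visibility.PARTIAL : Visibility.VISIBLE;
    }
}
```

This check is deliberately conservative: several elements that together cover a word are reported as PARTIAL rather than HIDDEN, and paint order has to be tracked by the caller. Returning a three-way flag rather than dropping text matches the second option in the request, where the full text is kept and each token carries a visibility marker.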
Project ID: 15915061

About the project

6 bids
Remote project
Last activity six years ago

6 freelancers are bidding an average of $517 USD for this job
Greetings. I have understood your task, "Need PDFBox expert to help extract text from pdfs with coordinates and a flag what part of text is visible", and can complete it to your full satisfaction. Please ping me to discuss further. I have more than 5 years of experience in Java, PDF
$500 USD in 6 days
5.0 (21 reviews)
5.0
Hi, I have extensive experience with the PDFBox and iText PDF libraries. I reviewed your requirement for extracting text and its position from PDFs, and it looks feasible to me: since the PDFs are searchable, we can get the text easily, and for positions I can get the X and Y coordinates of the text on each page. I don't think you need the Adobe SDK; PDFBox or iText is enough for this task. If you want, I also know another library called tableu which can handle this. If you have time, can we connect on chat so I can ask a few questions to clarify my understanding and make sure we are both on the same page? Thanks,
$480 USD in 10 days
4.7 (14 reviews)
4.9
Hey man, I have worked with the PDFBox library. I have seen your document and I can try to do it. If interested, message me. Thanks
$690 USD in 10 days
5.0 (10 reviews)
4.2
I am an IITK graduate with 11 years of experience in software development. I have a 100% completion rate and have finished projects with the highest level of customer satisfaction. I have a team of rock-star developers who work at top product companies and contribute to these projects as a part-time gig.
$555 USD in 10 days
3.8 (20 reviews)
5.4
Hello Sir/Madam. Relevant Skills and Experience: Please send us all the details and we will start on the job right away if possible. We are always ready to take on any challenge, and we have an Adobe lab too. Proposed Milestones: 475 - (ProjectTitile). For any queries, please consult our profile at https://www.freelancer.com/u/benni25.html
$475 USD in 1 day
4.9 (5 reviews)
3.1

About the client

United States
0.0
0
Member since Dec 20, 2017

Client verification

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)