Web scrap project

In progress Posted 5 years ago Paid on delivery

script language: PHP

front end: html/javascript

database: mysql

NO FRAMEWORKS

Table Structure:

ALL TABLES:

id (auto increment)

createddate (auto insert)

modifieddate (auto update)

Table A (entities)

name varchar

state varchar (2 letter US state code)

type varchar (city, county, school district, university, college)

url varchar

Table B (url)

datasource (url or query where the data came from)

url varchar

googleposition

maxlayers (defaults to 2)

statusname (values would include "found match", "no match")

Table C (curl data)

url varchar

retrievedhtml longtext

match varchar

statusname (values would include "undefined" (default), "good", "bad", "review")
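A rough sketch of how the three tables above might be declared, assuming MySQL 5.6.5 or later (needed for two auto-maintained TIMESTAMP columns). The table names `entities`, `url` and `curl_data`, the column sizes, and the helper name `schemaStatements` are assumptions and not part of the posting. The "largetext" column is assumed to map to MySQL's LONGTEXT, and `match` is backtick-quoted because MATCH is a reserved word in MySQL.

```php
<?php
// Sketch only: table names and column sizes are assumptions.
function schemaStatements(): array
{
    // Columns every table shares, per the "ALL TABLES" list.
    $common = "id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    createddate TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    modifieddate TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP";

    return [
        // Table A (entities)
        'entities' => "CREATE TABLE IF NOT EXISTS entities (
    {$common},
    name VARCHAR(255) NOT NULL,
    state CHAR(2) NOT NULL DEFAULT '',
    type VARCHAR(32) NOT NULL,
    url VARCHAR(2048) NULL
)",
        // Table B (url)
        'url' => "CREATE TABLE IF NOT EXISTS url (
    {$common},
    datasource VARCHAR(2048) NOT NULL,
    url VARCHAR(2048) NOT NULL,
    googleposition TINYINT UNSIGNED NULL,
    maxlayers TINYINT UNSIGNED NOT NULL DEFAULT 2,
    statusname VARCHAR(32) NULL
)",
        // Table C (curl data)
        'curl_data' => "CREATE TABLE IF NOT EXISTS curl_data (
    {$common},
    url VARCHAR(2048) NOT NULL,
    retrievedhtml LONGTEXT NOT NULL,
    `match` VARCHAR(64) NOT NULL,
    statusname VARCHAR(32) NOT NULL DEFAULT 'undefined'
)",
    ];
}
```

Each statement could be passed to PDO's `exec()` once at install time. Unique indexes for the "skip duplicate" rules are left out here because the right key length depends on the character set in use.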

The 1st script will:

1 parse the LEA_NAME column for unique values for "school district" names from here - [login to view URL], get the state, school district name, & url. 25,000 results

2 parse the "county names" from here - [login to view URL], get state name, convert it to 2 letter code, 3,098 records

3 parse the "city names" from here - [login to view URL] grab city name, usps (state). 29,000 records.

4 parse the "US college/university" from here - [login to view URL] - & grab college/university name, & url. 2,073 records.

5 populate table A with the name, type, state code (2 letter) while skipping duplicates. convert the state name to the 2 letter code.
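The state conversion and duplicate skipping in step 5 could look roughly like this. The function names `stateToCode` and `dedupeEntities` are hypothetical, and treating name + state + type as the duplicate key is an assumption, since the posting does not say what counts as a duplicate.

```php
<?php
// Full state names (lower-cased) mapped to 2 letter USPS codes, plus DC.
const STATE_CODES = [
    'alabama' => 'AL', 'alaska' => 'AK', 'arizona' => 'AZ', 'arkansas' => 'AR',
    'california' => 'CA', 'colorado' => 'CO', 'connecticut' => 'CT', 'delaware' => 'DE',
    'district of columbia' => 'DC', 'florida' => 'FL', 'georgia' => 'GA', 'hawaii' => 'HI',
    'idaho' => 'ID', 'illinois' => 'IL', 'indiana' => 'IN', 'iowa' => 'IA',
    'kansas' => 'KS', 'kentucky' => 'KY', 'louisiana' => 'LA', 'maine' => 'ME',
    'maryland' => 'MD', 'massachusetts' => 'MA', 'michigan' => 'MI', 'minnesota' => 'MN',
    'mississippi' => 'MS', 'missouri' => 'MO', 'montana' => 'MT', 'nebraska' => 'NE',
    'nevada' => 'NV', 'new hampshire' => 'NH', 'new jersey' => 'NJ', 'new mexico' => 'NM',
    'new york' => 'NY', 'north carolina' => 'NC', 'north dakota' => 'ND', 'ohio' => 'OH',
    'oklahoma' => 'OK', 'oregon' => 'OR', 'pennsylvania' => 'PA', 'rhode island' => 'RI',
    'south carolina' => 'SC', 'south dakota' => 'SD', 'tennessee' => 'TN', 'texas' => 'TX',
    'utah' => 'UT', 'vermont' => 'VT', 'virginia' => 'VA', 'washington' => 'WA',
    'west virginia' => 'WV', 'wisconsin' => 'WI', 'wyoming' => 'WY',
];

// Returns the 2 letter code for a state name or an existing code, or null if unknown.
function stateToCode(string $state): ?string
{
    $s = trim($state);
    if (strlen($s) === 2 && in_array(strtoupper($s), STATE_CODES, true)) {
        return strtoupper($s); // already a valid code, e.g. the "usps" column
    }
    return STATE_CODES[strtolower($s)] ?? null;
}

// Keeps the first row for each name/state/type combination (assumed duplicate key).
function dedupeEntities(array $rows): array
{
    $seen = [];
    $unique = [];
    foreach ($rows as $row) {
        $key = strtolower(trim($row['name'])) . '|' . $row['state'] . '|' . $row['type'];
        if (isset($seen[$key])) {
            continue;
        }
        $seen[$key] = true;
        $unique[] = $row;
    }
    return $unique;
}
```

Deduplicating in PHP before inserting keeps the one time script simple; a unique index on table A would be an alternative if the script might be rerun.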

The 1st script will be a one time script, run from linux cli.

The 2nd script will:

1 Loop through table A, & attempt to find the url that matches with a google search, if one was not present from the datasource. The logic must skip certain false positives such as a domain with the word "weather" or "census" or "zillow" or "google" in it or url with ".jpg" or ".asp"

2 populate the record in table B, with :

datasource = the url of the data source above

url = url (skip duplicate)

statusname = null

googleposition = 1-20 (first page of google results only)
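A minimal sketch of the false positive rule from step 1 and the row shape from step 2, assuming the search results arrive as an ordered list of urls. How the results are fetched from Google is not covered. The names `isFalsePositive` and `filterResults` are hypothetical, and skipping urls that cannot be parsed is an assumption.

```php
<?php
// Words that disqualify a domain, and fragments that disqualify a url (from step 1).
const BLOCKED_DOMAIN_WORDS = ['weather', 'census', 'zillow', 'google'];
const BLOCKED_URL_FRAGMENTS = ['.jpg', '.asp'];

// True if the url should be skipped as a false positive.
function isFalsePositive(string $url): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return true; // assumption: urls without a parseable host are skipped
    }
    $host = strtolower($host);
    foreach (BLOCKED_DOMAIN_WORDS as $word) {
        if (strpos($host, $word) !== false) {
            return true;
        }
    }
    $lowerUrl = strtolower($url);
    foreach (BLOCKED_URL_FRAGMENTS as $fragment) {
        // Substring check, so ".asp" also rejects ".aspx".
        if (strpos($lowerUrl, $fragment) !== false) {
            return true;
        }
    }
    return false;
}

// Turns an ordered result list into table B rows, keeping the original position.
function filterResults(array $urls): array
{
    $rows = [];
    $seen = [];
    $position = 0;
    foreach ($urls as $url) {
        $position++;
        if ($position > 20) {
            break; // only positions 1-20 are recorded
        }
        if (isFalsePositive($url) || isset($seen[$url])) {
            continue;
        }
        $seen[$url] = true;
        $rows[] = ['url' => $url, 'googleposition' => $position, 'statusname' => null];
    }
    return $rows;
}
```

Keeping the blocked words in constants means a new exemption is a one line change before the script is rerun.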

The 2nd script will return 35,000 - 200,000 results.

The 2nd script will run periodically from linux cli, on a crontab, & will be rerun, in the future, when additional exemptions are added.

The 2nd script should be multi-threaded, & should be able to max out a 100mb/second connection

The 3rd php script will:

1 Loop through table B, use curl to retrieve the web page

2 Loop through each of the child pages, for the value in the maxlayers column

3 Look for a particular pattern of text, including a case insensitive search for "bids" "request for proposal" "rfp" "rfq" "request for bids" "proposals"

4 Compare the curl returned html against the keywords

If there is a match - insert a record into table C with the url (skip non-unique urls), the retrieved html, the keyword that caused the match, & update table B statusname to "found match"

If there is no match - increase the table B maxlayers count by 1, & update the statusname to "no match"
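The keyword comparison in steps 3 and 4 might be sketched as below. This covers only the matching decision, not the curl retrieval or the database writes. The names `findKeyword` and `classifyPage` are hypothetical, and checking longer phrases first (so the most specific keyword is recorded) is an assumption.

```php
<?php
// Keywords from step 3, longest phrases first so the most specific one is reported.
const BID_KEYWORDS = [
    'request for proposal',
    'request for bids',
    'proposals',
    'bids',
    'rfp',
    'rfq',
];

// Returns the first keyword found in the html (case insensitive), or null.
function findKeyword(string $html): ?string
{
    foreach (BID_KEYWORDS as $keyword) {
        if (stripos($html, $keyword) !== false) {
            return $keyword;
        }
    }
    return null;
}

// Decides what the 3rd script would write for one retrieved page.
function classifyPage(string $url, string $html): array
{
    $keyword = findKeyword($html);
    if ($keyword === null) {
        return ['statusname' => 'no match', 'row' => null];
    }
    return [
        'statusname' => 'found match',
        // Values for the new table C record.
        'row' => ['url' => $url, 'retrievedhtml' => $html, 'match' => $keyword],
    ];
}
```

A plain substring search follows the posting literally, so "bids" would also match inside a word like "forbids"; a word boundary regex could be swapped in if that produces too many records for review.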

Each record from table B may have multiple records in table C

The 3rd script will be run periodically from the linux cli

The 3rd script should be multi-threaded, & should be able to max out a 100mb/second connection

The 4th script will be ANDROID PHONE FRIENDLY:

1 Define an sql query which should return the top 10 rows from table C, sorted by modifieddate ASC, WHERE statusname != "bad" & statusname != "good"

2 Provide a simple html table view front end to review each url, which should have columns for all values from table B.

3 An additional column will show an update status button which, when pressed, shows the statusname values from table C as buttons; pressing one of those updates the record in table C
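One way the review query and the status buttons could be put together, as a sketch only. It assumes the filter in step 1 refers to table C's statusname column (table C has no type column), that table C is named `curl_data`, and that the helper names `reviewQueueSql`, `isValidStatus` and `statusButtons` are free to choose; none of these come from the posting.

```php
<?php
// Status values listed for table C.
const REVIEW_STATUSES = ['undefined', 'good', 'bad', 'review'];

// Builds the "top 10, oldest first, not yet good or bad" query from step 1.
function reviewQueueSql(int $limit = 10): string
{
    $limit = max(1, $limit); // integer only, so safe to place in the SQL text
    return "SELECT id, url, `match`, statusname, modifieddate
            FROM curl_data
            WHERE statusname NOT IN ('bad', 'good')
            ORDER BY modifieddate ASC
            LIMIT {$limit}";
}

// Guards the update handler against values that are not in the list above.
function isValidStatus(string $status): bool
{
    return in_array($status, REVIEW_STATUSES, true);
}

// Renders one button per status for the record with the given id (step 3).
function statusButtons(int $id): string
{
    $html = '';
    foreach (REVIEW_STATUSES as $status) {
        $label = htmlspecialchars($status);
        $html .= sprintf(
            '<button type="submit" name="status" value="%s" data-id="%d">%s</button>',
            $label,
            $id,
            $label
        );
    }
    return $html;
}
```

Plain buttons in a plain table need no framework and stay usable on a phone screen; the update itself would go through a prepared statement after `isValidStatus` passes.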

The 4th script view is intended for a quality check employee to review all results, & log whether each url matches our ultimate criteria or not.

MySQL PHP Web Scraping

Project ID: #18227988

About the project

15 bids Remote project Active 5 years ago

15 freelancers have bid an average of $550 on this job

mingxiao2008

Hello, Dear How are you? I have check your project description and am ready for discussing with you about project for now. I have experienced in PHP and web scraping, MySQL. I will work very hard and best for y ...

$500 USD in 10 days
(81 reviews)
8.1
schoudhary1553

Hello Sir, I am the expert freelancer here. I am on the 6th position through out the world to deliver the quality job. I have deliver here more than 400 + projects with 100% client satisfaction. I have more than 5 ...

$600 USD in 10 days
(102 reviews)
7.1
bestit4u

Hi. I am very interested in your project, because I have much experience in such projects. I have good skills with the program language including C/C++, C#, java, php, asp.net, python, VB.NET. So I have expert and s ...

$555 USD in 10 days
(114 reviews)
7.1
Mickelson

Hi Nice to meet you. I'm scraping expert. My past works: Youtube comment scraping, Real estate property list to csv, Job-site content to csv, And scrape posts from facebook, twitter, instagram using scrapy. In add ...

$500 USD in 10 days
(119 reviews)
6.9
lightingdavid

Hey? How are you? I have reviewed "Web scrap project". I have good skills for these (MySQL, PHP, Web Scraping). I have been working for 7 yrs in this scope. While we contract and work in our jobs, I will get paid o ...

$500 USD in 10 days
(145 reviews)
6.3
saad2038

Hi, I can help you to write scripts that parse the pages and save the data in database based on conditions and rules that you describe in the project description. I've read the description carefully that we need to wr ...

$1000 USD in 10 days
(55 reviews)
6.3
naishodayo

How are you today? I am a super expert in this area. If you contact me, I can show you my past work too. Please contact me. Thank you.

$555 USD in 10 days
(4 reviews)
4.8
extravagantweb

I can scrape any data you require. Please contact me and we can discuss getting started, I'm eager to begin working for you.

$333 USD in 10 days
(6 reviews)
4.3
reosoftwares6

Hi, Hope you are doing well! Thanks for sharing your project requirement with us. It will be our great pleasure to work on your project. I have checked your requirement, yes we can do it, because we already work on si ...

$616 USD in 7 days
(0 reviews)
0.0