PawnScraper - SyS - 2019-04-14
PawnScraper
Installing
Thanks to Southclaws, plugin installation is now much easier with sampctl:
PHP Code: sampctl p install Sreyas-Sreelal/pawn-scraper
OR
- Download suitable binary files from releases for your operating system
- Add it to your plugins folder
- Add PawnScraper (Windows) or PawnScraper.so (Linux) to the plugins line in server.cfg
- Add pawnscraper.inc to your includes folder
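As a rough sketch of the server.cfg step, the plugin name goes on the plugins line. The line below assumes a Windows server that loads no other plugins. On Linux the entry would be PawnScraper.so, and any plugins you already load stay on the same line, separated by spaces.

```
plugins PawnScraper
```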
Building
- Clone the repo
PHP Code: git clone https://github.com/Sreyas-Sreelal/pawn-scraper.git
- Compile the plugin using the nightly Rust toolchain
- Windows
PHP Code: cargo +nightly-i686-pc-windows-msvc build --release
- Linux
PHP Code: cargo +nightly-i686-unknown-linux-gnu build --release
API
- ParseHtmlDocument(document[])
- Params
- document[] - string of html document
- Returns
- Html document instance id
- INVALID_HTML_DOC if the document could not be parsed
- Example Usage
PHP Code:
new Html:doc = ParseHtmlDocument("\
    <!DOCTYPE html>\
    <meta charset=\"utf-8\">\
    <title>Hello, world!</title>\
    <h1 class=\"foo\">Hello, <i>world!</i></h1>\
");
ASSERT(doc != INVALID_HTML_DOC);
DeleteHtml(doc);
- ResponseParseHtml(Response:id)
- Params
- id - Http response id returned from HttpGet
- Returns
- Html document instance id
- INVALID_HTML_DOC if the document could not be parsed
- Example Usage
PHP Code:
new Response:response = HttpGet("https://www.sa-mp.com");
new Html:doc = ResponseParseHtml(response);
ASSERT(doc != INVALID_HTML_DOC);
DeleteHtml(doc);
- HttpGet(url[],Header:headerid=INVALID_HEADER)
- Params
- url[] - Url of a website
- headerid - id of a header object created using CreateHeader (optional)
- Returns
- Response id if successful
- INVALID_HTTP_RESPONSE if the request failed
- Example Usage
PHP Code:
new Response:response = HttpGet("https://www.sa-mp.com");
ASSERT(response != INVALID_HTTP_RESPONSE);
DeleteResponse(response);
- HttpGetThreaded(playerid,callback[],url[],Header:headerid=INVALID_HEADER)
- Params
- playerid - id of the player
- callback[] - name of the callback function to handle the response.
- url[] - Url of a website
- headerid - id of a header object created using CreateHeader (optional)
- Example Usage
PHP Code:
HttpGetThreaded(0,"MyHandler","https://sa-mp.com");
//********
forward MyHandler(playerid,Response:responseid);
public MyHandler(playerid,Response:responseid){
    ASSERT(responseid != INVALID_HTTP_RESPONSE);
    DeleteResponse(responseid);
}
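To illustrate what the playerid parameter can be used for, here is a minimal sketch that fetches a page when a player connects and sends them its title. It assumes the playerid given to HttpGetThreaded is handed to the callback unchanged, as the callback signature above suggests. The callback name OnTitleFetched and the buffer size are made up for this example.

```pawn
#include <a_samp>
#include <pawnscraper>

// Hypothetical callback name, chosen for this example only.
forward OnTitleFetched(playerid, Response:responseid);

public OnPlayerConnect(playerid)
{
    // Start the request without blocking the server thread.
    HttpGetThreaded(playerid, "OnTitleFetched", "https://sa-mp.com");
    return 1;
}

public OnTitleFetched(playerid, Response:responseid)
{
    if (responseid == INVALID_HTTP_RESPONSE)
        return 0;

    new Html:html = ResponseParseHtml(responseid);
    if (html == INVALID_HTML_DOC)
    {
        DeleteResponse(responseid);
        return 0;
    }

    new Selector:selector = ParseSelector("title");
    if (selector != INVALID_SELECTOR)
    {
        new title[128];
        // The player may have left while the request was running.
        if (GetNthElementText(html, selector, 0, title) && IsPlayerConnected(playerid))
            SendClientMessage(playerid, 0xFFFFFFFF, title);
        DeleteSelector(selector);
    }

    // Release the objects in the same order as the other examples.
    DeleteHtml(html);
    DeleteResponse(responseid);
    return 1;
}
```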
- ParseSelector(string[])
- Params
- string[] - CSS selector string
- Returns
- Selector instance id if successful
- INVALID_SELECTOR if the selector could not be parsed
- Example Usage
PHP Code:
new Selector:selector = ParseSelector("h1 .foo");
ASSERT(selector != INVALID_SELECTOR);
DeleteSelector(selector);
- CreateHeader(…)
- Params
- key, value pairs of string type
- Returns
- Header instance id if successful
- INVALID_HEADER if the header could not be created
- Example Usage
PHP Code:
new Header:header = CreateHeader(
    "User-Agent","Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
);
ASSERT(header != INVALID_HEADER);
new Response:response = HttpGet("https://sa-mp.com/",header);
ASSERT(response != INVALID_HTTP_RESPONSE);
ASSERT(DeleteHeader(header) == 1);
- GetNthElementName(Html:docid,Selector:selectorid,idx,string[],size = sizeof(string))
- Params
- docid - Html instance id
- selectorid - CSS selector instance id
- idx - the n’th occurrence of the element in the document (starts from 0)
- string[] - buffer in which the element name is stored
- size - size of the string buffer
- Returns
- 1 if successful
- 0 if failed
- Example Usage
PHP Code:
new Html:doc = ParseHtmlDocument("\
    <!DOCTYPE html>\
    <meta charset=\"utf-8\">\
    <title>Hello, world!</title>\
    <h1 class=\"foo\">Hello, <i>world!</i></h1>\
");
ASSERT(doc != INVALID_HTML_DOC);

new Selector:selector = ParseSelector("i");
ASSERT(selector != INVALID_SELECTOR);

new i = -1,element_name[10];
while(GetNthElementName(doc,selector,++i,element_name) != 0){
    ASSERT(strcmp(element_name,"i") == 0);
}

DeleteSelector(selector);
DeleteHtml(doc);
- GetNthElementText(Html:docid,Selector:selectorid,idx,string[],size = sizeof(string))
- Params
- docid - Html instance id
- selectorid - CSS selector instance id
- idx - the n’th occurrence of the element in the document (starts from 0)
- string[] - buffer in which the element text is stored
- size - size of the string buffer
- Returns
- 1 if successful
- 0 if failed
- Example Usage
PHP Code:
new Html:doc = ParseHtmlDocument("\
    <!DOCTYPE html>\
    <meta charset=\"utf-8\">\
    <title>Hello, world!</title>\
    <h1 class=\"foo\">Hello, <i>world!</i></h1>\
");
ASSERT(doc != INVALID_HTML_DOC);

new Selector:selector = ParseSelector("h1.foo");
ASSERT(selector != INVALID_SELECTOR);

new element_text[20];
ASSERT(GetNthElementText(doc,selector,0,element_text) == 1);

new check = strcmp(element_text,"Hello, world!");
ASSERT(check == 0);

DeleteSelector(selector);
DeleteHtml(doc);
- GetNthElementAttrVal(Html:docid,Selector:selectorid,idx,attribute[],string[],size = sizeof(string))
- Params
- docid - Html instance id
- selectorid - CSS selector instance id
- idx - the n’th occurrence of the element in the document (starts from 0)
- attribute[] - name of the attribute to read from the element
- string[] - buffer in which the attribute value is stored
- size - size of the string buffer
- Returns
- 1 if successful
- 0 if failed
- Example Usage
PHP Code:
new Html:doc = ParseHtmlDocument("\
    <!DOCTYPE html>\
    <meta charset=\"utf-8\">\
    <title>Hello, world!</title>\
    <h1 class=\"foo\">Hello, <i>world!</i></h1>\
");
ASSERT(doc != INVALID_HTML_DOC);

new Selector:selector = ParseSelector("h1");
ASSERT(selector != INVALID_SELECTOR);

new element_attribute[20];
ASSERT(GetNthElementAttrVal(doc,selector,0,"class",element_attribute) == 1);

new check = strcmp(element_attribute,"foo");
ASSERT(check == 0);

DeleteSelector(selector);
DeleteHtml(doc);
- DeleteHtml(Html:id)
- Params
- id - html instance to be deleted
- Returns
- 1 if successful
- 0 if failed
- DeleteSelector(Selector:id)
- Params
- id - selector instance to be deleted
- Returns
- 1 if successful
- 0 if failed
- DeleteResponse(Response:id)
- Params
- id - response instance to be deleted
- Returns
- 1 if successful
- 0 if failed
- DeleteHeader(Header:id)
- Params
- id - header instance to be deleted
- Returns
- 1 if successful
- 0 if failed
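The four Delete functions above have no example of their own, so here is a minimal sketch of releasing each kind of object and checking the documented return value. The User-Agent string and the messages are made up for illustration. The order of deletion follows the other examples in this post, and whether a different order is also safe is not something this sketch claims.

```pawn
#include <a_samp>
#include <pawnscraper>

main()
{
    // "pawn-scraper-example" is an arbitrary value chosen for this sketch.
    new Header:header = CreateHeader("User-Agent", "pawn-scraper-example");
    if (header == INVALID_HEADER)
    {
        print("could not create header");
        return;
    }

    new Response:response = HttpGet("https://sa-mp.com", header);

    // The header is released right after the request, as in the CreateHeader example.
    if (!DeleteHeader(header))
        print("header was not deleted");

    if (response == INVALID_HTTP_RESPONSE)
    {
        print("request failed");
        return;
    }

    new Html:html = ResponseParseHtml(response);
    if (html != INVALID_HTML_DOC)
    {
        new Selector:selector = ParseSelector("title");
        if (selector != INVALID_SELECTOR)
        {
            // ... query the document here ...
            if (!DeleteSelector(selector))
                print("selector was not deleted");
        }

        if (!DeleteHtml(html))
            print("document was not deleted");
    }

    if (!DeleteResponse(response))
        print("response was not deleted");
}
```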
Example Usage
A small example that fetches all links from wiki.sa-mp.com:
PHP Code:
new Response:response = HttpGet("https://wiki.sa-mp.com");
if(response == INVALID_HTTP_RESPONSE){
    printf("HTTP ERROR");
    return;
}

new Html:html = ResponseParseHtml(response);
if(html == INVALID_HTML_DOC){
    DeleteResponse(response);
    return;
}

new Selector:selector = ParseSelector("a");
if(selector == INVALID_SELECTOR){
    DeleteResponse(response);
    DeleteHtml(html);
    return;
}

new str[500],i;
while(GetNthElementAttrVal(html,selector,i,"href",str)){
    printf("%s",str);
    ++i;
}
//delete created objects after the usage..
DeleteHtml(html);
DeleteResponse(response);
DeleteSelector(selector);
The same example using a threaded HTTP call:
PHP Code:
HttpGetThreaded(0,"MyHandler","https://wiki.sa-mp.com");
//...
forward MyHandler(playerid,Response:responseid);
public MyHandler(playerid,Response:responseid){
    if(responseid == INVALID_HTTP_RESPONSE){
        printf("HTTP ERROR");
        return 0;
    }

    new Html:html = ResponseParseHtml(responseid);
    if(html == INVALID_HTML_DOC){
        DeleteResponse(responseid);
        return 0;
    }

    new Selector:selector = ParseSelector("a");
    if(selector == INVALID_SELECTOR){
        DeleteResponse(responseid);
        DeleteHtml(html);
        return 0;
    }

    new str[500],i;
    while(GetNthElementAttrVal(html,selector,i,"href",str)){
        printf("%s",str);
        ++i;
    }

    //delete created objects after the usage
    DeleteHtml(html);
    DeleteResponse(responseid);
    DeleteSelector(selector);
    return 1;
}
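Both loops above stop at the first element for which GetNthElementAttrVal returns 0. If that function also returns 0 for an element that simply has no href (an assumption, the post does not say), later links would be skipped. A hedged sketch of one way around this is to walk the elements with GetNthElementName and treat a missing attribute as "skip this one". PrintLinks is a hypothetical helper, not part of the plugin, and the buffer sizes are arbitrary.

```pawn
#include <a_samp>
#include <pawnscraper>

// Hypothetical helper: prints the text and href of every matched element
// that has an href, and returns how many were printed.
stock PrintLinks(Html:html, Selector:selector)
{
    new name[16], href[500], text[128], count;

    // GetNthElementName is used only to detect the end of the matches.
    for (new i = 0; GetNthElementName(html, selector, i, name); i++)
    {
        // Assumed: a return value of 0 here means this element has no href.
        if (!GetNthElementAttrVal(html, selector, i, "href", href))
            continue;

        // Fall back to an empty string if the text cannot be read.
        if (!GetNthElementText(html, selector, i, text))
            text[0] = '\0';

        printf("%s -> %s", text, href);
        count++;
    }
    return count;
}
```

It would be called as PrintLinks(html, selector) in place of the while loop, with the same cleanup afterwards.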
More examples can be found in the examples folder of the repository.
Repository
https://github.com/Sreyas-Sreelal/pawn-scraper
Note
The plugin is at an early stage, and more tests and features still need to be added. I’m open to any kind of contribution: just open a pull request if you have improvements or new features to add.
Special thanks