Plugin PawnScraper
SyS
2019-04-14, 06:46 AM (This post was last modified: 2019-04-15, 03:47 AM by Kar.)
PawnScraper



A powerful scraper plugin that provides an interface for utilising HTML parsers and CSS selectors in Pawn.


Installing

Thanks to Southclaws, plugin installation is now much easier with sampctl:

PHP Code:
sampctl p install Sreyas-Sreelal/pawn-scraper 

OR
  • Download the binary for your operating system from the releases page
  • Add it to your plugins folder
  • Add PawnScraper (Windows) or PawnScraper.so (Linux) to the plugins line in server.cfg
  • Add pawnscraper.inc to your includes folder
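
As a rough sketch of the server.cfg step above, the plugins line might look like the following. This assumes a standard SA-MP server layout; if you already load other plugins, append PawnScraper to the existing line instead of adding a second one.

```text
# Windows
plugins PawnScraper

# Linux
plugins PawnScraper.so
```

Use only the line that matches your platform. In your script, the include would then be pulled in with #include <pawnscraper>.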


Building
  • Clone the repo

    PHP Code:
    git clone https://github.com/Sreyas-Sreelal/pawn-scraper.git 

  • Compile the plugin using the nightly Rust toolchain
    • Windows
      PHP Code:
      cargo +nightly-i686-pc-windows-msvc build --release 
    • Linux
      PHP Code:
      cargo +nightly-i686-unknown-linux-gnu build --release 


API
  • ParseHtmlDocument(document[])
    • Params
      • document[] - string containing the HTML document
    • Returns
      • HTML document instance id
      • INVALID_HTML_DOC if the document could not be parsed
    • Example Usage

      PHP Code:
      new Html:doc = ParseHtmlDocument("\
       <!DOCTYPE html>\
       <meta charset=\"utf-8\">\
       <title>Hello, world!</title>\
       <h1 class=\"foo\">Hello, <i>world!</i></h1>\
       "
      );
      ASSERT(doc != INVALID_HTML_DOC);
      DeleteHtml(doc); 

  • ResponseParseHtml(Response:id)
    • Params
      • id - Http response id returned from HttpGet
    • Returns
      • Html document instance id
      • if failed to parse document INVALID_HTML_DOC is returned
    • Example Usage

      PHP Code:
      new Response:response = HttpGet("https://www.sa-mp.com");
      new Html:doc = ResponseParseHtml(response);
      ASSERT(doc != INVALID_HTML_DOC);
      DeleteHtml(doc);
      // release the response as well once it is no longer needed
      DeleteResponse(response);

  • HttpGet(url[],Header:headerid=INVALID_HEADER)
    • Params
      • url[] - URL of a website
      • headerid - id of a header object created using CreateHeader (optional)
    • Returns
      • Response id if successful
      • INVALID_HTTP_RESPONSE if the request failed
    • Example Usage

      PHP Code:
      new Response:response = HttpGet("https://www.sa-mp.com");
      ASSERT(response != INVALID_HTTP_RESPONSE);
      DeleteResponse(response); 

  • HttpGetThreaded(playerid,callback[],url[],Header:headerid=INVALID_HEADER)
    • Params
      • playerid - id of the player
      • callback[] - name of the callback function to handle the response.
      • url[] - Url of a website
      • headerid - id of a header object created using CreateHeader (optional)
    • Example Usage
      PHP Code:
      HttpGetThreaded(0,"MyHandler","https://sa-mp.com");
      //********
      forward MyHandler(playerid,Response:responseid);
      public MyHandler(playerid,Response:responseid){
          ASSERT(responseid != INVALID_HTTP_RESPONSE);
          DeleteResponse(responseid);
      }
  • ParseSelector(string[])
    • Params
      • string[] - CSS selector
    • Returns
      • Selector instance id if successful
      • INVALID_SELECTOR if the selector could not be parsed
    • Example Usage

      PHP Code:
      new Selector:selector = ParseSelector("h1 .foo");
      ASSERT(selector != INVALID_SELECTOR);
      DeleteSelector(selector); 

  • CreateHeader(…)
    • Params
      • key, value pairs, each passed as a string
    • Returns
      • Header instance id if successful
      • INVALID_HEADER if the header could not be created
    • Example Usage

      PHP Code:
      new Header:header = CreateHeader(
          "User-Agent","Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
      );
      ASSERT(header != INVALID_HEADER);
      new Response:response = HttpGet("https://sa-mp.com/",header);
      ASSERT(response != INVALID_HTTP_RESPONSE);
      ASSERT(DeleteHeader(header) == 1);
      // release the response as well once it is no longer needed
      DeleteResponse(response);

  • GetNthElementName(Html:docid,Selector:selectorid,idx,string[],size = sizeof(string))
    • Params
      • docid - Html instance id
      • selectorid - CSS selector instance id
      • idx - the n'th occurrence of the element in the document (starts from 0)
      • string[] - buffer in which the element name is stored
      • size - size of the buffer
    • Returns
      • 1 if successful
      • 0 if failed
    • Example Usage

      PHP Code:
      new Html:doc = ParseHtmlDocument("\
          <!DOCTYPE html>\
          <meta charset=\"utf-8\">\
          <title>Hello, world!</title>\
          <h1 class=\"foo\">Hello, <i>world!</i></h1>\
      "
      );
      ASSERT(doc != INVALID_HTML_DOC);

      new Selector:selector = ParseSelector("i");
      ASSERT(selector != INVALID_SELECTOR);

      new i = -1,element_name[10];
      while(GetNthElementName(doc,selector,++i,element_name) != 0){
          ASSERT(strcmp(element_name,"i") == 0);
      }

      DeleteSelector(selector);
      DeleteHtml(doc); 
  • GetNthElementText(Html:docid,Selector:selectorid,idx,string[],size = sizeof(string))
    • Params
      • docid - Html instance id
      • selectorid - CSS selector instance id
      • idx - the n'th occurrence of the element in the document (starts from 0)
      • string[] - buffer in which the element text is stored
      • size - size of the buffer
    • Returns
      • 1 if successful
      • 0 if failed
    • Example Usage

      PHP Code:
      new Html:doc = ParseHtmlDocument("\
          <!DOCTYPE html>\
          <meta charset=\"utf-8\">\
          <title>Hello, world!</title>\
          <h1 class=\"foo\">Hello, <i>world!</i></h1>\
      "
      );
      ASSERT(doc != INVALID_HTML_DOC);

      new Selector:selector = ParseSelector("h1.foo");
      ASSERT(selector != INVALID_SELECTOR);

      new element_text[20];
      ASSERT(GetNthElementText(doc,selector,0,element_text) == 1);

      new check = strcmp(element_text,"Hello, world!");
      ASSERT(check == 0);

      DeleteSelector(selector);
      DeleteHtml(doc); 
  • GetNthElementAttrVal(Html:docid,Selector:selectorid,idx,attribute[],string[],size = sizeof(string))
    • Params
      • docid - Html instance id
      • selectorid - CSS selector instance id
      • idx - the n'th occurrence of the element in the document (starts from 0)
      • attribute[] - name of the attribute to read from the element
      • string[] - buffer in which the attribute value is stored
      • size - size of the buffer
    • Returns
      • 1 if successful
      • 0 if failed
    • Example Usage

      PHP Code:
      new Html:doc = ParseHtmlDocument("\
       <!DOCTYPE html>\
       <meta charset=\"utf-8\">\
       <title>Hello, world!</title>\
       <h1 class=\"foo\">Hello, <i>world!</i></h1>\
      "
      );
      ASSERT(doc != INVALID_HTML_DOC);

      new Selector:selector = ParseSelector("h1");
      ASSERT(selector != INVALID_SELECTOR);

      new element_attribute[20];
      ASSERT(GetNthElementAttrVal(doc,selector,0,"class",element_attribute) == 1);

      new check = strcmp(element_attribute,"foo");
      ASSERT(check == 0);

      DeleteSelector(selector);
      DeleteHtml(doc); 
  • DeleteHtml(Html:id)
    • Params
      • id - html instance to be deleted
    • Returns
      • 1 if successful
      • 0 if failed
  • DeleteSelector(Selector:id)
    • Params
      • id - selector instance to be deleted
    • Returns
      • 1 if successful
      • 0 if failed
  • DeleteResponse(Response:id)
    • Params
      • id - response instance to be deleted
    • Returns
      • 1 if successful
      • 0 if failed
  • DeleteHeader(Header:id)
    • Params
      • id - header instance to be deleted
    • Returns
      • 1 if successful
      • 0 if failed
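
The Delete functions above have no example of their own, so here is a minimal sketch of how cleanup might be paired with each created object. It only uses the functions listed in this post; placing it in OnGameModeInit, the "title" selector and the 64-cell buffer are illustrative assumptions, and it also assumes the parsed document stays usable until DeleteHtml is called.

```pawn
#include <a_samp>
#include <pawnscraper>

public OnGameModeInit(){
    new Response:response = HttpGet("https://www.sa-mp.com");
    if(response == INVALID_HTTP_RESPONSE){
        return 1;
    }

    new Html:doc = ResponseParseHtml(response);
    if(doc != INVALID_HTML_DOC){
        // "title" is only an illustrative selector
        new Selector:selector = ParseSelector("title");
        if(selector != INVALID_SELECTOR){
            new title[64];
            if(GetNthElementText(doc,selector,0,title)){
                printf("%s",title);
            }
            // each Delete function returns 1 on success and 0 on failure
            if(!DeleteSelector(selector)){
                print("could not delete selector");
            }
        }
        DeleteHtml(doc);
    }

    // delete the response last, whether or not parsing succeeded
    DeleteResponse(response);
    return 1;
}
```

Nesting the checks this way means every object that was successfully created is deleted exactly once, even when a later step fails.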


Example Usage

A small example that fetches all links on wiki.sa-mp.com:

PHP Code:
new Response:response = HttpGet("https://wiki.sa-mp.com");
if(response == INVALID_HTTP_RESPONSE){
    printf("HTTP ERROR");
    return;
}

new Html:html = ResponseParseHtml(response);
if(html == INVALID_HTML_DOC){
    DeleteResponse(response);
    return;
}

new Selector:selector = ParseSelector("a");
if(selector == INVALID_SELECTOR){
    DeleteResponse(response);
    DeleteHtml(html);
    return;
}

new str[500],i;
while(GetNthElementAttrVal(html,selector,i,"href",str)){
    printf("%s",str);
    ++i;
}
//delete created objects after the usage..
DeleteHtml(html);
DeleteResponse(response);
DeleteSelector(selector);

The same example with a threaded HTTP call would be:

PHP Code:
HttpGetThreaded(0,"MyHandler","https://wiki.sa-mp.com");
//...
forward MyHandler(playerid,Response:responseid);
public MyHandler(playerid,Response:responseid){
    if(responseid == INVALID_HTTP_RESPONSE){
        printf("HTTP ERROR");
        return 0;
    }

    new Html:html = ResponseParseHtml(responseid);
    if(html == INVALID_HTML_DOC){
        DeleteResponse(responseid);
        return 0;
    }

    new Selector:selector = ParseSelector("a");
    if(selector == INVALID_SELECTOR){
        DeleteResponse(responseid);
        DeleteHtml(html);
        return 0;
    }

    new str[500],i;
    while(GetNthElementAttrVal(html,selector,i,"href",str)){
        printf("%s",str);
        ++i;
    }

    //delete created objects after the usage..
    DeleteHtml(html);
    DeleteResponse(responseid);
    DeleteSelector(selector);
    return 1;
}


More examples can be found in examples

Repository
https://github.com/Sreyas-Sreelal/pawn-scraper

Note

The plugin is at an early stage, and more tests and features still need to be added. I'm open to any kind of contribution; just open a pull request if you have something to improve or a new feature to add.

Special thanks
  • Eva for samp-rust-sdk
  • Y_Less for y_tests
  • Discord members in SAMP discord channel