RC Groups Scraper
What is it?
I, like many, am very lazy. I, like many, am also very interested in radio control things. I have a couple of planes and a couple of multicopters. As I’ve gotten into the hobby, I’ve found that I and other members of the community are always on the prowl for good deals, and one of the most consistent hubs for those deals is RC Groups. RC Groups is a great resource for making purchases, but its interface is not my favorite for finding deals, so I automated it. Instead of going back to the website and refreshing every 10 minutes to see whether the thing I want has been listed, I made a bot that goes out to the site, scrapes it, and reports back to me whenever it finds new results. It is available on GitHub as a public repository that you are free to change, manipulate, and enjoy, here:
Please feel free to download the code and check it out. I’m fairly happy with it, but I’m not the greatest developer in the world, and I would love any feedback/pull requests that you’d be willing to contribute.
How does it work?
It’s a very simple Python script that uses the requests library and Beautiful Soup to hit any given forum, find the for-sale items, and pull them down so you can do something with them. You can do whatever you want with it: bake it into a website, make an app that scrapes the site, or build an RC Groups CLI (command-line interface). It’s up to you! At the very least, it’s a good start.
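The fetch-and-parse step can be sketched roughly as follows. This is not the code from the repository: to keep the example dependency-free, it uses the standard library’s html.parser in place of Beautiful Soup, and it assumes (hypothetically) that thread links on a classifieds page can be recognized by a `showthread.php` substring in their `href`. `ThreadLinkParser` and `extract_threads` are illustrative names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ThreadLinkParser(HTMLParser):
    """Collect (title, absolute_url) pairs for links that look like forum threads."""

    def __init__(self, base_url, marker="showthread.php"):
        super().__init__()
        self.base_url = base_url
        self.marker = marker  # assumption: substring that identifies thread links
        self.results = []
        self._href = None  # href of the thread link we are currently inside
        self._text = []  # text fragments collected inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href and self.marker in href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Gather all text inside the link, including nested tags like <b>.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            if title:
                self.results.append((title, urljoin(self.base_url, self._href)))
            self._href = None


def extract_threads(html, base_url):
    """Return a list of (title, url) tuples for thread links found in the HTML."""
    parser = ThreadLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.results
```

In the real bot, the HTML would come from something like `requests.get(forum_url).text` (with `forum_url` a hypothetical name), and Beautiful Soup’s `find_all("a", href=True)` would play the role of the hand-written parser.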
This project depends on the libraries listed in requirements.txt. I personally use virtualenv to manage my libraries on a per-project basis. This is my setup:
- virtualenv env
- . env/bin/activate
- pip install -r requirements.txt
- touch urls.json (the service saves new results in a file called urls.json so that it can recognize whether a URL is “new”, meaning the script hasn’t found that unique URL before)
- set up your settings in rcg_bot.py (they are defined at the top of the file)
- if you aren’t going to use Pushbullet, set pushbullet_key to None (more on Pushbullet below)
- python rcg_bot.py (it will start running and keep running until you stop it)
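The new-versus-seen check that urls.json supports might look like this minimal sketch. It assumes urls.json holds a JSON list of URLs (the repository’s actual file format may differ) and treats the empty file created by `touch` as “nothing seen yet”. `load_seen`, `save_seen`, and `filter_new` are hypothetical helper names.

```python
import json
import os


def load_seen(path="urls.json"):
    """Return the set of URLs already reported; a missing or empty file counts as none."""
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        content = f.read().strip()
    # An empty file (e.g. freshly created with `touch`) is not valid JSON, so handle it.
    return set(json.loads(content)) if content else set()


def save_seen(seen, path="urls.json"):
    """Persist the seen URLs as a sorted JSON list (assumed format)."""
    with open(path, "w") as f:
        json.dump(sorted(seen), f, indent=2)


def filter_new(urls, seen):
    """Return URLs not seen before, in order, and record them in `seen`."""
    new = []
    for url in urls:
        if url not in seen:
            seen.add(url)  # also de-duplicates repeats within the same batch
            new.append(url)
    return new
```

Each polling cycle would load the set, keep only the new URLs, notify about those, and save the set again.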
This runs the bot, which continually goes out to the site and checks for deals. The check interval defaults to 1 minute (60 seconds). I have noticed some errors when the site is hit more than a few times a minute, so I’d suggest keeping the interval at 60 seconds or longer. The settings at the top of rcg_bot.py are:
- The last part of the URL for the section of the RC Groups classifieds you want to scrape (FPV, multicopters, models, whatever)
- The strings you want to match in the title of the forum post. Matching is an inclusive OR on the strings, so any match is returned (searching for fatshark and dominator returns anything with either fatshark or dominator in the title)
- The time, in seconds, between repeated checks
- I use a service called Pushbullet to send myself push notifications. I have included the Pushbullet code as an example of how to use this bot to do something useful. If you don’t want to use Pushbullet, be sure to set the Pushbullet key to None so the bot doesn’t try to use it
- If you don’t use Pushbullet in the rest of your life, you should. It’s an amazing app.
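Here is a sketch of the title matching and the Pushbullet guard. The case-insensitive substring match is an assumption about how the inclusive OR works, and `matches_any` and `build_push_request` are hypothetical names. The request targets Pushbullet’s v2 `pushes` endpoint with an `Access-Token` header, but it is built with the standard library’s urllib rather than requests.

```python
import json
import urllib.request

PUSHBULLET_PUSHES_URL = "https://api.pushbullet.com/v2/pushes"


def matches_any(title, search_terms):
    """Inclusive OR: True if any search term appears in the title (case-insensitive)."""
    lowered = title.lower()
    return any(term.lower() in lowered for term in search_terms)


def build_push_request(pushbullet_key, title, url):
    """Build a Pushbullet 'link' push, or return None when the key is None (disabled)."""
    if pushbullet_key is None:
        return None
    payload = {"type": "link", "title": title, "url": url}
    return urllib.request.Request(
        PUSHBULLET_PUSHES_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Access-Token": pushbullet_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Sending the push is then a matter of calling `urllib.request.urlopen(req)` whenever `req` is not None, so a key of None quietly skips notifications.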
There are a few early steps that could make this project significantly better:
- The task runner needs to be more robust. Right now, if an exception happens, the process will tank and never start again. A worthwhile example to put together would be running this script as a cron job rather than as a bot that repeats. An integration with nginx or Django with Celery might also be worthwhile, and the script could easily be run with Heroku Scheduler.
- More search options:
- It would be useful for people to be able to do clever filtering, such as an “AND” search or expressions like (this AND that) OR that
- Maybe people want to be notified when something is listed as WANTED so that they can sell off their parts, etc.
- Code cleanup
- Make a pull request and tell me what you want to do with it or what you want to change!
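One way to keep the repeating bot from dying on the first exception is to catch and log errors inside the loop, as in this hypothetical `run_forever` helper. `check_once` stands in for whatever function performs a single scrape-and-notify pass; `max_cycles` and the injectable `sleep` exist only so the loop can be tested.

```python
import logging
import time


def run_forever(check_once, interval_seconds=60, max_cycles=None, sleep=time.sleep):
    """Call check_once() every interval_seconds; log exceptions instead of crashing.

    Returns the number of failed cycles (only reachable when max_cycles is set).
    """
    cycles = 0
    failures = 0
    while max_cycles is None or cycles < max_cycles:
        try:
            check_once()
        except Exception:
            failures += 1
            logging.exception("check failed; will retry next cycle")
        cycles += 1
        # Don't sleep after the final cycle when running a bounded number of cycles.
        if max_cycles is None or cycles < max_cycles:
            sleep(interval_seconds)
    return failures
```

With cron or Heroku Scheduler, you would skip the loop entirely and call the single check once per invocation, letting the scheduler handle the interval and any restarts.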
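For richer searches, one simple representation is a list of AND-groups that are ORed together (disjunctive normal form). The sketch below parses strings like `fatshark AND dominator OR skyzone`, with AND binding tighter than OR; it does not handle parentheses, and `parse_query` and `matches_query` are hypothetical names.

```python
def parse_query(text):
    """Parse 'a AND b OR c' into [['a', 'b'], ['c']]; AND binds tighter than OR."""
    groups = []
    for alternative in text.split(" OR "):
        terms = [t.strip() for t in alternative.split(" AND ") if t.strip()]
        if terms:
            groups.append(terms)
    return groups


def matches_query(title, query):
    """True if every term of at least one AND-group appears in the title (case-insensitive)."""
    lowered = title.lower()
    return any(all(term.lower() in lowered for term in group) for group in query)
```

The current inclusive-OR search is the special case where every group holds a single term.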