V673-FilterDownloads

From Newsbin




Configuring Newsbin to Filter Header Downloads

Starting with version 6.73, Newsbin lets you set a filter that is applied before headers are inserted into the header database. Older versions of Newsbin had this capability, but you had to edit the .nbi file directly to enable it. If you are not willing to upgrade to 6.73 or later, the instructions for doing it the hard way are here.

This comes in handy if you are downloading headers for a group that has tons of posts you don't want to see, and those posts follow a predictable pattern that you can reliably match. For example, they might all be by the same poster, or they might all have the same text in the subject.

Having this filter in place saves Newsbin the extra processing time of inserting unwanted headers into the database, and it saves disk space. It also lets Newsbin load headers into a post list faster.

There are two steps: create a filter profile, then enable the filter profile. Here is an example that creates a poster filter for alt.binaries.teevee, which happens to be getting spammed while I'm writing this in January 2017.

Create a filter profile

  1. Go under Options -> Filters
  2. Click the "New..." button
  3. Supply a name for your filter. For this example, I'm naming it PreHeaderDB. Don't use spaces or special characters in the filter name.
  4. Click the OK button
  5. Single click to select your new filter on the left side of the Filter Options screen, in the Filter Profile section.
  6. Select "Reject If" in the dropdown box. The default will show as "Accept If".
  7. Select "Poster Contains" in the next dropdown box. By default it says "Subject Contains".
  8. Enter a regular expression that will match the poster. Sometimes you can just use the exact poster name, or you can use a portion of it.
    For example, if the poster is "nEwZ[NZB] pr3d@NET.world", the regular expression would be "nEwZ\[NZB\].*pr3d@net\.world" (without the quotes).
    The square brackets and the period are escaped with a backslash because they are special characters in regular expressions.
    Or you could simply use "pr3d@NET\.world" (again without the quotes). Spammers may change posters, so this may be a moving target.
    You may also be able to find a pattern in the subject to key off of. You can enter multiple accept or reject statements, but the more you enter,
    the more performance degrades, because every statement is tested against every single header you download for the configured groups.
  9. Once you are happy with your regular expression, click the "Add" button. You will then see it show up in the list.
  10. Click the OK button to exit the screen.

When you are done, it should look like this:

[Screenshot: V672-HeaderFilter.jpg]
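If you want to sanity-check the poster expressions from step 8 before adding them, you can try them against the example poster outside Newsbin. This is a minimal Python sketch, assuming that "Poster Contains" behaves like `re.search` (a match anywhere in the field) and that matching ignores case, which the lowercase "net" in the first example suggests. Newsbin's regex engine may differ in details, and `poster_matches` is a hypothetical helper, not part of Newsbin.

```python
import re

# Example poster string from step 8.
POSTER = "nEwZ[NZB] pr3d@NET.world"

# The two poster expressions suggested in step 8. The brackets and the dot
# are escaped because they are special characters in a regular expression.
FULL_RE = r"nEwZ\[NZB\].*pr3d@net\.world"
SHORT_RE = r"pr3d@NET\.world"


def poster_matches(pattern: str, poster: str) -> bool:
    # Assumption: "Contains" means the pattern may match anywhere in the
    # field, and case is ignored. Neither is confirmed Newsbin behaviour.
    return re.search(pattern, poster, re.IGNORECASE) is not None


for pattern in (FULL_RE, SHORT_RE):
    print(pattern, "->", poster_matches(pattern, POSTER))

# Without the backslashes, [NZB] becomes a character class ("one of N, Z or B"),
# so the literal text "nEwZ[NZB]" is no longer matched.
print("unescaped ->", poster_matches(r"nEwZ[NZB]", POSTER))  # False
```

Both escaped expressions match the example poster; the unescaped one does not, which is why the brackets need backslashes.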

You can add additional filters to this filter profile. For example, the January 2017 spammer started using random poster IDs, so you need to filter on the subject instead. There is a short but effective RE that handles this: "\:\:\/" (without the quotes), which matches the literal text "::/". Or you can use a more elaborate RE like this:

 \[[0-9a-f]{10}\] \\[0-9a-f]{10}\\::[0-9a-f]{14}\.[0-9a-f]{30}\.[0-9a-f]{8}::/[0-9a-f]{12}/ \[([0-9a-f]{10}|newzNZB)\]
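Here is a minimal Python sketch that checks both subject expressions against a made-up subject shaped like the January 2017 spam and against an ordinary subject. The sample subjects and the `hex_run` helper are hypothetical, built only so the long RE has something of the right shape to match; as before, it assumes case-insensitive "Contains" matching, which may not be exactly how Newsbin evaluates the expression.

```python
import re

SHORT_SUBJECT_RE = r"\:\:\/"  # matches the literal text "::/"
LONG_SUBJECT_RE = (
    r"\[[0-9a-f]{10}\] \\[0-9a-f]{10}\\::[0-9a-f]{14}\.[0-9a-f]{30}"
    r"\.[0-9a-f]{8}::/[0-9a-f]{12}/ \[([0-9a-f]{10}|newzNZB)\]"
)


def hex_run(n: int) -> str:
    # Hypothetical filler: n hex digits, used only to build a test subject.
    return ("0123456789abcdef" * 4)[:n]


# A fabricated subject with the same shape as the spam, not a real post.
spam_subject = (
    f"[{hex_run(10)}] \\{hex_run(10)}\\::{hex_run(14)}.{hex_run(30)}"
    f".{hex_run(8)}::/{hex_run(12)}/ [newzNZB]"
)
normal_subject = '"Some.Show.S01E01.720p.mkv" yEnc (1/50)'

for name, pattern in (("short", SHORT_SUBJECT_RE), ("long", LONG_SUBJECT_RE)):
    for label, subject in (("spam", spam_subject), ("normal", normal_subject)):
        hit = re.search(pattern, subject, re.IGNORECASE) is not None
        print(f"{name} RE vs {label} subject: {hit}")
```

The short RE is cheaper to evaluate and catches anything containing "::/"; the long one is stricter, so it is less likely to reject a legitimate post that happens to contain those three characters.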


In summary (for people scanning rather than reading everything), use these two statements to filter the majority of the January 2017 spam flood:

Set up Reject If, Poster Contains, pr3d@net\.world
Set up Reject If, Subject Contains, \:\:\/
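Conceptually, these two statements form a reject filter that runs before headers reach the header database. The Python sketch below models that idea; the `Header` class, `REJECT_RULES`, and `headers_to_insert` are hypothetical stand-ins, and it assumes a header is dropped when any Reject If statement matches its field, ignoring case. Newsbin's internal evaluation may differ.

```python
import re
from dataclasses import dataclass


@dataclass
class Header:
    poster: str
    subject: str


# The two "Reject If" statements from the summary, as (field, pattern) pairs.
REJECT_RULES = [
    ("poster", r"pr3d@net\.world"),
    ("subject", r"\:\:\/"),
]


def rejected(header: Header) -> bool:
    # Assumption: a header is dropped if ANY reject rule matches anywhere in
    # its field ("Contains"), case-insensitively.
    return any(
        re.search(pattern, getattr(header, field), re.IGNORECASE)
        for field, pattern in REJECT_RULES
    )


def headers_to_insert(headers: list[Header]) -> list[Header]:
    # Filtering happens before anything would be written to the database,
    # so rejected headers cost no insert time and no disk space.
    return [h for h in headers if not rejected(h)]


incoming = [
    Header("nEwZ[NZB] pr3d@NET.world", "anything"),
    Header("randomid@example.com", "[0123456789] ::/abcdef/ [newzNZB]"),
    Header("friend@example.com", "Holiday photos (1/10)"),
]
kept = headers_to_insert(incoming)
print([h.poster for h in kept])  # ['friend@example.com']
```

The first header is caught by the poster rule and the second by the subject rule, even though its poster is random; only the ordinary post survives.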

Enable the filter profile

  1. Click on the Groups List Tab
  2. Right-click the Group Folder where this group is. In this case it is in "Unsorted". You can also do this on the specific group instead of the group folder.
  3. Select Properties
  4. Look for the "Header Filter" section
  5. Click the dropdown box next to "Header Filter" and choose the filter profile you set up in Step 1.
  6. Click the OK button

It should look like this:

[Screenshot: V673-GroupProperties.jpg]

At this point, you should be all set. This will keep any new headers matching your filter profile out of the header database. It will not affect any headers you have already downloaded. To get rid of those, you can delete them manually by loading the group, selecting the posts you don't want, and pressing Shift+Delete to remove them from the database. Alternatively, it might be faster to purge the group (right-click, select Post Storage, then Delete Stored Posts) and re-download the headers for the group.

Note about Header Processing Backlogs

If you are applying header download filters to manage spam floods, you probably also have a header processing backlog that is preventing you from viewing some or all of the headers you have downloaded. You can monitor the backlog with the Cache display at the bottom of the Newsbin window. In the screenshot above, it reads "Cache 200/200 (0)", which is normal when Newsbin is idle. A non-zero number in the parentheses means there is a backlog of headers waiting to be processed.

You can either stop all downloads and let Newsbin sit until that number drops to 0, or you can clear the backlog, apply header download filters, then re-download headers with the filters in place. Here is the procedure to remove the header backlog and re-download a week's worth of headers for a specific group.

  1. Go under Options -> Settings
  2. Click the "Open Data Folder" button
  3. In the window that just opened, go into the Import folder
  4. Remove all the .gz files
  5. Go back into Newsbin and go under Options -> Settings again
  6. Set the Download Age to 7 or some other small number. This represents the number of days' worth of headers you are going to download.
  7. Click on the Groups List Tab
  8. Right-click the Group you want to re-download and select Post Storage, then Delete Stored Posts
  9. Right-click the Group again and select Download Latest Headers
  10. Wait for the Header Download task to complete. It is listed in the Download List tab.
  11. Check the Cache display and wait for the number in parentheses to go to 0
  12. Right-click the group and select Show Posts.
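If you prefer to script parts of this, here is a minimal Python sketch of steps 4 and 11. It assumes the Cache display text has the form quoted earlier ("Cache 200/200 (0)"), so `header_backlog` parses text you copy from the display; it cannot read the Newsbin window itself. `import_dir` must be the Import folder that step 3 opened, since this sketch does not know where your data folder is. Both function names are hypothetical, not part of Newsbin, and `clear_import_backlog` only lists files unless you pass `delete=True`.

```python
import re
from pathlib import Path

# Assumed layout of the Cache display text, e.g. "Cache 200/200 (0)".
CACHE_RE = re.compile(r"Cache\s+(\d+)/(\d+)\s+\((\d+)\)")


def header_backlog(cache_text: str) -> int:
    """Return the number in parentheses: headers still waiting to be processed."""
    m = CACHE_RE.search(cache_text)
    if m is None:
        raise ValueError(f"unrecognised cache text: {cache_text!r}")
    return int(m.group(3))


def clear_import_backlog(import_dir: str, delete: bool = False) -> list[Path]:
    """Step 4: find the .gz files directly inside the Import folder.

    With delete=False (the default) this is a dry run that only reports what
    it would remove; with delete=True the listed files are deleted.
    """
    files = sorted(Path(import_dir).glob("*.gz"))
    if delete:
        for f in files:
            f.unlink()
    return files


print(header_backlog("Cache 200/200 (0)"))  # 0, i.e. no backlog
```

Running the dry run first and checking the list before deleting anything is the safer habit.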

If you do not want to lose all your previously downloaded headers for this group, make these changes to the procedure above:

  1. In step 6, set the download age to the number of days you are missing from the group in question. This represents the date range of the headers that you just removed from the backlog.
  2. In step 8, select Post Storage, then Use Download Age.
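To work out the number for step 6 in this variant, count back from today to the oldest day whose headers you removed. This is a minimal Python sketch, assuming Download Age counts whole days back from today and that you want to include that oldest day itself (hence the +1); `download_age_for` is a hypothetical helper and the dates are only an example.

```python
from datetime import date


def download_age_for(oldest_missing: date, today: date) -> int:
    """Number of days of headers to request so the range reaches oldest_missing.

    Assumption: Download Age counts whole days back from today, and the day
    of oldest_missing should be included, so one day is added.
    """
    if oldest_missing > today:
        raise ValueError("oldest_missing is in the future")
    return (today - oldest_missing).days + 1


print(download_age_for(date(2017, 1, 14), date(2017, 1, 20)))  # 7
```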