Danbooru

Nuke non-web_source

Posted under Tags

BUR #2775 has been rejected.

mass update non-web_source -> -non-web_source

Unless I'm missing something, this tag is equivalent to the search -source:http*.

There might have been a need for this before, since the source metasearch previously could not be used more than once in a search, but that was fixed in this commit, so the tag is redundant now.

EDIT: This bulk update request is pending automatic rejection in 5 days.

EDIT: The bulk update request #2775 (forum #167395) has been rejected by @nonamethanks.

Updated by DanbooruBot

nonamethanks said:

Well, if you want to catch both ftp and http/s you can just use source:*://*. That search also catches things like post #1819054, which I wouldn't call a non-web source.

The image doesn’t come from that URL, though. It’s just a database with some info on the doujinshi from which the image was scanned/extracted.

A while ago, I went through all the sources that had URL-like text (to fix URLs without protocol, like “example.com/some/image.jpg” or this idiocy) and intentionally left those doujin database links tagged as non-web_source.

Updated

-1 The search -source:http* -source:none and non-web_source are not equivalent.

  • 1. Takes up 2 search terms, putting its usefulness outside the scope of Member and Anonymous users
  • 2. It's not equivalent in search time
    • The vanilla search for 200 posts takes up 1.8 seconds using the `source:` metatag, and 0.3 seconds with the non-web source tag
    • Therefore, the search is more likely to timeout for lower level users

It's basically sacrificing processing work on the front end (via tagging) for processing time on the back end (via post search).

Also to clarify, the non-web source was NOT supposed to include empty sources. Damian0358 came in later and added that text for whatever reason, without the consensus of anyone on the forum that I'm aware of.

https://danbooru.donmai.us/wiki_page_versions?search%5Bwiki_page_id%5D=101636

BrokenEagle98 said:
Also to clarify, the non-web source was NOT supposed to include empty sources. Damian0358 came in later and added that text for whatever reason, without the consensus of anyone on the forum that I'm aware of.

https://danbooru.donmai.us/wiki_page_versions?search%5Bwiki_page_id%5D=101636

The addition of the text occurred following a brief discussion on the Discord server, after I had finished tagging up imageboard_desourced, regarding the naming convention of the tag. Given that the posts in that tag originated from the web, at face value non-web source shouldn't be applied to them. However, it was explained to me that the tag applied unless a post had a URL in the source field (and the wiki prior to that didn't exactly disagree with that notion).

So, naively, I added that text, thinking it was valid, especially having seen some empty-sourced posts with the tag. It only helped to elaborate further, after all (given it isn't called 'textual source'). Of course, now thoroughly checking in hindsight, the number of empty-sourced posts under non-web source is negligible compared to the textual sources that do exist.

For this bit of the discussion, I'd like to apologize immensely. I've updated the text to reflect its true intended use, and am going to now remove the empty sourced posts that do exist from the tag.

BrokenEagle98 said:

  • 1. Takes up 2 search terms, putting its usefulness outside the scope of Member and Anonymous users

This has never been a concern before when nuking tags that were exact substitutes for searches, which is the case for this tag.

  • 2. It's not equivalent in search time
    • The vanilla search for 200 posts takes up 1.8 seconds using the `source:` metatag, and 0.3 seconds with the non-web source tag
    • Therefore, the search is more likely to timeout for lower level users

It's basically sacrificing processing work on the front end (via tagging) for processing time on the back end (via post search).

I did some stress tests and I found the following (searched 200 times for the same tag via python script, no authentication, just loading the https search page):

non-web_source limit:20 -> 0.23 seconds on average
-source:http* -source:none limit:20 -> 0.21 seconds on average (for some reason faster than the single search)

non-web_source limit:200 -> 1.30 seconds on average
-source:http* -source:none limit:200 -> 1.81 seconds on average

So there's a delay for a large number of posts per page, but it's not really noticeable and doesn't risk a timeout for anons/members, especially considering that the default search limit is 20 posts, and anonymous users cannot even change that unless they specifically add the limit: tag to their searches.
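For anyone who wants to repeat this kind of comparison, here is a minimal sketch of such a timing loop. It is not the script used for the numbers above. The `/posts?tags=` URL and the helper names (`search_url`, `fetch`, `average_load_time`) are assumptions made for illustration. It measures wall-clock time including network latency, so results will vary from machine to machine.

```python
import time
import urllib.parse
import urllib.request
from statistics import mean
from typing import Callable

# Assumed search page URL; adjust if the site uses a different path.
BASE_URL = "https://danbooru.donmai.us/posts"


def search_url(tags: str) -> str:
    """Build the search page URL for a tag string such as 'non-web_source limit:20'."""
    return BASE_URL + "?" + urllib.parse.urlencode({"tags": tags})


def fetch(url: str) -> None:
    """Load the page without authentication and discard the body."""
    with urllib.request.urlopen(url, timeout=30) as response:
        response.read()


def average_load_time(tags: str, runs: int = 200,
                      load: Callable[[str], None] = fetch) -> float:
    """Return the average wall-clock seconds over `runs` loads of the same search."""
    url = search_url(tags)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        load(url)
        timings.append(time.perf_counter() - start)
    return mean(timings)
```

Calling `average_load_time("non-web_source limit:200")` and `average_load_time("-source:http* -source:none limit:200")` would give the two figures to compare. The `load` parameter exists so that a different fetcher (a browser driver, for instance) can be swapped in.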

Also, there are 4k+ posts under -source:http* -source:none -non-web_source. The tag is not even being maintained. Might as well implement it server-side if it has to exist, and remove the need for manual tagging.

BrokenEagle98 said:

-1 The search -source:http* -source:none and non-web_source are not equivalent.

  • 1. Takes up 2 search terms, putting its usefulness outside the scope of Member and Anonymous users
  • 2. It's not equivalent in search time
    • The vanilla search for 200 posts takes up 1.8 seconds using the `source:` metatag, and 0.3 seconds with the non-web source tag
    • Therefore, the search is more likely to timeout for lower level users

It's basically sacrificing processing work on the front end (via tagging) for processing time on the back end (via post search).

Also to clarify, the non-web source was NOT supposed to include empty sources. Damian0358 came in later and added that text for whatever reason, without the consensus of anyone on the forum that I'm aware of.

https://danbooru.donmai.us/wiki_page_versions?search%5Bwiki_page_id%5D=101636

1. But we have turned 1 tag into 2 tags before. Just look at the recent pantyshot_(*) BUR where we nuked all the pantyshot tags like pantyshot_(sitting).
What's the difference here?

Your concern is merely a technical one but it doesn't say that there is a difference between the options.

nonamethanks said:

I did some stress tests and I found the following (searched 200 times for the same tag via python script, no authentication, just loading the https search page):

non-web_source limit:20 -> 0.23 seconds on average
-source:http* -source:none limit:20 -> 0.21 seconds on average (for some reason faster than the single search)

non-web_source limit:200 -> 1.30 seconds on average
-source:http* -source:none limit:200 -> 1.81 seconds on average

So there's a delay for a large number of posts per page, but it's not really noticeable and doesn't risk a timeout for anons/members, especially considering that the default search limit is 20 posts, and anonymous users cannot even change that unless they specifically add the limit: tag to their searches.

I'm getting different numbers myself. I tested it with a browser, since a browser is the platform that most users will be using and not Python. Also, I don't know if you did this for your tests, but you want to be looking at the response headers, specifically the value of "x-runtime". That's how long the server is actually taking to complete the request, and it's this value which will indicate how close things were to timing out.
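As a rough illustration of reading that header, the sketch below averages "x-runtime" values collected from several responses and counts how many went over a limit. It assumes the header carries decimal seconds as a string, which is the usual Rack convention but is not confirmed in this thread. The helper names `runtime_seconds` and `summarize` are made up for the example.

```python
from statistics import mean
from typing import Iterable, Mapping, Optional, Tuple


def runtime_seconds(headers: Mapping[str, str]) -> Optional[float]:
    """Return the x-runtime value in seconds, or None if the header is absent."""
    for name, value in headers.items():
        # Header names are case-insensitive, so compare in lower case.
        if name.lower() == "x-runtime":
            return float(value)
    return None


def summarize(responses: Iterable[Mapping[str, str]],
              limit_seconds: float = 3.0) -> Tuple[float, int]:
    """Average the reported runtimes and count how many exceeded the limit."""
    runtimes = [r for r in map(runtime_seconds, responses) if r is not None]
    if not runtimes:
        raise ValueError("no x-runtime headers found")
    over_limit = sum(1 for r in runtimes if r > limit_seconds)
    return mean(runtimes), over_limit
```

The header dictionaries can be copied by hand from the browser's network panel or taken from any HTTP client's response object.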

Test setup
  • Platform: Chrome 83
  • Iterations: 5
  • Limit: 200

In addition to the vanilla tests like you did, I also added an additional search tag of translation request to test out the aspect of using other search terms. It's true that this would put the Anonymous/Member users over their tag limit, but this was done to show the performance of a tag approach versus a metatag approach.

Non-web source
  • vanilla: 0.8274266 seconds
  • translation request: 0.3244044 seconds

source: metatag
  • vanilla: 1.456441 seconds
  • translation request: 2.5366118 seconds
    • 2 of 5 tests exceeded the 3 second limit

Analysis

So on the vanilla test, the tag solution did about 1.8 times better, and on the translation request test, the tag solution did about 7.8 times better. Additionally as shown, the tag solution does better when additional search terms are added, whereas the metatag solution does worse. Although not done for this particular test, I imagine that this trend would continue the more tags/metatags get added, which makes sense since searching using the tag index is much faster than performing wildcard matching on a string.
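The two ratios quoted above can be checked directly from the listed timings. This is only the arithmetic, using the figures as reported:

```python
# Timings in seconds, copied from the results listed above.
tag_times = {"vanilla": 0.8274266, "translation request": 0.3244044}
metatag_times = {"vanilla": 1.456441, "translation request": 2.5366118}


def speedups(tag: dict, metatag: dict) -> dict:
    """Return how many times slower the metatag search was for each test."""
    return {name: metatag[name] / tag[name] for name in tag}


ratios = speedups(tag_times, metatag_times)
for name, ratio in ratios.items():
    print(f"{name}: tag search about {ratio:.1f} times faster")
# vanilla: tag search about 1.8 times faster
# translation request: tag search about 7.8 times faster
```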

Also, there are 4k+ posts under -source:http* -source:none -non-web_source. The tag is not even being maintained. Might as well implement it server-side if it has to exist, and remove the need for manual tagging.

All of the tags on Danbooru have varying degrees of being maintained or not. Regardless, I would be for the server taking over for this. It would be simple enough to implement.

On the other hand, if the tag does get nuked, I would propose that a source:nonweb metatag option should be added which would combine both the -source:http* and -source:none search terms into one, which would benefit all users not having to use up one of their search terms to complete the same search.
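To make the proposed combination concrete, here is a small sketch of the condition such a metatag would test. `source:nonweb` does not exist. The function name and the whitespace and case handling are assumptions for illustration, not Danbooru's actual matching rules.

```python
def is_non_web_source(source: str) -> bool:
    """True when the source is non-empty (-source:none) and does not start with http (-source:http*)."""
    # Assumption: surrounding whitespace and letter case are ignored.
    normalized = source.strip().lower()
    return bool(normalized) and not normalized.startswith("http")


def filter_non_web(sources):
    """Keep only the source strings that the combined condition accepts."""
    return [s for s in sources if is_non_web_source(s)]
```

Under this reading an ftp:// source still counts as non-web, because -source:http* does not exclude it. Posts whose source is a doujin database link would be excluded even though they were intentionally left tagged as non-web_source, so a purely automatic rule would not reproduce the manual tagging exactly.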

As for the concerns about this being merely technical, well, this aspect is important to me at least, which is why I voted against it. Those who do not feel this way can vote for it. The administrative staff can then decide which way would be the best course for this.

BrokenEagle98 said:

I'm getting different numbers myself. I tested it with a browser, since a browser is the platform that most users will be using and not Python.

That's what I did too: I tested it with Selenium from Python.

In any case, if the concern is the time of processing then it won't matter if a new metasearch is added, as the effective database search time would be the same. Non-web source could be handled like the other unremovable meta tags (highres etc) and that would solve the issue.

Updated
