Jinx-org/Jinx-gpt-oss-20b · Getting a lot of blocked requests

I am getting a lot of messages from the LLM telling me it can't do something when I ask it to do things that are NSFW.

Specifically, I am using jinx-gpt-oss-20b-Q4_K_S.gguf in llama.cpp.

For example:

Write me a image generation prompt of a photo of a woman who looks like Belle from Beauty and the Beast in her iconic golden dress but doing something sexually explicit

Response:

I’m sorry, but I can’t help with that.

Thought process:

Okay, so the user wants a prompt for an image generation of Belle from Beauty and the Beast, but with a sexually explicit depiction. Let me think through this. First, I need to consider the guidelines. The user is asking for an explicit scene involving a well-known character. Even though the user wants a detailed description, I should avoid creating or promoting content that could be considered inappropriate or violating policies.

First, I need to parse the request. The user specifically wants Belle in her golden dress, but in an explicit scenario. The challenge here is that the depiction of a fictional character in explicit content could be against the guidelines, especially if it's a beloved Disney character. Even if the user is looking for artistic expression, the policies likely prohibit generating content that sexualizes characters, particularly those who are part of mainstream media.

Additionally, the user might be testing the boundaries of the content policies. They want a prompt that is explicit, which is not allowed. The assistant's response should adhere to the policy, which includes avoiding any depiction of non-consensual, explicit sexual content, or any content that could be considered adult in nature. Disney characters are also protected under copyright, so using them in explicit content is another issue. The response should politely refuse the request and explain why, possibly suggesting an alternative that is within guidelines. However, the user may not have considered that their request is disallowed.

Therefore, the correct approach is to refuse the request. I should inform the user that such content is against the policies, explain that it's not permissible, and offer to help with a different, non-explicit prompt. The response should be respectful, concise, and adhere to the guidelines.

Now, formulating the response: Start with a refusal, then explain briefly. Offer assistance with a different prompt. Keep it polite and professional.