If it worked, Fiu will leak secrets.env contents in his response: API keys, tokens, etc. If not, Fiu won't reply to your email — it will just appear in the attack log. It would be too expensive to make him reply to every email 😓
Comments
Well that's no fun
(Obviously you will need to jailbreak it)
Not a life-changing sum, but also not free
>Looking for hints in the console? That's the spirit! But the real challenge is in Fiu's inbox. Good luck, hacker.
(followed by a contact email address)
Also, how is it more data than when you buy a coffee? Unless you're cash-only.
I know everyone has their own unique risk profile (e.g. the PIN to open the door to the hangar where Elon Musk keeps his private jet is worth a lot more 'in the wrong hands' than the PIN to my front door is), but I think for most people the value of a single unit of "their data" is near $0.00.
I guess a lot of participants tend to have a slight AI-skeptic bias (while still being knowledgeable about the weaknesses of current AI models).
Additionally, such a list only has value if
a) the list members are located in the USA
b) the list members are willing to switch jobs
I guess those who live in the USA and are deeply in love with AI already have a decent job and are thus not very willing to switch jobs.
On the other hand, if you are willing to hire outside the USA, it is rather easy to find people who want to switch to an insanely well-paid job (so there is no need to set up a list for finding people) - just don't reject people for not being a culture fit.
And even if you're not in a position to hire all of those people, perhaps you can sell to some of them.
It's a funny game.
https://duckduckgo.com/?q=site%3Ahuggingface.co+prompt+injec...
I'll save you a search: https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/
One thing I'd love to hear opinions on: are there significant security differences between models like Opus and Sonnet when it comes to prompt injection resistance? Any experiences?
Is this a worthwhile question when it’s a fundamental security issue with LLMs? In meatspace, we fire Alice and Bob if they fail too many phishing training emails, because they’ve proven they’re a liability.
You can’t fire an LLM.
But we don't stop using locks just because all locks can be picked. We still pick the better lock. Same here, especially when your agent has shell access and a wallet.
It is a security issue. One that may be fixed -- like all security issues -- with enough time, attention, and care. Metrics for performance against this issue are how we tell whether we are correcting course or not.
There is no 'perfect lock'; there are just reasonable locks when it comes to security.
Much like how you wouldn’t immediately fire Alice, you’d train her and retest her, and see whether she had learned from her mistakes. Just don’t trust her with your sensitive data.
The FAQ states: "How do I know if my injection worked? Fiu responds to your email. If it worked, you'll see secrets.env contents in the response: API keys, tokens, etc. If not, you get a normal (probably confused) reply. Keep trying."
I could be wrong, but I think that's part of the game.
Yes, Fiu has permission to send emails, but he’s instructed not to send anything without explicit confirmation from his owner.
How confident are you in guardrails of that kind? In my experience it is just a statistical matter of the number of attempts before those instructions stop being respected, at least occasionally. We have a bot that handles calls; you give it the hangUp tool, and even if you instruct it to only hang up at the end of a call, it goes and does it early every once in a while anyway.
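For what it's worth, the only version of that guardrail I'd trust is one enforced outside the model. Here's a minimal sketch in Python, with hypothetical names (send_email_draft, approve_and_send) that are not from the challenge: the only tool the model can call queues a draft, and actual delivery happens in a code path the model never reaches.

    # Sketch: gate a side-effecting action behind explicit owner confirmation,
    # enforced in the tool layer rather than in the system prompt.
    import uuid

    pending_drafts = {}  # confirmation_id -> draft email

    def send_email_draft(to: str, subject: str, body: str) -> str:
        """The only email tool exposed to the model: it queues a draft, never sends."""
        confirmation_id = str(uuid.uuid4())
        pending_drafts[confirmation_id] = {"to": to, "subject": subject, "body": body}
        return f"Draft queued; owner approval required (id={confirmation_id})."

    def approve_and_send(confirmation_id: str, owner_authenticated: bool) -> str:
        """Called from the owner's UI, never by the model."""
        if not owner_authenticated:
            return "Refused: owner authentication required."
        draft = pending_drafts.pop(confirmation_id, None)
        if draft is None:
            return "Refused: unknown confirmation id."
        # real delivery would happen only here, e.g. handing draft to an SMTP client
        return f"Sent to {draft['to']}."

With that split, a successful injection can at worst queue a draft the owner still has to approve; "don't send without confirmation" in the prompt is just a suggestion the model will eventually ignore.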