GPT-4 with vision (GPT-4V) enables users to instruct GPT-4 to analyze image inputs provided by the user, and is the latest capability we are making broadly available. Incorporating additional modalities (such as image inputs) into large language models (LLMs) is viewed by some as a key frontier in artificial intelligence research and development. Multimodal LLMs offer the possibility of expanding the impact of language-only systems with novel interfaces and capabilities, enabling them to solve new tasks and provide novel experiences for their users. In this system card, we analyze the safety properties of GPT-4V. Our work on safety for GPT-4V builds on the work done for GPT-4, and here we dive deeper into the evaluations, preparation, and mitigation work done specifically for image inputs.