This sample is part of the AI Sample Catalog. To build and run this sample, you should clone the entire repository.
This sample demonstrates a multimodal (image and text) prompt using the Gemini Flash model. Users select an image and provide a text prompt, and the generative model responds based on both inputs. This showcases how to build a simple yet powerful multimodal AI feature with the Gemini API.
The application uses the Firebase AI SDK for Android (see How to run) to interact with Gemini Flash. The core logic lives in GeminiDataSource.kt: a generativeModel is initialized once, and each time the user provides an image and a text prompt, the two are combined into a single multimodal prompt and sent to the model, which generates a text response.
Here is the key snippet of code that initializes the generative model:
private val generativeModel by lazy {
    // Create the model through the Firebase AI SDK, backed by the Gemini Developer API.
    Firebase.ai(backend = GenerativeBackend.googleAI()).generativeModel(
        "gemini-2.5-flash",
        generationConfig = generationConfig {
            temperature = 0.9f
            topK = 32
            topP = 1f
            maxOutputTokens = 4096
        },
        // Block content the model rates as medium risk or above in each harm category.
        safetySettings = listOf(
            SafetySetting(HarmCategory.HARASSMENT, HarmBlockThreshold.MEDIUM_AND_ABOVE),
            SafetySetting(HarmCategory.HATE_SPEECH, HarmBlockThreshold.MEDIUM_AND_ABOVE),
            SafetySetting(HarmCategory.SEXUALLY_EXPLICIT, HarmBlockThreshold.MEDIUM_AND_ABOVE),
            SafetySetting(HarmCategory.DANGEROUS_CONTENT, HarmBlockThreshold.MEDIUM_AND_ABOVE),
        ),
    )
}
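A few notes on these settings: a temperature of 0.9 favors varied, creative responses, while topK and topP control how candidate tokens are sampled, and maxOutputTokens caps the length of the generated reply. The safety settings block any response the model rates as medium risk or higher in each of the four harm categories.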
Here is the key snippet of code that defines the generateText function:
suspend fun generateText(bitmap: Bitmap, prompt: String): String {
    // Combine the image and the text into a single multimodal prompt.
    val multimodalPrompt = content {
        image(bitmap)
        text(prompt)
    }
    val result = generativeModel.generateContent(multimodalPrompt)
    // The response text can be null (e.g., if the response was blocked), so fall back to "".
    return result.text ?: ""
}
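For context, here is a minimal sketch of how generateText might be invoked from a ViewModel. The GeminiViewModel class, the geminiDataSource property, and the onGenerateClicked function are illustrative names and not part of the sample:

import android.graphics.Bitmap
import androidx.lifecycle.ViewModel
import androidx.lifecycle.viewModelScope
import kotlinx.coroutines.launch

// Hypothetical ViewModel wiring; the sample's actual structure may differ.
class GeminiViewModel(
    private val geminiDataSource: GeminiDataSource,
) : ViewModel() {

    fun onGenerateClicked(bitmap: Bitmap, prompt: String) {
        // generateText is a suspend function, so it must run in a coroutine.
        viewModelScope.launch {
            val response = geminiDataSource.generateText(bitmap, prompt)
            // Deliver the response to the UI, for example through a StateFlow.
        }
    }
}

Because generateText falls back to an empty string when the model returns no text, callers may also want to treat "" as a failure case and surface an error to the user.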
Read more about the Gemini API in the Android Documentation.
