Case Study: My suggestions for Gemini for Google Workspace - June 28, 2024

October 28, 2024

Background

In my first assignment for my AI Product Management course, I wrote “Local AI”. But I did NOT mean Local AI. I meant "Federated learning". I only learned this term near the end of the course.

When I did my first assignment, I chose to analyze Gemini for Google Workspace, specifically Google Meet, and propose enhancements for the tool.

It was late June (it could all be different now, as Google always pushes out amazing updates). At the time, Gemini was not deemed very useful for my workplace.

A group of people, including me, was picked for a pilot to test Gemini in our day-to-day work. We were all curious, forward-thinking people when it comes to technology and AI. So, of course, you can expect something fun in the feedback.

One rare mind was using Gemini at an advanced level, with custom scripts and all.

Someone else asked Gemini how much his salary was. Gemini, after searching far and wide in his Drive folder, answered $12,000. I am certain this is illegal pay in Canada for an FTE working 35 hours a week.

Someone used Gemini to summarize and take notes for one of her meetings. It wrote “X asked a question, Y answered” without revealing the content.

Security to the max!

Someone else said Gemini could not capture all the notes in his meeting, perhaps due to the nature of his team’s meetings that always cover multiple (at times unrelated) topics.

I had a good experience with Gemini taking notes the first time I used it. However, it was more like a co-op student without any context of the meeting or knowledge in the field. Part of our conversation was interpreted inaccurately. A few notes were missing. A few notes were unimportant. An action item was missing. To conclude, you would still need to review the notes and make adjustments.

So, one of my proposed solutions was for Google to implement “local” learning. This means that the AI would learn from a user’s previous meeting notes, emails and chats to pick up the context of their work, take better notes, and accommodate jargon and abbreviations. It would be similar to Apple Intelligence, which gets deployed on each device, except that in this case it would be deployed on each account.

This federated learning method still trains the global model with the main patterns learned from each client. However, it keeps the data of each client private!

Federated learning is ideal for the enterprise use case because data privacy is the biggest concern / hindrance for enterprises when they consider adopting AI to increase employee productivity and quality of work.
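To make the idea concrete, here is a minimal sketch of one round of federated averaging in plain Python. It is an illustration under my own assumptions, not how Google or Apple implement anything: the tiny linear model, the function names (local_update, federated_average, federated_round) and the learning rate are all hypothetical. Each client trains on its own data, and only the resulting weights are combined into the global model.

```python
from typing import List, Tuple

Weights = List[float]              # [w, b] for a toy model y = w * x + b
Sample = Tuple[float, float]       # (x, y)


def local_update(weights: Weights, data: List[Sample],
                 lr: float = 0.1, epochs: int = 20) -> Weights:
    """Run gradient descent on ONE client's private data and return new weights."""
    w, b = weights
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y
            grad_w += 2 * err * x / len(data)
            grad_b += 2 * err / len(data)
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]


def federated_average(client_weights: List[Weights],
                      client_sizes: List[int]) -> Weights:
    """Average client weights, weighted by how many samples each client holds."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


def federated_round(global_weights: Weights,
                    clients: List[List[Sample]]) -> Weights:
    """One round: every client trains locally, the server only sees weights."""
    updates = [local_update(list(global_weights), data) for data in clients]
    sizes = [len(data) for data in clients]
    return federated_average(updates, sizes)
```

The raw samples never leave local_update; the server-side function federated_average only ever receives weight lists and sample counts, which is the privacy property described above.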

To see more insights, keep reading!

Describe the Gen AI Product

Google Gemini for Google Workspace is a generative AI solution integrated into the Google Suite of productivity tools, such as Google Docs, Sheets, Slides, Gmail, and Google Meet with enterprise-grade security and privacy.

It leverages advanced AI and machine learning capabilities to assist users in creating content, automating repetitive tasks, and enhancing overall productivity within their workflows.

As the AI industry is growing fast and big tech companies are relentlessly improving their AI models, Google Gemini will very likely improve and expand its offerings. This paper is dated June 28, 2024 and will focus on the Gemini experience with Google Meet, a video call app.

Specifically, this paper will analyze the Meeting Minutes Capturing Capability of Gemini for Google Meet.

What Problems is it Solving?

Gemini for Google Meet aims to solve one big umbrella problem: helping users connect. A Google Workspace article outlines the following applications:

Create custom background images. For example, ask Gemini to create an illustration of a magical forest.
Use studio look to turn a low-quality image into studio quality by fixing issues caused by low light or low-quality webcams.
Use studio lighting to simulate professional lighting in your video feed so you can be perfectly lit for your meeting.
Use studio sound to improve the audio experience in Meet, restoring your original voice by recreating and balancing missing or distorted frequencies.
Use translated captions to remove language proficiency barriers and make Meet video calls more inclusive and collaborative.
Use Adaptive audio to join meetings with multiple laptops in the same room without dedicated conferencing hardware.

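The most helpful feature that is not listed here is Capturing Meeting Minutes. When enabled in meetings, Gemini will do 2 things: produce a real-time summary of key points and action items during the meeting, and create a Google Doc of the meeting minutes after the meeting ends.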

This paper will focus on analyzing Gemini’s capability to Capture Meeting Minutes.

Who are the Customers it is Targeting?

Small, Medium and Large Enterprises that are using Google Workspace: Companies looking to optimize productivity and reduce overhead costs.

Business Professionals: Employees who use Google Meet as a meeting tool and want to spend less time capturing meeting minutes while making those minutes more accurate and thorough.

Describe the User Experience

The user experience of Google Gemini is designed to be seamless and intuitive, embedded directly within the familiar Google Workspace tools.

When a Google Meet meeting is organized by a user that has Gemini enabled, all participants in the meeting can find the Gemini icon on the top right corner of the meeting screen. This placement is consistent with that of Gemini in other Google Workspace tools.

What it does

When a user enables Gemini in the meeting, the AI will start listening to the meeting and produce a summary of key points and action items that were discussed. This summary will appear a few minutes into the conversation and keep going until the end of the meeting. It is also available to all participants to read and edit in real-time.

After the meeting ends, Gemini will also produce a Google Doc of the meeting minutes it has captured. The meeting minutes, in bullet point format, can differ from the summary shown during the meeting because they are more straight to the point.

Input

Gemini is a multimodal AI that can process text, image and audio as input. For Google Meet specifically, it takes in the audio streamed in real time from the meeting.

Output

Gemini for Google Meet produces a text-based meeting summary and meeting minutes. Combined with the existing formatting tools in Google Docs, the meeting minutes can take the form of passages and bullet points, with action items shown as checkboxes.

How Are They Engaging and Targeting Customers?

Marketing Campaigns: Leveraging Google's existing marketing channels, including digital ads and its YouTube channel.

Direct Sales: Targeting large enterprises and key accounts through direct sales efforts.

Product Demos: Offering free trials to showcase the capabilities of Gemini.

What is Their Pricing Approach?

Google employs a tiered pricing strategy for Gemini on top of the Google Workspace subscription.

Subscription Plans: Different tiers based on features and user needs - Gemini Business and Gemini Enterprise.

Per-User Pricing: Charges per license.

Insights and Recommendations

Pain Points

The concept of transcribing a meeting is not new. Tools like Google Meet and Microsoft Teams have enabled transcription for years now. Providing a real-time summary, meeting minutes and action items is a big step forward in helping business professionals cut down on overhead time.

However, most users still cannot rely completely on the meeting minutes captured by Gemini. Feedback from users includes:

My hypothesis for the insufficiencies of the AI is:

Recommendations or Suggestions

In the real world, I would test and collect evidence to verify the hypothesis before making any recommendations. However, for this assignment, I will provide recommendations based on the above hypothesis.

1. Quick win: Leverage meeting description box to provide context ahead of time

Assuming giving more context to Gemini will help the AI capture meeting summary and minutes more accurately, users should be encouraged to fill out the meeting description box when setting up events.

To make it easier for both the users and the AI, the description box can give hint text or fields (for structured data) that prompt the users. For example, include fields for:

This is also a way to help meeting organizers facilitate meetings better as meetings without a clear structure like that can be hectic.
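As a rough sketch of what a structured description could look like, the snippet below turns a few fields into a block of context text for the note-taking AI. The field names (purpose, agenda, glossary) and the MeetingDescription class are my own illustrative assumptions, not an existing Google Calendar or Gemini feature.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MeetingDescription:
    """Hypothetical structured fields a meeting organizer could fill in."""
    purpose: str = ""
    agenda: List[str] = field(default_factory=list)
    glossary: Dict[str, str] = field(default_factory=dict)

    def to_context(self) -> str:
        """Render only the filled-in fields as plain text context."""
        lines: List[str] = []
        if self.purpose:
            lines.append(f"Purpose: {self.purpose}")
        if self.agenda:
            lines.append("Agenda:")
            lines.extend(f"- {item}" for item in self.agenda)
        if self.glossary:
            lines.append("Glossary:")
            lines.extend(f"- {term}: {meaning}"
                         for term, meaning in self.glossary.items())
        return "\n".join(lines)
```

Empty fields are skipped, so an organizer who only fills in the purpose still produces usable context without extra effort.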

Risks / Considerations

Although providing a meeting description is best practice to help participants contribute their best to the meeting, meeting organizers may not always use this feature, or may only provide enough description for their peers, who already have context, to understand the purpose of the meeting.

Recommendation

Relying on changing the user behavior is not a fail-safe method. However, this solution is a quick win with low effort that may bring great results when combined with an intuitive user experience design.

The design should focus on reducing the time that users need to spend to write a good meeting description for the AI to do its work.

2. Federated Learning - Let Gemini learn from user feedback and previous meeting notes

User feedback in this case is whether or not users make changes to the meeting notes produced by Gemini. If users have to make changes, we can safely assume that the AI missed or captured something incorrectly.

By tracking views and engagement time, we can also assume that if users open the document without making any changes, there is a high chance that the model did a good job.

If no users view the document, there is no feedback.
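The three cases above can be sketched as a small classification function. This is a simplified illustration under my own assumptions: the NotesActivity fields, the label names, and the 30-second engagement threshold are hypothetical, and a real signal would need to be validated with evidence.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NotesActivity:
    """Hypothetical usage counters for one generated meeting notes document."""
    views: int
    edits: int
    seconds_engaged: float


def feedback_signal(activity: NotesActivity,
                    min_seconds: float = 30.0) -> Optional[str]:
    """Map document activity to an implicit feedback label, or None."""
    if activity.views == 0:
        return None                 # nobody opened the notes: no feedback
    if activity.edits > 0:
        return "needs_correction"   # users had to fix something
    if activity.seconds_engaged >= min_seconds:
        return "likely_good"        # read without changes
    return None                     # opened too briefly to infer anything
```

The engagement threshold is there because a document that was opened and closed within seconds says little about the quality of the notes.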

In addition to learning from feedback, the AI could also analyze previous meeting notes to get more context among groups of users. Each meeting notes document currently captures the meeting title, the date and time, and the participants, in addition to the notes themselves.

There could be a pattern among groups of users that Gemini can learn and remember. For example, colleagues who work on the same project may use the same terminology and abbreviations.
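One very simple way to picture this is to look for abbreviations that recur across a group's previous notes. The sketch below is only an assumption-laden illustration (the function name, the regular expression for 2 to 6 capital letters, and the min_docs threshold are all mine), not a description of how Gemini works.

```python
import re
from collections import Counter
from typing import Iterable, List

# Assumption: an "abbreviation" is a standalone run of 2 to 6 capital letters.
ABBREVIATION = re.compile(r"\b[A-Z]{2,6}\b")


def recurring_abbreviations(notes: Iterable[str],
                            min_docs: int = 2) -> List[str]:
    """Return abbreviations that appear in at least `min_docs` documents."""
    doc_counts: Counter = Counter()
    for text in notes:
        # Count each abbreviation once per document, not once per mention.
        doc_counts.update(set(ABBREVIATION.findall(text)))
    return sorted(term for term, n in doc_counts.items() if n >= min_docs)
```

For example, given two sets of notes that both mention "PRD" and "QA", the function would return those two terms as candidates for a shared team glossary.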

Risks / Considerations

Because Gemini for Google Workspace is targeting enterprise users, it is important that client companies’ data is not used to train the model that serves the general public. Indeed, the current implementation of this AI does not collect this confidential data for training, which may limit the tool's potential.

To solve this, perhaps Google can employ an implementation similar to that of the upcoming Apple Intelligence, where the AI model will run locally on each user device to provide a personalized experience while keeping privacy in mind. In the Google Workspace situation, it would be local to each account.

With this approach, the cost of running the AI locally may be much higher. However, it would provide a more personalized experience and cut down the time users spend fixing meeting minutes. Users will not have to do anything extra besides potentially acknowledging that their data privacy would be cared for.

Recommendation

This solution is worth considering, but only after quick wins are considered and implemented. With Apple Intelligence pioneering the local implementation of AI, other big tech companies should gain confidence and be compelled to adopt local AI to stay competitive.

3. Federated Learning - Iterate the AI model on each user’s instance with more data

The most personalized and seamless experience may come from the AI learning all about its users, although this may come off as dystopian to many.

With this approach, the AI is trained locally with a user’s speech pattern, emails, chats and files.

The current version of Gemini for Workspace can already read through files that a user has access to in their own and company’s shared Google Drives in order to answer user questions. It can also summarize email threads and chat spaces to help the user catch up quickly with the conversation.

A step forward would be for Gemini to understand the context of this user’s day-to-day work, the projects that they work on, and the exchanges of information prior to the meeting, so that it has a better picture going in.

Training on speech patterns would help if inaccurate meeting notes are partly a result of user accents.

Risks / Considerations

The collection of voices, even locally on each user’s account, will raise discussions regarding cybersecurity and privacy. 

The cost of personalizing for each user may be significantly higher due to the large amount of data points (number of emails, chats, files, and meeting audio).

Recommendation

This approach should be backlogged and considered in the future, only after the team implements other solutions and carefully considers the technical feasibility, the cost of doing vs. the cost of not doing, and user needs.
