
Enhancing Podcast Accessibility: A Guide to LLM Text Highlighting

Introduction

Imagine loving a podcast and wishing to remember the best bits, but it’s all sound and no text. What do you do? That’s where tools like LLMs and audio-to-text transcribers step in. They turn spoken words into written notes, letting you pick out the gems and create useful bullet points, so your favorite podcast moments are just a transcription away. Since ChatGPT’s debut in November 2022, LLMs have been all the rage. They can be used for many tasks, and text summarization is an important application. Summarization also extends to modes other than text, such as audio and video. We can use an LLM to enhance podcast accessibility by generating bulleted highlights for ease of use or as notes for future reference.

PaLM (Pathways Language Model) is a major LLM introduced by Google AI in April 2022. Its successor, PaLM 2, was announced in May 2023 as an improved and more up-to-date model with stronger multilingual, coding, and reasoning abilities. One advantage of the PaLM 2 API over other LLM APIs is that it is freely available for experimentation, and Google reports improved reasoning abilities compared with the original PaLM.

In this article, we will learn how to use these tools, namely the PaLM 2 API and Maker Suite, to create a simple Podcast Text Highlighter, and how to tune the settings of the LLM to generate better bulleted summaries. We will also look at the features of these tools and the different use cases where they can be applied. So let’s get started!

Learning Objectives

  • Understand the PaLM model and its features.
  • Learn about the model settings of PaLM.
  • Implement a Python project that generates a bulleted summary of a podcast audio.

This article was published as a part of the Data Science Blogathon.

Overview of PaLM 2

PaLM 2 builds on PaLM, a huge neural network model with 540 billion parameters that is scaled using the Pathways system to achieve breakthrough performance. PaLM 540B outperformed the state of the art at its release on a variety of multi-step reasoning tasks and outperformed average human performance on the then newly released BIG-bench benchmark. It learns the relationships between words and phrases and can use this knowledge for different tasks.

Pathways AI Architecture

Pathways is a new way of thinking about AI architecture that addresses many of the weaknesses of existing systems. Machine learning models tend to overspecialize in single tasks when they could excel at many. Below are the underlying concepts of this architecture:

  • Multiple Tasks: Instead of training thousands of different models from scratch to do different tasks, we use the same model and extend its capabilities to perform new tasks, similar to how humans approach a new task.
  • Multimodal: Pathways could enable multimodal models that encompass vision, auditory, and language understanding simultaneously. So whether the model is processing the word “leopard,” the sound of someone saying “leopard,” or a video of a leopard running, the same response is activated internally: the concept of a leopard. The result is a model that is more insightful and less prone to errors and biases.
  • Sparse and Efficient: We can build a single model that is “sparsely” activated, which means that only small pathways through the network are called into action as needed. In fact, the model dynamically learns which parts of the network are good at particular tasks and routes each task through the most relevant parts of the model. Because we don’t activate the entire network for every task, this kind of architecture not only has a larger capacity to learn a variety of tasks, but is also faster and much more energy efficient.

PaLM 2 Features

PaLM 2 has been trained on text in over 100 languages and can pass language proficiency exams at the expert level. It is often described as the second largest model by parameter count, behind GPT-4, which is reported (but not officially confirmed) to have around 1 trillion parameters. The original PaLM was trained efficiently on 6,144 TPU v4 chips across 2 pods (clusters). PaLM uses a standard Transformer architecture in a decoder-only setup, with the modifications described below.

SwiGLU Activations

SwiGLU is used in the intermediate MLP layers, where it has been shown to give better quality than ReLU, GeLU, or Swish. It uses a gating mechanism that lets the layer selectively pass information based on the input it receives. In the PaLM paper, the SwiGLU activation is defined as follows:

SwiGLU(x) = Swish(xW) ⊗ (xV)

where x is the input to the layer, W and V are two learned weight matrices, ⊗ is element-wise multiplication, and Swish(z) = z · sigmoid(z). Because there are two projections instead of one, the MLP needs three matrix multiplications rather than two.

Unlike ReLU, which is piecewise linear and outputs exactly zero for every negative input, Swish is smooth and lets small negative values through. This helps address one of the limitations of ReLU, which can result in “dead” neurons that don’t contribute to the output of a neural network.

Difference between ReLU and SwiGLU (Source)
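As a minimal sketch, the gated form used in the PaLM paper, Swish(xW) ⊗ (xV) with Swish(z) = z · sigmoid(z), can be written in plain Python. The tiny weight matrices W and V and the helper names below are made up for illustration; a real model learns these matrices and they are far larger.

```python
import math

def swish(z: float) -> float:
    """Swish (SiLU) activation: z * sigmoid(z)."""
    return z / (1.0 + math.exp(-z))

def matvec(x, w):
    """Multiply row vector x (length n) by matrix w (n rows, m columns)."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) for j in range(len(w[0]))]

def swiglu(x, w, v):
    """SwiGLU(x) = Swish(xW) * (xV), multiplied element-wise."""
    gate = [swish(z) for z in matvec(x, w)]   # gating branch
    value = matvec(x, v)                      # linear branch
    return [g * val for g, val in zip(gate, value)]

# Hypothetical weights: 2 inputs, 3 hidden units.
W = [[1.0, 0.0, -1.0],
     [0.5, 1.0, 0.0]]
V = [[1.0, 1.0, 1.0],
     [0.0, 2.0, -1.0]]

hidden = swiglu([1.0, 2.0], W, V)
print(hidden)
```

Note that the third hidden unit has a negative gate input, yet its output is not forced to zero as it would be with ReLU.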

Parallel Formulation

A parallel formulation is used in each Transformer block instead of the serialized one used in the standard formulation. In the standard block, the MLP runs on the output of the attention layer: y = x + MLP(LayerNorm(x + Attention(LayerNorm(x)))). In the parallel block, both sub-layers read the same normalized input and their outputs are added: y = x + MLP(LayerNorm(x)) + Attention(LayerNorm(x)). Because the input matrix multiplications of the MLP and attention layers can then be fused, the parallel formulation gives roughly 15% faster training at larger scales.

This is different from data parallelism, which is how the training work itself is spread over many accelerators. As an example of the latter, imagine an LLM with a vocabulary of 10,000 words, where a vector of 100 dimensions represents each word, trained on a dataset of 1 million sentences. On a single GPU, we would need to iterate over the whole dataset and update the parameters batch by batch, which can take a long time if the dataset is large. If we instead divide the dataset into 1,000 batches and process each batch on a separate GPU, the batches are handled at the same time, which significantly speeds up the training process.
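To make the block-level idea concrete, here is a small sketch contrasting the serialized block, y = x + MLP(LayerNorm(x + Attention(LayerNorm(x)))), with the parallel block, y = x + MLP(LayerNorm(x)) + Attention(LayerNorm(x)), as written in the PaLM paper. The `attention` and `mlp` functions are toy stand-ins invented for this example, not real sub-layers.

```python
import math

def layer_norm(x):
    """Normalize a vector to zero mean and unit variance (no learned scale or bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + 1e-6) for v in x]

# Toy stand-ins for the real sub-layers (assumptions for illustration only).
def attention(x):
    return [0.5 * v for v in x]

def mlp(x):
    return [max(v, 0.0) for v in x]

def add(*vectors):
    """Element-wise sum of any number of equal-length vectors."""
    return [sum(vals) for vals in zip(*vectors)]

def serial_block(x):
    # The MLP has to wait for the attention output.
    h = add(x, attention(layer_norm(x)))
    return add(x, mlp(layer_norm(h)))

def parallel_block(x):
    # LayerNorm(x) is computed once and feeds both sub-layers.
    normed = layer_norm(x)
    return add(x, mlp(normed), attention(normed))

x = [1.0, -2.0, 3.0]
print(serial_block(x))
print(parallel_block(x))
```

The two blocks give different numbers, so this is a change to the model itself, not only to how it is scheduled on hardware.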

Multi-Query Attention

The key/value projections are shared across all heads instead of being separate for each head, which leads to cost savings at autoregressive decoding time. In multi-head attention, the entire attention computation is replicated h times, whereas in multi-query attention, each “head” of the query Q has the same K and V transformation applied to it. The amount of computation performed by incremental MQA is similar to that of incremental MHA. The important difference is the reduced amount of data read from and written to memory with MQA.
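The memory saving can be illustrated with a back-of-the-envelope calculation of the key/value cache that is kept during decoding. The model shape below is hypothetical and chosen only for illustration, and 2 bytes per value assumes 16-bit storage.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence."""
    # The factor 2 covers the separate key and value tensors.
    return 2 * seq_len * n_layers * n_kv_heads * head_dim * bytes_per_value

# Hypothetical model shape (not PaLM's actual configuration).
SEQ_LEN, N_LAYERS, N_HEADS, HEAD_DIM = 2048, 32, 16, 128

# Multi-head attention stores one K/V pair per head.
mha = kv_cache_bytes(SEQ_LEN, N_LAYERS, n_kv_heads=N_HEADS, head_dim=HEAD_DIM)
# Multi-query attention stores a single K/V pair shared by all heads.
mqa = kv_cache_bytes(SEQ_LEN, N_LAYERS, n_kv_heads=1, head_dim=HEAD_DIM)

print(f"multi-head cache:  {mha / 2**20:.0f} MiB")
print(f"multi-query cache: {mqa / 2**20:.0f} MiB")
print(f"reduction factor:  {mha // mqa}x")
```

Under these assumptions the cache shrinks by a factor equal to the number of heads, which is why less data has to move through memory at each decoding step.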

RoPE Embeddings

Rotary Positional Embedding (RoPE) is a type of positional embedding that unifies the absolute and relative approaches and gives better results. It incorporates the “relative” positions of two tokens rather than only their absolute positions when calculating self-attention. Transformers employ self-attention or cross-attention mechanisms that are agnostic to the order of tokens. This means the model perceives the input tokens as a set rather than a sequence, and thereby loses important information about the relationships between tokens based on their positions in the sequence. To mitigate this, positional encodings embed information about token positions directly into the model.

This type of position embedding uses a rotation matrix to include explicit relative position dependency in the self-attention formulation. Rotary embeddings are valuable for natural language processing because they help models better understand the context in which words are used. When a model has a better idea of the position of the input tokens, it can produce more accurate predictions. For example, a language model that uses RoPE might better capture how word order distinguishes “I like pizza” from “Pizza is what I like.” A model can make more nuanced predictions with a better understanding of relative positioning.
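The relative-position property can be checked with a short sketch. Each consecutive pair of vector components is rotated by an angle proportional to the token position, and the attention score (a dot product) then depends only on the distance between the two positions. The vectors `q` and `k` are arbitrary example values, and the base of 10000 is the commonly used default, assumed here.

```python
import math

def rope(vec, position, base=10000.0):
    """Rotate consecutive pairs of vec by angles proportional to position."""
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)  # lower frequency for later pairs
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * cos_t - y * sin_t, x * sin_t + y * cos_t])
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q = [1.0, 0.5, -0.3, 2.0]   # example query vector
k = [0.2, -1.0, 0.7, 0.4]   # example key vector

score_near = dot(rope(q, 3), rope(k, 1))      # positions 3 and 1, offset 2
score_shifted = dot(rope(q, 10), rope(k, 8))  # positions 10 and 8, offset 2
score_other = dot(rope(q, 3), rope(k, 2))     # positions 3 and 2, offset 1

print(score_near, score_shifted, score_other)
```

The first two scores agree up to floating-point error because both pairs are two positions apart, while the third differs because the offset changed.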

No Biases

No biases were used in any of the dense kernels or layer norms, which increased training stability for large models. This also removes redundant parameters and improves space utilization and scaling.

Model Variants

PaLM 2 offers several variants of the model in different sizes. Google has named the models after animals according to their sizes.

  • Gecko is the smallest and fastest model and can work on edge devices like mobile phones, even when offline.
  • Otter is larger than Gecko and can perform more complex tasks.
  • Bison is larger than Otter. It is the stable model of PaLM and is widely used for text and chat.
  • Unicorn is the largest model and can be used for chat, text, and so on.

Model Parameter Settings

The model parameters help us modify the responses generated for our prompt. Let us try to understand them one by one:

Temperature

This influences the randomness of the model’s responses. A high temperature closer to 1 results in more diverse and creative responses instead of a dry set of definitions. Suppose we want to understand the meaning of a particular word and its usage. In this case, we don’t need a creative response but a dictionary meaning, so we can keep the temperature closer to 0 (near-deterministic responses). If we want to write an innovative article or story, we can keep the temperature closer to 1.

Example of Temperature at value 0.5
Example of Temperature at value 1
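The effect of temperature can be sketched with a softmax over next-word scores: the scores are divided by the temperature before being turned into probabilities. The words and scores below are invented for illustration. A temperature of exactly 0 is not handled here, since it would divide by zero; in practice it is usually treated as always picking the top word.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, sharpened or flattened by temperature."""
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-word scores for illustration.
words = ["definition", "story", "poem"]
logits = [2.0, 1.0, 0.5]

for t in (0.2, 0.5, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [f"{w}={p:.2f}" for w, p in zip(words, probs)])
```

At low temperature almost all probability sits on the top-scoring word, while at 1.0 the alternatives keep a meaningful share, which is what makes the output more varied.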

Token Limit

A token is a chunk of text, and the token limit determines how much text a model can process at once. A larger token limit lets the model take in a broader scope of information at a time, while a smaller limit restricts the amount of text it can handle. For example, PaLM 2 can take about 8,000 tokens as input at once.

Comparison of token limit for different LLM models

Top-K

When generating text, the model considers many possible words to follow the current one. Top-k sampling restricts the next-word choices to the k most likely words. A lower k value makes the content more predictable, while a higher value makes it more diverse.

Example showing how the top-k and temperature parameters of an LLM work (Source: https://michaelehab.medium.com/the-secrets-of-large-language-models-parameters-how-they-affect-the-quality-diversity-and-32eb8643e631)
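A minimal sketch of top-k filtering follows: keep the k most likely words, renormalize their probabilities, and sample from what is left. The next-word distribution is made up for illustration.

```python
import random

def top_k_filter(probs, k):
    """Keep the k most likely words and renormalize their probabilities."""
    top = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(p for _, p in top)
    return {word: p / total for word, p in top}

# Hypothetical next-word distribution for illustration.
next_word_probs = {"news": 0.5, "podcast": 0.3, "story": 0.15, "banana": 0.05}

filtered = top_k_filter(next_word_probs, k=2)
print(filtered)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
choice = rng.choices(list(filtered), weights=list(filtered.values()), k=1)[0]
print(choice)
```

With k=2 the unlikely words can never be picked, whereas a larger k would let them back in.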

Top-p

This is the probability threshold for considering words, and it controls the diversity of the output. Starting from the most likely word among the top-k choices, the model keeps adding candidate next words until their total probability reaches the top-p value. This means that rather than focusing only on the top few most likely words, the model may accept less likely words as long as they fall within the top-p probability mass, resulting in more diverse output. A higher top-p value results in a more diverse mix.
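Top-p (nucleus) filtering can be sketched the same way: walk down the words from most to least likely and stop as soon as the running total reaches p. The distribution below is invented for illustration.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of most likely words whose total probability reaches p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for word, prob in ranked:
        kept.append((word, prob))
        cumulative += prob
        if cumulative >= p:
            break
    # Renormalize so the kept probabilities sum to 1.
    return {word: prob / cumulative for word, prob in kept}

# Hypothetical next-word distribution for illustration.
next_word_probs = {"news": 0.5, "podcast": 0.3, "story": 0.15, "banana": 0.05}

narrow = top_p_filter(next_word_probs, p=0.5)
wide = top_p_filter(next_word_probs, p=0.9)
print(narrow)
print(wide)
```

A low threshold keeps only the single most likely word here, while a higher threshold admits more candidates and so allows more varied output.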

Maximum Outputs

This denotes the number of outputs generated for a particular input. That is, we can specify that we want to see more than one model response and then choose which one to use. In the image below, we can see an example where we get 2 responses for the same input when we set Max Output to 2.

Example of LLM response when maximum output is set to 2

Python Implementation of Podcast Text Highlighter

Flowchart of Python Implementation (Source)

1: Download Podcast Audio

We can download any podcast audio using this link by pasting our podcast URL. Here, we use the Indian Express podcast URL.

2: Load and Install Libraries

!pip install openai-whisper
import whisper

3: Transcribe Audio to Text

Initially, we used the “tiny” model variant, and then we used the “base” variant, which is larger and gives better results in terms of word spelling and grammar. We transcribe two audio podcasts.

Note: After downloading the mp3 audio of the podcast from the link mentioned above, upload it to your Colab environment files and paste the path of the audio file into the transcribe function as shown.

# Load the Whisper model ("base" is larger and more accurate than "tiny")
whisper_model = whisper.load_model("base")

# Transcribe audio
def transcribe(file_path: str) -> str:
    # `fp16` defaults to `True`, which tells the model to attempt to run on GPU.
    # We run this on the CPU for local demonstration purposes by setting it to `False`.
    transcription = whisper_model.transcribe(file_path, fp16=False)
    return transcription['text']

transcript = transcribe('/content/CATCH-UP-2023-Tenth-October-v1.mp3')
print(transcript)

Output

#OUTPUT
This is the catch-up on 3 things for the Indian Express, and I'm Flora Swine. 
It's the 10th of October, and here are the headlines. Four days after the Hamas attack, the 
Israeli Army said today that they've regained control of the Gaza border. 
It warned the population to flee to neighboring Egypt in a grim 
reminder of the expected retaliation. The Israeli Army also 
reported the discovery of the bodies of 1500 Hamas militants inside Israeli territory.
The ongoing conflict has claimed approximately 1,600 lives, with 900 casualties in 
Israel and nearly 700 in Gaza. Meanwhile, Prime Minister Narendra Modi took to 
extradite and said that he spoke with Israeli Prime Minister Benjamin Netanyahu, 
assuring him that India stands firmly with Israel and is difficult to protect. He also 
said that India strongly and unequivocally condemned terrorism in all its forms and 
manifestations. Chief Justice of India, D.Y. Chandrachud, said today that the 
Supreme Court's role is not to micromanage issues that arise across the country. He 
stressed that local matters are best left to the jurisdiction of the respective High 
Court. He was presiding over a three-judge bench. The CGI Maynthese remarks while 
hearing a matter related to captive elephants and said, Court, we have to 
have a broader functional understanding as a court. What is the role of the Supreme Court in the 
country? Not to deal with micromanagement of issues that arise all over the country. 
Two militants linked to the terror outfit Lashkare Thaibarvak were killed in an encounter 
with security forces in the Soviet district of Jaman Kashmir today. The encounter broke 
out when the security forces launched an anti-militancy operation in the Al-Sipura area, 
acting on intelligence regarding the presence of militants. The disease militants 
have been identified as Morifat Magbul and Jazim Farok. Chintanubhadhai was sentenced 
to life imprisonment today for his involvement in abetting and conspiring to murder 
his estranged wife, Hema Obadhai, in 2015. The Sessions Court also imposed life 
imprisonment sentences on three co-accused, namely Vijay Rajvahar, Pradeep Rajvahar, 
and Shivkuma Rajvahar. On Saturday, the prosecution sought the death penalty for 
all four individuals. The ICC Men's World Cup 2023 has two matches slated for today. 
Pakistanis facing Shilankain Hagradwadwal Bangladesh is taking on England in Haramshalla. 
In other World Cup news, New Zealand beats the Dutch to win their second game in a row at
the competition. The previously triumphed over defending Champions England in the 
tournament opener, placing them at the top of the points table. This was a catch-up on
three things by the Indian Express.

4: Generate a Summary using Maker Suite

Now, we use one podcast transcript as the training input and prepare its sample model response ourselves, and we use the other transcript as the test input. We go to this site and generate a bullet summary.

Maker Suite to generate Bulleted Text Summary (Source)

We adjust the model parameter settings to generate summaries.

5: Generate Code using Maker Suite

Generate the code using the API key of the PaLM API. We have generated our own API key from the site.

"""
At the command line, only need to run once to install the package via pip:

$ pip install google-generativeai
"""

import google.generativeai as palm
palm.configure(api_key="API_KEY")

defaults = {
  'model': 'models/text-bison-001',
  'temperature': 1,
  'candidate_count': 1,
  'top_k': 40,
  'top_p': 0.95,
  'max_output_tokens': 1024,
  'stop_sequences': [],
  'safety_settings': [{"category":"HARM_CATEGORY_DEROGATORY","threshold":4},{"category":"HARM_CATEGORY_TOXICITY","threshold":4},{"category":"HARM_CATEGORY_VIOLENCE","threshold":4},{"category":"HARM_CATEGORY_SEXUAL","threshold":4},{"category":"HARM_CATEGORY_MEDICAL","threshold":4},{"category":"HARM_CATEGORY_DANGEROUS","threshold":4}],
}
# Test input: the Whisper transcript of the 10th October episode.
# Triple quotes let the long transcript span several lines.
Sentence = """This is the catch up on three things for the Indian Express and I'm Flora Swain. It's the 10th of October and here are the headlines. Four days after the Hamas attacked the Israeli army said today that they've regained control of the Gaza border. It warned the population there to flee to neighboring Egypt while they can in a grim reminder of the retaliation that is expected to follow. The Israeli army also reported the discovery of the bodies of 1500 Hamas militants inside Israeli territory. The ongoing conflict has claimed approximately 1600 lives with 900 casualties in Israel and nearly 700 in Gaza. Meanwhile, Prime Minister Narendra Modi took to X today and said that he spoke with Israeli Prime Minister Benjamin Netanyahu assuring him that India stands firmly with Israel and this difficult art. He also said that India strongly and unequivocally condemns terrorism in all its forms and manifestations. Chief Justice of India D.Y. Chandrachud said today that the Supreme Court's role is not to micromanage issues that arise across the country. He stressed that local matters are best left to the jurisdiction of the respective high courts. Prziding over a three-judge bench the CGI made these remarks while hearing a matter related to captive elephants and said, quote, we have to as a court have broader functional understanding. What is the role of the Supreme Court in the country? Not to deal with micromanagement of issues which arise all over the country. Unquote. Two militants linked to the terror outfit Lashkaretayabah were killed in an encounter with security forces in the Soapian District of Jammun Kashmir today.
The encounter broke out after security forces launched an anti-militancy operation in the Alsepura area acting on intelligence regarding the presence of militants. The deceased militants have been identified as Mureffat Maghbul and Jasm Farukh. Chintanubhadi Haya was sentenced to life imprisonment today for his involvement in a betting and conspiring to murder his estranged wife, Hema Upadhyay in 2015. The Sessions Court also imposed life imprisonment sentences on three co-accused, namely Vijay Rajpur, Pradeep Rajpur and Shivkumar Rajpur. On Saturday the prosecution have sought the death penalty for all four individuals. The ICC men's World Cup 2023 has two matches slated for today. Pakistan is facing Sri Lanka in Hyderabad while Bangladesh is taking on England in Haramshalla. In other World Cup news New Zealand beat the Dutch to win their second game in a row at the competition. They previously triumphed over defending champions England in the tournament opener, placing them at the top of the points table. This was the Catchup on Three Things by the Indian Express."""
# Few-shot prompt: one worked example (11th October episode) followed by the test transcript.
prompt = f"""Transform a sentence into a bulleted list.
Sentence: This is the catch up on three things for the Indian Express and I'm Flora Swain. It's the 11th of October and here are the headlines. Days after the Hamas attack, the Israeli military said that it is carrying out strikes in Lebanon after an anti-tank guided missile was fired from the neighboring country at one of its posts close to the blue line. As for reports, there has been a huge buildup of troops along the Israel Gaza border as the country prepared for a ground invasion in the coming days. More than 2,000 people have lost their lives so far in the conflict which began on Saturday. The Supreme Court today took a serious exception to AIM's authorities seeking clarification of its order from the 9th of October which allowed the abortion of a 26-week-old fetus. The AIM's court cited some fresh concerns and asked why the concerns were not conveyed to the court earlier when it had sought a medical opinion on the women's request seeking permission for medical termination of pregnancy. A special bench of justices, B.V. Nagaratma and Hema Kohli also pulled up the center for approaching Chief Justice of India D.Y. Chandrachud's bench on Tuesday against its order. Samajwadi party president Akhilesh Yadav was denied permission to go inside the J.K.N.R. and International Center to offer floral tribute to Freedom Fighter J.K.N.R. on his birth anniversary. Officials cited security reasons for not allowing the former UPCM into the center today. After he was denied permission, Akhilesh reads the building and jumped the center's boundary wall along with other SP leaders and workers. K.H.N.R. ensued at the spot while the police tried to stop them from entering the premises. The poster girl of Kerala's adult literacy program, K.R.Y.H.A. Amma, died at the age of 101 at her house in Alapurha today.
In 2018, she made headlines by becoming the top scorer in the state literacy mission's flagship adult literacy program Akshana Laksham. At the age of 96, K.R.Y.H.A. scored 98 out of 100 marks in the exam that tested writing and mathematical skills. CM Pinery Vijayan in his condolence message said K.R.Y.A. was Kerala's pride and a model the person. Indian Afghanistan are battling each other in the ninth match of the ICC Cricket World Cup 2023 at the Orange JT Stadium in New Delhi today. India added to your favourites for this match having convincingly won their opening match against Australia. However, Afghanistan lost their opening match to Bangladesh but they will be looking to perform better against India. This was the Catch Up on Three Things by the Indian Express.
Bulleted: * Israeli military carried out strikes in Lebanon after an anti-tank guided missile was fired from the neighboring country.
* SC took a serious exception to AIIMS authorities seeking clarification of its order on abortion of a 26-week-old fetus.
* Akhilesh Yadav was denied permission to go inside the J.K.N.R. and International Center to offer floral tribute to Freedom Fighter J.K.N.R.
* Poster girl of Kerala's adult literacy program, K.R.Y.H.A. Amma, died at the age of 101.
* India Afghanistan are battling each other in the ninth match of the ICC Cricket World Cup 2023 at the Orange JT Stadium in New Delhi today.
Sentence: {Sentence}
Bulleted:"""

# Send the prompt to the model with the settings defined above.
response = palm.generate_text(
  **defaults,
  prompt=prompt
)
print(response.result)
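The prompt above follows a simple few-shot pattern: an instruction, one or more worked "Sentence / Bulleted" pairs, and then the new transcript with an empty "Bulleted:" slot for the model to fill. As a rough sketch, that pattern could be assembled by a small helper so that examples are easy to swap. The function name `build_prompt` and the shortened example data are hypothetical and not part of the Maker Suite output.

```python
def build_prompt(examples, new_transcript):
    """Assemble a few-shot prompt from (transcript, bullets) example pairs."""
    parts = ["Transform a sentence into a bulleted list."]
    for transcript, bullets in examples:
        parts.append(f"Sentence: {transcript}")
        parts.append("Bulleted: " + "\n".join(f"* {b}" for b in bullets))
    # The final slot is left open for the model to complete.
    parts.append(f"Sentence: {new_transcript}")
    parts.append("Bulleted:")
    return "\n".join(parts)

# Shortened, made-up example data for illustration.
examples = [
    ("India and Afghanistan meet in the ninth match of the World Cup today.",
     ["India and Afghanistan play the ninth World Cup match."]),
]
few_shot_prompt = build_prompt(
    examples, "New Zealand beat the Dutch to win their second game."
)
print(few_shot_prompt)
```

The resulting string could then be passed as the `prompt` argument in place of the hand-written f-string.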

Final Output

Below is the resulting output for our podcast. Most of the content is accurate except for the spelling of proper nouns, such as Dharamshala and Lashkar-e-Taiba.

LLM Output of Bulleted List (Source)
  • The Israeli army regained control of the Gaza border and warned the population to flee to neighboring Egypt.
  • PM Narendra Modi spoke with Israeli PM Benjamin Netanyahu. India strongly condemns terrorism.
  • CJI DY Chandrachud said SC’s role is not to micromanage issues. Local matters are best left to HC.
  • 2 militants linked to Lashkarteayabah were killed in an encounter with security forces in J&K.
  • Chintanubhadi Haya was sentenced to life imprisonment for his involvement in betting and conspiring to murder his estranged wife, Hema Upadhyay in 2015.
  • ICC men’s World Cup 2023 has two matches slated for today.
  • Pakistan faces Sri Lanka in Hyderabad, while Bangladesh is taking on England in Hharamshalla.
  • New Zealand beat the Dutch to win their second game in the competition.

Conclusion

LLMs are powerful tools that can be combined with other tools to build quick prototypes, enabling us to test and experiment with various LLM use cases. Since LLMs are a very new technology, their potential uses and implementations require a lot of back-and-forth experimentation. This is where tools like Maker Suite empower data science and analytics professionals to quickly turn their ideas into code with minimal time and effort, so they can focus on fine-tuning and improving the data and other essential components.

Key Takeaways

  • We learned about the basic concept of the PaLM 2 API and its features.
  • We also understood the various model parameter settings and how to tune them for a particular desired prompt output.
  • We saw the different aspects of the Google Maker Suite tool and used it to generate our LLM code.
  • We used Whisper and the PaLM API to generate relevant and accurate bulleted summaries of podcasts.

Frequently Asked Questions

Q1. Is the PaLM API free to use?

A. Yes, the PaLM API is open to the public for free use, but production use isn’t free.

Q2. What are the different models available in Maker Suite?

A. For now, Maker Suite only offers one model, Text-Bison.

Q3. Which is better, GPT-4 or PaLM 2 API?

A. GPT-4 is reported to have around 1 trillion parameters compared to the 540B parameters of PaLM. It also supports multimodal features such as images as input. So GPT-4 provides more features and services.

Q4. Can we get responses in other languages?

A. PaLM supports responses in other languages, but this is available only in one model, which isn’t open for public review and is a paid service.

Q5. What are the Safety Settings in PaLM API?

A. The safety settings in the PaLM API block violent, derogatory, medical, or sexual content in the model responses. In our podcast summary, the default settings blocked violent content, but once we modified the settings and reduced the filter, we got accurate output.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.
