What Is Deep Learning?
Deep learning is a type of machine learning that can process a wider range of data sources (e.g., images and text), requires less human intervention, and can produce more accurate results than traditional machine learning. Deep learning uses neural networks, which mimic the way neurons interact in the human brain.
Let’s explore three types of artificial neural networks (ANNs):
Feed-forward neural networks (FFNNs)
In this foundational neural network, information moves in one direction only: forward from the input layer, through the hidden layers, and out through the output layer. No loops exist in the network. FFNNs are the basis for object detection, as seen in the Google Photos app.
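The one-way flow described above can be sketched in a few lines of Python. This is a minimal illustration, not a production network: the layer sizes, the weights, and the helper names (`dense`, `forward`) are made up for the example, and a real network would learn its weights during training.

```python
import math

def sigmoid(x: float) -> float:
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum per neuron, then sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

# Hypothetical, hand-picked weights: 2 inputs -> 2 hidden neurons -> 1 output.
HIDDEN_W = [[0.5, -0.4], [0.3, 0.8]]
HIDDEN_B = [0.0, -0.1]
OUTPUT_W = [[1.2, -0.7]]
OUTPUT_B = [0.05]

def forward(inputs):
    """Data only ever moves forward; nothing is fed back to an earlier layer."""
    hidden = dense(inputs, HIDDEN_W, HIDDEN_B)   # input layer -> hidden layer
    return dense(hidden, OUTPUT_W, OUTPUT_B)     # hidden layer -> output layer

prediction = forward([1.0, 0.0])
print(prediction)
```

Each layer reads only the output of the layer before it, which is what makes the network "feed-forward."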
Convolutional neural networks (CNNs)
This type of FFNN is modeled after the animal visual cortex, the part of the brain that processes images. CNNs are distinguished from other neural networks by their superior performance with image, speech, and audio signal inputs. For instance, say a CNN receives an image of the letter “A.” It then processes “A” as a collection of pixels. In the hidden layers, the CNN identifies unique features such as the individual lines that make up “A.” The CNN can now classify a different image as the letter “A” if it finds that the image contains the previously identified unique feature makeup of the letter.
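To make "identifying lines in a collection of pixels" concrete, here is a small sketch of the convolution step at the heart of a CNN. The 5x5 image and the vertical-line filter are invented for illustration; in a real CNN the filter values are learned from training data rather than written by hand.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k_rows)
                for j in range(k_cols)
            )
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]

# A made-up 5x5 image containing one vertical line (1 = ink, 0 = background).
IMAGE = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]

# A hand-written 3x3 filter that responds strongly to vertical lines.
VERTICAL_LINE = [
    [-1, 2, -1],
    [-1, 2, -1],
    [-1, 2, -1],
]

feature_map = convolve2d(IMAGE, VERTICAL_LINE)
for row in feature_map:
    print(row)
```

The resulting feature map has its highest values in the middle column, where the filter sits directly on top of the line. A CNN stacks many such filters to detect the strokes that make up a letter like “A.”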
Recurrent neural networks (RNNs)
RNNs are artificial neural networks whose connections include loops. This means the model both moves data forward and loops it back, so that information from earlier steps in a sequence feeds into later ones. These deep learning algorithms are trained to process and convert sequential data. RNNs can form a much deeper understanding of a sequence due to their internal memory. RNNs are used in temporal problems such as speech recognition and language translation. Think Siri and Google Translate.
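The "internal memory" idea can be sketched with a single recurrent unit. This is a simplified illustration under stated assumptions: the weights are fixed, made-up numbers (a trained RNN would learn them), and the names `rnn_step` and `run_rnn` are hypothetical.

```python
import math

# Hypothetical fixed weights; a trained RNN would learn these values.
W_INPUT = 0.8    # weight on the current input
W_HIDDEN = 0.5   # weight on the previous hidden state (the "loop")

def rnn_step(x, h_prev):
    """Combine the current input with the memory of earlier steps."""
    return math.tanh(W_INPUT * x + W_HIDDEN * h_prev)

def run_rnn(sequence):
    """Process a sequence one item at a time, carrying the hidden state along."""
    h = 0.0  # the memory starts out empty
    states = []
    for x in sequence:
        h = rnn_step(x, h)
        states.append(h)
    return states

states_a = run_rnn([1.0, 0.0, 0.0])
states_b = run_rnn([0.0, 0.0, 0.0])
print(states_a[-1], states_b[-1])
```

Both sequences end with the same input (0.0), yet their final states differ: the first sequence still carries a trace of the 1.0 it saw at the start. That carried-over state is what lets an RNN take earlier words or sounds into account.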
What Is Supervised Learning? What Is Unsupervised Learning?
Supervised learning is a subcategory of machine learning and AI. It is defined by the use of labeled datasets to train algorithms to predict outcomes correctly.
Supervised learning can be divided into two categories, classification and regression:
- Classification is a method of assigning data to the class to which they most likely belong. For example, in email filtering, a machine learning algorithm is trained with a labeled dataset that contains spam and legitimate emails. The algorithm extracts information such as the sender’s information and the message body. It learns from the labeled dataset to identify patterns and relationships between these features and labels (spam or legitimate). Once trained, the algorithm can use extracted features to predict the label of new emails. If an email is predicted to be spam, it can be automatically filtered into a spam folder.
- Regression predicts a numerical value based on previously observed data. Examples include predicting house prices and stock prices.
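Both categories can be sketched with toy data. The example below is an illustration only: the emails, areas, and prices are invented, the spam filter uses a simple naive Bayes word count on the message body alone (one of many possible classifiers, and not necessarily what any real email service uses), and the regression is a plain least-squares line.

```python
import math
from collections import Counter

# --- Classification: a tiny spam filter (naive Bayes, add-one smoothing) ---

# Made-up labeled training data: (message body, label).
TRAINING_EMAILS = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting agenda for monday", "legitimate"),
    ("please review the project report", "legitimate"),
]

def train_spam_filter(examples):
    """Count how often each word appears under each label."""
    word_counts = {}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.split())
    return word_counts, label_counts

def predict_label(text, word_counts, label_counts):
    """Return the label whose training words best match the new text."""
    vocabulary = set().union(*word_counts.values())
    total_examples = sum(label_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(label_counts[label] / total_examples)
        for word in text.split():
            # Add-one smoothing keeps unseen words from zeroing out the score.
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)

word_counts, label_counts = train_spam_filter(TRAINING_EMAILS)
print(predict_label("claim your free prize", word_counts, label_counts))

# --- Regression: predict a price from floor area (least-squares line) ---

def fit_line(xs, ys):
    """Return the slope and intercept of the best-fit straight line."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x

# Made-up observations: floor area in square meters, price in thousands.
AREAS = [50, 80, 100, 120]
PRICES = [150, 240, 300, 360]

slope, intercept = fit_line(AREAS, PRICES)
predicted_price = slope * 90 + intercept  # estimate for a 90 square meter home
print(predicted_price)
```

The classifier outputs a category (spam or legitimate), while the regression outputs a number. That difference in output type is the distinction between the two bullets above.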
Unsupervised learning finds patterns on its own using unlabeled data, which makes its accuracy difficult to evaluate. However, it can provide insights into the underlying structure of a dataset without requiring any labels. One of the most common unsupervised learning approaches is clustering. Clustering is the process of arranging a set of objects so that the objects in the same group (i.e., the cluster) are more similar to each other than to objects in other groups. Netflix uses clustering to recommend movies by identifying which new movies are related to movies that the user has already watched.
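As a rough sketch of clustering, the snippet below groups unlabeled numbers into two clusters using a simplified version of the k-means algorithm. The data (imagined weekly viewing hours) and the function name `two_means` are assumptions made for the example, and this is not a description of how Netflix or any other service implements its recommendations.

```python
def two_means(points, iterations=10):
    """Split 1-D points into two clusters by repeatedly re-centering."""
    # Simple deterministic start: use the smallest and largest values as centers.
    centers = [min(points), max(points)]
    clusters = [[], []]
    for _ in range(iterations):
        # Step 1: assign every point to its nearest center.
        clusters = [[], []]
        for p in points:
            nearest = 0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
            clusters[nearest].append(p)
        # Step 2: move each center to the average of its assigned points.
        centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
    return centers, clusters

# Made-up, unlabeled data: weekly viewing hours for six users.
HOURS = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]

centers, clusters = two_means(HOURS)
print(centers)
print(clusters)
```

No labels were supplied, yet the algorithm separates light viewers from heavy viewers on its own, because the points within each group are closer to each other than to points in the other group.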
Machine Learning in Cybersecurity and Apps
ML models in cybersecurity learn continuously, giving products such as antivirus software an edge in detecting and blocking malware. ML classification algorithms are also used to label events such as fraud and to classify phishing attacks.
In the app world, LinkedIn is powered by ML to filter newsfeed items, make employment recommendations, and suggest connections. Spotify uses machine learning models to generate song recommendations.
What Is Generative AI?
Generative AI (GenAI) refers to algorithms that can be used to create new content, including text, images, music, audio, and videos. Text-based GenAI works by using large language models (LLMs). An LLM is a specialized type of AI that has been trained on a vast amount of data both to understand existing content and to generate original content.
What Is ChatGPT and How Does It Work?
In November 2022, OpenAI released ChatGPT, an AI chatbot that became an overnight sensation. GPT stands for Generative Pre-trained Transformer. ChatGPT is a Natural Language Processing (NLP) model. NLP helps identify meaning, intention, and sentiment in textual content. It leverages computational linguistics, which models human language through rules and algorithms. With ChatGPT, users can ask the chatbot questions, similar to how they would in a search engine. Alternatively, they can prompt the bot to generate AI-written content or reformat existing text.
What Is DALL-E and How Does It Work?
DALL-E (also developed by OpenAI) is an AI image generator. It uses GenAI to create original images from scratch based on text prompts. For example, if a user inputs the text “an avocado chair with a red colored monkey,” DALL-E will generate a new image of that imaginary object. The more detailed the description, the more detailed the image will be.
DALL-E has a specially designed neural network architecture:
- Large dataset. DALL-E was trained on hundreds of millions of image-text pairs, allowing it to connect visual concepts with textual content.
- Hierarchical structure. The network contains high-level concept layers that understand broad categories (for example, birds) and lower layers that recognize fine details (for example, beak shape, color, and placement on the face).
- Text encoding. Using the knowledge described above, DALL-E can translate written words into mathematical representations. For instance, if a user types “Seagull-tiger,” it interprets both words and blends variations of the two animals. This translation is what allows text inputs to produce visual outputs, in this case a creative image.
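The idea of turning words into numbers and then mixing them can be illustrated with a deliberately tiny toy. Everything here is hypothetical: the three-number vectors, their meanings, and the `encode` and `blend` helpers are invented for the example and do not reflect DALL-E's actual encoder, which learns far larger representations from its training data.

```python
# Hypothetical 3-number "embeddings": [has_wings, has_stripes, lives_near_water].
TOY_EMBEDDINGS = {
    "seagull": [1.0, 0.0, 1.0],
    "tiger": [0.0, 1.0, 0.0],
}

def encode(prompt):
    """Split a prompt such as 'Seagull-tiger' into known words and look up their vectors."""
    words = prompt.lower().replace("-", " ").split()
    return [TOY_EMBEDDINGS[word] for word in words if word in TOY_EMBEDDINGS]

def blend(vectors):
    """Average the vectors element by element to get one combined representation."""
    return [sum(values) / len(vectors) for values in zip(*vectors)]

blended = blend(encode("Seagull-tiger"))
print(blended)
```

The blended vector sits halfway between the two animals on every trait, which is a rough analogy for how a combined prompt can steer an image generator toward a creature with features of both.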
GenAI Current State
GPT-4 (the latest model addition) is a multimodal large language model, meaning it can respond to both text and images. For instance, if a user provides GPT-4 with a photo of ingredients in their refrigerator, it will try to come up with recipes.
DALL-E has evolved to DALL-E 3 and can understand more nuance and detail than its predecessors. The model follows complex prompts with better accuracy, generating more coherent images. DALL-E 3 also integrates with ChatGPT.
Competitors such as Google Bard, Microsoft Copilot, and Anthropic Claude have entered the arena and operate on a comparable level of quality, functionality, and usability. GenAI tools and platforms range from free to paid subscriptions, and advancements are emerging at an unprecedented pace.
In 2023, McKinsey research estimated that GenAI could contribute up to $4.4 trillion to the global economy annually. GenAI continues to make an impact across sectors, from education and law to the arts and technology. Enterprises have started exploring its potential with a focus on specific use cases tailored to a particular industry and its functions.
We must tread this exciting new terrain with curiosity and caution. AI poses significant risks, such as bias, factually incorrect content, legal issues related to plagiarism and copyright infringement, privacy concerns, harmful content, and as yet unknown risks, to name a few.
Next time on AI 101, we will take a deep dive into GenAI limitations and responsible AI!