
Oracle University Podcast

Oracle Corporation
Oracle University Podcast delivers convenient, foundational training on popular Oracle technologies such as Oracle Cloud Infrastructure, Java, Autonomous Databa...

Available Episodes

5 of 93
  • Oracle AI in Fusion Cloud Human Capital Management
    In this special episode of the Oracle University Podcast, Lois Houston and Nikita Abraham, along with Principal HCM Instructor Jeff Schuster, delve into the intersection of HCM and AI, exploring the practical applications and implications of this technology in human resources. Jeff shares his insights on bias and fairness, the importance of human involvement, and the need for explainability and transparency in AI systems. The discussion also covers the various AI features embedded in HCM and their impact on talent acquisition, performance management, and succession planning.  Oracle AI in Fusion Cloud Human Capital Management: https://mylearn.oracle.com/ou/learning-path/oracle-ai-in-fusion-cloud-human-capital-management-hcm/136722 Oracle Fusion Cloud HCM: Dynamic Skills: https://mylearn.oracle.com/ou/course/oracle-fusion-cloud-hcm-dynamic-skills/116654/ Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ Twitter: https://twitter.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, and the OU Studio Team for helping us create this episode. -------------------------------------------------------- 00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started!  00:26 Lois: Hello and welcome to the Oracle University Podcast! I’m Lois Houston, Director of Innovation Programs here at Oracle University, and with me, is Nikita Abraham, Team Lead of Editorial Services. Nikita: Hi everyone! Last week’s conversation was all about Oracle Database 23ai backup and recovery, where we dove into instance recovery and effective recovery strategies. Today’s episode is a really special one, isn’t it, Lois? 00:53 Lois: It is, indeed, Niki. Of course, all of our AI episodes are special. But today, we have our friend and colleague Jeff Schuster with us. I think our listeners are really going to enjoy what Jeff has to share with us. Nikita: Yeah definitely! Jeff is a Principal HCM Instructor at Oracle University. He recently put together this really fantastic course on MyLearn, all about the intersection of HCM and AI, and that’s what we want to pick his brain about today. Hi Jeff! We’re so excited to have you here.  01:22 Jeff: Hey Niki! Hi Lois! I feel special already. Thanks you guys so much for having me. Nikita: You’ve had a couple of busy months, haven’t you?  01:29 Jeff: I have! It’s been a busy couple of months with live classes. I try and do one on AI in HCM at least once a month or so so that we can keep up with the latest/greatest stuff in that area. And I also got to spend a few days at Cloud World teaching a few live classes (about artificial intelligence in HCM, as a matter of fact) and meeting our customers and partners. So yeah, absolutely great week. A good time was had by me.  01:55 Lois: I’m sure. Cloud World is such a great experience. And just to clarify, do you think our customers and partners also had a good time, Jeff? It wasn’t just you, right? Jeff: Haha! I don’t think it was just me, Lois. But, you know, HCM is always a big deal, and now with all the embedded AI functionality, it really wasn’t hard to find people who wanted to spend a little extra time talking about AI in the context of our HCM apps. So, there are more than 30 separate AI-powered features in HCM. 
AI features for candidates to find the right jobs; for hiring managers to find the right candidates; skills, talent, performance management, succession planning— all of it is there and it really covers everything across the Attract/Grow/Keep buckets of the things that HR professionals do for a living. So, anyway, yeah, lots to talk about with a lot of people! There’s the functional part that people want to know about—what are these features and how do they work? But obviously, AI carries with it all this cultural significance these days. There’s so much uncertainty that comes from this pace of development in that area. So in fact, my Cloud World talk always starts with this really silly intro that we put in place just to knock down that anxiety and get to the more practical, functional stuff. 03:11 Nikita: Ok, we’re going to need to discuss the functional stuff, but I feel like we’re getting a raw deal if we don’t also get that silly intro. Lois: She makes a really good point.  Jeff: Hahaha! Alright, fair enough. Ok, but you guys are gonna have to imagine I’ve got a microphone and a big room and a lot of echo. AI is everywhere. In your home. In your office. In your homie’s home office. 03:39 Lois: I feel like I just watched the intro of a sci-fi movie. Jeff: Yeah. I’m not sure it’s one I’d watch, but I think more importantly it’s a good way to get into discussing some of the overarching things we need to know about AI and Oracle’s approach before we dive into the specific features, so you know, those features will make more sense when we get there?  03:59  Nikita: What are these “overarching” things?  Jeff: Well, the things we work on anytime we’re touching AI at Oracle. So, you know, it starts with things like Bias and Fairness. We usually end up in a pretty great conversation about things like how we avoid bias on the front end by making sure we don’t ingest things like bias-generating content, which is to say data that doesn’t necessarily represent bias by itself, but could be misused. And that pretty naturally leads us into a talk about guardrails. Nikita: Guardrails? Jeff:  Yeah, you can think of those as checkpoints. So, we’ve got rules about ingestion and bias. And if we check the output coming out of the LLM to ensure it complied with the bias and fairness rules, that’s a guardrail. So, we do that. And we do it again on the apps side. And so that’s to say, even though it’s already been checked on the AI side, before we bring the output into the HCM app, it’s checked again. So another guardrail.  04:58 Lois: How effective is that? The guardrails, and not taking in data that’s flagged as bias-generating? Jeff: Well, I’ll say this: It’s both surprisingly good, and also nowhere near good enough.  Lois: Ok, that’s as clear as mud. You want to elaborate on that?  Jeff: Haha! I think all it means is that approach does a great job, but our second point in the whole “standards” discussion is about the significance of having a human in the loop. Sometimes more than one, but the point here is that, particularly in HCM, where we’re handling some really important and sensitive data, and we’re introducing really powerful technology, the H in HCM gets even more important. So, throughout the HCM AI course, we talk about opportunities to have a human in the loop. And it’s not just for reviewing things. It’s about having the AI make suggestions, and not decisions, for example. And that’s something we always have a human in the loop for all the time. 
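To make the layered guardrail and human-in-the-loop ideas from this part of the conversation concrete, here is a minimal Python sketch. Everything in it (the rule checks, the placeholder LLM call, the two-stage screening) is an illustrative assumption about the pattern being described, not Oracle's actual implementation.

# Illustrative sketch only: layered guardrails plus a human-in-the-loop step.
# The rule sets and function names are invented; they stand in for real bias/fairness checks.

def ai_side_check(text: str) -> bool:
    # Guardrail applied as the output leaves the LLM.
    return "date of birth" not in text.lower()   # placeholder rule

def app_side_check(text: str) -> bool:
    # Second guardrail applied again before the HCM app surfaces the output.
    return "marital status" not in text.lower()  # placeholder rule

def llm_generate(prompt: str) -> str:
    # Placeholder for the actual LLM call.
    return f"Draft job description for: {prompt}"

def suggest(prompt: str):
    draft = llm_generate(prompt)
    if not (ai_side_check(draft) and app_side_check(draft)):
        return None                              # blocked before anyone sees it
    return draft                                 # surfaced as a suggestion only

draft = suggest("Junior Podcast Assistant, Oracle University")
if draft is not None:
    print("Suggestion awaiting human review:")   # a person makes the decision, the AI does not
    print(draft)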
In fact, when I started teaching AI for HCM, I always said that I like to think of it is as a great big brain, without any hands.  06:00 Nikita: So, we’re not talking about replacing humans in HCM with AI.                                                                         Jeff: No, but we’re definitely talking about changing what the humans do and why it’s more important than ever what the humans do. So, think of it this way, we can have our embedded AI generate this amazing content, or create really useful predictions, whatever it is that we need. We can use whatever tools we want to get there, but we can still expect people to ask us, “Where did that come from?” or “Does this account for [whatever]?”. So we still have to be able to answer that. So that’s another thing we talk about as kind of an overarching important concept: Explainability and Transparency. 06:41 Nikita: I’m assuming that’s the part about showing our work, right? Explaining what's being considered, how it's being processed, and what it is that you're getting back. Jeff: That’s exactly it. So we like to have that discussion up front, even before we get to things like Gen and Non-Gen AI, because it’s great context to have in mind when you start thinking about the technology. Whenever we’re looking at the tech or the features, we’re always thinking about whether people are appropriately involved, and whether people can understand the AI product as well as they need to.  07:11 Lois: You mentioned Gen and Non-Gen AI. I’ve also heard people use the term “Classic AI.” And lately, a lot more about RAG and Agents. When you're teaching the course, does everybody manage to keep all the terminology straight? Jeff: Yeah, people usually do a great job with this. I think the trick is, you have to know that you need to know it, if that makes sense.  Lois: I think so, but why don’t you spell it out for us. Jeff: Well, the temptation is sometimes to leave that stuff to the implementers or product developers, who we know need to have a deep understanding of all of that. But I think what we’ve learned is, especially because of all the functional implications, practitioners, product owners, everybody needs to know it too. If for no other reason so they can have more productive conversations with their implementers. You need to know that Classic or Non-Generative AI leverages machine learning, and that that’s all you need in order to do some incredibly powerful things like predictions and matching. So in HCM, we’re talking about things like predicting time to hire, identifying suggested candidates for job openings, finding candidates similar to ones you already like, suggesting career paths for employees, and finding recommended successors. All really powerful matching stuff. And all of that stuff uses machine learning and it’s certainly AI, but none of that uses Generative AI to do that because it doesn’t need to. 08:38 Nikita: So how does that fit in with all the hype we’ve been hearing for a long time now about Gen AI and how it’s such a transformative technology that’s going to be more impactful than anything else? Jeff: Yeah, and that can be true too. And this is what we really lean into when we do the AI in HCM course live. It’s much more of a “right AI for the right job” kind of proposition. Lois: So, just like you wouldn’t use a shovel to mix a cake. Use the right tool for the job. I think I’ve got it. So, the Classic AI is what’s driving those kinds of features in HCM? The matching and recommendations?  Jeff: Exactly right. 
And where we need generative content, that’s where we add on the large language model capability. With LLMs, we get the ability to do natural language processing. So it makes sense that that’s the technology we’d use for tasks like “write me a job description” or “write me performance development tips for my employee”. 09:33 Nikita: Ok, so how does that fit in with what Lois was asking about RAG and Agents? Is that something people care about, or need to? Jeff: I think it’s easiest to think about those as the “what’s next” pieces, at least as it relates to the embedded AI. They kind of deal with the inherent limitations of Gen and Non-Gen components. So, RAG, for example - I know you guys know, but your listeners might not...so what’s RAG stand for? Lois & Nikita: Retrieval. Augmented. Generation. Jeff: Hahaha! Exactly. Obviously. But I think everything an HCM person needs to know about that is in the name. So for me, it’s easiest to read that one backwards. Retrieval Augmented Generation. Well, the Generation just means it’s more generative AI. Augmented means it’s supplementing the existing AI. And Retrieval just tells you that that’s how it’s doing it. It’s going out and fetching something it didn’t already have in order to complete the operation. 10:31 Lois: And this helps with those limitations you mentioned? Nikita: Yeah, and what are they anyway?  Jeff: I think an example most people are familiar with is that large language models are trained on this huge set of information. To a certain point. So that model is trained right up to the point where it stopped getting trained. So if you’re talking about interacting with ChatGPT, as an example, it’ll blow your doors off right up until you get to about October of 2023 and then, it just hasn’t been trained on things after that. So, if you wanted to have a conversation about something that happened after that, it would need to go out and retrieve the information that it needed. For us in HCM, what that means is taking the large language model that you get with Oracle, and using retrieval to augment the AI generation for the things that the large language model wouldn’t have had.  11:22 Nikita: So, things that happened after the model was trained? Company-specific data? What kind of augmenting are you talking about? Jeff: It’s all of that. All those things happen and it’s anything that might be useful, but it’s outside the LLM’s existing scope. So, let’s do an example. Let’s say you and Lois are in the market to hire someone. You’re looking for a Junior Podcast Assistant. We’d like the AI in HCM to help, and in order to do that, it would be great if it could not just generate a generic job description for the posting, but it could really make it specific to Oracle. Even better, to Oracle University.  So, you’d need the AI to know a few more things in order to make that happen. If it knows the job level, and the department, and the organization—already the job posting description gets a lot better. So what other things do you think it might need to know? 12:13 Lois: Umm I’m thinking…does it need to account for our previous hiring decisions? Can it inform that at all? Jeff: Yes! That’s actually a key one. If the AI is aware not only of all the vacancies and all of the transactional stuff that goes along with it (like you know who posted it, what’s its metadata, what business group it was in, and all that stuff)...but it also knows who we hired, that’s huge. 
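As a rough illustration of the retrieval-augmented generation pattern Jeff is describing, the sketch below fetches company-specific facts the base model was never trained on and folds them into the prompt before generation. The data, helper names, and prompt wording are all invented for the example.

# Illustrative RAG sketch: Retrieval -> Augmentation -> Generation.
# The facts and function names are invented; a real system would query HCM data stores.

COMPANY_FACTS = {
    "organization": "Oracle University",
    "department": "Podcast Production",
    "job_level": "Junior",
    "recent_similar_hires": ["Editorial Assistant, hired in 21 days"],
}

def retrieve(task: str) -> dict:
    # Retrieval: go get what the LLM could not already know (post-cutoff or company data).
    return COMPANY_FACTS

def llm_generate(prompt: str) -> str:
    # Placeholder for the large language model call.
    return f"[generated text based on a {len(prompt)}-character prompt]"

def rag_generate(task: str) -> str:
    facts = retrieve(task)
    context = "\n".join(f"- {key}: {value}" for key, value in facts.items())
    prompt = f"{task}\n\nUse this company context:\n{context}"   # Augmentation
    return llm_generate(prompt)                                   # Generation

print(rag_generate("Write a job description for a Junior Podcast Assistant"))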
So if we put all that together, we can start doing the really cool stuff—like suggesting candidates based not only on their apparent match on skills and qualifications, but also based on folks that we’ve hired for similar positions. We know how long it took to make those hires from requisition open to the employee’s first start date. So we can also do things like predicting time to hire for each vacancy we have with a lot more accuracy. So now all of a sudden, we’re not just doing recruiting, but we have a system that accounts for “how we do it around here,” if that makes any sense.  But the point is, it’s the augmented data, it’s that kind of training that we do throughout ingestion, going out to other sources for newer or better information, whatever it is we need. The ability to include it alongside everything that’s already in the LLM, that’s a huge deal.  13:31  Nikita: Ok, so I think the only one we didn’t get to was Agents. Jeff: Yeah, so this one is maybe a little less relevant in HCM—for now anyway. But it’s something to keep an eye on. Because remember earlier when I described our AI as having a great big brain but no hands?  Lois: Yeah... Jeff: Well, agents are a way of giving it hands. At least for a very well-defined, limited set of purposes. So routine and repetitive tasks. And for obvious reasons, in the HCM space, that causes some concerns. You don’t want, for example, your AI moving people forward in the recruiting process or changing their status to “not considered” all by itself. So going forward, this is going to be a balancing act. When we ask the same thing of the AI over and over again, there comes a point where it makes sense to kind of “save” that ask. When, for example, we get the “compare a candidate profile to a job vacancy” results and we got it working just right, we can create an agent. And just that one AI call that specializes in getting that analysis right. It does the analysis, it hands it back to the LLM, and when the human has had what they need to make sure they get what they need to make a decision out of it, you’ve got automation on one hand and human hands on the other...hand. 14:56 Have you mastered the basics of AI? Are you ready to take your skills to the next level? Unlock the potential of advanced AI with our OCI Generative AI Professional course and certification that covers topics like large language models, the OCI Generative AI Service, and building Q&A chatbots for real-world applications. Head over to mylearn.oracle.com to find out more. 15:26 Nikita: Welcome back! Jeff, you’ve mentioned the “Time to Hire” feature a few times? Is that a favorite with people who take your classes? Jeff: The recruiting folks definitely seem to enjoy it, but I think it’s just a great example for a couple of reasons. First, it’s really powerful non-generative AI. So it helps emphasize the point around the right AI for the right job. And if we’re talking about things in chronological order, it’s something that shows up really early in the hire-to-retire cycle. And, you know, just between us learning nerds, I like to use Time to Hire as an early example because it gets folks in the habit of working through some use cases. You don’t really know if a feature is going to get you what you need until you’ve done some of that. So, for example, if I tell you that Time to Hire produces an estimated number of days to your first hire. And you’re still Lois, and you’re still Niki, and you’re hiring for a Junior Podcast Assistant. So why do you care about time to hire? 
And I’m asking you for real—What would you do with that prediction if you had it?  16:29 Nikita: I guess I’d know how long it is before I can expect help to arrive, and I could plan my work accordingly. Jeff: Absolutely. What else. What could you do with a prediction for Time to Hire? Lois: Think about coverage? Jeff: Yeah! Exactly the word I was looking for. Say more about that.  Lois: Well, if I know it’s gonna be three months before our new assistant starts, I might be able to plan for some temporary coverage for that work. But if I had a prediction that said it’s only going to be two weeks before a new hire could start, it probably wouldn’t be worth arranging temporary coverage. Niki can hold things down for a couple of weeks. Jeff: See, I’m positive she could! That’s absolutely perfect! And I think that’s all you really need to have in terms of prerequisites to understand any of the AI features in HCM. When you know what you might want to do with it, like predicting the need for temp cover, and you’ve got everything we talked about in the foundation part of the course—the Gen and the Classic, all that stuff, you can look at a feature like Time to Hire and then you can probably pick that up in 30 seconds. 17:29 Nikita: Can we try it? Jeff: Sure! I mean, you know, we’re not looking at screens for this conversation, but we can absolutely try it. You’re a recruiter. If I tell you that Time to Hire is a feature that you run into on the job requisition and it shows you just a few editable fields, and then of course, the prediction of the number of days to hire—tell me how you think that feature is going to work when you get there. Lois: So, what are the fields? And does it matter? Jeff: Probably not really, but of course you can ask. So, let me tell you. Ready? The fields—they are these. Requisition Title, Location, and Education Level.  Nikita: Ok, well, I have to assume that as I change those things… like from a Junior Podcast Assistant to a Senior Podcast Assistant, or change the location from Redwood Shores to Detroit, or change the required education, the time to hire is going to change, right?  Jeff: 100%, exactly. And it does it in real time as you make those changes to those values. So when you pick a new location, you immediately get a new number of days, so it really is a useful tool. But how does it work? Well, we know it’s using a few fields from the job requisition, but that’s not enough. Besides those fields, what else would you need in order to make this prediction work? 18:43 Lois: The part where it translates to a number of days. So, this is based on our historic hiring data? How long it took us to hire a podcast assistant the last time? Jeff: Yep! And now you have everything you need. We call that “historic data from our company” bit “ingestion,” by the way. And there’s always a really interesting discussion around that when it comes up in the course. But it’s the process we use to bring in the HCM data to the AI so it can be considered or predictions exactly like this. Lois: So it’s the HCM data making the AI smarter and more powerful. Nikita: And tailored. Jeff: Exactly, it’s all of that. And obviously, the HCM is better because we’ve given it the AI. But the AI is also better because it has the HCM in it. But look, I was able to give you a quick description of Time to Hire, and you were able to tell me what it does, which data it uses, and how it works in just a few seconds. So, that’s kind of the goal when we teach this stuff. 
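The non-generative, machine-learning flavor of this feature can be sketched in a few lines of Python: train on ingested historical hires, then predict days-to-hire for a new requisition as its fields change. The toy data, the crude encoding, and the use of scikit-learn are assumptions made for illustration only.

# Illustrative "Time to Hire" sketch: classic ML over ingested history, no generative AI needed.
from sklearn.linear_model import LinearRegression

# Ingested history: (requisition title, location, education level) -> days to first hire.
history = [
    (("Junior Podcast Assistant", "Redwood Shores", "Bachelors"), 21),
    (("Senior Podcast Assistant", "Redwood Shores", "Bachelors"), 45),
    (("Junior Podcast Assistant", "Detroit", "High School"), 14),
]

# Very crude one-hot encoding of the three requisition fields, just for the sketch.
vocab = sorted({value for fields, _ in history for value in fields})

def encode(fields):
    return [1 if value in fields else 0 for value in vocab]

model = LinearRegression().fit(
    [encode(fields) for fields, _ in history],
    [days for _, days in history],
)

# Change any field and the estimate updates, mirroring the real-time behavior described above.
new_requisition = ("Junior Podcast Assistant", "Detroit", "Bachelors")
print(f"Estimated time to hire: {model.predict([encode(new_requisition)])[0]:.0f} days")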
It’s getting everybody ready to be productive from moment #1 because what is it and how does it work stuff is already out of the way, you know?  19:52 Lois: I do know! Nikita: Can we try it with another one? Jeff: Sure! How about we do...Suggested Candidates. Lois: And you’re going to tell us what we get on the screen, and we have to tell you how it works, right? Jeff: Yeah, yeah, exactly. Ok—Suggested Candidates. You’re a recruiter or a hiring manager. You guys are still looking for your Junior Podcast Assistant. On the requisition, you’ve got a section called Suggested Candidates. And you see the candidate’s name and some scores. Those scores are for profile match, skills match, experience match. And there’s also an overall match score, and the highest rated people you notice are sorted to the top of the list. So, you with me so far?  Lois: Yes! Jeff: So you already know that it’s suggesting candidates. But if you care about explainability and transparency like we talked about at the start, then you also care about where these suggested candidates came from. So let’s see if we can make progress against that. Let’s think about those match scores. What would you need in order to come up with match scores like that? 20:54 Nikita: Tell me if I’m oversimplifying this, but everything about the job on the requisition, and everything about the candidate? Their skills and experience? Jeff: Yeah, that’s actually simplified pretty perfectly. So in HCM, the candidate profile has their skills and experience, and the req profile has the req requirements.  Lois: So we’re comparing the elements of the job profile and the person/candidate profile. And they’re weighted, I assume? Jeff: That’s exactly how it works. See, 30 seconds and you guys are nailing these! In fairness, when we discuss these things in the course, we go into more detail. And I think it’s helpful for HCM practitioners to know which data from the person and the job profiles is being considered (and sometimes just as important, which is not being considered). And don’t forget we’re also considering our ingested data. Our previously selected candidates. 21:45 Lois: Jeff, can I change the weighting? If I care more about skills than experience or education, can I adjust the weighting and have it re-sort the candidates? Jeff: Super important question. So let me give you the answer first, which is “no.” But because it’s important, I want to tell you more. This is a discussion we have in the class around Oracle’s Embedded vs. Custom AI. And they’re both really important offerings. With Embedded, what we’re talking about are the features that come in HCM like any other feature. They might have some enablement steps like profile options, and there’s an activation panel. But essentially, that’s it. There’s no inspection panel for you to open up and start sticking your screwdriver in there and making changes. Believe it or not, that’s a big advantage with Embedded AI, if you ask me anyway.  Nikita: It’s an advantage to not be able to configure it? Jeff: In this context, I think you can say that it is. You know, we talk about the advantages about the baked-in, Embedded AI in this course, but one of the key things is that it’s pre-built and pre-tested. And the big one: that it’s ready to use on day one. But one little change in a prompt can have a pretty big butterfly effect across all of your results. So, Oracle provides the Embedded AI because we know it works because we’ve already tested it, and it’s, therefore, ready on day one. 
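A small Python sketch of the weighted-match idea behind Suggested Candidates follows. The individual match functions, the weights, and the sample data are invented for illustration; in the embedded feature the weighting is pre-built and, as Jeff notes, not something you can reconfigure.

# Illustrative sketch: score candidates against a requisition, combine with fixed weights, sort.
WEIGHTS = {"skills": 0.5, "experience": 0.3, "profile": 0.2}   # fixed, illustrative values

def skills_match(candidate, req):
    required = set(req["skills"])
    return len(required & set(candidate["skills"])) / len(required) if required else 0.0

def experience_match(candidate, req):
    return min(candidate["years_experience"] / req["years_required"], 1.0)

def profile_match(candidate, req):
    # Stand-in for a richer comparison (education, prior titles, ingested hiring history, ...).
    return 1.0 if candidate["education"] == req["education"] else 0.5

def overall(candidate, req):
    return (WEIGHTS["skills"] * skills_match(candidate, req)
            + WEIGHTS["experience"] * experience_match(candidate, req)
            + WEIGHTS["profile"] * profile_match(candidate, req))

req = {"skills": ["audio editing", "scheduling"], "years_required": 2, "education": "Bachelors"}
candidates = [
    {"name": "Candidate A", "skills": ["audio editing"], "years_experience": 1, "education": "Bachelors"},
    {"name": "Candidate B", "skills": ["audio editing", "scheduling"], "years_experience": 3, "education": "Masters"},
]

for c in sorted(candidates, key=lambda c: overall(c, req), reverse=True):   # highest match on top
    print(c["name"], round(overall(c, req), 2))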
And I think that story maybe changes a little bit when you open up the inspection panel and bust out that screwdriver. Now you’re signing up to be a test pilot. And that’s just fundamentally different than “pre-built and ready on day one.” Not that it’s bad to want configuration. 23:24 Lois: That’s what the Custom AI path and OCI are about though, right? For when customers have hyper-specific needs outside of Oracle’s business processes within the apps, or for when that kind of tuning is really required. And your AI for HCM course—that focuses on the Embedded AI instead of Custom, yes? Jeff: That is exactly it, yes. Nikita: You said there are about 30 of these AI features across HCM. So, when you teach the course, do you go through all of them or are there favorites? Ones that people want to spend more time on so you focus on those? Jeff: The professional part of me wants to tell you that we do try to cover all of them, because that explainability and transparency business we talked about at the beginning. That’s for real, so I want our customers to have that for the whole scope.  24:12  Nikita: The professional part? What’s the other part?  Jeff: I guess that’s the part that says sure, we need to hit all of them. But some of them are just inherently more fun to work on. So, it’s usually the learners who drive that in the live classes when they get into something, that’s where we spend the most time. So, I have my favorites too. The learners have their favorites. And we spend time where it’s everybody’s favorite. Lois: Like where? Jeff: Ok, so one is far from the most complex one, but I think it’s really elegant in its simplicity. And it’s the Celebrate feature, where we do employee recognition. There’s an AI Assist available there. So when it’s time to recognize a colleague, you just need to enter the headline or the title, and the AI takes it from there and just writes up the recognition. 24:56 Lois: What about that makes it a good example, Jeff? You said it’s elegant. What do you mean?  Jeff: I think it’s a few things. So, start with the prompt. It’s just the one line—just the headline. And that’s your one input. So, type in the headline, get the recognition below. It’s a great demonstration of not just the simplicity, but the power we get out of that simplicity. I always ask it to recognize my employees for implementing AI features in Oracle HCM, just to see what it comes up with. When it tells the employee that they’re helping the company by automating routine tasks, bringing efficiency to the HR department, and then launches into specific examples of how AI features help in HCM, it really is pretty incredible. So, it’s a simple demo, but it explains a lot about how the Gen AI works. Lois: That’s really cool. 25:45 Nikita: So this one is generative AI. It’s using the large language model to create the recognition based on the prompt, which is basically just whatever you entered in the headline. But how does that help explain how Gen AI works in HCM? Jeff: Well, let’s take our simple prompt for example. There’s a lot happening behind the scenes. It’s taking our prompt, it’s doing its LLM thing, but before it’s done, it’s creating the results in a very specific way. An employee recognition reads really differently than a job description. So, I usually describe this as the hidden part of our prompt. The visible part is what we typed. But it needs to know things like our desired output format. 
Make sure to use the person’s name, summarize the benefits, and be sure to thank them for their contribution, that kind of stuff. So, those things are essentially hard-coded into the page. And that’s to say, this is another area where we don’t get an inspection panel that lets us go in and tweak the prompt.  26:42 Nikita: And that’s generally how generative AI works? Jeff: Pretty much. Wherever you see an AI Assist button in HCM, that’s more or less what’s going on. And so when you get to some of the other more complex features, it’s helpful to know that that is what’s going on.  Lois: Like where? Jeff: Well, it works that way for the About Me part of your employee profile, for goal creation in performance, and I think a really great example is in performance, where managers are providing the competency development tips. So the prompt there is a little more complex there because it involves the employee’s proficiency rating instead of free text. But still, pretty straightforward. You’re gonna click AI Assist and it’s gonna generate all the development tips for any specific competency listed for that employee. Good development tips. Five of them. Nicely formatted with bullet points. And these aren’t random words assembled by an AI. So they conform to best practices in the development of competencies. So, something is telling the LLM to give us results that are that good, in that particular way.  So, it’s just another good example of the work AI is doing while protected behind the inspection panel that doesn’t exist. So, the coding of that page, in combination with what the LLM generates and the agent that it uses, is what produces the result. That’s generally the approach. In the class, we always have a good time digging into what must be going on behind that inspection panel. Generally speaking, the better feel we have for what’s going on on these pages, the better we’re able to get the results we want, even without having that screwdriver out. 28:21  Nikita: So it’s time well-spent, looking at all the individual features? Jeff: I think so, especially if you’re anticipating really using any of them. So, the good news is, once you learn a few of them and how they work, and what they’re best at, you stop being surprised after a while. But there are always tips and tricks. And like we talked about at the top, explainability and transparency are absolutely key. So, as much as I’m not a fan of the phrase, I do think this is kind of a “knowledge is power” kind of situation. 28:51 Nikita: Sadly, we’re just about out of time for this episode.  Lois: That’s too bad, I was really enjoying this. Jeff, you were just talking about knowledge—where can we get more?  Jeff: Well, like you mentioned at the start, check out the AI in HCM course on MyLearn. It’s about an hour and a half, but it really is time well spent. And we get into detail on everything the three of us discussed here today, and then we have demoscussions of every feature where we show them and how they work and which data they’re using and a whole bunch more. So, there’s that. Plus, I hear the instructor is excellent. Lois: I can vouch for that! Jeff: Well, then you should definitely look into Dynamic Skills. Different instructor. But we have another course, and again I think about an hour and a half, but when you’re done with the AI course, I always feel like Dynamic Skills is where you really wanna go next to really flesh out all the Talent Management ideas that got stirred up while you were having a great time in the AI course.  
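Jeff's earlier point about the "hidden part of the prompt" behind AI Assist can be sketched like this: the user types only a headline, and the page supplies the rest of the instructions before the LLM is called. The instruction text and function names are invented for illustration; the real prompts sit, as he says, behind an inspection panel that doesn't exist.

# Illustrative sketch of a hidden prompt behind an AI Assist button (e.g., Celebrate recognitions).
HIDDEN_INSTRUCTIONS = (
    "Write an employee recognition message. "
    "Use the employee's name, summarize the benefits of the contribution, "
    "thank them for it, and keep the result to one short paragraph."
)

def llm_generate(prompt: str) -> str:
    # Placeholder for the large language model call.
    return f"[recognition text generated from a {len(prompt)}-character prompt]"

def ai_assist_recognition(headline: str, employee_name: str) -> str:
    visible_part = f"Headline: {headline}"                      # the only thing the user typed
    prompt = f"{HIDDEN_INSTRUCTIONS}\nEmployee: {employee_name}\n{visible_part}"
    return llm_generate(prompt)

print(ai_assist_recognition("Implementing AI features in Oracle HCM", "Jordan"))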
And then finally, the live classes. It’s always really fun to take live questions while we talk about AI in HCM.   29:54 Nikita: Thanks, Jeff! This has been really interesting.  Lois: Yeah, thanks for being here, Jeff. We’ve loved having you on. Jeff: Thank you guys so much for having me. It’s been a pleasure.  Lois: If you want to learn more about what we discussed, go to the show notes for today’s episode. You’ll find links to the AI for Human Capital Management and Dynamic Skills courses that Jeff mentioned so you can check them out. You can also head over to mylearn.oracle.com to find the live sessions for MyLearn subscribers that Jeff conducts. Nikita: Join us next week as we kick off our “Best of 2024” season, where we’ll be revisiting some of our most popular episodes of the year. Until then, this is Nikita Abraham…  Lois: And Lois Houston, signing off!   30:35 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
    --------  
    31:04
  • Oracle Database 23ai: Backup and Recovery - Part 2
    Lois Houston and Nikita Abraham continue their deep dive into Oracle Database 23ai backup and recovery strategies with Senior Principal Database & MySQL Instructor Bill Millar.   Picking up from Part 1, they explore critical concepts such as instance recovery, checkpoint processes, and the role of redo log files. Bill shares insights into complete and incomplete recovery, flashback technologies, and lots more.   Oracle MyLearn: https://mylearn.oracle.com/ou/course/oracle-database-23ai-backup-and-recovery/141127/   Oracle University Learning Community: https://education.oracle.com/ou-community   LinkedIn: https://www.linkedin.com/showcase/oracle-university/   Twitter: https://twitter.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, Radhika Banka, and the OU Studio Team for helping us create this episode.   --------------------------------------------------------   Episode Transcript:   00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started! 
00:26 Nikita: Welcome back to the Oracle University Podcast! I’m Nikita Abraham, Team Lead of Editorial Services with Oracle University, and with me is Lois Houston, Director of Innovation Programs. Lois: Hi everyone! Last week, we had a fantastic chat with Bill Millar, our Senior Principal Database & MySQL Instructor. We dug into the basics of backup and recovery. We touched on everything from a DBA’s role in preventing data loss to handling different types of failures, and even some common mistakes that tend to pop up when managing a database. Nikita: Yeah, if you missed that episode, definitely go back and check it out. It’s packed with useful info, especially if you’re in charge of keeping databases safe. 01:10 Lois: Today, we’re picking up where we left off. We’re going to ask Bill about instance recovery and recovery strategies. Bill, can you kick things off by explaining what instance recovery is? Bill: You can understand instance recovery by becoming familiar with the checkpoint process, the redo log files, and the role of the log writer with the redo log files. Automatic instance or crash recovery: what is it doing? What are the phases of instance recovery? How can we possibly tune that instance recovery? We can use the mean time to recover advisor that can help us determine how we might tune the instance recovery. 01:51 Nikita: OK, so let’s go through some of these concepts and procedures you mentioned. What is the checkpoint process responsible for exactly? Bill: The checkpoint process itself, it's responsible for updating the data file headers with checkpoint information. When a checkpoint is taken, it is going to write into the controlfiles. It tells the DB writer to write. DB writer writes to the data files, and the checkpoint is also annotated in the data files. So updating controlfiles with that checkpoint information also, controlfiles and database files. It signals the DB writer at full checkpoints again: hey, it's time to write. So that way, it has the latest data written to the data files. The controlfile and datafiles, those are in sync with that. 02:40 Lois: Bill, what about the log writer process and the redo log files? Bill: With the log writer process and the redo log files, the redo log files record the changes to the database itself. They should be multiplexed. 02:53 Nikita: What do you mean by that? Bill: More than one redo log group. Now, the redo log groups, it is recommended that they should be multiplexed. Each group member should be on a different disk or in a different disk group if you're using ASM. 03:10 Nikita: And why is that, Bill? Bill: Because if I lose one, if I lose one redo log group, one member, I can continue to operate with just the one. If I only have one redo log group member and the system comes around and tries to write to it, then my system is going to come to a halt. So the log writer is going to write to those redo logs whenever somebody does a commit, when that redo log buffer is 1/3 full, or every three seconds, and before DB writer writes. So those are the four mechanisms that tell the log writer to write from that log buffer to the redo log files. And it'll also write when we do a shutdown; all the buffers will be flushed. And so that way, everything will be in sync when the system is shut down. 04:01 Lois: What are the different modes of operation for a database, Bill? And how do these modes impact the recovery capabilities of the database? Bill: So we have two different modes we can operate in. One is called NOARCHIVELOG mode.
It is the default. ARCHIVELOG mode, highly encouraged. But not every environment has to be in ARCHIVELOG mode. 04:21 Nikita: So with NOARCHIVELOG mode… Bill: Closed database. You have to close it, recover to the last backup. That's as far as I can go. Actually, I could, depending on what happens, I might be able to apply some redo. Suitable for training and test environments, or for data warehouses where we don't have a lot of frequent changes. It's mainly bulk loading data at night and querying during the day. So it might be appropriate for that. Because ARCHIVELOG mode, it is a little overhead. Yes. So with that database, if it goes down while it's open, the system, when it comes up, can recover to the last committed transaction. And this is usually the mode we want to operate in for production environments. So we have that data in the buffer cache. We have that redo being buffered. We have the undo tablespace, keeping track of what the data was before a change. The redo keeps track of what was the change. And if we're in ARCHIVELOG mode, as we switch from one redo log to another, we will generate what's referred to as archived log files, and that's what allows us to do a complete recovery. 05:33 Lois: What happens in the case of automatic instance recovery? Bill: For an automatic instance recovery or crash recovery, our system went down unexpectedly. Because it did not do a clean shutdown, the buffers were not flushed. Everything was not synchronized. So the datafile, controlfile, everything is out of sync. 05:53 Nikita: So, how do the files get synchronized then? Bill: It uses the redo log groups to synchronize the files. It's going to roll forward. It rolls forward the changes that were made. So, two distinct operations. Roll forward applies committed and uncommitted data. And the redo does not keep track of what was committed and uncommitted. It'll keep track of, hey, I had this transaction, hey, here's a commit for that transaction. But hey, I have a transaction that was never committed. That's the job of the undo. But it rolls forward all those changes. And then anything that did not actually receive a commit, it will roll back the uncommitted data, return to the original state. And that is the job of the undo tablespace. 06:37 Lois: Bill, is it possible to tune instance recovery for better performance? Bill: You can try to tune this instance recovery. Tuning it is touchy. Be careful because you can cause more harm than the good you think you might be doing. The instance recovery, what we're doing, we're trying to-- the transactions between checkpoints. When was the last checkpoint? Because the items between the checkpoints, that's what has to be reapplied. So the last checkpoint to the last redo log, what is that time frame there between those? Well, what we're going to do, we're going to try to control that. We're going to try to control the difference between the checkpoint and the end of the redo log. There is a mean time to recover advisor. You specify the desired time in seconds or minutes for how often you want that checkpoint to occur. There is a parameter, FAST_START_MTTR_TARGET, that you can set. The default value is zero, saying, hey, I'm going to let the system take care of it. And the maximum you can set it to is one hour. 07:46 Nikita: And why 1 hour?
 Bill: The reason being, if I set that to one hour and I have a lot of activity, how long is it going to take? How many transactions can happen within that hour? Yeah, I'm not doing a checkpoint as often, so I'm eliminating that workload. But if it has to recover, how long is it going to take? If I set it too small, the system says, hey, right now, it's going to take me 19 seconds based off statistics. If I said, OK, I want it in five seconds. So what does that mean? Every five seconds, I'm saying do a checkpoint. So what is it doing? OK, time to do a checkpoint. OK, time to go ahead and OK, DB writer write. OK, log writer write. OK, let me update the datafiles and the controlfiles. So you're just thrashing your system. So be careful if you decide to try to manually tune it. And when you go out and look at this mean time to recover, and even if you do it through the command line, you'll see that, that value is most likely going to change throughout the day, depending on the workload that you have. 08:46 Lois: How does the process of restoring and recovering data typically work? Bill: So when we restore, we're restoring our datafiles. All the datafiles, tablespace, controlfiles, archived redo log, server parameter file. Then when we recover, it involves depending on the backup that we use and other factors in there, it is going to apply the redo. So automatically done by RMAN. So I tell it, this is what I want to do. Hey, I want to restore a database. OK, RMAN says, all right, what backup are you going to use? What is it I need to restore? And then we tell it to recover. OK, I know what I need to use to recover. So RMAN can do the work for you. So when we restore and recover due to a manual process and there's different methods that we can use, and depending on the failure, we'll drive what type of restore and recovery we might perform. 09:40 Are you looking for practical use cases to help you plan and apply configurations that solve real-world challenges? With the new Applied Learning courses for Cloud Applications, you'll be able to practically apply the concepts learned in our implementation courses and work through case studies featuring key decisions and configurations encountered during a typical Oracle Cloud Applications implementation. Applied learning scenarios are currently available in General Ledger, Payables, Receivables, Accounting Hub, Global Human Resources, Talent Management, Inventory, and Procurement, with many more to come! Visit mylearn.oracle.com to get started. 10:22 Nikita: Welcome back! Can you talk about the different types of recovery scopes, Bill? How do they compare? Bill: Recovery can have two kinds of scope. All right. One is the complete recovery. We are getting the database back to the current time of the crash with no loss of data. We're going to again bring everything back to the present. Incomplete or point-in-time recovery. We're going to take a database or maybe a tablespace or even a table back to a point-in-time in the past. So from the time that we select to take it to recover, everything that was done after that is null and void, is gone missing. That's why it's called incomplete recovery, because it's not complete. 11:09 Lois: What are the steps that take place during complete recovery? Bill: We restore the datafiles. Changes are applied. We're applying the redo. The datafiles contained committed and uncommitted transactions. The undo is applied. Anything that did not receive an actual commit will take back to the original value. 
And we have our datafiles recovered. 11:33 Nikita: And what about point-in-time recovery? Bill: Point-in-time recovery, very similar. We're going to restore the datafiles from as far back as necessary. Changes are applied. So the data files are going to contain the committed and uncommitted up to that point-in-time. Database is open, that redo, that undo, anything that did not actually receive a commit. The undo is applied. The point-in-time recovered is complete. We're not applying all the redo, all the changes, only up to the time that we specify. 12:08 Lois: Are there any features that can make point-in-time recovery quicker? Bill: We also have the ability to use flashback database. It is an optional feature. And it can be a quick way to do that point-in-time recovery. It is an alternative to that database point-in-time recovery we just looked at. Faster. No restore is required. It's going to rewind the database. It does require some configuration in the environment. We do have to set up in order to use flashback database. 12:41 Nikita: I want to talk about Oracle’s data protection solutions, particularly when it comes to backup and recovery or disaster recovery. Bill: So for physical data protection-- backup and recovery objective. Yep, that works for both physical and logical. My recovery time, hours to days. Possibly minutes to hours for the logical. And Oracle solution, we have the Recovery Manager that's out of the box, RMAN. Oracle Secure Backup, that is Oracle's media management library system backing up to tape. The logical protection, yes, flashback technologies can help me take care of that very easily. For disaster recovery, physical data protection, recovery time objective, seconds to minutes. We're not going to accomplish that with RMAN. You're going to want to use our Data Guard with the Active Data Guard feature to be able to switch over to a standby database within seconds of a failure. 13:41 Lois: Why would someone choose to use flashback technologies for recovery, Bill? Bill: With the flashback technologies, we can use it for viewing data as past dates. What did it look like? We can wind the database back and forth in time. Assist users in an error analysis and recovery, because we have different technologies. This flashback query, version query, transaction query, those allow me to view what was the value of a row at a time. I can even see what were the changes to a row over a period of time? I can also view the query that caused that change. For error recovery, I can back out a transaction. I can take a table back to a non-current time. I can also flashback a table that was dropped. And I can also take an entire database by using flashback. So the different recovery options I might have with the flashback technology. 14:44 Lois: Thank you so much, Bill. These last two episodes have been so insightful, right Niki? Nikita: I couldn’t agree more, Lois! If you want to know more about backup and recovery configuration and other concepts, visit mylearn.oracle.com and search for the Oracle Database 23ai: Backup and Recovery course. Our upcoming episode is a very special one, where we’ll be discussing Oracle AI in Fusion Cloud Human Capital Management. So, watch out for that! Until next week, this is Nikita Abraham… Lois: And Lois Houston, signing off!
 15:16 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
    --------  
    15:45
  • Oracle Database 23ai: Backup and Recovery - Part 1
    In this two-part special, Lois Houston and Nikita Abraham delve into the critical topic of backup and recovery in Oracle Database 23ai.   Together with Bill Millar, Senior Principal Database & MySQL Instructor, they discuss the role of database administrators, strategies for protecting data, and dealing with various types of data failure.   Oracle MyLearn: https://mylearn.oracle.com/ou/course/oracle-database-23ai-backup-and-recovery/141127/   Oracle University Learning Community: https://education.oracle.com/ou-community   LinkedIn: https://www.linkedin.com/showcase/oracle-university/   Twitter: https://twitter.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, Radhika Banka, and the OU Studio Team for helping us create this episode.   --------------------------------------------------------- Episode Transcript: 00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started! 00:26 Nikita: Welcome to the Oracle University Podcast! I’m Nikita Abraham, Team Lead of Editorial Services with Oracle University, and with me is Lois Houston, Director of Innovation Programs. Lois: Hi there! For the last two weeks, we’ve been having really exciting discussions on Oracle AI Vector Search. We covered the fundamentals, benefits, the vector workflow, and lots more. Today, we’re going to talk about backup and recovery in Oracle Database 23ai with Bill Millar. If you’ve been listening this season, you’ll know that Bill is a Senior Principal Database & MySQL Instructor with Oracle University. Nikita: In this two-part special, we’ll dive into some of the things you need to know about backup and recovery, especially if you’re a database and backup admin. So, if you're the person in charge of keeping data safe and handling disaster recovery, this is definitely worth your time. 01:20 Lois: That’s right, Niki. Hi Bill, thanks for joining us again. What’s the role of a Database Administrator, or DBA, when it comes to backup and recovery? Bill: The DBA is typically responsible for ensuring the database is open and available when needed and at times you need to work with system administrators and other people within your organization to achieve that. But we want to try to protect the database from failure wherever possible. We want to increase the mean time between failures. Hopefully, we don't have failures, and we have to increase that time. But it might mean that we need to ensure we have redundant hardware and that in place, again, maybe out of the realm of the DBA, but people within your organization can help with that. We want to protect those critical components by using the redundancy. And we want to decrease the mean time to recover. Failures happen, but how fast can we get access back to that data after that failure. The faster we can do it, the happier customers are. Minimize the loss of data. It's never good to lose data, especially in a critical environment, but maybe in test and development, maybe not so bad.  02:39 Nikita: How do we ensure a separation of duties for backup and recovery processes? Bill: For a separation of duties, we do have a user called SYSBACKUP. It has the privileges that's required to perform backup and recoveries, the privilege to connect and execute the commands in what we refer to as RMAN, our Recovery Manager. 
As I said, it has permissions for backup and recovery because you do need to shut down the database, start up the database, those types of things. We're able to connect to that closed database to try to troubleshoot it, to get it to the open state again. It does not include any privileges to access data. The SYSBACKUP user is created when we install the database, when we create the database. We can use it explicitly for privileged user connection. It allows us to connect to the database. So RMAN connects as SYSBACKUP. 03:37 Lois: Bill, what should people keep in mind when figuring out what’s considered critical data? Bill: You want to try to identify your critical data. Some data might be highly required to access, and we need to make sure we don't lose that data, but then you might have some environments. OK, I don't need to have them up and running as fast. If we lose a little data, it may not hurt, but we want to identify the difference in the different data that we have on different environments. So we want to also prioritize that critical data, which data do we need access to first, because of how much the company will lose per hour of downtime when we can't do business. We want to make sure we assess the data access and protection requirements. Not everybody has access to everything. And there are different types of disaster that can happen that are going to be totally out of your control. There's the physical disaster, a hurricane or tornado, outages, power outages, component failures, failures within the building itself, corruption of data because of some of these failures. And then, the most dreaded one, the one that happens most often, usually those human errors, the logical errors, where the data is just bad, we are able to access it and everything. It's just that something has changed that shouldn't have been changed. We want to make sure we assess our recovery requirements.  05:04 Lois: So, what are they? What are those requirements? Bill: We want to base that requirement on how critical is that data, how soon do we need to have access to that? What is our recovery point objective? Do we have a tolerance for any type of data loss? How frequently should we take backups? How often should they be taken? What type of backups to take will be another thing we'll want to figure out. Is point-in-time recovery required? Are we able to, or do we ever need to, go back to a previous point in time to do something? It's not always just recovery for a database failure. We might need to do a point-in-time recovery to a different system so we can investigate something. What is my recovery time objective? Again, what is the tolerance for the downtime? How long can I be down? The downtime, the biggest part of when a system goes down is trying to identify what is the problem, then next is what is going to be my plan to recover, and then performing the recovery. We might have a tiered recovery time objective based off of critical data, and then depending on the failure. Is that failure at the entire database? Is it just a tablespace? Is it just a table? Is it just a row? That also determines how long it takes to recover and what type of recovery we might try to perform. What is my backup retention policy? Do I have a requirement where I have to have my backups off site? And it doesn't mean like back in the old days of mainframe computers, you'd back up to tape and you'd take those tapes off site. You might still do that today. Or, am I backing up to a cloud environment? So what do I need to have for that? 
What about long-term backups? We work with our day-to-day backups, but there are those backups that we're required to keep for longer, archives like end-of-year backups. Some places are required to keep their end-of-year backup for, like, 10 years. How are we going to handle that? So these are some of the things that we have to think about when we start talking about backup and recovery. 07:23 Did you know that the Oracle University Learning Community regularly holds live events hosted by Oracle expert instructors? Find out how to prepare for your certification exams. Learn about the latest technology advances and features. Ask questions in real time and learn from an Oracle subject matter expert. From Ask Me Anything about certification to Ask the Instructor coaching sessions, you’ll be able to achieve your learning goals for 2024 in no time. Join a live event today and witness firsthand the transformative power of the Oracle University Learning Community. Visit mylearn.oracle.com to get started. 08:04 Nikita: Welcome back! Bill, I want to talk about the different failures that can occur in an Oracle database. How would you categorize them? Bill: There are different categories of failure. This is not an all-inclusive list by any means. It's just something that possibly can happen. So they can usually be divided into different categories, like statement failure. All right. When doing a select and insert, update, delete, the statement itself fails. A user process fails. A single database session fails for some reason. Network failure, connectivity is lost. The user error, probably one of the most common ones we have to deal with. A user successfully completes an operation, but that operation was erroneous. They dropped the wrong table, updated the wrong row. Then there's the instance failure. The database itself shuts down unexpectedly. And then media failure, usually a hard failure of our disk. Something of memory, something failed and caused an error. 09:12 Lois: Ok. I want to dive a little deeper into each of these categories that you mentioned. Let's start with statement failures. What are typical problems that one might face? Bill: Attempts to enter invalid data into a table. They're trying to put a numeric value in a date field, and usually just working with the user is going to correct that. Is the DBA responsible for that? Yes, no, maybe. They attempt to perform operations with insufficient privileges. Attempts to allocate space that fail, well, that depends on are they going-- do they have unlimited storage or do they have a limit? Logic errors in the application. Well, that's where we're going to have to work with those developers to try to correct those types of errors. 09:59 Nikita: What about user process failures? Bill: A user performs an abnormal disconnect, doesn't close out properly. It can cause something to hang up or even possibly erroneous data to be updated. A user session is abnormally terminated. Well, usually, we don't have to try to resolve those user type errors, but it's something we might need to look into. A user experiences a program error that terminates the session. Again, usually it's the application developers, but it's something as a DBA, we might want to keep an eye on. Is it the same person? Is it from the same location? Is it the same module within that application? Maybe there are some things we can do to help identify what the possible problem can be. 10:43 Nikita: Bill, tell us about common issues that can lead to network failures. 
What can we do to mitigate these problems and ensure network resilience? Bill: The listener fails. Well, we can configure a backup listener and set up connect-time failover so connections can still get through. A network interface card fails. Well, again, we're not the hardware people, but can we work with our network or server team to possibly have redundant network cards? The network connection itself fails. Can we configure a backup network connection? 11:18 Lois: And what about user errors? How can we recover from those types of scenarios? Bill: The user inadvertently deletes or modifies data. Well, we have some things we'll look at, such as rolling back a transaction along with its dependent transactions, or rewinding that table back to where it should have been. You can also use LogMiner to look at our redo logs and try to figure out where that bad transaction was. A user drops a table inadvertently. Well, we can recover the table from the recycle bin if we have the recycle bin on, or we may need to recover from a backup. 11:56 Nikita: What are common causes of instance failures, Bill? Bill: The dreaded power outage. Well, hopefully, we have some type of UPS to keep us running, even if it's not for continuous operation; maybe it's just to allow us to gracefully take the system down. The dreaded hardware failure. If you have a way to predict a hardware failure, you can make a lot of money. It always happens at the most inopportune times. But then again, do we have redundant hardware? Do we have something in place to allow us to continue to operate in case of a hardware failure? Failure of one of the critical background processes. Why did it fail? We can go out and look at our alert log, and we have trace files. And then you have Enterprise Manager Cloud Control. We can do the same thing, looking at the alert log and trace files, but Enterprise Manager Cloud Control gives us a GUI interface to do that. 12:53 Lois: Before we let you go, Bill, can you tell us about media and data failures? Bill: Failure of a disk drive, failure of a disk controller, deletion or corruption of a file needed for database operation: well, this is the dreaded media failure. So we're going to restore from a backup. If we need to, we can move a data file to a different location and notify the database, hey, here's that new location, and then recover by applying any of the incremental backups and any of the redo to get it back to where it should be. And then we have the data failures. We can't access a component; data files are missing at the OS level. Maybe our system administrators deleted something thinking it wasn't needed, or maybe even a developer did on a development-type system. Maybe we don't have the right permissions. A tablespace is offline. Well, why is it offline? Did somebody take the wrong tablespace offline? We have physical corruptions, block checksum failures, where the block is inconsistent between the header and footer. Invalid block header field values, like all of them being zeroed out. Then we have the logical corruptions: an inconsistent dictionary, a corrupt row piece, or a control file not synchronized with the data files, usually because we recovered something and didn't do it the right way. I/O failures: maybe we just exceeded the number of open files that we're allowed to have, or maybe it's a network or an I/O error itself. These are different types of failures that you might experience. Again, it's not an all-inclusive list. It's just a few examples.
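To ground a couple of these scenarios, here is a rough sketch of the kinds of commands they map to. The hr.employees table and data file number 4 are made-up examples, and the flashback options assume the recycle bin and sufficient undo retention are in place.

  -- SQL: a user dropped the wrong table, so pull it back from the recycle bin
  FLASHBACK TABLE hr.employees TO BEFORE DROP;

  -- SQL: a user changed the wrong rows, so rewind the table to an earlier time (row movement must be enabled)
  ALTER TABLE hr.employees ENABLE ROW MOVEMENT;
  FLASHBACK TABLE hr.employees TO TIMESTAMP SYSTIMESTAMP - INTERVAL '15' MINUTE;

  # RMAN: media failure on a single data file, so restore and recover just that file
  ALTER DATABASE DATAFILE 4 OFFLINE;
  RESTORE DATAFILE 4;
  RECOVER DATAFILE 4;
  ALTER DATABASE DATAFILE 4 ONLINE;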
14:41 Nikita: I know you said it’s not an all-inclusive list and you were just giving us a few examples, but that seemed quite thorough! Thank you so much, Bill, for walking us through all of that today! Lois: Yeah, I totally agree! Thanks Bill! For more on what we discussed today, visit mylearn.oracle.com. Search for the Oracle Database 23ai: Backup and Recovery course. Next week, we’ll get into instance recovery and recovery strategies. Until then, this is Lois Houston… Nikita: And Nikita Abraham, signing off! 15:15 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
    --------  
    15:44
  • Oracle AI Vector Search: Part 2
This week, Lois Houston and Nikita Abraham continue their exploration of Oracle AI Vector Search with a deep dive into vector indexes and memory considerations.   Senior Principal APEX and Apps Dev Instructor Brent Dayley breaks down what vector indexes are, how they enhance the efficiency of search queries, and the different types supported by Oracle AI Vector Search.   Oracle Database 23ai: Oracle AI Vector Search Fundamentals: https://mylearn.oracle.com/ou/course/oracle-database-23ai-oracle-ai-vector-search-fundamentals/140188/   Oracle University Learning Community: https://education.oracle.com/ou-community   LinkedIn: https://www.linkedin.com/showcase/oracle-university/   Twitter: https://twitter.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, Radhika Banka, and the OU Studio Team for helping us create this episode.   --------------------------------------------------------   Episode Transcript:   00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started!   00:26 Nikita: Welcome back to the Oracle University Podcast! I’m Nikita Abraham, Team Lead of Editorial Services at Oracle University, and with me is Lois Houston, Director of Innovation Programs. Lois: Hi everyone! Last week was Part 1 of our discussion on Oracle AI Vector Search. We talked about what it is, its benefits, the new vector data type, vector embedding models, and the overall workflow. In Part 2, we’re going to focus on vector indices and memory. 00:56 Nikita: And to help us break it all down, we’ve got Brent Dayley back with us. Brent is a Senior Principal APEX and Apps Dev Instructor with Oracle University. Hi Brent! Thanks for being with us today. So, let’s jump right in! What are vector indexes and how are they useful? Brent: Now, vector indexes are specialized indexing data structures that can make your queries more efficient against your vectors. They use techniques such as clustering, partitioning, and neighbor graphs. Now, they greatly reduce the search space, which means that your queries happen quicker. They're also extremely efficient. They do require that you enable the vector pool in the SGA. 01:42 Lois: Brent, walk us through the different types of vector indices that are supported by Oracle AI Vector Search. How do they integrate into the overall process? Brent: So Oracle AI Vector Search supports two types of indexes: the in-memory neighbor graph vector index and the neighbor partition vector index. HNSW is the only type of in-memory neighbor graph vector index that is supported. These are very efficient indexes for approximate vector similarity search. HNSW graphs are structured using principles from small world networks along with a layered hierarchical organization. The inverted file flat (IVF) index is the only type of neighbor partition vector index supported. It is a partition-based index which balances high search quality with reasonable speed. 02:35 Nikita: Brent, you mentioned that enabling the vector pool in the SGA is a requirement when working with vector indexes. Can you explain that process for us? Brent: In order for you to be able to use vector indexes, you do need to enable the vector pool area. To do that, you set the VECTOR_MEMORY_SIZE parameter. You can set it at the container database level, and the PDB inherits it from the CDB.
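As a rough sketch of what that setup can look like, assuming a hypothetical docs table with an embedding vector column, an illustrative 1 GB pool size, and example accuracy targets (none of these values are recommendations):

  -- Enable the vector pool at the CDB level (PDBs inherit it); the instance must be restarted afterward
  ALTER SYSTEM SET vector_memory_size = 1G SCOPE=SPFILE;

  -- In-memory neighbor graph index (HNSW)
  CREATE VECTOR INDEX docs_hnsw_idx ON docs (embedding)
    ORGANIZATION INMEMORY NEIGHBOR GRAPH
    DISTANCE COSINE WITH TARGET ACCURACY 95;

  -- Neighbor partition index (IVF), shown as an alternative; it can also use the buffer cache and disk
  CREATE VECTOR INDEX docs_ivf_idx ON docs (embedding)
    ORGANIZATION NEIGHBOR PARTITIONS
    DISTANCE COSINE WITH TARGET ACCURACY 90;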
Now bear in mind that the database does have to be bounced, that is, restarted, when you set the vector pool. 03:12 Lois: Ok. Are there any other considerations to keep in mind when using vector indices? Brent: Vector indexes are stored in this pool, and vector metadata is also stored here. And you do need to restart the database. So large vector indexes do need lots of RAM, and RAM constrains the vector index size. You should use IVF indexes when there is not enough RAM. IVF indexes use both the buffer cache as well as disk. 03:42 Nikita: And what about memory considerations? Brent: So to remind you, a vector is a numerical representation of text, images, audio, or video that encodes the features or semantic meaning of the data, instead of the actual contents, such as the words or the pixels of an image. So the vector is a list of numerical values, known as dimensions, with a specified format. Now, Oracle does support the int8 format, the float32 format, and the float64 format. The format determines the number of bytes per dimension; for instance, int8 is one byte and float32 is four bytes. Now, Oracle AI Vector Search supports vectors with up to 65,535 dimensions. 04:34 Lois: What should we know about creating a table with a vector column? Brent: Now, Oracle Database 23ai does have a new vector data type. The new data type was created in order to support vector search. The definition can include the number of dimensions and can include the format. Bear in mind that both of those are optional when you define your column. The possible dimension formats are int8, float32, and float64. Float32 and float64 are IEEE standards, and Oracle Database will automatically cast the value if needed. 05:18 Nikita: Can you give us a few declaration examples? Brent: Now, if we just declare the column as VECTOR, then the vectors can have any arbitrary number of dimensions and formats. Declaring it as VECTOR(*, *) means the same thing, so VECTOR and VECTOR(*, *) are equivalent. VECTOR with the number of dimensions specified, followed by a comma and an asterisk, is equivalent to VECTOR with just the number of dimensions: vectors must all have the specified number of dimensions, or an error will be thrown, and every vector will have its dimensions stored without format modification. And if we declare VECTOR(*, format), that means vectors can have an arbitrary number of dimensions, but their values will be up-converted or down-converted to the specified dimension element format, either int8, float32, or float64. 06:25 Working towards an Oracle Certification this year? Take advantage of the Certification Prep live events in the Oracle University Learning Community. Get tips from OU experts and hear from others who have already taken their certifications. Once you’re certified, you’ll gain access to an exclusive forum for Oracle-certified users. What are you waiting for? Visit mylearn.oracle.com to get started.   06:52 Nikita: Welcome back! Brent, what is the vector constructor and why is it useful? Brent: Now, the vector constructor is a function that allows us to create vectors without having to store them in a column in a table. These are useful for learning purposes. You usually use these with a smaller number of dimensions; bear in mind that most embedding models can contain thousands of different dimensions. You get to specify the vector values, and they usually represent something simple, like two-dimensional x-y coordinates.
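To make those declaration forms concrete, a minimal sketch; the table name, column sizes, and dimension counts are hypothetical examples:

  CREATE TABLE docs (
    id        NUMBER PRIMARY KEY,
    content   VARCHAR2(4000),
    embedding VECTOR(384, FLOAT32)     -- every vector must have 384 dimensions, stored as float32
  );

  -- Other forms the column definition could take:
  --   embedding VECTOR                -- any number of dimensions, any format (same as VECTOR(*, *))
  --   embedding VECTOR(384)           -- exactly 384 dimensions, format stored as supplied
  --   embedding VECTOR(*, FLOAT64)    -- any number of dimensions, values converted to float64

  -- A small constructed vector, handy for experimenting without a table:
  SELECT TO_VECTOR('[1.1, 2.2, 3.3]') FROM dual;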
With the constructor, the dimensions are optional, and the format is optional as well. 07:29 Lois: Right. Before we wrap up, can you tell us how to calculate vector distances? Brent: Now, vector distance uses the function VECTOR_DISTANCE as the main function. This allows you to calculate distances between two vectors and, therefore, takes two vectors as parameters. Optionally, you can specify a metric. If you do not specify a metric, then the default metric, COSINE, would be used. You can optionally use other shorthand functions, too. These include L1 distance, L2 distance, cosine distance, and inner product. All of these functions also take two vectors as input and return the distance between them. Now the VECTOR_DISTANCE function can be used to perform a similarity search. If a similarity search query does not specify a distance metric, then the default cosine metric will be used for both exact and approximate searches. If a similarity search does specify a distance metric in the VECTOR_DISTANCE function, then an exact search with that distance metric is used if it conflicts with the distance metric specified in a vector index. If the two distance metrics are the same, then that metric will be used for both exact as well as approximate searches. 08:58 Nikita: I was wondering, Brent, what vector distance metrics do we have access to? Brent: We have Euclidean and Euclidean squared distances. We have cosine similarity, dot product similarity, Manhattan distance, and Hamming similarity. Let's take a closer look at the first of these metrics, Euclidean and Euclidean squared distances. This gives us the straight-line distance between two vectors. It does use the Pythagorean theorem. It is sensitive to both the vector size as well as the direction. With Euclidean distances, comparing squared distances is equivalent to comparing distances. So when ordering is more important than the distance values themselves, the squared Euclidean distance is very useful, as it avoids the square root calculation and is therefore faster to compute than the Euclidean distance. 09:58 Lois: And the cosine similarity metrics? Brent: It is one of the most widely used similarity metrics, especially in natural language processing. The smaller the angle between two vectors, the more similar they are. While cosine distance measures how different two vectors are, cosine similarity measures how similar two vectors are. Dot product similarity multiplies the sizes of the two vectors and the cosine of the angle between them. The corresponding geometrical interpretation of this definition is equivalent to multiplying the size of one of the vectors by the size of the projection of the second vector onto the first one, or vice versa. Larger means that they are more similar; smaller means that they are less similar. Manhattan distance is useful for describing uniform grids. You can imagine yourself walking from point A to point B in a city such as Manhattan. Now, since there are buildings in the way, maybe we need to walk down one street and then turn and walk down the next street in order to get to our destination. As you can imagine, this metric is most useful for vectors describing objects on a uniform grid such as city blocks, power grids, or perhaps a chessboard. 11:27 Nikita: And finally, we have Hamming similarity, right? Brent: This describes where vector dimensions differ. It works on binary vectors, and it tells us the number of bits that would need to change to make them match. It compares the position of each bit in the sequence. Now, these are usually used in order to detect network errors.
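Putting the distance functions together, a brief sketch of how they might be used; the docs table, its embedding column, and the :query_vec bind variable are hypothetical:

  -- Distance between two vectors with an explicit metric (COSINE is the default if omitted)
  SELECT VECTOR_DISTANCE(TO_VECTOR('[1, 0]'), TO_VECTOR('[0, 1]'), COSINE) FROM dual;

  -- Shorthand functions also exist: L1_DISTANCE, L2_DISTANCE, COSINE_DISTANCE, INNER_PRODUCT

  -- Top-5 similarity search ordered by distance to a query vector
  SELECT id, content
  FROM   docs
  ORDER  BY VECTOR_DISTANCE(embedding, :query_vec, COSINE)
  FETCH  APPROX FIRST 5 ROWS ONLY;   -- APPROX lets an available vector index be used; drop it for an exact search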
11:53 Nikita: Brent, thanks for joining us these last two weeks and explaining what Oracle AI Vector Search is. If you want to learn more about what we discussed today, visit mylearn.oracle.com and search for the Oracle Database 23ai: Oracle AI Vector Search Fundamentals course.   Lois: This concludes our season on Oracle Database 23ai New Features for administrators. In our next episode, we’re going to talk about database backup and recovery, but more on that later! Until then, this is Lois Houston… Nikita: And Nikita Abraham signing off! 12:29 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
    --------  
    12:57
  • Oracle AI Vector Search: Part 1
In this episode, Senior Principal APEX and Apps Dev Instructor Brent Dayley joins hosts Lois Houston and Nikita Abraham to discuss Oracle AI Vector Search. Brent provides an in-depth overview, shedding light on the brand-new vector data type, vector embeddings, and the vector workflow.   Oracle Database 23ai: Oracle AI Vector Search Fundamentals: https://mylearn.oracle.com/ou/course/oracle-database-23ai-oracle-ai-vector-search-fundamentals/140188/   Oracle Database 23ai: SQL Workshop: https://mylearn.oracle.com/ou/course/oracle-database-23ai-sql-workshop/137830/   Oracle University Learning Community: https://education.oracle.com/ou-community   LinkedIn: https://www.linkedin.com/showcase/oracle-university/   Twitter: https://twitter.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, Radhika Banka, and the OU Studio Team for helping us create this episode.   ---------------------------------------------------------   Episode Transcript:   00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started!   00:26 Lois: Hello and welcome to the Oracle University Podcast! I’m Lois Houston, Director of Innovation Programs here at Oracle University. Joining me as always is our Team Lead of Editorial Services, Nikita Abraham. Nikita: Hi everyone! Thanks for tuning in over the last few months as we’ve been discussing all the Oracle Database 23ai new features. We’re coming to the end of the season, and to close things off, in this episode and the next one, we’re going to be talking about the fundamentals of Oracle AI Vector Search. In today’s episode, we’ll try to get an overview of what vector search is, why Oracle Vector Search stands out, and dive into the new vector data type. We’ll also get insights into vector embedding models and the vector workflow. 01:11 Lois: To take us through all of this, we’re joined by Brent Dayley, who is a Senior Principal APEX and Apps Development Instructor with Oracle University. Hi Brent! Thanks for joining us today. Can you tell us about the new vector data type? Brent: So this data type was introduced in Oracle Database 23ai. And it allows you to store vector embeddings alongside other business data. Now, the vector data type provides a foundation for storing vector embeddings. 01:42 Lois: And what are vector embeddings, Brent? Brent: Vector embeddings are mathematical representations of data points. They assign mathematical representations based on the meaning and context of your unstructured data. You have to generate vector embeddings from your unstructured data either outside or within the Oracle Database. In order to get vector embeddings, you can either use ONNX embedding machine learning models or access third-party REST APIs. Embeddings can be used to represent almost any type of data, including text, audio, or visual data, such as pictures. And they are used in proximity searches. 02:28 Nikita: Hmmm, proximity search. And similarity search, right? Can you break down what similarity search is and how it functions? Brent: So vector data tends to be unevenly distributed and clustered into groups that are semantically related. Doing a similarity search based on a given query vector is equivalent to retrieving the k nearest vectors to your query vector in your vector space.
What this means is that basically you need to find an ordered list of vectors by ranking them, where the first row is the closest or most similar vector to the query vector. The second row in the list would be the second closest vector to the query vector, and so on, depending on your data set. What we need to find is the relative order of distances, and that's really what matters rather than the actual distances. Now, similarity searches tend to get data from one or more clusters, depending on the value of the query vector and the fetch size. Approximate searches using vector indexes can limit the searches to specific clusters. Exact searches visit vectors across all clusters. 03:44 Lois: Ok. I want to move on to vector embedding models. What are they and why are they valuable? Brent: Vector embedding models allow you to assign meaning to a word, a sentence, the pixels in an image, or perhaps some audio. They allow you to quantify features, or dimensions. Most modern vector embeddings use a transformer model. Bear in mind that convolutional neural networks can also be used. Depending on the type of your data, you can use different pretrained open source models to create vector embeddings. As an example, for textual data, sentence transformers can transform words, sentences, or paragraphs into vector embeddings. 04:33 Nikita: And what about visual data? Brent: For visual data, you can use residual networks, also known as ResNet, to generate vector embeddings. You can also use a visual spectrogram representation of audio data, and that allows us to fall back to the visual data case. Now, these can also be based on your own data set. Each model also determines the number of dimensions for your vectors. 05:02 Lois: Can you give us some examples of this, Brent? Brent: Cohere's embedding model, embed English version 3.0, has 1,024 dimensions. OpenAI's embedding model, text-embedding-3-large, has 3,072 dimensions. 05:24 Want to get the inside scoop on Oracle University? Head over to the Oracle University Learning Community. Attend exclusive events. Read up on the latest news. Get first-hand access to new products. Read the OU Learning Blog. Participate in Challenges. And stay up-to-date with upcoming certification opportunities. Visit mylearn.oracle.com to get started.  05:50 Nikita: Welcome back! Let’s now get into the practical side of things. Brent, how do you import embedding models? Brent: Although you can generate vector embeddings outside the Oracle Database using pre-trained open source embeddings or your own embedding models, you also have the option of doing that within the Oracle Database. In order to do so within the Oracle Database, you need to use models that are compatible with the Open Neural Network Exchange standard, or ONNX, pronounced "onyx". Oracle Database implements an ONNX runtime directly within the database, and this is going to allow you to generate vector embeddings directly inside the Oracle Database using SQL. 06:35 Lois: Brent, why should people choose to use Oracle AI Vector Search? Brent: Now one of the biggest benefits of Oracle AI Vector Search is that semantic search on unstructured data can be combined with relational search on business data, all in one single system. This is very powerful, and also a lot more effective because you don't need to add a specialized vector database. And this eliminates the pain of data fragmentation between multiple systems. It also supports Retrieval Augmented Generation, also known as RAG.
Now this is a breakthrough generative AI technique that combines large language models and private business data. And this allows you to deliver responses to natural language questions. RAG provides higher accuracy and avoids having to expose private data by including it in the large language model training data. 07:43 Nikita: In the last part of our conversation today, I want to ask you about the Oracle AI Vector Search workflow, starting with generating vector embeddings. Brent: Generate vector embeddings from your data, either outside the database or within the database. Now, embeddings are a mathematical representation of what your data means. So what does this long sentence mean, for instance? What are the main keywords out of it? You can also generate embeddings not only on your typical string type of data, but on other types of data as well, such as pictures or perhaps audio waveforms. 08:28 Lois: Could you give us some examples? Brent: Maybe we want to convert text strings to embeddings or convert files into text. And then from text, maybe we can chunk that up into smaller chunks and then generate embeddings on those chunks. Maybe we want to convert files to embeddings, or maybe we want to use embeddings for end-to-end search. Now you have to generate vector embeddings from your unstructured data, either outside or within the Oracle Database. You can either use the ONNX embedding machine learning models or you can access third-party REST APIs. You can import pre-trained models in ONNX format for vector generation within the database. You can download pre-trained embedding machine learning models and convert them into the ONNX format if they are not already in that format. Then you can import those models into the Oracle Database and generate vector embeddings from your data within the database. Oracle also allows you to convert pre-trained models to the ONNX format using Oracle Machine Learning for Python. This enables the use of text transformers from different companies. 09:51 Nikita: Ok, so that was about generating vector embeddings. What about the next step in the workflow: storing vector embeddings? Brent: So you can create one or more columns of the vector data type in your standard relational data tables. You can also store those in secondary tables that are related to the primary tables using primary key-foreign key relationships. You can store vector embeddings on structured data and relational business data in the Oracle Database. You store the resulting vector embeddings and associated unstructured data alongside your relational business data inside the Oracle Database. 10:30 Nikita: And the third step is creating vector indexes? Brent: Now you may want to create vector indexes in the event that you have huge vector spaces. This is an optional step, but it is beneficial for running similarity searches over those huge vector spaces. So once you have generated the vector embeddings, stored those vector embeddings, and possibly created the vector indexes, you can then query your data with similarity searches. This allows for native SQL operations and allows you to combine similarity searches with relational searches in order to retrieve relevant data. 11:15 Lois: Ok. I think I’ve got it. So, Step 1, generate the vector embeddings from your unstructured data. Step 2, store the vector embeddings. Step 3, create vector indices. And Step 4, combine similarity and keyword search. Brent: Now there is another optional step.
You could generate a prompt and send it to a large language model for a full RAG inference. You can use the similarity search results to generate a prompt and send it to your generative large language model in order to complete your RAG pipeline. 11:59 Lois: Thank you for sharing such valuable insights about Oracle AI Vector Search, Brent. We can’t wait to have you back next week to talk about vector indices and memory. Nikita: And if you want to know more about Oracle AI Vector Search, visit mylearn.oracle.com and check out the Oracle Database 23ai: Oracle AI Vector Search Fundamentals course. Lois: Yes, and if you're serious about advancing in your development journey, we recommend taking the Oracle Database 23ai SQL workshop. It’s designed for those who might be familiar with SQL from other database platforms or even those completely new to SQL. Nikita: Yeah, we’ll add the link to the workshop in the show notes so you can find it easily. Until next week, this is Nikita Abraham… Lois: And Lois Houston signing off! 12:45 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
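For readers who want to see the in-database part of that workflow end to end, here is a rough, hedged sketch of loading an ONNX model and generating an embedding with SQL. The directory, file, and model names are hypothetical, the exact DBMS_VECTOR.LOAD_ONNX_MODEL arguments and the VECTOR_EMBEDDING input alias can vary by model and release, and the docs table is the same made-up example used earlier; treat this as an outline, not a recipe.

  -- Load a pre-converted ONNX embedding model from a database directory object
  BEGIN
    DBMS_VECTOR.LOAD_ONNX_MODEL('MODEL_DIR', 'all_minilm.onnx', 'DOC_MODEL');
  END;
  /

  -- Generate an embedding in SQL and store it alongside the business data
  INSERT INTO docs (id, content, embedding)
  VALUES (1, 'Backup and recovery basics',
          VECTOR_EMBEDDING(DOC_MODEL USING 'Backup and recovery basics' AS data));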
    --------  
    13:14
