
Expanding the Scope of Generative AI in the Enterprise with Bal Heroor, CEO and Principal at Mactores

Bal Heroor, CEO and Principal at Mactores, explores common use cases for generative AI, how it's evolving, challenges of data governance and much more.
Aug 2023

Photo of Bal Heroor
Guest
Bal Heroor

Bal Heroor is CEO and Principal at Mactores and has led over 150 business transformations driven by analytics and cutting-edge technology. His team at Mactores is researching and building AI, AR/VR, and quantum computing solutions that help businesses gain a competitive advantage. Bal is also the co-founder of Aedeon, the first hyper-scale marketplace for data analytics and AI talent.


Photo of Richie Cotton
Host
Richie Cotton

Richie helps individuals and organizations get better at using data and AI. He's been a data scientist since before it was called data science, and has written two books and created many DataCamp courses on the subject. He is a host of the DataFramed podcast, and runs DataCamp's webinar program.

Key Quotes

There are so many different parameters, which one model cannot sustain. So what we might end up with is multiple specialized models contributing to a final outcome. We are quickly seeing that paradigm shift in generative AI model building, where we might imagine several models contributing to an outcome. It's still a very difficult thing to do, because each of those models can affect the outcome differently, and today they don't communicate with each other. But there are areas where we can standardize this and still have different large generative AI models generating outcomes: one for material, another for design, another for strength, et cetera. These are very much evolving topics. We are talking about really hyper cutting-edge generative AI.

Lineage is a big issue in machine learning. With everybody we have worked with, when we have implemented explainability of data, the question comes up: how did this data come in? Oh, that was somewhere in my data lake. And how did that data lake build that data set? Some Spark job ran it? How did that Spark job get that data? Nobody knows, so lineage is a major challenge, and it is a confusing topic for many people because lineage can mean two things. One is the lineage you maintain in data catalogs, where you show how tables evolve from source tables. The other is how you track which file was used in processing. With data warehousing systems, it was easier: you could just monitor the table transformations and then generate the final lineage cycle. But with the advancement of Spark and all these tools we use, dbt and the millions of others, we generally process data from a file. And nobody's looking into: what was the file that was sent to me? Where did it come from? What was the ingestion point? If it is a real-time data stream, which session was used for the stream? How did that data come in? And if I'm training on something, has that data been certified? I'm not saying mastering of data is required, but at least you need to know whether the data you are now using to train is coming from a certified source. So lineage starts from the ingestion point and goes all the way to the consumption point, and it has to track everything.

Key Takeaways

1

The future of AI might not lie in a single large language model doing everything, but rather in multiple specialized models working together. Each of these models will be tailored to handle specific tasks, and their combined output could lead to more accurate and efficient results.

2

Ensure that good data-cleaning practices are in place when using data to train an AI. A large, uncleaned dataset will often lead to an AI producing inaccurate outputs. It is better to use a smaller dataset that has been properly cleaned.

3

A user-role-based data access system is not scalable in the world of AI—instead, you can employ a system where data is tagged based on attributes (like PII, healthcare data, location data, etc.), and access is granted based on tags and predefined policies. This approach can enhance data governance and streamline the process for machine learning teams.

Transcript

Richie:

Hi, Bal, thank you for joining me on the show.

Bal Heroor:

Hey, Hi Richie, nice meeting you. Thanks for inviting me.

Richie:

Yeah, brilliant. And I'd like to start off by talking a little bit about use cases. So what are the most common use cases that your customers have for generative AI?

Bal Heroor:

Yeah, that's a very good question. Right now, generative AI is really helpful for generating a lot of text and images. So the use cases we have been dealing with since January are around marketing: generating images for marketing, generating product images from some base image, and then using text generation to produce a lot of content. That content can be for digital marketing, for emails, for content generation in general. But we see that slowly changing over time as there is more exposure to more models through Hugging Face and Stability AI. There are so many different new models coming up. Claude is another great one, which is changing the paradigm. And it is helping us work with customers, not in some next generation but right now, taking customer data and generating some unique analysis use cases. So we are working with customers on enterprise-strategy generative AI: can generative AI generate strategies that form some foundation based on the enterprise data, and then help them quickly start building a business plan around that? So we are working with customers using generative AI to solve some enterprise problems. But that is very recent, the last one and a half months of progress. For a very long time, generative AI has been wonderful at helping customers accelerate content generation, and we expect that as these models evolve, there will be many more enterprise use cases, as well as many more use cases with your own data.

One of the other areas that is getting interesting is creating virtual environments using generative AI. Now, let me tell you, this is very niche and very new, for gaming customers. As you know, in gaming, one of the major challenges is generating the scenes, generating the storylines, and generating communications with the players. And generative AI is the perfect solution for that. It is helping a lot of gaming companies create these virtual environments, create graphics for them, be adaptive to user preferences, and auto-tune the environment players like to play in, as well as chat with the player like a human rather than being very scripted. For a very long time, those systems were very scripted; they didn't know the current topics, what is going on in the world. Now these generative AIs can be trained on all the news topics, all the trending topics in that arena, and help gaming customers really adapt the conversations, images, or videos that need to be integrated. At video scale, it is still far from giving the best outcome possible, but for creating virtual environments, which are mostly image-based, it is getting pretty rapid adoption in the gaming community.

Richie:

That's absolutely fascinating. And I like that there are some very standard use cases like marketing and product that are applicable to almost every company, but then gaming perhaps has even higher requirements around AI, and has obviously been using AI for a long time, so they're taking it to the next level. Have you seen any other industries where generative AI is particularly in demand?

Bal Heroor:

So this is a very preliminary exercise we are working on: using generative AI in manufacturing. There are many facets to manufacturing, and one of the major challenges is product design. Manufacturing can be subdivided into discrete manufacturing, silicon, and so many different tentacles of manufacturing itself. Specifically, we are working in discrete and silicon. In those use cases, one of the biggest challenges is that once you have a design, they want to see how those designs manifest in real time. So let us imagine this, right? When you are designing a silicon wafer, you're actually taking sand and heating it up and cooling it down at different temperatures to create layers. Imagine it like a circuit board: on the circuit board you have these copper wires going all around, and then you are creating layers over layers over layers and compacting them so you can have different circuits in each layer. Similarly, when you design a chip, you're trying to do that. Now, a lot of machine learning models and technologies are available today. One of them is Cadence, which does a great job, and there are other players who do a great job using statistical and machine learning methods. But what generative AI is doing now is taking that data and actually generating a visual image of how those layers will form over time. Let me remind you, this is one of the biggest reasons why chips are expensive. If you go and buy Nvidia chips or Intel chips, they are still expensive, and they are not cheap because there is a lot of research that goes into how do I take this new design and stabilize it. And that costs on the order of half a billion dollars, right? So it is like $500 million or a billion dollars of research before you can release a chip to the general public.

Now they are passing all of this expense on to the consumer, because the consumer needs to pay for the new chip. So if generative AI comes into the picture, all those millions of hours of investment in designing can be substantially compressed, because now generative AI can generate those layers and you can visually see them, which was impossible before. You had to actually manufacture it to see what it looks like and whether there are any hotspots. A hotspot is two layers touching each other, which will create a short circuit. Then, if the right amount of heat is not applied, due to the design, you can get disconnections between the circuits. So there are many, many use cases like that in semiconductors. And in discrete manufacturing specifically, say you are building a specific product with AR/VR. There have been some great enhancements in how you take a design and analyze it, and now, with Apple's VR, it is going to be much, much different. But until now, with all these VR headsets, we could only see existing 3D designs and analyze them from a third-person perspective. Those were not generated by AI; they were all generated by humans. What generative AI is actually trying to do in that space is generate various different possibilities, and then you select which possibility you want to go with, without wasting a lot of time in design. So imagine you're building, let me give you a simple example, a car, right? When you're building a car, you are designing the shape: is it aerodynamic, how should it look, what kind of look do you want? Now a designer has to sit, think about it, and build the whole car, and then in VR you can experience it before you start putting it into manufacturing, right?

But with generative AI, what you can really do is generate the same car with various different materials, features, and shapes, and that can give you a real-time interaction with how the final product might look before you start working on the feasibility of that product, the market reach of that product, et cetera. So in manufacturing we are exploring areas where generative AI can be used in the design phase, in silicon as in discrete. We envision it in the operation and manufacturing phase of a product as well, but that's still very exploratory. Design is something which is very powerful, takes a lot of time and effort, and provides immediate value for customers, because now they can reduce the design time substantially with generative AI.

Richie:

OK, this is a fascinating use case that I've not heard of before. And it sounds like this is potentially a huge market, because if you can generate something as complex as a chip, then I imagine it's also suitable for generating simpler objects as well. So is this appropriate for anyone who's designing products? Do you think anyone can make use of these tools?

Bal Heroor:

The problem is going to be that we will have to evolve different models. It cannot be one model. We think one large language model like ChatGPT can do a lot of things: it can write code, it can write emails, it can write a bunch of content based on the prompts you give. We envision that these models are going to become further specialized. As Sam Altman also said, the era of large language models is coming to an end. Which makes sense, because one large model doing everything is good for common use cases, but when you go to specialized use cases, you need generative AI models that are customized for a problem. For design, designing a car needs to understand the parameters of the material. It needs the strength parameters of the material. It needs to know how that material folds and shapes in a particular environment and what kind of strength it will have. So there are so many different parameters, which one model cannot sustain. What we might end up with is multiple specialized models contributing to a final outcome. We are quickly seeing that paradigm shift in generative AI model building, where we might imagine each model contributing to an outcome. It's still a very difficult thing to do, because each of those models can affect the outcome differently, and today they don't communicate with each other. But there are areas where we can standardize this and still have different large generative AI models generating outcomes: one for material, another for design, another for strength, et cetera. These are very much evolving topics; we are talking about really hyper cutting-edge generative AI today. There is a lot of research going on into how to have separate specialized models communicate with each other so they can contribute to a design. But yes, that will be a really strong use case.

And like 3D printers: you might see people using generative AI to design models, then use 3D printers to print them and start using them in everyday life.
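To make the "multiple specialized models contributing to one outcome" idea concrete, here is a minimal sketch of such an orchestrator. Everything in it is hypothetical: the model classes stand in for fine-tuned generative models (for material, design, and strength), and the toy rules inside them are placeholders for real model calls.

```python
from dataclasses import dataclass

@dataclass
class Proposal:
    material: str = ""
    shape: str = ""
    strength_ok: bool = False

class MaterialModel:
    def contribute(self, spec: dict, proposal: Proposal) -> Proposal:
        # A real system would call a material-specialized generative model here.
        proposal.material = "aluminium" if spec.get("lightweight") else "steel"
        return proposal

class DesignModel:
    def contribute(self, spec: dict, proposal: Proposal) -> Proposal:
        # Placeholder for a design/shape-specialized model.
        proposal.shape = "aerodynamic" if spec.get("vehicle") else "boxy"
        return proposal

class StrengthModel:
    def contribute(self, spec: dict, proposal: Proposal) -> Proposal:
        # Toy rule standing in for a physics-aware strength model.
        proposal.strength_ok = proposal.material == "steel" or spec.get("load", 0) < 100
        return proposal

def orchestrate(spec: dict, models: list) -> Proposal:
    """Pipe one shared proposal through each specialized model in turn."""
    proposal = Proposal()
    for model in models:
        proposal = model.contribute(spec, proposal)
    return proposal

design = orchestrate({"vehicle": True, "lightweight": True, "load": 50},
                     [MaterialModel(), DesignModel(), StrengthModel()])
print(design)
```

The key design point Bal raises shows up here too: the models only "communicate" through the shared proposal object, so ordering matters, and each model can change how the next one behaves.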

Richie:

OK, that's interesting that you think there's a shift from the sort of general purpose generative AI models to having more specialized models rather than like a single broad model. Now, some of the examples you mentioned, such as chip design and the gaming use case of creating scenes and so on, these seem like quite advanced uses of AI. And I know a lot of companies and organizations are just trying to figure out, well, how do I get started using this? So do you have any examples of what might be a sort of good, high impact first project or something easy that companies can get started with?

Bal Heroor:

Absolutely, and that is what we call Customer 360, right? One of the major challenges today is for companies that do retail both online and offline. Imagine retail products being sold in shops and also sold online. Say Nike: Nike sells shoes in shops as well as online. One of their major challenges is understanding how they should market the product. To market the product, they need to know what their customers react and respond to. For a very long time, this was done by market research, manually, with humans. There were tons and tons of people analyzing the data, taking the data from the retail stores, aligning it to the online stores, understanding the entire demographic, what gets sold where. And then there were solutions where people could align those identities and create custom messages and custom offers in different regions, so they could optimize their sales and their supply chain, distribute the right kind of product in the right area, and stock their warehouses properly. Now, this is all great, but with generative AI, what you can achieve is custom messaging for a custom target group of people, with the right kind of images and the right kind of content, which can influence a certain segment of your customers, by taking the data from the large data lakes which customers have created, generally called Customer 360. Largely, what Customer 360 does is tell you different demographics, different responses, different reactions, using social media and your customer feedback. It creates a strong foundation for product marketers and designers to actually design something which is useful to you as a consumer. Generative AI steps in here today without any custom models or anything, right? You don't need to do any fancy stuff.

Out-of-the-box generative AI models today can support generating custom content using your data. The only challenge is that you need to host them, because one of the major challenges with open APIs is that the data cannot be owned. When you host them, you can partition your data from the foundation model. In generative AI, there is a very critical component called the foundation model. A foundation model is something which is trained on a lot of data points and is now ready for tuning on your data, like hyperparameter tuning. You tune your model, make sure the data you're giving it is partitioned from the data it already has and does not contribute back to the foundation model, and then use that partitioned data, a kind of transfer learning from deep learning. Now you can use the foundation model with your data to generate very customized content for customers walking in the door or coming to your website, and it can be very useful for consumers like you and me. If I want to walk into a Nike store and somebody can quickly tell me which shoe I would like, and I walk out with the right kind of shoe because of the content marketing, that would be a phenomenal experience. I don't have to go out there and search for the right shoe for me, what kind of style I would like, et cetera. That kind of hyper-personalization is now enabled, out of the box, today. You do not have to wait another two years.
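The partitioning idea above can be sketched as follows. This is illustrative only: the "foundation knowledge" is just a string, and `generate_offer` returns the prompt a privately hosted model would receive, to show that per-customer data is injected only at inference time and never flows back into the shared model.

```python
# Shared, read-only foundation capability (a real system: a hosted base model).
FOUNDATION_KNOWLEDGE = "general marketing copywriting ability"

# Per-tenant partitions: each customer's data is isolated from the others.
customer_partitions = {
    "richie": {"segment": "runners", "recent_views": ["trail shoes"]},
    "bal": {"segment": "commuters", "recent_views": ["laptop bags"]},
}

def generate_offer(customer_id: str) -> str:
    """Build a personalized prompt from ONLY this customer's partition."""
    profile = customer_partitions[customer_id]   # tenant-scoped read
    prompt = (
        f"Using {FOUNDATION_KNOWLEDGE}, write an offer for a customer in the "
        f"'{profile['segment']}' segment who recently viewed {profile['recent_views']}."
    )
    # A real system would send `prompt` to the privately hosted model here;
    # returning it makes visible what the model would (and would not) see.
    return prompt

offer_prompt = generate_offer("richie")
assert "laptop bags" not in offer_prompt   # no cross-customer leakage
print(offer_prompt)
```

The assertion is the whole point of Bal's argument: because generation reads only one partition, one customer's data cannot leak into another customer's content.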

Richie:

That seems brilliant, the idea of just getting a personalized retail experience. And if it's available out of the box, then that does seem like a pretty useful place for businesses, particularly those selling products, to start. Now, you mentioned that there's a bit of a challenge with using things like the OpenAI API, or in fact, I guess, any commercial API, in that you've got to avoid sending your commercial data to it, because it's going to end up public. What other problems or challenges might organisations face when they're trying to adopt AI?

Bal Heroor:

One of the major challenges we have seen in many organizations is that the data is not of high quality. Now, generative AI is not going to automatically solve the problem of data quality, and poor data quality can result in many serious problems. It can generate a heavy bias when generative AI is generating data. At the same time, it can generate totally nonsense data, which is not useful. So companies who are serious about gen AI should work backwards, in my opinion, work backwards from the North Star, right? If you want great content to be generated, your input should be much better than what is available today. So how do we take care of data quality? How do we make sure we follow ethical AI practices? How do we make sure we have scalable sources within our data ecosystem which can ensure the data being sent to generative AI for training is of a quality that will generate better outcomes? It's garbage in, garbage out: if you give garbage to generative AI, it will generate garbage. So it's very critical for companies to understand how to evaluate the quality, that is, finding out whether your data is qualitative enough. The best part of generative AI is that it doesn't need quantity. It's not like classic machine learning, where you used to give terabytes and terabytes of data to increase the accuracy levels, or you might have to go back and find new features every time the data changed. The benefit of neural networks and generative AI is that you can give a very small sample of data and it can generate wonders. But if you give that small sample of data with bad quality, it is not going to give you results. So the quality is very, very important. The second most critical aspect is governance. Now, one of the areas we discussed is: OK, I want to generate custom content for Bal and custom content for Richie.

If I want to achieve that, and I give generative AI access to everybody's data, it might end up disclosing information about me to you, and your information to somebody else. It may not do that directly, but there is a possibility, because at the end of the day it is a mathematical model which is balancing out all the inputs from our personal information to respond to the next possible question you might ask it, right? When it's generating content, it is generating based on a probability of what you might like. And we might both fall in the same demographic and personality set, where it might go and disclose data. So we need to take care of data governance carefully. And it's a really hard subject in generative AI, because you literally need to partition the data based on the data sets so it does not cross-populate the outcomes. This is the second major challenge. The third important challenge, which is more of a business challenge, is: is the generative AI solution going to affect your business in a way where the investment justifies the outcome? A lot of people today are saying, wow, generative AI, let's jump on it, and everybody's jumping into it. So you need to take a step back and say, hey, I'm going to spend thousands, if not millions, of dollars to generate this outcome, right? Because generative AI is not cheap. You need GPU farms, you need a lot of infrastructure to train the models. There is a lot of cost in research and in building the training data sets. So what we need to do first is figure out what kind of business outcome my generative AI is going to generate for me, right?

So we can hypothesize about this Nike solution: can it increase sales and their top line, or improve the supply chain and reduce their operational costs? Companies operate on two things, top line or bottom line, right? Either you increase your top line or improve your bottom line. And the third one is ESG, sustainability. So if any of these three parameters is affected by generative AI, and you know what kind of quantitative and qualitative business outcome it is going to produce, then you can work on a generative AI use case. Otherwise, as in the era of big data, as in the era of machine learning, as in the era of deep neural networks, we have consistently seen customers investing millions of dollars and then the project or that particular initiative is completely destroyed because it is not generating a business outcome. So it's very important not to jump the gun on generative AI, but really look into these factors: do I have the right quality? Do I have the right kind of governance? Is this business use case going to result in outcomes which benefit the business? Then yes, that is the step you should take. These are the three major areas we look into whenever we have customers pursuing generative AI.
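The "garbage in, garbage out" point above can be made concrete with a simple pre-training quality gate, sketched here with the standard library only. The field names (`customer_id`, `age`) and thresholds are invented for illustration; a real pipeline would check domain-specific rules.

```python
def quality_report(records, required=("customer_id", "age"), max_null_rate=0.1):
    """Score a batch of records before it is allowed into AI training."""
    total = len(records)
    # Null rate across all required fields.
    nulls = sum(1 for r in records for f in required if r.get(f) is None)
    null_rate = nulls / (total * len(required)) if total else 1.0
    # Duplicate primary keys.
    dupes = total - len({r.get("customer_id") for r in records})
    # Values outside a plausible range (here: human age).
    bad_age = sum(1 for r in records
                  if r.get("age") is not None and not 0 < r["age"] < 120)
    return {
        "null_rate": null_rate,
        "duplicates": dupes,
        "out_of_range": bad_age,
        "passed": null_rate <= max_null_rate and dupes == 0 and bad_age == 0,
    }

batch = [
    {"customer_id": 1, "age": 34},
    {"customer_id": 2, "age": None},   # missing value
    {"customer_id": 2, "age": 250},    # duplicate id, impossible age
]
report = quality_report(batch)
print(report)  # this batch should fail the gate
```

This echoes Bal's point that a small, clean sample beats a large, dirty one: the gate rejects a batch outright rather than letting bad records dilute training.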

Richie:

Okay. I mean, there's a sort of old sad joke about how you want to be a data scientist, but actually you spend 80% of your day just cleaning data. And it sounds like that's not really going away with AI. You do need to really focus on that. Make sure you've got good quality data.

Bal Heroor:

Hey, there are many great products and solutions, like Cleanlab. I don't know if you have heard of Cleanlab. It's a startup from Israel, Tel Aviv, and they have an AI-based solution to clean your data.

Richie:

Okay?

Bal Heroor:

So you don't have to train your own models to clean your data. It's a very innovative self-learning platform, so you can check that out. And there are many other open-source projects coming up where data cleaning itself is done by AI. So, as you said, 80% of the day goes into cleaning the data; that might not be the case if you use the right tools. But the problem is that not many people realize data quality is an issue. You cannot ignore that. We have seen a lot of customers ignoring data quality and thinking, oh, I have an Oracle database with all of my data, I'll train my generative AI model on my databases, which generally does not work. So I'm just trying to plant the thought that there are tools available so you don't have to spend a lot of time cleaning your data. But if you don't think about cleaning your data at all, it's not going to work. You have to make sure of that.

Richie:

OK, that does sound very promising, the idea of having AI help with data cleaning. I think a lot of people would be very pleased to spend less time doing that. But you've also mentioned that a lot of the problems with adopting AI are around data privacy concerns. Do you have any advice on how you can deal with that side of things?

Bal Heroor:

So classically, people have been using users and roles: you are a user, you have a role, and then you have access to data. That is not scalable enough in the AI world. What we need to move to is attribute-based access and policy-based access. I have written a lot of articles about this and have been actively working with our customers to help them understand it. When you do attribute-based access, you are defining: this data is, say, PII, personally identifiable information; this data is about healthcare; this data is about location, and is private. Once we tag data and then allow our AI teams to access data based on tags, that becomes much more manageable. One of the biggest challenges in data governance is the management of governance. It's not about setting the right practices; it is about following those practices. If you build your systems in a way that makes it very hard to follow those practices, you might have built the whole Great Wall of China, but you have a back door open where anybody can get in. What's the point? You spend billions of dollars building the wall, but then you have back doors into the country from anywhere; it's not helpful. That happens because the process itself was so tedious that it took a lot of time for somebody to make sure they had closed all the back doors. That's why it is very important to innovate the way data governance is done at this massive scale of data: make it attribute-based and policy-based rather than role- and user-based. That gives you flexibility: every time a new data set comes in, I tag it. I don't have to care who has access to it, because a policy defines who can access what data. Then, when you actually access the data, it will check: as a machine learning engineer, do you really have access to it? Can you train your models on these particular columns?

And it can be done at the column level, by the way, not only at the table level. Hypothetically speaking, if you think of everything in tables and columns, you can tag tables, you can tag columns, you can tag rows. You can go granular; you can tag files, you can tag so many things. So an attribute is a very flexible way of attaching the right kind of information to a particular piece of data, so the right kind of people can access it. That makes the machine learning team's life easier: they don't have to wait for an admin to give them access. And if they don't wait for an admin, there's no pressure on admins to grant access, and so admins won't make mistakes. We are taking out that unnecessary grunt work where, every time a machine learning team or a data scientist asks for access, the admin has to go through a whole workflow, find the data set, and grant access to it. Instead, the data comes in pre-tagged, and the policy says whether you can access it or not; access is not decided per request every time. Just changing that thought process will enhance a lot of governance best practices. Other than that, and sorry, I'm talking a little more on this subject because it's very critical, one of the most important aspects is explainability of the AI. There has been so much research going on into how these models work: why are they providing the outcomes they are, right? It's called explainable AI, or explaining the model. There are many open-source as well as proprietary solutions available; on AWS there is SageMaker Clarify, and there are other open-source tools which can help you explain why your model is behaving a certain way. And when you find that out, you have to constantly monitor those models to understand why they are taking the decisions they are taking.

Once you can explain that, you can immediately understand what data the model was trained on and also identify whether there was any mismanagement of governance. So it is not only about protecting with your walls, but also monitoring that nobody is getting through those walls. And how do you monitor? For a country, you monitor by counting the population, a census every 10 years. But you cannot wait 10 years here. With machine learning, you can actually monitor the model continuously and understand what data it is being trained on, and then you know there are no governance challenges. So these are the two levers you can pull to make sure you are following good ethical AI practices.
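The tag-and-policy model Bal describes can be sketched in a few lines. Everything here is illustrative: the datasets, tags, and role policies are invented, and a real system would enforce this inside the data platform rather than in application code.

```python
# Datasets (tables/columns/files) arrive pre-tagged with attributes.
DATASET_TAGS = {
    "customers.email": {"PII"},
    "customers.region": {"location"},
    "claims.diagnosis": {"PII", "healthcare"},
    "orders.total": set(),              # untagged: no sensitive attributes
}

# A policy maps roles to the tags they are allowed to touch.
# No admin is in the loop for individual access requests.
POLICY = {
    "ml_engineer": {"location"},        # may use location data, not PII
    "health_analyst": {"location", "healthcare", "PII"},
}

def can_access(role: str, dataset: str) -> bool:
    """Grant access iff every tag on the dataset is allowed for the role."""
    allowed = POLICY.get(role, set())
    return DATASET_TAGS.get(dataset, set()) <= allowed

print(can_access("ml_engineer", "orders.total"))        # no sensitive tags
print(can_access("ml_engineer", "customers.email"))     # PII not allowed
print(can_access("health_analyst", "claims.diagnosis"))
```

Note how the scalability argument falls out of the structure: adding a new data set means adding one tagged entry, and adding a new person means assigning a role; no per-request admin workflow is needed.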

Richie:

That's really interesting that you mentioned data access management and, I guess, explainable AI as the two things that you really need to take care of. Actually, on that first point, I think one of the big challenges, particularly in large organizations, is tracking the lineage of data: where has your data come from? Do you see similar challenges with tracking the lineage of AI?

Bal Heroor:

A lot, yes. So what has happened over time is lineage of AI was a very, very important topic in data warehousing, right? So I'm talking about six years before, five years before data warehousing before big data was a big thing. It used to build data warehouses, it was very tabular and structured, not unstructured data sets, and it kind of died. over the last five years. And all these products like Snowflake, Databricks, and these products have taken over, which allows you to do a lot of fancy stuff. And people have started slowly ignoring the fact of which this data warehouse world built over 15 years is to have quality of data, lineage of data, and having a stronger presence. Those three tenants somehow have been blurred. over the last couple of years because there has been hyper-productization of these big data solutions as lake houses or as data warehouses. Essentially, they took the concept of big data and implemented it as a data warehouse. But a lot of ingestion practices of those data warehouses are not carried forward from those old school techniques. So Informatica, Talend, all of these guys had a plethora of services to do the data quality right. but they themselves have not scaled enough to support this massive ingestion of these data sets. So people are just now dumping that data into data lakes and data warehouses without realizing these three tenants are very critical. Lineage is a big issue in machine learning. Everybody we have worked with and when we have implemented explainability of data, how did this data come in? Oh, that was somewhere in my data lake. And... How did that data lake build that data set or some spark job ran it? How did that spark job get that data? 
Nobody knows. So lineage is a major challenge, and it is a very confusing topic for various people, because lineage can mean two things. One is the lineage you maintain in data catalogs, where you show how tables evolved from source tables. The other is: how do you track which file was used in processing? Now, with data warehousing systems it was easier; you could just monitor the table transformations and then generate the final lineage cycle. But with the advancement of Spark and all these tools we use, dbt, Matillion, there are millions of those now, we generally process data from a file. And nobody is looking into: what was the file that was sent to me? Where did it come from? What was the ingestion point? If it is a real-time data stream, which session was used for that stream? How did that data come in? And if I'm training on something, has that data been certified? I'm not saying mastering of data is required, but at least you need to know whether the data you are now using to train is coming from a certified source. So lineage starts from the ingestion point and goes all the way to the consumption point, and it has to track everything. I have not seen a solution that does that end to end today. There is no solution that tells me file version, file number, source, everything from start to end. We build those systems; it is a built solution. There is no out-of-the-box product that does it, because of the ecosystem: you have various ingestion points, various processing points, and various consumption points. It's too complex a problem for an out-of-the-box product to solve. But let me give you some good news, since you are on the topic. There is a generative AI way of looking at it, because of all the logs: every toolset creates so much logging.
Of course, you have to enable the logs, but those logs can be very powerful in answering lineage questions for some of your workloads, and you can use generative AI models to track them and create lineage reports. This is where we use generative AI to manage DataOps, which is one of the subjects we're talking about right now. There are many implementations of generative AI within data quality, data lineage, and data governance that we can explore, and that's what we have recently been working on: how do I make somebody's life simpler by using generative AI to manage all of this mess of DataOps?
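To make Bal's point about end-to-end, file-level lineage concrete, here is a minimal sketch of the kind of record such a "built, not bought" system might propagate from ingestion to consumption. Every name here (the stages, the session and job identifiers, the file) is hypothetical; this is not a real product's API, just an illustration of tracking file version, source, certification, and every hop in between.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List

@dataclass
class LineageEvent:
    """One hop in a file's journey: which stage touched it, by which actor, and when."""
    stage: str       # e.g. "ingestion", "spark-transform", "training"
    actor: str       # the job or session that processed the file
    timestamp: str

@dataclass
class FileLineage:
    """End-to-end lineage for a single file, ingestion point to consumption point."""
    file_name: str
    file_version: int
    source: str        # where the file came from
    certified: bool    # whether the source is a certified one
    events: List[LineageEvent] = field(default_factory=list)

    def record(self, stage: str, actor: str) -> None:
        """Append a hop to the lineage trail as the file moves through the pipeline."""
        self.events.append(
            LineageEvent(stage, actor, datetime.now(timezone.utc).isoformat())
        )

# A file is tracked from the moment it is ingested...
lineage = FileLineage("orders_2023_08.csv", 3, "s3://partner-feed/", certified=True)
lineage.record("ingestion", "kinesis-session-42")
lineage.record("spark-transform", "spark-job-orders-clean")
lineage.record("training", "model-build-v7")

# ...so that at training time you can answer "where did this data come from,
# and was it certified?" rather than "some Spark job ran it."
print([e.stage for e in lineage.events])
```

The hard part in practice, as Bal notes, is not the record itself but plumbing it through every ingestion, processing, and consumption point in a heterogeneous stack.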

Richie:

OK. So it does seem like there's a trade-off between having very large data sets, where data governance and data ops become more difficult, and working with a smaller data set, where it's easier to manage this even with more traditional tools. But moving on from this, I'd like to talk a little bit about the roles and responsibilities within the organization for actually getting AI into the company. So who in an organization needs to be responsible for AI?

Bal Heroor:

Responsible should be the CEO of the company. But yeah, it's a good question. Various organizations have different kinds of structures, and we are very strong proponents of business-driven AI decisions. What does that mean? The business owner of a particular division or a particular product should be responsible for defining the outcomes. Now, you need a team of people to execute it, and that can come from IT. But if IT is not solving business problems, then it is not really doing anything. So let me explain a little bit further. For a very long time, IT was treated as an independent department on its own, and its job was to create infrastructure so people could do their work: laptops, servers, providing you with compute power you could use. And that was not a business outcome. For a very long time, IT was treated as operations: you get your desk, you get your monitor, you get your laptop, you get lights and electricity, and you can work, correct? It does not affect my business. But with AI, with generative AI, with data analytics, the world has changed. IT is no longer just a facilitator. It needs to be aligned with the business strategy. And that is what we are striving for and helping our customers with: not to treat IT as operations, as a cost center, as people who will provide infrastructure whenever needed, but rather to treat it as somebody to partner with, so your business ideas can translate into technology-backed solutions that can change your business substantially. So the role we see being successful is a business leader of that unit leading as the decision maker on how a particular AI solution should manifest. Then you need people from technology to really think about how the interworking of these structures is going to work.
And then you need people in the business validating that the outcomes of these structures are producing the right kind of business results, and monitoring whether they need improvement in efficiency, improvement in cost, and so on. So a VP of a business unit should have a team of data architects, data analysts, machine learning engineers, and data scientists, which can very well be an IT function but reporting to them. And then they need business analysts and business strategists who will look at the output of the people who designed these pipelines and validate that the results they are generating are affecting the business. So you can say: okay, I invested $5 million in my technology, and it is generating $50 million. If the outcome is not 10x or 5x, depending on the company, what's the point of investing in technology? So run this as products. That is where we came up with the idea of data products rather than data platforms. IT generally goes out and says, we want a data platform; let's get Databricks, let's get Snowflake. It's no longer a platform solution. Today it's about data products: how do you build data products within your organization that can supercharge your business teams to go out there and generate more business? And that's the philosophy we should be getting into, rather than treating everything like a data platform.

Richie:

Okay, so it seems like there are a lot of people involved here. You mentioned executives, maybe the CEO, getting involved from the top. You've got business people involved, you've got technical people, you've got some IT people involved. Oh yeah.

Bal Heroor:

By the way, that was a joke about the CEO. Hahaha!

Richie:

Well, the buck always stops at the CEO.

Bal Heroor:

The entire organization!

Richie:

All right, yeah. But I guess legal teams are maybe the only people I don't think you mentioned. Does there need to be some sort of legal input into any AI project?

Bal Heroor:

Absolutely. And when I said business, I somehow automatically included legal; I might have missed saying it explicitly. Business analysts and strategists generally loop in legal teams every time, and the legal teams are part of that whole decision-making, because (a) there is a substantial amount of change, (b) there is a lot of IP, and (c) there are a lot of areas where these AI products can affect the customer's business in a way that will need changes in contracts, terms of service, or privacy policies. So legal has to be part of the discussion before you release to the general public. But legal slows everything down. So our general philosophy is to build an MVP, a minimum viable product, a data product, internally without involving legal. Once you have a minimum viable product, which is not structurally changing anything for your consumers or customers, show that to legal and see how the contracts and the legal framework need to change. That has helped in a very positive way, because when we did this for a customer and then went to legal, legal said: now we can actually see what you're building. If you go to legal very early, one of the challenges is that they cannot imagine what it is. Their whole mindset is based on the legal framework, and they will slow down the project and the implementation substantially, because they want to make sure you are staying within all the legal boundaries. So legal is very important, but when you go to them is very important as well. Once you have a working prototype and you know what it is going to do, you showcase it to legal, you showcase it to marketing, you showcase it to the sales team. And once you showcase it, everybody is on the same page about what it is going to do, and then sales and marketing can use their techniques to change their messaging.
Legal can go and change the legal framework to match the new messaging. That's why this business strategist and business solution architect role is required: they are going to manifest all this technology mesh created by the technology team to the legal teams, sales teams, and marketing teams. That's why we said it should be a product, and it should be very business focused rather than very technology-platform focused.

Richie:

Okay, that is interesting, that you mention the idea of timing. If you speak to the legal people too soon, they'll tell you no and shut things down. If you wait until your company's getting sued, it's probably a little bit too late.

Bal Heroor:

Yeah, that will be super neat.

Richie:

Alright, can we talk a bit about, because things are moving so fast in AI at the moment, how do you balance speed of adoption with making sure that things are done right?

Bal Heroor:

Very good and interesting question. There is no right answer for this, unfortunately, because AI is like a Cambrian explosion: you do a lot of things, and a few of them will be successful. It's like the startup world today. Unfortunately, we are not in an era of AI that is fit-it-and-forget-it, correct? That's why they're called AI experiments. You need to have a very scientific view of AI rather than a mathematical view of AI. And this is an analogy I want to give coming from science; I'm a huge physics guy, so I like to follow physics and cosmology, and I've learned the scientific method rather than the Socratic method. The Socratic method is where you communicate with another person and try to convince them whether something is right or wrong. The scientific method is where you observe something and check whether your hypotheses are validated: if they are not validated, why not; if they are, how. And you keep learning from failures rather than arguing with somebody about why you are right. It's a fact-based discussion, whereas the Socratic method is a more philosophy-based discussion. So what is going on in the market today is that a lot of technology teams and businesses are very Socratic in method: "I know this, I have been doing this for 15 years, for 20 years, and this will work," without data, without really experimenting. And AI is exactly the opposite of that. In AI, you can never tell how things will go. You have to experiment, you have to see how it will work. You have to form hypotheses and work from there. So that is this two-world problem, which people need to slowly solve. So, I'm sorry, what was your question again?

Richie:

It was about balancing the speed of adoption with doing things right.

Bal Heroor:

Yeah, correct. So, balancing the speed of adoption. Now, when you have these two different worlds you are dealing with, the scientific method wants you to stop, the scientific method wants you to observe, the scientific method wants you to adopt changes only after you have seen the experiment work. Whereas the Socratic method is very much gut feeling: okay, this worked, let's go, let's do the new thing, right? So that's where this two-world philosophy comes in: you need to maintain the right balance between which decisions are scientific and which decisions are Socratic. Something like: I have built a product, it is working, it is showing outcomes, it has been validated by the scientific method; now how the adoption works needs to be accelerated. But once you get the feedback, you have to throw the ball back to the scientific method and ask: I've got this feedback, does this make sense? It's a very cultural problem, because adoption should not drive implementation. Adoption should drive research. Adoption should drive understanding of why adoption has increased, rather than just coming out with new features. And this can be phenomenally explained by Apple. Why am I taking this a little differently? Because Apple is the one company that never releases features as a response to Android or any other OS maker. They have been critiqued for so long: hey, my Android has 500 new features every year, and Apple has not released them for the last three years. Apple never releases any of these features for years. What they do is, every time they release a feature, they take feedback; they take a scientific approach. Is this needed? How many people are going to be affected by this? I never worked for Apple, I'm just imagining what they might be doing. How much effort is needed to build this feature? Is this really going to change adoption of the product demographically?
And that's why you see Apple's VR coming five years after Oculus, after Meta took over, almost five years, right? Meta went all in: let's get VR, let's build this metaverse and all of this fancy stuff. But Apple said, no, we want to take a step back, see the adoption, see the research. So Apple is a good example here. Now, they have made some horrible mistakes in the past; I'm not saying they never have. But that's a very good place to start learning how adoption and implementation should change hands when you have feedback.

Richie:

Okay, I think it may be even longer ago that Meta, or Facebook at the time, acquired Oculus. What was it, 2014, something like that? That's going back a ways. So yeah, you're right, Apple's...

Bal Heroor:

Yeah, yes, sorry.

Richie:

Apple's a little bit behind the curve there. But yeah, they're taking their time. So, you mentioned experimentation, and I think that's one of the areas where data teams are going to get involved. Can you talk more generally about what data teams, all the data practitioners, the data scientists and data analysts, need to be thinking about in order to make use of AI?

Bal Heroor:

There are various things, right? First of all, today, start adopting AI-driven code writing, which is available now. Use GitHub Copilot or Amazon CodeWhisperer; there are so many different products. Stop waiting and writing the code manually. We use these tools every day, and they accelerate your code writing. And we still meet a lot of people who are not using AI to write code. But other than that, the challenge I have personally seen with data teams is looking at the big picture and knowing what your role is in that big picture. One of the major problems in most data teams is that the individual engineers, machine learning teams, AI experts, even data scientists, have a very myopic, very small-scale view of "what is my role," but do not have the larger picture of what is going to happen. And the top decision makers have only the largest picture, and they never see what is happening at the ground level. It's very hard for an executive to go deeper, because there are many other responsibilities. But at some point it is easier for an engineer to look at the entire platform; they may not understand every component of it, and they don't have to. What you at least need to know is where your robot fits in building this entire megastructure. Imagine every component these people are building is a small robot which is in turn building a larger robot, if you believe in that philosophy of using small units to build bigger things. So if people are going to contribute to something, they need to know how it is going to be manifested, even if they don't understand every component of it. Because that gives clarity; that gives a very strong sense of direction for taking the right decisions.
We have seen most companies, products, or projects fail because there is no clarity across the whole team about both the smallest detail of the implementation and the largest picture. So a change in the psychology and philosophy of data teams is very important. Second is retooling and reskilling every six months. One of the major challenges we have seen is: you have used Spark 3.0 and haven't moved since, but Spark has changed 15 times after that. There are so many new tools available that can save your time, and not only save time but give you better results. Adopt them, learn them, invest in learning every three months. I personally spend at least four to, say, eight hours of my week, depending on the week, finding out what I am doing that is not up to the standard of others, both in my general everyday activity and in my knowledge of technology, where I am not cutting edge. You do not have to be cutting edge, but you at least need to understand that your job is paying you today, and as soon as it stops paying you, you will need to find another job where the amount of skills required will be exponentially higher. So never stopping learning is the philosophy you need to build. Never stop learning. That is another very important piece I have seen. In fact, in our organization, we have dedicated 20% of work time to skilling, because without skilling there is no progress. And the third and most interesting aspect is not being overwhelmed by technology. I have seen a lot of people go: oh, generative AI is here, all of our jobs will be lost now. That is never going to happen. But don't run behind the crowd all the time either. I'm a Spark ETL engineer, for example, and there is generative AI out there. I should not drop everything and start chasing generative AI; I should start looking into how, as a Spark engineer, I can influence generative AI.
How can I use my skills, my super skills, to be a contributor to that evolving scene outside? How do I think as a contributor rather than being in a rat race? As a Spark engineer, I can say: I have built a phenomenal framework that can help you do a lot of transformations in a way that is very helpful for generative AI. For example, generative AI models generally cannot understand tables; there are some that can, but mostly they cannot. So I can write a Spark job that converts my tables into long-form text, which, say, Claude can be trained on. I build that and go to my boss and say: hey, this is what I have done. I may not know generative AI, but I know what input it needs, correct? So think differently. Think from the perspective of how you can contribute to this evolving landscape, rather than jumping onto the ship and then figuring out that it's either way beyond you or takes too much time, wasting a lot of time to reach your end goal. So I generally advise people to make sure you can contribute and upskill yourself.
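Bal's table-to-text example can be sketched in a few lines. The serialization itself needs no Spark to illustrate; in a real job, the same function would run inside a DataFrame or RDD map. The table, its columns, and the values below are all invented for the example.

```python
def row_to_text(row: dict) -> str:
    """Serialize one table row into a long-form sentence an LLM can train on."""
    parts = [f"{col.replace('_', ' ')} is {val}" for col, val in row.items()]
    return "; ".join(parts) + "."

# Hypothetical rows from an orders table.
rows = [
    {"order_id": 101, "customer": "Acme", "total_usd": 250},
    {"order_id": 102, "customer": "Globex", "total_usd": 90},
]

corpus = [row_to_text(r) for r in rows]
for line in corpus:
    print(line)
# prints "order id is 101; customer is Acme; total usd is 250." and so on

# In PySpark, a sketch of the equivalent would be roughly:
#   df.rdd.map(lambda r: row_to_text(r.asDict())).saveAsTextFile("corpus/")
```

Real pipelines would also handle nulls, nested fields, and column descriptions, but the core idea is exactly this: turning structured rows into the long-form text a language model expects.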

Richie:

That's brilliant advice, and I particularly like the idea that you need to be continuously learning, continuously upskilling yourself. So on that note, do you have any suggestions for particular skills that data analysts, data scientists, or other data practitioners might want to learn in order to get into AI?

Bal Heroor:

This might sound a little adamant, but the first thing I try to tell anybody I meet is: first go and learn the foundations. What is data? What are data structures? What are the foundations of data? A lot of people do not understand them. Oh, there is tabular data, structured data, unstructured data. But what are the forms of unstructured data? Unstructured data can be keys and values. Unstructured data can be graph in nature. Unstructured data can be columnar in nature. There are so many ways unstructured data can be represented. And some of it is still structured, but we somehow call it unstructured. Take HTML: HTML is still structured, it has a structure, but you have to read it differently. Then you have truly unstructured data. And there are many techniques in data modeling too: you have different schema architectures, you have the star schema, and so many other aspects. Learn the basics first. Understand why these things evolved. Understand the history since the 1960s, how data has evolved; you are part of that history now, and you can know that in the era you're living in, you're sitting on the shoulders of giants who built this over the years. That gives you a very strong perspective on where it might lead, where the next innovation might happen. So start with the basic foundations. Once you have done that, what you should really be skilling yourself in is understanding the new languages that are coming out. People are trying to use Rust in analytics, people are trying to use Go in analytics. Python and R have been the de facto languages because of their functional nature. So I would advise a lot of data engineers to be flexible in their language skills and understand the various languages that are possible, because each of them brings different benefits.
And then the third one is learning new tools as they come out. I cannot name them all, because I don't want to be too salesy here, but there are so many new tools in data analytics that allow you to do your work quickly. For example, Trifacta is a great tool: you don't have to wait and hand-build a pipeline, you can use Trifacta to generate a lot of Spark code. There are other similar tools that can be used. So try and expand your skill set from the tools perspective. A combination of tools and programming is what your job should be, rather than just programming and not thinking about any of the tools out there, or just using tools and not thinking about programming. The world data analysts and data scientists should be living in is: what do I need to write code for, and what can I use that is already there? A mix of both will help them a lot.
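To ground Bal's earlier point about the many forms "unstructured" data can take, here is one tiny, invented illustration of the same fact represented key-value style, graph style, and columnar style:

```python
# The same fact, "Alice manages Bob, who works in Sales", in three shapes.

# 1. Key-value / document style (e.g. a JSON record)
doc = {"name": "Bob", "department": "Sales", "manager": "Alice"}

# 2. Graph style (nodes connected by labeled edges)
edges = [("Alice", "manages", "Bob"), ("Bob", "works_in", "Sales")]

# 3. Columnar style (each column stored together, as Parquet-like formats do)
columns = {
    "name": ["Bob"],
    "department": ["Sales"],
    "manager": ["Alice"],
}

# Each shape has to be read differently, even though the content is identical,
# which is exactly why knowing the foundations of data representation matters.
print(doc["manager"], edges[0][0], columns["manager"][0])  # all "Alice"
```

The same lesson applies at scale: choosing a representation determines which queries are cheap and which tools apply.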

Richie:

I have to say, a lot of your answer there was really surprising to me. I didn't expect you to talk about things like data modeling or using Rust and Go and languages like that. So I suppose a challenge for all of our listeners: if you are using Rust or Go to work with data, then please do message me on social media. I'm richierocks most places on the internet. Let me know what you're using it for. So yeah, that's a really fascinating take. All right, just before we wrap up, can you tell me about what exciting things you're working on at the moment?

Bal Heroor:

So, as we touched on at the start of the call, we are actually working on the quite difficult problem of getting multiple generative AI models to communicate and contribute to an outcome. We are using foundation models and building some of our own AI models to be very customized for manufacturing and for CRM purposes like customer 360, marketing, and sales. But at the same time, we are trying to solve the problem of how we can have specialized generative AI models that communicate with each other, using each other's input to generate the final outcome. We have heard about large language models; we are thinking about micro models. Those micro models can be much faster and cheaper to train, and can look at the problem from various specialized angles. In photography, for example, you need to think about the light, the subject, the depth: there are various parameters. If I put all these parameters in one model, great, it will do the job. But if I have a small model for each of them, they will generate better results, because each is working with a very small amount of compute over a very small set of parameters, and the probability distribution over those parameters is going to be much more effective, in our hypothesis. So this is an experiment we are doing: if we build, not a single model essentially, but a collaboration of multiple small generative AI models contributing to an outcome, will that generate much better outcomes than one large model doing everything? That's our hypothesis, and we are working on it.
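A very loose sketch of the micro-model idea Bal describes: several specialized scorers, one per parameter such as light, subject, and depth, whose outputs a coordinator combines. The functions below are trivial stand-ins, not real generative models, and the aggregation is the simplest possible average.

```python
from typing import Callable, Dict

# Each "micro model" specializes in one aspect of the problem.
# These are hypothetical stand-in scoring functions, not trained models.
def light_model(scene: str) -> float:
    return 0.9 if "sunset" in scene else 0.4

def subject_model(scene: str) -> float:
    return 0.8 if "portrait" in scene else 0.5

def depth_model(scene: str) -> float:
    return 0.7 if "landscape" in scene else 0.6

SPECIALISTS: Dict[str, Callable[[str], float]] = {
    "light": light_model,
    "subject": subject_model,
    "depth": depth_model,
}

def combined_outcome(scene: str) -> Dict[str, float]:
    """Coordinator: collect each specialist's contribution, then aggregate."""
    scores = {name: model(scene) for name, model in SPECIALISTS.items()}
    scores["overall"] = sum(scores.values()) / len(SPECIALISTS)
    return scores

result = combined_outcome("sunset portrait")
print(result)
```

In the real research problem, each specialist would be a generative model and the coordinator would have to reconcile outputs that can affect the result differently, which is exactly the hard part Bal mentions.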

Richie:

That's absolutely fascinating. I guess it's like, do you want a gorilla or do you want a swarm of bees or something? It's like having lots of little models working together. Alright, yeah. Good luck with that research. It sounds absolutely fascinating stuff. So yeah, have fun with that and let us know how that progresses. All right, just to finish then, do you have any final advice for organizations wanting to adopt artificial intelligence?

Bal Heroor:

Great, that's a phenomenal question. The first piece of advice is: create a business case. I know a lot of you do, but really hone in on the outcome. I'm a big proponent of a culture at AWS called working backwards. Working backwards essentially means you write a PR, a press release article, imagining your product is already out. It can be six months from now, two years from now, it doesn't matter. Imagining the product is out there, you write a PR article explaining how the product works. Then everybody in your team aligns to that PR article: this is what we want, and everybody contributes to it. And then you write FAQs, as if you had FAQs on your website, answering the questions of all of your team members. Once you have this document, make it your bible and work backwards from it: what do you need to do today to achieve that? That has created a plethora of innovation at AWS; it actually triggered this culture of innovation, because now anybody, even a worker in the warehouses, can stand up and write a PR/FAQ if there is an idea, collaborate with the seniors to explain their vision, and then go and plan what they need to do today to achieve it one or two years from now. So this is a very powerful idea. Even if you don't like AWS, even if you are anti-AWS, I generally like to take these nuggets of patterns and practices from various organizations and implement them, because they are very powerful. The second thing I have really understood, from Jim Collins's book, is the flywheel effect. The flywheel effect is about how systems work together. For instance, a good flywheel for AI is: you start with databases. You take data from databases,
use either a data warehouse or a data lake to build an experimentation environment, which then leads to an AI product, which then leads to new data being generated, which again comes into your database. So it's a flywheel effect. You have to constantly watch it; it's kind of a lineage, but a lineage in the sense of a thought process about what is affecting what and where it is leading. So understand these flywheels in your organization and understand how you can innovate better using them, because nothing in an organization is linear; everything affects various other things, and there are cycles. Understanding that before you start building the AI journey is quite essential.

Richie:

Actually, I have to say I really like the idea of writing the PR article before you start developing anything, just to make sure the end product is really something sensible. That's absolutely a fantastic product development idea, and I think it's especially applicable to AI. All right.

Bal Heroor:

Absolutely.

Richie:

Yeah. So thank you for your time. Thanks for coming on the show.

Bal Heroor:

Thank you, Richie. Thank you very much. It was great talking to you.
