{"id":1149,"date":"2024-05-30T16:20:39","date_gmt":"2024-05-30T16:20:39","guid":{"rendered":"https:\/\/ailabor.appsters.me\/?page_id=1149"},"modified":"2024-08-26T10:55:53","modified_gmt":"2024-08-26T10:55:53","slug":"text2sql","status":"publish","type":"page","link":"https:\/\/ailabor.appsters.me\/en\/text2sql\/","title":{"rendered":"Text2SQL"},"content":{"rendered":"<div style=\"background-image:url(&apos;https:\/\/ailabor.appsters.me\/wp-content\/uploads\/2024\/06\/Ellipse-3-4.png&apos;);background-position:100% 50%;background-repeat:no-repeat;background-size:contain;\" class=\"wp-block-group has-background\"><div class=\"wp-block-group__inner-container is-layout-constrained wp-block-group-is-layout-constrained\">\n<div class=\"wp-block-group group-1240\"><div class=\"wp-block-group__inner-container is-layout-constrained wp-block-group-is-layout-constrained\">\n<p class=\"has-text-align-center wp-block-paragraph\"><em>Written by: Dr.  Istv\u00e1n Szakad\u00e1t<\/em><\/p>\n\n\n\n\n\n\n<div style=\"height:40px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-8f761849 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:50%\">\n<h3 class=\"wp-block-heading\"><strong>Machine search in text corpora<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">When CD-ROMs appeared in the early nineties, they were\nadvertised as being able to store the text of as many books on one disc as would fit on a\ntraditional bookshelf in paper format. Moreover, with the help of computers, searching\nthrough them was very fast, and this could be even more contrasted with human searching\nabilities. \"Let's find how many times dogs are mentioned in all of Shakespeare's works!\" This\ntask seemed an impossible challenge for a human, while the computer answered in an instant\nand even showed the results immediately. 
The incredible performance of machines became\neven more evident when the web appeared, and the digital re-accumulation of human\nknowledge began. We started building a new world of human knowledge in which anyone\ncould access anything, anytime, from anywhere (with some exaggeration). The quantity of\ndocuments accessible through the web increased at an astounding rate. In 2010, Eric Schmidt,\nthe then-leader of Google, stated that:<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:50%\">\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"1024\" src=\"https:\/\/ailabor.appsters.me\/wp-content\/uploads\/2024\/06\/07.webp\" alt=\"\" class=\"wp-image-2336\" srcset=\"https:\/\/ailabor.appsters.me\/wp-content\/uploads\/2024\/06\/07.webp 1024w, https:\/\/ailabor.appsters.me\/wp-content\/uploads\/2024\/06\/07-300x300.webp 300w, https:\/\/ailabor.appsters.me\/wp-content\/uploads\/2024\/06\/07-150x150.webp 150w, https:\/\/ailabor.appsters.me\/wp-content\/uploads\/2024\/06\/07-768x768.webp 768w, https:\/\/ailabor.appsters.me\/wp-content\/uploads\/2024\/06\/07-400x400.webp 400w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n<\/div>\n<\/div>\n\n\n\n<div style=\"height:40px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">&quot;Nowadays, we produce as much information every two days as we did from the beginning of\ntime until 2003.&quot;<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The digital universe continuously expanded, and access to the content of the gigantic text\ncorpus was made possible by the emerging machine services, the search engines. In 2000,\nTim Berners-Lee, the father of the World-Wide-Web project, stated that the first decade of the\nweb had fulfilled its mission. 
We store the entire knowledge of humanity in a single\ninterconnected system, and search engines can read the vast amount of accumulated text\nlightning fast, effectively helping people find the documents they are looking for. At this\npoint, Berners-Lee announced the web program for the coming decades, setting new goals for\nthe developments to come.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&quot;The first decade of the web was about teaching machines to read; now it&#039;s about teaching\nthem to understand the texts.&quot;<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Why was it necessary to set new goals? Although the\nperformance of the machines was impressive, it was also clear that computer\nsearch had its limitations. When we tasked the computer with searching for the word &#039;gyula&#039;, it\nimmediately returned the results (more and more of them over time), but it couldn&#039;t handle\nsentences like:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&quot;Gyula Gyula was the gyula in the city named Gyula for five years.&quot;<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The machine found the search term &#039;gyula&#039; four times within the sentence, but it couldn&#039;t deal\nwith the distinction that a human would immediately recognize \u2013 that the found term appears\nin four different senses in the quoted sentence: &#039;gyula&#039; could be a surname, a first name, a title,\nand a city name (and we could find or create even more interpretations and usages). At a certain\nlevel of language, syntactically, the search engine is efficient, but at the next \u2013 semantic \u2013\nlevel, it is not. The machine \u2013 at this point, at this time \u2013 does not know that &#039;gyula&#039; can mean\nseveral things within a sentence. 
The above example illustrates what Tim Berners-Lee might\nhave meant when he said, &quot;machines don&#039;t understand what they read.&quot;&nbsp;<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The promise of resolving this issue lay in the development of semantic search techniques.\nThis is why the new web program was named the &#039;Semantic Search Initiative.&#039; However, this\nprogram did not achieve the rapid and spectacular successes seen in the previous decade of\nthe web project.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">There was, of course, another problem with search engines. When a person formulated a\nsearch query, the machines returned a list of results consisting of documents containing the\nsearch term. Over time, the length of these lists grew exponentially, becoming a permanent\ntrend. Soon, search results lists containing billions of items appeared. But what was a person\nsupposed to do with such quantities? In a certain sense, we returned to where we started: the\ntask of searching fell back on humans. They had to manually explore the documents in the\nresults list, clicking one after another to see in what context the search term appeared in the\nselected text. We asked for little but received too much. We could say that the machine had\nbecome overly verbose. 
What we needed was for the search to be more focused.&nbsp;<\/p>\n\n\n\n<div class=\"wp-block-essential-blocks-accordion  root-eb-accordion-4dc52\"><div class=\"eb-parent-wrapper eb-parent-eb-accordion-4dc52\"><div class=\"eb-accordion-container eb-accordion-4dc52\" data-accordion-type=\"accordion\" data-tab-icon=\"fas fa-angle-right\" data-expanded-icon=\"fas fa-angle-down\" data-transition-duration=\"500\"><div class=\"eb-accordion-inner\">\n<div class=\"wp-block-essential-blocks-accordion-item eb-accordion-item-h9g5o eb-accordion-wrapper\" data-clickable=\"false\"><div class=\"eb-accordion-title-wrapper eb-accordion-title-wrapper-eb-accordion-4dc52\" tabindex=\"0\"><span class=\"eb-accordion-icon-wrapper eb-accordion-icon-wrapper-eb-accordion-4dc52\"><span class=\"fas fa-angle-right eb-accordion-icon\"><\/span><\/span><div class=\"eb-accordion-title-content-wrap title-content-eb-accordion-4dc52\"><h3 class=\"eb-accordion-title\"><strong>Reliability, Verbosity, Relevance Handling<\/strong><\/h3><\/div><\/div><div class=\"eb-accordion-content-wrapper eb-accordion-content-wrapper-eb-accordion-4dc52\"><div class=\"eb-accordion-content\">\n<p class=\"wp-block-paragraph\">There was, of course, a response from search engine developers to this verbosity. Whether\nthrough self-awareness or experience, we could assume that users would only look at the\nsuggestions at the top of the list if it was too long. In this case, the most crucial service aspect\nbecame the quality of the search result list&#039;s relevance ranking (what and why the algorithm\nplaces at the top of the list). There were many attempts at this in the early days. Then Google\nappeared and quickly dominated the search engine market with its service. 
This was because\nGoogle&#039;s search engine had new and much better relevance handling capabilities compared to\nits competitors (AltaVista, Lycos, HotBot, and others).&nbsp;<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">However, the success of Google&#039;s search engine was not due to semantic competence or any\nother linguistic ability but rather to its novelty in how it could gather and algorithmically\nprocess evaluative information expressed by people scattered across web documents (value\nexpressions). We can now say that by collecting these data traces and creating some relevance\nindicators from them, Google did nothing more than exploit one of the peculiar manifestations\nof collective human intelligence (and although it is not often talked about, it is true that the\nGoogle search engine hides perhaps the most significant voluntary and unconscious\ncrowdsourcing project). Of course, this does not diminish the merits of Google&#039;s engineers, as\nbuilding the technical infrastructure and developing the algorithm required a lot of\nengineering knowledge.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Another type of criticism could also be levelled against search engines. If we consider this\nhuman-machine relationship a communication act where the human asks and the machine\nanswers, and we know that the search engine&#039;s response is to return a results list, it can be\nstated that the question and the answer differ in genre and linguistic quality. The person enters\na search term, thus asking a question (an interrogative sentence), and the machine returns a\nlist of titles (shorter or longer series of sentences) in response, with the &quot;task assignment&quot; that\nthe person should follow the offered links and read the accessible texts one after another. 
This\nrelationship significantly differs from the basic form of human communication, where we\ncommunicate with each other in sentences, ask questions (sentences), and expect answers\n(sentences) from the other person.&nbsp;<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This difference provided the basis for developing new types of search systems. It was logical\nthat if we expect &quot;one-sentence&quot; answers to &quot;one-sentence&quot; questions, the new functionality\nsystem should be named &#039;Q&amp;A&#039; (question and answer) or &#039;question-answer system.&#039;&nbsp;<\/p>\n<\/div><\/div><\/div>\n\n\n\n<div class=\"wp-block-essential-blocks-accordion-item eb-accordion-item-xihzd eb-accordion-wrapper\" data-clickable=\"false\"><div class=\"eb-accordion-title-wrapper eb-accordion-title-wrapper-eb-accordion-4dc52\" tabindex=\"0\"><span class=\"eb-accordion-icon-wrapper eb-accordion-icon-wrapper-eb-accordion-4dc52\"><span class=\"fas fa-angle-right eb-accordion-icon\"><\/span><\/span><div class=\"eb-accordion-title-content-wrap title-content-eb-accordion-4dc52\"><h3 class=\"eb-accordion-title\"><strong>Question-Answer Systems<\/strong><\/h3><\/div><\/div><div class=\"eb-accordion-content-wrapper eb-accordion-content-wrapper-eb-accordion-4dc52\"><div class=\"eb-accordion-content\">\n<p class=\"wp-block-paragraph\">Attempts to develop question-answer systems have been ongoing for a long time, but no one\nhas achieved a breakthrough success. The obvious reason for this is that successful operation\nrequires modelling human semantic capabilities and implementing these models on\ncomputers.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This task is far from as easily achievable as executing the operation of syntactic-based\n(character-based similarity) search. 
It requires uncovering the internal structure of sentences,\nthe relationships between words and phrases, their meanings, and their connections to\nlanguage use contexts; in other words, the entire rule system of natural language. So far, this\nhas not been accomplished. Many believe it may not even be possible, and some think it is a\ncompletely pointless endeavour. However, many have tried and continue to seek solutions.<\/p>\n<\/div><\/div><\/div>\n\n\n\n<div class=\"wp-block-essential-blocks-accordion-item eb-accordion-item-3c3ux eb-accordion-wrapper\" data-clickable=\"false\"><div class=\"eb-accordion-title-wrapper eb-accordion-title-wrapper-eb-accordion-4dc52\" tabindex=\"0\"><span class=\"eb-accordion-icon-wrapper eb-accordion-icon-wrapper-eb-accordion-4dc52\"><span class=\"fas fa-angle-right eb-accordion-icon\"><\/span><\/span><div class=\"eb-accordion-title-content-wrap title-content-eb-accordion-4dc52\"><h3 class=\"eb-accordion-title\"><strong>Semantic Search<\/strong><\/h3><\/div><\/div><div class=\"eb-accordion-content-wrapper eb-accordion-content-wrapper-eb-accordion-4dc52\"><div class=\"eb-accordion-content\">\n<p class=\"wp-block-paragraph\">Google began developing the semantic capabilities of its search engine quite early (in the\nearly 2000s). It has integrated many semantic modules of lesser or greater significance into its\nsearch engine, each focused on a specific area of knowledge or a particular data type.\nHowever, the major breakthrough is still awaited. That breakthrough would be Google\nsuddenly becoming a semantic search engine, but the ongoing quantitative changes have visibly\nnot yet reached the level necessary for a qualitative change.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">For quite some time now, unlike in the early days, Google\u2019s search results page has been\nproviding different services for different questions, thanks to the continuous integration of new\nfunctionalities that rely on semantic capabilities. 
From the beginning, the search engine has\nfunctioned as a calculator, a currency converter, a postal code search tool, and over time, it\nhas handled an increasing number of knowledge areas a little differently from others in\ngeneral.&nbsp;<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">More and more frequently, we receive search results pages where, instead of a list of related\nweb pages (i.e., a set of document titles) at the top of the page, we see relevant \u2013 unique \u2013\ninformation related to the question asked (setting aside page transformations applied for\nadvertising and relevance management reasons, as there have been many changes due to those\nas well).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">As a convincing example, we can refer to the solution where, when we enter the query 'Author of The Man with the Golden Touch' as a search term, we get the specific answer to the specific question at the top of the page (updated to the present, of course). This is followed by a lot of additional information, and only then comes the long list of pages that contain the searched expression.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"530\" src=\"https:\/\/ailabor.appsters.me\/wp-content\/uploads\/2024\/07\/Kepernyofoto-2024-07-29-14.34.06.png\" alt=\"\" class=\"wp-image-2927\" srcset=\"https:\/\/ailabor.appsters.me\/wp-content\/uploads\/2024\/07\/Kepernyofoto-2024-07-29-14.34.06.png 2592w, https:\/\/ailabor.appsters.me\/wp-content\/uploads\/2024\/07\/Kepernyofoto-2024-07-29-14.34.06-300x162.png 300w, https:\/\/ailabor.appsters.me\/wp-content\/uploads\/2024\/07\/Kepernyofoto-2024-07-29-14.34.06-1024x553.png 1024w, https:\/\/ailabor.appsters.me\/wp-content\/uploads\/2024\/07\/Kepernyofoto-2024-07-29-14.34.06-768x415.png 768w, https:\/\/ailabor.appsters.me\/wp-content\/uploads\/2024\/07\/Kepernyofoto-2024-07-29-14.34.06-1536x830.png 1536w, 
https:\/\/ailabor.appsters.me\/wp-content\/uploads\/2024\/07\/Kepernyofoto-2024-07-29-14.34.06-2048x1106.png 2048w, https:\/\/ailabor.appsters.me\/wp-content\/uploads\/2024\/07\/Kepernyofoto-2024-07-29-14.34.06-18x10.png 18w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\">Despite spectacular partial successes, recognizing and understanding the internal structure of\nsentences, the meanings of words and phrases, that is, developing the semantic capabilities of\nmachines, remains a persistent challenge. Meanwhile, other technologies designed for different purposes have emerged and matured, which, partially or entirely, have\ntransformed and continue to transform our views on the linguistic capabilities of machines.<\/p>\n<\/div><\/div><\/div>\n<\/div><\/div><\/div><\/div>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Machine Conversation Based on Text Corpora<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Developments in artificial intelligence (AI) based on deep learning algorithms have followed\na very different logic compared to search engines, and these developments achieved\nincredible and astonishingly rapid successes by the 2020s. The new technologies based on\nlarge language models (LLMs) have, in a few but very important respects, elevated the speech\ncapabilities of machines to the quality of human discourse. With ChatGPT, one can\nconverse with the machine in exactly the same way \u2013 and in many respects with the same quality \u2013 as people\nconverse with each other. We ask questions and receive sentences in response that are well-formed\nboth syntactically and semantically. Such an explosive breakthrough has rarely\nbeen experienced since the beginning of the digital world.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">What can an AI-based speaking machine do? 
Speak \u2013 just like we humans have been able to.\nWhat underlies this technology? A multitude of algorithms, numerous calculations, and a vast\namount of text. We haven't delved into the algorithms, the computational capacity\nrequirements, or the technical infrastructure issues deep within the service so far, and we can\ndisregard them now as well. These are obviously extremely important for ensuring systematic\noperation, but they are not really necessary for comparing the processes analyzed here,\nestablishing the interpretive framework, and making final evaluations.&nbsp;<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">To simplify matters, let&#039;s say that this speaking machine also uses the same vast text corpus as\nthe search engine. This &quot;shared&quot; text corpus is nothing but a massive collection of previously\nscattered, human-created texts that have been gathered and digitally processed.&nbsp;<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This corpus enables the search engine to find relevant text locations within the entire set when\ngiven a query and return them as results, allowing humans to see the context in which the\nsearch term appeared in a previously recorded document. If the search engine works well \u2013 in\na technical, syntactic sense \u2013 the semantic correctness of the answers provided by the machine\ncannot be questioned. If we jump to a page offered by the search engine and find the\ninformation there unsuitable or insufficient for us, the machine cannot be held responsible\nbecause it did not produce it. Of course, one of the important services of the search engine is\nthe relevance criteria by which it ranks the result documents, as this significantly influences\nwhat we read (and what we don&#039;t). However, even in this case, we cannot hold the machine\naccountable for the quality of the readable answers. 
The search engine can always defend\nitself in case of a &quot;bad answer&quot; by pointing to some part of the text corpus, saying that another\nperson said (wrote) this, so they are responsible for the content of the answer.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The aspect of context management is important here. Human linguistic communication is\nextremely context-sensitive, meaning that our linguistic utterances can only be truly\nunderstood in a given context, and our similar or very similar utterances (words, expressions,\nsentences) can carry different meanings from context to context. This is what we mean by\nsaying our language use (our language) is extremely flexible. This was one of the biggest\nobstacles to developing effective machine language capabilities for a long time. 
The speaking\nmachine solved this problem by being able to identify and learn from the vast number of\nlanguage use contexts found in its text corpus, based on the words, expressions, sentences,\ntext environments, and statistical patterns among them. Since its corpus consists of texts\npreviously recorded by humans, we can say that the machine learns in which contexts people\nuse what words, what expressions, and what answers to questions with what probability. With\nsufficient data, computational capacity, money, and of course, sufficient human intelligence,\nthis combination can eventually become operational.<\/p>\n\n\n\n<div class=\"wp-block-essential-blocks-accordion  root-eb-accordion-y63ap\"><div class=\"eb-parent-wrapper eb-parent-eb-accordion-y63ap\"><div class=\"eb-accordion-container eb-accordion-y63ap\" data-accordion-type=\"accordion\" data-tab-icon=\"fas fa-angle-right\" data-expanded-icon=\"fas fa-angle-down\" data-transition-duration=\"500\"><div class=\"eb-accordion-inner\">\n<div class=\"wp-block-essential-blocks-accordion-item eb-accordion-item-ccje3 eb-accordion-wrapper\" data-clickable=\"false\"><div class=\"eb-accordion-title-wrapper eb-accordion-title-wrapper-eb-accordion-y63ap\" tabindex=\"0\"><span class=\"eb-accordion-icon-wrapper eb-accordion-icon-wrapper-eb-accordion-y63ap\"><span class=\"fas fa-angle-right eb-accordion-icon\"><\/span><\/span><div class=\"eb-accordion-title-content-wrap title-content-eb-accordion-y63ap\"><h3 class=\"eb-accordion-title\"><strong>Hallucination<\/strong><\/h3><\/div><\/div><div class=\"eb-accordion-content-wrapper eb-accordion-content-wrapper-eb-accordion-y63ap\"><div class=\"eb-accordion-content\">\n<p class=\"wp-block-paragraph\">Although the conversational machine can engage in dialogue similarly to how humans\nconverse with each other, we cannot say that it has reached the level of intelligence. But why\nnot? 
The conversational machine does not use its corpus as a reference but to participate in\nconceptually well-formed discourse. It knows that a dog barks, a cat has kittens, not chicks,\nan airplane flies but is not a bird, a penguin cannot fly yet is a bird, etc. Its sentences are\nalmost always well-formed, and its knowledge about the world is convincing. It often\nperforms well on the particular, individual, factual level of human knowledge, possessing an\nextensive factual knowledge base. However, on this level, it can easily make mistakes. It\nquickly became evident that the conversational machine often hallucinates, producing\nfactually unfounded, erroneous responses. We will not delve into the reasons for this here,\neven though perhaps the most serious criticism against the conversational machine is directed\nat this weakness. From the perspective of our train of thought, however, this shortcoming is\nnot particularly significant.<\/p>\n<\/div><\/div><\/div>\n\n\n\n<div class=\"wp-block-essential-blocks-accordion-item eb-accordion-item-v7wab eb-accordion-wrapper\" data-clickable=\"false\"><div class=\"eb-accordion-title-wrapper eb-accordion-title-wrapper-eb-accordion-y63ap\" tabindex=\"0\"><span class=\"eb-accordion-icon-wrapper eb-accordion-icon-wrapper-eb-accordion-y63ap\"><span class=\"fas fa-angle-right eb-accordion-icon\"><\/span><\/span><div class=\"eb-accordion-title-content-wrap title-content-eb-accordion-y63ap\"><h3 class=\"eb-accordion-title\"><strong>Translation<\/strong><\/h3><\/div><\/div><div class=\"eb-accordion-content-wrapper eb-accordion-content-wrapper-eb-accordion-y63ap\"><div class=\"eb-accordion-content\">\n<p class=\"wp-block-paragraph\">Although the phenomenon of hallucination may raise doubts, we can also debate whether\nsubstantial improvement can be hoped for in this area, and if so, how and to what extent.\nHowever, it is hardly debatable that the conversational machine&#039;s speech capabilities are at a\nvery high level. 
Consequently, the system also has excellent\ntranslation capabilities between different languages. It is not only adept at translating between\ntwo natural languages but is also capable of translating between natural and formal languages.\nExploiting this latter capability holds enormous potential, particularly when used to make the\nconversational machine function as a translator between natural languages and a specific\nformal language, SQL.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">SQL (Structured Query Language) is the standard query language for relational databases. Early in the\ndevelopment of computer science, this formal language was established to standardize the\nextraction of information stored in databases. With some simplification, we can say that all\nknowledge built into databases by humans over the past decades (nearly fifty years) is\naccessible through SQL commands. Although this language is not complex, laypersons generally cannot\nuse it and need the help of specialists to access the information stored in databases. This\nhas imposed limitations on those for whom continuous access to and everyday use of this\nwealth of knowledge would otherwise be important.&nbsp;<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">At this point, the conversational machine offers new opportunities. 
Before we delve into how\nand why this is the case, we need to understand what the concept of a database means and\nwhy it is so important from the perspective of knowledge representation and knowledge\nmanagement.<\/p>\n<\/div><\/div><\/div>\n<\/div><\/div><\/div><\/div>\n<\/div><\/div>\n<\/div><\/div>\n\n\n\n<div style=\"background-image:url(&apos;https:\/\/ailabor.appsters.me\/wp-content\/uploads\/2024\/06\/Ellipse-4.png&apos;);background-position:50% 30%;background-size:cover;\" class=\"wp-block-group has-background\"><div class=\"wp-block-group__inner-container is-layout-constrained wp-block-group-is-layout-constrained\">\n<div class=\"wp-block-group group-1240\"><div class=\"wp-block-group__inner-container is-layout-constrained wp-block-group-is-layout-constrained\">\n<h3 class=\"wp-block-heading\"><strong>Machine Search in Databases<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">At the dawn of computer science, experts began building databases, and this fact alone is\nremarkable. However, the importance of databases is further highlighted by the immense\nfinancial value they represent within the economy, as well as the vast amount of information\nstored in this form. The quantitative dominance of databases can be illustrated using a\nconceptual pair introduced for seemingly different purposes. In the early 2000s, the concepts\nof the surface web and deep web emerged, which have since been used in various senses and\nfor various purposes. The surface web is defined as the collection of information freely\naccessible through the web. Here, free access means that both people and machines can reach\nand read the information stored on the given page. This is contrasted with the notion of the\ndeep web, which refers to those sites that are technically freely accessible via the network but\nare practically restricted in some way. 
There are several reasons why deep web pages\ncannot be used as freely and unrestrictedly as those on the surface web.\nOn the one hand, there are sites (quite a few) that place technical barriers to entry (password-protecting\nthe given area). In these cases, technical and legal barriers prevent free use. But\nthere is another access barrier, explained not by technical and legal obstacles but by the fact\nthat to use the data stored in freely accessible web databases, one needs to know the internal\nstructure of the database and be able to use the SQL language. Neither machines nor\nhumans can overcome this obstacle. When search engines reach such sites and want to harvest\nthe content found there to incorporate it into their search services, they cannot query the\ndatabase content because they do not know the schema information. If they did, they could\nharvest data just as they do with content found on the surface web. Humans are even more\n\"helpless\" since, even with the schema information, they could not extract data from the databases because they do not know how to use the SQL query\nlanguage. Yet the stakes are high at this point. Expert estimates suggest that, overall,\norders of magnitude more information is stored on these deep web pages than on the entire\nexpanse of the surface web.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Databases are important not only because they provide access to an incredible amount of\ninformation, but also because the quality of that information is in some sense better: more accurate and clearer than\nthat of simple text documents. 
To understand why this is, we need to look at what a database\nessentially is and how it differs from &quot;simple&quot; text.<\/p>\n\n\n\n<div class=\"wp-block-essential-blocks-accordion  root-eb-accordion-6c6qd\"><div class=\"eb-parent-wrapper eb-parent-eb-accordion-6c6qd\"><div class=\"eb-accordion-container eb-accordion-6c6qd\" data-accordion-type=\"accordion\" data-tab-icon=\"fas fa-angle-right\" data-expanded-icon=\"fas fa-angle-down\" data-transition-duration=\"500\"><div class=\"eb-accordion-inner\">\n<div class=\"wp-block-essential-blocks-accordion-item eb-accordion-item-w907c eb-accordion-wrapper\" data-clickable=\"false\"><div class=\"eb-accordion-title-wrapper eb-accordion-title-wrapper-eb-accordion-6c6qd\" tabindex=\"0\"><span class=\"eb-accordion-icon-wrapper eb-accordion-icon-wrapper-eb-accordion-6c6qd\"><span class=\"fas fa-angle-right eb-accordion-icon\"><\/span><\/span><div class=\"eb-accordion-title-content-wrap title-content-eb-accordion-6c6qd\"><h3 class=\"eb-accordion-title\"><strong>Database<\/strong><\/h3><\/div><\/div><div class=\"eb-accordion-content-wrapper eb-accordion-content-wrapper-eb-accordion-6c6qd\"><div class=\"eb-accordion-content\">\n<p class=\"wp-block-paragraph\">When shopkeepers thousands of years ago started keeping records on paper (or clay tablets) to\ntrack how much of each product they sold daily, they were essentially entering words,\nphrases, and numbers into a table. The tabular arrangement of linguistic information back\nthen was a form of knowledge representation similar to today&#039;s databases. In other words, we\nhave long known that writing text in tables can have advantages over simply stringing\nsentences together. 
Information stored in a table can be read linearly, just like written text, but\nit does not need some of the grammatical rules required for well-formed sentences.\nHowever, a table is not only arranged in one direction (like text) but also organizes its\nelements into columns. This means the table can be read and evaluated both horizontally and\nvertically (as shopkeepers do when they enter the number and price of products sold each\nday and then sum these values at the end of the day to find the daily turnover or revenue).\nThe essence of the tabular format is its arrangement in two dimensions. We can say that a\ntable is structured text.<\/p>\n<\/div><\/div><\/div>\n\n\n\n<div class=\"wp-block-essential-blocks-accordion-item eb-accordion-item-v85iy eb-accordion-wrapper\" data-clickable=\"false\"><div class=\"eb-accordion-title-wrapper eb-accordion-title-wrapper-eb-accordion-6c6qd\" tabindex=\"0\"><span class=\"eb-accordion-icon-wrapper eb-accordion-icon-wrapper-eb-accordion-6c6qd\"><span class=\"fas fa-angle-right eb-accordion-icon\"><\/span><\/span><div class=\"eb-accordion-title-content-wrap title-content-eb-accordion-6c6qd\"><h3 class=\"eb-accordion-title\"><strong>Short Linguistic Theory Digression<\/strong><\/h3><\/div><\/div><div class=\"eb-accordion-content-wrapper eb-accordion-content-wrapper-eb-accordion-6c6qd\"><div class=\"eb-accordion-content\">\n<p class=\"wp-block-paragraph\">At this point, a more thorough discussion would have to address a human\nlinguistic capability not yet covered, one that Ferdinand de Saussure&#039;s theory can shed\nlight on. 
His theory helps explain why and how we can create structured text.\nHere, it is sufficient to recall from Saussure&#039;s theory that, because language use requires two\ntypes of abilities from humans, our communicative activity must be understood and\ninterpreted in two dimensions.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The first linguistic ability is at work when, in our utterances, we string words together\nlinearly to form meaningful sentences. This can be understood in the morphosyntactic\ndimension, and the question here is how we form well-constructed sentences. This is the\nprimary level of our linguistic ability, where immediately visible or audible results are produced.\nAt this level, our sentences are formed, and from these sentences our\naudible speech and our readable written texts and documents are created.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">However, Saussure recognized that we have another linguistic ability: when we compose our\nsentences, we also pay attention, in another dimension, to the rules by which we can insert\nparticular words into morphosyntactic patterns (sentence schemas). This ability can be\ndescribed as a kind of classificatory (semantic, ontological) knowledge that operates based on\nour knowledge of the world. When we form sentences, the use of certain words is permitted or\nforbidden in a specific position within a given sentence type (sentence schema). Under normal\ncircumstances, this knowledge does not appear in either the acoustic or the visual\nspace, but we still use it. Saussure called this the associative dimension; his followers today\noften use the term paradigmatic dimension for the same concept. 
When we write something in\na table, we use this ability as we organize the entries into columns (the vertical dimension).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Using a database (table) means that we record (represent) our statements (knowledge) about the world in such a way that the meaningful components (words, phrases) within each statement are kept separate. This lets us handle the components of our statements individually (reference them, search them, calculate with them, and so on). Shaping linguistic messages this way makes the content of our statements much clearer and more precise than statements expressed in free text. Because of the clarity that the structure provides, we can do much more with sentences expressed this way: we can perform various operations on them and extract more from the same set of sentences. This qualitative advantage is what gives databases their strength.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Databases therefore give us more than plain text does, but at a cost. The added value comes from the considerable organizational work that building a database requires, and this initial investment is often quite high. 
Fortunately, many people have taken on this work, so numerous databases have been built in the past and continue to\nbe built today.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">And at this point, we can return to our original line of thought.<\/p>\n<\/div><\/div><\/div>\n<\/div><\/div><\/div><\/div>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Text2SQL<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">We left off at the point where:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>On the one hand, we have a conversational machine (chatbot) that communicates\nvery well in natural language and translates fairly well between natural languages, as\nwell as between natural and certain formal languages, but is not factually reliable\nand often hallucinates.<\/li>\n\n\n\n<li>On the other hand, we have databases that organize knowledge in a structured way,\nso that the knowledge they hold is reliable and accurate and can be searched,\nreorganized, and used in calculations in many ways.<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">As previously indicated, these two capabilities can be connected by using the conversational\nmachine solely to translate natural language questions posed by people who do not know SQL into SQL\ncommands, which can then be sent to the database. The tabular answers received from the\ndatabase can be translated back into free text. In this setup, the risk of hallucinated facts is much lower\nbecause the answers come not from the conversational machine but from the\ndatabases, with the conversational machine serving only as an interpreter. Of course, this\nrequires metadata describing the structure of the databases (schema information), and the\nconversational machine must be tuned for this specific translation task. 
This, however, seems\nfeasible.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">For the database maintainer, an important consideration might be that the schema information does not need to be publicly released; it is sufficient to show or teach it to the chatbot. If the chatbot knows the structure of the database (tables, their fields, field types, relationships between tables, etc.), it will know how to formulate queries that extract the desired data. Naturally, it needs access to the database to do this, but that can be arranged. The chatbot may not yet be able to formulate every query as a human expert would, but it can already generate simpler commands accurately, and this capability is likely to improve significantly in the near future.<\/p>\n\n\n\n<div style=\"height:40px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n<\/div><\/div>\n<\/div><\/div>","protected":false},"excerpt":{"rendered":"<p>Written by: Istv\u00e1n Szakad\u00e1t Machine search in text corpora When CD-ROMs appeared in the early nineties, they were advertised as being able to store the text of as many books on one disc as would fit on a shelf in traditional paper format. Moreover, with the help of computers, searching through them was very fast, which could be contrasted all the more sharply with human search abilities. 
\u201cLet&#8217;s find&#8230;<\/p>","protected":false},"author":5,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_eb_attr":"","inline_featured_image":false,"_kad_post_transparent":"","_kad_post_title":"","_kad_post_layout":"","_kad_post_sidebar_id":"","_kad_post_content_style":"","_kad_post_vertical_padding":"","_kad_post_feature":"","_kad_post_feature_position":"","_kad_post_header":false,"_kad_post_footer":false,"footnotes":""},"class_list":["post-1149","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/ailabor.appsters.me\/en\/wp-json\/wp\/v2\/pages\/1149","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ailabor.appsters.me\/en\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/ailabor.appsters.me\/en\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/ailabor.appsters.me\/en\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/ailabor.appsters.me\/en\/wp-json\/wp\/v2\/comments?post=1149"}],"version-history":[{"count":23,"href":"https:\/\/ailabor.appsters.me\/en\/wp-json\/wp\/v2\/pages\/1149\/revisions"}],"predecessor-version":[{"id":3119,"href":"https:\/\/ailabor.appsters.me\/en\/wp-json\/wp\/v2\/pages\/1149\/revisions\/3119"}],"wp:attachment":[{"href":"https:\/\/ailabor.appsters.me\/en\/wp-json\/wp\/v2\/media?parent=1149"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}