Get Data - part 3: AI/natural language options
Choosing local vs. commercial language models
Exponam Analyst Intelligence has robust natural language text-to-SQL capabilities. This article walks through the model choices available to you. The next article steps through model configuration.
Do you want to use a local model or a commercial vendor? It depends on your regulatory environment, your company's AI rules, cost, and your computer's hardware.
Local Models
To use a local model, you will need to install an LLM on your computer. It will run locally using your machine's resources. Generally speaking, the capabilities of a model improve as its size increases. As the size increases, however, the model runs more slowly, sometimes to the point of being unusable. How large a model you can run is a function of your computer's hardware. Computers with dedicated Nvidia graphics cards can handle larger models than those without a dedicated graphics card.
Benefits of using local models:
- Cost - there are no usage fees and no tokens are expended when running a model locally
- Security - no data of any kind is transmitted outside of your company. There is no risk of "shadow AI" or of data leakage
- Speed - using an appropriately sized model for your computer can actually be faster than using commercial models
- Fine-tuning - your company may have models which have been specifically trained and tuned for use on your data sets
Disadvantages of using local models:
- Speed - on some computers, even small models can take a long time to run. We suggest models in a range of sizes below
- Quality - smaller models have fewer parameters, and therefore a smaller base of knowledge on which to draw, than commercial models.
- No descriptive output - the only output of the local model is the SQL requested. You will not also get a natural language description of the output, as you would with a commercial model
Local Model Options
While you can use any LLM on your machine, we suggest the following models at different sizes:
| Size | # of params | Name | Download URL |
| --- | --- | --- | --- |
| Small | 8B | llama-3-sqlcoder-8b-Q3_K_L.gguf | |
| Medium | 14B | OmniSQL-14B.Q8_0.gguf | https://huggingface.co/mradermacher/OmniSQL-14B-GGUF/resolve/main/OmniSQL-14B.Q8_0.gguf |
| Medium | 32B | OmniSQL-32B.i1-Q4_K_M.gguf | https://huggingface.co/mradermacher/OmniSQL-32B-i1-GGUF/resolve/main/OmniSQL-32B.i1-Q4_K_M.gguf |
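To get a rough feel for which of the suggested models your machine can hold, you can estimate the size of the model weights from the parameter count and the quantization level. The sketch below is only a back-of-the-envelope calculation, not something Exponam provides. It assumes the nominal bit widths suggested by the file names (Q3, Q8, Q4) and a hypothetical 20% headroom factor for working memory. Real GGUF files are somewhat larger than the nominal figure, so treat the results as lower bounds.

```python
def estimate_weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Rough size of the model weights in gigabytes (10^9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def fits(params_billions: float, bits_per_weight: float,
         available_gb: float, headroom: float = 1.2) -> bool:
    """True if the weights plus headroom fit in the available memory.

    The 1.2 headroom factor is an assumption for illustration only.
    """
    needed_gb = estimate_weights_gb(params_billions, bits_per_weight) * headroom
    return needed_gb <= available_gb


# (file name, parameters in billions, nominal bits per weight - an assumption
# read off the quantization tag in the file name)
suggested = [
    ("llama-3-sqlcoder-8b-Q3_K_L.gguf", 8, 3),
    ("OmniSQL-14B.Q8_0.gguf", 14, 8),
    ("OmniSQL-32B.i1-Q4_K_M.gguf", 32, 4),
]

available_gb = 16  # hypothetical: memory on your graphics card or machine

for name, params, bits in suggested:
    size = estimate_weights_gb(params, bits)
    verdict = "likely fits" if fits(params, bits, available_gb) else "likely too large"
    print(f"{name}: at least ~{size:.1f} GB ({verdict} in {available_gb} GB)")
```

Change `available_gb` to match your own hardware. If a model is reported as likely too large, try the next size down.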
Commercial AI Models
To use a commercial vendor like Anthropic, you will need to have your own API account with the vendor. When you use the model to generate SQL, your question, a detailed set of instructions, and your data's schema (metadata) are sent to the vendor. The commercial models are constantly evolving and improving.
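To make concrete what leaves your organization, here is a minimal sketch of how a text-to-SQL request could be assembled from those three pieces. It is purely illustrative: the function name, field names, and sample schema are hypothetical, and it does not reflect Exponam's actual request format or any vendor's API. The point is that the payload carries table and column names, not rows of data.

```python
import json


def build_sql_request(question: str, schema: dict[str, list[str]],
                      instructions: str) -> dict:
    """Bundle the question, instructions, and schema into one payload.

    Only table and column names are included; no data rows are read or sent.
    """
    schema_text = "\n".join(
        f"{table}({', '.join(columns)})" for table, columns in schema.items()
    )
    return {
        "instructions": instructions,
        "schema": schema_text,
        "question": question,
    }


# Hypothetical schema: table names mapped to their column names
schema = {
    "orders": ["order_id", "customer_id", "order_date", "total"],
    "customers": ["customer_id", "name", "region"],
}

payload = build_sql_request(
    question="What were total sales by region last month?",
    schema=schema,
    instructions="Return a single SQL query that answers the question.",
)

print(json.dumps(payload, indent=2))
```

Reviewing a payload like this with your security team is one way to decide whether sending schema information to a vendor is acceptable under your company's rules.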
Benefits of using commercial models:
- Quality of SQL generation - commercial models draw on vastly greater computing resources and much more training when generating your requested SQL. As a result, the SQL generated can be, though is not necessarily, of better quality than with a local model.
- More natural language descriptive output - a commercial model will generate not only your requested SQL, but will also provide a natural language explanation of what data is being returned and how it is derived. This is very convenient, especially for users who are less familiar with reading SQL themselves.
Disadvantages of using commercial models:
- Cost - you are charged a fee every time you make a request of the commercial model
- Security - you are transmitting your data schema and metadata (not your actual data) outside of your organization. This may be a breach of your company's security protocol.
See part 2 for configuring SQL scratchpads.
See part 4 for AI model configurations.