Vertica for MySQL Users

Dave Andre

Version 1.0.1, February 20, 2013

Copyright © 2013. All Rights Reserved.

 

Overview

This document provides an introductory overview of the Vertica analytics database system from the perspective of a user who is already familiar with MySQL.  The information here is introductory only, and is not intended to be a thorough discussion.

What is Vertica?  Why use it?

Vertica is a high-performance SQL-based database that is specifically designed for analytics and reporting.  Because it is SQL-based, it can integrate easily with tools and applications that integrate with SQL, including for example SAS and Tableau.  As a MySQL user, you will have a very fast learning curve with Vertica.

Vertica achieves high performance for analytics and reporting applications in a number of ways that make it superior to traditional database solutions such as MySQL:

·         Data in Vertica is organized on disk by columns, not by rows.  Analytics and reporting queries typically only access a few columns, and by organizing data in columns disk I/O is dramatically reduced.

·         Vertica furthermore economizes on disk I/O by automatically compressing data in columns, often to a small fraction of the data’s original size.

·         Vertica is designed to work in a cluster configuration with 3+ nodes.  It is very effective at spreading workload in a query to all available nodes, as well as effectively using all available processor capacity.  Multiple nodes effectively increases disk throughput as well.  Unlike MySQL, this “sharding” is completely invisible to the user.

·         Instead of using indexes, Vertica organizes columnar data on disk in multiple sort orders which it can use to optimally execute queries.  The Vertica Database Designer is a tool that administrators can use to analyze queries that have been run frequently, and automatically optimize data layout for those queries.

·         Vertica provides a number of SQL extensions to support analytics applications efficiently.  As a simple example, Vertica natively supports the ability to find a median of a data set, while standard SQL does not.  Vertica also is highly integrated with the open source “R” analytics language.

·         Vertica’s architecture allows high query performance while simultaneously loading data.

The performance advantages of Vertica are dramatic.  Most commonly, where MySQL queries take minutes or hours, Vertica queries return in mere seconds.

Despite Vertica’s advantages for analytics and reporting, it is not necessarily a good choice for other applications such as transaction processing.  Vertica works best if most interactions with it are INSERT and SELECT, and relatively few DELETE/UPDATE operations.  Vertica can also be difficult to use if your application relies on the database to enforce uniqueness or foreign key constraints.

Accessing Vertica

Vertica provides a command line tool, vsql, with much of the same functionality as the mysql command line tool.  vsql is very flexible for shell scripting, data import/export, administration, and quick querying.  Here are common flags you may be using with mysql, and their correspondences in vsql:

 

mysql

vsql

-h myserver

-h myserver

-u bob

-U bob

-p[password]

-w [password] (note the space)

-e “sqlcommand

-c “sqlcommand

< infile.sql

-f infile.sql

Databasename

(no equivalent, see Databases and Schemas)

 

Once inside the tool, you type SQL commands terminated by semicolons.  In addition, the following commands are often useful:

 

mysql

vsql

\? (help)

\? (help)

use databasename

set search_path=schemaname[, schemaname]…;

show tables;

\d

desc tablename;

\d tablename

source inputfile.sql

\i inputfile.sql

 

\timing (toggles timing information)

 

Example Vertica session, using a user account named “readonly”:

 

$ vsql -h sjc-bo-347 -U readonly

Password:

Welcome to vsql, the Vertica Analytic Database interactive terminal.

 

Type:  \h or \? for help with vsql commands

       \g or terminate with semicolon to execute query

       \q to quit

readonly=> \timing

Timing is on.

readonly=> select count(*) from mnlog.weblog;

   count

------------

 2078088070

(1 row)

 

Time: First fetch (1 row): 1248.609 ms. All rows formatted: 1248.653 ms

readonly=> select type, count(*) from mnlog.weblog group by type;

 type |   count

------+------------

 POST |   62666094

 GET  | 2012525601

 HEAD |    2896375

(3 rows)

 

Time: First fetch (3 rows): 1278.400 ms. All rows formatted: 1278.450 ms

 

Many applications such as Tableau can access Vertica directly.  By default, these applications connect to the server on port 5433.  (MySQL defaults to 3306.) 

 

Example of accessing Vertica in SAS via ODBC:

 

PROC SQL;

   CONNECT TO ODBC (datasrc="VerticaDSN" user=myusername password=mypassword);

   CREATE TABLE tablename AS

      SELECT * FROM connection to odbc (

         SELECT * FROM myschema.mytable

      ) ;

   DISCONNECT FROM ODBC;

QUIT;

 

Example of accessing Vertica in PHP via ODBC:

 

$db = odbc_connect('VerticaDSN', 'myusername', 'mypassword');

$res = odbc_exec($db, 'select count(*) as n from myschema.mytable;');

$row = odbc_fetch_array($res);

echo "count is ".$row['n']."\n";

 

Example of accessing Vertica in Java through JDBC.  Note that this is one of the few places where you actually need to specify the Vertica Database name (MYDB below).

 

import java.sql.Connection;

import java.sql.DriverManager;

import java.sql.PreparedStatement;

import java.sql.ResultSet;

import java.util.Properties;

 

/* ... */

 

Properties props = new Properties();

props.setProperty("user", "myusername");

props.setProperty("password", "mypassword");

con = DriverManager.getConnection("jdbc:vertica://myserver:5433/MYDB", props);

pst = con.prepareStatement("select stuff from mytable;");

rs = pst.executeQuery();

while rs.next() {

  System.out.println(rs.getString("n"));

}

Databases and Schemas

In MySQL, each server contains databases, and each database contains tables.  In the context of your “current” database, you can refer to tables simply with the table name.  You can also refer to tables in any database using the syntax databasename.tablename.  Example:

 

select count(*) from myschema.mytable;

 

In Vertica, each server cluster has one database, and you usually don’t need to refer to this database name.  In this database, there are schemas, and each schema contains tables.  You can refer to any database in any schema using the syntax schemaname.tablename.  If you refer only to a table name without a schema, then Vertica searches a “search path” of schemas to resolve the table name.  By default, the search path is the schema named “public”.  The following example demonstrates how you can use the search path to access myschema.mytable without qualification.

 

set search_path=myschema;

select ... from mytable ... ;

Data Types

Data types are much simpler in Vertica than in MySQL.  Because Vertica compresses all effectively, it is not necessary to specify the same level of storage precision as in MySQL.

 

int

There is only one data type for integers: int.  Types such as bigint, smallint, and tinyint are not allowed.  The unsigned qualifier is not allowed.

float

There is one floating point data type in Vertica, typically named “float”.  It can also be called “double precision” or “real”, but annoyingly it cannot be called “double”, so the naming is incompatible with MySQL.  Length qualifiers (i.e. float(8)) are ignored.

varchar(x)

The text and tinytext datatypes are not supported in Vertica, use varchar(x) instead.  Tinytext is equivalent to varchar(255).  The maximum length is 65000.

Character Sets

Vertica support utf8 only.  All character data should be converted to utf8 when loaded.

enum

Enum is not supported, but varchar fields compress very well. 

set

Set is not supported.  Instead use varchar or if bitmap functionality is required then int.

date, datetime

Vertica has a rich set of date and time data types including with timezone support.  However, it does not support MySQL’s concept of “timestamp” that is automatically updated when a row is inserted or updated.  Usually use date and datetime types.

 

Important:  Zero values for date types (i.e. 0000-00-00) are not allowed in Vertica.  These must be converted to null or a valid date when loading.

Type coercion

Vertica is often more picky about data type conversions than MySQL.  You can coerce an expression to another data type using the :: operator.  Example: ‘2013-01-01’::date

Tables and Indexes

The CREATE TABLE syntax is the nearly same in Vertica as MySQL.  MySQL extensions such as ENGINE=MyISAM are not allowed.

 

Generally in order to replicate a table in Vertica from one in MySQL, you need to do the following:

 

·         Replace all MySQL types with Vertica types, i.e. tinyint => int, tinytext => varchar(255), etc.

·         Remove the “unsigned” keyword.

·         Check all date/datetime/timestamp columns carefully.  If they can contain zero-valued dates, make the columns nullable and remember to null out the dates when loading.

·         Remove all MySQL extensions such as AUTO_INCREMENT, ENGINE=InnoDB, etc.

·         Generally, you should leave PRIMARY or UNIQUE indexes, as they can help Vertica optimize its queries.  Similarly keep foreign key constraints.  It is not necessary to keep non-unique indexes.  Note that in Vertica indexes don’t have names, and you say UNIQUE rather than UNIQUE KEY.

·         Consider carefully character set and collation.  All data must be converted to utf8 for loading into Vertica.

 

 

Example:

 

MySQL

Vertica

create table mytable (

   id bigint unsigned not null auto_increment,

   name tinytext not null,

   type enum('small','medium','large'),

   create_date datetime not null,

   primary key pk (id),

   unique key idx_name (name),

   key idx_create_date (create_date)

) ENGINE=InnoDB;

create table mytable (

   id int not null,

   name varchar(255) not null,

   type varchar(12),

   create_date datetime, -- allow null

   primary key (id),

   unique (name)

);

 

IMPORTANT:  Uniqueness and foreign key constraints are not automatically enforced in Vertica.  This is because the database has no index with which to enforce the constraints. For example, you can insert multiple rows with the same primary key, and Vertica will not complain.  However, Vertica may subsequently complain when executing a query that there has been a uniqueness constraint violation.

 

Vertica does enforce NOT NULL and similar data type constraints.

 

Vertica’s ALTER TABLE statements are completely incompatible with MySQL’s, but provide much of the same functionality.  Consult the documentation for more information.

 

To load data from a file into Vertica, use the COPY statement, which functions much like MySQL’s LOAD DATA INFILE statement.  Here is an example, which allows loading of data produced by MySQL with SELECT INTO OUTFILE or mysqldump -T:

 

MySQL

Vertica

load data infile '/tmp/mytable.txt'

into table mytable;

copy mytable from '/tmp/mytable.txt'

abort on error

null as '\N'

delimiter E'\t';

 

Like LOAD DATA INFILE, Vertica’s COPY statement has many options, allowing you to load data files in many formats, and perform data transformations on load.  Consult the documentation for more information.

 

If you are designing a new schema for Vertica, consider designing a schema that requires fewer joins but flattens the base tables to have a large number of columns.

SQL Language Differences

Here are some language differences that may trip up users who are used to MySQL:

 

IF

Vertica does not have the if(condition, expr, expr) statement like MySQL.

Instead use CASE WHEN condition THEN expr ELSE expr END

TRUNCATE

MySQL:  TRUNCATE TABLE mytable;

Vertica:  TRUNCATE mytable;

Date Functions

Vertica’s date functions are completely different than MySQL’s date functions.  Consult the documentation.

ORDER BY

MySQL by default will usually order results the same as in the original table, or as specified in the GROUP BY statement.  Vertica, on the other hand will return rows in an arbitrary order, often even returning different orders for the same query during the same session.  If order is important in results, make sure it is specified in Vertica.

GROUP BY

MySQL allows you to GROUP BY items that are not in the selection list.  MySQL also does not require you to GROUP BY all of the non-aggregate elements in the selection list.  Vertica, however, requires that you only GROUP BY items that are in the selection list, and furthermore that you must GROUP BY all non-aggregate elements in the selection list.

COMMIT and ROLLBACK

MySQL does not open a transaction unless you explicitly specify the transaction (and use a transaction-enabled storage engine).  Vertica, on the other hand, always opens a transaction when insert, delete, or update operations are performed.  If you do not COMMIT the transaction, results will be lost!

 

Warning:  Some statements, notably COPY and TRUNCATE, automatically commit.  This can lead to unexpected results if you’re not expecting this behavior.  Consult the documentation for more information.

AUTO_INCREMENT

Vertica has a concept of AUTO_INCREMENT that doesn’t exactly match with MySQL’s definition.  If AUTO_INCREMENT is important, you should consider using Vertica sequences.  Consult the documentation for more information.

String Literals

In MySQL, you can enclose string literals in either double quotes or single quotes.  In Vertica, you must use single quotes.

Other Considerations

Projections.  Vertica internally takes table data and maps it into projections.  A projection specifies a set of columns, and how they are encoded, compressed, and sorted.  You can create projections yourself, but Vertica provides automated tools to analyze data and queries, and automatically create projection definitions.

 

Deletes.  Vertica implements UPDATE as a DELETE followed by an INSERT.  Vertica’s DELETE does not actually delete data; rather, it marks the row as deleted as of a certain time.  This creates the very interesting capability of running a query “as of” a specific in the past.   However, with frequent DELETEs, tables can become very fragmented, resulting in reduced performance.  You can purge deleted rows from a table with the following statement:

 

select purge_table('myschema.mytable');

 

System Tables.  There are system tables in Vertica that allow you to view information about the database, schemas, tables, users, projections, sessions, queries, and more.  Use \dS in vsql to see a list of system tables and brief documentation.  May MySQL-specific statements such as SHOW TABLES can be implemented as simple queries on system tables.

 

Database Management.  Database management is typically done using the adminTools utility, which must be run as the dbadmin user.  adminTools is a menu-driven text console program and is fairly easy to understand and use.

 

Licensing Considerations.  Vertica’s licensing costs are based solely on the amount of data stored.  When calculating the size of data for licensing purposes, Vertica excludes temporary tables and multiple projections.  Roughly speaking, the size of the data for licensing purposes is the amount of storage that would be used if all tables were dumped into uncompressed tab-delimited files.

 

It sometimes makes sense to take licensing considerations into account when designing tables.  For example, consider a column that is varchar(255) and can have one a handful of values, averaging 20 bytes each.  If that column is in a table with 50 billion rows, it would then “cost” 1 Terabyte (20 bytes x 50 billion rows).  Alternatively, if the row were defined to have integer values from 0-9 or char(1) values, it would “cost” only 50 Gigabytes.  (In both case Vertica stores the values compressed; the only difference is licensing cost.)

 

Another licensing implication is that it may not make sense to have multiple tables with different snapshots of the same information.  Instead, define projections on the base table that basically accomplishes the same summarization.

Documentation

MySQL’s online documentation is publicly available on dev.mysql.com.  The best way to search the documentation is to simply use Google, for example search for “mysql 5.5 create table”.

 

Vertica has very good online documentation, but it must be accessed through Vertica’s portal, http://my.vertica.com, which requires a user account.  The online documentation includes a search function.