Thursday, April 11, 2013

Php Script to load data to HIVE from MySQL tables


I used the below code to load the data from MySQL to hive with no transformation what so ever.

<?
$startTime = microtime(true);
$host = "localhost";
$user = "root";
$password = "";
$backupFolder = '/tmp/hivebackup';
$hiveRestoreFolder = '/tmp/hivebackup';

$mysqli = new mysqli($host, $user, $password);

/* check connection */
if ($mysqli->connect_errno) {
    printf("Connect failed: %s\n", $mysqli->connect_error);
    exit();
}

//tables to backup
$tablesArr = array(
"db.tbl1",
"db2.tbl2",
);

$fp = fopen("$backupFolder/restore.sql", "w");
$hiveql = "\ncreate database if not exists db1; ";
$hiveql.= "\ncreate database if not exists db2;";
fwrite($fp, $hiveql);

$tablesCompleted = 0;
//foreach table take backup
foreach($tablesArr as $table)
{
$tableStartTime = microtime(true);
print "\nProcessing $table";
@unlink("$backupFolder/$table.log");

//get the table desc
if ($result = $mysqli->query("DESC $table", MYSQLI_USE_RESULT))
{
$sql = "SELECT ";
$hiveql  = "\n\nDROP table if exists $table;";
$hiveql .= "\nCREATE TABLE $table(";

while($col = $result->fetch_array())
{
   $sql .= ' replace(replace(replace( `'.$col["Field"].'`, "\t", ""), "\n", ""),"\r", "") ,';  $hiveql .= "\n\t`". $col["Field"]. "` " .getHiveType($col["Type"]).",";
}
$sql = substr($sql,0, -1);
$hiveql = substr($hiveql,0, -1);

$sql .= " INTO outfile '$backupFolder/$table.log' from $table \n";
$hiveql .= "\n)\n".'ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t" LINES TERMINATED BY "\n"; ';
$hiveql .= "\nLOAD DATA INPATH '$hiveRestoreFolder/$table.log' INTO TABLE $table;";

$mysqli->query($sql);
fwrite($fp, $hiveql);

$tablesCompleted++;
}
print "\nBackup taken for $table in ". number_format(microtime(true) - $tableStartTime, 2). " sec";
}
fclose($fp);
print "\n$tablesCompleted tables are backed up in ".number_format(microtime(true) - $startTime, 2). " sec";

print "\n\t\t===== ===== ===== ===== ===== Done ===== ===== ===== ===== =====\n";

function getHiveType($mysqlType)
{
$mysqlType = explode('(', $mysqlType);
$mysqlType = $mysqlType[0];

switch(strtoupper($mysqlType))
{
case 'TINYINT' : return 'TINYINT';
case 'SMALLINT' : return 'SMALLINT';
case 'INT' : return 'INT';
case 'BIGINT' : return 'BIGINT';
case 'FLOAT' : return 'FLOAT';
case 'DECIMAL' : return 'FLOAT';
case 'DOUBLE' : return 'DOUBLE';
case 'BOOLEAN' : return 'BOOLEAN';
case 'TIMESTAMP': return 'TIMESTAMP';
case 'DATETIME' : return 'TIMESTAMP';
default : return 'STRING';

}
}
?>

## Now execute the php file.
$ php myfile.php

# restore from the file you have created
$ hive -f  /tmp/hivebackup/restore.sql


Note: I needed only selective tables for my purpose, to load whole DB can find all table or tables based on some condition, SHOW TABLES FROM DB1 like 'xx%' can be used

No comments:

Post a Comment