Concurrency in PowerShell: Background jobs

This is part one of a series.

There are three methods for achieving asynchronous processing in PowerShell: background jobs, the System.Management.Automation namespace classes, and (coming soon in PoSH v3!) workflows. Workflows share characteristics of the other two, but they are different enough to count as a method of their own. I suppose you could also use the System.Threading .NET namespace, but then you’re writing straight .NET code and that’s just cheating. Over the next three weeks, I’ll be posting information about all three of these methods. There will be a lot of information, but it will all be useful and necessary. If you can avoid some of the problems I ran into, it will be worthwhile. I’ll try not to be long-winded, but no promises there.

Before you get started, you should know that there are a lot of ‘gotchas.’ As with a lot of PowerShell, you’ll come across behavior that isn’t particularly intuitive. There are also a few ways you can get yourself into real trouble. You should familiarize yourself with the concept of thread safety, for example.

The scenario for this series is a common one: I have 50 Excel worksheets that need to be loaded into an SQL Server database as a staging phase for an ETL process. Of course, I could load these one at a time, but depending on how large they are it could take a significant amount of time. Most importantly, I am not fully utilizing the computing resources at my disposal. I want to be able to control the amount of resources I use so I can ramp it up for speed if I so desire. One way to attempt this is through the concurrent processing pattern. Background jobs are enabled through the use of the *-Job cmdlets and operate very similarly to the forking concurrent process model. Using these cmdlets is straightforward and generally protects you from doing anything particularly harmful, although it isn’t hard to make your computer unusable for a while, as I’ll demonstrate shortly.

To get started, I need a data structure for the Excel documents that I’ll be working with as well as a script block to define the work. I’m using a version of some code I posted a while back for this data load task that I have wrapped into a function called Import-ExcelToSQLServer. Note also that this series is based on a presentation I gave at SQL Saturday #111 so you’ll see references throughout this series.

$files = Get-ChildItem '<path>' 

$ScriptBlock = `
{
    Param($File)
   
    Import-ExcelToSQLServer -ServerName 'localhost' -DatabaseName 'SQLSaturday' -SheetName "SQLSaturday_1" `
            -TableName $($File.Name.Replace('.xlsx','')) -FilePath $($File.FullName) -EA Stop
}

Note the properties of the $File object that are being used. Astute PoSHers will notice that I’m having to jump through some hoops to format the parameters, which shouldn’t be necessary with System.IO.FileInfo objects. The problem is that although the $files array contains System.IO.FileInfo objects, those objects only exist in the memory space of the dispatcher process. The new process created by the Start-Job cmdlet has no access to them across the process context barrier. As a result, the only mechanism available to pass information to the spawned process is serialization. PowerShell converts the object to XML and then deserializes the XML into an object of type [Deserialized.System.IO.FileInfo] that has a modified set of properties and only the ToString() method. This is quite frustrating if you are hoping to use methods of a FileInfo object, which are no longer available. Not only that, but the overhead of serialization and deserialization is non-trivial, not to mention the overhead of instantiating another FileInfo object if one is required. On the other hand, the upside is that issues of concurrency control are eliminated and thread safety is guaranteed.
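
A quick way to see this boundary for yourself is to inspect the argument from inside a job. This is just a sketch, and the file path is a made-up example:

$job = Start-Job -ScriptBlock {
    Param($File)
    # The type name reveals the deserialized wrapper rather than a live FileInfo
    $File.PSObject.TypeNames[0]
    # Far fewer members survive than on a real System.IO.FileInfo
    @($File | Get-Member -MemberType Method).Count
} -ArgumentList (Get-Item 'C:\temp\example.xlsx') # point this at a real file

Wait-Job $job | Receive-Job
Remove-Job $job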

The intention now is to execute the script block for every file in the $files array, and I do not want to wait for one to complete before the next begins. So, I can use a foreach loop to execute Start-Job.

Foreach($file in $files)
{
    Start-Job -ScriptBlock $ScriptBlock -ArgumentList $file `
         -InitializationScript {. '<path>\Import-ExcelToSQLServer.ps1'}
}

Note the need to include the file containing the function code as an initialization script, even if the function is loaded in the host process, because of the process barrier. The new process will, however, execute the PowerShell profile if you have one defined. If you have Task Manager open after executing this, you’ll see something like the following.

[Screenshot: Task Manager results after Start-Job]

A process has been created for each file. The Start-Job cmdlet has also registered handles in the session. These handles are the connection from the host session to the pipeline running in the forked process.
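
You can confirm the per-job overhead from the dispatching session; each job gets its own powershell.exe process and a corresponding job object:

# Count the powershell.exe processes spawned for the jobs (plus the host itself)
@(Get-Process -Name powershell).Count

# The job objects (handles) registered in the session
Get-Job | Format-Table Id, Name, State, HasMoreData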

Once the loop exits, control is returned to the host. This is the real intention of the *-Job cmdlets: they allow you to execute work and continue to use the shell. The jobs are available to check at the PoSHer’s leisure. If desired, you could use the Wait-Job cmdlet to enter a synchronous state and concede control until the specified jobs complete.
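
For example, either of the following will block the shell until the work finishes (the timeout value is an arbitrary choice for this sketch):

# Block until every job has finished
Get-Job | Wait-Job

# Or give up waiting after five minutes
Get-Job | Wait-Job -Timeout 300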

When the time comes, check to see if there are any jobs still running like so.

Get-Job -State "Running"
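
Other states are worth checking too; failed jobs in particular are easy to miss:

# Jobs that terminated with an error
Get-Job -State "Failed"

# Jobs with output that has not yet been received
Get-Job | Where-Object { $_.HasMoreData }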

The Receive-Job cmdlet uses the handles registered by the Start-Job cmdlet to access the external process pipelines. Of course, you could provide a single job ID or an array of IDs to Receive-Job. To receive the results of all jobs at once, simply use the pipeline.

Get-Job | Receive-Job
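
Be aware that Receive-Job drains a job’s output by default; if you want to be able to read the results again later, keep them cached in the job object:

# Leave the results in place so they can be received again
Get-Job | Receive-Job -Keep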

When you are finished, remember to clear the job handles.

Remove-Job *
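
If any jobs are still running, Remove-Job will refuse to delete them; the -Force parameter stops and removes them, so use it with care:

# Stop and remove all remaining jobs, running or not
Get-Job | Remove-Job -Force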

Should I Use Background Jobs?

Fifty running PowerShell processes at ~20MB of memory each (or more, depending on the work), plus the required CPU time (compounded by the need to manage schedulers, etc.), make for a huge amount of overhead. This results in a run-time more than three times longer than executing the tasks synchronously. The individual unit of work (one script block execution) needs to be sufficiently long to overcome the overhead of the forked process model.
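
If you want to measure this in your own environment, a rough comparison looks something like the following sketch. It assumes $files, $ScriptBlock, and a dot-sourced Import-ExcelToSQLServer are already available in the session, and '<path>' is the same placeholder as before:

# Run the work in-process, one file at a time
$synchronous = Measure-Command {
    foreach($file in $files) { & $ScriptBlock $file }
}

# Run the same work as background jobs and wait for them all
$withJobs = Measure-Command {
    foreach($file in $files)
    {
        Start-Job -ScriptBlock $ScriptBlock -ArgumentList $file `
            -InitializationScript {. '<path>\Import-ExcelToSQLServer.ps1'} | Out-Null
    }
    Get-Job | Wait-Job | Out-Null
}

"Synchronous: $($synchronous.TotalSeconds) sec; Jobs: $($withJobs.TotalSeconds) sec"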

Significant improvement can be made by throttling the number of processes that are allowed to run at one time. To do so, modify the Start-Job loop like so:

Foreach($file in $files)
{
    Start-Job -ScriptBlock $ScriptBlock -ArgumentList $file `
         -InitializationScript {. '<path>\Import-ExcelToSQLServer.ps1'}

    While((Get-Job -State 'Running').Count -ge 2)
    {
        Start-Sleep -Milliseconds 10
    }
} 

The addition of the while loop forces a wait whenever two or more jobs are running. This dramatically improves the performance of the process by reducing the amount of overhead, at the cost of making the jobs run essentially synchronously relative to the dispatching process. Unfortunately, it is still slower than running the jobs synchronously.
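
Note that when the foreach loop exits, the last job or two may still be running, so a final wait before collecting results is worthwhile:

# Wait for the stragglers, gather their output, then clean up
Get-Job | Wait-Job | Receive-Job
Remove-Job *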

Next week I will look at a much more effective method. I am also currently working on a module that will (hopefully) provide a replacement for the Job cmdlets, operating in essentially the same manner, just much more efficiently. However, it is still important to understand the Job cmdlets, if for no other reason than that they are on every computer with PowerShell. Additionally, they are simple and perfectly effective in some use cases, such as kicking off a long-running process while maintaining control of the shell.

Speaking at SQL Saturday #111

I’m completely thrilled to be invited back to Atlanta on April 14th to speak at SQL Saturday #111. I’ll be throwing caution to the proverbial wind and tackling a rather difficult subject…concurrency in PowerShell. Concurrency is a fascinating and complicated subject that both solves and creates all kinds of interesting Computer Science problems. There’s no way I’ll be able to talk about everything in 60 minutes, but there should be plenty of time to make everyone in the room awfully dangerous with multiple processors.

I chose the word “concurrency” carefully. Many people think that running processes at the same time (asynchronously) is multi-threading. To me, however, true multi-threading requires a shared memory space, and not all of PowerShell’s asynchronous capabilities use that model. In fact, you have to dig into the PowerShell .NET assemblies to start understanding how to do true multi-threading. It isn’t the most intuitive approach, but it’s incredibly powerful once you wrap your head around it.

I’m enjoying the topic so much, in fact, that I’m going to start my first series on this blog. Starting the Monday after SQL Saturday, I’ll be posting an in-depth look at the various methods of asynchronous processing in PowerShell. There is a small amount of information scattered around on various blog posts and forums out there, but I don’t know of anywhere that has a consolidated, more-or-less exhaustive look at the subject, so I think I’ll be bringing something good to the awesome PoSH community.

If you read this blog and come to SQL Saturday, make sure to say ‘hello!’ And make sure to follow the #sqlsat111 hash tag on Twitter.

Verifying SQL Server Database Backups with #PowerShell

Like all DBAs, I’m a bit obsessive about database backups. And, of course, an untested backup isn’t much of a backup at all (my IO subsystems could be corrupting my backups RIGHT NOW). But testing your backups is time-consuming, dull work. So, to satisfy my need for validated backups and my highly tuned aversion to repetitive tasks, I’ve automated the process. This is a PowerShell script that, given an array of servers, will read the past 24 hours of backup history and restore the most recent full backup along with all subsequent log backups, up to the present point in time. Every day I get an email telling me (mostly) good things about my database backups, and that means the tiny flame of life still burning in this DBA’s jaded heart burns a little brighter. And if I do have a problem (I have issues with log backups from time to time), it tells me that, too.

Of course, this script is written with a certain environment in mind. For instance, I have the luxury of being able to take a full backup every night for my critical databases. You may have to tweak it to work for you. I also use Idera’s SQL Safe (though I don’t use it for log backups), so I’ve included code for both standard .bak and .safe files.

There are also a lot of ways to improve / expand this script. For instance, it only works on databases with one data file, which will be limiting for a lot of people. But I hope this provides a foundation to build something that works in your environment.
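
As one possible direction, the single pair of MOVE options in the restore statement could be generalized by building the clause from every row that RESTORE FILELISTONLY returns. This is just a sketch using the $fileList, $dataPath, and $logPath variables from the script below:

# Build one MOVE clause per file in the backup, reusing the original file names
$moveClause = ($fileList | ForEach-Object {
    $targetDir = if ($_.Type -eq 'L') { $logPath } else { $dataPath }
    "MOVE '$($_.LogicalName)' TO '$targetDir$(Split-Path -Leaf $_.PhysicalName)'"
}) -join ', '

# $moveClause could then replace the two hard-coded MOVE options in $restoreQuery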

<############################################################################################################

.SYNOPSIS
   This script will, for all databases on a list of servers (except for master), restore the most recent full
   backup and any log backups that it finds up to the current time. 
   
.Notes
    AUTHOR: Jon Boulineau
    LASTEDIT: 26 October 2011
.Link
     http://newsqlblog.com
     
############################################################################################################>

# Use Chad Miller's Invoke-Sqlcmd2 to avoid SQLPS
# www.sev17.com
function Invoke-Sqlcmd2
{
	param
	(
	[string]$ServerInstance,
	[string]$Database,
	[string]$Query,
    # IMPORTANT MODIFICATION: 
    # The default timeout of 30 seconds which Chad set is insufficient for the purpose.  If a restore takes longer, it will give up
    # on the process and kick off the next restore in the list (also creating errors in the report).  7200 seconds is 2 hours, but 
    # adjust as needed.  Of course, in any legitimate timeout situation you could end up with a very very long running script. This can
    # also be set to 0 for an infinite timeout.  More info: http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlcommand.commandtimeout.aspx
	[Int32]$QueryTimeout = 7200
	)

	$conn = new-object System.Data.SqlClient.SQLConnection
	$conn.ConnectionString="Server={0};Database={1};Integrated Security=True" -f $ServerInstance,$Database
	$conn.Open()
	$cmd = new-object system.Data.SqlClient.SqlCommand($Query,$conn)
	$cmd.CommandTimeout=$QueryTimeout
	$ds = New-Object system.Data.DataSet
	$da = New-Object system.Data.SqlClient.SqlDataAdapter($cmd)
	[void]$da.fill($ds)
	$conn.Close()
	$ds.Tables[0]
}

<####################### SET THESE VARIABLES TO CONFORM TO YOUR ENVIRONMENT ######################>

[Array]$serverNames = ""

# Specify the server to which the restores will be made.  BE CAREFUL WITH THIS !!!
[String]$testServer = ""

[String]$dataPath = "" 		# e:\SQLFiles\
[String]$logPath = "" 		# e:\SQLFiles\
[String]$SMTPServer = ""	# mail.mydomain.com
[String]$EmailTo = ""		# dba@mydomain.com
[String]$EmailFrom = ""		# admin@mydomain.com

# These are only necessary if you're using Idera SQLSafe
[String]$windowsUsername = "<Username>";
[String]$windowsPassword = "<EncryptedPassword>";

<####################### SET THESE VARIABLES TO CONFORM TO YOUR ENVIRONMENT ######################>


# There's probably a better way to format this using ConvertTo-HTML
[String]$resultHTMLTable = "<html><body><table>";

$historyQuery = 
    "SELECT    
        [bs].[database_name]
        , [bs].[backup_finish_date]
        , [bs].[server_name]
        , [bs].[type]
        , [mf].[physical_device_name]
    FROM
        [msdb].[dbo].[backupset] bs
        INNER JOIN [msdb].[dbo].[backupmediafamily] mf ON [bs].[media_set_id] = [mf].[media_set_id]
    WHERE
        DATEDIFF(hh , [bs].[backup_finish_date] , CURRENT_TIMESTAMP) <= 24
        AND [mf].[family_sequence_number] = 1"

foreach($name in $serverNames)
{

	# Get a list of the databases on the server
	$databases = Invoke-Sqlcmd2 -ServerInstance $name -Database master -Query "SELECT name FROM sys.databases WHERE name NOT IN ('tempdb','master')"

	# Get the entire backup history for the last 24 hours
	$backupSets = Invoke-Sqlcmd2 -ServerInstance $name -Database "master" -Query $historyQuery

	#For each of the databases, check for backups
	foreach($db in $databases)
	{
        clear;
		$resultHTMLTable = $resultHTMLTable + "<tr><td>$name</td><td>$($db.Name)</td>";
		
        # Note: This only works when the database has a single data file and a single log file.
		$dataFilePath = $dataPath + "$($db.Name)_data.mdf"
		$logFilePath = $logPath + "$($db.Name)_log.ldf"

		# We will need to bring the database online when the restores are complete
		$recoverQuery = "RESTORE DATABASE $($db.Name) WITH RECOVERY"; 
		
		$dropQuery = "DROP DATABASE $($db.Name)";

		# Capture the most recent full backup
		$fullBackup = $backupSets | Where-Object{$_.database_name -eq $db.name -and $_.type -eq "D" } | Sort-Object -Property @{Expression={$_.backup_finish_date}; Ascending=$false} | Select-Object -First 1 ;

		if($fullBackup -eq $null) 
		{
			$resultHTMLTable = $resultHTMLTable + "<td>Backups do not exist</td></tr>";
		}
		else
		{
       
			# Store an ordered list of log backups for the database
            $logBackups = $backupSets | Where-Object{$_.database_name -eq $db.name -and $_.type.ToString() -eq "L" -and $_.backup_finish_date -ge $($fullBackup.backup_finish_date)} | Sort-Object -Property @{Expression={$_.backup_finish_date}; Ascending=$true} ;

            # Idera
			#$fileList = Invoke-Sqlcmd2 -ServerInstance $testServer -Query "EXEC [master].[dbo].[xp_ss_restorefilelistonly] @filename = '$($fullBackup.physical_device_name)', @WindowsUsername = '$windowsUsername', @EncryptedWindowsPassword = '$windowsPassword'"

            # SQL Server standard        
            $fileList = Invoke-Sqlcmd2 -ServerInstance $testServer -Query "RESTORE FILELISTONLY FROM DISK = '$($fullBackup.physical_device_name)'"
            
			
			$dataFile = $fileList | Where-Object{$_.Type -eq "D"} 
			$logFile = $fileList | Where-Object{$_.Type -eq "L"} 
            
            # For Idera backups, specify a formatted parameter for data and log paths
			$dataMove = $($dataFile.LogicalName) + ' ' + $dataFilePath
			$logMove = $($logFile.LogicalName) + ' ' + $logFilePath

			# Idera
			# Note: This uses WITH REPLACE. Make sure this is what you want.
			<#
            $restoreQuery = "EXEC [master].[dbo].[xp_ss_restore]
                  @database = '$($fullBackup.database_name)'
    			, @filename = '$($fullBackup.physical_device_name)'
    			, @withmove = '$dataMove'
    			, @withmove = '$logMove'
    			, @Replace = 1 -- overwrite existing database 
    			, @DisconnectUsers = 1
    			, @WindowsUsername = '$windowsUsername'
    			, @EncryptedWindowsPassword = '$windowsPassword'
    			, @RecoveryMode = 'norecovery' -- may need to restore logs
    			, @nostatus = 0 -- send status to repository
    			, @ContinueAfterError = 0"
            #>
            
			# Note: This uses WITH REPLACE. Make sure this is what you want.
			$restoreQuery = "RESTORE DATABASE $($fullBackup.database_name) FROM DISK = '$($fullBackup.physical_device_name)' WITH REPLACE, NORECOVERY, CHECKSUM, MOVE '$($dataFile.LogicalName)' TO '$dataFilePath', MOVE '$($logFile.LogicalName)' TO '$logFilePath'"
            
			# Execute the restore query
			Write-Host "Restoring full backup for $($db.Name)";
			try
			{
				Invoke-Sqlcmd2 -ServerInstance $testserver -Database master -Query $restoreQuery -ErrorAction Stop

				if($logBackups -eq $null)
				{
                    $resultHTMLTable = $resultHTMLTable + "<td>No Log Backups</td>"
				} else
				{
                    $resultHTMLTable = $resultHTMLTable + "<td>Log Backups Exist</td>"
					foreach($log in $logBackups)
					{
						$restoreQuery = "RESTORE LOG $($db.Name) FROM DISK = '$($log.physical_device_name)' WITH NORECOVERY";

						Write-Host "Restoring log file $($log.physical_device_name) from $($log.backup_finish_date)";

						Invoke-Sqlcmd2 -ServerInstance $testserver -Database master -Query $restoreQuery -ErrorAction Stop
					}
				}

				$resultHTMLTable = $resultHTMLTable + "<td>Success</td>"
			}
			catch
			{
				$resultHTMLTable = $resultHTMLTable + "<td><font color=`"red`">$($Error[0])</font></td>"

				# Recovering Database
				Invoke-Sqlcmd2 -ServerInstance $testServer -Query $recoverQuery -ErrorAction continue
                
				$Error.Clear();
			} 

			# Recovering Database
			Invoke-Sqlcmd2 -ServerInstance $testServer -Query $recoverQuery -ErrorAction continue
            
            # Drop the database
            Invoke-Sqlcmd2 -ServerInstance $testServer -Query $dropQuery -ErrorAction continue
            
			$resultHTMLTable = $resultHTMLTable + "</tr>"
		} # end if
	}
} 

$resultHTMLTable = $resultHTMLTable + "</table></body></html>";

# Send email report
Write-Host "Sending email";
Send-MailMessage -Subject "Database Restore Report" -Body $resultHTMLTable -BodyAsHtml -SmtpServer $SMTPServer -To $EmailTo -From $EmailFrom
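
To get the daily email, the script just needs to be scheduled. One minimal way to do that is through Task Scheduler; the path and task name here are made-up examples:

# Register a daily 6:00 AM run of the script
schtasks /Create /TN "VerifyDatabaseBackups" /SC DAILY /ST 06:00 /TR "powershell.exe -NoProfile -File C:\Scripts\Verify-DatabaseBackups.ps1"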

#sqlsat89: No Excuses. Be Involved.

When I first submitted my sessions to SQL Saturday #89 in Atlanta, I had no idea that so many fantastic speakers would be there.  If I had known, I probably would not have gotten involved.  You shouldn’t be like me.

From my point of view, it was an injustice to the attendees for me to take a speaking slot from any one of the amazing experts in attendance.  Would those who attended my session have learned more about PowerShell from Nic Cain (T | B), Jim Christopher (T | B) or Aaron Nelson (T | B)?  Yep.  Is there someone who was overlooked who would have had a better session than mine?  Certainly.  It would have been a stronger event if someone else had taken my place, but it may not have been a stronger community.

What I understand now is that there isn’t much to a community where only the top experts contribute.  We need them.  I need them.  But, a community is composed of people who are involved.  I have some knowledge of and a lot of passion for PowerShell and SQL Server and I wanted to give as much of that as I could to the people who were in my session.  And that’s what PASS is all about: giving whatever you can, even if others can give more.

Speaking at #sqlsat89 is one of the best experiences I’ve ever had in my career.  My session went very well and some new opportunities opened for me. I learned about parallelism from Adam Machanic (T | B) and about waits and latches from Bob Ward (T | B) .  Most importantly, I met many fantastic people I hope to see many more times at future events.  I will speak again and I will continue getting better. I will learn more and contribute more.  And one day I will be one of the experts getting bumped for a first time speaker.  I hope that will be someone who reads this and decides to become part of this fantastic SQL Server community.

#sqlsat89 scripts

Here are the scripts I used during my session, Getting Started With PowerShell. I’ll have an after event post soon.

Starting Services:


#Start SQL Server
Get-Command
Get-Command -CommandType Cmdlet
Get-Command -CommandType Cmdlet | Format-Table name

# List all services
Get-Service 

# Attempt to filter on the "SQL" string
Get-Service | Select-String -Pattern "SQL"

# The filter doesn't work, why not?
Get-Help Select-String

# The output of Get-Service is an array of objects!
Get-Service | Get-Member ;

# Use the Where-Object cmdlet to filter the results
Get-Service | Where-Object{$_.Name -match "SQL"}

# You can start every service on the server at once if you want. 
# Get-Service | Where-Object{$_.Name -match "SQL"} | Start-Service

# Use the -Force parameter to ignore dependencies
# Get-Service | Where-Object{$_.Name -match "SQL"} | Stop-Service -Force

# There are several ways to start the services
# Start-Service "SQLSERVERAGENT" ;
# Start-Service "MSSQLSERVER" ;

Start-Service "MSSQLSERVER", "SQLSERVERAGENT" ; 
"MSSQLSERVER", "SQLSERVERAGENT" | Start-Service;

Changing Agent Job Owners:


# Declare a variable for the desired username
$job_owner = "DBAsRule" ;

# Even text is an object!
$job_owner |Get-Member

# Declare the intention to use the Microsoft.SqlServer.SMO .NET namespace
[System.Reflection.Assembly]::LoadWithPartialName('Microsoft.SqlServer.SMO') | Out-Null

# The PowerShell 2.0 convention is to use Add-Type, but it is sometimes problematic.
# Add-Type -AssemblyName "Microsoft.SqlServer.SMO"
# Add-Type -path “C:\Windows\assembly\GAC_MSIL\Microsoft.SqlServer.Smo\11.0.0.0__89845dcd8080cc91\Microsoft.SqlServer.Smo.dll”

# Store an instance of the local SQL in a variable
$sql_server_object = New-Object Microsoft.SqlServer.Management.SMO.Server('localhost')

# Get-Help Get-Member

# Get the SQL Agent SMO object
$sql_server_agent = $sql_server_object.JobServer;

# $sql_server_agent | Get-Member

# Get the JobCollection object
$jobs = $sql_server_agent.Jobs;



foreach($j in $jobs)
{
	$j.OwnerLoginName;
}


foreach($j in $jobs)
{
	if($j.OwnerLoginName -ne $job_owner )
	{
		$j.OwnerLoginName = $job_owner;
		# Alter() commits the change to the server; without it the new owner is never saved
		$j.Alter();
	}	
}

First Impressions of VoltDB

VoltDB is an intriguing competitor in the quickly developing field of data store alternatives. Answering the charge of NoSQL supporters that “OldSQL” is simply too slow to scale effectively to meet the demands of high transaction applications, VoltDB seeks to demonstrate that it is neither SQL nor the Relational Model that is at fault, but the antiquated disk-centric architecture of the venerable relational database management systems (RDBMS).

VoltDB is a natively partitioned (sharded), single threaded, memory resident data store. Other RDBMS implement sharding to allow for distributed data storage across commodity hardware. However, VoltDB was written with this principle as the basis of the architecture. Sharding in other RDBMS often must be implemented as part of the application stack and presents onerous maintenance hazards (such as managing backups and restores). In recent versions, MySQL has added partitioning abstraction to the product and many organizations have experienced success with MySQL sharding. I can’t speak to how VoltDB compares, but certainly MySQL was not designed with partitioning as a fundamental principle. Regardless, the VoltDB implementation effectively and cleanly abstracts the details of partitioning from the application tiers.

But the most important differentiator is that VoltDB is entirely memory resident. All of the mechanisms that traditional RDBMS rely on to function in a disk-centric architecture have been discarded in the name of performance. This simple and instructive graphic demonstrates how VoltDB rethinks storing relational data.

In addition, VoltDB rejects concurrency in favor of single threaded transactions, thereby eliminating costly locking. As a result VoltDB can measure throughput in hundreds of thousands of transactions per second, a truly astonishing accomplishment.

Even though VoltDB addresses common performance issues of relational data stores, NoSQL solutions can still rightly claim that it does nothing to solve the impedance mismatch that software developers have spent so much time trying to work around. Certainly, in time, ORMs will be developed which will greatly ease that pain. Until someone is able to make the translation as seamless as, say, working with the excellent MongoDB, this will remain a legitimate complaint and an impediment to adoption. However, VoltDB’s response should be, “yes, some NoSQL stores speak your language better than VoltDB, but look at what you are giving up!” Losing ad-hoc querying, transactional consistency (ACID), a standardized data access convention (ANSI SQL), and the data integrity promoted by the relational model is a huge price to pay. And as Dr. David DeWitt rightly asked in his presentation to the PASS Summit 2010, “why would you want to write your own query optimizer?” (An on-demand recording is available; signup required.)

This brings us to what I see as the greatest drawbacks of VoltDB. Currently, only a very limited subset of SQL is actually supported, which takes some of the effectiveness out of the ad-hoc query argument. No doubt this will improve over time, but it remains a challenge at this point. Additionally, due to architectural constraints, VoltDB can only loosely call itself ‘relational.’ The lack of foreign key and check constraints is worrisome. The responsibility for maintaining data integrity in this way is pushed to the application, just as with NoSQL solutions. I see no reason why check constraints, at least in limited form, could not be supported, but enforcing foreign key constraints across partitions is a genuinely hard architectural problem.

However, the most disconcerting problem is the lack of true uniqueness constraints. Although VoltDB enforces uniqueness for keys, it can do so only on a per-partition basis. That is, for any given relation the key is only guaranteed to be unique within the partition in which it resides; if your table is partitioned across n nodes, a key may exist in n partitions. To my mind, this calls into question VoltDB’s claim to be a relational data store, though that is perhaps debatable. Nevertheless, uniqueness constraints, along with foreign keys and check constraints, are essential to the data integrity promises of the relational model. ACID is fantastic and certainly something that NoSQL advocates are too quick to dismiss. And it seems that VoltDB is mounting a powerful challenge to the CAP theorem. But my conclusion is that VoltDB is not a truly relational data store; even so, it offers a vital correction to NoSQL: do not assume that transactional integrity must be thrown out in the name of performance. For this reason, if for nothing else, it deserves to be taken very seriously.

There is much more that can be said about VoltDB and I encourage you to read their excellent whitepaper. VoltDB, and others with a similar approach, are extraordinarily promising and welcome rewrites of relational data storage. The established players in the relational market are going to be forced to take notice, as they certainly have of NoSQL. And when the NoSQL crowd realizes that they are now dealing with the same issues that Codd was trying to solve forty years ago, VoltDB will be there with part of the solution. But, much more work must be done to find creative solutions to the lack of some essential relational features before we can truly say that it can carry both the data integrity and performance torches. In the meantime, VoltDB can rightfully claim to be one of the best options out there for what it claims to do well: high throughput OLTP workloads.

A special thanks to Seb Coursol of VoltDB for his openness and enthusiasm for his product.

Moving Data Between SQL Servers with Powershell

Scripting data is a pain, especially if you do not want the entire contents of a table. OpenRowset is verbose and clunky. And working with linked servers may not be the seventh layer of hell, but you’re going to smell some smoke. What I really want is to pass some parameters to a function and be done with it. Hello, PowerShell. Here’s how it works:

Create a test table on two different servers

CREATE TABLE EmailAddress
(
     AddressID INT NOT NULL,
     EmailAddress VARCHAR(100) NOT NULL
) ;

Create some test data on one server.

WITH email AS
(
     SELECT 1 AS id, 'bgates@hotmail.com' AS EmailAddress
     UNION ALL
     SELECT 2, 'mzuckerberg@facebook.com' 
)
INSERT INTO EmailAddress
SELECT id,EmailAddress FROM email 

Here’s the function

function Move-Data 
{

Param
(
$sourceServerName ,
$sourceDatabaseName,
$destinationServerName,
$destinationDatabaseName,
$destinationTableName,
$query
)

# Invoke-Sqlcmd requires the SQL Server snap-in/module (sqlps) to be loaded
$dataTable = Invoke-Sqlcmd -Query $query -Database $sourceDatabaseName -ServerInstance $sourceServerName ;

$connectionString = "Data Source=$destinationServerName;Integrated Security=true;Initial Catalog=$destinationDatabaseName;" ;
$bulkCopy = New-Object ("Data.SqlClient.SqlBulkCopy") $connectionString ;
$bulkCopy.DestinationTableName = "$destinationTableName" ;
$bulkCopy.WriteToServer($dataTable) ;
$bulkCopy.Close() ;
}

Call the function

$query = "SELECT AddressID,EmailAddress FROM EmailAddress WHERE AddressID = 1" ;
Move-Data server1 MyDB server2 MyDB EmailAddress $query